A2T Domains (A2TD)
------------------------------

A2T Domains (A2TD) (Sainz and Rigau, 2021) is a lexical resource generated as
part of the Ask2Transformers work. It consists of WordNet synsets automatically
annotated with domain information, such as BabelDomains labels.

The Ask2Transformers work aims to annotate textual data automatically, without
any supervision. Given a particular set of labels (BabelDomains, WNDomains, ...),
the system has to classify the data without previous examples. The work is based
on the Transformers library and its pre-trained language models. For this
particular resource we evaluated the system on the BabelDomains dataset
(Camacho-Collados and Navigli, 2017), achieving 92.14% accuracy on domain
labelling.

You can find the code of the Ask2Transformers work on GitHub:
https://github.com/osainz59/Ask2Transformers


Contents of the distribution
----------------------------

The current distribution of A2T Domains consists of the following files:

  a2t.wordnet.babeldomains.tsv.tar.gz
      A TSV file containing the WordNet synsets annotated with BabelDomains
      labels.

  a2t.wordnet.babeldomains_single.tsv.tar.gz
      A TSV file containing the WordNet synsets annotated with a simplified
      version of the BabelDomains labels.


a2t.wordnet.babeldomains.tsv.tar.gz
===================================

This file contains the domain annotations produced by the best (zero-shot) model
from the Ask2Transformers work. It is a tab-separated values (TSV) file with
three columns: the synset identifier, the predicted label, and the confidence
score. The labels used for the annotations are the BabelDomains labels. For
instance, the first 10 lines are:

00001740-a   Transport and travel   0.2079945206642151
00001740-n   Language and linguistics   0.049682412296533585
00001740-r   Music   0.2753644585609436
00001740-v   Health and medicine   0.39603814482688904
00001837-r   Religion, mysticism and mythology   0.5239243507385254
00001930-n   Physics and astronomy   0.11910667270421982
00001981-r   Religion, mysticism and mythology   0.2799309492111206
00002098-a   Transport and travel   0.5030499696731567
00002137-n   Heraldry, honors, and vexillology   0.05259553715586662
00002142-r   History   0.1701359897851944


a2t.wordnet.babeldomains_single.tsv.tar.gz
==========================================

This file contains the domain annotations produced by the best (zero-shot) model
from the Ask2Transformers work. It is a tab-separated values (TSV) file with
three columns: the synset identifier, the predicted label, and the confidence
score. The labels used for the annotations are a simplified version of the
BabelDomains labels. For instance, the first 10 lines are:

00001740-a   Computing   0.15782533586025238
00001740-n   Nobility   0.02761859819293022
00001740-r   Music   0.22062131762504578
00001740-v   Health   0.2904718518257141
00001837-r   Religion   0.3583557605743408
00001930-n   Physics   0.07668651640415192
00001981-r   Religion   0.2245434671640396
00002098-a   Transport   0.2361915111541748
00002137-n   Defense   0.03056768886744976
00002142-r   History   0.1186232939362526
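
Usage examples
--------------

The following is a minimal Python sketch, not part of the distribution, showing
one way to read the compressed annotations and map each synset identifier back
to a WordNet synset with NLTK. It assumes tab-separated columns as described
above and a recent NLTK release with the WordNet data installed; adjust file
names and paths as needed.

    import tarfile

    from nltk.corpus import wordnet as wn  # optional; requires nltk.download("wordnet")


    def load_annotations(path):
        """Yield (synset_id, label, score) tuples from one of the .tsv.tar(.gz) archives."""
        with tarfile.open(path, "r:*") as archive:  # "r:*" handles .tar and .tar.gz alike
            for member in archive.getmembers():
                if not member.name.endswith(".tsv"):
                    continue
                with archive.extractfile(member) as handle:
                    for raw in handle:
                        synset_id, label, score = raw.decode("utf-8").rstrip("\n").split("\t")
                        yield synset_id, label, float(score)


    for synset_id, label, score in load_annotations("a2t.wordnet.babeldomains.tsv.tar.gz"):
        offset, pos = synset_id.split("-")  # e.g. "00001740-a"
        synset = wn.synset_from_pos_and_offset(pos, int(offset))
        print(synset.name(), label, round(score, 3))
        break  # show only the first entry

The next snippet is a rough sketch of zero-shot domain labelling in the spirit
of Ask2Transformers, using the generic zero-shot-classification pipeline from
the Hugging Face Transformers library as a stand-in. The model name and the
candidate labels below are illustrative assumptions only; the exact models,
templates and label sets used to produce these annotations are those of the
GitHub repository above.

    from transformers import pipeline

    # NLI-based zero-shot classifier; "roberta-large-mnli" is an illustrative
    # choice, not necessarily the configuration used to build this resource.
    classifier = pipeline("zero-shot-classification", model="roberta-large-mnli")

    # A small subset of BabelDomains labels, for illustration only.
    labels = [
        "Transport and travel",
        "Language and linguistics",
        "Music",
        "Health and medicine",
        "Physics and astronomy",
    ]

    text = "the branch of physics that studies celestial bodies and the universe as a whole"
    result = classifier(text, candidate_labels=labels)
    print(result["labels"][0], result["scores"][0])  # top-ranked domain and its confidence
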

License
-------

This package is distributed under the Creative Commons Attribution 3.0 Unported
(CC BY 3.0) license. You can find it at http://creativecommons.org/licenses/by/3.0.


Publications
------------

Sainz O. and Rigau G. Ask2Transformers: Zero-Shot Domain Labelling with
Pre-trained Language Models. Proceedings of the 11th Global WordNet Conference
(GWC 2021). Pretoria, South Africa. 2021.


References
----------

Camacho-Collados, Jose, and Roberto Navigli. "BabelDomains: Large-scale domain
labeling of lexical resources." In Proceedings of the 15th Conference of the
European Chapter of the Association for Computational Linguistics: Volume 2,
Short Papers, pp. 223-228. 2017.


Research groups involved
------------------------

IXA
http://ixa.si.ehu.es


Contact information
-------------------

German Rigau
IXA Group
University of the Basque Country
E-20018 San Sebastián


Version 1.0. Last updated: 2020/12/10