NameTag is an open-source tool for named entity recognition (NER). NameTag identifies proper names in text and classifies them into predefined categories, such as names of persons, locations, organizations, etc. NameTag 2 recognizes nested entities (embedded entities) of arbitrary depth.
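Here is a minimal Python sketch that makes "nested entities of arbitrary depth" concrete. It represents entities as token spans and computes how deeply each span is embedded in the others. The `Entity` class, the helper functions, the generic labels and the hand-written annotation of the example phrase are all hypothetical illustrations. They are not NameTag's API, output format or tag set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A hypothetical entity span over a token list (not NameTag's format)."""
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str  # illustrative category name


def contains(outer: Entity, inner: Entity) -> bool:
    """True if `outer` encloses `inner` and the two are distinct entities."""
    return outer != inner and outer.start <= inner.start and inner.end <= outer.end


def nesting_depth(entity: Entity, entities: list[Entity]) -> int:
    """Count the entities enclosing `entity`; 0 means a top-level entity."""
    return sum(1 for other in entities if contains(other, entity))


def render(tokens: list[str], entities: list[Entity]) -> list[str]:
    """Return one line per entity, indented by its nesting depth."""
    # Outer entities first: sort by start, then by decreasing span length.
    ordered = sorted(entities, key=lambda e: (e.start, e.start - e.end))
    return [
        "  " * nesting_depth(e, entities) + f"{e.label}: {' '.join(tokens[e.start:e.end])}"
        for e in ordered
    ]


# Hand-written illustrative annotation with three levels of nesting.
tokens = "Faculty of Mathematics and Physics , Charles University in Prague".split()
entities = [
    Entity(9, 10, "location"),      # Prague
    Entity(6, 10, "organization"),  # Charles University in Prague
    Entity(0, 10, "organization"),  # the whole faculty name
]

if __name__ == "__main__":
    # Prints:
    # organization: Faculty of Mathematics and Physics , Charles University in Prague
    #   organization: Charles University in Prague
    #     location: Prague
    for line in render(tokens, entities):
        print(line)
```

Entities are stored as possibly overlapping spans rather than as one label per token. This lets an entity sit inside another to any depth, which a flat one-label-per-token scheme cannot express directly.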
As of 2019, NameTag 2 achieves state-of-the-art results in Czech (CNEC 2.0 corpus), English (CoNLL corpus), Dutch (CoNLL corpus) and Spanish (CoNLL corpus), and nearly state-of-the-art results on the German CoNLL corpus (Straková et al. 2019). In 2021, it achieved state-of-the-art results on the German GermEval corpus (unpublished). In 2023, we released a model for Ukrainian (lang-uk).
NameTag is available in these versions:
The linguistic models are free for non-commercial use and are distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions.
NameTag is versioned using Semantic Versioning.
Copyright 2021 by Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic.
NameTag 2 is available as the LINDAT NameTag Web Service.
The NameTag 2 source code is available on GitHub.
The individual models are described on the NameTag 2 Models page and can be downloaded from the LINDAT repository. The latest version is 2100916.
The associated models and data are licensed under CC BY-NC-SA, although for some models the original data used to create the model may impose additional licensing conditions.
If you use this tool for scientific work, please give us credit by referencing the NameTag website and Straková et al. 2019 (see the BibTeX entry below).
Acknowledgements for the individual language models are listed on the NameTag 2 Models page.
The work described herein has been supported by the OP VVV VI LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project CZ.02.1.01/0.0/0.0/16_013/0001781) and by the LINDAT/CLARIAH-CZ project of the Ministry of Education, Youth and Sports of the Czech Republic (LM2023062, LM2018101). It has also been supported by the Mellon Foundation grant No. G-1901-06505 and by the LUSyD project GX20-16819X.
Straková Jana, Straka Milan, Hajič Jan: Neural Architectures for Nested NER through Linearization. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Copyright © Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-48-2, pp. 5326-5331, 2019.
@inproceedings{strakova-etal-2019-neural,
    title = "Neural Architectures for Nested {NER} through Linearization",
    author = "Strakov{\'a}, Jana and Straka, Milan and Haji{\v{c}}, Jan",
    editor = "Korhonen, Anna and Traum, David and M{\`a}rquez, Llu{\'\i}s",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P19-1527",
    doi = "10.18653/v1/P19-1527",
    pages = "5326--5331",
}
Authors:
If you prefer to reference NameTag by a persistent identifier (PID), you can use http://hdl.handle.net/11858/00-097C-0000-0023-43CE-E.