Status: released
OS: Linux

NameTag 2

1. Introduction

NameTag is an open-source tool for named entity recognition (NER). NameTag identifies proper names in text and classifies them into predefined categories, such as names of persons, locations, organizations, etc. NameTag 2 recognizes nested entities (embedded entities) of arbitrary depth.
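To make the notion of nested entities concrete, the following minimal sketch represents entities as token spans and computes how deeply each one is embedded in the others. The `Entity` class, the `nesting_depth` helper, the example sentence and the labels `ORG` and `LOC` are illustrative assumptions, not NameTag's own data structures or label set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A named entity as a token span (hypothetical structure, not NameTag's API)."""
    start: int  # index of the first token, inclusive
    end: int    # index of the last token, inclusive
    label: str  # illustrative label such as "ORG" or "LOC"


def nesting_depth(entity, entities):
    """Return how many other entities fully contain `entity`."""
    return sum(
        1
        for other in entities
        if other != entity
        and other.start <= entity.start
        and entity.end <= other.end
    )


tokens = ["Bank", "of", "England"]
entities = [
    Entity(0, 2, "ORG"),  # "Bank of England"
    Entity(2, 2, "LOC"),  # "England", embedded in the entity above
]

# Map each label to its nesting depth (0 = outermost).
depths = {e.label: nesting_depth(e, entities) for e in entities}
```

Here the outer organization has depth 0 and the embedded location has depth 1; deeper nesting simply yields larger values.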

As of 2019, NameTag 2 achieved state-of-the-art results on Czech (CNEC 2.0 corpus), English (CoNLL corpus), Dutch (CoNLL corpus) and Spanish (CoNLL corpus), and nearly state-of-the-art results on the German CoNLL corpus (Straková et al. 2019). In 2021, it achieved state-of-the-art results on the German GermEval corpus (unpublished). In 2023, we released a model for Ukrainian (lang-uk).

NameTag is available in these versions:

The linguistic models are free for non-commercial use and are distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions.

NameTag is versioned using Semantic Versioning.

Copyright 2021 by Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic.

2. Current Release

NameTag 2 is available from the LINDAT NameTag Web Service.

You can download the latest release from GitHub.
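As a rough sketch of how the web service mentioned above might be queried from a script, the example below builds a request URL and parses a JSON response. The endpoint URL, the `data`, `model` and `output` parameters, the `vertical` output format and the `result` response field are assumptions not stated on this page; verify them against the service's own API documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the LINDAT NameTag REST API (not given in this document).
API_URL = "https://lindat.mff.cuni.cz/services/nametag/api/recognize"


def build_request_url(text, model=None, output="vertical", base=API_URL):
    """Build a GET URL for a recognition request (parameter names are assumed)."""
    params = {"data": text, "output": output}
    if model is not None:
        params["model"] = model  # omit to let the service pick its default model
    return base + "?" + urlencode(params)


def recognize(text, **kwargs):
    """Send the request and return the assumed `result` field of the JSON reply."""
    with urlopen(build_request_url(text, **kwargs)) as response:
        payload = json.load(response)
    return payload["result"]


# Building the URL needs no network access; recognize() does.
url = build_request_url("Václav Havel was born in Prague.")
```

Calling `recognize("Václav Havel was born in Prague.")` would then perform the actual request. For longer texts a POST request is likely more appropriate than a GET URL.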

3. Models

The individual models are described on the NameTag 2 Models page and can be downloaded from the LINDAT repository. The latest version is 2100916.

4. License

The associated models and data are licensed under CC BY-NC-SA, although for some models the original data used to create the model may impose additional licensing conditions.

If you use this tool for scientific work, please give us credit by referencing the NameTag website and Straková et al. 2019 (see BibTeX for referencing).

5. Acknowledgements

Acknowledgements for the individual language models are listed on the NameTag 2 Models page.

The work described herein has been supported by the OP VVV VI LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project CZ.02.1.01/0.0/0.0/16_013/0001781) and by the LINDAT/CLARIAH-CZ project of the same ministry (LM2023062, LM2018101). It has also been supported by the Mellon Foundation grant No. G-1901-06505 and by LUSyD GX20-16819X.

5.1. Publications

Straková Jana, Straka Milan, Hajič Jan: Neural Architectures for Nested NER through Linearization. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Copyright © Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-48-2, pp. 5326-5331, 2019.

@inproceedings{StrakovaStrakaHajicACL2019,
  author    = {Jana Straková and Milan Straka and Jan Hajič},
  year      = 2019,
  title     = {Neural Architectures for Nested NER through Linearization},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  pages     = {5326--5331},
  publisher = {Association for Computational Linguistics},
  address   = {Stroudsburg, PA, USA},
  isbn      = {978-1-950737-48-2},
}

6. Contact

Authors:

NameTag website.

NameTag LINDAT/CLARIN entry.

7. Persistent Identifier

If you prefer to reference NameTag by a persistent identifier (PID), you can use http://hdl.handle.net/11858/00-097C-0000-0023-43CE-E.
