Status: released
OS: Linux, Windows, OS X

NameTag

1. Introduction

NameTag is an open-source tool for named entity recognition (NER). It identifies proper names in text and classifies them into predefined categories, such as names of persons, locations and organizations. NameTag is distributed as a standalone tool or a library, along with trained linguistic models. For the Czech language, NameTag achieves state-of-the-art performance (Straková et al. 2013). NameTag is free software under the Mozilla Public License 2.0. The linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions. NameTag is versioned using Semantic Versioning.

Copyright 2016 by Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic.

2. Online

2.1. Online Demo

LINDAT/CLARIN hosts the NameTag Online Demo.

2.2. Web Service

LINDAT/CLARIN also hosts the NameTag Web Service.
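
A hosted web service is typically called over HTTP. The following is a minimal sketch of such a call in Python, not the documented interface: the endpoint URL in `API_URL`, the `data` parameter name and the JSON response format are all assumptions that should be checked against the web service documentation. The helper names `build_request_url` and `recognize` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: endpoint of the hosted service; verify it in the service documentation.
API_URL = "https://lindat.mff.cuni.cz/services/nametag/api/recognize"


def build_request_url(text, base_url=API_URL, **params):
    """Build a GET URL that carries the text in the (assumed) `data` parameter.

    Extra keyword arguments are passed through as additional query parameters.
    """
    query = {"data": text}
    query.update(params)
    return base_url + "?" + urlencode(query)


def recognize(text, **params):
    """Send the request and decode the response as JSON (requires network access).

    ASSUMPTION: the service answers with a JSON document.
    """
    with urlopen(build_request_url(text, **params)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping URL construction separate from the network call makes it possible to inspect the request before sending it, for example with `print(build_request_url("Václav Havel"))`.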

3. Release

3.1. Download

NameTag releases are available on GitHub, either as a pre-compiled binary package or as source code only. The binary package contains Linux, Windows and OS X binaries, the Java bindings binary, the C# bindings binary, and the source code of NameTag and all language bindings. While the binary packages do not contain compiled Python or Perl bindings, packages for those languages are available in standard package repositories, i.e. on PyPI and CPAN.

3.1.1. Language Models

To use NameTag, a language model is needed. The language models are available from the LINDAT/CLARIN infrastructure and are described further in the NameTag User's Manual. Currently the following language models are available:

3.2. License

NameTag is an open-source project and is freely available for non-commercial purposes. The source code is distributed under the Mozilla Public License 2.0, and the pre-compiled binaries and the associated models and data are distributed under CC BY-NC-SA, although for some models the original data used to create the model may impose additional licensing conditions.

If you use this tool for scientific work, please give us credit by referencing the NameTag website and (Straková et al. 2014).

3.3. Platforms and Requirements

NameTag is available as a standalone tool and as a library for Linux, Windows and OS X. It does not require any additional libraries. Like any supervised machine learning tool, it needs trained linguistic models to perform named entity recognition. The models for the Czech language are available with the tool.
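
Library use with a trained model can be sketched as follows with the Python bindings. This is an illustrative sketch, not the reference usage: it assumes the `ufal.nametag` package from PyPI is installed and that it exposes `Ner`, `Forms`, `TokenRanges` and `NamedEntities` as used below; the model path is a hypothetical placeholder, and `entity_spans` and `recognize_text` are helper names introduced here. The API Reference remains the authoritative description.

```python
def entity_spans(entities, forms):
    """Turn (start, length, type) triples into (type, text) pairs.

    `forms` is the list of tokens of one sentence; `start` and `length`
    are token offsets into that list.
    """
    return [(etype, " ".join(forms[start:start + length]))
            for start, length, etype in entities]


def recognize_text(model_path, text):
    """Recognize named entities in `text` using the model at `model_path`.

    ASSUMPTION: `model_path` points to a downloaded NameTag model file.
    """
    # Imported lazily so that entity_spans() works without the bindings installed.
    from ufal.nametag import Ner, Forms, TokenRanges, NamedEntities

    ner = Ner.load(model_path)
    if ner is None:
        raise RuntimeError("Cannot load model from " + model_path)

    forms, tokens, entities = Forms(), TokenRanges(), NamedEntities()
    tokenizer = ner.newTokenizer()
    if tokenizer is None:
        raise RuntimeError("The model does not provide a tokenizer")
    tokenizer.setText(text)

    results = []
    # Process the text sentence by sentence.
    while tokenizer.nextSentence(forms, tokens):
        ner.recognize(forms, entities)
        triples = [(e.start, e.length, e.type) for e in entities]
        results.extend(entity_spans(triples, list(forms)))
    return results
```

The entity type labels returned depend on the model in use, so the sketch passes them through unchanged rather than interpreting them.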

4. NameTag Installation

NameTag Installation is described on a separate page.

5. NameTag User's Manual

NameTag User's Manual is available on a separate page.

6. NameTag API Reference

NameTag API Reference is available on a separate page.

7. Contact

Authors:

NameTag website.

NameTag LINDAT/CLARIN entry.

8. Acknowledgements

This work has been using language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education of the Czech Republic (project LM2010013).

Acknowledgements for individual language models are listed in the NameTag User's Manual.

8.1. Publications

  • (Straková et al. 2014) Straková Jana, Straka Milan and Hajič Jan. Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 13-18, Baltimore, Maryland, June 2014. Association for Computational Linguistics.

  • (Straková et al. 2013) Straková Jana, Straka Milan and Hajič Jan. A New State-of-The-Art Czech Named Entity Recognizer. In Text, Speech and Dialogue: 16th International Conference, TSD 2013, Proceedings, Lecture Notes in Computer Science, Vol. 8082, pages 68-75, 2013. Springer Verlag, Berlin / Heidelberg. ISBN 978-3-642-40584-6, ISSN 0302-9743.

8.2. BibTeX for Referencing

@InProceedings{strakova14,
  author    = {Strakov\'{a}, Jana  and  Straka, Milan  and  Haji\v{c}, Jan},
  title     = {Open-{S}ource {T}ools for {M}orphology, {L}emmatization, {POS} {T}agging and {N}amed {E}ntity {R}ecognition},
  booktitle = {Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
  month     = {June},
  year      = {2014},
  address   = {Baltimore, Maryland},
  publisher = {Association for Computational Linguistics},
  pages     = {13--18},
  url       = {http://www.aclweb.org/anthology/P/P14/P14-5003.pdf}
}

8.3. Persistent Identifier

If you prefer to reference NameTag by a persistent identifier (PID), you can use http://hdl.handle.net/11858/00-097C-0000-0023-43CE-E.
