Disambiguation of Hierarchical Named Entities

Named Entity Recognition (NER) and Entity Linking are well-known, yet still challenging problems in natural language processing. Traditionally, NER systems predict coarse entity types such as Location or Person. Some datasets also provide fine-grained types such as Location -> Administrative -> Town, which supply more context to a user and can be used to guide the disambiguation of the entity when linking it to a knowledge graph.

NER systems are often evaluated on two tasks: predicting coarse labels, e.g., Location, and predicting the most specific label, e.g., Town. The fine-grained scenario is perceived as a more challenging variation with an extended set of labels. However, the fine-grained label taxonomy has an underlying hierarchical structure that is often ignored; in this example, it includes the intermediary label Administrative. Under the constraint that there exists a path from the correct coarse type to the correct granular type, information from the entire hierarchy can be used to make more precise predictions in both tasks. The resulting model can predict labels that are as granular as possible, but fall back to a more general label when it would otherwise make a wrong prediction or when a granular output is not desired.
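The fallback behaviour described above can be illustrated with a minimal sketch. It is not the project's method: the taxonomy dictionary, the function names, the per-label confidence scores, and the threshold of 0.5 are all assumptions made for illustration. The sketch returns the most specific label whose entire path to the coarse type is confident, and falls back to a more general label otherwise.

```python
# Hypothetical taxonomy: label -> parent label (None marks a coarse type).
PARENT = {
    "Location": None,
    "Administrative": "Location",
    "Town": "Administrative",
    "Person": None,
}


def path_to_root(label):
    """Return the labels from `label` up to its coarse type, most specific first."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path


def predict_with_fallback(scores, threshold=0.5):
    """Pick the most specific label whose whole path to the root is confident.

    `scores` maps each label to an assumed confidence in [0, 1].
    Returns None when no label, not even a coarse one, reaches the threshold.
    """
    best, best_depth = None, 0
    for label in scores:
        path = path_to_root(label)
        # Accept a label only if every ancestor is also confident,
        # so that a valid coarse-to-granular path exists.
        if all(scores.get(node, 0.0) >= threshold for node in path):
            if len(path) > best_depth:
                best, best_depth = label, len(path)
    return best


# The model is unsure about Town, so the prediction falls back to Administrative.
scores = {"Location": 0.9, "Administrative": 0.8, "Town": 0.3, "Person": 0.1}
print(predict_with_fallback(scores))
```

Raising the threshold makes the sketch more conservative: it then prefers coarser labels, which corresponds to the case where a granular output is not desired.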

This project aims to solve this novel problem by exploring structure encodings, such as graph convolutional networks, for the label taxonomy and knowledge graphs. Approaches include the alignment of these graphs and the injection of hierarchical constraints into the model training.
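One possible way to inject a hierarchical constraint into training is a penalty term that grows whenever a label is scored higher than its parent. The sketch below is an illustrative assumption, not the approach chosen by the project; the `CHILD_TO_PARENT` mapping and the `hierarchy_penalty` function are hypothetical names.

```python
# Hypothetical child -> parent edges of the label taxonomy.
CHILD_TO_PARENT = {"Administrative": "Location", "Town": "Administrative"}


def hierarchy_penalty(probs, child_to_parent=CHILD_TO_PARENT):
    """Sum the violations where a child label is scored above its parent.

    `probs` maps each label to a predicted probability. A prediction that is
    consistent with the hierarchy (child <= parent on every edge) costs 0.
    """
    return sum(
        max(0.0, probs[child] - probs[parent])
        for child, parent in child_to_parent.items()
    )


# Town (0.8) exceeds its parent Administrative (0.6), so this edge is penalised.
probs = {"Location": 0.9, "Administrative": 0.6, "Town": 0.8}
print(hierarchy_penalty(probs))
```

In a training loop, such a term would typically be added to the main classification loss with a weighting factor, which is again an assumption rather than something the project description specifies.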

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
