Information extraction from domain-specific data

Guidelines

Information extraction is the task of automatically extracting structured information from unstructured data, usually textual documents. Its basic sub-tasks include named entity recognition, extraction of relations between entities, and linking entities to an ontology. Other tasks assign geo-spatial, temporal, and content tags to documents (or their parts). Template filling attempts to extract a fixed set of fields from a document. Many of these tasks can also be approached as question-answering problems. The thesis will explore these tasks using current deep-learning-based models in domain-specific and multilingual settings.
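
To make the link between template filling and question answering concrete, the following is a minimal sketch, not a proposed method. The template fields, the questions, and the regex-based `toy_answer` function are all hypothetical stand-ins chosen for illustration; in the thesis setting, `toy_answer` would be replaced by a trained question-answering model.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical template: each field is paired with a natural-language question.
TEMPLATE_QUESTIONS: Dict[str, str] = {
    "date": "When did the event happen?",
    "location": "Where did the event happen?",
}

# Toy stand-in for a QA model (assumption): one regular expression per question.
TOY_PATTERNS: Dict[str, str] = {
    "When did the event happen?": r"\bon (\d{1,2} \w+ \d{4})",
    "Where did the event happen?": r"\bin ([A-Z][a-z]+)",
}


def toy_answer(question: str, document: str) -> Optional[str]:
    """Return the first span matching the question's pattern, or None."""
    match = re.search(TOY_PATTERNS[question], document)
    return match.group(1) if match else None


def fill_template(
    document: str,
    answer: Callable[[str, str], Optional[str]] = toy_answer,
) -> Dict[str, Optional[str]]:
    """Fill every template field by asking its question about the document."""
    return {
        field: answer(question, document)
        for field, question in TEMPLATE_QUESTIONS.items()
    }


doc = "The flood occurred in Prague on 14 August 2002."
print(fill_template(doc))
```

Because `fill_template` takes the answering function as a parameter, the same template can be reused unchanged when the toy patterns are swapped for a learned model, which is one reason the question-answering formulation is convenient in domain-specific settings.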

References

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. Cambridge, MA, USA: MIT Press, 2016.

Nasar, Zara, Syed Waqar Jaffry, and Muhammad Kamran Malik. "Named entity recognition and relation extraction: State-of-the-art." ACM Computing Surveys (CSUR) 54.1 (2021): 1-39.
