Data Extraction with NLP techniques and its Transformation to Linked Data

Monday, 12 May, 2014 - 13:30

Martin Nečaský
Barbora Vidová Hladká
Vincent Kríž

Abstract: According to statistics provided by the International Data Corporation, 90% of all available digital data is unstructured, and its volume is currently growing twice as fast as that of structured data. In many domains, large collections of unstructured documents form the main sources of information, and browsing and querying them efficiently is key to many areas of human activity. The INTLIB project (INTelligent LIBrary) takes a collection of documents related to a particular problem domain as input. In the first phase, we extract a knowledge base from the collection using natural language processing tools. In the second phase, we deal with efficient and user-friendly visualization and querying of the extracted knowledge. We will present our results in both the legislative and environmental domains.

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
