SPAT | ÚFAL

Tags:

Data, Information Structure, Machine Learning, Morphology

SPAT

SPAT is an acronym for "System for textual data analysis" in Czech.

The system has been developed as a part of the project number VH 023 with cooperation of Ministry of the Interior of the Czech Republic. It offers analysis of textual documents in Czech language. The analysis consists of subsequent parts:

Named entity recognition
Relation extraction
Summarization
Lemmatization

Input documents can be of arbtitrary format that contains textual data (txt, pdf, docx, ppt, xml,..). They are converted to plain text and organized into collections based on the input directory structure. The system provides a fulltext search interface for these collections. The search is accessible through REST API. The query language allows construction of complex queries using several operators. It is also possible to search for all forms of the lemmatized query, search just certain entities etc. Also similar documents to the one found can be retrieved.

The search returns structured documents enriched with the results of the analysis. The system also contains a lightweight javascript client app that communicates with the server side and displays the result in user friendly fashion. It also allows to do analysis of documents on demand.

Search form:

Search results: