SPAT is an acronym for "System for textual data analysis" in Czech.
The system has been developed as a part of the project number VH 023 with cooperation of Ministry of the Interior of the Czech Republic. It offers analysis of textual documents in Czech language. The analysis consists of subsequent parts:
Input documents can be of arbtitrary format that contains textual data (txt, pdf, docx, ppt, xml,..). They are converted to plain text and organized into collections based on the input directory structure. The system provides a fulltext search interface for these collections. The search is accessible through REST API. The query language allows construction of complex queries using several operators. It is also possible to search for all forms of the lemmatized query, search just certain entities etc. Also similar documents to the one found can be retrieved.