ParCzech is a project on compiling Czech parliamentary data into annotated corpora (short intro).

December 1, 2020

Kick-off meeting of the 2nd phase of the ParlaMint project. Czech is on board!


Visit the LINDAT/CLARIAH-CZ repository

  • ParCzech 3.0
  • ParCzech PS7 2.0
  • ParCzech PS7 1.0
    • parczech-ps7-1.0-raw.tar.gz (stenoprotocols converted into a TEI-derived encoding and split into speeches; links to audio files included)
    • parczech-ps7-1.0-annotated.tar.gz (stenoprotocols tokenized and processed by NameTag)
    • parczech-ps7-1.0-audio-DDD.tar (MP3 audio recordings; DDD stands for the file number. One file may contain stenoprotocols from more than one meeting, but a single meeting is never split across archives)

Search in KonText

Search in TEITOK


  • Barbora Hladká, Matyáš Kopp, and Pavel Straňák. Compiling Czech Parliamentary Stenographic Protocols into a Corpus. In Proceedings of the LREC 2020 Workshop on Creating, Using and Linking of Parliamentary Corpora with Other Types of Political Discourse (ParlaCLARIN II), Darja Fišer, Maria Eskevich, Franciska de Jong (eds.), pp. 18–22, 2020.
  • Matyáš Kopp, Vladislav Stankov, Jan Oldřich Krůza, Pavel Straňák, and Ondřej Bojar. ParCzech 3.0: A Large Czech Speech Corpus with Rich Metadata. In Text, Speech, and Dialogue, Springer International Publishing, pp. 293–304, 2021.


This work has been using language resources and tools developed and/or stored and/or distributed by the LINDAT/CLARIAH-CZ project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2018101).


Barbora Hladká

Matyáš Kopp

Pavel Straňák