ParCzech

ParCzech is a project on compiling Czech parliamentary data into annotated corpora.

December 1, 2020

Kick-off meeting of the 2nd phase of the ParlaMint project. Czech is on board!

Download

To download the data, please visit the LINDAT/CLARIAH-CZ repository:

  • ParCzech PS7 2.0
  • ParCzech PS7 1.0
    • parczech-ps7-1.0-raw.tar.gz (stenoprotocols converted into a TEI-derived encoding and split into speeches; links to the audio files included)
    • parczech-ps7-1.0-annotated.tar.gz (stenoprotocols tokenized and processed by NameTag)
    • parczech-ps7-1.0-audio-DDD.tar (MP3 audio recordings; DDD stands for the archive number. One archive may contain recordings of more than one meeting, but no meeting is split across archives.)
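The archives listed above are ordinary tar files (the raw and annotated ones gzip-compressed). A minimal Python sketch for unpacking them with the standard `tarfile` module follows. It assumes the archives have been downloaded into the current directory under their original names; the function name `extract_archive` and the `parczech/` output directory are hypothetical choices for illustration, not part of the release.

```python
import tarfile
from pathlib import Path


def extract_archive(archive: Path, dest: Path) -> list[str]:
    """Unpack one ParCzech archive into dest and return its member names.

    Mode "r:*" lets tarfile detect compression itself, so the same call
    handles the gzipped .tar.gz archives and the plain .tar audio archives.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:*") as tar:
        names = tar.getnames()
        # The "data" filter (newer Python releases) blocks unsafe paths
        # such as absolute names or "../" entries; fall back on older versions.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


if __name__ == "__main__":
    # Assumption: archives were saved in the current directory.
    for archive in sorted(Path(".").glob("parczech-ps7-1.0-*.tar*")):
        # e.g. parczech-ps7-1.0-raw.tar.gz -> parczech/parczech-ps7-1.0-raw
        target = Path("parczech") / archive.name.split(".tar")[0]
        members = extract_archive(archive, target)
        print(f"{archive.name}: {len(members)} members -> {target}")
```

Each archive is unpacked into its own subdirectory, so the raw, annotated, and audio releases do not overwrite one another.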

Search in KonText

Search in TEITOK

Cite

To properly acknowledge ParCzech PS7 2.0, please cite the following data item in the LINDAT/CLARIAH-CZ repository (plain text and BibTeX):


Hladká, Barbora; Kopp, Matyáš and Straňák, Pavel, 2020, ParCzech PS7 2.0, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-3436.

@misc{11234/1-3436,
  title = {{ParCzech} {PS7} 2.0},
  author = {Hladk{\'a}, Barbora and Kopp, Maty{\'a}{\v s} and Stra{\v n}{\'a}k, Pavel},
  url = {http://hdl.handle.net/11234/1-3436},
  note = {{LINDAT}/{CLARIAH}-{CZ} digital library at the Institute of Formal and Applied Linguistics ({{\'U}FAL}), Faculty of Mathematics and Physics, Charles University},
  copyright = {Public Domain Dedication ({CC} Zero)},
  year = {2020}
}

To properly acknowledge ParCzech PS7 1.0, please cite the following data item in the LINDAT/CLARIAH-CZ repository (plain text and BibTeX):


Hladká, Barbora; Kopp, Matyáš and Straňák, Pavel, 2020, ParCzech PS7 1.0, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-3174.

@misc{11234/1-3174,
  title = {{ParCzech} {PS7} 1.0},
  author = {Hladk{\'a}, Barbora and Kopp, Maty{\'a}{\v s} and Stra{\v n}{\'a}k, Pavel},
  url = {http://hdl.handle.net/11234/1-3174},
  note = {{LINDAT}/{CLARIAH}-{CZ} digital library at the Institute of Formal and Applied Linguistics ({{\'U}FAL}), Faculty of Mathematics and Physics, Charles University},
  copyright = {Public Domain Dedication ({CC} Zero)},
  year = {2020}
}

Publications

  • Hladká Barbora, Kopp Matyáš and Straňák Pavel. Compiling Czech Parliamentary Stenographic Protocols into a Corpus. In Proceedings of the LREC 2020 Workshop on Creating, Using and Linking of Parliamentary Corpora with Other Types of Political Discourse (ParlaCLARIN II), Darja Fišer, Maria Eskevich, Franciska de Jong (eds.), pp. 18–22, 2020.

Acknowledgements

This work has used language resources and tools developed, stored, and/or distributed by the LINDAT/CLARIAH-CZ project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2018101).

Contact

Barbora Hladká

Matyáš Kopp

Pavel Straňák