ParCzech

ParCzech is a project that compiles Czech parliamentary data into annotated corpora (short intro). It is developed in tandem with the ParlaMint project and its Czech part, ParlaMint-CZ.

Download

Visit the LINDAT/CLARIAH-CZ repository

  • ParCzech 4.0
    • This version covers the same period as the ParlaMint-CZ corpus v4.0 (http://hdl.handle.net/11356/1860). The ParCzech corpus follows and extends the ParlaMint schema. Both the annotated and the non-annotated versions include hypertext references to voting and parliamentary prints. Beyond what ParlaMint recommends, the annotated version contains alignment to the source audio, PDT xtag, and a more detailed CNEC 2.0 named entity categorization.
  • ParCzech 3.0
  • ParCzech PS7 2.0
  • ParCzech PS7 1.0
    • parczech-ps7-1.0-raw.tar.gz (stenoprotocols converted into a TEI-derived encoding and split into speeches; links to audio files are included)
    • parczech-ps7-1.0-annotated.tar.gz (stenoprotocols tokenized and processed by NameTag)
    • parczech-ps7-1.0-audio-DDD.tar (MP3 audio recordings; DDD stands for the file number. One archive may contain stenoprotocols from more than one meeting, but a single meeting is never split across archives)
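Once the archives have been downloaded from the repository, they can be unpacked with standard tools. The following is a minimal Python sketch, not an official ParCzech utility: the helper names `audio_archive_name` and `unpack` are hypothetical, and the three-digit zero padding of DDD is an assumption based on the placeholder above, so check it against the actual file names in the repository.

```python
import tarfile
from pathlib import Path


def audio_archive_name(file_number: int) -> str:
    """Build an audio archive name for a given file number.

    Assumption: DDD is a zero-padded three-digit number.
    """
    return f"parczech-ps7-1.0-audio-{file_number:03d}.tar"


def unpack(archive: Path, target: Path) -> list[str]:
    """Extract a .tar or .tar.gz archive into `target`; return member names."""
    target.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect compression (none, gzip, ...) on its own.
    with tarfile.open(archive, mode="r:*") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse unsafe members such as absolute paths.
            tar.extractall(target, filter="data")
        else:
            tar.extractall(target)
    return names
```

For example, `unpack(Path("parczech-ps7-1.0-raw.tar.gz"), Path("parczech-raw"))` would extract the raw stenoprotocols into a local `parczech-raw` directory, assuming the archive sits in the current working directory.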

Search in KonText

Search in TEITOK

Publications

  • Barbora Hladká, Matyáš Kopp, and Pavel Straňák. Compiling Czech Parliamentary Stenographic Protocols into a Corpus. In Proceedings of the LREC 2020 Workshop on Creating, Using and Linking of Parliamentary Corpora with Other Types of Political Discourse (ParlaCLARIN II), Darja Fišer, Maria Eskevich, Franciska de Jong (eds.), pp. 18–22, 2020.
  • Matyáš Kopp, Vladislav Stankov, Jan Oldřich Krůza, Pavel Straňák, and Ondřej Bojar. ParCzech 3.0: A Large Czech Speech Corpus with Rich Metadata. In Text, Speech, and Dialogue. Springer International Publishing, pp. 293–304, 2021.

BibTeX for Referencing

@InProceedings{kopp2021tsd,
    editor = "Ek{\v{s}}tein, Kamil and P{\'{a}}rtl, Franti{\v{s}}ek and Konop{\'{\i}}k, Miroslav",
    author = "Kopp, Maty{\'{a}}{\v{s}} and Stankov, Vladislav and Kr{\r{u}}za, Jan and Stra{\v{n}}{\'{a}}k, Pavel and Bojar, Ond{\v{r}}ej",
    booktitle = "Text, Speech, and Dialogue",
    booksubtitle = "24th International Conference, TSD 2021, Olomouc, Czech Republic, September 6–9, 2021, Proceedings",
    title = "{ParCzech} 3.0: {A} {L}arge {Czech} {S}peech {C}orpus with {R}ich {M}etadata",
    year = "2021",
    publisher = "Springer",
    organization = "University of West Bohemia",
    address = "Cham, Switzerland",
    series = "Lecture Notes in Computer Science",
    isbn = "978-3-030-83526-2",
    pages = "293--304",
    doi = "10.1007/978-3-030-83527-9\_25",
}
@misc{ParCzech4.0,
    author = "Kopp, Maty{\'a}{\v s}",
    title = "{ParCzech} 4.0",
    url = "http://hdl.handle.net/11234/1-5360",
    publisher = "{LINDAT}/{CLARIAH}-{CZ} digital library",
    copyright = "Public Domain Dedication ({CC} Zero)",
    year = "2024"
}

 

Acknowledgement

This work has been using language resources and tools developed and/or stored and/or distributed by the LINDAT/CLARIAH-CZ project of the Ministry of Education, Youth and Sports of the Czech Republic (projects LM2018101 and LM2023062).

Contact

Barbora Hladká

Matyáš Kopp

Pavel Straňák