Authors

Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, Zdeněk Žabokrtský

Credits

The Prague Czech-English Dependency Treebank 2.0 is the result of a joint effort by many people. Names are listed in alphabetical order by last name throughout, except for publications (such as the Annotator's Guidelines) and tools, where the published order of the authors is kept.

Coordinator (EN, CZ): Jan Hajič

Linguistic support (EN, CZ): Eva Hajičová, Jarmila Panevová, Petr Sgall

Coordination, training, manuals:

  • EN: Silvie Cinková
  • CZ: Marie Mikulová

Valency lexicons:

  • EN (EngvalLex): Jana Šindlerová
  • CZ (PDT-ValLex): Zdeňka Urešová

Data pre-processing, annotation support and post-annotation checking:

  • EN: Silvie Cinková, Eva Fučíková, Josef Toman, Jiří Semecký
  • CZ: Marie Mikulová, Jan Popelka, Jan Štěpánek

Major software and data processing modules: Petr Pajas, Zdeněk Žabokrtský

Additional annotator training (EN): Jana Šindlerová

Annotators:

  • English deep-syntax (tectogrammatical) annotation: Kristýna Čermáková, Vojtěch Diatka, Matěj Korvas, Ema Krejčová, Jan Mašek, Anja Nedolužko, Lucie Poláková, Magdalena Rysová, Lenka Šíková, Jana Šindlerová, Kristýna Tomšů, Kateřina Veselá, Kateřina Veselovská
  • Czech deep-syntax (tectogrammatical) annotation: Zuzanna Bedřichová, Kristýna Čermáková, Jitka Faktorová, Ivana Klímová, Martina Koppová, Alena Kropíková, Michala Lvová, Aneta Pečenková, Lenka Šíková, Katka Voleková, Olga Zitová
  • Czech surface-syntax (analytical) annotation of 2,000 sentences: Ivana Klímová
  • Czech coreference annotation: Eliška Černá, Veronika Čurdová, Eliška Davidová, Vojtěch Diatka, Ivan Kafka, Radka Mačugová, Hana Vildová, Klára Zindulková, Zdeněk Zůcha

Czech translation supervision and revisions: Marie Mikulová, Jan Štěpánek

Tools:

  • TrEd: Petr Pajas, Peter Fabian
  • btred: Petr Pajas
  • PML Tree Query: Petr Pajas, Jan Štěpánek
  • Treex: Zdeněk Žabokrtský, Martin Popel, David Mareček, Ondřej Bojar, Václav Klimeš, Tomáš Kraut, Václav Novák, Jan Ptáček, Rudolf Rosa, Daniel Zeman
  • Segmentation and tokenization of Czech texts: Jan Hajič, Michal Křen
  • Morphological Analyzer of Czech: Jan Hajič, Jaroslava Hlaváčová
  • English lemmatization: Jiří Semecký
  • Czech Tagger: Jan Hajič
  • A-layer parser for Czech: Jason Baldridge, Ryan McDonald (MST parser)
  • T-layer parser for annotation of Czech: Václav Klimeš
  • Wrappers for the parsers: Jan Hajič
  • Aligner: David Mareček, Václav Novák, Zdeněk Žabokrtský
  • Web-based interface for annotation progress monitoring: Eva Fučíková, Jiří Semecký, Jan Štěpánek, Josef Toman
  • XSH: Petr Pajas

Publications:

  • Collection: Silvie Cinková
  • Formatting: Josef Toman, Silvie Cinková

DVD-ROM, web design: Josef Toman

Data validation: Eva Fučíková, Josef Toman

Accompanying documentation: Silvie Cinková, Josef Toman, Jan Hajič

The English part of PCEDT 2.0 draws on other annotation projects carried out worldwide. Although our linguistic approach differs in many respects, we have made substantial use of these annotation efforts when automatically pre-processing the data for our annotators. We are very grateful to the teams behind the flat noun phrase annotation, the Penn Treebank, PropBank, VerbNet, NomBank and the BBN Pronoun Coreference and Entity Type Corpus, whose work has saved our annotators a great deal of time. Their members include:

  • James R. Curran and David Vadas (flat noun phrase annotation)
  • Mitchell P. Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz and Ann Taylor (Penn Treebank)
  • Martha Palmer, Paul Kingsbury, Olga Babko-Malaya, Scott Cotton, and Benjamin Snyder (PropBank)
  • Martha Palmer, Karin Kipper, Edward Loper, Szuting Yi, Susan Brown, Arrick Lafranchi, Russell-Lee Goldman, Derek Trumbo, Andy Dolbey, Hoa Trang Dang, Neville Ryan, Benjamin Snyder (VerbNet)
  • Adam Meyers, Ruth Reeves, Catherine Macleod (NomBank)
  • Ralph Weischedel and Ada Brunstein (BBN Pronoun Coreference and Entity Type Corpus)

The Czech part of the corpus was automatically parsed with the MST Parser developed by Jason Baldridge and Ryan McDonald.

This web page uses Oxygen icons (among others). These icons can be freely copied under the LGPLv3.