CZECH ACADEMIC CORPUS 2.0 GUIDE

1. Preface

The Prague family of annotated corpora has a new member: the Czech Academic Corpus version 2.0 (CAC 2.0), a manually annotated morphological and syntactic corpus of the Czech language. Strictly speaking, the CAC 2.0 is both a new and an old member, as only one version preceded the current one. The first version contained “only” morphological annotations; it was published a year ago and can therefore be regarded as outdated. The new feature of the CAC 2.0 is syntactical annotation, so the corpus can now be characterised by another Praguian attribute: dependency.

The CAC 2.0 Guide is a guide to the CD-ROM, just like the previous CAC 1.0 Guide. The Guide provides all the necessary information about the project, so the user does not need to be familiar with the CAC 1.0 Guide. The CAC 1.0 Guide can still be consulted for details of the CAC project’s history and preparation. If you are already familiar with the CAC 1.0 Guide, you will find the CAC 2.0 Guide easy to navigate, as we have kept the organisation of its chapters into three main units.

The first unit, Chapter 2, describes the main characteristics of the Czech Academic Corpus 2.0, the structure of its annotations and the individual steps of the syntactical annotation.

The second unit, Chapters 3 through 7, contains the CD-ROM information and the documentation of the data component, tools, bonus material and tutorials. Part 3.2 introduces the corpus as a data file with an inner representation. A considerable amount of information concerns the corpus viewing tools, Bonito (part 3.3.1) and Netgraph (part 3.3.4); the annotation editors, LAW (part 3.3.2) and TrEd (part 3.3.3); and the tools for morpho-syntactical processing of texts (part 3.3.5). Chapter 4 presents two bonuses: the STYX Czech electronic exercise book (part 4.1) and the TrEdVoice module for voice control of TrEd (part 4.2). All the tools provided and their graphical interfaces are documented and accompanied by tutorials in the form of demos; see Chapter 5 for the complete list. Chapter 6 contains the installation instructions for the CD-ROM components. Chapter 7 summarises the information on the distribution of the CD-ROM.

Chapters 8 and 9 form the third unit of the Guide. They cover the personnel and financial aspects of the project. The Guide closes with five appendices: Appendix A lists the sources of the corpus texts; Appendix B describes the structure of lemmas, for easy orientation in the morphological annotations; Appendix C describes the structure of a morphological tag; Appendix D guides the user through the syntactical annotations; Appendix E completes the Guide with web links.

This CD is being published in the final year of the project Resources and Tools for Information Systems, No. 1ET101120413, financed by the Grant Agency of the Academy of Sciences of the Czech Republic. The CD completes the comprehensive presentation of the results of five years of work on the project.