Description

The Czech Legal Text Treebank (CLTT) is a manually annotated corpus of dependency trees. The treebank consists of 1,121 sentences from the legal domain.

The Czech Legal Text Treebank 2.0 (CLTT 2.0) annotates the same texts as CLTT 1.0. Its annotation on the syntactic layer is more elaborate than that of CLTT 1.0 in several respects. In addition, two new annotation layers were added to the data: (i) the layer of accounting entities, and (ii) the layer of semantic entity relations.

Latest News

  • May 2018
    CLTT 2.0 was presented at the LREC 2018 Poster Session.
     
  • September 2017
    CLTT 2.0 was published officially.
     
  • May 2016
    CLTT 1.0 was presented at the LREC 2016 Poster Session.
     
  • January 2016
    CLTT 1.0 was published officially.

Documents in CLTT

The sentences were taken from the Accounting Act (563/1991 Coll., as amended) and the Decree on Double-entry Accounting for undertakers (500/2002 Coll., as amended). The selection was driven by the goals of the INTLIB project, with a particular focus on the accounting subdomain.

Syntactic Annotations

The annotations in CLTT follow the framework originally formulated in the Prague Dependency Treebank (PDT) project. They apply the dependency approach to syntactic analysis, in which the verb plays the central role. Technically, this is the analytical (a-) layer of annotation: each token in the sentence has one corresponding node, and each dependency is labeled with a syntactic dependency function stored in the afun attribute.
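
As an illustration of the a-layer structure, the following Python sketch models a sentence as a list of nodes, one per token, each carrying an afun label and a pointer to its governing node. The class ANode, its field names, the helper functions, and the toy English sentence are assumptions made for this example only; they do not reproduce the format in which CLTT is distributed. The afun values used (Pred, Sb, Obj, AuxK) are standard PDT labels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ANode:
    """One analytical-layer node; exactly one node per token (hypothetical model)."""
    position: int  # 1-based position of the token in the sentence
    form: str      # surface form of the token
    afun: str      # syntactic dependency function, e.g. "Sb", "Obj"
    parent: int    # position of the governing node; 0 means the technical root


def children(nodes: List[ANode], parent_position: int) -> List[ANode]:
    """Return the nodes directly governed by the node at parent_position."""
    return [n for n in nodes if n.parent == parent_position]


def is_tree(nodes: List[ANode]) -> bool:
    """Check that every node reaches the technical root (0) without a cycle."""
    by_position = {n.position: n for n in nodes}
    for node in nodes:
        seen = set()
        current = node
        while current.parent != 0:
            if current.position in seen or current.parent not in by_position:
                return False
            seen.add(current.position)
            current = by_position[current.parent]
    return True


# Toy sentence invented for illustration; it is not taken from CLTT.
sentence = [
    ANode(1, "Entity", "Sb", 2),   # subject depends on the verb
    ANode(2, "keeps", "Pred", 0),  # main verb hangs on the technical root
    ANode(3, "books", "Obj", 2),   # object depends on the verb
    ANode(4, ".", "AuxK", 0),      # final punctuation hangs on the root
]
```

In this model the verb governs both the subject and the object, which mirrors the central role of the verb in the dependency approach.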

To make manual annotation as easy as possible, we developed a special annotation strategy:

Accounting Entities Annotation

In CLTT 2.0, we introduced a new annotation layer of accounting entities. We drew on the dictionary of accounting terms that was created for the RExtractor system, and then used RExtractor to identify entities automatically in the CLTT dependency trees.

The dictionary of accounting terms consists of 1,733 different terms classified into 25 categories. The RExtractor system identified 7,332 entity occurrences in CLTT 2.0.

Semantic Relations Annotation

The layer of semantic relations is new in CLTT 2.0. Relations are represented as (subject, predicate, object) triples, where the subject and the object must be entities and the predicate represents the relation between them. Three types of semantic relations were manually annotated in the CLTT texts:

  • Definitions
    Relations link an entity (subject) to its definition (object).
     
  • Rights
    Relations link an entity (subject) that has a given right (object) to do something.
     
  • Obligations
    Relations link an entity (subject) that has a given obligation (object) to do something.

The CLTT 2.0 contains 498 manually annotated relations classified into 3 categories.
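
The triple representation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names RelationType, Relation, and count_by_type are hypothetical, plain strings stand in for references to annotated entities, and the three example relations are invented English paraphrases rather than data from the treebank.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class RelationType(Enum):
    """The three relation categories annotated in CLTT 2.0."""
    DEFINITION = "definition"
    RIGHT = "right"
    OBLIGATION = "obligation"


@dataclass(frozen=True)
class Relation:
    """A (subject, predicate, object) triple with its category (hypothetical model)."""
    subject: str    # entity the relation is about
    predicate: str  # wording that expresses the relation
    obj: str        # definition, right, or obligation linked to the subject
    kind: RelationType


def count_by_type(relations: Iterable[Relation]) -> Counter:
    """Count how many relations fall into each category."""
    return Counter(r.kind for r in relations)


# Invented examples for illustration; they are not taken from CLTT.
relations = [
    Relation("balance sheet day", "is defined as",
             "the day on which the books are closed", RelationType.DEFINITION),
    Relation("accounting unit", "may",
             "use simplified accounting", RelationType.RIGHT),
    Relation("accounting unit", "is obliged to",
             "keep accounting records", RelationType.OBLIGATION),
]

counts = count_by_type(relations)
```

Keeping the category on each triple makes it straightforward to compute per-category totals such as those reported for the treebank.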

Download

Browse data offline

  • To browse CLTT 2.0, you need to run the open-source application TrEd with the Czech Legal Text Treebank 2.0 Extension. This extension can be installed directly from TrEd using Setup >> Manage Extensions >> Get New Extensions. Make sure that the repository http://ufal.mff.cuni.cz/tred/extensions/core/ is enabled in Setup >> Manage Extensions >> Edit Repositories.

    Because of a bug in TrEd, you should also install the Perl package Graph::Kruskal to enable the display of entities.
     
  • To browse CLTT 1.0, you need to install the Czech Legal Text Treebank 1.0 Extension. However, with the CLTT 2.0 release, this extension has become obsolete and we provide no support for it.

Browse data online

  • CLTT 2.0 is also available online through the PMLTQ service at LINDAT.
     
  • CLTT 1.0 is available through the PMLTQ service as well :-)

Authors

Reference

Please use the following text to cite CLTT 2.0:

  • Kríž Vincent, Hladká Barbora: Czech Legal Text Treebank 2.0. In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), Copyright © European Language Resources Association, Paris, France, ISBN 979-10-95546-00-9, pp. 4501-4505, 2018
     
  • Kríž, Vincent and Hladká, Barbora, 2017, Czech Legal Text Treebank 2.0, LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-2498

Please use the following text to cite CLTT 1.0:

  • Kríž Vincent, Hladká Barbora, Urešová Zdeňka: Czech Legal Text Treebank 1.0. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Copyright © European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1, pp. 2387-2392, 2016, http://www.lrec-conf.org/proceedings/lrec2016/pdf/936_Paper.pdf
     
  • Kríž, Vincent; Hladká, Barbora and Urešová, Zdeňka, 2015, Czech Legal Text Treebank, LINDAT/CLARIN digital library at Institute of Formal and Applied Linguistics, Charles University in Prague, http://hdl.handle.net/11234/1-1516.

Licence

Distributed under the CC BY-NC-SA licence.