Description

The Czech Legal Text Treebank (CLTT) is a manually annotated corpus of dependency trees. The treebank consists of 1,121 sentences from the legal domain.

The Czech Legal Text Treebank 2.0 (CLTT 2.0) annotates the same texts as CLTT 1.0. The syntactic annotation in CLTT 2.0 is more elaborate than in CLTT 1.0 in several respects. In addition, two new annotation layers were added to the data: (i) the layer of accounting entities, and (ii) the layer of semantic entity relations.

Latest News

  • September 2017
    CLTT 2.0 was published officially.
     
  • May 2016
    CLTT 1.0 was presented at the LREC 2016 Poster Session.
     
  • January 2016
    CLTT 1.0 was published officially.

Documents in CLTT

The sentences were taken from the Accounting Act (563/1991 Coll., as amended) and the Decree on Double-entry Accounting for Entrepreneurs (500/2002 Coll., as amended). The selection was determined by the goals of the INTLIB project, focusing specifically on the accounting subdomain.

Syntactic Annotations

The annotations in CLTT follow the framework originally formulated in the Prague Dependency Treebank (PDT) project. They apply the dependency approach to syntactic analysis, in which the verb plays the central role. Technically, this is the analytical (a-) layer of annotation: each token in the sentence has one corresponding node, and each dependency is labelled with a syntactic dependency function stored in the afun attribute.
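To make the one-node-per-token idea concrete, here is a minimal sketch of how an analytical tree could be held in memory. The class name `ANode`, its fields other than `afun`, and the helper functions are hypothetical and are not part of the CLTT data format. The example sentence is invented for illustration and is not taken from the corpus. The afun labels used (Pred, Sb, Obj, Atr, AuxK) are standard PDT analytical functions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ANode:
    """One node of an analytical tree (hypothetical structure, one node per token)."""
    ord: int    # 1-based position of the token in the sentence
    form: str   # word form as it appears in the text
    head: int   # ord of the governing node; 0 stands for the technical root
    afun: str   # syntactic dependency function, e.g. Pred, Sb, Obj, Atr


# Invented illustrative sentence, not taken from the corpus:
# "Účetní jednotka vede účetnictví." (The accounting unit keeps accounts.)
SENTENCE: List[ANode] = [
    ANode(1, "Účetní", 2, "Atr"),
    ANode(2, "jednotka", 3, "Sb"),
    ANode(3, "vede", 0, "Pred"),
    ANode(4, "účetnictví", 3, "Obj"),
    ANode(5, ".", 0, "AuxK"),
]


def children(nodes: List[ANode], head_ord: int) -> List[ANode]:
    """Return the nodes directly governed by the node with the given ord."""
    return [n for n in nodes if n.head == head_ord]


def find_predicate(nodes: List[ANode]) -> Optional[ANode]:
    """Return the first node labelled Pred, or None if there is none."""
    return next((n for n in nodes if n.afun == "Pred"), None)


if __name__ == "__main__":
    pred = find_predicate(SENTENCE)
    if pred is not None:
        # List each direct dependent of the verb with its afun label.
        for child in children(SENTENCE, pred.ord):
            print(f"{child.form} --{child.afun}--> {pred.form}")
```

Storing the parent as an index rather than as an object reference keeps the structure flat and easy to serialize; the actual treebank files use their own format, which this sketch does not attempt to reproduce.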

To make manual annotation as easy as possible, we developed a special annotation strategy.

Accounting Entities Annotation

In CLTT 2.0, we introduced a new annotation layer of accounting entities. We used the dictionary of accounting terms that was created for the RExtractor system, and then ran RExtractor to identify entities automatically in the CLTT dependency trees.

The dictionary of accounting terms consists of 1,733 different terms classified into 25 categories. The RExtractor system identified 7,332 entity occurrences in CLTT 2.0.

Semantic Relations Annotation

The layer of semantic relations is newly introduced in CLTT 2.0. Relations are represented as (subject, predicate, object) triples, where the subject and the object must be entities and the predicate expresses the relation between them. Three types of semantic relations were manually annotated in the CLTT texts:

  • Definitions
    Relations link an entity (subject) to its definition (object).
     
  • Rights
    Relations link an entity (subject) that has a given right (object) to do something.
     
  • Obligations
    Relations link an entity (subject) that has a given obligation (object) to do something.

CLTT 2.0 contains 498 manually annotated relations classified into these three categories.
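The triple representation can be sketched as a small data structure. This is only an illustration: the class `Relation`, the type names in `RELATION_TYPES`, and the helper `count_by_kind` are hypothetical, entities are simplified to plain strings, and the example triples are invented English glosses rather than annotations from the treebank.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

# Hypothetical names for the three annotated relation types.
RELATION_TYPES = ("definition", "right", "obligation")


@dataclass(frozen=True)
class Relation:
    """A (subject, predicate, object) triple tagged with its relation type."""
    subject: str    # entity the relation is about
    predicate: str  # expression carrying the relation
    object: str     # definition, right, or obligation linked to the subject
    kind: str       # one of RELATION_TYPES

    def __post_init__(self) -> None:
        # Reject types outside the three annotated categories.
        if self.kind not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.kind!r}")


def count_by_kind(relations: Iterable[Relation]) -> Counter:
    """Count how many relations fall into each category."""
    return Counter(r.kind for r in relations)


# Invented examples, for illustration only.
EXAMPLES = [
    Relation("accounting period", "is", "twelve consecutive months", "definition"),
    Relation("accounting unit", "may", "use a simplified regime", "right"),
    Relation("accounting unit", "is obliged to", "keep accounts", "obligation"),
]

if __name__ == "__main__":
    for kind, count in count_by_kind(EXAMPLES).items():
        print(kind, count)
```

Validating the type at construction time means a mistyped category fails immediately instead of silently creating a fourth bucket in the counts.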

Download

Browse data

  • To browse CLTT 2.0, you need the open-source application TrEd with the CLTT extension installed. The extension can be installed directly from TrEd using Setup >> Manage Extensions >> Get New Extensions. Make sure that the repository http://ufal.mff.cuni.cz/tred/extensions/core/ is enabled in Setup >> Manage Extensions >> Edit Repositories.

Authors

Reference

Please use the following text to cite CLTT:

Kríž, Vincent; Hladká, Barbora and Urešová, Zdeňka, 2015, Czech Legal Text Treebank, LINDAT/CLARIN digital library at Institute of Formal and Applied Linguistics, Charles University in Prague, http://hdl.handle.net/11234/1-1516.

Licence

Distributed under the CC BY-NC-SA licence.