The Czech Legal Text Treebank (CLTT) is a manually annotated corpus of dependency trees. The treebank consists of 1,121 sentences from the legal domain.
The Czech Legal Text Treebank 2.0 (CLTT 2.0) annotates the same texts as CLTT 1.0. The syntactic annotation in CLTT 2.0 is more elaborate than in CLTT 1.0 in several respects. In addition, two new annotation layers were added to the data: (i) the layer of accounting entities, and (ii) the layer of semantic entity relations.
CLTT 2.0 was presented at the LREC 2018 Poster Session.
CLTT 2.0 was published officially.
CLTT 1.0 was presented at the LREC 2016 Poster Session.
CLTT 1.0 was published officially.
Documents in CLTT
The sentences were taken from the Accounting Act (563/1991 Coll., as amended) and the Decree on Double-entry Accounting for undertakers (500/2002 Coll., as amended). The selection followed the goals of the INTLIB project, focusing specifically on the accounting subdomain.
The annotations in CLTT follow the framework originally formulated in the Prague Dependency Treebank (PDT) project. They apply the dependency approach to syntactic analysis, in which the verb plays the central role. Technically, this is the analytical (a-) layer of annotation: each token in the sentence has one corresponding node, and each dependency is labeled with a syntactic dependency function stored in the afun attribute.
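The a-layer structure described above can be illustrated with a minimal Python sketch. It is not the CLTT file format or any ÚFAL API: the ANode class, its field names other than afun, the two helper functions and the example sentence are all assumptions made for illustration, and the sentence is invented rather than taken from the treebank. A parent value of 0 stands for the technical root.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ANode:
    """One a-layer node; hypothetical structure, not the CLTT data format."""
    ord: int      # 1-based position of the token in the sentence
    form: str     # word form of the token
    parent: int   # ord of the governing node; 0 denotes the technical root
    afun: str     # analytical (syntactic dependency) function


def is_well_formed(nodes: List[ANode]) -> bool:
    """Check that there is one node per token position and no cycle."""
    ords = sorted(n.ord for n in nodes)
    if ords != list(range(1, len(nodes) + 1)):
        return False
    parent_of = {n.ord: n.parent for n in nodes}
    for start in parent_of:
        seen = set()
        current = start
        # Walk up towards the technical root (0).
        while current != 0:
            if current in seen or current not in parent_of:
                return False
            seen.add(current)
            current = parent_of[current]
    return True


def dependents(nodes: List[ANode], head_ord: int) -> List[ANode]:
    """Return the nodes directly governed by the node with the given ord."""
    return [n for n in nodes if n.parent == head_ord]


# Invented example sentence: "Účetní jednotky vedou účetnictví."
SENTENCE = [
    ANode(1, "Účetní", 2, "Atr"),
    ANode(2, "jednotky", 3, "Sb"),
    ANode(3, "vedou", 0, "Pred"),
    ANode(4, "účetnictví", 3, "Obj"),
    ANode(5, ".", 0, "AuxK"),
]

if __name__ == "__main__":
    print(is_well_formed(SENTENCE))
    for node in dependents(SENTENCE, 3):
        print(node.form, node.afun)
```

In this sketch the verb "vedou" is the node attached to the root with afun Pred, and its subject and object hang directly below it, which mirrors the verb-centered analysis described above.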
To make manual annotation as easy as possible, we developed a special annotation strategy:
Accounting Entities Annotation
In the CLTT 2.0, we introduced a new annotation layer of accounting entities. We exploited the dictionary of accounting terms that was created for the RExtractor system. Subsequently, we used the RExtractor system for automatic identification of entities in the CLTT dependency trees.
The dictionary of accounting terms consists of 1,733 different terms classified into 25 categories. The RExtractor system identified 7,332 occurrences in the CLTT 2.0.
Semantic Relations Annotation
The layer of semantic relations is newly introduced in the CLTT 2.0. Relations are represented as (subject, predicate, object) triples, where subject and object have to be entities and predicate represents a relation. Three types of semantic relations were manually annotated in the CLTT texts:
Relations link an entity (subject) and its definition (object).
Relations link an entity (subject) that has a given right (object) to do something.
Relations link an entity (subject) that has a given obligation (object) to do something.
The CLTT 2.0 contains 498 manually annotated relations classified into 3 categories.
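As a rough illustration of the triple representation, the following Python sketch models relations as (subject, predicate, object) records and counts them per category. The class names, the three type labels ("definition", "right", "obligation") and the example triples are hypothetical: the labels are derived from the descriptions above, not from the CLTT data, and the examples are invented rather than taken from the annotated relations.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# Hypothetical labels for the three relation categories described above.
RELATION_TYPES = ("definition", "right", "obligation")


@dataclass(frozen=True)
class Entity:
    """An annotated entity, reduced here to its surface text."""
    text: str


@dataclass(frozen=True)
class Relation:
    """A (subject, predicate, object) triple with a category label."""
    subject: Entity
    predicate: str
    obj: Entity
    rel_type: str

    def __post_init__(self) -> None:
        # Both subject and object have to be entities.
        if not isinstance(self.subject, Entity) or not isinstance(self.obj, Entity):
            raise TypeError("subject and object must be Entity instances")
        if self.rel_type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.rel_type}")


def count_by_type(relations: List[Relation]) -> Counter:
    """Count relations per category."""
    return Counter(r.rel_type for r in relations)


# Invented examples, for illustration only.
EXAMPLES = [
    Relation(Entity("účetní jednotka"), "je povinna vést", Entity("účetnictví"), "obligation"),
    Relation(Entity("účetní jednotka"), "může použít", Entity("technické prostředky"), "right"),
]

if __name__ == "__main__":
    for rel_type, count in count_by_type(EXAMPLES).items():
        print(rel_type, count)
```

Rejecting anything that is not an Entity in the subject or object position keeps the sketch in line with the constraint that both ends of a relation have to be entities.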
Browse data offline
To browse CLTT 2.0, you need to run the open-source application TrEd with the Czech Legal Text Treebank 2.0 Extension. This extension can be installed directly from TrEd using Setup >> Manage Extensions >> Get New Extensions. Make sure that the repository http://ufal.mff.cuni.cz/tred/extensions/core/ is enabled in Setup >> Manage Extensions >> Edit Repositories.
Because of a bug in TrEd, you should also install the Perl package Graph::Kruskal to enable the presentation of entities.
To browse CLTT 1.0, you need to install the Czech Legal Text Treebank 1.0 Extension. However, with the CLTT 2.0 release, this extension has become obsolete and we provide no support for it.
Browse data online
Please use the following text to cite CLTT 2.0:
Kríž Vincent, Hladká Barbora: Czech Legal Text Treebank 2.0. In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), Copyright © European Language Resources Association, Paris, France, ISBN 979-10-95546-00-9, pp. 4501-4505, 2018
Kríž, Vincent and Hladká, Barbora, 2017, Czech Legal Text Treebank 2.0, LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-2498.
Please use the following text to cite CLTT 1.0:
Kríž Vincent, Hladká Barbora, Urešová Zdeňka: Czech Legal Text Treebank 1.0. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Copyright © European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1, pp. 2387-2392, 2016
Kríž, Vincent; Hladká, Barbora and Urešová, Zdeňka, 2015, Czech Legal Text Treebank, LINDAT/CLARIN digital library at Institute of Formal and Applied Linguistics, Charles University in Prague, http://hdl.handle.net/11234/1-1516.
Distributed under CC BY-NC-SA licence.