We present the Czech Court Decisions Dataset (CCDD), a dataset of 300 court decisions published by the Supreme Court of the Czech Republic (SC) and the Constitutional Court of the Czech Republic (CC). Selected entities in these decisions were manually detected and classified.

Sources

CCDD contains 150 court decisions published by the Supreme Court of the Czech Republic in 2012, selected at random with respect to their distribution across the Court's senates. CCDD also contains 150 court decisions published by the Constitutional Court of the Czech Republic between 2004 and 2012.
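The exact selection procedure is not described. One common reading of "random selection with respect to the distribution over the senates" is proportional stratified sampling: each senate contributes to the sample in proportion to its share of all decisions. The sketch below assumes that reading, a hypothetical input list of (decision_id, senate) pairs, and the largest-remainder rule for turning fractional quotas into whole counts; it is an illustration, not the authors' script.

```python
import random
from collections import defaultdict


def stratified_sample(decisions, k, seed=0):
    """Draw k decision IDs so that each senate's share of the sample
    matches its share of the pool as closely as integer counts allow.

    `decisions` is a hypothetical list of (decision_id, senate) pairs.
    """
    if not 0 <= k <= len(decisions):
        raise ValueError("k must be between 0 and the pool size")
    rng = random.Random(seed)
    by_senate = defaultdict(list)
    for decision_id, senate in decisions:
        by_senate[senate].append(decision_id)

    total = len(decisions)
    # Proportional quota k * n_s / total, split exactly into integer part and remainder.
    alloc = {s: k * len(ids) // total for s, ids in by_senate.items()}
    remainder = {s: k * len(ids) % total for s, ids in by_senate.items()}
    # Largest-remainder method: leftover slots go to the senates with the largest remainders.
    leftover = k - sum(alloc.values())
    for s in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[s] += 1

    # Draw without replacement inside each senate.
    sample = []
    for s, ids in by_senate.items():
        sample.extend(rng.sample(ids, alloc[s]))
    return sample


# Hypothetical pool: 60 decisions from senate "A", 30 from "B", 10 from "C".
pool = ([(f"A-{i}", "A") for i in range(60)]
        + [(f"B-{i}", "B") for i in range(30)]
        + [(f"C-{i}", "C") for i in range(10)])
picked = stratified_sample(pool, 10)  # 6 from A, 3 from B, 1 from C
```

Fixing the seed makes the draw reproducible, which matters if the selection has to be documented alongside the dataset.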

Annotation

The following entities are annotated in CCDD:

  • reference to a court decision
  • reference to an act
  • applicability of the act
  • institution

In addition, court decision references are linked to the institutions that issued them, and each applicability entity follows an act reference. For manual annotation we used the web-based annotation tool Brat. The annotators marked entity occurrences and labeled them with the appropriate tag, and then marked relations between court decision references and institutions.
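Brat stores annotations in its standoff format: text-bound entities on lines such as `T1<TAB>Label start end<TAB>covered text`, and binary relations on lines such as `R1<TAB>Label Arg1:T2 Arg2:T1`. The sketch below reads only these two line types, assuming CCDD is distributed as Brat `.ann` files. The labels in the example (`Institution`, `DecisionReference`, `IssuedBy`) and the sample text are hypothetical; the dataset's actual tag names may differ.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    ann_id: str
    label: str
    spans: list[tuple[int, int]]  # character offsets; several for discontinuous spans
    text: str


@dataclass
class Relation:
    ann_id: str
    label: str
    args: dict[str, str]  # role -> entity ID, e.g. {"Arg1": "T2", "Arg2": "T1"}


def parse_ann(lines):
    """Parse text-bound (T) and relation (R) lines of a Brat standoff .ann file."""
    entities, relations = {}, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("T"):
            ann_id, label_and_offsets, text = line.split("\t", 2)
            label, offsets = label_and_offsets.split(" ", 1)
            spans = [tuple(int(x) for x in frag.split()) for frag in offsets.split(";")]
            entities[ann_id] = Entity(ann_id, label, spans, text)
        elif line.startswith("R"):
            ann_id, body = line.split("\t", 1)
            label, *pairs = body.split()
            args = dict(pair.split(":", 1) for pair in pairs)
            relations.append(Relation(ann_id, label, args))
        # Other annotation kinds (events, attributes, notes) are ignored in this sketch.
    return entities, relations


# Hypothetical CCDD-style annotation; real label names may differ.
ann = [
    "T1\tInstitution 0 13\tNejvyšší soud",
    "T2\tDecisionReference 20 36\t29 Cdo 1234/2012",
    "R1\tIssuedBy Arg1:T2 Arg2:T1",
]
entities, relations = parse_ann(ann)
for rel in relations:
    decision = entities[rel.args["Arg1"]]
    issuer = entities[rel.args["Arg2"]]
    print(f"{decision.text} issued by {issuer.text}")
```

Keeping offsets as a list of spans lets the same structure hold Brat's discontinuous annotations, which use `;` between fragments.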

Accessibility of CCDD

Annotation Quality

Each of the 300 court decisions was annotated once. To measure inter-annotator agreement, however, we selected 15 random documents from the dataset and had them annotated by three annotators. On average, the annotators marked 551 institutions, 258 court decision references, 402 act references, and 42 applicabilities in these 15 documents. We calculated the agreement using Fleiss' kappa and obtained kappa = 0.85, which we interpret as almost perfect agreement.
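Fleiss' kappa compares the observed agreement among annotators with the agreement expected by chance: kappa = (P_bar - P_e) / (1 - P_e), where P_bar is the mean per-item agreement and P_e is the sum of squared overall category proportions. The description does not say which unit the agreement was computed over (for example tokens, or candidate spans with a "no entity" category), so the sketch below simply assumes that every item receives exactly one category from each annotator.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for N items, each labeled by the same number n of annotators.

    counts[i][j] is the number of annotators who put item i into category j.
    """
    N = len(counts)
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each item needs the same number (>= 2) of annotations")
    k = len(counts[0])

    # p_j: proportion of all assignments that went to category j.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # P_i: fraction of annotator pairs that agree on item i.
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]

    P_bar = sum(P) / N               # mean observed agreement
    P_e = sum(pj * pj for pj in p)   # agreement expected by chance
    if P_e == 1:
        raise ValueError("kappa is undefined when all annotations fall into one category")
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical example: 4 items, 3 annotators, categories
# (institution, decision reference, none).
example = [
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [0, 0, 3],
]
print(round(fleiss_kappa(example), 3))
```

With three annotators, as in CCDD, each row of `counts` sums to 3; unlike Cohen's kappa, the formula does not require the same annotators to label every item.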

Statistics

The table below presents the CCDD statistics:

                       Supreme Court (SC)               Constitutional Court (CC)
Entity type            Entities  Tokens  Avg. length    Entities  Tokens  Avg. length
Institution               4,891  13,714          2.8       6,318  15,798          2.5
Decision references       1,449   6,967          4.8       1,644   8,146          5.0
Act references            4,387  33,628          7.7       2,597  18,774          7.2
Applicability               247   1,179          4.8         233     938          4.0

Average entity length is given in tokens.
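The average entity length in the statistics table equals the number of tokens divided by the number of entities for each entity type and court. The short check below recomputes that column from the two count columns; the numbers are copied from the table, and the dictionary layout is just an illustrative choice.

```python
# Counts copied from the statistics table: (number of entities, number of tokens).
STATS = {
    "SC": {
        "Institution": (4891, 13714),
        "Decision references": (1449, 6967),
        "Act references": (4387, 33628),
        "Applicability": (247, 1179),
    },
    "CC": {
        "Institution": (6318, 15798),
        "Decision references": (1644, 8146),
        "Act references": (2597, 18774),
        "Applicability": (233, 938),
    },
}


def average_lengths(stats):
    """Average entity length in tokens = tokens / entities, rounded to one decimal."""
    return {
        (court, entity_type): round(tokens / entities, 1)
        for court, rows in stats.items()
        for entity_type, (entities, tokens) in rows.items()
    }


for (court, entity_type), avg in average_lengths(STATS).items():
    print(f"{court:<3} {entity_type:<20} {avg}")
```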


Authors

Licence

Distributed under CC BY-NC-SA 4.0 licence.

Acknowledgement

We gratefully acknowledge support from the Technology Agency of the Czech Republic (grant no. TA02010182).