We present the Czech Court Decisions Dataset (CCDD) -- a dataset of 300 manually annotated court decisions published by the Supreme Court of the Czech Republic (SC) and the Constitutional Court of the Czech Republic (CC).

Document Sources

CCDD contains 150 documents published by the Supreme Court of the Czech Republic in 2012. We selected them randomly with respect to their distribution over senates. CCDD contains another 150 documents published by the Constitutional Court of the Czech Republic from 2004 to 2012.

Annotation

All 300 documents in CCDD are manually annotated with the following entity types:

  • reference to court decision
  • reference to act
  • applicability of act
  • institution

In addition, a court decision reference is linked with an institution entity if that institution issued the court decision. Links between applicabilities and act references are not annotated. Each applicability entity follows an act reference.

For manual annotation we used the web-based annotation tool Brat. Annotators mark entity occurrences and label them with the appropriate tag. They then mark relations between court decision references and institutions wherever such relations occur in the document.
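As an illustration of how such annotations might be read, the following is a minimal sketch that parses Brat's standoff format, in which text-bound annotations are lines starting with T and relations are lines starting with R. It assumes the annotations are available as Brat .ann files. The label names (Institution, Decision, IssuedBy) and the sample content are hypothetical and may differ from the labels actually used in CCDD.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    id: str
    label: str
    start: int
    end: int
    text: str


@dataclass
class Relation:
    id: str
    label: str
    arg1: str
    arg2: str


def parse_ann(content):
    """Parse Brat standoff text into (entities by id, list of relations)."""
    entities, relations = {}, []
    for line in content.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # "Label start end"; for discontinuous spans ("0 5;10 15")
            # we keep only the first start and the last end offset.
            label, offsets = fields[1].split(" ", 1)
            parts = offsets.split()
            text = fields[2] if len(fields) > 2 else ""
            entities[ann_id] = Entity(ann_id, label, int(parts[0]), int(parts[-1]), text)
        elif ann_id.startswith("R"):
            # "Label Arg1:Tx Arg2:Ty"
            parts = fields[1].split()
            args = dict(p.split(":", 1) for p in parts[1:])
            relations.append(Relation(ann_id, parts[0], args["Arg1"], args["Arg2"]))
        # Other annotation kinds (events, attributes, notes) are ignored here.
    return entities, relations


# Hypothetical sample; labels and offsets are illustrative only.
SAMPLE = (
    "T1\tInstitution 0 13\tSupreme Court\n"
    "T2\tDecision 17 32\t21 Cdo 123/2011\n"
    "R1\tIssuedBy Arg1:T2 Arg2:T1\n"
)

entities, relations = parse_ann(SAMPLE)
```

Here `entities` maps annotation ids to entity records, and each relation refers to its two arguments by id, so the issuing institution of a decision can be looked up via `entities[relation.arg2]`.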

Annotation Quality

Each of the 300 court decisions was annotated once. However, to measure the inter-annotator agreement (IAA), we selected 15 random documents from the dataset and had them annotated by three independent annotators. On average, the annotators marked 551 institutions, 258 references to court decisions, 402 references to acts and 42 applicabilities. We used Fleiss' kappa to calculate the agreement. We report kappa = 0.85, which we interpret as almost perfect agreement.
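For readers unfamiliar with the measure, the following is a minimal sketch of the standard Fleiss' kappa computation. It is not the script used to obtain the reported value. The input layout (one row per annotated unit, one column per category, each cell holding the number of annotators who chose that category) and the choice of unit are assumptions, since the text above does not specify them.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned unit i to category j.
    Every row must sum to the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each unit must be rated by the same number (>= 2) of raters")
    n_categories = len(counts[0])

    # Share of all assignments that went to each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per unit, then averaged over units.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall into one category")
    return (p_bar - p_e) / (1 - p_e)


# Toy example with three raters and two categories (illustrative data only).
perfect = fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]])
mixed = fleiss_kappa([[2, 1], [1, 2]])
```

In the toy data, the first table has all three raters agreeing on every unit, while the second has a 2:1 disagreement on each unit.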

Statistics

The table below presents statistics on the 300 annotated documents averaged over 10 cross-validation folds:

Court  Statistics     Split     Act  Decision  Applicability  Institution    Total
SC     # of tokens    train  43,117    11,074          1,262       12,425  332,535
                      test    5,348     1,855            265        1,450   36,999
       # of entities  train   3,949     1,304            222        2,485    7,487
                      test      439       145             25          276      943
CC     # of tokens    train  19,675    12,780            843       14,767  312,191
                      test    2,707     2,127            102        1,743   34,701
       # of entities  train   2,338     1,481            210        3,206    7,910
                      test      260       165             23          356      879
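The train and test figures are averages over 10 cross-validation folds. As a minimal sketch of how such folds could be produced, the function below splits a list of document identifiers into 10 disjoint test folds. It assumes the split is made at document level and separately for each court, which the description does not state explicitly. The function name, the seed and the identifiers are hypothetical.

```python
import random


def k_fold_splits(doc_ids, k=10, seed=0):
    """Yield (train, test) lists of document ids for k-fold cross-validation."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the folds reproducible
    folds = [ids[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        test = folds[i]
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        yield train, test


# Hypothetical identifiers for the 150 documents of one court.
sc_docs = [f"sc_{n:03d}" for n in range(150)]
splits = list(k_fold_splits(sc_docs))
```

With 150 documents per court, each fold under these assumptions holds 135 training and 15 test documents, which is consistent with the roughly 9:1 ratio between the train and test figures.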

The table below presents average entity lengths in tokens. The minimal reference length corresponds to four tokens. Judging by entity length, act references are the most complex entities, while institution references are the simplest.

Entity         SC train  SC test  CC train  CC test
Act reference      10.9     12.2       8.4     10.4
Decision            8.5     12.8       8.6     12.9
Applicability       5.7     10.7       4.0      4.4
Institution         5.0      5.3       4.6      4.9

Authors

Licence

Distributed under the CC BY-NC-SA 4.0 licence.

Acknowledgement

We gratefully acknowledge support from the Technology Agency of the Czech Republic (grant no. TA02010182).