Subject annotation

Preamble 1.0

Preamble 1.0 is a multilingual annotated corpus of the preamble of EU Regulation 2020/2092 of the European Parliament and of the Council of 16 December 2020 on a general regime of conditionality for the protection of the Union budget. The corpus consists of four language versions of the preamble (the source texts were downloaded from the following web pages):

Czech (https://eur-lex.europa.eu/legal-content/CS/TXT/PDF/?uri=CELEX:32020R2092)
English (https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32020R2092)
French (https://eur-lex.europa.eu/legal-content/FR/TXT/PDF/?uri=CELEX:32020R2092)
Polish (https://eur-lex.europa.eu/legal-content/PL/TXT/PDF/?uri=CELEX:32020R2092)

The corpus was published in October 2022 in the LINDAT/CLARIAH-CZ repository (http://hdl.handle.net/11234/1-4912) under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) licence.

The annotation covers subjects only. An annotated subject is always a single word: in the sentence "European leaders said that...", only "leaders" is annotated as the subject. In accordance with this rule, which follows the Universal Dependencies framework, each member of a coordinated subject is annotated separately, and articles are not included.
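The single-word rule above corresponds closely to how subjects appear in a Universal Dependencies parse. The following Python sketch shows one way to read such subjects from a CoNLL-U sentence. It is only an illustration under stated assumptions: the sample parse is hand-written (not taken from the corpus), the function name subject_words is hypothetical, and treating "nsubj" tokens plus their "conj" dependents as the subjects is our reading of the rule, not a description of how the corpus was built.

```python
# Hand-written CoNLL-U sample for illustration only (not corpus data).
SAMPLE_CONLLU = """\
# text = The Commission and the Council agreed.
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tCommission\tCommission\tPROPN\t_\t_\t6\tnsubj\t_\t_
3\tand\tand\tCCONJ\t_\t_\t5\tcc\t_\t_
4\tthe\tthe\tDET\t_\t_\t5\tdet\t_\t_
5\tCouncil\tCouncil\tPROPN\t_\t_\t2\tconj\t_\t_
6\tagreed\tagree\tVERB\t_\t_\t0\troot\t_\t_
7\t.\t.\tPUNCT\t_\t_\t6\tpunct\t_\t_
"""


def subject_words(conllu_sentence: str) -> list[str]:
    """Return the word forms of single-word subjects in ONE CoNLL-U sentence."""
    rows = []
    for line in conllu_sentence.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        # columns used: ID, FORM, HEAD, DEPREL
        rows.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))

    # Tokens attached as nsubj (including subtypes such as nsubj:pass).
    subject_ids = {tid for tid, _, _, rel in rows if rel.split(":")[0] == "nsubj"}

    # Further members of a coordinated subject hang on an earlier member via conj;
    # articles hang via det and are therefore never selected.
    for tid, _, head, rel in rows:
        if rel.split(":")[0] == "conj" and head in subject_ids:
            subject_ids.add(tid)

    return [form for tid, form, _, _ in rows if tid in subject_ids]


print(subject_words(SAMPLE_CONLLU))  # ['Commission', 'Council']
```

The sketch handles one sentence at a time, because token IDs restart in every CoNLL-U sentence; a multi-sentence file would have to be split on blank lines first.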

The annotations are the work of the NPFL134 course students (see below); each preamble was annotated independently by two students, and their automatically unified results were then curated by an arbiter.
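The unification procedure itself is not described in this file. As a rough illustration of what such a step could look like, the Python sketch below compares two annotators' subject spans and separates the ones they agree on from the ones an arbiter would need to decide. The function name unify, the representation of a subject as a (start, end) character-offset pair, and the sample offsets are all assumptions made for this example.

```python
Span = tuple[int, int]  # (start, end) character offsets of one annotated subject


def unify(annotator_a: set[Span], annotator_b: set[Span]):
    """Split two annotators' spans into agreed spans and spans marked by only one of them."""
    agreed = sorted(annotator_a & annotator_b)
    only_a = sorted(annotator_a - annotator_b)
    only_b = sorted(annotator_b - annotator_a)
    return agreed, only_a, only_b


# Made-up offsets for illustration.
first = {(4, 14), (23, 30)}
second = {(4, 14), (40, 46)}

agreed, only_first, only_second = unify(first, second)
print(agreed)       # [(4, 14)]
print(only_first)   # [(23, 30)]
print(only_second)  # [(40, 46)]
```

In this sketch the agreed spans could be accepted as they are, while the spans marked by only one annotator would be passed to the arbiter.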


Annotation task

Annotate subjects in the sentences of the preamble of EU Regulation 2020/2092 on a general regime of conditionality for the protection of the Union budget in one of two languages: English or Italian. Our motivation for organizing this task is twofold:

  • evaluate the UDPipe system on manually annotated data
  • gain experience with the readability of legal texts
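For the first goal, one common way to compare a parser's output with manual annotation is precision, recall and F1 over the sets of subject tokens. The Python sketch below shows the computation; it is not the evaluation script used in the course. The function name, the identification of a subject by a (sentence number, token number) pair, and the sample data are assumptions made for this example.

```python
def precision_recall_f1(gold: set, predicted: set) -> tuple[float, float, float]:
    """Compare predicted subject tokens with manually annotated (gold) ones."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up data: each subject is identified as (sentence number, token number).
gold = {(1, 4), (1, 9), (2, 3), (3, 6)}
predicted = {(1, 4), (2, 3), (2, 8)}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
# precision=0.667 recall=0.500 f1=0.571
```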

The homework should be completed by May 13, 2023.  

Annotation instructions

  • To log in to the Brat editor, use the credentials sent by e-mail.
  • Read the preamble carefully and identify the subject(s) in each sentence. We follow the Universal Dependencies annotation guidelines, in which the basic units of annotation are (syntactic) words; the subject is therefore exactly one word in our annotation task. Typically, it is a noun, a pronoun or a relative pronoun. Mark each subject in a coordinated construction separately.

Annotation tool

Brat -- https://quest.ms.mff.cuni.cz/brat/npfl134_2023/index.xhtml#/

New annotation
  • Mark a subject with the mouse
  • Press Enter
Delete the annotation
  • Double click on the annotation
  • Press Delete
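
Brat stores text-bound annotations in a standoff format, one per line: an ID starting with "T", then a tab, the label with the start and end character offsets, another tab, and the covered text. The Python sketch below reads such lines. It is only an illustration: the label "Subject", the sample lines and the function name are assumptions, and this file does not state that annotations are exported in this format.

```python
# Hypothetical sample in brat standoff format (label name and offsets are made up).
SAMPLE_ANN = (
    "T1\tSubject 9 16\tleaders\n"
    "#1\tAnnotatorNotes T1\tunsure\n"
    "T2\tSubject 40 50\tCommission\n"
)


def read_text_bound(ann_text: str) -> list[tuple[str, int, int, str]]:
    """Return (label, start, end, covered text) for each text-bound annotation."""
    spans = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip notes, relations and other annotation kinds
        _ann_id, label_and_offsets, covered_text = line.split("\t", 2)
        label, offsets = label_and_offsets.split(" ", 1)
        if ";" in offsets:
            continue  # discontinuous spans are left out of this sketch
        start, end = (int(value) for value in offsets.split())
        spans.append((label, start, end, covered_text))
    return spans


for span in read_text_bound(SAMPLE_ANN):
    print(span)
# ('Subject', 9, 16, 'leaders')
# ('Subject', 40, 50, 'Commission')
```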

Questions & Answers

  • Please ignore the error message that appears in Brat when you start annotating:

    The SimSem connection has not been configured, please contact the administrator.
    Rapid annotation mode error; returning to normal mode.

  • If the pop-up annotation window (or the window for editing an annotation) is too tall to fit on your screen, so that you cannot see the OK or Delete button, adjust the zoom of the web page with Ctrl +/-.