CoNLL-2009 Shared Task:
Syntactic and Semantic Dependencies in Multiple Languages


CoNLL-2009 Shared Task Description

The task builds on the CoNLL-2008 task and extends it to multiple languages. The core of the task is to predict syntactic and semantic dependencies and their labeling. Data for statistical training and evaluation is provided; the labeled dependencies are extracted from manually annotated treebanks, such as the Penn Treebank for English, the Prague Dependency Treebank for Czech, and similar treebanks for Catalan, Chinese, German, Japanese and Spanish, enriched with semantic relations (such as those captured in PropBank, NomBank and similar resources). Great effort has been devoted to providing the participants with a common and relatively simple data representation for all the languages, similar to last year's English data.

The objective is to perform and evaluate Semantic Role Labeling (SRL) using a dependency-based representation for both syntactic and semantic dependencies, a relatively novel approach (1). Furthermore, the SRL problem addresses not only propositions centered around verbal predicates but also those centered around nouns and other major part-of-speech categories (where available).

The syntactic dependencies to be modeled are more complex than those used in previous CoNLL evaluations: Johansson and Nugues have shown that a richer set of syntactic dependencies improves semantic processing (2). The proposed evaluation offers a practical framework for joint learning of the two problems. Several considerations make this task attractive: we believe the proposed dependency-based representation is a better fit for many applications (e.g., Information Retrieval, Information Extraction), where it is often sufficient to identify the dependency between the predicate and the head of the argument constituent rather than to extract the complete argument constituent.

Furthermore, it has been shown that dependencies can be extracted with state-of-the-art accuracy in linear time (3), which can give a significant boost to the adoption of this technology in real-world applications and across multiple languages.

The task continues to investigate several important research directions. For example, is the dependency-based representation better suited for SRL than the constituent-based formalism? Does joint learning improve syntactic and semantic analysis?

Training data provided

For training, the data will contain the gold-standard annotation of the HEAD, DEPREL, PRED and APRED columns, as well as the gold-standard lemma (LEMMA), part-of-speech (POS) and morphological features (FEAT). Training and development sections will be designated as such, even though their content will be formally the same. As in the CoNLL-2008 shared task, the evaluation is separated into two challenges:

'Closed Challenge'. Systems have to be built strictly with the information contained in the given training corpus and tuned on the development section. In addition, the PropBank frames and similar resources for the other languages can also be used (see Official Resources). Note that this means that constituent-based parsers or SRL systems cannot be used in this challenge, because constituent-based annotations are not provided in our training set (unless such tools are trained in an un- or semi-supervised manner using only the data provided). The aim of this challenge is to compare the performance of the participating systems in a fair environment.

'Open Challenge'. Systems can be developed making use of any kind of external tools and resources. The only condition is that such tools or resources must not have been developed with the annotations of the test set, for either the input or the output annotations of the data. In this challenge, we are interested in learning methods that make use of any tools or resources that might improve performance. For example, we encourage the use of rich semantic information from WordNet, VerbNet or a WSD system. Participants in this challenge are also encouraged to use constituent-based parsers and SRL systems, as long as these systems were trained only on the sections of the TreeBank/PropBank/NomBank used in the shared task training corpus (the exact section numbers will be announced in due time; similarly for the other languages). To encourage the participation of groups interested only in SRL, the organizers will provide the output of state-of-the-art dependency parsers as input in this challenge. The comparison of different systems in this setting may not be fair, and thus the ranking of systems is not necessarily important.

The task in detail

Participants are required to choose whether their system computes both the syntactic dependencies and the semantic ones (joint system) or just the latter (SRL-only system). We strongly encourage participants to address the joint task. Participants are required to provide the output of the selected type of system for all the languages provided. A description of the representation and format of the provided input and required output can be found in the Data format section. To make life easier for all and to avoid guessing which words are annotated as predicates (which is often arbitrary and language- and annotation-schema-dependent), we provide a clear indication of it (even in the test data): the column FILLPRED contains Y for those lines that should be considered predicates (not necessarily just verbs, but all words that take arguments in the particular annotation schema and language).

Specifically, the task consists of automatically filling the data columns for HEAD (syntactic dependency), DEPREL (type of dependency), PRED (frame, roleset, sense, or whatever it is called for a particular language), and APREDs (the PREDs' argument dependencies and labels). HEAD and DEPREL are optional (see below for the consequences for evaluation), and PRED is to be filled only for rows with FILLPRED=Y; similarly, APREDs are to be filled only in the corresponding columns. The input will consist of sentences with uniquely identified words, each of which will carry (on the same data line) the automatically pre-analyzed PLEMMA (lemma), PPOS (part of speech) and PFEAT (other morphological and lexical features). For those interested only in the Semantic Role Labeling task, PHEAD (automatically predicted head node) and PDEPREL (automatically inferred dependency relation) will also be provided. The input test data distributed during the test period will contain only the information described above, even though all the columns described in the data will be present (but possibly unfilled).
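As a concrete illustration, the following Python sketch (ours, not an official tool) scans one blank-line-separated sentence block of the tab-separated input and collects the rows marked as predicates via FILLPRED; the column indices follow the layout given in the Data format section below.

FILLPRED_COL = 12  # 0-based index of FILLPRED in the column layout below

def predicate_rows(sentence_lines):
    """Yield (ID, FORM) for every row whose FILLPRED cell is 'Y'.

    A minimal sketch assuming tab-separated fields; a real reader
    should also handle the '_' placeholder described under
    'Common Values' below.
    """
    for line in sentence_lines:
        cols = line.rstrip("\n").split("\t")
        if cols[FILLPRED_COL] == "Y":
            yield cols[0], cols[1]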

Each participant will be allowed to submit the output of 1 (one) system, with results for (possibly) both challenges. The type of submission (joint/SRL-only) will have to be specified. Participants are asked to be honest in declaring the type of the system; we discourage simply copying the provided PHEAD and PDEPREL into HEAD and DEPREL: just declare your system SRL-only and leave HEAD and DEPREL blank.

Evaluation

The output sent by the participants will be evaluated, for all languages, using a common evaluation script similar to that of the 2008 shared task. The principles will also be similar to those of the CoNLL-2008 Shared Task: there will be separate Recall, Precision and F-measure scores for the syntactic dependencies, for the SRL task and for the joint task. Three rankings will therefore be produced; given the emphasis on joint prediction, the main one will be based on the joint (labeled) macro F-score (see the detailed scorer description).
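For intuition, here is a minimal sketch of how labeled precision, recall and F-measure can be computed over sets of (head, dependent, label) triples. It only mirrors the idea; the official scorer additionally combines the syntactic and semantic scores into the joint macro F-score.

def labeled_prf(gold, predicted):
    """Labeled precision/recall/F1 over sets of (head, dependent, label)
    triples. A sketch, not the official scorer."""
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f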

For those who submit only the SRL results, we will additionally compute (as a separate "system") their joint score using the input-supplied PHEAD/PDEPREL as if they were the true HEAD and DEPREL. For ranking the SRL-only systems, the SRL F-score will be primary.

Learning Curve (Optional)

To assess the future implications for annotation and research, participants are encouraged to submit (up to one week after the test output deadline, using a similar uploading mechanism, to be announced) data for a learning curve for at least the large-data languages (English and Czech are recommended, but you can choose any other available language as well). We suggest using 25%, 50% and 75% of the training set size as the minimal granularity of the data points for this task; see the sketch below. Ideally, participants should run both challenges under this setting and send the outputs clearly identified by the training data size.
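One simple way to produce such subsets is to truncate the training file at sentence boundaries, taking the first fraction of sentences (random sampling is an equally valid choice). A hedged sketch follows; the file names are placeholders.

def truncate_training(in_path, out_path, fraction):
    """Write the first `fraction` of the sentences of a CoNLL-2009
    training file to out_path. Sentences are separated by blank lines.
    A sketch for the learning-curve setting, not an official tool."""
    with open(in_path, encoding="utf-8") as f:
        sentences = f.read().strip("\n").split("\n\n")
    keep = sentences[:int(len(sentences) * fraction)]
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(keep) + "\n\n")

# e.g., truncate_training("train.conll09", "train.50.conll09", 0.50)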

Performance (Optional)

Participants are encouraged to submit, by email to stranak@ufal.mff.cuni.cz, the memory footprint and the timing of both the training and testing phases of their systems. While these figures are clearly also hardware- and OS-dependent, they should give an indication of time efficiency and thus a hint at the set of possible applications that require a certain response time. Depending on the entries received, they will appear in the common task description paper.

Publication of the results

Participants are requested to submit a system description paper (see the deadlines published elsewhere) after the test period is over and they know the official scores.

In the paper, participants may also report results on restricted or extended task(s), possibly only for some languages (apart from the official ones); e.g., they can opt to ignore the FILLPRED column and predict the argument-bearing predicates themselves. They can also report results of more than the one system officially permitted for result submission.

References

(1) Hacioglu K. Semantic Role Labeling Using Dependency Trees. In Proceedings of COLING-2004, 2004.

(2) Johansson R. and Nugues P. Extended Constituent-to-dependency Conversion for English. In Proceedings of NODALIDA 2007, 2007.

(3) Nivre J., Hall J., Nilsson J. and Eryigit G. Labeled Pseudo-Projective Dependency Parsing with Support Vector Machines. In Proceedings of the CoNLL-X Shared Task, 2006.

Data format

Columns (overview)

ID FORM LEMMA PLEMMA POS PPOS FEAT PFEAT HEAD PHEAD DEPREL PDEPREL FILLPRED PRED APREDs

Description of the columns

ID, FORM, LEMMA, POS, FEAT, HEAD and DEPREL are the same as in the CoNLL-2006 and CoNLL-2007 Shared Tasks.

FEAT is a set of morphological features (separated by |) defined for a particular language, e.g. more detailed part of speech, number, gender, case, tense, aspect, degree of comparison, etc.
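For illustration, a small sketch (ours) that splits such a FEAT value into a dictionary, following the '|'-separated Attribute=Value convention visible in the Czech example below:

def parse_feat(feat):
    """Split a FEAT value such as 'SubPOS=N|Gen=F|Num=S' into a dict.
    '_' means no features (see 'Common Values' below). A sketch;
    some languages may use a different internal FEAT structure."""
    if feat == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feat.split("|"))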

The P-columns (PLEMMA, PPOS, PFEAT, PHEAD and PDEPREL) are the automatically predicted variants of the gold-standard LEMMA, POS, FEAT, HEAD and DEPREL columns. They are produced by independently (or cross-)trained taggers and parsers.

PRED is the same as in the 2008 English data. APREDs correspond to 2008's ARGs. FILLPRED contains Y for lines where PRED is/should be filled.
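To make the layout concrete, here is a hedged sketch of a line reader: the 14 fixed columns are followed by a variable number of APRED columns, one per predicate in the sentence.

from collections import namedtuple

COLUMNS = ["ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS", "FEAT", "PFEAT",
           "HEAD", "PHEAD", "DEPREL", "PDEPREL", "FILLPRED", "PRED"]
Token = namedtuple("Token", COLUMNS + ["APREDS"])

def parse_line(line):
    """Split one tab-separated data line into named fields; APREDS
    collects the variable-length tail. A sketch, not an official reader."""
    cols = line.rstrip("\n").split("\t")
    return Token(*cols[:14], cols[14:])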

Deviations / differences from the common format

All the languages come in the same data format. However, due to inherent differences, language-specific descriptions of the content of the data columns, including tagset and label descriptions, are attached to each dataset. They will be available as soon as the data is ready for the participants.

Common Values

An UNDERSCORE ('_') is used as the "unknown", "unannotated" or "unfilled" value (i.e., for all those cells in the large data table that do not contain a defined label from the label sets described for each language in the documentation). The same character is also used in all cells of columns that are left completely unfilled because, e.g., the data for the particular language is not available.
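In the same spirit as the sketches above, a one-line helper can normalize this placeholder:

def cell(value):
    """Map the '_' placeholder to None; otherwise return the value unchanged."""
    return None if value == "_" else value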

Examples

This is an example sentence from the Czech data; for English, see below.

The "P-" columns are filled with values from the corresponding gold-standard columns. The trial data (to be released soon) will have tagger and parser outputs there.

ID FORM LEMMA PLEMMA POS PPOS FEAT PFEAT HEAD PHEAD DEPREL PDEPREL FILLPRED PRED APRED1 APRED2 APRED3 APRED4 APRED5 APRED6 APRED7 APRED8 APRED9 APRED10 APRED11 APRED12 APRED13 APRED14 APRED15 APRED16
1 Z z z R R SubPOS=R|Cas=2 SubPOS=R|Cas=2 10 10 AuxP AuxP _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2 této tento tento P P SubPOS=D|Gen=F|Num=S|Cas=2 SubPOS=D|Gen=F|Num=S|Cas=2 3 3 Atr Atr Y tento _ RSTR _ _ _ _ _ _ _ _ _ _ _ _ _ _
3 knihy kniha kniha N N SubPOS=N|Gen=F|Num=S|Cas=2|Neg=A SubPOS=N|Gen=F|Num=S|Cas=2|Neg=A 1 1 Adv Adv Y kniha _ _ _ _ _ _ _ DIR1 _ _ _ _ _ _ _ _
4 - - - Z Z SubPOS=: SubPOS=: 8 8 AuxG AuxG _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
5 po po po R R SubPOS=R|Cas=6 SubPOS=R|Cas=6 8 8 AuxP AuxP _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
6 mnoha mnoho mnoho C C SubPOS=a|Cas=6 SubPOS=a|Cas=6 7 7 Atr Atr Y mnoho _ _ _ RSTR _ _ _ _ _ _ _ _ _ _ _ _
7 stránkách stránka stránka N N SubPOS=N|Gen=F|Num=P|Cas=6|Neg=A SubPOS=N|Gen=F|Num=P|Cas=6|Neg=A 5 5 Adv Adv Y stránka _ _ _ _ REG _ _ _ _ _ _ _ _ _ _ _
8 pobuřující pobuřující pobuřující A A SubPOS=G|Gen=F|Num=S|Cas=2|Neg=A SubPOS=G|Gen=F|Num=S|Cas=2|Neg=A 3 3 Atr Atr Y pobuřující _ RSTR _ _ _ _ _ _ _ _ _ _ _ _ _ _
9 - - - Z Z SubPOS=: SubPOS=: 8 8 AuxG AuxG _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
10 lze lze lze V V SubPOS=B|Num=S|Per=3|Ten=P|Neg=A|Voi=A SubPOS=B|Num=S|Per=3|Ten=P|Neg=A|Voi=A 0 0 Pred Pred Y v-w1757f1 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
11 totiž totiž totiž D D SubPOS=b SubPOS=b 10 10 AuxY AuxY _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
12 Linda Lind Lind N N SubPOS=N|Gen=M|Num=S|Cas=4|Neg=A|Sem=S SubPOS=N|Gen=M|Num=S|Cas=4|Neg=A|Sem=S 13 13 Obj Obj Y Lind _ _ _ _ _ _ _ PAT _ _ _ _ _ _ _ _
13 poznat poznat poznat V V SubPOS=f|Neg=A SubPOS=f|Neg=A 10 10 Sb Sb Y v-w4210f3 _ _ _ _ _ ACT _ _ _ _ _ _ _ _ _ _
14 jako jako jako J J SubPOS=, SubPOS=, 15 15 AuxY AuxY _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
15 mistra mistr mistr N N SubPOS=N|Gen=M|Num=S|Cas=4|Neg=A SubPOS=N|Gen=M|Num=S|Cas=4|Neg=A 12 12 Atv Atv Y mistr _ _ _ _ _ _ COMPL2 COMPL _ _ _ _ _ _ _ _
16 protikladu protiklad protiklad N N SubPOS=N|Gen=I|Num=S|Cas=2|Neg=A SubPOS=N|Gen=I|Num=S|Cas=2|Neg=A 15 15 Atr Atr Y protiklad _ _ _ _ _ _ _ _ APP _ _ _ _ _ _ _
17 , , , Z Z SubPOS=: SubPOS=: 20 20 AuxX AuxX _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
18 který který který P P SubPOS=4|Gen=Y|Num=S|Cas=1 SubPOS=4|Gen=Y|Num=S|Cas=1 20 20 Sb Sb Y který _ _ _ _ _ _ _ _ _ _ _ ACT _ _ _ _
19 prý prý prý T T SubPOS=T SubPOS=T 20 20 AuxY AuxY _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
20 nešetří šetřit šetřit V V SubPOS=B|Num=S|Per=3|Ten=P|Neg=N|Voi=A SubPOS=B|Num=S|Per=3|Ten=P|Neg=N|Voi=A 15 15 Atr Atr Y v-w6711f1 _ _ _ _ _ _ _ _ RSTR _ _ _ _ _ _ _
21 sebe se se P P SubPOS=6|Num=X|Cas=4 SubPOS=6|Num=X|Cas=4 26 26 Obj_M Obj_M Y se _ _ _ _ _ _ _ _ _ _ _ PAT _ _ _ _
22 , , , Z Z SubPOS=: SubPOS=: 26 26 AuxX AuxX _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
23 čtenáře čtenář čtenář N N SubPOS=N|Gen=M|Num=P|Cas=4|Neg=A SubPOS=N|Gen=M|Num=P|Cas=4|Neg=A 26 26 Obj_M Obj_M Y čtenář _ _ _ _ _ _ _ _ _ _ _ PAT _ _ _ _
24 , , , Z Z SubPOS=: SubPOS=: 26 26 AuxX AuxX _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
25 Židy Žid Žid N N SubPOS=N|Gen=M|Num=P|Cas=4|Neg=A|Sem=E SubPOS=N|Gen=M|Num=P|Cas=4|Neg=A|Sem=E 26 26 Obj_M Obj_M Y Žid _ _ _ _ _ _ _ _ _ _ _ PAT _ _ _ _
26 ani ani ani J J SubPOS=^ SubPOS=^ 20 20 Coord Coord _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
27 Němce Němec Němec N N SubPOS=N|Gen=M|Num=P|Cas=4|Neg=A|Sem=E SubPOS=N|Gen=M|Num=P|Cas=4|Neg=A|Sem=E 26 26 Obj_M Obj_M Y Němec _ _ _ _ _ _ _ _ _ _ _ PAT _ _ _ _
28 . . . Z Z SubPOS=: SubPOS=: 0 0 AuxK AuxK _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The following English example has been adapted from the CoNLL-2008 task description (http://www.yr-bcn.es/conll2008/) to fit the CoNLL-2009 format specification. As in the 2008 version, our tokenization differs from the Penn Treebank's in its treatment of hyphens and slashes; e.g., we break up "New York-based" into "New", "York", "-", "based". Unlike the 2008 task, we do not include the Penn Treebank's alternate tokenization in our version of the data.

ID FORM LEMMA PLEMMA POS PPOS FEAT PFEAT HEAD PHEAD DEPREL PDEPREL FILLPRED PRED APRED1 APRED2 APRED3 APRED4 APRED5 APRED6
1 W. w. w. NNP NNP _ _ 3 3 NAME NAME _ _ _ _ _ _ _ _
2 Ed ed ed NNP NNP _ _ 3 3 NAME NAME _ _ _ _ _ _ _ _
3 Tyler tyler tyler NNP NNP _ _ 18 18 SBJ SBJ _ _ _ _ A1 _ _ _
4 , , , , , _ _ 3 3 P P _ _ _ _ _ _ _ _
5 37 37 37 CD CD _ _ 6 6 NMOD NMOD _ _ _ _ _ _ _ _
6 years year year NNS NNS _ _ 7 7 AMOD AMOD _ _ _ _ _ _ _ _
7 old old old JJ JJ _ _ 3 3 NMOD NMOD _ _ _ _ _ _ _ _
8 , , , , , _ _ 3 3 P P _ _ _ _ _ _ _ _
9 a a a DT DT _ _ 12 12 NMOD NMOD _ _ _ _ _ _ _ _
10 senior senior senior JJ JJ _ _ 12 12 NMOD NMOD _ _ A3 _ _ _ _ _
11 vice vice vice NN NN _ _ 12 12 NMOD NMOD _ _ A3 _ _ _ _ _
12 president president president NN NN _ _ 3 3 APPO APPO Y president.01 A0 _ _ _ _ _
13 at at at IN IN _ _ 12 12 LOC LOC _ _ A2 _ _ _ _ _
14 this this this DT DT _ _ 16 16 NMOD NMOD _ _ _ _ _ _ _ _
15 printing print print VBG VBG _ _ 16 16 NMOD NMOD _ _ _ A1 _ _ _ _
16 concern concern concern NN NN _ _ 13 13 PMOD PMOD Y concern.02 _ A0 _ _ _ _
17 , , , , , _ _ 3 3 P P _ _ _ _ _ _ _ _
18 was be be VBD VBD _ _ 0 0 ROOT ROOT _ _ _ _ _ _ _ _
19 elected elect elect VBN VBN _ _ 18 18 VC VC Y elect.01 _ _ _ _ _ _
20 president president president NN NN _ _ 19 19 OPRD OPRD Y president.01 _ _ A2 A0 _ _
21 of of of IN IN _ _ 20 20 NMOD NMOD _ _ _ _ _ A2 _ _
22 its its its PRP$ PRP$ _ _ 24 24 NMOD NMOD _ _ _ _ _ _ A2 _
23 technology technology technology NN NN _ _ 24 24 NMOD NMOD _ _ _ _ _ _ A1 _
24 group group group NN NN _ _ 21 21 PMOD PMOD Y group.01 _ _ _ _ _ _
25 , , , , , _ _ 20 20 P P _ _ _ _ _ _ _ _
26 a a a DT DT _ _ 28 28 NMOD NMOD _ _ _ _ _ _ _ _
27 new new new JJ JJ _ _ 28 28 NMOD NMOD _ _ _ _ _ _ _ AM-TMP
28 position position position NN NN _ _ 20 20 APPO APPO Y position.02 _ _ _ _ _ A1-REF
29 . . . . . _ _ 18 18 P P _ _ _ _ _ _ _ _