CoNLL-2009 ST Evaluation Data Download

CoNLL-2009 Shared Task Evaluation Data Download and System Output Upload

This page contains instructions for downloading the evaluation data for the CoNLL 2009 Shared Task, and for uploading the system results.

Due to the relatively complicated licensing issues you are already familiar with, the data is split into two packages: the one for Ch, Cz and En will be made available to you by the LDC, while the data for the remaining four languages are available for download from this website (see below).

For the LDC-distributed data (Ch, Cz, En), you will be emailed download instructions automatically within hours AFTER YOU DOWNLOAD the other data; in other words, you have to download the other four datasets FIRST (when you fill in the download forms, an email is sent automatically to the LDC asking them to mail you the required instructions). For the remaining four datasets, you will need the User ID and password that were emailed to you when you signed the training and development data license (please note that the license agreement can no longer be signed).

Before proceeding to download the (Ca/Ge/Ja/Sp) data, please read the following paragraphs very carefully.

Procedure

The overall outline of the procedure is as follows: you download the evaluation data now, and submit the results before March 20, 2009, using the upload interface on the CoNLL webpages (see the pointer(s) below). Please make sure you SUBMIT THE ACTUAL OUTPUT of your system(s), with the appropriate columns filled in by your system(s) and the other columns intact. We will then evaluate them and make the scores available to you. Only these scores can then be published as the "official" evaluation scores.

The task is divided along three dimensions (cf. the Task Description Page):

"Open" vs. "Closed" challenge
Joint vs. SRL-only task
In-domain vs. out-of-domain data (for Cz, En, Ge only)

As per the rules, you can submit only the "Joint" task or the "SRL-only" task results, but not both. However, you can submit either or both of the open and closed challenge outputs. We have also prepared, as a little surprise, a small out-of-domain evaluation data set for three of the languages; you may also submit these results, but it is optional.

As the rules also state, you have to submit results for all seven languages.

Getting the Evaluation Data

The Evaluation Data Contents

The evaluation data have exactly the same FORMAT as the trial, training and development data, but they DO NOT contain the correct results (so there is no point in running the scorer on them yourself). Thus, given the nature of the two tasks, there are two different sets of evaluation data: one for the joint task and one for the SRL-only task.

In the joint task, the evaluation data have empty values in the HEAD, PHEAD, DEPREL, PDEPREL, PRED, and APREDs columns. These (except PHEAD and PDEPREL) are also the columns where we expect you to fill in the results generated by your joint task system(s) (if you subscribed to the joint task).

In the SRL-only task, the evaluation data have only the HEAD, DEPREL, PRED and APREDs columns emptied (so they do contain the PHEAD and PDEPREL columns filled in by our state-of-the-art parsers). Thus, if you do the SRL-only task, you are supposed to fill in only the PRED and APREDs columns.

Please recall also that for both tasks (for the SRL part), you are only filling in the PRED in rows marked by 'Y' in the FILLPRED column.
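As a rough illustration of the column rules above, the following Python sketch records which columns each task fills in and sets PRED only in rows whose FILLPRED value is 'Y'. The column order in COLUMNS and the '_' placeholder used in the example row are assumptions based on the usual CoNLL-2009 layout, not something stated on this page; fill_pred and its arguments are hypothetical names.

```python
# Columns each task is expected to fill in, as described in the text above.
FILL_COLUMNS = {
    "Joint": ["HEAD", "DEPREL", "PRED", "APREDs"],
    "SRLonly": ["PRED", "APREDs"],
}

# ASSUMPTION: column order of the CoNLL-2009 format (not given on this page).
# The APREDs columns (one per predicate in the sentence) follow PRED.
COLUMNS = ["ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS", "FEAT", "PFEAT",
           "HEAD", "PHEAD", "DEPREL", "PDEPREL", "FILLPRED", "PRED"]


def fill_pred(fields, predicted_pred):
    """Return a copy of one token row with PRED set only if FILLPRED is 'Y'."""
    out = list(fields)
    if out[COLUMNS.index("FILLPRED")] == "Y":
        out[COLUMNS.index("PRED")] = predicted_pred
    return out


# Two made-up token rows with empty ('_') result columns.
plain_row = "1\tHe\the\the\tPRP\tPRP\t_\t_\t_\t_\t_\t_\t_\t_".split("\t")
pred_row = "2\truns\trun\trun\tVBZ\tVBZ\t_\t_\t_\t_\t_\t_\tY\t_".split("\t")

print(fill_pred(plain_row, "run.01")[13])  # PRED left untouched
print(fill_pred(pred_row, "run.01")[13])   # PRED filled in
```

All other columns are copied through unchanged, which matches the requirement to leave the remaining columns intact.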

Downloading the data

We offer you the freedom to change your mind :-) Regardless of the registration information you provided previously, you may select the task now (you will be offered the choice as the first thing on the evaluation data download page, see the link below). However, once you confirm your selection, there will be NO WAY TO CHANGE IT again; please be very careful when making this selection now.

You will also be asked to make the "challenge" selection: you may choose open or closed (or both). This selection can be changed later, anytime before you upload your results in the coming week.

There is no need to opt formally for the out-of-domain data processing; you will get these data together with the "compulsory" sets, and it is up to you whether to submit results for them. Obviously, we strongly encourage you to do so, for a more reliable comparison of the systems on out-of-domain (ood) data.

Finally, here is the evaluation data selection and download link.

Your System Output Upload

File-naming Conventions for Output Data Upload

When submitting (uploading) your results, please pack all the results using "zip" (Zip 2.31, 3/8/2005 or newer on Unix/Linux; WinZip or similar on Windows) to a single file (the name of which does not matter). The files within the zip file, however, MUST BE NAMED according to the following scheme:

CoNLL2009-ST-evaluation-<language>-<task>-<challenge>[-ood].txt

where

<language>

Catalan
Chinese
Czech
English
German
Japanese
Spanish

<task>

Joint
SRLonly

<challenge>

open
closed

Please note also that the "-ood" suffix must be used for the out-of-domain data (if submitted). During upload, your zip file will be renamed to contain only your unique ID, and its contents will be checked against the above-mentioned rules for file naming. Submissions with wrong filenames will be refused.

Your results must be encoded in UTF-8 (i.e., the same encoding as the evaluation data distributed to you).

Example: for the file containing the results of the joint task for German, closed challenge on the out-of-domain data use CoNLL2009-ST-evaluation-German-Joint-closed-ood.txt as the filename.
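The naming scheme above can be sketched in Python as follows. This is only an illustration, not the check performed by the upload script: output_filename and is_valid_filename are hypothetical helpers, and rejecting the "-ood" suffix for languages other than Czech, English and German is an assumption derived from the note that out-of-domain data exist for those three languages only.

```python
import re

LANGUAGES = ["Catalan", "Chinese", "Czech", "English",
             "German", "Japanese", "Spanish"]
TASKS = ["Joint", "SRLonly"]
CHALLENGES = ["open", "closed"]
# Out-of-domain data are provided for Cz, En and Ge only.
OOD_LANGUAGES = ["Czech", "English", "German"]

FILENAME_RE = re.compile(
    r"CoNLL2009-ST-evaluation-(%s)-(%s)-(%s)(-ood)?\.txt"
    % ("|".join(LANGUAGES), "|".join(TASKS), "|".join(CHALLENGES))
)


def output_filename(language, task, challenge, ood=False):
    """Build a result filename following the scheme described above."""
    if language not in LANGUAGES:
        raise ValueError("unknown language: %r" % language)
    if task not in TASKS:
        raise ValueError("unknown task: %r" % task)
    if challenge not in CHALLENGES:
        raise ValueError("unknown challenge: %r" % challenge)
    if ood and language not in OOD_LANGUAGES:
        raise ValueError("no out-of-domain data for %s" % language)
    suffix = "-ood" if ood else ""
    return "CoNLL2009-ST-evaluation-%s-%s-%s%s.txt" % (
        language, task, challenge, suffix)


def is_valid_filename(name):
    """Check a filename against the scheme (case-sensitive, whole name)."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        return False
    # ASSUMPTION: "-ood" is accepted only for the three ood languages.
    return match.group(4) is None or match.group(1) in OOD_LANGUAGES


print(output_filename("German", "Joint", "closed", ood=True))
```

Running such a check locally before zipping may help avoid a refused upload caused by a misspelled or wrongly capitalized name.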

System Output Upload Procedure

There is an interface for data upload (see the link below). You have to submit, in a single .zip file as described above, your system(s)' output for ALL SEVEN LANGUAGES (you do NOT send the Ch/Cz/En data back to LDC, even though you did get them there...:-).
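Before uploading, it may be worth checking that the zip file really covers all seven languages. The sketch below uses Python's standard zipfile module on an in-memory archive; pack and missing_languages are hypothetical helper names, and the check covers only the compulsory (in-domain) files for one task and one challenge.

```python
import io
import zipfile

LANGUAGES = ["Catalan", "Chinese", "Czech", "English",
             "German", "Japanese", "Spanish"]


def pack(files):
    """Zip a {filename: text} mapping into bytes, storing each file as UTF-8."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text.encode("utf-8"))
    return buffer.getvalue()


def missing_languages(zip_bytes, task, challenge):
    """List languages that have no in-domain result file in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = set(archive.namelist())
    return [
        language for language in LANGUAGES
        if "CoNLL2009-ST-evaluation-%s-%s-%s.txt" % (language, task, challenge)
        not in names
    ]


# Example with placeholder content and one language (Spanish) left out.
files = {
    "CoNLL2009-ST-evaluation-%s-Joint-closed.txt" % language: "placeholder\n"
    for language in LANGUAGES if language != "Spanish"
}
print(missing_languages(pack(files), "Joint", "closed"))
```

An empty list means that every language has a result file for the chosen task and challenge.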

After zipping the data as described above, go to the system output upload page. Please note that this is the same script as for the evaluation data download, but it remembers that you have already downloaded your data and asks for your results instead of offering the download again. Unless your browser has been kept open since the evaluation data download, you will have to provide your ID and password again when accessing this page.

You may upload your system(s)' output as many times as you wish before the deadline; the previously uploaded output will always be overwritten, even if you upload your results under a different zip filename. Thus, only the most recent upload will be officially evaluated and its scores posted.

The deadline is March 20, midnight HST (that's Hawaii Standard Time; don't worry about DST, there is none in Hawaii). That is, e.g., 6am (March 21) Eastern Daylight Time, or 10am (March 21) GMT. At that time, the upload script will be disabled.
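To convert the deadline to another time zone, a small Python sketch such as the following can be used. It reads "March 20, midnight" as the end of March 20, i.e. 00:00 on March 21 HST, which is the reading consistent with the Eastern and GMT times quoted above. The fixed UTC offsets (HST = UTC-10, Eastern daylight time = UTC-4) are hard-coded assumptions for this date, and deadline_in is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets; Hawaii does not observe daylight saving time.
HST = timezone(timedelta(hours=-10), "HST")
EDT = timezone(timedelta(hours=-4), "EDT")

# End of March 20, 2009 in Hawaii = 00:00 on March 21 HST.
DEADLINE = datetime(2009, 3, 21, 0, 0, tzinfo=HST)


def deadline_in(tz):
    """Return the upload deadline expressed in the given time zone."""
    return DEADLINE.astimezone(tz)


print(deadline_in(timezone.utc).isoformat())
print(deadline_in(EDT).isoformat())
```

Any other zone can be checked by passing a different timezone object with the appropriate offset.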

The results (scores) will be available on a system results page (coming soon, but not sooner than the close of the evaluation period). The results will be anonymized, so that you will be able to identify only your own, but you will see the other scores as well.

(After-evaluation) Full Evaluation (Gold) Data

We are now providing the full evaluation data for follow-up experiments to all participants who uploaded system results. Please download the following two files (please have your evaluation data userid and password ready; SRL-only task participants need to download only the first one):

CoNLL2009-ST-Gold-Both_tasks.zip
CoNLL2009-ST-Gold-Joint.zip

(For a quick look, here is the README file.)

After downloading, move the files to a clean directory, unpack them and read the README file provided (the data is provided only as an addition to the original "competition" evaluation data, to avoid licensing issues).

CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages