Version 1.0, November 19, 2015
Vít Baisa, Silvie Cinková, Ema Krejčová, Anna Vernerová
VPS-GradeUp is a collection of triple manual annotations of 29 English verbs based on the Pattern Dictionary of English Verbs (PDEV) [1] and comprising the following lemmas: abolish, act, adjust, advance, answer, approve, bid, cancel, conceive, cultivate, cure, distinguish, embrace, execute, hire, last, manage, murder, need, pack, plan, point, praise, prescribe, sail, seal, see, talk, urge. It contains results from two different tasks: a Graded-Decision task and a best-fit word sense disambiguation (WSD) task.
In both tasks, the annotators matched verb senses defined by the PDEV patterns against 50 actual uses of each verb (concordances from the BNC [2]). The verbs were randomly selected from a list of completed PDEV lemmas with at least 3 patterns and at least 100 BNC concordances not previously annotated by PDEV’s own annotators. The selection also excluded verbs contained in VPS-30-En [3], a data set we developed earlier. This data set was built within the project "Reviving Zellig S. Harris: more linguistic information for distributional lexical analysis of English and Czech" and in connection with SEMEVAL 2015.
The annotators were all trained linguists familiar with PDEV, but none of them was a native speaker of English.
VPS-GradeUp comes as a single .csv file encoded in UTF-8, with semicolons as separators and each cell enclosed in double quotes. It contains 22,800 rows and 43 columns with a header. The data, documentation and PDEV entry snapshots from October 2015 (as used for the annotation) can be downloaded via the LINDAT-CLARIN repository.
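The file layout described above can be read with Python's standard `csv` module. The following is a minimal sketch, not an official loader: the function name `read_vps_rows`, the file name in the trailing comment, and the tiny in-memory sample are all assumptions made for illustration; only the delimiter, quoting and encoding come from the description above.

```python
import csv
import io


def read_vps_rows(handle):
    """Parse semicolon-separated, double-quoted rows into dictionaries."""
    reader = csv.DictReader(handle, delimiter=";", quotechar='"')
    return list(reader)


# Made-up sample in the documented layout (NOT real rows from the data set).
SAMPLE = (
    '"JointID";"PatternID";"Lemma";"SentID";"LikAV"\n'
    '"abolish:Sent_1.1:Pattern_1";"1";"abolish";"1.1";"7"\n'
    '"abolish:Sent_1.1:Pattern_2";"2";"abolish";"1.1";"2"\n'
)

rows = read_vps_rows(io.StringIO(SAMPLE))
print(len(rows), rows[0]["Lemma"], rows[0]["LikAV"])

# For the real file (the file name below is a placeholder):
# with open("vps-gradeup.csv", encoding="utf-8", newline="") as f:
#     rows = read_vps_rows(f)
```

All values are returned as strings, so missing values stay visible as the literal text NA and can be handled explicitly later.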
Each row primarily represents one observation in the Graded-Decision experiment, i.e. one score on a 7-point Likert scale indicating how well a given PDEV pattern (e.g. [Pattern] 3) of a given verb lemma (e.g. abolish) illustrates a given KWIC identified by an index (e.g. 2.1).
Graded decisions are filled in all rows containing .1 at the end of the KWIC index (Column SentID). Most rows with KWICs indexed with .2 do not contain any graded decisions (NA filled in). These rows contain the unused alternative readings.
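Because only most, not all, rows indexed with .2 lack graded decisions, it may be safer to filter on the NA values themselves than on the KWIC index suffix. The sketch below assumes the rows have already been read as dictionaries of strings and that missing scores appear as the literal string NA; the helper name `graded_scores` and the sample rows are hypothetical.

```python
ANNOTATORS = ("AV", "EK", "SC")


def graded_scores(row):
    """Return the three Likert scores as ints, or None if any is NA or empty."""
    raw = [row.get("Lik" + a, "NA") for a in ANNOTATORS]
    if any(value in ("NA", "") for value in raw):
        return None
    return [int(value) for value in raw]


# Made-up rows for illustration (NOT real data).
rows = [
    {"Lemma": "abolish", "SentID": "1.1", "PatternID": "1",
     "LikAV": "7", "LikEK": "5", "LikSC": "7"},
    {"Lemma": "abolish", "SentID": "1.2", "PatternID": "1",
     "LikAV": "NA", "LikEK": "NA", "LikSC": "NA"},
]

graded = []
for row in rows:
    scores = graded_scores(row)
    if scores is not None:
        # Keep the KWIC index, the raw scores, and their mean across annotators.
        graded.append((row["SentID"], scores, sum(scores) / len(scores)))

print(graded)
```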
Each row also contains all WSD (best-fit) annotation related to the given KWIC; i.e. the WSD information repeats for each KWIC as many times as the given verb has PDEV patterns. To explore the WSD results independently of the graded decisions, remember to eliminate the duplicate rows first.
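One way to eliminate the duplicates is to keep a single WSD record per combination of Lemma and SentID. This is a sketch under the assumption that rows are dictionaries of strings keyed by the column names; the function name `wsd_per_kwic` and the sample rows are made up for illustration.

```python
ANNOTATORS = ("AV", "EK", "SC")


def wsd_per_kwic(rows):
    """Keep one WSD record per (Lemma, SentID), dropping per-pattern repeats."""
    unique = {}
    for row in rows:
        key = (row["Lemma"], row["SentID"])
        if key not in unique:
            unique[key] = {"WSDNum" + a: row["WSDNum" + a] for a in ANNOTATORS}
    return unique


# Made-up rows (NOT real data): the WSD columns repeat once per pattern.
rows = [
    {"Lemma": "abolish", "SentID": "1.1", "PatternID": "1",
     "WSDNumAV": "2", "WSDNumEK": "3", "WSDNumSC": "3"},
    {"Lemma": "abolish", "SentID": "1.1", "PatternID": "2",
     "WSDNumAV": "2", "WSDNumEK": "3", "WSDNumSC": "3"},
    {"Lemma": "abolish", "SentID": "1.1", "PatternID": "3",
     "WSDNumAV": "2", "WSDNumEK": "3", "WSDNumSC": "3"},
    {"Lemma": "act", "SentID": "1.1", "PatternID": "1",
     "WSDNumAV": "1", "WSDNumEK": "1", "WSDNumSC": "unclassified"},
    {"Lemma": "act", "SentID": "1.1", "PatternID": "2",
     "WSDNumAV": "1", "WSDNumEK": "1", "WSDNumSC": "unclassified"},
]

wsd = wsd_per_kwic(rows)
print(len(rows), "rows reduced to", len(wsd), "KWICs")
```

The key combines Lemma and SentID because SentID alone is not unique across lemmas.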
Column name | Description | Example |
JointID | Unique ID for each row containing lemma, KWIC ID, and pattern number | abolish:Sent_1.1:Pattern_1 |
PatternID | NB: only unique in combination with the Lemma column; when working with all lemmas, use JointID! | 1 |
Lemma | | abolish |
SentID | NB: only unique in combination with the Lemma column; when working with all lemmas, use JointID! | 1.1 |
LikAV LikEK LikSC | Score on the 7-point Likert scale saying how well the given PDEV pattern illustrates the given KWIC according to the annotator identified by their initials. 1 = Irrelevant, 7 = Perfect match. | 7 5 7 |
WSDNumAV WSDNumEK WSDNumSC | For each annotator separately: ID of the best-fitting pattern in a classical WSD setup, where the annotator is forced to select only one pattern, or to claim that the given KWIC is not a verb (value "not verb") or that no pattern is really suitable (value "unclassified"). | 2 3 3 |
UnderstandAV UnderstandEK UnderstandSC | For each annotator, the options are 1 and 0 (1 = the annotator is confident that they understand the KWIC well, 0 = comprehension problems). | 1 1 1 |
KWIC | The annotated BNC KWIC: the largest span allowed by BNC. The key word is capitalized and surrounded by three spaces on both sides. Apostrophes and double quotes are escaped. Horizontal ellipsis is rendered by the corresponding HTML entity &hellip; (as copied from the BNC). | Anna Tomforde and Michael Farr PRESIDENT Franois Mitterrand , the first head of state of the wartime Allies to visit East Germany , said yesterday that the existence of two sovereign German states could not be ` ABOLISHED at a stroke \' . Reflecting French anxiety over German reunification , Mr Mitterrand said the two Germanys were jointly responsible for stability in Europe . ` German unity depends first of all on the German people … |
BNCdocID | The document code the KWIC was associated with in the BNC | AAK/1 |
Number of Patterns | How many patterns (senses) the given verb lemma has in PDEV (Pattern Dictionary of English Verbs) | 3 |
CommentsAV CommentsEK CommentsSC | Annotators’ comments. Most of them are in English, but some are in Czech. | NA |
WSDExploitAV_coercion_agent WSDExploitEK_coercion_agent WSDExploitSC_coercion_agent | Exploitation markup. Binary values. 1 = the agent of the keyword was coerced into a different PDEV Semantic Type, although it actually corresponds to the Semantic Type listed in the pattern definition. 0 = no markup | 0 (1 would occur e.g. if the pattern definition contained the Semantic Type Liquid for agent and the KWIC said: The second cup poured on the floor. Although, strictly speaking, cup corresponds to Container, Liquid is evidently meant at the same time.) |
WSDExploitAV_coercion_object WSDExploitEK_coercion_object WSDExploitSC_coercion_object | Cf. coercion agent above. Applies to direct object. Binary (0,1). | |
WSDExploitAV_coercion_other WSDExploitEK_coercion_other WSDExploitSC_coercion_other | Cf. coercion agent above. Typically applies to indirect object and adverbials, but it can apply to any clause element except agent and object. Binary (0,1). | |
WSDExploitAV_meaning_shift WSDExploitEK_meaning_shift WSDExploitSC_meaning_shift | Exploitation markup indicating any type of meaning shift between the implicature of the selected pattern (sense) and the KWIC; e.g., metaphor or any rhetorical figure. Binary (0,1). | |
WSDExploitAV_unexpected_agent WSDExploitEK_unexpected_agent WSDExploitSC_unexpected_agent | Exploitation markup indicating that the agent of the given KWIC does not conform to the Semantic Type prescribed by PDEV. A more general markup than coercion. Binary (0,1). | |
WSDExploitAV_unexpected_object WSDExploitEK_unexpected_object WSDExploitSC_unexpected_object | Cf. unexpected agent and coercion object above. Applies to direct object. Binary (0,1). | |
WSDExploitAV_unexpected_other WSDExploitEK_unexpected_other WSDExploitSC_unexpected_other | Cf. unexpected agent and coercion other above. Applies to indirect object and all other clause elements except agent and direct object. Binary (0,1). | |
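The graded and the best-fit annotations can be related row by row: a row's pattern is an annotator's best-fit choice when that annotator's WSDNum value equals the row's PatternID. The sketch below is illustrative only. It assumes string-valued rows, and it treats any non-numeric WSDNum value (such as the "not verb" and "unclassified" labels, whose exact spelling in the file is an assumption here) as "no pattern chosen". The helper names `parse_wsd` and `is_best_fit` are hypothetical.

```python
ANNOTATORS = ("AV", "EK", "SC")


def parse_wsd(value):
    """Return the chosen pattern number, or None for any non-numeric label."""
    try:
        return int(value)
    except ValueError:
        return None


def is_best_fit(row, annotator):
    """True if this row's pattern is the annotator's single best-fit choice."""
    chosen = parse_wsd(row["WSDNum" + annotator])
    return chosen is not None and chosen == int(row["PatternID"])


# Made-up row (NOT real data), mirroring the example values in the table.
row = {"PatternID": "3", "WSDNumAV": "2", "WSDNumEK": "3", "WSDNumSC": "3"}
matches = {a: is_best_fit(row, a) for a in ANNOTATORS}
print(matches)
```

Comparing these flags with the Likert scores of the same row would show, for instance, whether a best-fit pattern also received the highest graded score.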
[1] P. Hanks and J. Pustejovsky, “A Pattern Dictionary for Natural Language Processing,” Rev. Française Linguist. Appliquée, vol. 10, no. 2, 2005.
[2] “British National Corpus, version 3 (BNC XML edition).” British National Corpus Consortium, 2007.
[3] S. Cinková, M. Holub, A. Rambousek, and L. Smejkalová, “A database of semantic clusters of verb usages,” in Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey, 2012, pp. 3176–3183.
If you make use of this data set in 2015, please cite this web site. Several papers have been submitted, but we have not received any notification yet. Please return to this web site to obtain a more appropriate reference by March 2016.