In this task, each annotator was given the PDEV entry for the target verb lemma together with a BNC concordance line in which that lemma is the keyword in context (KWIC). For each pattern in the PDEV entry, the annotator answered the question “How well does this pattern illustrate the use of the target verb in this context?” on a 7-point Likert scale with the following anchors:
1 = Irrelevant
2 = Somewhat relevant but poor match
3 = Literal match but there is a meaning shift (in idioms)/there is a substantial meaning shift (other cases)
4 = Certainly a match, but quite many things wrong
5 = Partial match - different domain/granularity/atypical arguments
6 = Good match – just extend pattern definition
7 = Exact match
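For downstream processing, the seven anchors can be encoded, for instance, as an integer enumeration so that scores keep their ordinal values. The following is a minimal sketch under that assumption: the class name `PatternFit`, its member names and the `ANCHOR_TEXT` mapping are hypothetical; only the numeric values and the anchor wording come from the scale above.

```python
from enum import IntEnum


class PatternFit(IntEnum):
    """Hypothetical encoding of the 7-point Likert anchors (values 1..7)."""

    IRRELEVANT = 1
    SOMEWHAT_RELEVANT_POOR_MATCH = 2
    LITERAL_MATCH_MEANING_SHIFT = 3
    MATCH_MANY_THINGS_WRONG = 4
    PARTIAL_MATCH = 5
    GOOD_MATCH_EXTEND_DEFINITION = 6
    EXACT_MATCH = 7


# Anchor wording as listed in the scale definition (hypothetical mapping name).
ANCHOR_TEXT = {
    PatternFit.IRRELEVANT: "Irrelevant",
    PatternFit.SOMEWHAT_RELEVANT_POOR_MATCH: "Somewhat relevant but poor match",
    PatternFit.LITERAL_MATCH_MEANING_SHIFT: (
        "Literal match but there is a meaning shift (in idioms)/"
        "there is a substantial meaning shift (other cases)"
    ),
    PatternFit.MATCH_MANY_THINGS_WRONG: "Certainly a match, but quite many things wrong",
    PatternFit.PARTIAL_MATCH: "Partial match - different domain/granularity/atypical arguments",
    PatternFit.GOOD_MATCH_EXTEND_DEFINITION: "Good match – just extend pattern definition",
    PatternFit.EXACT_MATCH: "Exact match",
}

if __name__ == "__main__":
    # Converting a raw survey score back to its anchor.
    score = PatternFit(5)
    print(score.name, "->", ANCHOR_TEXT[score])
```

Because `IntEnum` members compare as integers, ordinal comparisons such as `PatternFit(6) > PatternFit.PARTIAL_MATCH` remain available.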
Before the annotation started, the annotators agreed that whenever a mismatch could be attributed mainly to either syntactic or semantic reasons, semantic mismatches should be judged more severe than syntactic ones.
The annotators were also asked to rate their own comprehension of each concordance line (1 = understood, 0 = comprehension issues).
For each verb, 50 concordance lines were judged against every pattern in the corresponding PDEV entry, so the number of graded decisions per verb depends on how many patterns the verb has in PDEV. Each verb was annotated in a separate online survey form in which every concordance line had its own page, followed by an optional page with exactly the same questions. This optional page was meant to capture an alternative reading of the concordance line in case it was distinctly ambiguous; it was reserved strictly for cases of evident ambiguity without any other comprehension issues, and only a few such cases occur in the data. Excluding alternative readings, the data set contains 11,400 graded decisions.
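To make the counting concrete, the following minimal Python sketch models one graded decision as a record (Likert score, comprehension flag, and whether it was entered on the optional alternative-reading page) and checks that a verb with n patterns yields 50 × n decisions once alternative readings are set aside. The class, function and field names, and the three-pattern example verb, are hypothetical and do not describe the actual survey export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    """One graded decision: a concordance line judged against one PDEV pattern.

    Field names are hypothetical; only the value ranges follow the text.
    """

    verb: str
    concordance_id: int
    pattern_id: str
    score: int  # 1..7 on the Likert scale
    understood: int  # 1 = understood, 0 = comprehension issues
    alternative_reading: bool = False  # True if entered on the optional page

    def __post_init__(self) -> None:
        # Reject values outside the ranges defined by the survey questions.
        if not 1 <= self.score <= 7:
            raise ValueError(f"score must be in 1..7, got {self.score}")
        if self.understood not in (0, 1):
            raise ValueError(f"understood must be 0 or 1, got {self.understood}")


def expected_decisions(n_patterns: int, n_concordances: int = 50) -> int:
    """Graded decisions for one verb: every concordance line x every pattern."""
    return n_concordances * n_patterns


def count_decisions(judgements: list[Judgement]) -> int:
    """Count graded decisions, leaving out those from alternative readings."""
    return sum(1 for j in judgements if not j.alternative_reading)


if __name__ == "__main__":
    # Hypothetical verb with three PDEV patterns and 50 concordance lines.
    judgements = [
        Judgement("hypothetical_verb", c, f"p{p}", score=7, understood=1)
        for c in range(50)
        for p in range(1, 4)
    ]
    # One extra judgement recorded on an optional alternative-reading page.
    judgements.append(
        Judgement("hypothetical_verb", 0, "p1", score=5, understood=1,
                  alternative_reading=True)
    )
    print(count_decisions(judgements), expected_decisions(3))
```

In this example the verb contributes 150 graded decisions; the extra judgement from the alternative-reading page is stored alongside the others but does not change that count.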