

In this text we present the main principles of the tectogrammatical representation as applied to English and use English examples, but features that are not specific to English apply to the Czech tectogrammatical representation as well.

The valency (argument structure) in PCEDT 2.0 draws on the valency theory of the Functional Generative Description (FGD). The theory originally focused on verbs. The major part of this section is therefore devoted to verb valency, although most of it applies to other parts of speech as well. Noun- and adjective-specific valency features are described separately.

Obligatory vs. non-obligatory complementations

Four types of complementations are distinguished: obligatory participants, optional participants, obligatory modifiers and free modifiers. This implies that the line between a participant (i.e. argument) and a modifier (i.e. adjunct) does not coincide with the line between obligatory and optional complementations. Obligatoriness is determined by the so-called dialogue test, which helps us decide which complementations of a given verb are obligatory and which are optional. The test is based on the difference between questions about something that must be known to the speaker, because it follows from the meaning of the verb he/she has used, and questions about something that does not necessarily follow from the meaning of that verb. When asked about a semantically obligatory complementation of a particular verb, the speaker who has used the verb cannot sensibly answer: I don't know. Compare the following dialogues:

  • A: When he saw it, he bought it.
  • B: Who?
  • A: *I don't know.

The speaker must know who the buyer is; otherwise the utterance would not make sense. The same goes for the thing bought:

  • A: When he saw it, he bought it.
  • B: What did he buy?
  • A: *I don't know.

On the other hand, it does not sound odd when the speaker does not know e.g. for whom or from whom the buyer bought the object:

  • A: When he saw it, he bought it.
  • B: From/for whom did he buy it?
  • A: I don't know.

Obligatory complementations are complementations that must be known to the speaker in a sensible dialogue.

Participants vs. modifiers

To make a distinction between inner participants and modifiers, the following criteria are used:

  • Can the given type of complementation modify a particular verb occurrence more than once, or at most once?
  • Can the given type of complementation modify any verb, or is there a (more or less) closed class of verbs that can be modified by it?

If the complementation can modify a particular verb in a particular meaning only once (coordination takes only one valency slot), it is an inner participant. If the verb in that meaning can take more complementations of the same type, the complementation is a modifier.

If the complementation of a particular type can be attached to (almost) any verb, it is a modifier. E.g. spatial and temporal adverbials are typically modifiers, since virtually all events can be described in terms of time and space.
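The two criteria, combined with the dialogue test, can be read as a small decision procedure. The following Python sketch is illustrative only: the function and parameter names are hypothetical, the answers to both questions are human judgments supplied as booleans, and since the text does not say how conflicting answers are resolved, such cases are returned as undecided.

```python
from typing import Optional


def participant_or_modifier(repeatable: bool, fits_almost_any_verb: bool) -> Optional[str]:
    """Apply the two criteria; return None when they disagree (assumption)."""
    if not repeatable and not fits_almost_any_verb:
        return "participant"
    if repeatable and fits_almost_any_verb:
        return "modifier"
    return None  # the criteria point in different directions


def complementation_type(kind: str, obligatory: bool) -> str:
    """Combine the participant/modifier decision with the dialogue-test result."""
    if kind == "participant":
        return "obligatory participant" if obligatory else "optional participant"
    if kind == "modifier":
        return "obligatory modifier" if obligatory else "free modifier"
    raise ValueError(f"unknown kind: {kind!r}")


# The buyer of "buy": occurs at most once, restricted to certain verbs,
# and the speaker cannot answer "I don't know" when asked about it.
buyer = complementation_type(participant_or_modifier(False, False), obligatory=True)

# A temporal adverbial: repeatable, attaches to virtually any verb, not obligatory.
when = complementation_type(participant_or_modifier(True, True), obligatory=False)

print(buyer)  # obligatory participant
print(when)   # free modifier
```

Keeping the two decisions in separate functions mirrors the point made above: obligatoriness and the participant/modifier distinction are independent of each other.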

Participant types: ACT, PAT, ADDR, ORIG and EFF

All participants and all relevant modifiers are listed in the valency frame of the given verb sense in the valency lexicon. The valency frames in the Czech valency lexicon, PDT-Vallex, list only participants (obligatory as well as optional) and obligatory modifiers. The English valency lexicon (Engvallex) captures even typical free modifiers. The reason is that the first version of Engvallex arose as a semi-automatic transformation of one of the early versions of PropBank. Even though some arguments listed by PropBank would not have qualified as participants or obligatory modifiers according to FGD, we did not want to throw away the information obtained from PropBank. All optional complementations are marked with a question mark in the lexicon.

There are five types of verb participants. They are marked with the functors ACT (Actor), PAT (Patient), ADDR (Addressee), ORIG (Origin) and EFF (Effect). When determining the type of the participant in question, two kinds of criteria are used: syntactic (when only the Actor (ACT) and Patient (PAT) are involved) and semantic (when more than two participants are involved). When the verb opens just one slot for a participant, the participant is always called ACT, regardless of its semantics. E.g. in the causative alternation of boil, the causer has the functor ACT, whereas the substance boiled has the functor PAT. In the inchoative alternation, by contrast, the substance boiled has the functor ACT even though, semantically, it is not an agent, because the first valency slot opened for an inner participant is always ACT. Unlike PropBank, the PDT-style valency lexicons make no links between arguments with the same thematic roles across different alternations (which usually means across different valency frames). When a verb opens two slots for inner participants, the logical subject of the predication gets the functor ACT and the other participant becomes PAT. PAT is typically the direct object of an active clause, but also (systematically) the complement of a copula verb. E.g. in Peter is my friend, Peter (as the individual to be classified) is ACT and friend (the category to which Peter is said to belong) is PAT.

When a verb opens three or more slots, the functor labels seek to reflect the thematic roles of the participants more accurately. Two participants are always ACT and PAT. The third (fourth, fifth) participant acquires a label from the range ORIG, ADDR and EFF. For instance, the verb sell in one of its senses has the obligatory participants ACT (the seller) and PAT (the thing sold) and the optional arguments ADDR (the buyer) and EFF (the price for which it was sold). Secondary predications in three-argument verbs (They elected him president) always get the functor EFF. The assignment of these coarse-grained semantic labels is very intuitive in the most literal verb senses, especially when certain positions are typically occupied by humans and others by non-volitional inanimate entities. Cf.:

  • They.ACT gave him.ADDR a book.PAT
  • They.ACT are bundling their services.PAT into packages.EFF and target them to small segments of the population.
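The slot-counting rule described above can be summarized in a few lines of Python. This is a sketch under stated assumptions, not part of any PCEDT tool: the function name is hypothetical, the semantic choice among ADDR, ORIG and EFF is supplied by the caller (in practice it is the lexicographer's judgment), and each functor is assumed to occur at most once per frame.

```python
from typing import List, Sequence

# Functors that label the third and further participants (from the text).
SEMANTIC_FUNCTORS = {"ADDR", "ORIG", "EFF"}


def participant_functors(n_slots: int, semantic_choices: Sequence[str] = ()) -> List[str]:
    """Return the participant functors for a frame with n_slots participant slots."""
    if not 1 <= n_slots <= 5:
        raise ValueError("a frame has between one and five participant slots")
    if n_slots == 1:
        return ["ACT"]          # a single participant is always ACT
    if n_slots == 2:
        return ["ACT", "PAT"]   # purely syntactic criteria
    extra = list(semantic_choices)
    if len(extra) != n_slots - 2:
        raise ValueError("one semantic functor is needed per additional slot")
    if len(set(extra)) != len(extra) or not set(extra) <= SEMANTIC_FUNCTORS:
        raise ValueError("additional slots take distinct functors from ADDR, ORIG, EFF")
    return ["ACT", "PAT"] + extra


print(participant_functors(1))                  # ['ACT']  (inchoative "boil")
print(participant_functors(2))                  # ['ACT', 'PAT']  (causative "boil")
print(participant_functors(4, ["ADDR", "EFF"]))  # ['ACT', 'PAT', 'ADDR', 'EFF']  ("sell")
```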

There are, though, verbs and verb senses for which none of the semantically loaded functors seems to be the optimal match. In such cases the lexicographer made an arbitrary decision in the valency lexicon and the data have been annotated accordingly. There are a few globally applied conventions; e.g. three-slot reciprocal verbs (someone blends something with something) usually have the combination of functors ACT, PAT, ADDR. When the assignment of labels was not straightforward, individual judgments may differ as to whether or not two verbs in a certain reading should be described by the same set of functors. This typically happened when the lexicographer had to decide whether to formulate the valency frame as ACT, ADDR, PAT or ACT, PAT, EFF. Word order is irrelevant for formulating the valency frames. Cf. the following sentences (extracted as example sentences from Engvallex):

  • They.ACT encase the concrete columns.PAT with steel.EFF
  • John.ACT surrounded the castle.ADDR with his toy soldiers.PAT.

Valency of nouns and adjectives

There are three participant functors specific to nouns: APP, MAT and AUTH. The functor APP ("appurtenance") can be described as a "have" relation in the broadest sense: John's pen, my mother, New York's trendiest club, America's mountains, etc. On the other hand, when a noun is identified as an event noun, its possessive determiners and modifiers get the same functors as events expressed by verbs. E.g.: my.ACT meeting with the other team.PAT.

The functor MAT is assigned to nouns modifying a noun that has the meaning of a container, e.g.: a sack of potatoes.MAT, millions of people.MAT. When the noun is an artifact and a noun modifier renders its author, the modifier gets the functor AUTH.

If a noun has a very typical argument that cannot be described by APP, MAT or another semantically expressive functor (e.g. a letter from China.DIR1, a letter from my brother.ORIG, yesterday's.TWHEN meeting, annual.THO event), it usually gets the functor PAT: the portrait of the president.PAT. Adjectival modifiers are never arguments and get most typically the functor RSTR (restrictive attribute) or DESCR (descriptive attribute). Noun arguments are never regarded as obligatory.

The annotators tried to interpret complex noun phrases with semantically expressive functors as much as they could. This annotation is, of course, very inconsistent. There is nothing like a NomBank counterpart in the English part of PCEDT 2.0. So far, English nouns and their valency are not captured in the valency lexicon.

Many adjectives can be modified by infinitives, e.g. eager to please. The infinitive gets the functor PAT.

Valency lexicons

Figure 1: Engvallex valency frame

The Czech valency lexicon PDT-Vallex has been comprehensively described in the Czech documentation. The English lexicon Engvallex has a very similar structure, but, in addition, it contains a mapping to the current version of PropBank (the OntoNotes 4.0 release). The Engvallex entry of the verb leap in Figure 1 contains three valency frames. The line below each frame presents the frame-to-frameset mapping. Originally, the mapping was maintained manually. This manual mapping was lost with the latest substantial revision of PropBank. The current mapping was derived from the annotated data. The most recent PropBank has merged all syntactico-semantic alternations of a verb sense into one common frameset, while Engvallex does not link the alternations in any way and gives each occurring alternation a separate frame. Therefore the most typical situation is that several Engvallex frames map onto one and the same PropBank frameset. In such a case the mapping line contains the ID of the given PropBank frameset. The mapping line is missing in some Engvallex frames. This happens when the frame was used only for sentences that have not yet been annotated with PropBank. A mapping of one Engvallex frame onto several PropBank framesets is also possible. When sentences annotated with one particular Engvallex frame were annotated with two or more different framesets, the mapping line displays them in descending frequency order and indicates the number of annotated sentences with each.

Engvallex-to-PropBank mapping

Before the OntoNotes 4.0 PropBank release, valency slots in Engvallex were mapped onto the corresponding PropBank arguments. Nevertheless, the radical revision of PropBank made most links invalid. We have restored the links, drawing on both the annotation of PropBank and PEDT 2.0. Not surprisingly, the mapping between frames and framesets, and especially the mapping of the respective arguments onto each other, are for many verbs more complicated than 1:1. We store the mapping information in three files (see section Documentation). The file eng_pb_links.txt contains only the most frequent frame-to-frameset mapping. The file eng_pb_links_for_all_rolesets.txt contains the information in full detail, but without pointers to the data. The file eng_pb_links_with_ids.txt contains the complete mapping information along with the information on where in the corpus the given mapping occurs (p-node id).

We have integrated the less comprehensive file (eng_pb_links.txt) into the Engvallex browser to facilitate its viewing. The file eng_pb_links_for_all_rolesets.txt was not integrated into the browser, since it is very complex and would make the view unhelpfully cluttered.

Figure 2: Engvallex to PropBank mapping in the Engvallex editor

The mapping of the respective complementations is indicated only when an Engvallex participant or an obligatory modifier maps onto a PropBank argument (an ARG- element) listed in the roleset. In Figure 2, the verb swim is divided into three frames. Two of them are mapped onto PropBank framesets. The absence of the mapping in the first frame has two possible reasons: either this Engvallex frame has not been assigned to any occurrence of the verb in the entire corpus, or the sentence in which this particular frame was assigned has not yet been annotated in PropBank (unlike the PCEDT annotation, the PropBank annotation does not cover the entire Penn Treebank). When the mapping information is present, the frame-to-frameset mapping is located on a separate line below the list of frame-constituting valency slots; e.g. [swim-v.xml::swim.01::1]. This description means that this particular frame maps onto the swim.01 frameset and that this happens once in the corpus. The third Engvallex frame maps onto the swim.01 frameset in two cases in the corpus. When there is a mapping between an Engvallex valency slot and a PropBank argument, it is listed in square brackets following the slot name. The last digit is, again, the frequency of this particular mapping. Hence, the Actor of swim in the second frame maps onto the Arg0 of swim.01 once.
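A mapping line of the form shown above can be split into its three fields with a short regular expression. The sketch below assumes that the fields are always separated by a double colon and that the last field is an integer count; the names FrameMapping and parse_frame_mapping are hypothetical and not part of the Engvallex tools.

```python
import re
from typing import NamedTuple


class FrameMapping(NamedTuple):
    source_file: str  # PropBank frame file, e.g. "swim-v.xml"
    frameset: str     # PropBank frameset ID, e.g. "swim.01"
    count: int        # number of corpus occurrences of this mapping


# Three "::"-separated fields inside square brackets (assumed layout).
_MAPPING_LINE = re.compile(r"^\[([^\[\]:]+)::([^\[\]:]+)::(\d+)\]$")


def parse_frame_mapping(line: str) -> FrameMapping:
    """Parse one frame-to-frameset mapping line."""
    match = _MAPPING_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a frame-to-frameset mapping line: {line!r}")
    source_file, frameset, count = match.groups()
    return FrameMapping(source_file, frameset, int(count))


mapping = parse_frame_mapping("[swim-v.xml::swim.01::1]")
print(mapping.frameset, mapping.count)  # swim.01 1
```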

The information visualized in the editor and contained in the file eng_pb_links.txt is incomplete. It always contains only the most frequent frame-to-frameset mapping. When two or more frame-to-frameset mappings are equally frequent, only one is displayed. The file does not provide any information on how many occurrences of a verb were assigned a given frame, so it is impossible to see whether the most frequent mapping covers the majority of cases or whether the mapping is one-to-many with an even distribution across several framesets. The complete mapping information is in the file eng_pb_links_for_all_rolesets.txt.

The file eng_pb_links_for_all_rolesets.txt lists all mappings that occurred in the corpus for a given Engvallex frame. For instance, the frame ev-w3310f22 of the verb take (see below), consisting of the slots ACT, CPHR and PAT (a light-verb frame), maps onto three PropBank framesets: take.01 (82 cases), take.02 (2 cases) and take.12 (1 case). When it maps onto take.01, ACT maps onto ARG-0 in take.01 56 times and onto ARG-1 twice. CPHR (the predicate noun) maps onto ARG-1 79 times and onto ARG-2 twice. PAT maps onto ARG-2 52 times and onto ARG-1 21 times. In addition, the file records mappings onto PropBank elements which are not listed in the roleset that defines the given frameset. These mappings stand in curly brackets. PAT, when mapping onto elements in take.01, maps three times onto ARG-M-LOC, twice onto ARG-M-DIR, once onto ARG-M-ADV and once onto ARG-M-MNR.

Engvallex frame ev-w3310f22 (take)

ev-w3310f22 ACT CPHR PAT
[propbank/e-v.xml::take.01::82]
  ACT [take.01::0::::56, take.01::1::::2]{}
  CPHR [take.01::1::::79, take.01::2::::2]{}
  PAT [take.01::2::::52, take.01::1::::21]{take.01::m::LOC::3, take.01::m::DIR::2, take.01::m::ADV::1, take.01::m::MNR::1}
[propbank/e-v.xml::take.02::2]
  ACT [take.02::0::::1, take.02::1::::1]{}
  CPHR [take.02::1::::2]{}
  PAT [take.02::0::::1]{take.02::m::MNR::2}
[propbank/e-v.xml::take.12::1]
  ACT [take.12::0::::1]{}
  CPHR [take.12::1::::1]{}
  PAT [take.12::1::::1]{}
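A record like the one above can be read into a nested dictionary, which also makes it possible to check how dominant the most frequent frameset is (information that eng_pb_links.txt alone does not give). The Python sketch below is an assumption-laden illustration: the field layout frameset::argument::subtype::count is inferred from this single example, the ARG labels are built by analogy with the notation used in eng_pb_links_with_ids.txt, and all function names are hypothetical.

```python
import re

RECORD = """ev-w3310f22 ACT CPHR PAT
[propbank/e-v.xml::take.01::82]
ACT [take.01::0::::56, take.01::1::::2]{}
CPHR [take.01::1::::79, take.01::2::::2]{}
PAT [take.01::2::::52, take.01::1::::21]{take.01::m::LOC::3, take.01::m::DIR::2, take.01::m::ADV::1, take.01::m::MNR::1}
[propbank/e-v.xml::take.02::2]
ACT [take.02::0::::1, take.02::1::::1]{}
CPHR [take.02::1::::2]{}
PAT [take.02::0::::1]{take.02::m::MNR::2}
[propbank/e-v.xml::take.12::1]
ACT [take.12::0::::1]{}
CPHR [take.12::1::::1]{}
PAT [take.12::1::::1]{}"""

# Either a slot line "FUNCTOR [listed]{unlisted}" or a frameset header "[file::frameset::count]".
TOKEN = re.compile(
    r"(?P<slot>[A-Z][A-Z0-9]*)\s+\[(?P<listed>[^\]]*)\]\{(?P<unlisted>[^}]*)\}"
    r"|\[(?P<file>[^\[\]:]+)::(?P<frameset>[^\[\]:]+)::(?P<count>\d+)\]"
)


def parse_items(text):
    """Turn 'take.01::m::LOC::3, ...' into {'ARGM-LOC': 3, ...} (assumed field order)."""
    result = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        _frameset, arg, subtype, count = item.split("::")
        label = f"ARG{arg.upper()}" + (f"-{subtype}" if subtype else "")
        result[label] = int(count)
    return result


def parse_record(text):
    """Return {frameset: {'count': n, 'slots': {functor: {'listed': ..., 'unlisted': ...}}}}."""
    mappings, current = {}, None
    for match in TOKEN.finditer(text):
        if match.group("frameset"):
            current = {"count": int(match.group("count")), "slots": {}}
            mappings[match.group("frameset")] = current
        elif current is not None:
            current["slots"][match.group("slot")] = {
                "listed": parse_items(match.group("listed")),
                "unlisted": parse_items(match.group("unlisted")),
            }
    return mappings


mappings = parse_record(RECORD)
counts = {frameset: data["count"] for frameset, data in mappings.items()}
top = max(counts, key=counts.get)
share = counts[top] / sum(counts.values())
print(top, round(share, 3))  # take.01 0.965
```

The parser works on bracket patterns rather than on line breaks, so it does not depend on how the record is laid out in the file.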

In the file eng_pb_links_with_ids.txt, we have added the node IDs for each mapping. The Engvallex frame ev-w3310f22 (take) looks like this:

ev-w3310f22
take.01 - ACT:ARG0 ACT:ARG0 CPHR PAT PAT (EnglishP-wsj_0109-s17-t18)
take.01 - ACT:ARG0 ACT:ARG0 CPHR:ARG1 PAT:ARG1 (EnglishP-wsj_1928-s1-t13)
take.01 - ACT:ARG0 ACT:ARG0 CPHR:ARG1 PAT:ARG2 (EnglishP-wsj_0184-s9-t20, EnglishP-wsj_1012-s7-t14, EnglishP-wsj_1797-s20-t13)
take.01 - ACT:ARG0 CPHR:ARG1 (EnglishP-wsj_2418-s33-t7)
take.01 - ACT:ARG0 CPHR:ARG1 PAT (EnglishP-wsj_0174-s22-t16)
take.01 - ACT:ARG0 CPHR:ARG1 PAT:ARG1 (EnglishP-wsj_0207-s2-t6, EnglishP-wsj_0304-s10-t7, EnglishP-wsj_0452-s3-t26, EnglishP-wsj_0559-s13-t15, EnglishP-wsj_0666-s35-t2, EnglishP-wsj_0790-s2-t14, EnglishP-wsj_1205-s3-t4, EnglishP-wsj_1424-s39-t4, EnglishP-wsj_2040-s42-t5, EnglishP-wsj_2167-s12-t34, EnglishP-wsj_2212-s1-t9, EnglishP-wsj_2300-s83-t4, EnglishP-wsj_2443-s13-t7)
take.01 - ACT:ARG0 CPHR:ARG1 PAT:ARG2 (EnglishP-wsj_0077-s1-t2, EnglishP-wsj_0090-s1-t11, EnglishP-wsj_0093-s4-t7, EnglishP-wsj_0121-s38-t15, EnglishP-wsj_0121-s46-t3, EnglishP-wsj_0590-s18-t11, EnglishP-wsj_0664-s32-t7, EnglishP-wsj_0764-s40-t2, EnglishP-wsj_1146-s91-t6, EnglishP-wsj_1250-s33-t6, EnglishP-wsj_1253-s18-t4, EnglishP-wsj_1270-s15-t13, EnglishP-wsj_1320-s64-t19, EnglishP-wsj_1419-s14-t14, EnglishP-wsj_1529-s16-t12, EnglishP-wsj_1569-s45-t2, EnglishP-wsj_1619-s21-t5, EnglishP-wsj_1831-s26-t6, EnglishP-wsj_2109-s27-t14, EnglishP-wsj_2151-s36-t25, EnglishP-wsj_2156-s25-t19, EnglishP-wsj_2387-s40-t4, EnglishP-wsj_2397-s61-t10, EnglishP-wsj_2404-s4-t16, EnglishP-wsj_2444-s4-t18)
take.01 - ACT:ARG0 CPHR:ARG1 PAT:ARG2 PAT:ARG2 (EnglishP-wsj_0118-s125-t7, EnglishP-wsj_1411-s16-t22)
take.01 - ACT:ARG0 CPHR:ARG1 PAT:ARGM-ADV (EnglishP-wsj_1569-s58-t4)
take.01 - ACT:ARG0 CPHR:ARG1 PAT:ARGM-DIR PAT:ARGM-DIR (EnglishP-wsj_1504-s36-t11)
take.01 - ACT:ARG0 CPHR:ARG1 PAT:ARGM-LOC (EnglishP-wsj_1792-s13-t16)
take.01 - ACT:ARG0 CPHR:ARG1 PAT:ARGM-MNR (EnglishP-wsj_0293-s39-t3)
take.01 - ACT:ARG1 CPHR:ARG2 PAT:ARGM-LOC (EnglishP-wsj_0931-s11-t33, EnglishP-wsj_1368-s6-t13)
take.01 - CPHR:ARG1 (EnglishP-wsj_0295-s50-t7, EnglishP-wsj_1010-s32-t28, EnglishP-wsj_1916-s27-t14)
take.01 - CPHR:ARG1 PAT (EnglishP-wsj_1569-s16-t8, EnglishP-wsj_2367-s21-t8)
take.01 - CPHR:ARG1 PAT:ARG1 (EnglishP-wsj_1034-s5-t13, EnglishP-wsj_1294-s24-t11, EnglishP-wsj_1525-s42-t12, EnglishP-wsj_1766-s25-t21, EnglishP-wsj_2231-s10-t10, EnglishP-wsj_2384-s31-t30, EnglishP-wsj_2412-s78-t22)
take.01 - CPHR:ARG1 PAT:ARG2 (EnglishP-wsj_0118-s47-t12, EnglishP-wsj_0286-s86-t16, EnglishP-wsj_0560-s18-t35, EnglishP-wsj_0604-s24-t19, EnglishP-wsj_1162-s50-t10, EnglishP-wsj_1213-s54-t11, EnglishP-wsj_1566-s37-t30, EnglishP-wsj_1600-s31-t34, EnglishP-wsj_1705-s6-t28, EnglishP-wsj_1766-s21-t24, EnglishP-wsj_1935-s13-t9, EnglishP-wsj_2128-s6-t21, EnglishP-wsj_2265-s76-t18, EnglishP-wsj_2386-s15-t6, EnglishP-wsj_2415-s30-t6)
take.01 - CPHR:ARG1 PAT:ARG2 PAT:ARG2 (EnglishP-wsj_1022-s45-t3)
take.01 - CPHR:ARG1 PAT:ARG2 PAT:ARG2 PAT:ARG2 (EnglishP-wsj_1984-s23-t36)
take.02 - ACT:ARG0 CPHR:ARG1 PAT:ARGM-MNR PAT:ARGM-MNR (EnglishP-wsj_0153-s17-t4)
take.02 - ACT:ARG1 CPHR:ARG1 PAT:ARG0 (EnglishP-wsj_0327-s28-t5)
take.12 - ACT:ARG0 CPHR:ARG1 PAT:ARG1 (EnglishP-wsj_1291-s14-t9)
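Entries of this kind can be turned into structured data with a regular expression that picks out the frameset, the slot-to-argument pairs and the node IDs in parentheses. The following Python sketch makes several assumptions: an entry always has the shape "frameset - pairs (ids)", a slot without a colon is treated as unmapped, and the names parse_entries and ENTRY are hypothetical. The sample string reuses three entries quoted from the listing above.

```python
import re

SAMPLE = (
    "ev-w3310f22 "
    "take.01 - ACT:ARG0 CPHR:ARG1 PAT (EnglishP-wsj_0174-s22-t16) "
    "take.01 - ACT:ARG0 CPHR:ARG1 PAT:ARG2 PAT:ARG2 "
    "(EnglishP-wsj_0118-s125-t7, EnglishP-wsj_1411-s16-t22) "
    "take.02 - ACT:ARG1 CPHR:ARG1 PAT:ARG0 (EnglishP-wsj_0327-s28-t5)"
)

# frameset, then " - ", then slot pairs, then a parenthesized list of node IDs.
ENTRY = re.compile(r"(?P<frameset>\w+\.\d+)\s+-\s+(?P<pairs>[^()]*?)\s*\((?P<ids>[^()]*)\)")


def parse_entries(text):
    """Return a list of (frameset, [(slot, argument or None), ...], [node IDs])."""
    entries = []
    for match in ENTRY.finditer(text):
        pairs = []
        for token in match.group("pairs").split():
            slot, _, argument = token.partition(":")
            # A slot may repeat (e.g. "PAT:ARG2 PAT:ARG2"), so keep a list, not a dict.
            pairs.append((slot, argument or None))
        node_ids = [node.strip() for node in match.group("ids").split(",") if node.strip()]
        entries.append((match.group("frameset"), pairs, node_ids))
    return entries


for frameset, pairs, node_ids in parse_entries(SAMPLE):
    print(frameset, pairs, len(node_ids))
```

Because the pattern matches across whitespace of any kind, it works whether the entries are stored one per line or run together.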