5.1. The notation of valency frames


Every valency frame consists of one or more frame members. A member is either a single frame element or an alternation of frame elements; an alternation is written as a |-separated list of the alternating elements. An alternation is used when an element of a frame can be assigned different functors while all the cases still form the same valency frame. A frame element is either obligatory or optional. It consists of a functor and a description of its possible realizations. The formal description of a frame element realization corresponds to a continuous section of an analytical tree whose attributes are not completely filled in; or rather, in the entirely general case, the permissible combinations of values of these attributes are defined by an arbitrary logical expression over these values. Since the general case is not needed in PDT, the possible realizations can be decomposed in the following ways:

each frame element contains "its" realization independently of other frame elements, and this realization corresponds to a subtree dependent on the root node of the subtree corresponding to the realization of the whole frame.
to restrict the number of possible combinations of attribute values, it is only required that some of these attribute values (or their parts) be equal; this gives the so-called incomplete analytical tree instead of a general logical expression.
the lemma of the realized subtree root node is the same for all realizations of the frame, and it is recorded separately.
in special cases it is possible to describe a realization which involves the governing node (the parent node) of the realized frame root node, as well as a realization which contains at least one subtree of this root node to which no frame member corresponds.

frame := [ root_real_spec ] element_list 
root_real_spec := '(' realizations ')'
element_list := ( element | element_alternation ) [ ' ' element_list ]
element_alternation := oblig_elem '|' oblig_elem [ '|' element_alternation ]

The notation of a frame begins with an optional list of permissible realizations of the subtree root node, recorded in round brackets (e.g. some frames with the functor DPHR need the governing verb to appear in a negative form). It is followed by a space-separated sequence of records of the individual members of the frame. A member is either a frame element or an alternation of frame elements. An alternation is recorded as a |-separated sequence of the records of the frame elements it consists of. The record of each frame element contains a functor. Every functor (including functors of non-arguments and of elements within alternations) must occur in the record no more than once. Members of a frame are recorded in the following canonical order based on their functors: ACT, CPHR, DPHR, PAT, ADDR, ORIG, EFF, BEN, LOC, DIR1, DIR2, DIR3, TWHEN, TFRWH, TTILL, TOWH, TSIN, TFHL, MANN, MEANS, ACMP, EXT, INTT, MAT, APP, CRIT, REG.
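The uniqueness and ordering rules above can be checked mechanically. The following Python sketch is only an illustration: the name `find_problems` is hypothetical, the function takes one functor per member (how an alternation is positioned in the canonical order is not stated here, so that case is left out), and functors that are missing from the canonical list are simply skipped by the ordering check, which is an assumption.

```python
# Canonical order of frame members, copied from the list above.
CANONICAL_ORDER = [
    "ACT", "CPHR", "DPHR", "PAT", "ADDR", "ORIG", "EFF", "BEN", "LOC",
    "DIR1", "DIR2", "DIR3", "TWHEN", "TFRWH", "TTILL", "TOWH", "TSIN",
    "TFHL", "MANN", "MEANS", "ACMP", "EXT", "INTT", "MAT", "APP",
    "CRIT", "REG",
]
RANK = {functor: index for index, functor in enumerate(CANONICAL_ORDER)}


def find_problems(functors):
    """Return a list of messages for a sequence of member functors.

    Hypothetical helper: checks that no functor repeats and that the
    functors known to the canonical list appear in canonical order.
    """
    problems = []
    seen = set()
    for functor in functors:
        if functor in seen:
            problems.append(f"functor {functor} occurs more than once")
        seen.add(functor)
    # Assumption: functors outside the canonical list do not affect order.
    ranks = [RANK[f] for f in functors if f in RANK]
    if ranks != sorted(ranks):
        problems.append("members are not in canonical order")
    return problems
```

For instance, `find_problems(["PAT", "ACT"])` reports an ordering problem, while `find_problems(["ACT", "PAT", "ADDR"])` returns an empty list.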

element := oblig_elem | facult_elem
oblig_elem := elem_spec
facult_elem := '?' elem_spec

Frame elements are either optional or obligatory. The record of an optional element is preceded by a question mark.

elem_spec := functor '(' realizations ')'
functor := 'ACT' | 'PAT' | 'ADDR' | 'EFF' | 'ORIG' | 'ACMP'
         | 'AIM' | 'APP' | 'ATT' | 'AUTH' | 'BEN' | 'CAUS' | 'CNCS'
         | 'COMPL' | 'CONTRD' | 'COND' | 'CPHR' | 'CPR' | 'CRIT'
         | 'DES' | 'DIFF' | 'DIR1' | 'DIR2' | 'DIR3' | 'DPHR'
         | 'EXT' | 'HER' | 'INTF' | 'INTT' | 'LOC' | 'MANN' | 'MAT'
         | 'MEANS' | 'MOD' | 'PAR' | 'PARTL' | 'REG' | 'RESL'
         | 'RESTR' | 'RSTR' | 'SUBS' | 'TFHL' | 'TFRWH' | 'THL'
         | 'THO' | 'TOWH' | 'TPAR' | 'TSIN' | 'TTILL' | 'TWHEN'
         | 'VOCAT'

A member of a frame is denoted by its functor followed by a bracket containing a list of permissible realizations.

realizations := real [ ';' realizations ]

The denotations of individual permissible realizations are separated by a semi-colon.
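As an illustration of the element record described so far (optional question mark, functor, bracketed realizations separated by semicolons), here is a minimal Python sketch. The names `parse_element`, `ELEMENT_RE` and `REAL_RE` are hypothetical, and the sketch assumes that a semicolon belonging to a lemma or form is escaped with a backslash, in line with the escaping rule for lemma tokens. The realizations themselves are left as uninterpreted strings.

```python
import re

# Optional '?', a functor in capitals/digits, realizations in round brackets.
ELEMENT_RE = re.compile(r"(?P<optional>\?)?(?P<functor>[A-Z0-9]+)\((?P<reals>.*)\)")
# One realization: a run of escaped pairs or characters other than ';' and '\'.
REAL_RE = re.compile(r"(?:\\.|[^;\\])+")


def parse_element(record):
    """Split one frame element record into its parts (illustrative only)."""
    match = ELEMENT_RE.fullmatch(record)
    if match is None:
        raise ValueError(f"not an element record: {record!r}")
    return {
        "functor": match.group("functor"),
        "optional": match.group("optional") is not None,
        "realizations": REAL_RE.findall(match.group("reals")),
    }
```

With the made-up record `?PAT(.4;.f)`, the sketch reports an optional element with the functor PAT and the two realizations `.4` and `.f`.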

real := '*' | '!' | '=' | node_specs
node_specs := [ '^' ] node_spec_list [ '&' ] [ node_spec_list ]
node_spec_list := node_spec [ ',' node_spec_list ]
node_spec := ( lemma_spec [ sep ] [ morph ] | sep morph ) [ dependants ]
sep := '.' | ':'
dependants := '[' node_spec_list ']'

A realization can be recorded in several ways: by an asterisk *, which generally represents all typical realizations of the particular functor; by an exclamation mark !, which indicates that the frame element is not (can never be) realized in the surface structure, i.e. that it corresponds to an empty set of analytical nodes; by =, which indicates a state (attribute is_state); or by a list of denotations of sibling nodes of the incomplete analytical tree, separated by commas or by &. The nodes are written in the order in which they occur in the incomplete analytical tree. In this list, the & separator can be used no more than once, to separate the nodes occurring to the left of their common parent node from their sibling nodes occurring to the right of the parent node. The & separator may also occur at the start or at the end of the list to indicate that all nodes in the list follow or precede their parent node, respectively.

A node is recorded in the form of a lemma specification and/or its morphological features. The two parts need not both be present, but at least one of them must be.
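The distinction between the three special symbols and a node list can be sketched in Python as follows. This is an illustration under stated assumptions: the names `split_top_level` and `describe_realization`, the dictionary keys and the labels "typical", "unrealized" and "state" are all hypothetical, and the node records are returned as uninterpreted strings. Commas inside square or curly brackets are not treated as node separators.

```python
SPECIAL = {"*": "typical", "!": "unrealized", "=": "state"}


def split_top_level(text, sep):
    """Split on sep unless it is backslash-escaped or inside [...] / {...}."""
    parts, current, depth, i = [], "", 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current += text[i:i + 2]  # keep the escaped pair intact
            i += 2
            continue
        if ch in "[{":
            depth += 1
        elif ch in "]}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
        i += 1
    parts.append(current)
    return parts


def describe_realization(real):
    """Classify one realization string (illustrative only)."""
    if real in SPECIAL:
        return {"kind": SPECIAL[real]}
    has_caret = real.startswith("^")
    body = real[1:] if has_caret else real
    sides = split_top_level(body, "&")
    if len(sides) > 2:
        raise ValueError("'&' may be used at most once")
    groups = [[n for n in split_top_level(s, ",") if n] for s in sides]
    result = {"kind": "nodes", "has_caret": has_caret}
    if len(groups) == 2:
        result["left_of_parent"], result["right_of_parent"] = groups
    else:
        result["nodes"] = groups[0]
    return result
```

For a made-up realization such as `a:1,b[c:4,e]&d.7`, the sketch places `a:1` and `b[c:4,e]` to the left of the parent node and `d.7` to its right.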

Moreover, as a special case, the record of the first node in the list may be introduced by the symbol ^, in which case it describes the parent node of the node governing the subtrees corresponding to the frame members (i.e. the parent node of the verb/noun the frame relates to) instead of describing a node realizing the particular member of the frame.

A node specification starts with an optional specification of the lemma, separated from the rest of the node specification by a dot or a colon (see below). The remaining part of the node specification describes morphological constraints. If no morphological requirements are given, the separating symbol can be omitted; in that case the separator is considered to be a colon. A dot separator marks the analytical node that will govern all analytical subtrees corresponding to the realization of the nodes governed by the tectogrammatical node represented by the frame member the record belongs to. Consequently, only one record with a dot should appear among the records of all nodes contained in the description of one realization.
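The role of the separator can be illustrated by a short Python sketch that cuts one node record into its lemma part, separator, morphological part and bracketed dependants. The function name `split_node` and the dictionary keys are hypothetical. The sketch assumes that a dot or colon belonging to a lemma is backslash-escaped, and that unescaped dots inside curly brackets (the three dots of an incomplete lemma list) are not separators.

```python
def split_node(node):
    """Split a node record into lemma, separator, morph and dependants."""
    lemma, morph = [], []
    target = lemma           # collect into the lemma until a separator is seen
    sep, dependants = None, ""
    depth, i = 0, 0
    while i < len(node):
        ch = node[i]
        if ch == "\\" and i + 1 < len(node):
            target.append(node[i:i + 2])  # escaped symbol stays with its part
            i += 2
            continue
        if ch == "[" and depth == 0:
            dependants = node[i:]         # the rest is the list of dependants
            break
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch in ".:" and depth == 0 and sep is None:
            sep = ch
            target = morph
        else:
            target.append(ch)
        i += 1
    return {
        "lemma": "".join(lemma),
        "sep": sep or ":",                # an omitted separator counts as ':'
        "morph": "".join(morph),
        "dependants": dependants,
        "is_governing_node": sep == ".",
    }
```

For a made-up record `ruka.4` the sketch yields the lemma `ruka`, the dot separator and the morphological part `4`; for a bare lemma such as `se` the separator defaults to a colon.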

lemma_spec := LEMMA | '{' lemma_set '}' | '"' FORM '"'
lemma_set := LEMMA [ ',' lemma_set ] | LEMMA ',' '...'

A lemma specification is recorded either as a literal lemma, or as a comma-separated list of literal lemmas enclosed in curly brackets, or as a token within double quotation marks which directly represents the literal surface form. A literal form is normally used instead of a lemma only if a specific surface realization cannot be recorded in any other way (e.g. in the case of a specific dialectal or colloquial expression). The list of lemmas in curly brackets can further end with a comma followed by three dots, indicating that the list of permissible lemmas is incomplete and contains only the lemmas that have been collected so far (this is typical for frame elements with the functor CPHR). A token representing a literal lemma uniquely identifies an item in the morphological lexicon (in fact, it consists of the basic form of a word, in some cases followed by a hyphen and a number to distinguish homonyms). The token representing a literal lemma (or a literal form) can include only alphanumeric symbols and a hyphen; all other symbols must be preceded by a backslash \. The lemma of a backslash is therefore recorded as \\.
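The escaping rule for lemma and form tokens is simple enough to sketch in Python. The function names `escape_token` and `unescape_token` are hypothetical, and the sketch assumes that "alphanumeric" covers any Unicode letter or digit, as decided by Python's `str.isalnum`.

```python
def escape_token(text):
    """Prefix every symbol other than a letter, digit or hyphen with '\\'."""
    return "".join(
        ch if ch.isalnum() or ch == "-" else "\\" + ch
        for ch in text
    )


def unescape_token(token):
    """Inverse of escape_token: drop the backslash before escaped symbols."""
    out, i = [], 0
    while i < len(token):
        if token[i] == "\\" and i + 1 < len(token):
            out.append(token[i + 1])
            i += 2
        else:
            out.append(token[i])
            i += 1
    return "".join(out)
```

Under these assumptions a single backslash is written as two backslashes, and a token such as `word-1` (a basic form with a homonym number) is left unchanged.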

morph := [ neg ] [ pos ] [ gender ] [ number ] [ case ] [ deg ] 
            [ agreement ] [ tag_spec ]

The record of morphological constraints consists of specifications of negation, part of speech, gender, number, case, degree of comparison, agreement and further constraints on the morphological tag. None of these items is obligatory, but at least one of them should always be present. If an item is not given, no constraint is imposed on the particular category (i.e. all attribute values are permissible on the corresponding node). If a lemma occurs in the record of a realization, a morphological constraint on the part of speech need not be given, since it is determined unambiguously by the lemma.

neg := '~'

The ~ character indicates a constraint on the presence of negation in a morphological tag.

pos := 'a' | 'd' | 'i' | 'n' | 'u' | 'j' | 'v' | 's' | 'f' | 'c'

A part of speech is written in lower case:

a: adjective
d: adverb
i: particle
n: noun
j: subordinating conjunction
v: verb
f: verb in infinitive form
u: possessive pronoun or adjective
s: root node of a direct speech subtree
c: root node of a subtree corresponding to an (asyndetic) dependent content clause (i.e. a clause introduced by a relative pronoun or adverb)

gender := 'F' | 'M' | 'I' | 'N'

Gender is written in upper case:

F: feminine
M: masculine animate
I: masculine inanimate
N: neuter

number := 'S' | 'P'

Number is written in upper case:

S: singular
P: plural

case := '1' | '2' | '3' | '4' | '5' | '6' | '7'

Case is recorded by its number.

deg := '@1' | '@2' | '@3'

The degree of comparison of an adjective is introduced by the symbol @, which distinguishes it from the case number.

agreement := '#'

The # symbol requires agreement with the governing node in case, number and gender (only for those categories that exist on both nodes and are not specified by the morphological constraints recorded for the dependent node).
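The fixed order of the categories introduced so far (negation, part of speech, gender, number, case, degree, agreement) lends itself to a single regular expression. The following Python sketch is illustrative only: the names `MORPH_RE` and `parse_morph` are hypothetical, and the additional tag constraints (tag_spec) are deliberately not handled.

```python
import re

# One optional group per category, in the order given by the morph rule.
MORPH_RE = re.compile(
    r"(?P<neg>~)?"
    r"(?P<pos>[adinujvsfc])?"
    r"(?P<gender>[FMIN])?"
    r"(?P<number>[SP])?"
    r"(?P<case>[1-7])?"
    r"(?P<deg>@[123])?"
    r"(?P<agreement>#)?"
)


def parse_morph(morph):
    """Return the categories present in a morph record (without tag_spec)."""
    match = MORPH_RE.fullmatch(morph)
    if match is None or morph == "":
        raise ValueError(f"not a valid morph record: {morph!r}")
    # Keep only the categories that actually appear in the record.
    return {name: value for name, value in match.groupdict().items()
            if value is not None}
```

For example, `parse_morph("n4")` reports a noun in case 4, whereas `parse_morph("4n")` is rejected because the categories are out of order.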

tag_spec := tag_pos '<' tag_values '>' [ tag_spec ]
tag_pos := [ '$1' | '$2' | '$3' | '$4' | '$5' | '$6' | '$7' | '$8' 
             | '$9' | '$10' | '$11' | '$12' | '$13' | '$14' | '$15' ]
tag_values := CHAR [ tag_values ]

If the notation described above is not sufficient to express the constraints on a morphological tag, further constraints can be given as enumerations of the values permissible at particular positions of the tag. Such a constraint begins with the symbol $, followed by the position number (1 to 15) and by a string within angle brackets < >; the string consists of all symbols that are allowed to occur at that position of the morphological tag. All symbols other than letters, digits and a hyphen that occur within the angle brackets must be preceded by a backslash.
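A sequence of such positional constraints can be read with a short Python sketch. The names `TAG_SPEC_RE` and `parse_tag_specs` are hypothetical, and the sketch assumes that "letters" means ASCII letters; each constraint is returned as a set of permitted symbols keyed by the tag position.

```python
import re

# '$' + position 1..15 + '<' permitted symbols '>'; other symbols are escaped.
TAG_SPEC_RE = re.compile(r"\$(1[0-5]|[1-9])<((?:\\.|[0-9A-Za-z-])+)>")


def parse_tag_specs(text):
    """Map tag positions to the sets of symbols allowed at them."""
    constraints = {}
    position = 0
    while position < len(text):
        match = TAG_SPEC_RE.match(text, position)
        if match is None:
            raise ValueError(f"bad tag constraint at offset {position}")
        # Remove the backslashes that introduce escaped symbols.
        values = re.sub(r"\\(.)", r"\1", match.group(2))
        constraints[int(match.group(1))] = set(values)
        position = match.end()
    return constraints
```

With the made-up input `$3<FN>$10<\=A>`, position 3 may hold F or N, and position 10 may hold = or A.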

A valency frame can be empty. Such a valency frame is recorded in the following way: EMPTY.
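Putting the top level of the notation together, the following Python sketch splits a whole frame record into the optional root realization list, the members and their alternatives, and treats EMPTY as a frame with no members. It is an illustration only: the names `split_unescaped` and `split_frame` are hypothetical, the element records are returned as uninterpreted strings, and the sketch assumes that spaces, vertical bars and round brackets inside a realization are always backslash-escaped.

```python
import re

# Leading '(' realizations ')' describing the subtree root node.
ROOT_RE = re.compile(r"\((?:\\.|[^()\\])*\)")


def split_unescaped(text, sep):
    """Split on sep, treating a backslash-escaped character as ordinary text."""
    pattern = r"(?:\\.|[^\\" + re.escape(sep) + r"])+"
    return re.findall(pattern, text)


def split_frame(frame):
    """Split a frame record into root realizations and members."""
    frame = frame.strip()
    if frame == "EMPTY":
        return {"root": None, "members": []}
    root = None
    match = ROOT_RE.match(frame)
    if match is not None:
        root = match.group()[1:-1]     # drop the enclosing round brackets
        frame = frame[match.end():]
    # Each member is a list of alternatives; a plain element has just one.
    members = [split_unescaped(record, "|")
               for record in split_unescaped(frame, " ")]
    return {"root": root, "members": members}
```

For the made-up frame `ACT(.1) PAT(.4)|EFF(.4) ?ADDR(.3)` the sketch returns three members, the second of which is an alternation of two elements.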