f

The main word (token) element. It contains the word form from the text, followed by the elements associated with that word form: lemma and tag (manual, dictionary possibilities, or machine-generated by various taggers); governing node and analytical function (manual and/or automatic) on the analytical level; and governing node, functor and grammateme(s) on the tectogrammatical annotation level (where both manually and automatically assigned values can again be encoded; see also the description of the <fadd> element).

The attribute case contains an indication of the token's capitalization pattern, even though the actual capitalization from the original text is preserved, too. Only five types of capitalization are recognized and marked:

The string abbr is appended to the capitalization pattern names above if the tokenizer has identified the word form as an abbreviation followed by a dot (period/full stop). (NB: other abbreviations, including those not followed by a full stop, are recognized at dictionary look-up time, but the value of the attribute case is not modified afterwards; for such abbreviations the abbr string is not added, and the fact that the token is (possibly) an abbreviation is marked elsewhere; see the elements <t>, <MMt> and <MDt>.)

The content of the <f> element is in most cases identical to the word form as it appears in the original text. In case of any discrepancy (such as an obvious spelling error, or multiword or split phrases detected at tokenization time), one or more <w> elements are used, preceding the <f> element(s); in such cases, the attribute case of the <f> tag contains the substring gen. Some of these discrepancies could have been discovered only in the manually annotated data; it is therefore not guaranteed that, e.g., spelling errors are marked in all data.
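
The two marker substrings described above (abbr and gen) can be tested with a plain substring check on the value of the attribute case. The following is a minimal sketch only, not part of the csts DTD or its tools: the function name case_flags, the dictionary keys, and the placeholder value PATTERN are hypothetical, and the sketch assumes that no capitalization pattern name itself contains the substrings abbr or gen.

```python
def case_flags(case_value: str) -> dict:
    """Derive two flags from the value of the `case` attribute of <f>.

    Assumption: the markers are detected by substring search, as the
    prose describes ("appended", "containing the substring").
    """
    return {
        # abbreviation followed by a dot, identified by the tokenizer
        "tokenizer_abbreviation": "abbr" in case_value,
        # word form differs from the original text; <w> element(s) precede <f>
        "has_preceding_w": "gen" in case_value,
    }


# "PATTERN" stands in for a real capitalization pattern name (hypothetical).
flags = case_flags("PATTERNabbr")
print(flags["tokenizer_abbreviation"], flags["has_preceding_w"])
```

Note that a false tokenizer_abbreviation flag does not mean the token is not an abbreviation: abbreviations recognized at dictionary look-up time are marked in the elements <t>, <MMt> and <MDt> instead.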


Content


ATTRIBUTES
CONTENT DECLARATION

Tag Minimization
Open Tag: REQUIRED
Close Tag: OPTIONAL

Parent Elements


csts DTD