
Formemes are a technical shortcut that makes it easier to search the corpus across the tectogrammatical and analytical layers: the query can be specified using tectogrammatical attributes only. A formeme can be regarded as a property of a t-node that specifies the morphosyntactic form in which the t-node is realized in the surface sentence. The set of formeme values compatible with a given t-node is limited by its semantic part of speech. Formemes are particularly useful when you want to specify the lemma of a preposition, or to restrict your search to a single part of speech for forms or lemmas whose part of speech is ambiguous. The formeme attribute is assigned automatically.
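To illustrate why a formeme helps with ambiguous lemmas and prepositions, here is a minimal Python sketch of filtering t-nodes by formeme. It is not the corpus's own query tool: the `TNode` class, the `search` and `semantic_pos` helpers and the toy nodes are all hypothetical, and the only assumption about formeme values is the `<semantic POS>:<detail>` layout used in the lists in this section (e.g. n:obj1, v:fin).

```python
from dataclasses import dataclass


@dataclass
class TNode:
    """Hypothetical, minimal stand-in for a tectogrammatical node."""
    t_lemma: str
    formeme: str


def semantic_pos(formeme: str) -> str:
    """Return the semantic part of speech, i.e. the part before the first colon."""
    # Formemes without a colon (e.g. Czech "drop" or "adv") are returned whole.
    return formeme.split(":", 1)[0]


def search(nodes, t_lemma=None, sempos=None, formeme=None):
    """Yield the t-nodes that match every criterion that is not None."""
    for node in nodes:
        if t_lemma is not None and node.t_lemma != t_lemma:
            continue
        if sempos is not None and semantic_pos(node.formeme) != sempos:
            continue
        if formeme is not None and node.formeme != formeme:
            continue
        yield node


# Toy data made up for illustration only.
nodes = [
    TNode("record", "n:obj1"),
    TNode("record", "v:fin"),
    TNode("meeting", "n:for+X"),
    TNode("meeting", "n:subj"),
]

# "record" is ambiguous between noun and verb; the formeme separates them.
verbs = [n.formeme for n in search(nodes, t_lemma="record", sempos="v")]
# The preposition is specified through the formeme value itself.
with_for = [n.t_lemma for n in search(nodes, formeme="n:for+X")]

print(verbs)     # ['v:fin']
print(with_for)  # ['meeting']
```

The point of the sketch is that both kinds of restriction are expressed on the t-node alone, without walking to the analytical layer to inspect the preposition or the morphological tag.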

Set of formemes of the English part of PCEDT 3.0:

  • n:subj: semantic noun in subject position
  • n:preposition+X: semantic noun with a preposition (e.g. n:for+X)
  • n:poss: possessive form of a semantic noun
  • n:obj1: semantic noun in the position of a direct object
  • n:obj2: semantic noun in the position of a recipient ("dative") object
  • n:adv: semantic noun in adverbial position, such as "last year" in "Last year we met at a different restaurant."
  • n:attr: semantic noun in attributive position, such as "power" in "power plant"
  • n:???: semantic noun whose parent on the tectogrammatical layer is not represented on the analytical layer
  • adj:attr: semantic adjective in attributive position
  • adj:compl: semantic adjective as verbal complement
  • v:inf: semantic verb as infinitive
  • v:subordinator+ger: semantic verb as gerund introduced by a subordinator (insert the given subordinator when searching)
  • v:attr: semantic verb modifying a noun
  • v:ger: semantic verb as gerund
  • v:subordinator+fin: finite verb as head of a subordinate clause introduced by a subordinator (insert the given subordinator when searching)
  • v:subordinator+inf: infinitive as head of a subordinate clause introduced by a subordinator (insert the given subordinator when searching)
  • v:rc: finite verb form as the head of a relative clause
  • v:fin: other finite verb forms, e.g. in a matrix clause
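The English formemes listed above share one layout: a semantic part of speech, a colon, and either a plain label (subj, rc, fin) or a function word joined to a label with "+" (preposition+X, subordinator+fin). A minimal Python sketch that splits a formeme string along that layout follows; the `Formeme` type and `parse_formeme` function are hypothetical helpers, and the concrete values in the usage loop (e.g. v:if+fin) are illustrative instances of the listed patterns, not values checked against the corpus.

```python
from typing import NamedTuple, Optional


class Formeme(NamedTuple):
    sempos: str                   # semantic part of speech, e.g. "n", "adj", "v"
    function_word: Optional[str]  # preposition or subordinator, if present
    form: Optional[str]           # remaining label, e.g. "subj", "ger", "X"


def parse_formeme(value: str) -> Formeme:
    """Split a formeme such as 'n:for+X' or 'v:if+fin' into its parts.

    Assumes the layout '<sempos>:<detail>', where <detail> is either a plain
    label ('subj', 'rc') or '<function word>+<label>'.
    """
    sempos, sep, detail = value.partition(":")
    if not sep:
        # No colon at all (e.g. Czech "drop"): nothing beyond the head part.
        return Formeme(sempos, None, None)
    word, plus, form = detail.partition("+")
    if not plus:
        # Plain label such as "subj", "obj1" or "???".
        return Formeme(sempos, None, detail)
    return Formeme(sempos, word, form)


for f in ["n:subj", "n:for+X", "v:if+fin", "v:after+ger", "n:???"]:
    print(f, "->", parse_formeme(f))
```

Splitting on the first ":" and then on the first "+" keeps the parser independent of which prepositions or subordinators actually occur, which matters because those slots are open-ended.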

Set of formemes of the Czech part of PCEDT 3.0:

  • drop: node is not represented on the analytical layer
  • n:attr: semantic noun in attributive position, such as "vody" in "sklenice vody" (glass [of] water)
  • n:preposition+case: semantic noun with a preposition and case (e.g. n:v+6)
  • n:case: semantic noun in the given case without a preposition (e.g. n:1), including adjectives in nominal usage that hang directly under the root
  • adj:poss: possessive adjective
  • adj:preposition+poss: possessive adjective with a preposition
  • adj:case: semantic adjective in the given case as a verbal complement
  • adj:attr: semantic adjective in attributive position
  • adv: adverbs derived from adjectives
  • v:inf: semantic verb as infinitive
  • v:subordinator+inf: infinitive as head of a subordinate clause introduced by a subordinator (insert the given subordinator when searching)
  • v:fin: other finite verb forms, e.g. in a matrix clause
  • v:subordinator+fin: finite verb as head of a subordinate clause introduced by a subordinator (insert the given subordinator when searching)
  • v:rc: finite verb form as the head of a relative clause
  • v:subordinator+rc: finite verb form as the head of a relative subordinate clause introduced by a subordinator (insert the given subordinator when searching)
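In the Czech formemes, the case is written as a digit, following the usual Czech numbering of the seven cases (1 nominative to 7 instrumental), optionally preceded by a preposition, as in n:v+6. The Python sketch below decodes such values; the regular expression and the `decode_case` helper are assumptions based only on the n:case, n:preposition+case and adj:case patterns listed above, so formemes that do not follow them (adj:poss, v:fin, drop) are simply reported as carrying no case.

```python
import re
from typing import Optional, Tuple

# Conventional numbering of the seven Czech grammatical cases.
CZECH_CASES = {
    "1": "nominative", "2": "genitive", "3": "dative", "4": "accusative",
    "5": "vocative", "6": "locative", "7": "instrumental",
}

# Hypothetical pattern: "<sempos>:[<preposition>+]<case digit>", e.g. "n:v+6", "n:1".
CASE_FORMEME = re.compile(r"^(n|adj):(?:([^+:]+)\+)?([1-7])$")


def decode_case(formeme: str) -> Optional[Tuple[str, Optional[str], str]]:
    """Return (sempos, preposition or None, case name), or None if no case is encoded."""
    match = CASE_FORMEME.match(formeme)
    if match is None:
        return None
    sempos, preposition, case = match.groups()
    return sempos, preposition, CZECH_CASES[case]


for f in ["n:v+6", "n:1", "adj:1", "adj:poss", "v:fin", "drop"]:
    print(f, "->", decode_case(f))
```

Keeping the case names in a separate table means the pattern only has to recognize the shape of the formeme, while the mapping from digit to case stays readable and easy to check.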

Related publication: 

Dušek Ondřej, Žabokrtský Zdeněk, Popel Martin, Majliš Martin, Novák Michal, Mareček David: Formemes in English-Czech Deep Syntactic MT. In: Proceedings of the Seventh Workshop on Statistical Machine Translation, Association for Computational Linguistics, Montréal, Canada, ISBN 978-1-937284-20-6, pp. 267-274, 2012