Word-formation structure of Czech words: data-based research

The project focuses on linguistic research into the word-formation structure of Czech words, using specialized language resources and tools together with large language corpora. The research team, which includes linguists and experts in Natural Language Processing, concentrates on three closely connected topics. First, relations between word formation and the corpus frequency of words are studied in order to discover how complex the morphemic structure of the most frequent part of the Czech lexicon is, and whether and how it changes with decreasing frequency. The second task deals with the ambiguity of suffixes: the distribution of a suffix across different bases will be examined, as well as the functions of different suffixes with a given base. In the third task, the direction of word-formation motivation is studied in formations that have proved problematic in linguistic descriptions and in our previous research (esp. action nouns and words with loan bases). The research results will be published in journal articles and conference papers, and are also expected to be relevant for teaching Czech as a foreign language.