This talk takes a broad view of the development of treebanks for Mesoamerican languages based on the Universal Dependencies annotation scheme. For the purposes of this talk, Mesoamerica includes Mexico, Guatemala, Belize, El Salvador and Honduras. The Mesoamerican languages belong to many groups, including Uto-Aztecan, Mayan and Oto-Manguean, as well as language isolates such as Huave. Many of the languages share features, including a prevalence of verb-initial word order, head marking in possessive constructions, a system of relational nouns, and code mixing with Spanish. The talk discusses annotation guidelines for these phenomena, as well as the performance of multilingual models on languages that belong to groups, or exhibit features, not found in the models' training data.
Dr. Francis M. Tyers is Assistant Professor of Computational Linguistics at Indiana University in Bloomington. He is a member of the core group that designs the guidelines for Universal Dependencies and is broadly interested in language technology for Indigenous and marginalised languages. He has worked on language technologies such as machine translation, morphological analysis, syntactic parsing and speech recognition, and with language communities from Europe, Russia and the Americas.
***The talk will be delivered in person (MFF UK, Malostranské nám. 25, 4th floor, room S1) and will be streamed via Zoom. For details on how to join the Zoom meeting, please write to sevcikova et ufal.mff.cuni.cz***