Development of Assamese treebank and its application to syntactic modeling of Assamese sentences


Study the annotation guidelines of Universal Dependencies (UD) and how they apply to the morphology and syntax of Assamese. Design language-specific annotation rules within the UD framework. Where phenomena in the language do not fit the currently available guidelines well, propose a solution for them. Compare Assamese with other Indo-Aryan languages that are already available in UD and assess inter-language consistency. Collect freely redistributable Assamese text, preferably from multiple genres, and annotate it following the guidelines; the resulting treebank should be large enough to train tokenizers, taggers, and parsers. The annotation should include enhanced graphs as defined in UD, which are useful for further semantic analysis.
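As an illustration of what the enhanced graphs look like in practice: in the CoNLL-U format used by UD, enhanced dependencies are stored in the DEPS column as a `|`-separated list of `head:relation` pairs, with `_` for an empty value. The following is a minimal sketch of reading that column; the function name `parse_deps` is hypothetical and not part of any UD tool.

```python
def parse_deps(deps_field):
    """Parse a CoNLL-U DEPS value into a list of (head_id, relation) pairs.

    Head IDs are kept as strings because enhanced graphs may refer to
    empty nodes such as "5.1". Relations may themselves contain colons
    (e.g. "conj:and"), so each item is split on the first colon only.
    """
    if deps_field == "_":
        return []
    edges = []
    for item in deps_field.split("|"):
        head, relation = item.split(":", 1)
        edges.append((head, relation))
    return edges


# A token that is the subject of two coordinated verbs (token IDs 2 and 4).
example_edges = parse_deps("2:nsubj|4:nsubj")
print(example_edges)
```

Keeping the enhanced edges as a plain list of pairs makes it easy to compare them with the basic tree (HEAD and DEPREL columns) when checking annotation consistency.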

Syntactic modeling of Assamese sentences: Alongside the creation of the Assamese treebank, the treebank will be used to investigate how many unique syntactic sentence structures can be derived from the available Assamese text corpora. Such an inventory of structures can be particularly useful for generating valid text in the language. The sentence structures will also be compared with those of other Indo-Aryan languages.
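One possible way to count unique sentence structures is sketched below. It assumes, purely for illustration, that two sentences share a structure when their sequences of (UPOS, HEAD, DEPREL) triples are identical; the thesis may well settle on a different definition (for example, one that ignores word order). The helper names and the toy data are hypothetical, and the toy sentences use placeholder word forms rather than real Assamese text.

```python
from collections import Counter


def read_conllu(text):
    """Yield sentences as lists of (upos, head, deprel) for basic word lines."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit():
            continue
        sentence.append((cols[3], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def count_structures(text):
    """Count how often each structure signature occurs in the treebank text."""
    return Counter(tuple(sentence) for sentence in read_conllu(text))


def toy_line(idx, upos, head, deprel):
    # Placeholder form "w"; only ID, UPOS, HEAD and DEPREL matter here.
    return "\t".join([str(idx), "w", "_", upos, "_", "_", str(head), deprel, "_", "_"])


# Toy data (an assumption for illustration): two verb-final transitive
# clauses with the same structure and one intransitive clause.
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    toy_line(1, "NOUN", 3, "nsubj"),
    toy_line(2, "NOUN", 3, "obj"),
    toy_line(3, "VERB", 0, "root"),
    "",
    "# sent_id = toy-2",
    toy_line(1, "NOUN", 3, "nsubj"),
    toy_line(2, "NOUN", 3, "obj"),
    toy_line(3, "VERB", 0, "root"),
    "",
    "# sent_id = toy-3",
    toy_line(1, "NOUN", 2, "nsubj"),
    toy_line(2, "VERB", 0, "root"),
    "",
])

counts = count_structures(SAMPLE)
print(len(counts), "distinct structures in", sum(counts.values()), "sentences")
```

The same counting can be run over treebanks of other Indo-Aryan languages in UD, so that the resulting inventories of structures can be compared across languages.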


Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, Daniel Zeman. 2016. Universal Dependencies v1: A Multilingual Treebank Collection. In Proceedings of LREC.

Avery D. Andrews. 2007. The Major Functions of the Noun Phrase. In Timothy Shopen (ed.), Language Typology and Syntactic Description. Volume I: Clause Structure. Second edition, pages 132-223. Cambridge University Press, Cambridge, UK. ISBN 978-0-521-58156-1.