Guidelines
This PhD will focus on methods that improve syntactic parsing for non-standard language. This includes minority languages with smaller treebanks as well as non-standard varieties such as dialects and code-switched text. The thesis will concentrate on parsing under the Universal Dependencies annotation scheme, although other sources of data will also be considered. Particular attention will be paid to word representations: how they are created and how they are used in parsing. Other areas of exploration include novel neural architectures designed specifically for low-resource scenarios.
References
Lauriane Aufrant, Guillaume Wisniewski, François Yvon (2016). Zero-resource Dependency Parsing: Boosting Delexicalized Cross-lingual Transfer with Linguistic Knowledge. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 119–130, Osaka, Japan.
Ryan McDonald, Slav Petrov, Keith Hall (2011). Multi-Source Transfer of Delexicalized Dependency Parsers. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 62–72, Edinburgh, Scotland.
Daniel Zeman, Philip Resnik (2008). Cross-Language Parser Adaptation between Related Languages. In IJCNLP 2008 Workshop on NLP for Less Privileged Languages, pp. 35–42, Hyderabad, India.
Pruthwik Mishra, Vandan Mujadia, Dipti Misra Sharma (2017). POS Tagging for Resource Poor Indian Languages through Feature Projection. In Proceedings of ICON 2017, Jadavpur, India.
Željko Agić, Dirk Hovy, Anders Søgaard (2015). If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers), pp. 268–272, Beijing, China.
Dipanjan Das, Slav Petrov (2011). Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 600–609, Portland, Oregon, USA.
David Yarowsky, Grace Ngai (2001). Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection across Aligned Corpora. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2001), pp. 200–207, Pittsburgh, PA, USA.