Cross-lingual approaches to natural language processing have attracted considerable attention in recent years. The most frequently used technique is projection, which works best when the languages involved are sufficiently similar. In the proposed project, however, we want to focus on techniques that take advantage of language differences, and to apply them to the task of coreference resolution. A typical difference between English and Czech concerning coreference is the personal pronoun in subject position, which is usually left unexpressed in Czech but is rarely omitted in English. Conversely, grammatical gender, an important feature for coreference resolution, is more evenly distributed over Czech nouns than notional gender is over English nouns. The main objective of the project is thus to explore the differences between English and Czech in terms of coreferential relations and to design a way of using these differences to improve coreference resolution. The proposed methods will take advantage of information observed in both manually and automatically annotated parallel corpora. The resulting system should be able to improve resolution even on monolingual data. To that end, we plan to employ weakly supervised machine learning and machine translation to create synthetic parallel data.

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics

cross-coref

Cross-lingual approaches to coreference resolution