Deep Language Understanding by Deep Learning

Guidelines

Despite applications such as machine translation, true language understanding remains elusive. The focus of the thesis will be to parse plain text in Czech and English into a knowledge representation graph using supervised training with deep neural networks (DNNs). Basic tools are available for analysis up to the level of natural language syntax, but the semantic and knowledge extraction part is unsolved and will be the main problem to tackle. Datasets are available for at least two types of meaning/knowledge graphs (trees and DAGs). Deep learning will be used as the main tool for learning the relation between text and the selected meaning representation, for both Czech and English. Properly designed experiments will be used to test various system configurations, and the results will be evaluated with the standard metrics used in meaning representation and language understanding. Evaluation will also be extended to downstream applications, such as information extraction (IE), entailment, or question answering, using the meaning representation as the formal means of representing knowledge.
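
To make the evaluation idea concrete, the following is a minimal sketch, not the thesis method: it encodes a meaning graph as a set of (source, relation, target) triples and scores a predicted graph against a gold one by triple overlap. The function name `triple_f1` and the toy graphs are illustrative assumptions. The example sentence "The boy wants to go" is the usual AMR illustration, where the node for "boy" has two parents, so the graph is a DAG rather than a tree. Metrics such as Smatch additionally search for the best mapping between the variable names of the two graphs; this sketch assumes the names are already aligned.

```python
from typing import Set, Tuple

# A triple is (source node, relation, target node or concept label).
Triple = Tuple[str, str, str]


def triple_f1(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching triples.

    Simplifying assumption: node names in both graphs are already aligned,
    so no search over variable mappings is performed.
    """
    matched = len(predicted & gold)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Gold graph for "The boy wants to go": node b (boy) is an argument of
# both w (want-01) and g (go-01), which makes the graph a DAG.
gold = {
    ("w", "instance", "want-01"),
    ("b", "instance", "boy"),
    ("g", "instance", "go-01"),
    ("w", "ARG0", "b"),
    ("w", "ARG1", "g"),
    ("g", "ARG0", "b"),
}

# Hypothetical parser output that misses the re-entrant edge g -> b,
# i.e. it produces a tree instead of the full DAG.
predicted = gold - {("g", "ARG0", "b")}

precision, recall, f1 = triple_f1(predicted, gold)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy case every predicted triple is correct, but one of the six gold triples is missing, so recall (and therefore F1) drops while precision stays at 1.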

References

Banarescu, L. et al. (2013). Abstract Meaning Representation for Sembanking. In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, Sofia, Bulgaria. (https://aclanthology.info/papers/W13-2322/w13-2322)
May, J. and Priyadarshi, J. (2017). SemEval-2017 Task 9: Abstract Meaning Representation Parsing and Generation. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). (https://aclanthology.info/papers/S17-2090/s17-2090)
https://catalog.ldc.upenn.edu/LDC2017T10 (AMR Release 2.0)
https://ufal.mff.cuni.cz/pdt3.5 (Prague Dependency Treebank v3.5), detailed documentation at https://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/t-layer/html/index.html
UFAL's course NPFL117 (formerly NPFL114, http://ufal.mff.cuni.cz/courses/npfl114/1718-summer) by Milan Straka, online at https://slideslive.com/s/milan-straka-10654

Captions

Despite applications such as machine translation, true language understanding remains elusive. The focus of the thesis will be to parse plain text in Czech and English into a knowledge representation graph using supervised training with deep neural networks (DNNs). Basic tools are available for analysis up to the level of natural language syntax, but the semantic and knowledge extraction part is unsolved and will be the main problem to tackle. Datasets are available for at least two types of meaning/knowledge graphs (trees and DAGs).