Principal investigator (ÚFAL): 
Provider: 
Grant id: 18-02196S
ÚFAL budget: 2 612 000
Duration: 2018-2020

LSD

Linguistic Structure Representation in Neural Networks

In the last few years, natural language processing (NLP) has changed significantly. Established statistical methods, which had easily interpretable steps and often used linguistically annotated corpora, have been outperformed by modern methods based on deep neural networks. These methods now dominate most established NLP tasks, such as machine translation, sentiment analysis, image captioning, and speech recognition. Neural networks solving these tasks very rarely use linguistic annotations.

The aim of this project is to analyze and describe what neural networks learn in particular NLP tasks, and how they learn it. We will search for linguistic features and structures in the networks and compare them with annotated corpora and established linguistic theories. We will try to answer questions such as: how do neural networks deal with function words, negation, and passives; how does their internal vector-space representation of words correspond to part-of-speech tags or morphological features; and which tree representations of sentences best fit given NLP tasks?

End-to-End NLP Tasks Demo: http://quest.ms.mff.cuni.cz/neuralmonkey-czm/