Monday, 30 March, 2020 - 15:00

Sesame Unsupervised Learning from Raw Texts and Its Applications

In the last two years, many traditional NLP tasks have seen substantial improvements from advanced unsupervised pretraining on raw texts, for example ELMo or the Transformer-based BERT. I will illustrate these improvements on our results in POS tagging, lemmatization, syntactic parsing, semantic parsing and named entity recognition. Furthermore, these pretraining techniques are also effective in a multilingual setting, where they enable both massive multilingual models (I will present a model performing POS tagging, lemmatization and syntactic parsing of 75 languages) and zero-shot cross-lingual transfer (for example, running a question answering system in Czech while training only on English data). Finally, I will mention recent improvements to the original BERT architecture.
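As a minimal sketch of the pretraining idea the abstract refers to (not taken from the talk itself, and assuming the HuggingFace transformers library and the publicly available bert-base-multilingual-cased checkpoint), the snippet below obtains contextual subword embeddings from a multilingual BERT; downstream taggers and parsers of the kind mentioned above can consume such vectors as features, and the same model handles Czech input because it was pretrained on raw text from roughly 100 languages.

```python
# Minimal sketch: contextual embeddings from a pretrained multilingual BERT.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

# A Czech sentence works without any Czech-specific training,
# since the model was pretrained on multilingual raw text.
inputs = tokenizer("ÚFAL sídlí v Praze.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per subword token: (batch, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```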

Due to the COVID-19 crisis, the talk did not take place physically at ÚFAL and was broadcast remotely as a Zoom meeting. The recording is available here.