Domain Adaptation for Natural Language Generation

Annotation [EN]

The performance of neural natural language generation (NLG) systems depends on the amount of available
in-domain training data. Current solutions for domain adaptation are limited: they require very similar
domains or complex input representations, and they rely on delexicalization, a rather crude technique. This
project aims to develop a neural NLG model capable of generating comprehensible text in domains that lack
in-domain training data. The model will use domain-independent semantic representations learned from large
amounts of unannotated data to improve implicit language understanding, and it will select data matching the
target domain for efficient fine-tuning. Outcomes of the project will improve the practical usability of neural
NLG systems and advance the current understanding of domain-independent semantic representations. The project
will also explore ways of improving automatic evaluation of NLG system outputs to accelerate future NLG
research.
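Delexicalization, named above as a limitation of current approaches, works as follows. Values from the input data are replaced with placeholders in the training texts. The model then learns to generate templates, and the values are filled back in ("relexicalized") afterwards.

The Python sketch below illustrates the idea under simplifying assumptions:
- the input is a flat dictionary of slot-value pairs;
- the slot names, placeholder format and restaurant example are hypothetical;
- values are matched by exact string replacement.

It is not the method of this project or of any particular system.

```python
# Hypothetical helpers illustrating delexicalization with exact string matching.

def delexicalize(mr: dict[str, str], text: str) -> str:
    """Replace each slot value found in the text with a __SLOT__ placeholder."""
    # Replace longer values first so that a value contained in another value
    # (e.g. "Eagle" inside "The Eagle") does not break the longer match.
    for slot, value in sorted(mr.items(), key=lambda kv: len(kv[1]), reverse=True):
        text = text.replace(value, f"__{slot.upper()}__")
    return text


def relexicalize(mr: dict[str, str], template: str) -> str:
    """Fill the placeholders back in with the slot values."""
    for slot, value in mr.items():
        template = template.replace(f"__{slot.upper()}__", value)
    return template


mr = {"name": "The Eagle", "food": "French", "area": "riverside"}
text = "The Eagle serves French food in the riverside area."
template = delexicalize(mr, text)
print(template)                    # -> __NAME__ serves __FOOD__ food in the __AREA__ area.
print(relexicalize(mr, template))  # -> The Eagle serves French food in the riverside area.
```

Because matching is exact, some values are left untouched. This happens when a value appears in the text in a different surface form, such as an inflected form in a morphologically rich language like Czech, or a paraphrase like "by the river" for "riverside". This is one way the technique loses detail in the output.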

Annotation [CZ]

The output quality of natural language generation systems based on neural networks depends on the amount of
training data available for the given domain. Current solutions for domain adaptation are limited: they
require very similar domains or complex input representations, and they use the technique of
delexicalization, which neglects details of the output. The goal of the project is to develop a neural model
for natural language generation capable of generating comprehensible text even in domains for which there is
not enough training data. The model will be built on two components. The first is domain-independent semantic
representations created from large amounts of unannotated data, which will improve its ability to work with
language independently of the domain. The second is a data selection technique that will enable efficient
fine-tuning of the model for a specific domain. The project outputs will improve the practical usability of
neural natural language generation systems and help to better understand the nature of domain-independent
semantic representations. The project will also address ways of improving the automatic evaluation of natural
language generation outputs, in order to increase the efficiency of further research in this area.
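The annotation does not specify how in-domain data will be selected. As an illustration only, the sketch below implements one well-known technique, cross-entropy difference scoring (Moore and Lewis, 2010), using toy unigram language models. Each candidate sentence is scored by its per-token cross-entropy under an in-domain model minus its cross-entropy under a general model. The lowest-scoring candidates, which look most like the in-domain data, are kept.

The function names, whitespace tokenization, add-one smoothing and example sentences are all assumptions chosen for brevity. A real system would use much stronger language models.

```python
import math
from collections import Counter

# Hypothetical representation: a unigram model is a (token counts, total token count) pair.
def train_unigram(sentences: list[str]) -> tuple[Counter, int]:
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    return counts, sum(counts.values())


def cross_entropy(sentence: str, model: tuple[Counter, int], vocab_size: int) -> float:
    """Average negative log-probability per token, with add-one smoothing."""
    counts, total = model
    tokens = sentence.lower().split()
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens)
    return -log_prob / max(len(tokens), 1)


def select_in_domain(candidates: list[str], in_domain: list[str],
                     general: list[str], k: int) -> list[str]:
    """Return the k candidates with the lowest cross-entropy difference."""
    in_model = train_unigram(in_domain)
    gen_model = train_unigram(general)
    # Shared vocabulary so that both models smooth over the same set of tokens.
    vocab = set(in_model[0]) | set(gen_model[0])
    vocab |= {t for s in candidates for t in s.lower().split()}

    def score(sentence: str) -> float:
        # Lower score = more probable under the in-domain model than under the general one.
        return (cross_entropy(sentence, in_model, len(vocab))
                - cross_entropy(sentence, gen_model, len(vocab)))

    return sorted(candidates, key=score)[:k]


in_domain = ["the restaurant serves cheap food", "the pub is near the river"]
general = ["the stock market fell today", "the team won the match"]
candidates = ["a cheap restaurant near the river", "the market rallied after the match"]
print(select_in_domain(candidates, in_domain, general, k=1))
# -> ['a cheap restaurant near the river']
```

Subtracting the general-model cross-entropy is the key design choice. It penalizes sentences that are merely short or made of frequent words, which would score well under the in-domain model alone.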

Publications

Zdeněk Kasner (2024): Data-to-Text Generation with Neural Language Models. PhD thesis.
Zdeněk Kasner, Ioannis Konstas, Ondřej Dušek (2023): Mind the Labels: Describing Relations in Knowledge Graphs With Pretrained Models. In: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pp. 2398-2415, Association for Computational Linguistics, Kerrville, TX, USA, ISBN 978-1-959429-44-9
Rudali Huidrom, Ondřej Dušek, Zdeněk Kasner, Thiago Castro Ferreira, Anya Belz (2022): Two Reproductions of a Human-Assessed Comparative Evaluation of a Semantic Error Detection System. In: Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges, pp. 52-61, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-60-5
Zdeněk Kasner, Ondřej Dušek (2022): Neural Pipeline for Zero-Shot Data-to-Text Generation. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: ACL 2022, pp. 3914-3932, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-21-6
Sourabrata Mukherjee, Zdeněk Kasner, Ondřej Dušek (2022): Balancing the Style-Content Trade-Off in Sentiment Transfer Using Polarity-Aware Denoising. In: 25th International Conference on Text, Speech and Dialogue, pp. 172-186, Springer, Cham, Switzerland, ISBN 978-3-031-16269-5
Zdeněk Kasner, Simon Mille, Ondřej Dušek (2021): Text-in-Context: Token-Level Error Detection for Table-to-Text Generation. In: Proceedings of the 14th International Conference on Natural Language Generation (INLG 2021), pp. 259-265, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-954085-51-0
Ondřej Dušek, Zdeněk Kasner (2020): Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference. In: Proceedings of the 13th International Conference on Natural Language Generation (INLG 2020), pp. 131-137, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-952148-54-5
Zdeněk Kasner, Ondřej Dušek (2020): Train Hard, Finetune Easy: Multilingual Denoising for RDF-to-Text Generation. In: Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+), pp. 171-176, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-952148-59-0
Zdeněk Kasner, Ondřej Dušek (2020): Data-to-Text Generation with Iterative Text Editing. In: Proceedings of the 13th International Conference on Natural Language Generation (INLG 2020), pp. 60-67, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-952148-54-5
Jindřich Libovický, Zdeněk Kasner, Jindřich Helcl, Ondřej Dušek (2020): Expand and Filter: CUNI and LMU Systems for the WNGT 2020 Duolingo Shared Task. In: Proceedings of the Fourth Workshop on Neural Generation and Translation, pp. 153-160, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-952148-17-0

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics