This talk will present three pieces of my recent work on neural NLG:
1) The E2E NLG Challenge, a shared task on natural language generation that I co-organized. I will present an analysis of the results, comparing 21 different NLG systems, which points out the problems that current neural NLG still needs to address.
2) An experiment with automatic cleaning of crowdsourced data. Since the E2E challenge system outputs contained many semantic errors, we examined the training corpus and fixed the semantic noise it contained. Training on the cleaned data reduced semantic errors by up to 97%, which suggests that neural NLG systems are not robust to noise in their training data.
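To illustrate the kind of semantic noise involved: in E2E-style data, each training instance pairs a meaning representation (MR) of attribute–value slots with a crowdsourced text, and the text may omit or contradict slots. The following is a minimal sketch of detecting such mismatches, not the actual cleaning procedure; the naive verbatim-match check and the example pair are assumptions for illustration only.

```python
import re

def parse_mr(mr: str) -> dict:
    """Parse an E2E-style MR, e.g. 'name[The Eagle], food[French]'
    into a dict of slot -> value."""
    return dict(re.findall(r"(\w+)\[(.*?)\]", mr))

def missing_slots(mr: str, text: str) -> list:
    """Return slots whose values do not appear verbatim in the text.
    (A naive check; real cleaning needs per-slot matching rules.)"""
    return [slot for slot, value in parse_mr(mr).items()
            if value.lower() not in text.lower()]

# A hypothetical noisy crowdsourced pair: the text omits the food slot.
mr = "name[The Eagle], eatType[coffee shop], food[French]"
text = "The Eagle is a coffee shop by the river."
print(missing_slots(mr, text))  # -> ['food']
```

Pairs flagged this way can then be corrected (by editing the MR to match the text) or filtered out before training.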
3) A neural system for automatic quality estimation of NLG outputs. This system is trained on a small number of human-rated NLG outputs and can then rate unseen NLG outputs with better correlation to human judgments than traditional metrics such as BLEU, without requiring human references.
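The evaluation criterion here is segment-level correlation with human ratings. A minimal sketch of that comparison, with purely hypothetical scores standing in for real system outputs (the numbers below are illustrative assumptions, not results from the talk):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores for five NLG outputs (illustrative numbers only):
human   = [4.5, 3.0, 2.0, 4.0, 1.5]   # human quality ratings
learned = [0.9, 0.6, 0.4, 0.8, 0.3]   # referenceless learned metric
bleu    = [0.4, 0.5, 0.2, 0.3, 0.1]   # BLEU against references

print(pearson(human, learned))  # correlation of the learned metric
print(pearson(human, bleu))     # correlation of BLEU
```

A metric whose per-output scores correlate more strongly with the human column is the more useful one for ranking systems, which is the sense in which the learned estimator improves on BLEU.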