A two-stage syntax-based natural language generator

Monday, 9 March, 2015 - 13:30

Ondřej Dušek

Abstract: This talk is an overview of our recent experiments with a novel syntax-based natural language generation system that is trainable from unaligned pairs of input meaning representations and output sentences.

The generator is divided into two stages: sentence planning, which incrementally builds deep-syntactic trees from a given meaning representation, and surface realization, which converts these trees into sentences. The sentence planner is based on A* search with a perceptron ranker; surface realization uses a mostly rule-based pipeline from the Treex/TectoMT NLP toolkit.
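The sentence-planning idea can be illustrated with a minimal sketch. It is not the system presented in the talk: candidate "trees" are simplified to sets of node labels, plain best-first search stands in for A* search, and the vocabulary, features, training pairs, and all function names (`features`, `score`, `plan`, `perceptron_update`, `train`) are hypothetical. Surface realization is left out entirely.

```python
import heapq
from collections import defaultdict
from itertools import count

# Hypothetical toy vocabulary; a real planner would add deep-syntactic nodes.
VOCAB = ["restaurant", "serve", "food", "be", "cheap"]

def features(mr, tree):
    """Pair every node in the candidate with every item of the meaning representation."""
    feats = defaultdict(float)
    for node in tree:
        for item in mr:
            feats[(item, node)] += 1.0
    return feats

def score(weights, mr, tree):
    """Linear (perceptron-style) score of a candidate for a given meaning representation."""
    return sum(weights.get(f, 0.0) * v for f, v in features(mr, tree).items())

def plan(weights, mr, max_steps=100):
    """Best-first search: grow candidates one node at a time, keep the best-scoring one."""
    tie = count()  # tie-breaker so the heap never has to compare candidates
    start = frozenset()
    best, best_score = start, score(weights, mr, start)
    open_list = [(-best_score, next(tie), start)]
    seen = {start}
    for _ in range(max_steps):
        if not open_list:
            break
        neg_score, _, tree = heapq.heappop(open_list)
        if -neg_score > best_score:
            best, best_score = tree, -neg_score
        for node in VOCAB:  # successors: add one node not yet in the candidate
            succ = tree | {node}
            if succ not in seen:
                seen.add(succ)
                heapq.heappush(open_list, (-score(weights, mr, succ), next(tie), succ))
    return best

def perceptron_update(weights, mr, gold, predicted):
    """Standard perceptron step: reward gold features, penalize predicted ones."""
    if predicted == gold:
        return
    for f, v in features(mr, gold).items():
        weights[f] += v
    for f, v in features(mr, predicted).items():
        weights[f] -= v

def train(data, epochs=5):
    """Train the ranker from pairs of (meaning representation, gold candidate)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for mr, gold in data:
            perceptron_update(weights, mr, gold, plan(weights, mr))
    return weights

# Invented training pairs, for illustration only.
DATA = [
    (frozenset({"name", "food"}), frozenset({"restaurant", "serve", "food"})),
    (frozenset({"name", "price"}), frozenset({"restaurant", "be", "cheap"})),
]
```

Calling `train(DATA)` and then `plan(weights, mr)` for each training pair shows the ranker steering the search toward the gold candidate. Note that the sketch never aligns individual meaning-representation items with individual nodes; it only compares whole candidates, which is the sense in which training data can stay unaligned.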

We include our first results on an English restaurant information data set. They show that training from unaligned data is feasible and that the generator's outputs are mostly fluent and relevant, although some problems remain to be solved.

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
