Monday, April 4, 2016 - 13:30

Growing Trees: Non-Linear Incremental Parsing during Writing


Incremental parsing in Computer Science is a process that takes place while a program is being written: the structure of the program code is parsed, and the parse is then updated as the code changes during editing. Incremental parsing in Natural Language Processing, by contrast, means parsing a sentence or a text in a time-linear fashion, similar to how humans speak and listen. However, when we read and write, we do not process language in a strictly linear fashion; we can go back and forth. During writing, we can add to previously written sentences, but we can also delete parts of them.

In this talk, I present the idea of non-linear incremental parsing in NLP, i.e., incremental parsing of natural language text in a way similar to how incremental parsing is applied to code. As previously shown, the processing of code relies on formal characteristics such as code being a highly and unambiguously structured document. Natural language text is structured as well, both on the document level and on the text level. The latter structure, however, is implicit and often ambiguous: we can easily come up with different syntax trees when analysing complex sentences in isolation. The parse tree of a document in preparation is a prerequisite for developing editing functions that operate on appropriate units of this document.

I will discuss when and how parse trees should be updated during writing. I will also argue for the development of new methods for comparing parse trees, which could provide an additional view of the text, allowing the author to evaluate different versions of a sentence from a syntactic and stylistic point of view.

Cerstin Mahlow is a computational linguist who holds a PhD from the University of Zurich and a Magistra Artium degree from the Friedrich-Alexander University Erlangen-Nuremberg. After research positions in Zurich, Basel, Konstanz, and Stuttgart, she is now a senior researcher at the Leibniz Institut für Deutsche Sprache in Mannheim. Her research focuses on writing technology, a currently under-researched topic at the intersection of linguistics, writing research, document engineering, computer science, and computational linguistics. In her doctoral thesis, she showed that design principles for programming tools can be adapted to word processors by developing appropriate, language-aware functions that help people write better. Mahlow is the co-founder and co-organizer of the biennial international workshop series on Systems and Frameworks for Computational Morphology (SFCM).