Program of the 2009 PIRE meeting in Prague
Friday, Dec. 11, 2009 (Kolkovna - Olympia restaurant)

| Start Time | Event |
|---|---|
| 18:30 | Get-together dinner |
Saturday, Dec. 12 (School of Computer Science, Malostranske nam. 25, Praha 1), Room S1 (all day)

| Start Time | Speaker | Title |
|---|---|---|
| 09:00 | Jan Hajic (Charles Univ.) | Welcome & morning coffee |
| 09:30 | Martin Popel (Charles Univ.) | Deep Syntactic Machine Translation with Hidden Markov Tree Models (Slides) |
| 10:00 | Katerina Veselovska (Charles Univ.) | Increasing Prague English Dependency Treebank Credibility |
| 10:30 | Coffee Break | |
| 11:00 | Antske Fokkens (Saarland Univ.) | A word order library for grammar customization |
| 11:30 | Bart Cramer (Saarland Univ.) | Deep grammar engineering using a mid-depth treebank |
| 12:00 | Lunch (in the building's cafeteria) | |
| 14:00 | Eugene Charniak (Brown Univ.) | New developments at Brown / BLLIP |
| 14:30 | Engin Ural (Brown Univ.) | A computational investigation of the effect of phonological variation on word segmentation and lexical acquisition |
| 15:00 | Coffee Break | |
| 15:30 | Anoop Deoras (JHU) | Simulated Annealing for Confusion Network Decoding |
| 16:10 | Carolina Parada (JHU) | Contextual Information Improves OOV Detection in Speech (Slides) |
| 16:50 | Puyang Xu (JHU) | |
| 17:20 | All | Discussion and closing |
| 17:45 | F. Jelinek, D. Klakow, E. Charniak, J. Hajic, S. Khudanpur, M. Johnson | PIRE PI Meeting |
Sunday, Dec. 13 (Social Program)

| Start Time | Participants | Event |
|---|---|---|
| 10:00 | All | Trip to Karlstejn (organizer: Lucie Mladova, mladova 'at' ufal...cz) |
Abstracts of the scheduled talks
Martin Popel (Charles Univ.): Deep Syntactic Machine Translation with Hidden Markov Tree Models
I will present our modular English-Czech machine translation system with a transfer phase on the deep syntactic (tectogrammatical) layer, as implemented in the TectoMT framework. After showing statistics on translation errors and their sources, I will describe recent improvements to the system. One of the most helpful improvements was the use of Hidden Markov Tree Models in the transfer phase of translation, which can be interpreted as labeling the nodes of dependency trees.
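The tree-labeling view of the transfer phase can be illustrated with a generic first-order Viterbi decoder over a dependency tree, where each node's hidden label depends on its parent's label. This is a minimal sketch of the general Hidden Markov Tree Model idea, not TectoMT's actual implementation; all data structures and probabilities below are illustrative.

```python
import math

def tree_viterbi(children, root, labels, trans, emit, obs):
    """Best labeling of a dependency tree under a hidden Markov tree model:
    each node's hidden label depends on its parent's label (trans) and
    generates the observed node attribute (emit).  Bottom-up DP, then a
    top-down pass to read off the best assignment."""
    best = {}   # best[node][label] = best log-prob of the subtree rooted at node
    back = {}   # back[(node, child, label)] = child's best label given parent label

    def visit(node):
        for c in children.get(node, []):
            visit(c)
        best[node] = {}
        for lab in labels:
            score = math.log(emit[lab].get(obs[node], 1e-9))
            for c in children.get(node, []):
                # pick the child label maximizing transition * child subtree score
                cand = max(labels,
                           key=lambda cl: math.log(trans[lab][cl]) + best[c][cl])
                score += math.log(trans[lab][cand]) + best[c][cand]
                back[(node, c, lab)] = cand
            best[node][lab] = score

    visit(root)
    assign = {root: max(labels, key=lambda l: best[root][l])}
    stack = [root]
    while stack:                      # top-down decoding via backpointers
        n = stack.pop()
        for c in children.get(n, []):
            assign[c] = back[(n, c, assign[n])]
            stack.append(c)
    return assign

# toy tree: node 0 is the root with children 1 and 2
children = {0: [1, 2]}
labels = ["A", "B"]
trans = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.5, "B": 0.5}}
emit = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}}
obs = {0: "x", 1: "y", 2: "y"}
print(tree_viterbi(children, 0, labels, trans, emit, obs))
# {0: 'A', 1: 'B', 2: 'B'}
```

The key difference from a chain HMM is that the dynamic program runs over tree edges rather than a linear sequence, so each parent label conditions all of its children independently.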
Katerina Veselovska (Charles Univ.): Increasing Prague English Dependency Treebank Credibility
In my talk I will present the current state of building a manually annotated parallel treebank for English and Czech. I will focus on improvements concerning the manual annotation of tectogrammatical tree structures in PEDT (the English part of the treebank), including rule-based pre-annotation. I will also introduce some methods for annotating specific phenomena, e.g. the treatment of non-dependency relations, and give a brief overview of inter-annotator agreement.
Antske Fokkens (Saarland Univ.): A word order library for grammar customization
The Grammar Matrix (Bender et al. 2002) is a tool that helps grammar developers start building a deep grammar of a natural language. Recently, Matrix developers have been working on interfaces (called libraries) between linguistic descriptions and grammar implementations. The idea is that grammar writers can get a basic implementation of their grammar by filling out a questionnaire defining the typological properties of the language. Our research aims at designing a library that can handle word order phenomena. In this talk, I will provide a general overview of the grammar customization project, focusing on word order phenomena.
Bart Cramer (Saarland Univ.): Deep grammar engineering using a mid-depth treebank
I will outline how an HPSG grammar for German can be extracted from a mid-depth dependency treebank (the Tiger treebank). The novelty of this approach lies in the use of a core grammar and lexicon, which can account for linguistically more interesting phenomena. A technique to automatically create so-called 'dynamic treebanks' is presented, along with preliminary results on treebank creation and parsing. Last, an outlook is given on future work that will address the issues of efficiency and robustness.
Engin Ural (Brown Univ.): A computational investigation of the effect of phonological variation on word segmentation and lexical acquisition
In this study, we propose a computational model that investigates the effect of phonological variation on word segmentation. The model segments child-directed speech and acquires underlying forms that may differ from the surface forms by a word-final /t/ that is deleted by a phonological rule. We begin by explaining how the data currently used to study word segmentation of child-directed speech ignores phonological variation, and we propose using forced alignment to construct more realistic training data that contains phonological variation. Lastly, we discuss the bigram word-segmentation model and present results from an unsupervised word-segmentation model that is trained on the forced-aligned data and incorporates a simple word-final /t/ deletion process.
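The word-final /t/ deletion rule itself is easy to picture: applied forward, it maps underlying forms to surface forms; seen backward, it makes every surface form ambiguous between itself and a /t/-final underlying form. A toy sketch of both directions, assuming plain orthographic strings rather than the phonemic transcriptions the study actually uses:

```python
import random

def apply_t_deletion(words, p_delete=0.5, rng=random):
    """Toy phonological rule: delete a word-final /t/ with probability
    p_delete, turning underlying forms into surface forms.
    Illustrative only; not the paper's model."""
    surface = []
    for w in words:
        if w.endswith("t") and rng.random() < p_delete:
            surface.append(w[:-1])
        else:
            surface.append(w)
    return surface

def underlying_candidates(surface_word):
    """Inverting the rule: a surface form may be the underlying form
    itself, or an underlying form whose final /t/ was deleted."""
    return [surface_word, surface_word + "t"]

print(apply_t_deletion(["dog", "left", "cat"], p_delete=1.0))
# ['dog', 'lef', 'ca']
print(underlying_candidates("lef"))
# ['lef', 'left']
```

A segmentation model that only ever sees surface forms must weigh both candidates for every segmented token, which is exactly the ambiguity the forced-aligned training data introduces.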
Anoop Deoras (JHU): Simulated Annealing for Confusion Network Decoding
In this work, we propose a novel re-scoring framework for confusion networks, called Iterative Decoding. Within it, the integration of various complex, sentence-level knowledge sources is much easier and the re-scoring is much faster compared to the conventional N-best list method. For comparable performance on an LVCSR task, the search effort required by our method is 22 times less than that of the N-best list method. We have also extended this method with 'Simulated Annealing' techniques to avoid getting trapped in local maxima. Currently, we are investigating acoustic re-scoring in conjunction with language model re-scoring on confusion networks. We are also working towards consensus re-scoring using longer-dependency language models and acoustic models.
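The simulated-annealing idea can be sketched generically: walk over the slots of a confusion network, propose swapping in an alternative word, and accept worse hypotheses with a probability that shrinks as the temperature cools. This is a minimal illustrative sketch under a toy sentence-level scorer, not the authors' implementation; all parameter values are made up.

```python
import math
import random

def anneal_decode(confnet, score, steps=2000, t0=1.0, cooling=0.999, seed=0):
    """Simulated-annealing search over a confusion network (a list of
    slots, each slot a list of candidate words).  `score` evaluates a
    full sentence, so arbitrary sentence-level knowledge sources can be
    plugged in without enumerating an N-best list."""
    rng = random.Random(seed)
    hyp = [slot[0] for slot in confnet]      # start from the top candidates
    cur = score(hyp)
    best, best_score = list(hyp), cur
    temp = t0
    for _ in range(steps):
        i = rng.randrange(len(confnet))
        w = rng.choice(confnet[i])
        if w == hyp[i]:
            continue
        old = hyp[i]
        hyp[i] = w
        new = score(hyp)
        # Metropolis criterion: always accept improvements; accept worse
        # moves with probability exp(delta/temp) to escape local maxima
        if new >= cur or rng.random() < math.exp((new - cur) / temp):
            cur = new
            if cur > best_score:
                best, best_score = list(hyp), cur
        else:
            hyp[i] = old                     # reject the move
        temp *= cooling
    return best, best_score

# toy confusion network and a scorer that rewards the reference words
confnet = [["a", "the"], ["cat", "cap"], ["sat", "sad"]]
ref = ["the", "cat", "sat"]
toy_score = lambda h: float(sum(w == r for w, r in zip(h, ref)))
best, s = anneal_decode(confnet, toy_score, steps=2000)
```

Because the scorer sees the whole sentence at each step, long-span language models fit naturally here, whereas an N-best list would have to pre-enumerate every hypothesis it might ever score.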
Carolina Parada (JHU): Contextual Information Improves OOV Detection in Speech
Out-of-vocabulary (OOV) words represent an important source of error in large
vocabulary continuous speech recognition (LVCSR) systems. These words cause
recognition failures, which propagate through pipeline systems impacting the
performance of downstream applications, such as understanding and translation. The
detection of OOV regions in the output of an LVCSR system is typically addressed as a
binary classification task, where each region is independently classified using local
information. In this talk, I will show that jointly predicting OOV regions, and
including contextual information from each region, leads to substantial improvement
in OOV detection. Compared to the state-of-the-art, we reduce the missed OOV rate
from 42.6% to 28.4% at 10% false alarm rate.
This is joint work with Fred Jelinek, Mark Dredze (JHU), and Denis Filimonov (UMD).
Puyang Xu (JHU):