Program of the 2009 PIRE meeting in Prague
Friday, Dec. 11, 2009 (Kolkovna - Olympia restaurant)

| Start Time | Event |
|---|---|
| 18:30 | Get-together dinner |
Saturday, Dec. 12 (School of Computer Science, Malostranske nam. 25, Praha 1), Room S1 (all day)

| Start Time | Speaker | Title |
|---|---|---|
| 09:00 | Jan Hajic (Charles Univ.) | Welcome & morning coffee |
| 09:30 | Martin Popel (Charles Univ.) | Deep Syntactic Machine Translation with Hidden Markov Tree Models (Slides) |
| 10:00 | Katerina Veselovska (Charles Univ.) | Increasing Prague English Dependency Treebank Credibility |
| 10:30 | Coffee Break | |
| 11:00 | Antske Fokkens (Saarland Univ.) | A word order library for grammar customization |
| 11:30 | Bart Cramer (Saarland Univ.) | Deep grammar engineering using a mid-depth treebank |
| 12:00 | Lunch (in the building's cafeteria) | |
| 14:00 | Eugene Charniak (Brown Univ.) | New developments at Brown / BLLIP |
| 14:30 | Engin Ural (Brown Univ.) | A computational investigation of the effect of phonological variation on word segmentation and lexical acquisition |
| 15:00 | Coffee Break | |
| 15:30 | Anoop Deoras (JHU) | Simulated Annealing for Confusion Network Decoding |
| 16:10 | Carolina Parada (JHU) | Contextual Information Improves OOV Detection in Speech (Slides) |
| 16:50 | Puyang Xu (JHU) | |
| 17:20 | All | Discussion and closing |
| 17:45 | F. Jelinek, D. Klakow, E. Charniak, J. Hajic, S. Khudanpur, M. Johnson | PIRE PI Meeting |
Sunday, Dec. 13 (Social Program)

| Start Time | Participants | Event |
|---|---|---|
| 10:00 | All | Trip to Karlstejn (organizer: Lucie Mladova, mladova 'at' ufal...cz) |
Abstracts of the scheduled talks
Martin Popel (Charles Univ.): Deep Syntactic Machine Translation with Hidden Markov Tree Models
I will present our modular English-Czech machine translation system with a transfer phase on the deep syntactic (tectogrammatical) layer, as implemented in the TectoMT framework. After showing statistics on translation errors and their sources, I will describe recent improvements to the system. One of the most helpful improvements was the use of Hidden Markov Tree Models in the transfer phase of translation, which can be interpreted as labeling the nodes of dependency trees.
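The tree-labeling view of the transfer phase can be illustrated with a generic first-order Viterbi decoder over a dependency tree, where each node's hidden label depends on its parent's label. This is a minimal sketch of the general Hidden Markov Tree Model idea, not TectoMT's actual implementation; all data structures and probabilities below are illustrative.

```python
import math

def tree_viterbi(children, root, labels, trans, emit, obs):
    """Best labeling of a dependency tree under a hidden Markov tree model:
    each node's hidden label depends on its parent's label (trans) and
    generates the observed node attribute (emit).  Bottom-up DP, then a
    top-down pass to read off the best assignment."""
    best = {}   # best[node][label] = best log-prob of the subtree rooted at node
    back = {}   # back[(node, child, label)] = child's best label given parent label

    def visit(node):
        for c in children.get(node, []):
            visit(c)
        best[node] = {}
        for lab in labels:
            score = math.log(emit[lab].get(obs[node], 1e-9))
            for c in children.get(node, []):
                # pick the child label maximizing transition * child subtree score
                cand = max(labels,
                           key=lambda cl: math.log(trans[lab][cl]) + best[c][cl])
                score += math.log(trans[lab][cand]) + best[c][cand]
                back[(node, c, lab)] = cand
            best[node][lab] = score

    visit(root)
    assign = {root: max(labels, key=lambda l: best[root][l])}
    stack = [root]
    while stack:                      # top-down decoding via backpointers
        n = stack.pop()
        for c in children.get(n, []):
            assign[c] = back[(n, c, assign[n])]
            stack.append(c)
    return assign

# toy tree: node 0 is the root with children 1 and 2
children = {0: [1, 2]}
labels = ["A", "B"]
trans = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.5, "B": 0.5}}
emit = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}}
obs = {0: "x", 1: "y", 2: "y"}
print(tree_viterbi(children, 0, labels, trans, emit, obs))
# {0: 'A', 1: 'B', 2: 'B'}
```

The key difference from a chain HMM is that the dynamic program runs over tree edges rather than a linear sequence, so each parent label conditions all of its children independently.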
Katerina Veselovska (Charles Univ.): Increasing Prague English Dependency Treebank Credibility
In my talk I will present the current state of building a manually annotated parallel treebank for English and Czech. I will focus on improvements concerning the manual annotation of tectogrammatical tree structures in PEDT (the English part of the treebank), including rule-based pre-annotation. I will also introduce some methods for annotating specific phenomena, e.g. the treatment of non-dependency relations, and give a brief overview of inter-annotator agreement.
Antske Fokkens (Saarland Univ.): A word order library for grammar customization
The Grammar Matrix (Bender et al. 2002) is a tool that helps grammar developers start building a deep grammar of a natural language. Recently, Matrix developers have been working on interfaces (called libraries) between linguistic descriptions and grammar implementations. The idea is that grammar writers can get a basic implementation of their grammar by filling out a questionnaire defining the typological properties of the language. Our research aims at designing a library that can handle word order phenomena. In this talk, I will provide a general overview of the grammar customization project, focusing on word order phenomena.
Bart Cramer (Saarland Univ.): Deep grammar engineering using a mid-depth treebank
I will outline how an HPSG grammar for German can be extracted from a mid-depth dependency treebank (the Tiger treebank). The novelty of this approach lies in the use of a core grammar and lexicon, which can account for linguistically more interesting phenomena. A technique to automatically create so-called 'dynamic treebanks' is presented, along with preliminary results on treebank creation and parsing. Last, an outlook is given on future work that will address the issues of efficiency and robustness.
Engin Ural (Brown Univ.): A computational investigation of the effect of phonological variation on word segmentation and lexical acquisition
In this study, we propose a computational model that investigates the effect of phonological variation on word segmentation. The model segments child-directed speech and acquires underlying forms that may differ from the surface forms by a word-final /t/ that is deleted by a phonological rule. We begin by explaining how the data currently used to study word segmentation of child-directed speech ignores phonological variation, and we propose using forced alignment to construct more realistic training data that contains phonological variation. Lastly, we discuss the bigram word-segmentation model and present results from an unsupervised word-segmentation model that is trained on the forced-aligned data and incorporates a simple word-final /t/ deletion process.
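The word-final /t/ deletion rule itself is easy to picture: applied forward, it maps underlying forms to surface forms; seen backward, it makes every surface form ambiguous between itself and a /t/-final underlying form. A toy sketch of both directions, assuming plain orthographic strings rather than the phonemic transcriptions the study actually uses:

```python
import random

def apply_t_deletion(words, p_delete=0.5, rng=random):
    """Toy phonological rule: delete a word-final /t/ with probability
    p_delete, turning underlying forms into surface forms.
    Illustrative only; not the paper's model."""
    surface = []
    for w in words:
        if w.endswith("t") and rng.random() < p_delete:
            surface.append(w[:-1])
        else:
            surface.append(w)
    return surface

def underlying_candidates(surface_word):
    """Inverting the rule: a surface form may be the underlying form
    itself, or an underlying form whose final /t/ was deleted."""
    return [surface_word, surface_word + "t"]

print(apply_t_deletion(["dog", "left", "cat"], p_delete=1.0))
# ['dog', 'lef', 'ca']
print(underlying_candidates("lef"))
# ['lef', 'left']
```

A segmentation model that only ever sees surface forms must weigh both candidates for every segmented token, which is exactly the ambiguity the forced-aligned training data introduces.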
Anoop Deoras (JHU): Simulated Annealing for Confusion Network Decoding
In this work, we propose a novel re-scoring framework for confusion networks, called Iterative Decoding. Within it, the integration of various complex, sentence-level knowledge sources is much easier and the re-scoring is much faster compared to the conventional N-best list method. For comparable performance on an LVCSR task, the search effort required by our method is 22 times less than that of the N-best list method. We have also extended this method with 'Simulated Annealing' techniques to avoid getting trapped in local maxima. Currently, we are investigating acoustic re-scoring in conjunction with language model re-scoring on confusion networks. We are also working towards consensus re-scoring using longer-dependency language models and acoustic models.
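The simulated-annealing idea can be sketched generically: walk over the slots of a confusion network, propose swapping in an alternative word, and accept worse hypotheses with a probability that shrinks as the temperature cools. This is a minimal illustrative sketch under a toy sentence-level scorer, not the authors' implementation; all parameter values are made up.

```python
import math
import random

def anneal_decode(confnet, score, steps=2000, t0=1.0, cooling=0.999, seed=0):
    """Simulated-annealing search over a confusion network (a list of
    slots, each slot a list of candidate words).  `score` evaluates a
    full sentence, so arbitrary sentence-level knowledge sources can be
    plugged in without enumerating an N-best list."""
    rng = random.Random(seed)
    hyp = [slot[0] for slot in confnet]      # start from the top candidates
    cur = score(hyp)
    best, best_score = list(hyp), cur
    temp = t0
    for _ in range(steps):
        i = rng.randrange(len(confnet))
        w = rng.choice(confnet[i])
        if w == hyp[i]:
            continue
        old = hyp[i]
        hyp[i] = w
        new = score(hyp)
        # Metropolis criterion: always accept improvements; accept worse
        # moves with probability exp(delta/temp) to escape local maxima
        if new >= cur or rng.random() < math.exp((new - cur) / temp):
            cur = new
            if cur > best_score:
                best, best_score = list(hyp), cur
        else:
            hyp[i] = old                     # reject the move
        temp *= cooling
    return best, best_score

# toy confusion network and a scorer that rewards the reference words
confnet = [["a", "the"], ["cat", "cap"], ["sat", "sad"]]
ref = ["the", "cat", "sat"]
toy_score = lambda h: float(sum(w == r for w, r in zip(h, ref)))
best, s = anneal_decode(confnet, toy_score, steps=2000)
```

Because the scorer sees the whole sentence at each step, long-span language models fit naturally here, whereas an N-best list would have to pre-enumerate every hypothesis it might ever score.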
Carolina Parada (JHU): Contextual Information Improves OOV Detection in Speech
Out-of-vocabulary (OOV) words represent an important source of error in large
vocabulary continuous speech recognition (LVCSR) systems. These words cause
recognition failures, which propagate through pipeline systems impacting the
performance of downstream applications, such as understanding and translation. The
detection of OOV regions in the output of an LVCSR system is typically addressed as a
binary classification task, where each region is independently classified using local
information. In this talk, I will show that jointly predicting OOV regions, and
including contextual information from each region, leads to substantial improvement
in OOV detection. Compared to the state-of-the-art, we reduce the missed OOV rate
from 42.6% to 28.4% at 10% false alarm rate.
This is joint work with Fred Jelinek, Mark Dredze (JHU), and Denis Filimonov (UMD).
Puyang Xu (JHU):