Preliminary parsing results on HamleDT data, for both the original and the harmonized (Prague) annotation, measured mainly as unlabeled attachment score (UAS). We used MaltParser (http://maltparser.org/) with the stack-lazy algorithm and a feature definition file that uses complete gold-standard morphological information (lemmas, parts of speech, and values of individual morphosyntactic features).
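UAS is the percentage of tokens whose predicted head is correct; LAS (reported in the last column of the table) additionally requires the correct dependency label. The following is a minimal Python sketch, assuming the gold and predicted trees are given as parallel lists of head indices and labels and that every token, including punctuation, is scored. The evaluation script behind the table may have treated punctuation differently. The function name `attachment_scores` and the example labels are hypothetical.

```python
from typing import Sequence


def attachment_scores(gold_heads: Sequence[int], gold_labels: Sequence[str],
                      pred_heads: Sequence[int], pred_labels: Sequence[str]) -> tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens (assumption: no punctuation filtering)."""
    n = len(gold_heads)
    if not (n == len(gold_labels) == len(pred_heads) == len(pred_labels)):
        raise ValueError("all sequences must have the same length")
    if n == 0:
        raise ValueError("no tokens to score")
    # UAS: the predicted head index matches the gold head index.
    head_ok = [g == p for g, p in zip(gold_heads, pred_heads)]
    uas = 100.0 * sum(head_ok) / n
    # LAS: the head is correct AND the dependency label matches.
    las = 100.0 * sum(ok and gl == pl
                      for ok, gl, pl in zip(head_ok, gold_labels, pred_labels)) / n
    return uas, las


# Toy four-token sentence (hypothetical Prague-style labels).
print(attachment_scores([2, 0, 2, 3], ["Sb", "Pred", "Obj", "Atr"],
                        [2, 0, 2, 2], ["Sb", "Pred", "Adv", "Atr"]))  # (75.0, 50.0)
```

In the toy example three of the four heads are correct, but one of those three carries the wrong label, so UAS is 75% while LAS is 50%. This is why LAS can never exceed UAS for the same parser output.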
To reduce time requirements and increase comparability, we first ran a short experiment in which the training data was limited to the first 5000 sentences; treebanks with fewer than 5000 sentences used their entire training sets. In addition, the table below contains the results of a long experiment, in which the full training data was used for all treebanks. Most models were trained within a couple of days, but training the model of the original Czech treebank took more than one week.
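The short experiment can be sketched as two steps: cut the training file down to its first 5000 sentences, then train MaltParser on the result with the stack-lazy algorithm. This is a minimal Python sketch, assuming CoNLL-style input in which sentences are separated by blank lines. The `-c`, `-i`, `-m`, `-a` and `-F` options follow the MaltParser user guide, while the helper names, jar name, configuration name and file names are hypothetical.

```python
def first_n_sentences(conll_text: str, n: int = 5000) -> str:
    """Keep the first n blank-line-separated sentences of CoNLL text.

    If the text has fewer than n sentences, all of them are kept, which
    matches how the smaller treebanks were handled.
    """
    sentences: list[list[str]] = []
    current: list[str] = []
    for line in conll_text.splitlines():
        if line.strip():
            current.append(line)          # token line of the current sentence
        elif current:
            sentences.append(current)     # blank line closes a sentence
            current = []
    if current:                           # last sentence without trailing blank line
        sentences.append(current)
    return "".join("\n".join(s) + "\n\n" for s in sentences[:n])


def malt_learn_command(jar: str, config: str, train_file: str,
                       feature_file: str) -> list[str]:
    """Build a MaltParser training command using the stack-lazy algorithm."""
    return ["java", "-jar", jar,
            "-c", config,          # name of the model (configuration) to create
            "-i", train_file,      # CoNLL-format training data
            "-m", "learn",         # training mode
            "-a", "stacklazy",     # stack-lazy transition system
            "-F", feature_file]    # feature model specification file
```

A typical run would write `first_n_sentences(original_text)` to a new file and pass its path as `train_file`; the command list can then be executed with `subprocess.run`. Training on the full data only changes the input file, not the command.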
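Columns: Code = language code; Treebank = treebank identifier; P>O = 1 if the Prague UAS exceeds the original UAS in the 5000-sentence experiment, otherwise 0; LAS is given for the Prague annotation trained on the full data. Scores use decimal commas.
Code | Treebank | UAS Original (max. 5000 sent.) | UAS Prague (max. 5000 sent.) | P>O | UAS Original (full data) | UAS Prague (full data) | LAS Prague (full data) |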
ar | padtr349 | 78,91% | 79,44% | 1 | 79,70% | 80,37% | 72,24% |
bg | conll2006 | 83,59% | 89,83% | 1 | 84,50% | 90,92% | 83,09% |
bn | icon2010 | 86,83% | 80,30% | 0 | 86,83% | 80,30% | 60,84% |
ca | conll2009 | 84,74% | 88,37% | 1 | 84,74% | 89,71% | 84,62% |
cs | pdt30 | 78,36% | 78,77% | 1 | 86,35% | 86,71% | 82,05% |
da | conll2006 | 88,01% | 87,72% | 0 | 88,93% | 87,97% | 80,59% |
de | conll2009 | 79,55% | 84,28% | 1 | 81,59% | 88,42% | 84,44% |
el | conll2007 | 81,92% | 82,54% | 1 | 81,92% | 82,54% | 76,59% |
en | conll2007 | 84,25% | 85,47% | 1 | 86,89% | 88,17% | 85,29% |
es | conll2009 | 82,19% | 88,07% | 1 | 90,40% | 89,76% | 85,01% |
et | puudepank | 91,32% | 88,92% | 0 | 91,32% | 88,92% | 86,30% |
eu | bdt | 74,63% | 78,52% | 1 | 75,92% | 80,72% | 74,30% |
fa | perdt | 84,80% | 82,27% | 0 | 86,98% | 84,10% | 75,37% |
fi | turku | 77,81% | 80,28% | 1 | 77,81% | 80,28% | 75,84% |
grc | agdt | 63,41% | 62,81% | 0 | 63,12% | 62,91% | 54,73% |
hi | hydt05 | 94,52% | 92,87% | 0 | 95,12% | 93,99% | 90,16% |
hu | conll2007 | 70,80% | 80,94% | 1 | 74,23% | 81,46% | 79,43% |
it | conll2007 | 85,56% | 83,11% | 0 | 85,56% | 83,11% | 78,24% |
ja | conll2007 | 78,43% | 88,37% | 1 | 80,22% | 90,15% | 73,37% |
la | ldt | 52,60% | 53,04% | 1 | 52,60% | 53,04% | 45,32% |
nl | conll2006 | 77,89% | 77,37% | 0 | 82,58% | 81,44% | 73,97% |
pt | conll2006 | 78,43% | 85,86% | 1 | 77,83% | 86,74% | 81,90% |
ro | rodt | 81,52% | 84,21% | 1 | 81,52% | 84,21% | 78,00% |
ru | syntagrus | 86,70% | 82,02% | 0 | 89,59% | 85,43% | 77,39% |
sk | sta1 | 74,76% | 76,36% | 1 | 80,73% | 82,24% | 75,35% |
sl | conll2006 | 80,77% | 81,95% | 1 | 80,77% | 81,95% | 75,00% |
sv | conll2006 | 82,98% | 81,23% | 0 | 86,99% | 84,98% | 78,89% |
ta | tamiltb | 77,58% | 77,38% | 0 | 77,58% | 77,38% | 68,43% |
te | icon2010 | 92,30% | 90,29% | 0 | 92,30% | 90,29% | 71,19% |
tr | conll2007 | 85,69% | 82,43% | 0 | 84,76% | 81,57% | 75,99% |
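Because the table uses decimal commas and a derived P>O column, its rows can be checked mechanically. This is a minimal Python sketch, assuming each data row is pipe-separated with eight cells exactly as above. It recomputes P>O from the two 5000-sentence UAS columns and verifies that LAS never exceeds UAS in the full-data Prague setting. `Row`, `parse_row` and `check_row` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Row:
    code: str
    treebank: str
    uas_orig_5k: float
    uas_prague_5k: float
    p_gt_o: int
    uas_orig_full: float
    uas_prague_full: float
    las_prague_full: float


def parse_pct(cell: str) -> float:
    """Convert a cell such as '78,91%' (decimal comma) to 78.91."""
    return float(cell.strip().rstrip("%").replace(",", "."))


def parse_row(line: str) -> Row:
    cells = [c.strip() for c in line.split("|")]
    cells = [c for c in cells if c]  # the trailing '|' leaves an empty cell
    if len(cells) != 8:
        raise ValueError(f"expected 8 cells, got {len(cells)}: {line!r}")
    code, treebank, o5, p5, flag, o_full, p_full, las = cells
    return Row(code, treebank, parse_pct(o5), parse_pct(p5), int(flag),
               parse_pct(o_full), parse_pct(p_full), parse_pct(las))


def check_row(row: Row) -> list[str]:
    """Return a list of inconsistencies found in one row (empty if none)."""
    problems = []
    if int(row.uas_prague_5k > row.uas_orig_5k) != row.p_gt_o:
        problems.append(f"{row.code}: P>O flag disagrees with the 5000-sentence UAS")
    if row.las_prague_full > row.uas_prague_full:
        problems.append(f"{row.code}: LAS exceeds UAS in the full-data Prague setting")
    return problems


SAMPLE = """\
ar | padtr349 | 78,91% | 79,44% | 1 | 79,70% | 80,37% | 72,24% |
bn | icon2010 | 86,83% | 80,30% | 0 | 86,83% | 80,30% | 60,84% |
"""

rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
print(sum(r.p_gt_o for r in rows), "of", len(rows), "treebanks improved")  # 1 of 2 treebanks improved
print([p for r in rows for p in check_row(r)])  # []
```

Pasting all data rows into `SAMPLE` applies the same checks to the whole table.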