Status: in active development, with stable versions
OS: Linux, Windows

MSTperl parser

MSTperl is a Perl reimplementation of the MST parser by Ryan McDonald.

The MST parser (Maximum Spanning Tree parser) is a state-of-the-art natural language dependency parser, i.e. a tool that takes a sentence and returns its dependency tree.
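To make the "maximum spanning tree" idea concrete: with first-order (arc-factored) features, every possible arc head → dependent receives a score, and the parse is the highest-scoring spanning tree rooted in an artificial root node. The sketch below finds that tree with the Chu-Liu-Edmonds algorithm, the standard non-projective decoder used in McDonald's MST parser. It is an illustration in Python, not MSTperl's Perl code; the function names and the `scores[h][d]` matrix layout are assumptions made for this example.

```python
def find_cycle(heads):
    """Return the nodes of one cycle in a {dependent: head} map, or None."""
    for start in heads:
        path, on_path = [], set()
        node = start
        while node in heads and node not in on_path:
            on_path.add(node)
            path.append(node)
            node = heads[node]
        if node in on_path:
            return path[path.index(node):]
    return None


def chu_liu_edmonds(arcs, root):
    """arcs: {dependent: {head: score}}; returns {dependent: head} of the best tree."""
    # 1. Greedily pick the best-scoring head for every dependent.
    best = {d: max(hs, key=hs.get) for d, hs in arcs.items()}
    cycle = find_cycle(best)
    if cycle is None:
        return best  # no cycle: the greedy choice is already a tree
    # 2. Contract the cycle into a fresh node c and rescore arcs touching it.
    cyc = set(cycle)
    nodes = set(arcs) | {root} | {h for hs in arcs.values() for h in hs}
    c = max(nodes) + 1
    new_arcs, enters, leaves = {c: {}}, {}, {}
    for d, hs in arcs.items():
        for h, s in hs.items():
            if d in cyc and h in cyc:
                continue  # arc inside the cycle
            if d in cyc:  # h -> cycle: score relative to the cycle arc it replaces
                s2 = s - arcs[d][best[d]]
                if h not in new_arcs[c] or s2 > new_arcs[c][h]:
                    new_arcs[c][h], enters[h] = s2, d
            elif h in cyc:  # cycle -> d: keep only the best outgoing arc
                out = new_arcs.setdefault(d, {})
                if c not in out or s > out[c]:
                    out[c], leaves[d] = s, h
            else:
                new_arcs.setdefault(d, {})[h] = s
    sub = chu_liu_edmonds(new_arcs, root)
    # 3. Expand c: keep the cycle arcs except the one displaced by the entering arc.
    result = {d: (leaves[d] if h == c else h) for d, h in sub.items() if d != c}
    for d in cyc:
        result[d] = best[d]
    h_in = sub[c]
    result[enters[h_in]] = h_in
    return result


def decode(scores):
    """scores[h][d] = score of arc h -> d; node 0 is the artificial root."""
    n = len(scores)
    arcs = {d: {h: scores[h][d] for h in range(n) if h != d} for d in range(1, n)}
    heads = chu_liu_edmonds(arcs, root=0)
    return [-1] + [heads[d] for d in range(1, n)]


scores = [
    [0, 5, 1, 1],
    [0, 0, 10, 8],
    [0, 11, 0, 3],
    [0, 2, 4, 0],
]
print(decode(scores))  # [-1, 0, 1, 1]
```

On this toy matrix, picking the best head for each word greedily creates the cycle 1 → 2 → 1; contracting and expanding it yields root → 1, 1 → 2, 1 → 3 (total score 23). Because the algorithm searches over all spanning trees, crossing (non-projective) arcs are allowed.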

MSTperl implements only part of the original functionality; the limitations include the following:

  • the parser is non-projective, and there is currently no way to enforce projectivity of the parse trees
  • only first-order features are supported, i.e. no second-order or third-order features are possible
  • MIRA is implemented as single-best MIRA, using a closed-form update instead of quadratic programming
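The single-best MIRA update above can be sketched as follows. With only one constraint (the gold tree must outscore the single predicted tree by at least its loss), the quadratic program has a closed-form solution, so no QP solver is needed. This is a minimal Python sketch, assuming the loss is the number of words with an incorrect head (as in McDonald's MST parser) and using a hypothetical two-feature template; MSTperl's real feature set, naming and any step-size cap (some implementations clip `tau` with a constant C) may differ.

```python
from collections import Counter


def arc_features(sent, h, d):
    """Hypothetical first-order template: word pair and POS pair of one arc."""
    (hw, ht), (dw, dt) = sent[h], sent[d]
    return [f"hw={hw}|dw={dw}", f"ht={ht}|dt={dt}"]


def tree_features(sent, heads):
    """Sum of arc feature counts over all words (index 0 is the root)."""
    feats = Counter()
    for d in range(1, len(heads)):
        feats.update(arc_features(sent, heads[d], d))
    return feats


def mira_update(weights, sent, gold, pred):
    """One single-best MIRA step; updates weights in place, returns step size tau."""
    # Loss: number of words whose predicted head is wrong.
    loss = sum(gold[d] != pred[d] for d in range(1, len(gold)))
    diff = tree_features(sent, gold)
    diff.subtract(tree_features(sent, pred))  # f(gold) - f(pred); may contain negatives
    norm_sq = sum(v * v for v in diff.values())
    if loss == 0 or norm_sq == 0:
        return 0.0
    margin = sum(weights.get(f, 0.0) * v for f, v in diff.items())
    # Closed-form solution of: min ||w' - w||^2  subject to  w' . diff >= loss
    tau = max(0.0, (loss - margin) / norm_sq)
    for f, v in diff.items():
        if v:
            weights[f] = weights.get(f, 0.0) + tau * v
    return tau


sent = [("<root>", "ROOT"), ("John", "NNP"), ("sleeps", "VBZ")]
weights = {}
tau = mira_update(weights, sent, gold=[-1, 2, 0], pred=[-1, 0, 1])
print(tau)  # 0.25
```

Here both heads are wrong (loss 2) and the feature difference has eight non-zero entries, so `tau = 2 / 8`. After the step the gold tree outscores the prediction by exactly the loss, so repeating the same update gives `tau = 0`.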

On the other hand, the parser supports several advanced features:

  • parallel features, i.e. enriching the parser input with a word-aligned sentence in another language
  • large-scale information, i.e. enriching the feature set with features based on the pointwise mutual information of word pairs in a large corpus (CzEng)
  • weighted/unweighted parser model interpolation
  • combination of several instances of the MSTperl parser (through the MST algorithm)
  • combination of several existing parses produced by any parsers (through the MST algorithm)
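The large-scale features listed above rely on pointwise mutual information, PMI(h, d) = log2(p(h, d) / (p(h) p(d))). A minimal Python sketch, assuming the word pairs are head-dependent pairs collected from a large parsed corpus (such as CzEng) and that PMI values are rounded into discrete bins to form parser features; both the pairing and the hypothetical `pmi_feature` binning are illustrative assumptions, not MSTperl's documented scheme.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """PMI (in bits) of every observed (head_word, dependent_word) pair."""
    total = len(pairs)
    joint = Counter(pairs)
    heads = Counter(h for h, _ in pairs)
    deps = Counter(d for _, d in pairs)
    # PMI(h, d) = log2(p(h, d) / (p(h) * p(d))) = log2(c(h, d) * N / (c(h) * c(d)))
    return {
        (h, d): math.log2(c * total / (heads[h] * deps[d]))
        for (h, d), c in joint.items()
    }


def pmi_feature(table, head_word, dep_word):
    """Hypothetical discretization of PMI into a string feature for the parser."""
    value = table.get((head_word, dep_word))
    return "pmi=NA" if value is None else f"pmi={round(value)}"


pairs = [("eat", "apple"), ("eat", "apple"), ("eat", "bread"),
         ("see", "apple"), ("see", "film")]
table = pmi_table(pairs)
print(pmi_feature(table, "see", "film"))   # pmi=1
print(pmi_feature(table, "see", "bread"))  # pmi=NA
```

Binning keeps the feature space small: the parser learns one weight per PMI bucket rather than one per word pair, which is what lets statistics from a large corpus generalize to the smaller treebank used for training.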

The MSTperl parser is tuned for parsing Czech. Trained models are available for Czech, English and German. We can train the parser for other languages on demand, or you can train it yourself -- the guidelines are part of the documentation.

The parser, together with detailed documentation, is available on CPAN (the CPAN version is currently not maintained). You can also download the parser from the LINDAT-Clarin repository (this version is currently more up-to-date than the CPAN one).

The most current version of the parser is always available on Github.


How to cite

Please choose one of the following two citations to refer to the MSTperl parser (the first one cites the tool itself and is preferred; the second one mainly describes the approach of using parallel features in parsing, but it is also the first description of the MSTperl parser):

@misc{mstperl,
 title = {{MSTperl} parser (2015-05-19)},
 author = {Rosa, Rudolf},
 url = {http://hdl.handle.net/11234/1-1480},
 copyright = {Artistic License 2.0},
 language = {ces, eng},
 year = {2015}
}

@inproceedings{mstperl:2012,
  booktitle = {Proceedings of Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation ({SSST}-6), {ACL}},
  title = {Using Parallel Features in Parsing of Machine-Translated Sentences for Correction of Grammatical Errors},
  author = {Rudolf Rosa and Ond{\v{r}}ej Du{\v{s}}ek and David Mare{\v{c}}ek and Martin Popel},
  year = {2012},
  publisher = {Association for Computational Linguistics},
  address = {Jeju, Korea},
  venue = {Jeju, Korea},
  pages = {39--48},
  isbn = {978-1-937284-38-1},
  url = {http://hdl.handle.net/11858/00-097C-0000-0023-7AEB-4},
}