Morseus 1.1

© 2006-2009 Daniel Zeman

Morseus stands for MORphemic SEgmentation, UnSupervised. It implements the system described in:
Zeman Daniel: Using Unsupervised Paradigm Acquisition for Prefixes (revised version), in Lecture Notes in Computer Science, Vol. 5706, Evaluating Systems for Multilingual and Multimodal Information Access - 9th Workshop of the Cross-Language Evaluation Forum, Copyright © Springer Verlag, Delos Network of Excellence for Digital Libraries, Berlin / Heidelberg, ISBN 978-3-642-04446-5, ISSN 0302-9743, pp. 983-990, 2009

@inproceedings{ biblio:ZeUsingUnsupervised2009,
booktitle = {Evaluating Systems for Multilingual and Multimodal Information Access -- 9th Workshop of the Cross-Language Evaluation Forum},
series = {Lecture Notes in Computer Science},
title = {Using Unsupervised Paradigm Acquisition for Prefixes (revised version)},
editor = {Carol Peters and Thomas Deselaers and Nicola Ferro and Julio Gonzalo and Gareth Jones and Mikko Kurimo and Thomas Mandl and Anselmo Pe{\~{n}}as and Vivien Petras},
author = {Daniel Zeman},
year = {2009},
publisher = {Springer Verlag},
organization = {Delos Network of Excellence for Digital Libraries},
institution = {Delos Network of Excellence for Digital Libraries},
address = {Berlin / Heidelberg},
venue = {Universitet i {\AA}rhus},
volume = {5706},
pages = {983--990},
isbn = {978-3-642-04446-5},
issn = {0302-9743},
}

Morseus implements a simple method of unsupervised morpheme segmentation of words in an unknown language. All that is needed is a raw text corpus (or a list of words) in the given language. The algorithm identifies word parts occurring in many words and interprets them as morpheme candidates (prefixes, stems and suffixes).
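The idea of treating frequently recurring word parts as morpheme candidates can be illustrated with a small sketch. It is not the Morseus code (which is written in Perl) and does not reproduce the method from the paper. The function name suffix_candidates, the min_stems threshold and the rule that a stem must itself occur as a word are all assumptions made only to keep the toy example readable.

```python
from collections import defaultdict


def suffix_candidates(words, min_stems=2):
    """Rank word-final strings by the number of distinct stems they follow.

    Simplifying assumption (not taken from Morseus): a split stem+suffix
    is counted only if the stem also occurs as a word of its own.
    """
    vocabulary = set(words)
    stems_by_suffix = defaultdict(set)
    for word in vocabulary:
        # Try every split point that leaves a non-empty stem and suffix.
        for i in range(1, len(word)):
            stem, suffix = word[:i], word[i:]
            if stem in vocabulary:
                stems_by_suffix[suffix].add(stem)
    # Keep suffixes seen with enough different stems.
    counts = {
        suffix: len(stems)
        for suffix, stems in stems_by_suffix.items()
        if len(stems) >= min_stems
    }
    # Most productive suffix first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# A hypothetical word list standing in for the raw corpus.
words = [
    "walk", "walks", "walked", "walking",
    "talk", "talks", "talked", "talking",
    "jump", "jumps", "jumped",
    "play", "plays",
]
candidates = suffix_candidates(words)
print(candidates)  # [('s', 4), ('ed', 3), ('ing', 2)]
```

On this word list the sketch proposes "s", "ed" and "ing" as suffix candidates, because each of them follows several different stems. Prefix candidates could be collected in the same way by counting word-initial strings instead.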

Installation

Morseus is implemented in Perl, so a Perl interpreter must be installed on your system before you can run it. Perl is available for most platforms and can be downloaded for free.

Unzip the contents of the package into one folder. The Makefile shows the order in which the scripts are to be invoked. Edit the Makefile so that its paths point to your data.

License

Copyright © 2006-2009 Daniel Zeman (zeman@ufal.mff.cuni.cz)

Morseus is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Please cite the above paper if you use Morseus in your academic work.

Acknowledgements

This research has been supported by the grant MSM 0021620838 of the Ministry of Education of the Czech Republic.

Download