Sunday, March 21, 2010
This Blog Has Moved
Wednesday, January 13, 2010
ElixirFM 1.1 Update + Wiki + API
The current version 1.1.927 includes important improvements in the performance of the system and comes with enhanced user and programming interfaces. Next to the ElixirFM Online Interface, the project also features:
- ElixirFM Wiki
- documentation for the project has been set up, giving computational linguists and interested developers the information they need to explore the ElixirFM system more deeply and use it in their applications
- ElixirFM API
- there is a powerful ElixirFM programming interface for Perl which allows you to invoke the elixir executable from your code and further parse and process the results easily
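To give a rough idea of what "invoking the elixir executable from your code" can look like, here is a minimal sketch in Python rather than Perl. It is not the ElixirFM API. The mode names come from this post, but the calling convention (mode as the first argument, text on standard input, results on standard output) and the helper names `build_command` and `run_elixir` are assumptions made for illustration only.

```python
import shutil
import subprocess

# Mode names as mentioned in this post; how they are passed is an assumption.
MODES = ("resolve", "inflect", "derive", "lookup")


def build_command(mode, executable="elixir"):
    """Build the argument list for one call (hypothetical calling convention)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return [executable, mode]


def run_elixir(mode, text, executable="elixir"):
    """Send text to the executable on stdin and return its raw stdout.

    Assumes the executable reads UTF-8 text from standard input and
    writes its results to standard output.
    """
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} was not found on PATH")
    completed = subprocess.run(
        build_command(mode, executable),
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout
```

Keeping `build_command` separate from `run_elixir` means the argument handling can be tested without the executable installed. Parsing the returned text is left out on purpose, since the output format is not described here.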
The ElixirFM lexicon has been extended and refined, and a number of words have been encoded in a way that makes their deep word structure more explicit. The sources of the lexicon plus the editing software are available freely upon request.
ElixirFM now operates more smoothly in all its modes. In particular, the resolve mode involves solution pruning, and its morphological analyses now comply with most linguistic constraints. Likewise, the online inflect and derive modes have been integrated with lookup, which makes word form generation much more intuitive and more enjoyable.
Tuesday, March 3, 2009
ElixirFM 1.1 Online Interface
In recent months, the ElixirFM project has undergone considerable improvement in various respects. We have worked most on developing the programming library and on refining the lexicon. On top of these essential components, we have built a user-friendly web application, the ElixirFM 1.1 Online Interface.
ElixirFM is a computational model of the morphology of Modern Written Arabic. In addition to the unique lexical resource and the other open-source functions of the implementation, it provides the user with four different modes of operation:
- Resolve provides tokenization and morphological analysis of the inserted text, even if you omit some symbols or do not spell everything correctly. You can experiment with entering the text not only in the original script and orthography, but also in other notations, including a purely phonetic transcription.
- Inflect lets you inflect words into the forms required by context. You only need to define the grammatical parameters of the expected word forms. You can either enter natural language descriptions, or you can specify the parameters using the positional morphological tags.
- Derive lets you derive words of similar meaning but different grammatical category. You only need to specify the desired grammatical categories, using either natural language descriptions or the positional morphological tags.
- Lookup can look up lexical entries by the citation form and nests of entries by the root. You can even search the dictionary using English.
Information on the programming libraries and the research context of the project is in part available in our papers. Yet, we would like to extend the documentation according to the requirements of the users, and would be happy to discuss any unclear issues with anyone interested.
Enjoy ... and let us know in case of questions or comments :)
Wednesday, July 9, 2008
The SourceForge open-source software repository offers a number of projects related to computational processing of Arabic:
- ElixirFM: High-level implementation of Functional Arabic Morphology
- Encode Arabic: Implementations for encodings of Arabic, in Haskell and Perl
- Buckwalter Arabic morphological analyzer
- Arabic WordNet: Multi-lingual concept dictionary mapping word senses in Arabic to those in the English Princeton WordNet
- Arabic morphology system that can generate and inflect Arabic verbs, derivative nouns, and gerunds
- Arabic Spellchecker Word Lists: Arabic word list for spell checkers
Users can register with SourceForge and subscribe to the monitoring service of each project to receive notifications of new updates.
Friday, May 2, 2008
A Word on the Million Words
Work on the new PADT 2.0 is now in progress. The recent developments are described in our submission to the LREC 2008 Workshop on Arabic & Local Languages:
- Prague Arabic Dependency Treebank: A Word on the Million Words
According to the paper, the expected contents of PADT 2.0 will include these annotations:
| PADT 2.0 | Corpus | Fun. Morphology | Dep. Syntax | Tectogrammatics | Notes |
|---|---|---|---|---|---|
| Total | 1,095,610 | 1,281,858 | 1,001,908 | 30,894 | merged annotations |
| Prague | 328,240 | 383,482 | 282,252 | 30,894 | original annotations |
| Penn | 767,370 | 898,376 | 719,656 | | converted annotations |

| Prague | Corpus | Fun. Morphology | Dep. Syntax | Tectogrammatics | Notes |
|---|---|---|---|---|---|
| AEP | 99,360 | 116,717 | 116,717 | 9,690 | Arabic English Parallel News |
| EAT | 48,371 | 55,097 | 55,097 | 13,934 | English-Arabic Treebank |
| ASB | 11,881 | 14,254 | 14,254 | | Arabic Gigaword |
| NHR | 21,445 | 25,329 | 12,613 | | Arabic Gigaword |
| HYT | 85,683 | 100,537 | 41,855 | 5,228 | Arabic Gigaword |
| XIN | 61,500 | 71,548 | 41,716 | 2,042 | Arabic Gigaword |

| Penn | Corpus | Fun. Morphology | Dep. Syntax | Tectogrammatics | Notes |
|---|---|---|---|---|---|
| 1v3 | 151,546 | 172,386 | 172,386 | | Penn Arabic Treebank 1v3 |
| 2v2 | 141,515 | 161,217 | 161,217 | | Penn Arabic Treebank 2v2 |
| 3v2 | 335,250 | 394,466 | 394,466 | | Penn Arabic Treebank 3v2 |
| 4v1 | 149,784 | 178,720 | | | Penn Arabic Treebank 4v1 |
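As a quick sanity check on the figures above, the short Python sketch below adds up the six Prague rows and compares them with the Prague line, then adds the Prague and Penn lines and compares them with the Total line. The numbers are copied from the table. The variable names and the choice to treat a blank cell as zero are assumptions made for this illustration.

```python
# Columns: (Corpus, Fun. Morphology, Dep. Syntax, Tectogrammatics).
# None marks a cell that is blank in the table.
PRAGUE_ROWS = {
    "AEP": (99360, 116717, 116717, 9690),
    "EAT": (48371, 55097, 55097, 13934),
    "ASB": (11881, 14254, 14254, None),
    "NHR": (21445, 25329, 12613, None),
    "HYT": (85683, 100537, 41855, 5228),
    "XIN": (61500, 71548, 41716, 2042),
}
PRAGUE = (328240, 383482, 282252, 30894)
PENN = (767370, 898376, 719656, None)
TOTAL = (1095610, 1281858, 1001908, 30894)


def column_sums(rows):
    """Sum each column over the given rows, counting blank cells as zero."""
    return tuple(sum(value or 0 for value in column) for column in zip(*rows))


# The six Prague rows add up to the Prague line ...
assert column_sums(PRAGUE_ROWS.values()) == PRAGUE
# ... and the Prague and Penn lines add up to the Total line.
assert column_sums([PRAGUE, PENN]) == TOTAL
```

Only the Prague rows and the top-level lines are checked here.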
Your suggestions and comments are very welcome. Thank you.