Tuesday, March 3, 2009
ElixirFM 1.1 Online Interface
In recent months, the ElixirFM project has improved considerably in several respects. We have worked most on developing the programming library and on refining the lexicon. On top of these essential components, we have built a user-friendly web application, the ElixirFM 1.1 Online Interface.
ElixirFM is a computational model of the morphology of Modern Written Arabic. It provides the user with four different modes of operation, in addition to the unique lexical resource and the other open-source functions of the implementation.
- Resolve
- provides tokenization and morphological analysis of the inserted text, even if you omit some symbols or do not spell everything correctly. You can experiment with entering the text not only in the original script and orthography, but also in other notations, including a purely phonetic transcription.
- Inflect
- lets you inflect words into the forms required by context. You only need to define the grammatical parameters of the expected word forms. You can either enter natural language descriptions, or you can specify the parameters using the positional morphological tags.
- Derive
- lets you derive words of similar meaning but different grammatical category. You only need to specify the desired grammatical categories, using either natural language descriptions or the positional morphological tags.
- Lookup
- can look up lexical entries by their citation form and nests of entries by their root. You can even search the dictionary using English.
The online interface includes example queries for each of the modes. It further incorporates several interactive tools to facilitate the browsing of the results returned by the system.
Information on the programming libraries and the research context of the project is partly available in our papers. We would nevertheless like to extend the documentation according to users' requirements, and would be happy to discuss any unclear issues with anyone interested.
ElixirFM is published under the GNU General Public License (GNU GPL 3). Everyone is welcome to participate in this project!
Enjoy ... and let us know in case of questions or comments :)
Wednesday, July 9, 2008
SourceForge Projects
The SourceForge open-source software repository offers a number of projects related to computational processing of Arabic:
- ElixirFM
- High-level implementation of Functional Arabic Morphology
- Encode Arabic
- Implementations for encodings of Arabic, in Haskell and Perl
- AraMorph
- Buckwalter Arabic morphological analyzer
- Arabic WordNet
- Multi-lingual concept dictionary mapping word senses in Arabic to those in the English Princeton WordNet
- Sarf
- Arabic morphology system that can generate and inflect Arabic verbs, derivative nouns, and gerunds
- Arabic Spellchecker Word Lists
- Arabic word list for spell checkers
Users can register with SourceForge and subscribe to the monitoring service of any of these projects in order to receive notifications of new updates.
Friday, May 2, 2008
A Word on the Million Words
Work on the new PADT 2.0 is now in progress. The recent developments are described in our submission to the LREC 2008 Workshop on Arabic & Local Languages:
- Prague Arabic Dependency Treebank: A Word on the Million Words
- [paper]
According to the paper, PADT 2.0 is expected to include the following annotations:
| PADT 2.0 | Corpus | Fun. Morphology | Dep. Syntax | Tectogrammatics | Notes |
|---|---|---|---|---|---|
| Total | 1,095,610 | 1,281,858 | 1,001,908 | 30,894 | merged annotations |
| Prague | 328,240 | 383,482 | 282,252 | 30,894 | original annotations |
| Penn | 767,370 | 898,376 | 719,656 | | converted annotations |

| Prague | Corpus | Fun. Morphology | Dep. Syntax | Tectogrammatics | Notes |
|---|---|---|---|---|---|
| AEP | 99,360 | 116,717 | 116,717 | 9,690 | Arabic English Parallel News |
| EAT | 48,371 | 55,097 | 55,097 | 13,934 | English-Arabic Treebank |
| ASB | 11,881 | 14,254 | 14,254 | | Arabic Gigaword |
| NHR | 21,445 | 25,329 | 12,613 | | Arabic Gigaword |
| HYT | 85,683 | 100,537 | 41,855 | 5,228 | Arabic Gigaword |
| XIN | 61,500 | 71,548 | 41,716 | 2,042 | Arabic Gigaword |

| Penn | Corpus | Fun. Morphology | Dep. Syntax | Tectogrammatics | Notes |
|---|---|---|---|---|---|
| 1v3 | 151,546 | 172,386 | 172,386 | | Penn Arabic Treebank 1v3 |
| 2v2 | 141,515 | 161,217 | 161,217 | | Penn Arabic Treebank 2v2 |
| 3v2 | 335,250 | 394,466 | 394,466 | | Penn Arabic Treebank 3v2 |
| 4v1 | 149,784 | 178,720 | | | Penn Arabic Treebank 4v1 |
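As a quick consistency check on the figures above, the following Python sketch adds up the six Prague subcorpora column by column and compares the result with the Prague row, then adds the Prague and Penn rows and compares them with the Total row. The names `PRAGUE_PARTS` and `column_sums` are made up for this illustration, and treating an empty cell as zero is an assumption. The sketch covers only the Prague part and the summary rows.

```python
# Figures copied from the PADT 2.0 overview above, in the column order
# (Corpus, Fun. Morphology, Dep. Syntax, Tectogrammatics).
# Assumption: an empty cell is counted as 0.
PRAGUE_PARTS = {
    "AEP": (99_360, 116_717, 116_717, 9_690),
    "EAT": (48_371, 55_097, 55_097, 13_934),
    "ASB": (11_881, 14_254, 14_254, 0),
    "NHR": (21_445, 25_329, 12_613, 0),
    "HYT": (85_683, 100_537, 41_855, 5_228),
    "XIN": (61_500, 71_548, 41_716, 2_042),
}
PRAGUE = (328_240, 383_482, 282_252, 30_894)
PENN = (767_370, 898_376, 719_656, 0)
TOTAL = (1_095_610, 1_281_858, 1_001_908, 30_894)


def column_sums(rows):
    """Sum a collection of equally long rows column by column."""
    return tuple(sum(column) for column in zip(*rows))


if __name__ == "__main__":
    print("Prague parts add up:", column_sums(PRAGUE_PARTS.values()) == PRAGUE)
    print("Prague + Penn = Total:", column_sums([PRAGUE, PENN]) == TOTAL)
```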
Your suggestions and comments are very welcome. Thank you.
Friday, October 12, 2007
Resolve Online Interface
The resolve function of ElixirFM has been made accessible via this online interface.
You can enter Arabic words in various notations, including the original orthography and the most popular transliterations. Symbols for vowels or diacritics can be omitted. The words will be analyzed for their inflectional features as well as their morphological structure.
Example requests are provided.
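To give a rough idea of what omitting vowel symbols means in practice, here is a small Python sketch that compares two spellings after dropping the diacritics. It is only an illustration under our own assumptions and not how ElixirFM resolves input: the function names `strip_diacritics` and `same_skeleton` are hypothetical, and the sketch simply removes every Unicode nonspacing mark before comparing.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop nonspacing combining marks (Unicode category Mn).

    This category includes the Arabic short-vowel signs, shadda and sukun.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def same_skeleton(typed: str, vocalized: str) -> bool:
    """True if both spellings agree once the diacritics are ignored."""
    return strip_diacritics(typed) == strip_diacritics(vocalized)


if __name__ == "__main__":
    # kaf + fatha, ta + fatha, ba + fatha: the fully vocalized "kataba"
    vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
    # the same three consonants typed without any vowel signs
    bare = "\u0643\u062A\u0628"
    print(same_skeleton(bare, vocalized))
```

A real analyzer has to do more than this, for instance respect the diacritics that the user did type, but the comparison shows why an unvocalized word can still be matched against fully vocalized forms.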
Enjoy ... and let us know in case of questions or comments :)
Wednesday, September 19, 2007
Functional Arabic Morphology Thesis
I am pleased to announce that I have completed and defended the doctoral thesis on the novel computational model of Arabic inflectional and derivational morphology:
The ElixirFM implementation is available via SourceForge.
We intend to improve the work further and integrate ElixirFM closely with MorphoTrees as well as with both levels of syntactic representation in the Prague Arabic Dependency Treebank.