<?xml version='1.0' encoding='UTF-8'?><rss xmlns:atom='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' version='2.0'><channel><atom:id>tag:blogger.com,1999:blog-20445582</atom:id><lastBuildDate>Sun, 21 Mar 2010 14:57:28 +0000</lastBuildDate><title>Prague Arabic Dependency Treebank ++</title><description>Data and Tools for Computational Processing of Arabic</description><link>http://ufal.mff.cuni.cz/padt/online/blogger.html</link><managingEditor>noreply@blogger.com (Otakar Smrz)</managingEditor><generator>Blogger</generator><openSearch:totalResults>19</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-3345329685951442425</guid><pubDate>Sun, 21 Mar 2010 14:57:00 +0000</pubDate><atom:updated>2010-03-21T15:57:28.559+01:00</atom:updated><title>This Blog Has Moved</title><description>
       This blog is now located at http://padt-online.blogspot.com/.
       You will be automatically redirected in 30 seconds, or you may click &lt;a href='http://padt-online.blogspot.com/'&gt;here&lt;/a&gt;.

       For feed subscribers, please update your feed subscriptions to
       http://padt-online.blogspot.com/feeds/posts/default.
</description><link>http://ufal.mff.cuni.cz/padt/online/2010/03/this-blog-has-moved.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-846588731259075629</guid><pubDate>Wed, 13 Jan 2010 00:38:00 +0000</pubDate><atom:updated>2010-01-13T01:38:40.167+01:00</atom:updated><title>ElixirFM 1.1 Update + Wiki + API</title><description>&lt;p&gt;The &lt;a href="http://elixir-fm.wiki.sourceforge.net/"&gt;ElixirFM&lt;/a&gt; Functional Arabic Morphology project has released an update of its libraries, executables, data, and documentation at &lt;a href="http://sourceforge.net/projects/elixir-fm/files/"&gt;SourceForge&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The current version &lt;a href="https://sourceforge.net/projects/elixir-fm/files/"&gt;1.1.927&lt;/a&gt; includes important performance improvements and comes with enhanced user and programming interfaces. Besides the &lt;a href="http://elixir-fm.sourceforge.net/"&gt;ElixirFM Online Interface&lt;/a&gt;, the project also features:&lt;/p&gt;

&lt;dl&gt;
&lt;dt&gt;&lt;a href="http://elixir-fm.wiki.sourceforge.net/"&gt;ElixirFM Wiki&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;documentation for the project has been set up; it is aimed at computational linguists and developers who would like to explore the ElixirFM system more deeply and use it in their applications&lt;/dd&gt;
&lt;dt&gt;&lt;a href="http://sourceforge.net/apps/trac/elixir-fm/wiki/ElixirPerl"&gt;ElixirFM API&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;a programming interface for Perl that lets you invoke the &lt;code&gt;elixir&lt;/code&gt; executable from your code and then parse and process the results easily&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;The ElixirFM lexicon has been extended and refined, and a number of words have been encoded in a way that makes their deep word structure more explicit. The lexicon sources and the editing software are freely available upon request.&lt;/p&gt;

&lt;p&gt;ElixirFM now operates more smoothly in all its modes. In particular, the &lt;a href="http://quest.ms.mff.cuni.cz/cgi-bin/elixir/index.fcgi?mode=resolve"&gt;resolve&lt;/a&gt; mode involves solution pruning, and its morphological analyses now comply with most linguistic constraints. Likewise, the online &lt;a href="http://quest.ms.mff.cuni.cz/cgi-bin/elixir/index.fcgi?mode=inflect"&gt;inflect&lt;/a&gt; and &lt;a href="http://quest.ms.mff.cuni.cz/cgi-bin/elixir/index.fcgi?mode=derive"&gt;derive&lt;/a&gt; modes have been integrated with &lt;a href="http://quest.ms.mff.cuni.cz/cgi-bin/elixir/index.fcgi?mode=lookup"&gt;lookup&lt;/a&gt;, which makes word form generation much more intuitive and more enjoyable.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://sourceforge.net/projects/elixir-fm/"&gt;ElixirFM&lt;/a&gt; is published under the GNU General Public License, &lt;a href="http://www.gnu.org/licenses/"&gt;GNU GPL 3&lt;/a&gt;. Everyone is welcome to participate in this project!&lt;/p&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2009/12/elixirfm-11-update-wiki-api.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-8900509878878964707</guid><pubDate>Tue, 03 Mar 2009 22:52:00 +0000</pubDate><atom:updated>2009-03-05T22:55:52.052+01:00</atom:updated><title>ElixirFM 1.1 Online Interface</title><description>&lt;p&gt;In recent months, the &lt;a href="http://sourceforge.net/projects/elixir-fm/"&gt;ElixirFM&lt;/a&gt; project has improved considerably in various respects. We have worked most on developing the programming library and on refining the lexicon. On top of these essential components, we have built a user-friendly web application, the &lt;a href="http://quest.ms.mff.cuni.cz/elixir/"&gt;ElixirFM 1.1 Online Interface&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;ElixirFM is a &lt;a href="http://ufal.mff.cuni.cz/%7Esmrz/elixir-thesis.pdf"&gt;computational model&lt;/a&gt; of the morphology of Modern Written Arabic. It provides the user with four different modes of operation, in addition to the unique lexical resource and the other open-source functions of the implementation.
&lt;/p&gt;

&lt;dl&gt;
&lt;dt&gt;&lt;a href="http://quest.ms.mff.cuni.cz/cgi-bin/elixir/resolve/index.fcgi"&gt;Resolve&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;provides tokenization and morphological analysis of the inserted text, even if you omit some symbols or do not spell everything correctly. You can experiment with entering the text not only in the original script and orthography, but also in other notations, including a purely phonetic transcription.&lt;/dd&gt;
&lt;dt&gt;&lt;a href="http://quest.ms.mff.cuni.cz/cgi-bin/elixir/inflect/index.fcgi"&gt;Inflect&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;lets you inflect words into the forms required by context. You only need to define the grammatical parameters of the expected word forms. You can either enter natural language descriptions, or you can specify the parameters using the positional morphological tags.&lt;/dd&gt;
&lt;dt&gt;&lt;a href="http://quest.ms.mff.cuni.cz/cgi-bin/elixir/derive/index.fcgi"&gt;Derive&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;lets you derive words of similar meaning but different grammatical category. You only need to specify the desired grammatical categories, using either natural language descriptions or the positional morphological tags.&lt;/dd&gt;
&lt;dt&gt;&lt;a href="http://quest.ms.mff.cuni.cz/cgi-bin/elixir/lookup/index.fcgi"&gt;Lookup&lt;/a&gt;&lt;/dt&gt;&lt;dd&gt;can look up lexical entries by the citation form and nests of entries by the root. You can even search the dictionary using English.&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;
The &lt;a href="http://quest.ms.mff.cuni.cz/elixir/"&gt;online interface&lt;/a&gt; includes example queries for each of the modes. It further incorporates several &lt;a href="http://www.yamli.com/editor/"&gt;interactive tools&lt;/a&gt; to facilitate the browsing of the results returned by the system.
&lt;/p&gt;

&lt;p&gt;
Information on the &lt;a href="http://ufal.mff.cuni.cz/~smrz/ElixirFM/"&gt;programming libraries&lt;/a&gt; and the research context of the project is in part available in &lt;a href="https://ufal.mff.cuni.cz:8443/bib/?section=publications"&gt;our papers&lt;/a&gt;. We would still like to extend the documentation according to users' needs, and we are happy to discuss any unclear issues with anyone interested.
&lt;/p&gt;

&lt;p&gt;
&lt;a href="http://sourceforge.net/projects/elixir-fm/"&gt;ElixirFM&lt;/a&gt; is published under the GNU General Public License, &lt;a href="http://www.gnu.org/licenses/"&gt;GNU GPL 3&lt;/a&gt;. Everyone is welcome to participate in this project!
&lt;/p&gt;

&lt;p&gt;Enjoy ... and let us know in case of questions or comments :)&lt;/p&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2009/03/elixirfm-11-online-interface.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-5942873186572901813</guid><pubDate>Wed, 09 Jul 2008 09:20:00 +0000</pubDate><atom:updated>2008-07-09T11:29:28.909+02:00</atom:updated><title>SourceForge Projects</title><description>&lt;p&gt;The &lt;a href="http://sourceforge.net/"&gt;SourceForge&lt;/a&gt; open-source software repository offers a number of projects related to computational processing of Arabic:&lt;/p&gt;

&lt;dl&gt;
&lt;dt&gt;&lt;a href="http://sourceforge.net/projects/elixir-fm/"&gt;ElixirFM&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;High-level implementation of Functional Arabic Morphology&lt;/dd&gt;

&lt;dt&gt;&lt;a href="http://sourceforge.net/projects/encode-arabic/"&gt;Encode Arabic&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;Implementations for encodings of Arabic, in Haskell and Perl&lt;/dd&gt;

&lt;dt&gt;&lt;a href="http://sourceforge.net/projects/aramorph/"&gt;AraMorph&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;Buckwalter Arabic morphological analyzer&lt;/dd&gt;

&lt;dt&gt;&lt;a href="http://sourceforge.net/projects/awnbrowser/"&gt;Arabic WordNet&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;Multi-lingual concept dictionary mapping word senses in Arabic to those in the English Princeton WordNet&lt;/dd&gt;

&lt;dt&gt;&lt;a href="http://sourceforge.net/projects/sarf/"&gt;Sarf&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;Arabic morphology system that can generate and inflect Arabic verbs, derivative nouns, and gerunds&lt;/dd&gt;

&lt;dt&gt;&lt;a href="http://sourceforge.net/projects/arabic-spell/"&gt;Arabic Spellchecker Word Lists&lt;/a&gt;&lt;/dt&gt;
&lt;dd&gt;Arabic word list for spell checkers&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;Users can register with SourceForge and subscribe to the monitoring service of any project to receive notifications of new updates.&lt;/p&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2007/01/sourceforge-projects.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-6218235849502838837</guid><pubDate>Fri, 02 May 2008 13:34:00 +0000</pubDate><atom:updated>2008-05-04T11:07:04.710+02:00</atom:updated><title>A Word on the Million Words</title><description>&lt;p&gt;Work on the new PADT 2.0 is now in progress. The recent developments are described in our submission to the &lt;a href="http://www.lrec-conf.org/lrec2008/Workshops.html"&gt;LREC 2008 Workshop on Arabic &amp;amp; Local Languages&lt;/a&gt;:&lt;/p&gt;

&lt;dl&gt;
&lt;dt&gt;Prague Arabic Dependency Treebank: A Word on the Million Words&lt;/dt&gt;
&lt;dd&gt;&lt;a href="http://ufal.mff.cuni.cz/~smrz/LREC2008/padt-lrec.pdf"&gt;[paper]&lt;/a&gt;
&lt;!-- &lt;a href="http://ufal.mff.cuni.cz/~smrz/CCISSA2006/ccissa-slides.pdf"&gt;[slides]&lt;/a&gt; --&gt;&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;According to the paper, the expected contents of PADT 2.0 will include these annotations:&lt;/p&gt;

&lt;blockquote&gt;
&lt;table border="0" cellspacing="6pt" cellpadding="3pt"&gt;

&lt;tr align="left"&gt;
&lt;th&gt;PADT 2.0&lt;/th&gt;
&lt;th&gt;Corpus&lt;/th&gt;
&lt;th&gt;Functional Morphology&lt;/th&gt;
&lt;th&gt;Dependency Syntax&lt;/th&gt;
&lt;th&gt;Tectogrammatics&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Total&lt;/td&gt;
&lt;td align="right"&gt;1,095,610&lt;/td&gt;
&lt;td align="right"&gt;1,281,858&lt;/td&gt;
&lt;td align="right"&gt;1,001,908&lt;/td&gt;
&lt;td align="right"&gt;30,894&lt;/td&gt;
&lt;td&gt;merged annotations&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Prague&lt;/td&gt;
&lt;td align="right"&gt;328,240&lt;/td&gt;
&lt;td align="right"&gt;383,482&lt;/td&gt;
&lt;td align="right"&gt;282,252&lt;/td&gt;
&lt;td align="right"&gt;30,894&lt;/td&gt;
&lt;td&gt;original annotations&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Penn&lt;/td&gt;
&lt;td align="right"&gt;767,370&lt;/td&gt;
&lt;td align="right"&gt;898,376&lt;/td&gt;
&lt;td align="right"&gt;719,656&lt;/td&gt;
&lt;td align="right"&gt;&lt;/td&gt;
&lt;td&gt;converted annotations&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="left"&gt;
&lt;th&gt;Prague&lt;/th&gt;
&lt;th&gt;Corpus&lt;/th&gt;
&lt;th&gt;Functional Morphology&lt;/th&gt;
&lt;th&gt;Dependency Syntax&lt;/th&gt;
&lt;th&gt;Tectogrammatics&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;AEP&lt;/td&gt;
&lt;td align="right"&gt;99,360&lt;/td&gt;
&lt;td align="right"&gt;116,717&lt;/td&gt;
&lt;td align="right"&gt;116,717&lt;/td&gt;
&lt;td align="right"&gt;9,690&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004T18"&gt;Arabic English Parallel News&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;EAT&lt;/td&gt;
&lt;td align="right"&gt;48,371&lt;/td&gt;
&lt;td align="right"&gt;55,097&lt;/td&gt;
&lt;td align="right"&gt;55,097&lt;/td&gt;
&lt;td align="right"&gt;13,934&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T10"&gt;English-Arabic Treebank&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;ASB&lt;/td&gt;
&lt;td align="right"&gt;11,881&lt;/td&gt;
&lt;td align="right"&gt;14,254&lt;/td&gt;
&lt;td align="right"&gt;14,254&lt;/td&gt;
&lt;td align="right"&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007T40"&gt;Arabic Gigaword&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;NHR&lt;/td&gt;
&lt;td align="right"&gt;21,445&lt;/td&gt;
&lt;td align="right"&gt;25,329&lt;/td&gt;
&lt;td align="right"&gt;12,613&lt;/td&gt;
&lt;td align="right"&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007T40"&gt;Arabic Gigaword&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;HYT&lt;/td&gt;
&lt;td align="right"&gt;85,683&lt;/td&gt;
&lt;td align="right"&gt;100,537&lt;/td&gt;
&lt;td align="right"&gt;41,855&lt;/td&gt;
&lt;td align="right"&gt;5,228&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007T40"&gt;Arabic Gigaword&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;XIN&lt;/td&gt;
&lt;td align="right"&gt;61,500&lt;/td&gt;
&lt;td align="right"&gt;71,548&lt;/td&gt;
&lt;td align="right"&gt;41,716&lt;/td&gt;
&lt;td align="right"&gt;2,042&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2007T40"&gt;Arabic Gigaword&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr align="left"&gt;
&lt;th&gt;Penn&lt;/th&gt;
&lt;th&gt;Corpus&lt;/th&gt;
&lt;th&gt;Functional Morphology&lt;/th&gt;
&lt;th&gt;Dependency Syntax&lt;/th&gt;
&lt;th&gt;Tectogrammatics&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;1v3&lt;/td&gt;
&lt;td align="right"&gt;151,546&lt;/td&gt;
&lt;td align="right"&gt;172,386&lt;/td&gt;
&lt;td align="right"&gt;172,386&lt;/td&gt;
&lt;td align="right"&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T02"&gt;Penn Arabic Treebank 1v3&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;2v2&lt;/td&gt;
&lt;td align="right"&gt;141,515&lt;/td&gt;
&lt;td align="right"&gt;161,217&lt;/td&gt;
&lt;td align="right"&gt;161,217&lt;/td&gt;
&lt;td align="right"&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004T02"&gt;Penn Arabic Treebank 2v2&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;3v2&lt;/td&gt;
&lt;td align="right"&gt;335,250&lt;/td&gt;
&lt;td align="right"&gt;394,466&lt;/td&gt;
&lt;td align="right"&gt;394,466&lt;/td&gt;
&lt;td align="right"&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T20"&gt;Penn Arabic Treebank 3v2&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;4v1&lt;/td&gt;
&lt;td align="right"&gt;149,784&lt;/td&gt;
&lt;td align="right"&gt;178,720&lt;/td&gt;
&lt;td align="right"&gt;&lt;/td&gt;
&lt;td align="right"&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T30"&gt;Penn Arabic Treebank 4v1&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;/table&gt;
&lt;/blockquote&gt;
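
&lt;p&gt;As a quick consistency check on the figures above, the following Python sketch re-adds the columns. The numbers are copied from the table; the names &lt;code&gt;PRAGUE_ROWS&lt;/code&gt; and &lt;code&gt;column_sums&lt;/code&gt; are ours, chosen for illustration only. It checks that the six Prague sub-corpora add up to the Prague row, and that the Prague and Penn rows add up to the Total row.&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
```python
# Figures copied from the table above, in column order:
# corpus, functional morphology, dependency syntax, tectogrammatics.
# None stands for an empty cell.
PRAGUE_ROWS = {
    "AEP": (99360, 116717, 116717, 9690),
    "EAT": (48371, 55097, 55097, 13934),
    "ASB": (11881, 14254, 14254, None),
    "NHR": (21445, 25329, 12613, None),
    "HYT": (85683, 100537, 41855, 5228),
    "XIN": (61500, 71548, 41716, 2042),
}
PRAGUE = (328240, 383482, 282252, 30894)
PENN = (767370, 898376, 719656, None)
TOTAL = (1095610, 1281858, 1001908, 30894)


def column_sums(rows):
    """Sum each column of the given rows, counting empty cells as zero."""
    return tuple(sum(value or 0 for value in column) for column in zip(*rows))


# Do the Prague sub-corpora add up to the Prague row?
print(column_sums(PRAGUE_ROWS.values()) == PRAGUE)
# Do the Prague and Penn rows add up to the Total row?
print(column_sums([PRAGUE, PENN]) == TOTAL)
```
&lt;/pre&gt;
&lt;/blockquote&gt;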

&lt;p&gt;Your suggestions and comments are very welcome. Thank you.&lt;/p&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2008/02/word-on-million-words.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-4743626339691147848</guid><pubDate>Thu, 11 Oct 2007 23:19:00 +0000</pubDate><atom:updated>2007-10-12T01:56:24.784+02:00</atom:updated><title>Resolve Online Interface</title><description>&lt;p&gt;The &lt;code&gt;resolve&lt;/code&gt; function of &lt;a href="http://ufal.mff.cuni.cz/~smrz/elixir-thesis.pdf"&gt;ElixirFM&lt;/a&gt; has been made accessible via this &lt;a href="http://quest.ms.mff.cuni.cz/cgi-bin/elixir/resolve/index.fcgi"&gt;online interface&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can enter Arabic words in various notations, including the genuine orthography and the most popular transliterations. Symbols for vowels or diacritics can be omitted. The words will be analyzed as to their inflectional features as well as morphological structure.&lt;/p&gt;

&lt;p&gt;Example requests are provided.&lt;/p&gt;

&lt;p&gt;Enjoy ... and let us know in case of questions or comments :)&lt;/p&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2007/10/resolve-online-interface.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-2122937836364288959</guid><pubDate>Wed, 19 Sep 2007 08:05:00 +0000</pubDate><atom:updated>2009-12-21T22:30:34.815+01:00</atom:updated><title>Functional Arabic Morphology Thesis</title><description>&lt;p&gt;I am pleased to announce that I have completed and defended my doctoral thesis on a novel computational model of Arabic inflectional and derivational morphology:&lt;/p&gt;

&lt;dl&gt;
&lt;dt&gt;Functional Arabic Morphology. Formal System and Implementation&lt;/dt&gt;
&lt;dd&gt;&lt;a href="http://ufal.mff.cuni.cz/~smrz/elixir-thesis.pdf"&gt;[thesis]&lt;/a&gt;
&lt;a href="http://ufal.mff.cuni.cz/~smrz/elixir-summary.pdf"&gt;[summary]&lt;/a&gt;
&lt;a href="http://ufal.mff.cuni.cz/~smrz/elixir-slides.pdf"&gt;[slides]&lt;/a&gt;&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;The &lt;a href="http://sourceforge.net/projects/elixir-fm/"&gt;ElixirFM&lt;/a&gt; implementation is available via &lt;a href="http://sourceforge.net/projects/elixir-fm/"&gt;SourceForge&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We intend to improve the work further and integrate &lt;a href="http://sourceforge.net/docman/display_doc.php?docid=38247&amp;group_id=181087"&gt;ElixirFM&lt;/a&gt; closely with
&lt;a href="http://sourceforge.net/docman/display_doc.php?docid=39418&amp;group_id=181087"&gt;MorphoTrees&lt;/a&gt; as well as with both levels of syntactic representation in the
Prague Arabic Dependency Treebank.&lt;/p&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2007/09/functional-arabic-morphology-thesis.html</link><author>noreply@blogger.com (Otakar Smrz)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-4437985004691638489</guid><pubDate>Thu, 03 May 2007 16:26:00 +0000</pubDate><atom:updated>2007-05-18T13:18:33.758+02:00</atom:updated><title>ElixirFM and Functional Arabic Morphology</title><description>&lt;p&gt;The ACL 2007 Workshop on &lt;a href="http://ufal.mff.cuni.cz/acl2007/workshops/"&gt;Computational Approaches to Semitic Languages: Common Issues and Resources&lt;/a&gt; has accepted our submission:&lt;/p&gt;

&lt;dl&gt;
&lt;dt&gt;ElixirFM &amp;#8212; Implementation of Functional Arabic Morphology&lt;/dt&gt;
&lt;dd&gt;&lt;a href="http://ufal.mff.cuni.cz/~smrz/ACL2007/elixir-acl.pdf"&gt;[paper]&lt;/a&gt;
&lt;a href="http://sourceforge.net/projects/elixir-fm/"&gt;[www]&lt;/a&gt;&lt;!--&lt;a href="http://ufal.mff.cuni.cz/~smrz/ACL2007/elixir-acl.pdf"&gt;[slides]&lt;/a&gt;--&gt;&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;The article is in its final version for the proceedings, but your suggestions and comments are still very welcome. Thank you.&lt;/p&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2007/05/elixirfm-and-functional-arabic.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-1555080919265146260</guid><pubDate>Wed, 17 Jan 2007 00:51:00 +0000</pubDate><atom:updated>2007-01-22T16:38:13.764+01:00</atom:updated><title>CoNLL Shared Task 2007</title><description>&lt;p&gt;The &lt;a href="http://nextens.uvt.nl/depparse-wiki/SharedTaskWebsite"&gt;CoNLL Shared Task 2007&lt;/a&gt; has been &lt;a href="http://nextens.uvt.nl/depparse-wiki/FirstCall"&gt;announced&lt;/a&gt;. The &lt;em&gt;extended data&lt;/em&gt; of PADT will be used in the competition; their rough characteristics are as follows:&lt;/p&gt;

&lt;blockquote&gt;
&lt;table border="0" cellspacing="6pt" cellpadding="3pt"&gt;
&lt;thead style="font-weight:bold"&gt;

&lt;tr&gt;
&lt;td&gt;Total&lt;/td&gt;
&lt;td align="right"&gt;116,800 tokens&lt;/td&gt;
&lt;td align="right"&gt;3,044 trees&lt;/td&gt;
&lt;td align="right"&gt;378 files&lt;/td&gt;
&lt;td&gt;annotated on the levels of analytical syntax and morphology&lt;/td&gt;
&lt;/tr&gt;

&lt;/thead&gt;
&lt;tbody&gt;

&lt;tr&gt;
&lt;td&gt;AEP&lt;/td&gt;
&lt;td align="right"&gt;9,500 tokens&lt;/td&gt;
&lt;td align="right"&gt;242 trees&lt;/td&gt;
&lt;td align="right"&gt;29 files&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004T18"&gt;Arabic English Parallel News&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;AFE&lt;/td&gt;
&lt;td align="right"&gt;13,000 tokens&lt;/td&gt;
&lt;td align="right"&gt;411 trees&lt;/td&gt;
&lt;td align="right"&gt;48 files&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T07"&gt;Arabic 10K-word English Translation&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;ALH&lt;/td&gt;
&lt;td align="right"&gt;14,500 tokens&lt;/td&gt;
&lt;td align="right"&gt;312 trees&lt;/td&gt;
&lt;td align="right"&gt;41 files&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T02"&gt;Arabic Gigaword&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;ANN&lt;/td&gt;
&lt;td align="right"&gt;12,500 tokens&lt;/td&gt;
&lt;td align="right"&gt;209 trees&lt;/td&gt;
&lt;td align="right"&gt;17 files&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T02"&gt;Arabic Gigaword&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;HYT&lt;/td&gt;
&lt;td align="right"&gt;25,500 tokens&lt;/td&gt;
&lt;td align="right"&gt;457 trees&lt;/td&gt;
&lt;td align="right"&gt;47 files&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T02"&gt;Arabic Gigaword&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;XIA&lt;/td&gt;
&lt;td align="right"&gt;26,500 tokens&lt;/td&gt;
&lt;td align="right"&gt;888 trees&lt;/td&gt;
&lt;td align="right"&gt;111 files&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T02"&gt;Arabic Gigaword&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;XNH&lt;/td&gt;
&lt;td align="right"&gt;15,000 tokens&lt;/td&gt;
&lt;td align="right"&gt;525 trees&lt;/td&gt;
&lt;td align="right"&gt;85 files&lt;/td&gt;
&lt;td&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T02"&gt;Arabic Gigaword&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;/tbody&gt;
&lt;/table&gt;
&lt;/blockquote&gt;

&lt;p&gt;This year's data differ from last year's set in two important respects:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
The extent and the quality of annotations have improved. We added new data sources, especially AEP and AFE (with paragraph-aligned translations available). Other data sources are the newspaper texts published by Al Hayat, An Nahar, Ummah Press Service, and Xinhua.
&lt;/li&gt;
&lt;li&gt;
The morphology of the former data has been reannotated using &lt;a href="http://sourceforge.net/docman/display_doc.php?docid=39418&amp;group_id=181087"&gt;MorphoTrees&lt;/a&gt;, so the format of all data is now consistent and the morphological tags are considerably more informative. Lemmas and glosses based on the &lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004L02"&gt;Buckwalter lexicon&lt;/a&gt; are also provided.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The morphological class identifiers consist of the part-of-speech category and its refinement, with the following meanings:&lt;/p&gt;

&lt;blockquote&gt;
&lt;table border="0" cellspacing="6pt" cellpadding="3pt"&gt;
&lt;tbody&gt;

&lt;tr&gt;
&lt;td&gt;&lt;code&gt;VI VP VC&lt;/code&gt;&lt;/td&gt;&lt;td&gt;imperfect, perfect, and imperative verb forms&lt;/td&gt;
&lt;td&gt;&lt;code&gt;N- A- D-&lt;/code&gt;&lt;/td&gt;&lt;td&gt;nouns, adjectives, and adverbs&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;code&gt;C- P- I-&lt;/code&gt;&lt;/td&gt;&lt;td&gt;conjunctions, prepositions, interjections&lt;/td&gt;
&lt;td&gt;&lt;code&gt;G- Q- Y-&lt;/code&gt;&lt;/td&gt;&lt;td&gt;graphical symbols, numbers, abbreviations&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;code&gt;F- FN FI&lt;/code&gt;&lt;/td&gt;&lt;td&gt;particles, esp. negative and interrogative&lt;/td&gt;
&lt;td&gt;&lt;code&gt;S- SD SR&lt;/code&gt;&lt;/td&gt;&lt;td&gt;pronouns, esp. demonstrative and relative&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--&lt;/code&gt;&lt;/td&gt;&lt;td&gt;isolated definite articles&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Z-&lt;/code&gt;&lt;/td&gt;&lt;td&gt;proper names&lt;/td&gt;
&lt;/tr&gt;

&lt;/tbody&gt;
&lt;/table&gt;
&lt;/blockquote&gt;
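
&lt;p&gt;For readers who process the data programmatically, the table can be turned into a small lookup. This is only a sketch in Python: the dictionary contents come from the table above, while the names &lt;code&gt;CLASSES&lt;/code&gt; and &lt;code&gt;describe_class&lt;/code&gt; are hypothetical and not part of any PADT tool.&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
```python
# Two-character class identifiers exactly as listed in the table above.
CLASSES = {
    "VI": "imperfect verb form",
    "VP": "perfect verb form",
    "VC": "imperative verb form",
    "N-": "noun",
    "A-": "adjective",
    "D-": "adverb",
    "C-": "conjunction",
    "P-": "preposition",
    "I-": "interjection",
    "G-": "graphical symbol",
    "Q-": "number",
    "Y-": "abbreviation",
    "F-": "particle",
    "FN": "negative particle",
    "FI": "interrogative particle",
    "S-": "pronoun",
    "SD": "demonstrative pronoun",
    "SR": "relative pronoun",
    "--": "isolated definite article",
    "Z-": "proper name",
}


def describe_class(identifier):
    """Return the description of a class identifier, or a fallback string."""
    return CLASSES.get(identifier, "unknown class")


for identifier in ("VP", "SR", "--"):
    print(identifier, describe_class(identifier))
```
&lt;/pre&gt;
&lt;/blockquote&gt;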

&lt;p&gt;The attributes and morphosyntactic features associated with individual tokens, i.e. the nodes in the dependency tree, include the following kinds of information. A feature can be linguistically applicable but unresolved by the annotation, in which case it is not listed with the token:&lt;/p&gt;

&lt;blockquote&gt;
&lt;table border="0" cellspacing="6pt" cellpadding="3pt"&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Mood&lt;/td&gt;&lt;td&gt;&lt;code&gt;I&lt;/code&gt;ndicative, &lt;code&gt;S&lt;/code&gt;ubjunctive, or &lt;code&gt;J&lt;/code&gt;ussive of imperfect verbs, with &lt;code&gt;D&lt;/code&gt; if undecided between &lt;code&gt;S&lt;/code&gt; and &lt;code&gt;J&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Voice&lt;/td&gt;&lt;td&gt;&lt;code&gt;A&lt;/code&gt;ctive or &lt;code&gt;P&lt;/code&gt;assive&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Person&lt;/td&gt;&lt;td&gt;&lt;code&gt;1&lt;/code&gt; speaker, &lt;code&gt;2&lt;/code&gt; addressee, &lt;code&gt;3&lt;/code&gt; others&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Gender&lt;/td&gt;&lt;td&gt;morphologically overt 'gender', &lt;code&gt;M&lt;/code&gt;asculine or &lt;code&gt;F&lt;/code&gt;eminine&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Number&lt;/td&gt;&lt;td&gt;morphologically overt 'number', &lt;code&gt;S&lt;/code&gt;ingular, &lt;code&gt;D&lt;/code&gt;ual, or &lt;code&gt;P&lt;/code&gt;lural&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Case&lt;/td&gt;&lt;td&gt;&lt;code&gt;1&lt;/code&gt; nominative, &lt;code&gt;2&lt;/code&gt; genitive, &lt;code&gt;4&lt;/code&gt; accusative&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Defin&lt;/td&gt;&lt;td&gt;morphological 'definiteness', &lt;code&gt;I&lt;/code&gt;ndefinite, &lt;code&gt;D&lt;/code&gt;efinite, &lt;code&gt;R&lt;/code&gt;educed, or &lt;code&gt;C&lt;/code&gt;omplex&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;MemberOf&lt;/td&gt;&lt;td&gt;member of syntactic &lt;code&gt;Co&lt;/code&gt;ordination or &lt;code&gt;Ap&lt;/code&gt;position&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;ClauseHead&lt;/td&gt;&lt;td&gt;the token is the head of a subordinate clause of the given type&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;GramCoref&lt;/td&gt;&lt;td&gt;the pronoun &lt;code&gt;S-&lt;/code&gt; is a grammatical coreferent, unlike other pronouns that are textual coreferents&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;InputForm&lt;/td&gt;&lt;td&gt;the token is the first of the tokens into which the given orthographic word of the original input is analyzed&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;TokenGloss&lt;/td&gt;&lt;td&gt;a clue to the morphological structure of the token&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/blockquote&gt;
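
&lt;p&gt;The value codes above can be decoded mechanically. The sketch below assumes, without having been checked against the released files, that the features of a token arrive as one string of &lt;code&gt;Name=Value&lt;/code&gt; pairs separated by vertical bars, with an underscore standing for an empty list. Only the codes spelled out in the table are decoded; anything else is passed through unchanged. The names &lt;code&gt;FEATURE_VALUES&lt;/code&gt; and &lt;code&gt;decode_feats&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
```python
# Value codes as given in the table above.
FEATURE_VALUES = {
    "Mood": {"I": "indicative", "S": "subjunctive", "J": "jussive",
             "D": "subjunctive or jussive (undecided)"},
    "Voice": {"A": "active", "P": "passive"},
    "Person": {"1": "speaker", "2": "addressee", "3": "others"},
    "Gender": {"M": "masculine", "F": "feminine"},
    "Number": {"S": "singular", "D": "dual", "P": "plural"},
    "Case": {"1": "nominative", "2": "genitive", "4": "accusative"},
    "Defin": {"I": "indefinite", "D": "definite",
              "R": "reduced", "C": "complex"},
    "MemberOf": {"Co": "coordination", "Ap": "apposition"},
}


def decode_feats(feats):
    """Decode an assumed 'Name=Value|Name=Value' string into a dictionary.

    Unknown names or values are kept as they are.
    """
    decoded = {}
    if feats in ("", "_"):
        return decoded
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        decoded[name] = FEATURE_VALUES.get(name, {}).get(value, value)
    return decoded


# Hypothetical feature string, for illustration only.
print(decode_feats("Mood=I|Voice=A|Person=3|Gender=M|Number=S"))
```
&lt;/pre&gt;
&lt;/blockquote&gt;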

&lt;p&gt;The inventory of analytical dependency functions is further explained in &lt;a href="http://ufal.mff.cuni.cz/padt/PADT_1.0/docs/papers/2004-nemlar-padt.pdf"&gt;one document&lt;/a&gt; or &lt;a href="http://ufal.mff.cuni.cz/~smrz/CCISSA2006/ccissa-paper.pdf"&gt;another&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;table border="0" cellspacing="6pt" cellpadding="3pt"&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Pred&lt;/td&gt;&lt;td&gt;verbal predicate&lt;/td&gt;&lt;td&gt;Coord&lt;/td&gt;&lt;td&gt;coordination&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Pnom&lt;/td&gt;&lt;td&gt;nominal predicate&lt;/td&gt;&lt;td&gt;Apos&lt;/td&gt;&lt;td&gt;apposition&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;PredE&lt;/td&gt;&lt;td&gt;existential predicate&lt;/td&gt;&lt;td&gt;Ante&lt;/td&gt;&lt;td&gt;anteposition&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;PredC&lt;/td&gt;&lt;td&gt;conjunction as the clause's head&lt;/td&gt;&lt;td&gt;AuxC&lt;/td&gt;&lt;td&gt;conjunction&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;PredP&lt;/td&gt;&lt;td&gt;preposition as the clause's head&lt;/td&gt;&lt;td&gt;AuxP&lt;/td&gt;&lt;td&gt;preposition&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Sb&lt;/td&gt;&lt;td&gt;subject&lt;/td&gt;&lt;td&gt;AuxE&lt;/td&gt;&lt;td&gt;emphasizing expression&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Obj&lt;/td&gt;&lt;td&gt;object&lt;/td&gt;&lt;td&gt;AuxM&lt;/td&gt;&lt;td&gt;modifying expression&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Adv&lt;/td&gt;&lt;td&gt;adverbial&lt;/td&gt;&lt;td&gt;AuxY&lt;/td&gt;&lt;td&gt;auxiliary, part of compound&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Atr&lt;/td&gt;&lt;td&gt;attribute&lt;/td&gt;&lt;td&gt;AuxG&lt;/td&gt;&lt;td&gt;graphical symbol&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Atv&lt;/td&gt;&lt;td&gt;complement&lt;/td&gt;&lt;td&gt;AuxK&lt;/td&gt;&lt;td&gt;sentence separator&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;ExD&lt;/td&gt;&lt;td&gt;ellipsis, no actual dependency&lt;/td&gt;&lt;td&gt;_&lt;/td&gt;&lt;td&gt;excessive token, esp. due to typo&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/blockquote&gt;
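
&lt;p&gt;To see how these functions are distributed in a data set, one could tally the dependency labels. The sketch below assumes the CoNLL-X column layout, in which the label (DEPREL) is the eighth tab-separated column; &lt;code&gt;count_deprels&lt;/code&gt; is a hypothetical helper name:&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
```shell
# Hypothetical helper: count how often each DEPREL label (column 8)
# occurs in tab-separated CoNLL-X data, most frequent label first.
count_deprels() {
    awk -F '\t' '
        NF { count[$8]++ }
        END { for (label in count) print count[label] "\t" label }' |
        sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;Called as &lt;code&gt;cat PADT-data-in-CoNLL-format | count_deprels&lt;/code&gt;, it prints one line per label with the count first. Blank lines, which separate the sentences, are skipped.&lt;/p&gt;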

&lt;p&gt;The &lt;a href="http://ufal.mff.cuni.cz/~smrz/CoNLL/padt-conll.btred"&gt;conversion script&lt;/a&gt; from the original &lt;a href="http://ufal.mff.cuni.cz/~pajas/tred/Fslib.html"&gt;FS format&lt;/a&gt; to the &lt;a href="http://nextens.uvt.nl/depparse-wiki/DataFormat"&gt;CoNLL format&lt;/a&gt; produces files with the &lt;code&gt;.conll&lt;/code&gt; extension. The script is run as follows:&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
  btred -Qm padt-conll.btred syntax/*.syntax.fs
  mkdir conll
  mv syntax/*.syntax.fs.conll conll/
&lt;/pre&gt;
&lt;/blockquote&gt;
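
&lt;p&gt;A quick sanity check of the converted files might count the sentences and tokens they contain. This is only a sketch: it assumes that each token occupies one non-blank line and that sentences are separated by blank lines, as in the CoNLL format, and &lt;code&gt;conll_stats&lt;/code&gt; is a hypothetical helper name:&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
```shell
# Hypothetical helper: report the number of sentences and tokens in
# CoNLL data read from standard input. A sentence is taken to be a
# maximal run of non-blank lines.
conll_stats() {
    awk '
        NF  { tokens++; if (!inside) sentences++; inside = 1 }
        !NF { inside = 0 }
        END { print sentences + 0, "sentences,", tokens + 0, "tokens" }'
}
```
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example, &lt;code&gt;cat conll/*.conll | conll_stats&lt;/code&gt; would summarize all converted files at once, provided that every file ends with a blank line.&lt;/p&gt;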

&lt;p&gt;The data use the UTF-8 encoding, as required. If rendering the Arabic script poses problems, however, it may be preferable to view the data in the Buckwalter transliteration. We recommend using the &lt;a href="http://sourceforge.net/projects/encode-arabic/"&gt;Encode Arabic&lt;/a&gt; libraries in Perl or Haskell to convert the data easily.&lt;/p&gt;

&lt;p&gt;To use the Perl library from the command line, code like this will do:&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
  # calling the module's functions in a one-liner

  cat PADT-data-in-CoNLL-format | \
      perl -MEncode::Arabic -pe '$_ = encode "buckwalter", decode "utf8", $_'

  # running the scripts installed with the module

  cat PADT-data-in-CoNLL-format | encode "buckwalter"
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;To use the module for &lt;a href="http://search.cpan.org/dist/Encode-Arabic/lib/Encode/Arabic/Buckwalter.pm#EXPORTS_%26_MODES"&gt;reducing the vocalization&lt;/a&gt;, or to choose the &lt;a href="http://www.qamus.org/transliteration.htm"&gt;XML-compliant&lt;/a&gt; variant of the Buckwalter transliteration, set the modes of conversion. Consider, for example, the following script, which removes all vocalization marks from the tokenized word forms in the second column of the CoNLL data:&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
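  use strict;
  use warnings;
  use Encode::Arabic ':modes';

  # set the conversion modes: 'full' for encoding, 'noneplus' for decoding
  enmode "buckwalter", 'full', 'xml';
  demode "buckwalter", 'noneplus', 'xml';

  while (my $line = &lt;&gt;) {

      my @cols = split /\t/, decode "utf8", $line;

      # pass blank lines and lines without a second column through unchanged
      if (@cols &lt; 2) {

          print $line;
          next;
      }

      # convert only word forms that contain no ASCII characters
      unless ($cols[1] =~ /[\x{20}-\x{7F}]/) {

          my $in_buck = encode "buckwalter", $cols[1];
          $cols[1] = decode "buckwalter", $in_buck;

          # report the intermediate transliteration on the standard error
          warn $in_buck . "\n";
      }

      print encode "utf8", join "\t", @cols;
  }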
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;More examples are available in the &lt;a href="http://search.cpan.org/dist/Encode-Arabic/lib/Encode/Arabic/Buckwalter.pm"&gt;CPAN documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Links to last year's &lt;a href="http://www.cnts.ua.ac.be/conll/"&gt;CoNLL-X&lt;/a&gt; and the &lt;a href="http://www.cnts.ua.ac.be/conll/proceedings.html"&gt;proceedings&lt;/a&gt; ...&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20445582-1555080919265146260?l=ufal.mff.cuni.cz%2Fpadt%2Fonline%2Fblogger.html' alt='' /&gt;&lt;/div&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2007/01/conll-shared-task-2007.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-8909859500615177655</guid><pubDate>Sat, 30 Dec 2006 15:36:00 +0000</pubDate><atom:updated>2007-04-15T14:30:25.558+02:00</atom:updated><title>Prague Treebanking for Everyone: Video Lectures</title><description>The Vilem Mathesius Lecture Series 21 included a two-day tutorial, &lt;a href="http://ufal.mff.cuni.cz/vmc/?a=ls21_tutorial"&gt;Prague Treebanking for Everyone&lt;/a&gt;. Everyone can now view the video recordings of the lectures &lt;a href="http://ufallab.ms.mff.cuni.cz/video/"&gt;online&lt;/a&gt;.

&lt;p&gt;The talk on MorphoTrees and the other levels of annotation in the Prague Arabic Dependency Treebank project is available as a &lt;a href="http://ufallab.ms.mff.cuni.cz/video/flashplay/index/17/36/180/"&gt;video stream&lt;/a&gt; or in &lt;a href="http://ufallab.ms.mff.cuni.cz/video/recordshow/index/17/36/"&gt;other video formats&lt;/a&gt;, along with the &lt;a href="http://ufal.mff.cuni.cz/~smrz/VMC2006/vmc-slides.pdf"&gt;original slides&lt;/a&gt; and the &lt;a href="http://ufal.mff.cuni.cz/~smrz/VMC2006/vmc-handout.pdf"&gt;printable handout&lt;/a&gt;.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20445582-8909859500615177655?l=ufal.mff.cuni.cz%2Fpadt%2Fonline%2Fblogger.html' alt='' /&gt;&lt;/div&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2007/01/prague-treebanking-for-everyone-video.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-116187186854386933</guid><pubDate>Thu, 26 Oct 2006 14:10:00 +0000</pubDate><atom:updated>2007-01-07T00:48:50.447+01:00</atom:updated><title>Tips and Tricks</title><description>&lt;p&gt;&lt;a href="http://www.bcs-mt.org.uk/arabic-2006/"&gt;The Challenge of Arabic for NLP/MT&lt;/a&gt; conference organized in London by &lt;a href="http://www.bcs.org/"&gt;The British Computer Society&lt;/a&gt; is over. Let us give a couple of new links:&lt;/p&gt;

&lt;dl&gt;
&lt;dt&gt;Tips and Tricks of the Prague Arabic Dependency Treebank&lt;/dt&gt;
&lt;dd&gt;&lt;a href="http://ufal.mff.cuni.cz/~smrz/BCS2006/bcs-paper.pdf"&gt;[paper]&lt;/a&gt; &lt;a href="http://ufal.mff.cuni.cz/~smrz/BCS2006/bcs-slides.pdf"&gt;[slides]&lt;/a&gt; Otakar Smrž&lt;/dd&gt;

&lt;dt&gt;An Ambiguity-Controlled Morphological Analyzer for Modern Standard Arabic Modeling Finite State Networks&lt;/dt&gt;
&lt;dd&gt;&lt;a href="http://www.attiaspace.com/"&gt;[www]&lt;/a&gt; Mohammed A. Attia&lt;/dd&gt;

&lt;dt&gt;Standard Arabic formalization and linguistic platform for its analysis&lt;/dt&gt;
&lt;dd&gt;&lt;a href="http://perso.orange.fr/rosavram/pages/arabicpag.html"&gt;[www]&lt;/a&gt; Slim Mesfar&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;In this context, the work on morphology by &lt;a href="http://www.ccls.columbia.edu/cadim/"&gt;Columbia's Arabic Dialect Modeling Group&lt;/a&gt; should also be noted:&lt;/p&gt;

&lt;dl&gt;
&lt;dt&gt;MAGEAD: A Morphological Analyzer and Generator for the Arabic Dialects&lt;/dt&gt;
&lt;dd&gt;&lt;a href="http://acl.ldc.upenn.edu/P/P06/P06-1086.pdf"&gt;[pdf]&lt;/a&gt; Nizar Habash and Owen Rambow&lt;/dd&gt;

&lt;dt&gt;Morphological Analysis and Generation for Arabic Dialects&lt;/dt&gt;
&lt;dd&gt;&lt;a href="http://acl.ldc.upenn.edu/W/W05/W05-0703.pdf"&gt;[pdf]&lt;/a&gt; Nizar Habash, Owen Rambow and George Kiraz&lt;/dd&gt;
&lt;/dl&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20445582-116187186854386933?l=ufal.mff.cuni.cz%2Fpadt%2Fonline%2Fblogger.html' alt='' /&gt;&lt;/div&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2006/10/tips-and-tricks.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-115864092552149034</guid><pubDate>Tue, 19 Sep 2006 04:41:00 +0000</pubDate><atom:updated>2006-12-25T17:39:09.620+01:00</atom:updated><title>Information Structure with the Prague Arabic Dependency Treebank</title><description>&lt;p&gt;Written for the conference on &lt;a href="https://register.casl.umd.edu/arabicconference/"&gt;Communication and Information Structure in Spoken Arabic&lt;/a&gt;, our contribution is now online:&lt;/p&gt;

&lt;dl&gt;
&lt;dt&gt;Information Structure with the Prague Arabic Dependency Treebank&lt;/dt&gt;
&lt;dd&gt;&lt;a href="http://ufal.mff.cuni.cz/~smrz/CCISSA2006/ccissa-paper.pdf"&gt;[paper]&lt;/a&gt;
&lt;a href="http://ufal.mff.cuni.cz/~smrz/CCISSA2006/ccissa-slides.pdf"&gt;[slides]&lt;/a&gt;&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;The article is being reviewed and edited for the proceedings of the conference. Your suggestions and comments are very welcome. Thank you.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20445582-115864092552149034?l=ufal.mff.cuni.cz%2Fpadt%2Fonline%2Fblogger.html' alt='' /&gt;&lt;/div&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2006/09/information-structure-with-prague.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-114941667509686929</guid><pubDate>Sun, 04 Jun 2006 10:03:00 +0000</pubDate><atom:updated>2009-12-21T22:59:01.480+01:00</atom:updated><title>Encode Arabic</title><description>&lt;p&gt;Encoding the Arabic language in the &lt;a href="ftp://ftp.informatik.uni-stuttgart.de/pub/arabtex/arabtex.htm"&gt;ArabTeX&lt;/a&gt; transliteration, a non-trivial and multi-purpose notation for Arabic orthographies and phonetic transcriptions, brings improved possibilities for natural language processing systems.&lt;/p&gt;

&lt;p&gt;Relying on the advanced techniques of functional parsing in &lt;a href="http://www.haskell.org/"&gt;Haskell&lt;/a&gt;, we present an implementation of an interpreter for this notation. The developed programming library will be published shortly, complementing the current &lt;a href="http://ufal.mff.cuni.cz/~smrz/Encode/Arabic/"&gt;Encode Arabic&lt;/a&gt; implementation in &lt;a href="http://www.perl.org/"&gt;Perl&lt;/a&gt;.&lt;/p&gt;

&lt;dl&gt;
&lt;dt&gt;Encode Arabic: Exercise in Functional Parsing&lt;/dt&gt;
&lt;dd&gt;&lt;a href="http://ufal.mff.cuni.cz/~smrz/ICFP2006/icfp-encode.pdf"&gt;[submission version]&lt;/a&gt;&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;This paper has been submitted to the &lt;a href="http://haskell.org/haskell-workshop/2006/"&gt;Haskell Workshop&lt;/a&gt; of the &lt;a href="http://icfp06.cs.uchicago.edu/"&gt;International Conference on Functional Programming 2006&lt;/a&gt;.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20445582-114941667509686929?l=ufal.mff.cuni.cz%2Fpadt%2Fonline%2Fblogger.html' alt='' /&gt;&lt;/div&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2006/06/encode-arabic.html</link><author>noreply@blogger.com (Otakar Smrz)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-114122020272676159</guid><pubDate>Wed, 01 Mar 2006 13:31:00 +0000</pubDate><atom:updated>2006-03-01T15:14:58.886+01:00</atom:updated><title>Open Source Arabic Grammars</title><description>&lt;p&gt;&lt;a href="http://www.mdstud.chalmers.se/~eldada/"&gt;Ali El Dada&lt;/a&gt; from &lt;a href="http://www.chalmers.se/cse/EN"&gt;Chalmers University of Technology&lt;/a&gt; in Gothenburg, Sweden, is developing an open source Arabic resource grammar in Grammatical Framework:&lt;/p&gt;

&lt;dl&gt;
&lt;dt&gt;Implementing an Open Source Arabic Resource Grammar in Grammatical Framework&lt;/dt&gt;
&lt;dd&gt;&lt;a href="http://www.mdstud.chalmers.se/~eldada/paper.pdf"&gt;[paper]&lt;/a&gt; &lt;a href="http://www.mdstud.chalmers.se/~eldada/submitted_abstract.pdf"&gt;[abstract]&lt;/a&gt; Ali El Dada and Aarne Ranta&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;On the same topic, but from the viewpoint of other projects, the following links may be of interest:&lt;/p&gt;

&lt;dl&gt;
&lt;dt&gt;Topologische Dependenzgrammatik fürs Arabische&lt;/dt&gt;
&lt;dd&gt;&lt;a href="http://www.ps.uni-sb.de/~rade/fopra/Arabic.pdf"&gt;[report]&lt;/a&gt; &lt;a href="http://www.ps.uni-sb.de/theses/odeh/vortrag.pdf"&gt;[slides]&lt;/a&gt; Marwan Odeh&lt;/dd&gt;
&lt;dt&gt;PAPPI: A Multilingual Parser for the Principles-and-Parameters Framework&lt;/dt&gt;
&lt;dd&gt;&lt;a href="http://dingo.sbs.arizona.edu/~sandiway/pappi/"&gt;[www]&lt;/a&gt; Sandiway Fong&lt;/dd&gt;
&lt;/dl&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20445582-114122020272676159?l=ufal.mff.cuni.cz%2Fpadt%2Fonline%2Fblogger.html' alt='' /&gt;&lt;/div&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2006/03/open-source-arabic-grammars.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-114096602216301916</guid><pubDate>Sun, 26 Feb 2006 14:54:00 +0000</pubDate><atom:updated>2006-02-26T16:04:20.410+01:00</atom:updated><title>Treebanking and Advanced Processing of Arabic</title><description>&lt;p&gt;In early December 2006, the &lt;a href="http://ufal.mff.cuni.cz/"&gt;Institute of Formal and Applied Linguistics&lt;/a&gt;, Charles University in Prague, is organizing a two-week series of workshops and invited lectures, which include the Fifth Workshop on Treebanks and Linguistic Theories (TLT 2006), the Vilem Mathesius Courses (VMC 2006), and the Partnerships for International Research and Education (PIRE) group meeting.&lt;/p&gt;

&lt;p&gt;The proposed &lt;a href="http://ufal.mff.cuni.cz/~smrz/TAPA2006/"&gt;TAPA 2006&lt;/a&gt; workshop should open this series of events on November 30, 2006. Its purpose is to bring together people from different areas of the Natural Language Processing community, who are either interested in the problem of multi-level linguistic description of Arabic, or concerned with the resources, tools and methods used recently in the study of this language.&lt;/p&gt;

&lt;p&gt;The organizers would like to see this workshop on &lt;a href="http://ufal.mff.cuni.cz/~smrz/TAPA2006/"&gt;Treebanking and Advanced Processing of Arabic&lt;/a&gt; both as an opportunity for the invited research teams to promote their
relevant scientific projects, and as an open opportunity for other teams of the community to report on their original approaches or derived applications. The workshop should become a forum for explanation and informed discussion.&lt;/p&gt;

&lt;p&gt;Combining one's participation in TAPA 2006 with the other events is
expected and highly recommended, of course.&lt;/p&gt;

&lt;p&gt;More information on the &lt;a href="http://ufal.mff.cuni.cz/~smrz/TAPA2006/"&gt;website&lt;/a&gt; ...&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20445582-114096602216301916?l=ufal.mff.cuni.cz%2Fpadt%2Fonline%2Fblogger.html' alt='' /&gt;&lt;/div&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2006/02/treebanking-and-advanced-processing-of_26.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-113864203088520257</guid><pubDate>Mon, 30 Jan 2006 17:41:00 +0000</pubDate><atom:updated>2006-01-30T18:42:06.993+01:00</atom:updated><title>Communication and Information Structure in Spoken Arabic</title><description>&lt;p&gt;The Center for Advanced Study of Language, University of Maryland, is organizing a conference on &lt;a href="https://register.casl.umd.edu/arabicconference/"&gt;Communication and Information Structure in Spoken Arabic&lt;/a&gt;, which we would like to join:&lt;/p&gt;

&lt;dl&gt;
&lt;dt&gt;Information Structure with the Prague Arabic Dependency Treebank&lt;/dt&gt;
&lt;dd&gt;&lt;a href="http://ufal.mff.cuni.cz/~smrz/CCISSA2006/ccissa-abstract.pdf"&gt;[abstract]&lt;/a&gt;&lt;/dd&gt;
&lt;/dl&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20445582-113864203088520257?l=ufal.mff.cuni.cz%2Fpadt%2Fonline%2Fblogger.html' alt='' /&gt;&lt;/div&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2006/01/communication-and-information.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-113778349519825797</guid><pubDate>Fri, 20 Jan 2006 19:04:00 +0000</pubDate><atom:updated>2008-01-18T10:48:07.845+01:00</atom:updated><title>The Other Arabic Treebank: Prague Dependencies and Functions</title><description>&lt;p&gt;There is a new overview of the theories behind the Prague Arabic Dependency Treebank:&lt;/p&gt;

&lt;dl&gt;
  &lt;dt&gt;The Other Arabic Treebank: Prague Dependencies and Functions&lt;/dt&gt;
  &lt;dd&gt;&lt;a href="http://ufal.mff.cuni.cz/~smrz/CSLI2008/csli-prague.pdf"&gt;[draft]&lt;/a&gt;&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;It is to appear in &lt;em&gt;Arabic Computational Linguistics: Current Implementations&lt;/em&gt; (edited by Ali Farghaly), CSLI Publications, 2008.&lt;/p&gt;

&lt;p&gt;Reviews of and comments on the draft version are very welcome. Thank you.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20445582-113778349519825797?l=ufal.mff.cuni.cz%2Fpadt%2Fonline%2Fblogger.html' alt='' /&gt;&lt;/div&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2006/01/other-arabic-treebank-prague.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-113629793832745530</guid><pubDate>Tue, 03 Jan 2006 14:05:00 +0000</pubDate><atom:updated>2006-01-06T01:50:05.230+01:00</atom:updated><title>CoNLL-X Task: Multi-lingual Dependency Parsing</title><description>&lt;p&gt;&lt;a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004T23"&gt;PADT 1.0&lt;/a&gt; is going to be used by &lt;a href="http://staff.science.uva.nl/%7Eerikt/signll/conll/"&gt;CoNLL-X&lt;/a&gt;, the Tenth Conference on Natural Language Learning, in its shared task on multi-lingual dependency parsing.&lt;/p&gt;

&lt;p&gt;We recommend that users consult the following documents should they need further explanation of our treebank:&lt;/p&gt;

&lt;dl&gt;
  &lt;dt&gt;Learning to Use the Prague Arabic Dependency Treebank&lt;/dt&gt;
  &lt;dd&gt;&lt;a href="http://ufal.mff.cuni.cz/~smrz/ALS2005/als-learn.pdf"&gt;[paper]&lt;/a&gt;&lt;/dd&gt;
  &lt;dt&gt;Feature-Based Tagger of Approximations of Functional Arabic Morphology&lt;/dt&gt;
  &lt;dd&gt;&lt;a href="http://ufal.mff.cuni.cz/~smrz/TLT2005/tlt-tagger.pdf"&gt;[paper]&lt;/a&gt; &lt;a href="http://ufal.mff.cuni.cz/~smrz/TLT2005/tlt-slides.pdf"&gt;[slides]&lt;/a&gt;&lt;/dd&gt;
&lt;/dl&gt;

&lt;p&gt;CoNLL-X have their own &lt;a href="http://nextens.uvt.nl/conll-wiki/"&gt;wiki page&lt;/a&gt; and a &lt;a href="http://s2.phpbbforfree.com/forums/conll06.html"&gt;discussion forum&lt;/a&gt;, too.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20445582-113629793832745530?l=ufal.mff.cuni.cz%2Fpadt%2Fonline%2Fblogger.html' alt='' /&gt;&lt;/div&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2006/01/conll-x-task-multi-lingual-dependency.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-20445582.post-113623581943933450</guid><pubDate>Mon, 02 Jan 2006 20:58:00 +0000</pubDate><atom:updated>2006-02-02T23:23:23.340+01:00</atom:updated><title>PADT ++</title><description>&lt;p&gt;This website provides information on &lt;a href="http://ufal.mff.cuni.cz/padt/"&gt;Prague Arabic Dependency Treebank&lt;/a&gt; and other projects in Arabic computational linguistics and natural language processing.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/20445582-113623581943933450?l=ufal.mff.cuni.cz%2Fpadt%2Fonline%2Fblogger.html' alt='' /&gt;&lt;/div&gt;</description><link>http://ufal.mff.cuni.cz/padt/online/2006/01/padt.html</link><author>noreply@blogger.com (Otakar Smrz)</author><thr:total xmlns:thr='http://purl.org/syndication/thread/1.0'>0</thr:total></item></channel></rss>
