Treex::Tutorial::Install - Installation Guide for the Treex NLP framework
This synopsis is just an overview of the six steps, which are described below in more detail.
We assume no admin rights and no previous local Perl environment.
If env | grep PERL prints something or the directory ~/.cpan exists,
there is a risk that your previously installed local Perl environment will conflict with the new one.
wget -O- http://cpanmin.us | perl - -l ~/perl5 App::cpanminus local::lib
eval `perl -I ~/perl5/lib/perl5 -Mlocal::lib`
echo '## Treex installation ##' >> ~/.bashrc
echo 'eval `perl -I ~/perl5/lib/perl5 -Mlocal::lib`' >> ~/.bashrc
grep bashrc ~/.bash_profile || echo 'source ~/.bashrc' >> ~/.bash_profile
# First, try to install XML::LibXML
cpanm XML::LibXML
# If it fails and the build.log contains "looking for -lxml2... no",
# you are probably missing libxml2 header files (or the whole libxml2).
# On Ubuntu/Debian you can install it with
# sudo apt-get install libxml2-dev zlib1g-dev
# Few more possibly problematic modules
cpanm -n PerlIO::Util
cpanm Moose
moose-outdated | cpanm
# and finally the Treex::Core and its dependencies
cpanm Treex::Core # this may take about 10 minutes
treex -h # just to check if it was installed correctly
cpanm Treex::EN
cpanm Lingua::Interset URI::Find Cache::LRU
See the TrEd home page for details. To install the Perl Tk module, you need several header files; on Ubuntu/Debian you can install them with:
sudo apt-get install libx11-dev libxft-dev libfontconfig1-dev libpng12-dev libxslt-dev libgdbm-dev patch
# Get a script which automatically downloads and builds everything else
wget http://ufal.mff.cuni.cz/tred/install_tred.bash
bash install_tred.bash --tred-dir ~/tred
# Instruct Treex where to find TrEd and its dependencies
echo "tred_dir: $HOME/tred" >> ~/.treex/config.yaml
echo "source ~/tred/bin/init_tred_environment" >> ~/.bashrc
source ~/tred/bin/init_tred_environment
ttred # run TrEd with Treex extension
git clone https://github.com/ufal/treex.git ~/treex
# Add the following lines to your ~/.bashrc
export PATH="$HOME/treex/bin:$PATH"
export PERL5LIB="$HOME/treex/lib:$PERL5LIB"
export TMT_ROOT=$HOME/.treex
cpanm Ufal::MorphoDiTa Ufal::NameTag
In this tutorial, we expect a Linux OS with the Bash shell and Perl 5.10 or higher. Basic development tools, such as patch and a C compiler (gcc), are also required. You can easily use a different shell (e.g. csh); just modify the shell commands accordingly. It is also possible to install Treex on MacOS and Windows+StrawberryPerl, but this is less tested so far. If you have a Perl version older than 5.10 (or if you just want to try the newest Perl), you can install your own Perl using perlbrew; it is really simple.
Note that if you have Windows and only want to browse *.treex files, you can install TrEd and (in the menu Setup - Manage Extensions - Get New Extension) select the EasyTreex extension. However, to complete this tutorial you need to install Treex (and set up TrEd) as described below, so EasyTreex is superfluous.
In order to install Treex, you must be able to install Perl modules from CPAN. This step is not specific to Treex; it is a basic Perl skill. There are several ways to achieve this, but I consider the one described here the easiest. There are two things you should be aware of:
If env | grep PERL prints something or the directory ~/.cpan exists (ls -l ~/.cpan), it is probable that you have already configured a local Perl environment. In such a case it is important to decide whether you will reuse that environment or discard it.
If you used local::lib or perlbrew to set the environment, it should be configured properly and you can continue with step 2 (if you want to use cpanm, install it by cpan App::cpanminus). If you used another method, such as modifying $PERL5LIB in your ~/.bashrc or setting PREFIX or INSTALL_BASE options in cpan configuration, there is a possibility that your previously installed local Perl environment is configured only partially and the procedure described here may fail. If you decide to reuse your previous local Perl environment, the modules will be installed to whatever path you had chosen (instead of ~/perl5) and you should skip this step 1 (otherwise the installation fails with "WHOA THERE! It looks like you've got ..." in ~/.cpanm/build.log).
If you do not need or want to use your previous local Perl environment, you should delete (or rename) the ~/.cpan directory and edit your shell profile (~/.bashrc, ~/.profile etc.), so that no Perl-related variables (such as PERL5LIB, PERL_MB_OPT, PERL_MM_OPT) are exported. After starting a new shell (a new ssh session), env | grep PERL should print nothing.
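The two checks described above can be combined into a small helper. This is only a sketch: the function name check_perl_env is hypothetical (it is not part of Treex or local::lib), and it only reports what it finds without changing anything.

```shell
# check_perl_env is a hypothetical helper, not part of Treex or local::lib.
# It looks for the two warning signs named in the text and returns
# non-zero if at least one of them is present.
# Usage: check_perl_env [HOME_DIR]   (HOME_DIR defaults to $HOME)
check_perl_env() {
    home_dir="${1:-$HOME}"
    found=0
    # Same test as in the text: any environment line mentioning PERL
    if env | grep PERL >/dev/null; then
        echo "Perl-related environment variables are set:"
        env | grep PERL
        found=1
    fi
    if [ -d "$home_dir/.cpan" ]; then
        echo "Directory $home_dir/.cpan exists"
        found=1
    fi
    if [ "$found" -eq 0 ]; then
        echo "No previous local Perl environment detected"
    fi
    return "$found"
}
```

Calling check_perl_env without arguments inspects your own home directory; a non-zero exit status means you should decide between reusing and removing the old environment before continuing.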
If you have admin rights, you can install modules to the system paths (e.g. sudo cpan App::cpanminus). However, in the course of this tutorial you will be advised to modify some of the modules (Treex::Block::Tutorial::*), so it may be a good compromise to install only the dependencies to system paths using sudo cpanm --installdeps Treex::Core, but otherwise follow this local Perl setup.
Download and install locally two useful tools (Perl modules), App::cpanminus and local::lib:
wget -O- http://cpanmin.us | perl - -l ~/perl5 App::cpanminus local::lib
App::cpanminus provides the cpanm script, which is a fast, dependency-free, zero-configuration substitute for the standard cpan. local::lib takes care of setting all the environment variables needed to install modules without administrative privileges.
Instead of wget -O-, you can use curl -L or simply download cpanm from http://cpanmin.us, save it as cpanm and run perl cpanm -l ~/perl5 App::cpanminus local::lib. Instead of ~/perl5, you can use any path you like, but ~/perl5 is a common standard used in this tutorial.
In the following steps, you can use cpan instead of cpanm. The advantage is that you can start an interactive cpan shell which provides more features (I recommend installing Bundle::CPAN and Term::ReadLine::Perl first, so you can browse the history using the up/down keys). The disadvantage is that you cannot use it for installing local::lib locally before local::lib is installed :-). Also, you will need to go through a configuration dialogue when cpan is executed for the first time.
eval `perl -I ~/perl5/lib/perl5 -Mlocal::lib`
echo '## Treex installation ##' >> ~/.bashrc
echo 'eval `perl -I ~/perl5/lib/perl5 -Mlocal::lib`' >> ~/.bashrc
grep bashrc ~/.bash_profile || echo 'source ~/.bashrc' >> ~/.bash_profile
The first line sets up the environment variables $PERL_MB_OPT etc. for the current shell session. It enables you to use the modules installed in ~/perl5 (without specifying this path using perl -I) and it also ensures that new modules will be installed (using cpan) to ~/perl5 (not to the system paths). The third line ensures that this setting will also be applied in other (non-login) shell sessions. The fourth line ensures that this setting will also be applied in "login" shell sessions (e.g. when you log in via ssh). If you prefer to use ~/.profile instead of ~/.bash_profile, adapt the fourth line accordingly.
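Running the echo commands a second time adds duplicate lines to ~/.bashrc. A minimal sketch of an idempotent variant follows; append_once is a hypothetical helper name, not something provided by local::lib or Treex.

```shell
# append_once FILE LINE
# Hypothetical helper: appends LINE to FILE only if FILE does not
# already contain exactly that line.
append_once() {
    file="$1"
    line="$2"
    touch "$file"   # make sure the file exists before grepping it
    # -x: match whole lines only, -F: treat LINE as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, append_once ~/.bashrc 'eval `perl -I ~/perl5/lib/perl5 -Mlocal::lib`' has the same effect as the third line above on the first run and does nothing on later runs.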
Treex is divided into several CPAN distributions. Treex::Core contains the main ("core") functionality and almost all other Treex modules depend on it.
Treex::Core itself has many dependencies, most notably Moose and Treex::PML (which have many dependencies themselves, and so on), so the installation takes several minutes. One of the most frequent installation problems is that the Perl module XML::LibXML, which is a binding for the libxml2 library, needs not only the library but also its header files (*.h). So let's first check whether you can install XML::LibXML:
cpanm XML::LibXML
If it fails and ~/.cpanm/build.log contains "Cannot write to /usr/lib/ ... XML/SAX.pm line 191", try to run it again and it should show that it was actually installed. If it fails and ~/.cpanm/build.log contains "looking for -lxml2... no", you are probably missing the header files or the whole library. On Ubuntu/Debian you can install it with:
sudo apt-get install libxml2-dev zlib1g-dev
If you know a simple way to do this without admin privileges, let me know. You can check for the packages with LANG=C dpkg-query -s libxml2-dev zlib1g-dev 2>&1 | grep Package. On other systems (e.g. RPM based), try to find similarly named packages (libxml2-devel), or look at http://xmlsoft.org.
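The two failure messages mentioned above can be looked up in the build log with a few lines of shell. The following is a sketch only: diagnose_build_log is a hypothetical helper name, and the patterns are taken from the messages quoted in this guide, so other cpanm or XML::LibXML versions may word them differently.

```shell
# diagnose_build_log [LOGFILE]
# Hypothetical helper: searches a cpanm build log (default
# ~/.cpanm/build.log) for the two XML::LibXML failures described above.
diagnose_build_log() {
    log="${1:-$HOME/.cpanm/build.log}"
    if [ ! -f "$log" ]; then
        echo "no build log at $log"
        return 2
    fi
    if grep -qF 'looking for -lxml2... no' "$log"; then
        echo "libxml2 headers (or the whole library) seem to be missing"
    elif grep -q 'Cannot write to .*XML/SAX\.pm' "$log"; then
        echo "XML::SAX write error: run cpanm XML::LibXML again"
    else
        echo "no known XML::LibXML problem found"
    fi
}
```

Run diagnose_build_log right after a failed cpanm XML::LibXML to see which of the two remedies applies.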
There are a few other possibly problematic modules. PerlIO::Util has known (and reported) problems, so it is installed below with -n, i.e. without running its tests:
cpanm -n PerlIO::Util
cpanm Moose
moose-outdated | cpanm
Now, the installation of Treex::Core should be smooth (but it takes more than 8 minutes if no dependencies were installed before):
cpanm Treex::Core
Rarely, you may encounter problems with installing some modules. In that case, you should find the first module where something went wrong. You can read the documentation of the module, check its bug tracker, try to install it manually etc. If you cannot diagnose and fix the failure, you may try to install it with the --notest option, but this may cause trouble later on.
treex is the main Treex script. treex -h should just print the usage information and exit. Its actual usage will be described later on in this tutorial (Treex::Tutorial::FirstSteps); running the command serves here only as a check that treex was installed and can be found in the $PATH. The installation created a configuration file ~/.treex/config.yaml which will be described in Treex::Tutorial::Config.
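If treex -h fails with "command not found", the script is not on your $PATH. A small sketch of that check, using only the standard command -v builtin; require_command is a hypothetical helper name.

```shell
# require_command NAME
# Hypothetical helper: succeeds only if NAME can be found in $PATH
# (or is a shell builtin or function), and reports where it was found.
require_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found in PATH" >&2
        return 1
    fi
}
```

For example, require_command treex && treex -h runs the usage check only when the script is actually reachable.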
Treex Core itself has no modules for any particular NLP task. There is a separate distribution, Treex-Unilang, for such modules that are language independent. In this tutorial, we will mainly work with English, so you need to install the distribution Treex-EN, which contains only modules specific to English. It depends on Treex-Unilang, so both distributions can be installed by:
cpanm Treex::EN
cpanm Lingua::Interset URI::Find Cache::LRU
TrEd is a fully customizable and programmable graphical editor and viewer for tree-like structures. Although TrEd visualization of the linguistic trees produced by Treex can be very helpful, it is not required, i.e. Treex is fully functional even without installing TrEd.
To install Perl Tk and other TrEd dependencies, you may need to install some header files and also the patch tool. On Ubuntu/Debian you can install these prerequisites using:
sudo apt-get install libx11-dev libxft-dev libfontconfig1-dev libpng12-dev libxslt-dev libgdbm-dev patch
Now, download a small installation script:
wget http://ufal.mff.cuni.cz/tred/install_tred.bash
You can type bash install_tred.bash -h to see the installation options. To automatically download and build the latest TrEd and its dependencies to ~/tred, use:
bash install_tred.bash --tred-dir ~/tred
You can run ~/tred/bin/start_tred to check the GUI. When a dialog box "Manage extensions" appears, you can ignore it (click on "Later").
Treex Core contains an extension for TrEd, which enables it to open *.treex, *.treex.gz and *.streex files and use the Treex stylesheet. Treex Core also contains a simple wrapper script ttred which runs TrEd with this extension enabled (pre-installed). We must instruct Treex where to find TrEd:
echo "tred_dir: $HOME/tred" >> ~/.treex/config.yaml
TrEd installed some of its dependencies to ~/tred/dependencies, but we want to make them permanently available for Treex (and all Perl modules):
echo "source ~/tred/bin/init_tred_environment" >> ~/.bashrc
source ~/tred/bin/init_tred_environment
Finally, you can run TrEd with the Treex extension enabled:
ttred
Some Treex modules are not mature enough to be released on CPAN. You may also want to test the newest Treex version or commit your own code to the repository. So let's create your local clone of Treex in ~/treex.
git clone https://github.com/ufal/treex.git ~/treex
You need to include the path to the downloaded modules in your $PERL5LIB. Add the following lines to the end of your ~/.bashrc:
export PATH="$HOME/treex/bin:$PATH"
export PERL5LIB="$HOME/treex/lib:$PERL5LIB"
export TMT_ROOT=$HOME/.treex
It is important that these lines follow eval `perl -I ~/perl5/lib/perl5 -Mlocal::lib` in your ~/.bashrc, so that a module from Git is preferred over a CPAN module of the same name. To apply the setting for the current bash session, type the three export commands or start a new session. You can check it with:
echo $PERL5LIB # ~/treex/lib should precede ~/perl5/...
treex -v # should print "Treex version: DEV from..."
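The ordering check can also be scripted instead of being read off the echo output. This is a sketch under the assumption that both directories appear in $PERL5LIB as exact entries (no trailing slashes, no symlinked variants); path_precedes is a hypothetical helper name.

```shell
# path_precedes LIST FIRST SECOND
# Hypothetical helper: succeeds only if FIRST is an entry of the
# colon-separated LIST and occurs before any entry equal to SECOND.
path_precedes() {
    printf '%s\n' "$1" | tr ':' '\n' | awk -v a="$2" -v b="$3" '
        $0 == a { print "ok";  found = 1; exit }
        $0 == b { print "bad"; found = 1; exit }
        END     { if (!found) print "missing" }
    ' | grep -qx ok
}
```

With the setup from this guide, path_precedes "$PERL5LIB" "$HOME/treex/lib" "$HOME/perl5/lib/perl5" should succeed; if it fails, the export lines probably come before the local::lib line in ~/.bashrc.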
Now you can use Perl modules that were not installed from CPAN (but were downloaded from Git). Some of the modules may have dependencies that you do not have installed. When you load such a module (e.g. by running treex), it will fail with an error message like Can't locate Acme/Time/Baby.pm in @INC (@INC contains:... You can install the missing dependencies (Acme::Time::Baby in this imaginary example) simply with:
cpanm Acme::Time::Baby
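Turning the file name from the error message into a module name is mechanical: drop the .pm suffix and replace each slash with a double colon. A minimal sketch with sed follows; missing_module is a hypothetical helper name and the pattern assumes the "Can't locate ... in @INC" wording shown above.

```shell
# missing_module
# Hypothetical helper: reads a Perl error message on standard input and
# prints the name of the module that could not be located, so that
# Acme/Time/Baby.pm is printed as Acme::Time::Baby.
missing_module() {
    sed -n "s|.*Can't locate \([A-Za-z0-9_/]*\)\.pm in @INC.*|\1|p" |
        sed 's|/|::|g'
}
```

You can paste the error message into missing_module and pass the printed name to cpanm.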
If you happen to need modules such as Morce::English, you must install them manually, because these modules were not released on CPAN and they are XS-based (they involve compiling C code), so you cannot just download them.
svn --username public --password public export \
    https://svn.ms.mff.cuni.cz/svn/tectomt_devel/trunk/libs/packaged /tmp/packaged
cd /tmp/packaged/Morce-English
perl Build.PL
./Build
./Build test
./Build install --prefix $HOME/perl5/lib/perl5
In the same way, you can install Morce-Czech (in this order because the latter depends on the former).
MorphoDiTa is an open-source tool for morphological analysis of natural language texts. Currently there is a Perl module, Ufal::MorphoDiTa, available on CPAN providing bindings to the MorphoDiTa library. This module is necessary for running Treex::Tool::Tagger::MorphoDiTa and, consequently, anything in Treex that relies on that tagger.
To compile the module, a C++11 compiler is needed: either g++ 4.7 or newer, or clang 3.2 or newer. You can check whether the required compiler is installed on your computer:
g++ --version
# Or alternatively ...
clang --version
If it is not installed, install it. On Ubuntu/Debian etc. use this command:
sudo apt-get install g++
If the installed compiler version is too old, upgrade it. On Ubuntu/Debian etc. use this command:
sudo apt-get install --only-upgrade g++
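Comparing the installed version with the required minimum can be automated. The sketch below compares only the major and minor numbers; version_at_least is a hypothetical helper name, and the usage example assumes g++ -dumpversion, which prints just the version number.

```shell
# version_at_least HAVE WANT
# Hypothetical helper: succeeds if version HAVE (MAJOR[.MINOR[.PATCH]])
# is not older than WANT. Only MAJOR and MINOR are compared; a missing
# MINOR counts as 0.
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, "[.]")
        split(want, w, "[.]")
        if (h[1] + 0 != w[1] + 0) exit !(h[1] + 0 > w[1] + 0)
        exit !(h[2] + 0 >= w[2] + 0)
    }'
}
```

For example, version_at_least "$(g++ -dumpversion)" 4.7 || echo "g++ is too old for C++11" prints the warning only when the compiler needs upgrading.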
Finally, you can install the module:
cpanm Ufal::MorphoDiTa
Another useful tool is Ufal::NameTag, a tool for named entity recognition. It should have similar prerequisites as Ufal::MorphoDiTa, so if you followed the previous steps, just install the module:
cpanm Ufal::NameTag
Although there is no standardized way to uninstall Perl modules, in most cases it is enough to delete the respective files and directories. If you followed this installation guide and you want to remove all the installed stuff and if you had nothing in ~/perl5 before, you can delete the directories ~/perl5, ~/treex, ~/.treex, ~/.tred and ~/.cpanm. You can also delete the added lines from ~/.bashrc (starting with ## Treex installation ##) and ~/.bash_profile.
Martin Popel <email@example.com>
Dušan Variš <firstname.lastname@example.org>
Copyright © 2012 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.