Documentation for EVALD 1.0 and EVALD 1.0 for Foreigners

EVALD 1.0 and EVALD 1.0 for Foreigners are tools for the automatic evaluation of surface coherence (cohesion) in Czech texts written by native speakers and non-native speakers of Czech, respectively.

The evaluation part (the backend server) is implemented in Treex (http://ufal.cz/treex), a highly modular NLP framework written in Perl, and uses the Weka toolkit (http://www.cs.waikato.ac.nz/ml/weka/) for the final prediction of a coherence mark. The backend can be used directly from the command line or as a backend for a client. The frontend part is implemented as a web server accessible with a web browser.

There are three possible ways of using EVALD 1.0 and EVALD 1.0 for Foreigners:

  1. as a web demo and RESTful web service hosted at the LINDAT/CLARIN server (available for EVALD 3.0 and EVALD 3.0 for Foreigners only!),

  2. locally with both the backend and the frontend server running on the same machine (or two machines in the same network),

  3. in batch mode from the command line, either in a Docker container or by running the Treex scenario directly on the local machine.

1. EVALD as a LINDAT/CLARIN web service

No installation is needed in this case; in a web browser (such as Firefox or Chrome), go to https://lindat.mff.cuni.cz/services/evald/ for EVALD 2.0, or to https://lindat.mff.cuni.cz/services/evald-foreign/ for EVALD 2.0 for Foreigners.

2. EVALD as a backend and frontend server

Both the backend and the frontend components are distributed as Docker (https://www.docker.com/) images, so Docker needs to be installed first. Docker greatly simplifies the installation of the two components and allows them to run on Linux-based operating systems, Windows 10, and Mac OS X.

How to install the app:
  1. To install Docker, follow the official instructions for your operating system (MS Windows, Linux, Mac OS X).
  2. Download the backend component by running:
    docker pull ufal/evald.treex-server:1.0
  3. Download the frontend component by running:
    docker pull ufal/evald.php-server:1.0
How to run the app:
  1. The backend component must be run with two parameters: $PORT, the port number on which the backend server listens for requests (e.g. 34567), and $APP, the type of application: L1 to run EVALD 1.0, L2 to run EVALD 1.0 for Foreigners. Run the following command to start the backend server in the background:
    docker run -d --expose $PORT -p $PORT:$PORT -t ufal/evald.treex-server:1.0 run_evald_server.pl --port=$PORT --app=$APP
  2. The previous command prints out the ID of the running backend container (represented by the $ID variable in the following steps). Alternatively, run the following command and look for the ID of the running ufal/evald.treex-server:1.0 container:
    docker ps
  3. Find out the IP address of the running backend component (represented by a $IP_ADDR variable in the following, e.g., 172.17.0.2):
    docker inspect -f '{{ .NetworkSettings.IPAddress }}' $ID
    If you intend to run the frontend component on a different machine in the same network, set $IP_ADDR to the IP address of the machine on which the backend server is running instead.
  4. The frontend component requires one parameter: the backend component's IP address combined with the port on which it listens, in the form "http://$IP_ADDR:$PORT" (e.g. http://172.17.0.2:34567). Run the frontend server in the background; it will listen for requests on the standard HTTP port 80:
    docker run -d -p 80:80 ufal/evald.php-server:1.0 run_server.sh "http://$IP_ADDR:$PORT"
  5. Open your favorite web browser and visit http://localhost/index.php or http://localhost/index-foreign.php to start working with EVALD 1.0 or EVALD 1.0 for Foreigners, respectively.
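The five steps above can be collected into one shell function. This is only a sketch and not part of the EVALD distribution: the name evald_start and the input checks are hypothetical additions, and it assumes bash, a running Docker daemon, and both images already pulled. It wraps exactly the docker commands listed above and does not cover the case where the frontend runs on another machine.

```bash
#!/usr/bin/env bash
# Hypothetical helper: start the EVALD 1.0 backend and frontend on one machine.
# Usage: evald_start PORT APP   (APP is L1 or L2, as in step 1)
evald_start() {
  local port="$1" app="$2" id ip_addr

  # The backend accepts only two application types (see step 1).
  case "$app" in
    L1|L2) ;;
    *) echo "APP must be L1 or L2, got '$app'" >&2; return 1 ;;
  esac

  # Basic sanity check: a decimal port number between 1 and 65535.
  if ! [[ $port =~ ^[0-9]+$ ]] || (( 10#$port < 1 || 10#$port > 65535 )); then
    echo "invalid port '$port'" >&2
    return 1
  fi

  # Steps 1 and 2: start the backend in the background and keep the container ID it prints.
  id="$(docker run -d --expose "$port" -p "$port:$port" -t \
        ufal/evald.treex-server:1.0 run_evald_server.pl --port="$port" --app="$app")" || return 1

  # Step 3: the backend container's IP address on the default Docker bridge network.
  ip_addr="$(docker inspect -f '{{ .NetworkSettings.IPAddress }}' "$id")" || return 1

  # Step 4: start the frontend on port 80 and point it at the backend.
  docker run -d -p 80:80 ufal/evald.php-server:1.0 run_server.sh "http://$ip_addr:$port"
}
```

For example, evald_start 34567 L1 would start EVALD 1.0 with the backend on port 34567 and the frontend on port 80; the web pages from step 5 should then be reachable.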
How to stop the app:
  1. Execute the following command to find the IDs of the two running containers (represented by $BACKEND_ID and $FRONTEND_ID below):
    docker ps
  2. Use Docker to stop the two running containers:
    docker stop $BACKEND_ID
    docker stop $FRONTEND_ID
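Instead of copying the IDs from the docker ps listing by hand, the containers can be selected by the image they were started from. A minimal sketch, assuming bash; evald_stop is a hypothetical helper name, and it relies on the ancestor filter of docker ps together with -q, which prints only container IDs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: stop all running containers started from the two EVALD images.
evald_stop() {
  local image ids
  for image in ufal/evald.treex-server:1.0 ufal/evald.php-server:1.0; do
    # -q prints only the IDs of matching containers
    ids="$(docker ps -q --filter "ancestor=$image")"
    if [ -n "$ids" ]; then
      # $ids is left unquoted on purpose: one argument per container ID
      docker stop $ids
    fi
  done
}
```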
System requirements:
  • Linux with sudo privileges, Windows 10, Mac OS X
  • Docker
  • backend component:
    • 2.5 GB of free disk space, 4 GB RAM
  • frontend component:
    • 500 MB of free disk space, 30 MB RAM

3. EVALD in a batch mode

EVALD can be run in batch mode to process a selected set of documents, stored as plain text in the UTF-8 encoding, and then exit. There are two ways to install and use it in batch mode: as a Docker container, or directly as a Treex scenario.
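Because the input must be UTF-8 plain text, it may be worth checking the files before a long batch run. The following sketch is an assumption, not part of EVALD: it uses iconv, which exits with a non-zero status when it meets an invalid byte sequence, and check_utf8 is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report files in a directory that are not valid UTF-8.
# Returns 1 if at least one such file was found, 0 otherwise.
check_utf8() {
  local dir="$1" f bad=0
  for f in "$dir"/*; do
    # skip subdirectories and an unmatched glob in an empty directory
    [ -f "$f" ] || continue
    # converting UTF-8 to UTF-8 fails on any invalid byte sequence
    if ! iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
      echo "not valid UTF-8: $f"
      bad=1
    fi
  done
  return "$bad"
}
```

Files reported this way would need to be converted to UTF-8 before being passed to EVALD.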

Batch mode as a dockerized container

The dockerized backend component ufal/evald.treex-server:1.0 can also be used in batch mode. The advantages of this approach are the easy installation and the possibility of running the application on operating systems other than Linux.

How to install the app:

Follow the instructions above to install the backend component ufal/evald.treex-server:1.0.

How to run the app:
  1. Create two directories: one for the input documents and one for the outputs. The absolute paths to the directories are represented by the variables $FROM_DIR_ABS_PATH and $TO_DIR_ABS_PATH, respectively. Put your texts into the $FROM_DIR_ABS_PATH directory.
  2. To process all texts in the $FROM_DIR_ABS_PATH directory with the application specified by the $APP parameter (L1 or L2, see above), run the following command:
    docker run -v $FROM_DIR_ABS_PATH:/from/ -v $TO_DIR_ABS_PATH:/to/ -t ufal/evald.treex-server:1.0 run_evald.sh -f /from -t /to -a $APP
    The parameter -v directory_outside_Docker:directory_inside_Docker maps, at runtime, a directory outside the Docker container to the specified directory inside the container, which makes all its content visible to the container. However, additional steps are needed for this feature to work in Docker for Windows; see the Docker for Windows documentation for details.
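Since -v expects absolute host paths, a small wrapper can resolve relative directory names before calling the command from step 2. This is a sketch assuming bash on Linux or Mac OS X; evald_batch is a hypothetical helper, and the docker run command inside it is the one given above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the dockerized batch mode on two (possibly relative) directories.
# Usage: evald_batch FROM_DIR TO_DIR APP   (APP is L1 or L2)
evald_batch() {
  local from="$1" to="$2" app="$3" from_abs to_abs
  case "$app" in
    L1|L2) ;;
    *) echo "APP must be L1 or L2" >&2; return 1 ;;
  esac

  # The input directory must already exist; cd + pwd -P yields its absolute path.
  from_abs="$(cd "$from" && pwd -P)" || return 1

  # Create the output directory if needed, then resolve it the same way.
  mkdir -p "$to" || return 1
  to_abs="$(cd "$to" && pwd -P)" || return 1

  docker run -v "$from_abs:/from/" -v "$to_abs:/to/" -t \
    ufal/evald.treex-server:1.0 run_evald.sh -f /from -t /to -a "$app"
}
```

For example, evald_batch texts results L2 would evaluate the files in ./texts with EVALD 1.0 for Foreigners and write the outputs to ./results.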
System requirements:

Requirements are the same as for the backend component in the previous case.

Batch mode as a Treex scenario

In this approach, the main processing tool Treex is installed without being wrapped in a Docker container. On the one hand, this gives the user better control over the processing pipeline; on the other hand, installing Treex and its supporting tools is a rather complex task that is not recommended for inexperienced users. Furthermore, the version of Treex used in EVALD is not guaranteed to work on operating systems other than Linux.

How to install the app:

The instructions given here are only a rough outline. Please contact us at evald@ufal.mff.cuni.cz for details and help with the installation.

Treex (http://ufal.cz/treex) needs to be installed on the local machine, along with all (mostly CPAN) dependencies needed for Czech text analysis. Treex must be cloned from https://github.com/ufal/treex and checked out at revision 8785eee60754ce914818aa8ca0b40ef5c8ebe6bd. In addition, Vowpal Wabbit 8.1.1 (https://github.com/JohnLangford/vowpal_wabbit/releases/tag/8.1.1) must be installed in the directory installed_tools/ml/vowpal_wabbit-v8.1-3cf3f692/ relative to the Treex Share directory. The Treex scenario to be run is part of the LINDAT/CLARIN EVALD 1.0 distribution (http://hdl.handle.net/11234/1-1820, file Evald-1.0.scen) or the LINDAT/CLARIN EVALD 1.0 for Foreigners distribution (http://hdl.handle.net/11234/1-1821, file Evald-1.0-Foreign.scen), respectively.
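The Treex checkout step can be scripted as follows. This is only a partial sketch, assuming git is installed: it fetches the required revision, but installing the CPAN dependencies and Vowpal Wabbit is not covered, and fetch_treex and TREEX_REVISION are hypothetical names.

```bash
#!/usr/bin/env bash
# Revision of Treex required by EVALD 1.0 (from the text above).
TREEX_REVISION=8785eee60754ce914818aa8ca0b40ef5c8ebe6bd

# Hypothetical helper: clone Treex into DEST (default: ./treex) and check out that revision.
fetch_treex() {
  local dest="${1:-treex}"
  git clone https://github.com/ufal/treex "$dest" || return 1
  git -C "$dest" checkout "$TREEX_REVISION"
}
```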

How to run the app:

Replace $FILE with the actual file name and run one of the following commands (for texts written by native or non-native speakers, respectively):

treex -Lcs from=$FILE Read::Text Evald-1.0.scen
treex -Lcs from=$FILE Read::Text Evald-1.0-Foreign.scen
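To evaluate many files, the treex command can be run in a loop. The sketch below assumes that treex is on the PATH, that the .scen files are in the current directory, and that the input files end in .txt; it borrows the L1/L2 values of the $APP parameter of the dockerized backend to pick the scenario, and evald_treex_dir is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the EVALD Treex scenario on every .txt file in a directory.
# Usage: evald_treex_dir DIR APP   (APP: L1 = native speakers, L2 = non-native speakers)
evald_treex_dir() {
  local dir="$1" app="$2" scen f
  case "$app" in
    L1) scen=Evald-1.0.scen ;;
    L2) scen=Evald-1.0-Foreign.scen ;;
    *) echo "APP must be L1 or L2" >&2; return 1 ;;
  esac
  for f in "$dir"/*.txt; do
    # skip the unmatched glob when the directory has no .txt files
    [ -f "$f" ] || continue
    echo "== $f"
    treex -Lcs from="$f" Read::Text "$scen"
  done
}
```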

The evaluation of the text is printed at the end of the Treex run, together with the probability of the predicted class, e.g.:

The predicted class for the given text is '1', with probability '0.69'.
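When EVALD is called from other scripts, the predicted class and its probability can be extracted from that line. A minimal sketch in bash, assuming the line has exactly the format shown above; parse_evald_result is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read Treex output on stdin and print "CLASS PROBABILITY"
# for every line in the format shown above; all other lines are ignored.
parse_evald_result() {
  local line re="^The predicted class for the given text is '([^']*)', with probability '([^']*)'\.$"
  # the extra test also handles a last line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    if [[ $line =~ $re ]]; then
      echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
    fi
  done
}
```

Example: treex -Lcs from=$FILE Read::Text Evald-1.0.scen 2>&1 | parse_evald_result would print "1 0.69" for the output line above. The 2>&1 is included because it is not stated here whether Treex prints the line to standard output or standard error.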
System requirements:

This version of the application can be installed and run only on Linux. On the other hand, it requires neither sudo privileges nor Docker. The hardware requirements correspond to those listed above for the backend component.

Technical Support

If you have questions or need technical support, please contact evald@ufal.mff.cuni.cz.

How to cite EVALD 1.0 or EVALD 1.0 for Foreigners

  • Rysová Kateřina, Mírovský Jiří, Novák Michal, Rysová Magdaléna: EVALD 1.0. Data/software, LINDAT/CLARIN digital library, Prague, Czech Republic, http://hdl.handle.net/11234/1-1820, Nov 2016.
  • Rysová Kateřina, Mírovský Jiří, Novák Michal, Rysová Magdaléna: EVALD 1.0 for Foreigners. Data/software, LINDAT/CLARIN digital library, Prague, Czech Republic, http://hdl.handle.net/11234/1-1821, Nov 2016.

There is also a paper describing the related research and experiments:

  • Rysová Kateřina, Rysová Magdaléna, Mírovský Jiří: Automatic Evaluation of Surface Coherence in L2 Texts in Czech. In: Proceedings of the 28th Conference on Computational Linguistics and Speech Processing ROCLING XXVIII (2016), The Association for Computational Linguistics and Chinese Language Processing (ACLCLP), Taipei, Taiwan, ISBN 978-957-30792-9-3, pp. 214-228, 2016. WWW: http://aclweb.org/anthology/O/O16/O16-1021.pdf

Acknowledgment

EVALD 1.0 and EVALD 1.0 for Foreigners were developed at the Institute of Formal and Applied Linguistics (ÚFAL, http://ufal.mff.cuni.cz/), Faculty of Mathematics and Physics, Charles University, with the financial support of the Ministry of Culture of the Czech Republic, project Automatic Evaluation of Text Coherence in Czech (DG16P02B016, http://ufal.mff.cuni.cz/grants/evald-evaluator-discourse).