EVALD 3.0 and EVALD 3.0 for Foreigners serve for automatic evaluation of surface coherence (cohesion) in Czech texts written by native speakers of Czech, or non-native speakers of Czech, respectively.
The evaluation part (the backend server) is implemented in Treex (http://ufal.cz/treex), a highly modular NLP framework written in the Perl programming language, and uses the Weka toolkit (http://www.cs.waikato.ac.nz/ml/weka/) for the final prediction of a coherence mark. It can be used directly from a command line or as a backend for a client. The frontend part is implemented as a web server, accessible with a web browser.
There are three possible ways of using EVALD 3.0 and EVALD 3.0 for Foreigners:
interactively as a web demo and RESTful web service hosted at the LINDAT/CLARIN server (available for EVALD 4.0 and EVALD 4.0 for Foreigners only!),
interactively but locally, with both the server and the client running on the same machine (or two machines in the same network),
in a batch mode run on the local machine.
No installation is needed in this case; in a web browser (such as Firefox or Chrome), go to https://lindat.mff.cuni.cz/services/evald/ for EVALD 3.0, or to https://lindat.mff.cuni.cz/services/evald-foreign/ for EVALD 3.0 for Foreigners.
Type or paste the text to be tested into the text field and click the "Evaluate!" button. The evaluation takes approximately ten seconds, after which the result is displayed.
Both the backend and the frontend components are distributed via the Docker software (https://www.docker.com/), which needs to be installed first. Docker greatly simplifies the installation process of the two components and allows them to be run on Linux-based operating systems, Windows 10, as well as Mac OS X.
docker pull ufal/evald.treex-server:3.0
docker pull ufal/evald.php-server:3.0
The parameter --port=$PORT specifies the port number on which the backend server listens for requests (e.g. 34567). The other optional parameter is --app=$APP, which specifies the type of application to run: L1 or native to run EVALD 3.0, L2 or foreign to run EVALD 3.0 for Foreigners. If this optional parameter is omitted, both applications are run and it is decided at runtime which one of them should be used. Run the following command to run the backend server in the background:
docker run -d --expose $PORT -p $PORT:$PORT -t ufal/evald.treex-server:3.0 run_evald_server.pl --port=$PORT [--app=$APP]
The command prints the ID of the started container (referred to as the $ID variable in the following). Another way to find out the ID is to run the following command and look for the ID of the running ufal/evald.treex-server:3.0 component:
docker ps
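As a minimal sketch of the step above, the backend can be started from a small shell function so that the printed container ID can be stored for later use. The function name start_evald_backend and the DOCKER override variable (set DOCKER=echo for a dry run that only prints the arguments) are assumptions made for illustration; only the docker run command itself comes from the instructions above.

```shell
# Hypothetical helper, not part of the EVALD distribution.
# DOCKER may be overridden (e.g. DOCKER=echo) to print the arguments instead of running Docker.
DOCKER=${DOCKER:-docker}

# Usage: start_evald_backend PORT [APP]
# APP is L1/native or L2/foreign; when omitted, both applications are served.
start_evald_backend() {
    port=$1
    app=${2:-}
    # Build the argument list step by step so that --app is added only when given.
    set -- run -d --expose "$port" -p "$port:$port" -t ufal/evald.treex-server:3.0 \
        run_evald_server.pl "--port=$port"
    if [ -n "$app" ]; then
        set -- "$@" "--app=$app"
    fi
    "$DOCKER" "$@"
}

# Example: keep the container ID printed by "docker run -d" for the following steps.
# ID=$(start_evald_backend 34567 native)
```

Capturing the output this way avoids having to look the ID up with docker ps afterwards.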
To find out the IP address of the backend container (referred to as the $IP_ADDR variable in the following, e.g., 172.17.0.2), run:
docker inspect -f '{{ .NetworkSettings.IPAddress }}' $ID
If you are about to execute the frontend component on a different machine in the same network, store the IP address of the machine where the backend server is running in the variable $IP_ADDR.
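The two lookups (container ID, then IP address) can be sketched as shell functions. The function names, the EVALD_BACKEND_IMAGE variable and the DOCKER override are hypothetical; the sketch assumes that filtering docker ps by image (-q --filter ancestor=...) is an acceptable substitute for reading the ID from the docker ps table by eye.

```shell
# Hypothetical helpers, not part of the EVALD distribution.
DOCKER=${DOCKER:-docker}
EVALD_BACKEND_IMAGE=ufal/evald.treex-server:3.0

# Print the IDs of running containers created from the backend image
# (one per line if several backends are running).
evald_backend_id() {
    "$DOCKER" ps -q --filter "ancestor=$EVALD_BACKEND_IMAGE"
}

# Print the IP address of the container with the given ID.
evald_backend_ip() {
    "$DOCKER" inspect -f '{{ .NetworkSettings.IPAddress }}' "$1"
}

# Example:
# ID=$(evald_backend_id)
# IP_ADDR=$(evald_backend_ip "$ID")
```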
The frontend component accepts the parameter --L1=$L1_URL, specifying the URL where a running backend component that processes L1 queries is available, and analogously --L2=$L2_URL for a backend component that processes L2 queries. URLs may combine the backend component's IP address, the port on which it listens, and a more specific path. The combination must have the form "http://$IP_ADDR:$PORT/$PATH" (e.g. http://172.17.0.2:34567/native). Note that if you are running a backend component without the --app=$APP parameter, the L1 application is accessible using $PATH=native, whereas the L2 application is accessible using $PATH=foreign. Run the frontend server in the background; it will listen for requests on the standard HTTP port 80:
docker run -d -p 80:80 ufal/evald.php-server:3.0 run_server.sh [--L1=$L1_URL] [--L2=$L2_URL]
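The following sketch assembles the two URLs and the frontend command for the common case of a single backend started without --app. The function names and the DOCKER override are hypothetical. Note that the path is deliberately kept in a variable called url_path: $PATH in the text above is only a placeholder, and actually assigning to PATH in a shell would break command lookup.

```shell
# Hypothetical helpers, not part of the EVALD distribution.
DOCKER=${DOCKER:-docker}

# Usage: evald_backend_url IP_ADDR PORT URL_PATH
# Prints a URL of the form http://IP_ADDR:PORT/URL_PATH.
evald_backend_url() {
    ip=$1
    port=$2
    url_path=$3
    printf 'http://%s:%s/%s\n' "$ip" "$port" "$url_path"
}

# Usage: start_evald_frontend IP_ADDR PORT
# Starts the frontend on port 80 and points both L1 and L2 at one backend
# that serves both applications (paths "native" and "foreign").
start_evald_frontend() {
    ip=$1
    port=$2
    "$DOCKER" run -d -p 80:80 ufal/evald.php-server:3.0 run_server.sh \
        "--L1=$(evald_backend_url "$ip" "$port" native)" \
        "--L2=$(evald_backend_url "$ip" "$port" foreign)"
}

# Example: FRONTEND_ID=$(start_evald_frontend 172.17.0.2 34567)
```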
To stop the servers, list the running containers to find out their IDs:
docker ps
Then stop both components:
docker stop $BACKEND_ID
docker stop $FRONTEND_ID
EVALD can be run in a batch mode, in which it processes the selected documents, stored as plain text in the UTF-8 encoding, and then exits. There are two ways to install and use it in a batch mode: as a dockerized container, or directly as a Treex scenario.
The dockerized backend component ufal/evald.treex-server:3.0 can also be used in a batch mode. The advantages of this approach are easy installation and the possibility to run the application on operating systems other than Linux.
Follow the instructions above to install the backend component ufal/evald.treex-server:3.0.
The absolute paths of the input and output directories are referred to as $FROM_DIR_ABS_PATH and $TO_DIR_ABS_PATH, respectively. Put your texts into the $FROM_DIR_ABS_PATH directory.
To process the texts in the $FROM_DIR_ABS_PATH directory with the scenario specified by the $APP parameter (see above), run the following command:
docker run -v $FROM_DIR_ABS_PATH:/from/ -v $TO_DIR_ABS_PATH:/to/ -t ufal/evald.treex-server:3.0 run_evald.sh -f /from -t /to -a $APP
The parameter -v directory_outside_Docker:directory_inside_Docker maps, at runtime, a directory outside the Docker container to the specified directory inside the container, which makes all the content of the directory visible to the container. However, additional steps are needed for this feature to work in Docker for Windows; see the Docker for Windows documentation for instructions.
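A batch run can be wrapped in a shell function that checks the two directories before calling Docker. This is only a sketch: the function name run_evald_batch and the DOCKER override (DOCKER=echo prints the arguments instead of running anything) are assumptions, while the docker run arguments are those of the command above.

```shell
# Hypothetical helper, not part of the EVALD distribution.
DOCKER=${DOCKER:-docker}

# Usage: run_evald_batch FROM_DIR_ABS_PATH TO_DIR_ABS_PATH APP
run_evald_batch() {
    from_dir=$1
    to_dir=$2
    app=$3
    # Both paths must be absolute; Docker would treat a relative value
    # given to -v as a volume name rather than as a directory.
    for dir in "$from_dir" "$to_dir"; do
        case "$dir" in
            /*) ;;
            *) echo "not an absolute path: $dir" >&2; return 1 ;;
        esac
    done
    "$DOCKER" run -v "$from_dir:/from/" -v "$to_dir:/to/" -t ufal/evald.treex-server:3.0 \
        run_evald.sh -f /from -t /to -a "$app"
}

# Example: run_evald_batch "$PWD/texts" "$PWD/results" L1
```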
Requirements are the same as for the backend component in the previous case.
In this approach, the main processing tool Treex is installed without being wrapped in a Docker container. On the one hand, this gives the user better control over the processing pipeline. On the other hand, installing Treex and the supporting tools is a rather complex task that is not recommended for inexperienced users. Furthermore, Treex in the version used in EVALD is not guaranteed to work out of the box on operating systems other than Linux.
The instructions given here are only a rough outline. Please contact us at evald@ufal.mff.cuni.cz for details and help with the installation.
Treex (http://ufal.cz/treex) needs to be installed on the local machine, along with all (mostly CPAN) dependencies for the Czech text analysis. Treex must be in the revision tagged as EVALD_3.0 (https://github.com/ufal/treex/releases/tag/EVALD_3.0). In addition, Vowpal Wabbit 8.1.1 (https://github.com/JohnLangford/vowpal_wabbit/releases/tag/8.1.1) must be installed to the location installed_tools/ml/vowpal_wabbit-v8.1-3cf3f692/ relative to the Treex Share directory. The Treex scenario to be run is a part of the LINDAT/CLARIN EVALD 3.0 distribution (http://hdl.handle.net/11234/1-2863, file Evald-3.0.scen), or the LINDAT/CLARIN EVALD 3.0 for Foreigners distribution (http://hdl.handle.net/11234/1-2864, file Evald-3.0-Foreign.scen), respectively.
Replace $FILE with the actual file name and run one of the following commands (for texts written by native or non-native speakers, respectively):
treex -Lcs from=$FILE Read::Text Evald-3.0.scen
treex -Lcs from=$FILE Read::Text Evald-3.0-Foreign.scen
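To evaluate several files in one go, the command can be wrapped in a loop. The sketch below is an illustration only: the function name evald_each and the TREEX override variable are hypothetical, and the treex arguments are copied unchanged from the commands above.

```shell
# Hypothetical helper, not part of the EVALD distribution.
# TREEX may be overridden to test the loop without a Treex installation.
TREEX=${TREEX:-treex}

# Usage: evald_each SCENARIO FILE...
# Runs the given scenario on each file and prints a header line before each result.
evald_each() {
    scenario=$1
    shift
    for file in "$@"; do
        echo "== $file"
        "$TREEX" -Lcs "from=$file" Read::Text "$scenario"
    done
}

# Example: evald_each Evald-3.0.scen texts/*.txt
```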
The evaluation of the text (the overall score and also evaluation for individual sets of features) is printed during the run of the Treex scenario, along with probabilities of the predicted results; the first mark (unnamed feature set) represents the overall evaluation, the subsequent marks reflect individual qualities of the text, e.g.:
- feature set: class: B1 probability: 0.67
- feature set: +spell class: B1 probability: 0.516
- feature set: +morph class: B1 probability: 0.79
- feature set: +vocab class: C2 probability: 0.65
- feature set: +syntax class: B1 probability: 0.69
- feature set: +conn_qua class: B1 probability: 0.62
- feature set: +conn_div class: B2 probability: 0.66
- feature set: +pron,+coref class: B1 probability: 0.58
- feature set: +tfa class: B2 probability: 0.66
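For further processing, the marks can be turned into tab-separated columns. The following sketch assumes that the real output uses the whitespace-separated layout shown above ("feature set:", an optional set name, "class:", "probability:"); the function name evald_marks and the label "overall" for the unnamed feature set are choices made for this illustration.

```shell
# Hypothetical helper, not part of the EVALD distribution.
# Reads EVALD output on standard input and prints: feature set, class, probability
# (tab-separated). Lines without a "class:" field are skipped.
evald_marks() {
    awk '{
        name = "overall"; cls = ""; prob = ""
        for (i = 1; i < NF; i++) {
            # The word after "set:" is the feature set name, unless the set is unnamed.
            if ($i == "set:" && $(i + 1) != "class:") name = $(i + 1)
            if ($i == "class:") cls = $(i + 1)
            if ($i == "probability:") prob = $(i + 1)
        }
        if (cls != "") printf "%s\t%s\t%s\n", name, cls, prob
    }'
}

# Example: treex ... | evald_marks
```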
This version of the application can be installed and run only on Linux. On the other hand, it requires neither sudo privileges nor a Docker installation. The hardware requirements correspond to those listed above for the backend component.
If you have questions or need technical support, please contact evald@ufal.mff.cuni.cz.
There are also papers describing the related research and experiments:
EVALD 3.0 and EVALD 3.0 for Foreigners were developed at the Institute of Formal and Applied Linguistics (ÚFAL, http://ufal.mff.cuni.cz/), Faculty of Mathematics and Physics, Charles University, with the financial support of the Ministry of Culture of the Czech Republic, project Automatic Evaluation of Text Coherence in Czech (DG16P02B016, http://ufal.mff.cuni.cz/grants/evald-evaluator-discourse).