EVALD 4.0 and EVALD 4.0 for Foreigners serve for automatic evaluation of surface coherence (cohesion) in Czech texts written by native speakers of Czech and non-native speakers of Czech, respectively. EVALD 4.0 for Beginners aims at overall evaluation of texts written by non-native speakers of Czech who are beginners.
The evaluation part (the backend server) is implemented in Treex (http://ufal.cz/treex), a highly modular NLP framework written in the Perl programming language, and uses the Weka toolkit (http://www.cs.waikato.ac.nz/ml/weka/) for the final prediction of a coherence mark. It can be used directly from a command line or as a backend for a client. The frontend part is implemented as a web server, accessible with a web browser.
There are three possible ways of using any of the three versions of EVALD 4.0:
interactively as a web demo and RESTful web service hosted at the LINDAT/CLARIN server,
interactively but locally, with both the server and the client running on the same machine (or two machines in the same network),
in a batch mode run on the local machine.
No installation is needed for the web demo; in a web browser (such as Firefox or Chrome), go to https://lindat.mff.cuni.cz/services/evald/ for EVALD 4.0, https://lindat.mff.cuni.cz/services/evald-foreign/ for EVALD 4.0 for Foreigners, or https://lindat.mff.cuni.cz/services/evald-begin/ for EVALD 4.0 for Beginners.
Write or paste the text to be tested into the text field and click the "Evaluate!" button. The evaluation takes 10–20 seconds, after which the result is displayed.
Both the backend and the frontend components are distributed via the Docker software (https://www.docker.com/), which needs to be installed first. Docker greatly simplifies the installation process of the two components and allows them to be run on Linux-based operating systems, Windows 10, as well as Mac OS X.
docker pull ufal/evald.treex-server:4.0
docker pull ufal/evald.php-server:4.0
The backend server takes the parameter --port=$PORT, which specifies the port number on which it listens for requests (e.g. 34567). The other parameter, --app=$APP, is optional and specifies the type of application to run: L1 or native to run EVALD 4.0, L2 or foreign to run EVALD 4.0 for Foreigners, and L2b or begin to run EVALD 4.0 for Beginners. If this optional parameter is omitted, all three applications are run and the one to use is chosen at runtime. Run the following command to start the backend server in the background:
docker run -d --expose $PORT -p $PORT:$PORT -t ufal/evald.treex-server:4.0 run_evald_server.pl --port=$PORT [--app=$APP]
The command prints the ID of the started container (referred to as the $ID variable in the following). Another way to find out the ID is to run the following command and look for the ID of the running ufal/evald.treex-server:4.0 component:
docker ps
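The backend start-up described above can be sketched as a small shell helper. This is only an illustrative sketch: the function name backend_cmd is hypothetical and not part of the EVALD distribution. It checks the port and the application name against the values listed above and prints the docker run command instead of executing it, so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper (not part of EVALD): print the command that starts
# the backend server for a given port and an optional application name.
backend_cmd() {
    port=$1
    app=${2:-}
    # The port must consist of digits only.
    case "$port" in
        ''|*[!0-9]*) echo "port must be a number" >&2; return 1 ;;
    esac
    cmd="docker run -d --expose $port -p $port:$port -t ufal/evald.treex-server:4.0 run_evald_server.pl --port=$port"
    case "$app" in
        '') ;;  # no --app given: all three applications are run
        L1|native|L2|foreign|L2b|begin) cmd="$cmd --app=$app" ;;
        *) echo "unknown application: $app" >&2; return 1 ;;
    esac
    printf '%s\n' "$cmd"
}

# Example: print the command for EVALD 4.0 for Foreigners on port 34567.
backend_cmd 34567 foreign
```

The printed command can then be run as is; docker run -d prints the container ID, which can be captured in $ID.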
Find out the IP address of the running backend container (referred to as the $IP_ADDR variable in the following, e.g., 172.17.0.2):
docker inspect -f '{{ .NetworkSettings.IPAddress }}' $ID
If you are about to run the frontend component on a different machine in the same network, store the IP address of the machine where the backend server runs in the variable $IP_ADDR.
The frontend server takes the optional parameter --L1=$L1_URL, specifying the URL where a running backend component that processes L1 queries is available, and analogously --L2=$L2_URL for a backend component that processes L2 queries and --L2b=$L2b_URL for a backend component that processes L2b queries. A URL combines the backend component's IP address, the port on which it listens, and a path that further specifies the application. The combination must be in the form "http://$IP_ADDR:$PORT/$PATH" (e.g. http://172.17.0.2:34567/native). Note that if you are running a backend component without the --app=$APP parameter, the L1 application is accessible using $PATH=native, L2 using $PATH=foreign and L2b using $PATH=begin. Start the frontend server in the background; it will listen for requests on the standard HTTP port 80:
docker run -d -p 80:80 ufal/evald.php-server:4.0 run_server.sh [--L1=$L1_URL] [--L2=$L2_URL] [--L2b=$L2b_URL]
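As an illustration of how the three URLs fit together, the following sketch builds them for a single backend container started without --app and prints the resulting frontend command. The helper name backend_url is hypothetical, and the IP address and port are just the example values used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of EVALD): build a backend URL of the form
# http://$IP_ADDR:$PORT/$PATH.
backend_url() {
    # $1 = IP address, $2 = port, $3 = path (native, foreign or begin)
    printf 'http://%s:%s/%s\n' "$1" "$2" "$3"
}

# Example values taken from the text; replace them with your own.
IP_ADDR=172.17.0.2
PORT=34567

L1_URL=$(backend_url "$IP_ADDR" "$PORT" native)
L2_URL=$(backend_url "$IP_ADDR" "$PORT" foreign)
L2b_URL=$(backend_url "$IP_ADDR" "$PORT" begin)

# Print the frontend command rather than running it, so it can be checked first.
echo "docker run -d -p 80:80 ufal/evald.php-server:4.0 run_server.sh --L1=$L1_URL --L2=$L2_URL --L2b=$L2b_URL"
```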
docker ps
docker stop $BACKEND_ID
docker stop $FRONTEND_ID
EVALD can be run in batch mode to process selected documents stored as plain text in the UTF-8 encoding and then exit. There are two ways to install and use it in batch mode: as a dockerized container, or directly as a Treex scenario.
The dockerized backend component ufal/evald.treex-server:4.0 can also be used in batch mode. The advantages of this approach are easy installation and the possibility to run the application on operating systems other than Linux.
Follow the instructions above to install the backend component ufal/evald.treex-server:4.0.
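Create a directory for the input texts and a directory for the results, and store their absolute paths in the variables $FROM_DIR_ABS_PATH and $TO_DIR_ABS_PATH, respectively. Put your texts into the $FROM_DIR_ABS_PATH directory.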
To process the texts in the $FROM_DIR_ABS_PATH directory with the scenario specified by the $APP parameter (see above), run the following command:
docker run -v $FROM_DIR_ABS_PATH:/from/ -v $TO_DIR_ABS_PATH:/to/ -t ufal/evald.treex-server:4.0 run_evald.sh -f /from -t /to -a $APP
The parameter -v directory_outside_Docker:directory_inside_Docker maps, at runtime, a directory outside the Docker container to a specified directory inside the container, which makes all the content of the directory visible to the container. However, additional steps are needed for this feature to work in Docker for Windows. See the instructions here.
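Since batch mode expects plain text in the UTF-8 encoding, it may help to check the input directory before starting the container. The following is a minimal sketch, assuming the iconv utility is available; the function names is_utf8 and check_inputs are hypothetical and not part of EVALD.

```shell
#!/bin/sh
# Hypothetical helpers (not part of EVALD): check that every regular file
# in the input directory is valid UTF-8 before running the batch command.

# Succeeds if the file converts cleanly from UTF-8 to UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Reports each invalid file on stderr; returns 1 if any file is invalid.
check_inputs() {
    dir=$1
    bad=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if ! is_utf8 "$f"; then
            printf 'not valid UTF-8: %s\n' "$f" >&2
            bad=1
        fi
    done
    return "$bad"
}
```

A typical use would be to call check_inputs "$FROM_DIR_ABS_PATH" and run the docker command only if it succeeds.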
Requirements are the same as for the backend component in the previous case.
In this approach, the main processing tool Treex is installed without being wrapped in a Docker container. On the one hand, this gives the user better control over the processing pipeline; on the other hand, installing Treex and its supporting tools is a rather complex task that is not recommended for inexperienced users. Furthermore, Treex in the version used in EVALD is not guaranteed to work out of the box on operating systems other than Linux.
The instructions given here are only a rough outline. Please contact us at evald@ufal.mff.cuni.cz for details and help with the installation.
Treex (http://ufal.cz/treex) needs to be installed on the local machine, along with all (mostly CPAN) dependencies for the Czech text analysis. Treex must be in the revision tagged as EVALD_4.0 (https://github.com/ufal/treex/releases/tag/EVALD_4.0). In addition, Vowpal Wabbit 8.1.1 (https://github.com/JohnLangford/vowpal_wabbit/releases/tag/8.1.1) must be installed to the location installed_tools/ml/vowpal_wabbit-v8.1-3cf3f692/ relative to the Treex Share directory. The Treex scenario to be run is a part of the LINDAT/CLARIN EVALD 4.0 distribution (http://hdl.handle.net/11234/1-3065, file Evald-4.0.scen), the LINDAT/CLARIN EVALD 4.0 for Foreigners distribution (http://hdl.handle.net/11234/1-3066, file Evald-4.0-Foreign.scen), or the LINDAT/CLARIN EVALD 4.0 for Beginners distribution (http://hdl.handle.net/11234/1-3067, file Evald-4.0-Begin.scen).
Replace $FILE with the actual file name and run one of the following commands (for texts written by native speakers, non-native speakers, or non-native beginners, respectively):
treex -Lcs from=$FILE Read::Text Evald-4.0.scen
treex -Lcs from=$FILE Read::Text Evald-4.0-Foreign.scen
treex -Lcs from=$FILE Read::Text Evald-4.0-Begin.scen
The evaluation of the text (the overall score and also evaluation for individual sets of features) is printed during the run of the Treex scenario, along with probabilities of the predicted results; the first mark (unnamed feature set) represents the overall evaluation, the subsequent marks reflect individual qualities of the text, e.g.:
- feature set: class: B1 probability: 0.67
- feature set: spelling class: B1 probability: 0.516
- feature set: morphology class: B1 probability: 0.79
- feature set: vocabulary class: C2 probability: 0.65
- feature set: syntax class: B1 probability: 0.69
- feature set: connectives_quantity class: B1 probability: 0.62
- feature set: connectives_diversity class: B2 probability: 0.66
- feature set: coreference class: B1 probability: 0.58
- feature set: tfa class: B2 probability: 0.66
- feature set: readability class: B1 probability: 0.69
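For further processing, lines of this form can be turned into tab-separated values. The sketch below is an assumption-laden illustration: the function name parse_evald is hypothetical, the label "overall" for the unnamed feature set is chosen here for convenience, and the parser assumes the fields are separated by whitespace exactly as in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of EVALD): convert "- feature set: ..."
# lines read from standard input into "name<TAB>class<TAB>probability".
parse_evald() {
    awk '/^- feature set:/ {
        name = "overall"   # label used here for the unnamed (first) feature set
        cls = ""; prob = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "set:" && $(i + 1) != "class:") name = $(i + 1)
            if ($i == "class:") cls = $(i + 1)
            if ($i == "probability:") prob = $(i + 1)
        }
        printf "%s\t%s\t%s\n", name, cls, prob
    }'
}
```

For example, piping the saved output of a run through parse_evald yields one row per feature set, which can be loaded into a spreadsheet or filtered with standard tools.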
This version of the application (i.e., as a Treex scenario) can be installed and run only on Linux. On the other hand, it requires neither sudo privileges nor a Docker installation. The hardware requirements correspond to those listed above for the backend component.
If you have questions or need technical support, please contact evald@ufal.mff.cuni.cz.
There are also papers describing the related research and experiments:
Software applications EVALD 4.0, EVALD 4.0 for Foreigners and EVALD 4.0 for Beginners were developed in the years 2016–2019 at the Institute of Formal and Applied Linguistics (ÚFAL, http://ufal.mff.cuni.cz/), Faculty of Mathematics and Physics, Charles University, with the financial support of the Ministry of Culture of the Czech Republic, project Automatic Evaluation of Text Coherence in Czech (DG16P02B016, http://ufal.mff.cuni.cz/grants/evald-evaluator-discourse).