Archive: Lab sessions in 2020

SU1, Friday 12:20 p.m.

Jiří Mírovský
mirovsky at ufal.mff.cuni.cz
room 422

Daniel Zeman
zeman at ufal.mff.cuni.cz
room 409


Class 09 – May 15, 2020

The last topic of this course is searching in treebanks, again in the form of individual study based on the following instructions. This is the last practical class this year, and you will get the last homework. Feel free to contact me in the following weeks with any problems or questions.

We will search treebanks using the Prague Markup Language Tree Query (PML-TQ), which is a powerful search engine for any treebank encoded in the Prague Markup Language (PML). Please notice the power of the framework - once a treebank is encoded in the PML, you can open/browse/edit it using TrEd, process it automatically using btred, and search in it using the PML-TQ.

The PML-TQ is a client-server system. You can choose between two clients, and it is up to you which one you use (the homework can be done in either):

  1. Web interface - With the web interface, you do not need to install anything and you get instant access to the treebanks, most of them without an account. Examples from the tutorial link directly to the web interface, so you can try them easily. BUT: you can create queries only in textual form (not graphically), and the results are displayed in a single fixed format. The web interface also does not support searching in local files (which you do not need now anyway).
  2. TrEd interface - With the TrEd interface, you first need to install the PML-TQ extension. The installation may go smoothly, or it may fail and require some manual steps; connecting to the server also requires some authentication steps. BUT: you can create queries in a graphical environment, the queries are represented graphically, and the results are displayed with the full range of views that TrEd and the TrEd extension for the given treebank offer. TrEd also supports searching in local files (but you do not need it now).

If you have time and plan to work with treebanks in the future, I recommend trying option 2 and, if you run into too much trouble with the installation, falling back to option 1. However, if you want to get through the topic of searching in treebanks with minimal effort and as quickly as possible, you can choose option 1 right away. Please refer to the web page listing the two clients for connection and installation instructions. You will also get some information (login name etc.) by e-mail.

There are two tutorials for the PML-TQ, one focused on the web interface and one on the TrEd interface. Please follow the one that matches the client you prefer:

Later, you may want to consult other documentation sources; see the PML-TQ documentation page.

Homework 05 (due on May 27th) - if you finish and wish to get the marks sooner, write me an e-mail.


Class 08 – April 24, 2020

Online interactive Zoom session, continuing work with Universal Dependencies in Udapi. The PDF from last week has been expanded with new sections.

Homework 04 (due by May 13, 2020) is specified at the end of the PDF tutorial mentioned above.



Class 07 – April 17, 2020

Individual work with Universal Dependencies in TrEd and with Udapi. For instructions, see this PDF. The instructions include several exercises. While you are encouraged to do them all, they are not mandatory and they do not constitute official homework.



Class 06 - April 3rd, 2020

Continuing individual study based on the instructions given below in class 05.

Further notes about using Perl in btred (useful for homework 03)

Homework 03 (due on April 15th)


Class 05 - March 27th, 2020

Individual study based on the following instructions.

Our task for today and for the next class is to learn to use btred, a scripting tool for working with TrEd data. There is no new homework today; you still have time to finish the previous one (homework 2). A new homework (homework 3) will be assigned next week. The following instructions cover two classes (today and next week), so you can pace the work according to your needs.

Let me start with a few clarifying statements, which you may already know:

Your task

Please follow the btred tutorial, which will take you through the first steps of working with btred. Our plan is to cover steps 1-7 of the tutorial. I suggest splitting the work as follows: steps 1-4 this week, steps 5-7 next week. Of course, if you find the first four steps simple enough, you can move on sooner. For the tutorial, you can use any PDT-like data, e.g. the PDT data from class 02. For examples that use the tectogrammatical layer (t-layer), use this data from the PDT, which also contains the t-files.

As Perl may be new to you, and btred most certainly is, let me give you a few hints to make your work with Perl and btred easier. You will see in the tutorial that btred scripts may start with three different lines:

A simple script that counts the nodes in each tree of the given files (the -T option runs the function once per tree) and prints the count next to the file name might look like this:

#!btred -T -e count_nodes()

sub count_nodes {
    my @nodes = GetNodes($root);  # get all nodes in the tree
    my $number_of_nodes = scalar(@nodes);  # get the length of the array
    my $filename = FileName();
    print "$filename: $number_of_nodes\n";
}

If the script is named count.btred, it can be run on all gzipped a-files in the current directory from a terminal with the following command:

btred -I count.btred *.a.gz

I suggest that after the first line in any btred (or Perl in general) script, you add the following Perl instructions:

use strict;  # it informs e.g. about non-declared variables (often typos)
use warnings;  # it warns e.g. if an uninitialized variable is used (e.g. in addition or concatenation)
use utf8;  # it allows utf8 in the script source code
binmode STDIN, ':utf8';  # setting utf8 for STDIN
binmode STDOUT, ':utf8';  # the same for STDOUT
binmode STDERR, ':utf8';  # the same for STDERR

Manuals and documentation

For writing btred scripts generally and for a particular treebank (say, a PDT-like treebank), there are three main sources of information:

Exercise after 4 steps of the tutorial

After you finish the first four steps of the tutorial, you may, if you want, practice the acquired knowledge on the following tasks (you do not need to actually write the scripts; just thinking the tasks through may suffice):

Exercise after 7 steps of the tutorial


Class 04 - March 20th, 2020

Individual study based on the homework. Please contact me (Jiří Mírovský) with any questions.
Homework 02 (due on April 1st)


Class 03 - March 13th, 2020

Cancelled.


Class 02 - March 6th, 2020

Section 000 of the WSJ part of the Penn Treebank in the original merged file format.
PDT data for the class (it is a part of PDT w-, m- and a-files)
PEDT data for the class (it is a part of PEDT, namely a-files from sections 00* (with w- and m- info merged in))
Documentation for the m-layer
PDT 3.5
Demo of a Czech and English morphological analyzer and tagger


Class 01 - February 28th, 2020

English data to test the TrEd installation: section 000 of PEDT

Sample phrase structure tree (file)

S (
  NP ( N ( 'Peter' ) )
  * VP ( * V ( 'gave' )
         NP ( D   ( 'a' )
              * N ( 'flower' ) )
         PP ( * P ( 'to' )
              N   ( 'Mary' ) )
       )
)

Another sample phrase structure tree (file)

S (
  NP ( A ('Young') * N ('men'))
  * VP (* V ( 'love' )
        COORD (NP ( N('beer'))
               * CONJ ('and')
               NP ( N( 'girls' ) )
        ))
)
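
The bracketed notation of the two sample trees above can also be read mechanically. The following is a minimal Python sketch, not part of the course materials: it assumes that each node is written as LABEL ( ... ), that a quoted string is a word, and that `*` marks the head child of a constituent (a constituent with no marked child is assumed to be headed by its first child). The function names `parse` and `dependencies` are made up for this illustration. Words are used directly as node identifiers, so a sentence with a repeated word would need token indices instead.

```python
import re

# Tokens: head marker, brackets, quoted words, and category labels.
TOKEN = re.compile(r"\*|\(|\)|'[^']*'|\w+")

SAMPLE = """
S (
  NP ( N ( 'Peter' ) )
  * VP ( * V ( 'gave' )
         NP ( D   ( 'a' )
              * N ( 'flower' ) )
         PP ( * P ( 'to' )
              N   ( 'Mary' ) )
       )
)
"""


def parse(text):
    """Parse the bracketed notation into nested dictionaries."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        is_head = tokens[pos] == "*"      # assumed meaning: head child
        if is_head:
            pos += 1
        label = tokens[pos]
        pos += 1
        if tokens[pos] != "(":
            raise ValueError("expected '(' after label " + label)
        pos += 1
        word, children = None, []
        if tokens[pos].startswith("'"):   # preterminal: LABEL ( 'word' )
            word = tokens[pos][1:-1]
            pos += 1
        else:                             # inner node: LABEL ( child ... )
            while tokens[pos] != ")":
                children.append(node())
        pos += 1                          # skip the closing ")"
        return {"label": label, "head": is_head,
                "word": word, "children": children}

    return node()


def dependencies(tree):
    """Return (root word, list of (governor, dependent) pairs)."""
    edges = []

    def lexical_head(n):
        if n["word"] is not None:
            return n["word"]
        heads = [lexical_head(child) for child in n["children"]]
        marked = [i for i, child in enumerate(n["children"]) if child["head"]]
        head_index = marked[0] if marked else 0
        # Every non-head child hangs on the lexical head of the head child.
        for i, dependent in enumerate(heads):
            if i != head_index:
                edges.append((heads[head_index], dependent))
        return heads[head_index]

    return lexical_head(tree), edges


if __name__ == "__main__":
    root, edges = dependencies(parse(SAMPLE))
    print("root:", root)
    for governor, dependent in edges:
        print(governor, "->", dependent)
```

Under these assumptions, the first sample tree yields the verb 'gave' as the root, with 'Peter', 'flower' and 'to' as its dependents.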

Homework 01 (due on March 11th)


Class 00 - February 21, 2020

Installation of the tree editor TrEd on the computers in the lab (installation script 'install_tred.bash' from the TrEd home page)

  1. Setting up cpan (so that it uses local directories; http://www.perlmonks.org/?node_id=630026):
  2. Installing cpanm for easier installation of other Perl modules
  3. Installing TrEd


lindat.cz - search for Prague Dependency Treebank 2.0 - sample data

Configuration file .tredrc - customize fonts: font section in the TrEd documentation

TrEd & svn: file pmlbackend_conf.xml

An example file from the Penn Treebank transformed to the PML - to test the TrEd installation; an extension for the Penn Treebank files (ptb) needs to be installed first...