Lab sessions

Every other Thursday (more or less), 10:40 a.m. in SW1

Jiří Mírovský
mirovsky at ufal.mff.cuni.cz
room 422

Daniel Zeman
zeman at ufal.mff.cuni.cz
room 409

Outline

  • various formats for phrase-structure and dependency trees, transformations (Perl or another programming language)
  • mining information from the word layer and morphological layer of the Prague Dependency Treebank or the Prague English Dependency Treebank (bash, Perl or another programming language)
  • mining information from all layers of the PDT/PEDT (btred, Perl)
  • mining information from UD data (btred, Perl)
  • searching in treebanks with PML-Tree Query (PML-TQ)

Homework

Homework results (click here)


Class 00 - February 19th, 2026

(slides from the class)

The first task for today: Install the tree editor TrEd, available from the TrEd home page, on a lab computer or on your own computer.

On MS Windows, use the installation package that also contains the Strawberry Perl distribution.

On Linux, follow these instructions:

  1. Setting up cpan (so that it uses local directories; http://www.perlmonks.org/?node_id=630026):
    • mkdir -p ~/.cpan/CPAN
      echo "1" >~/.cpan/CPAN/MyConfig.pm
    • perl -MCPAN -e shell
      cpan> o conf init # choose local::lib; at the end, let cpan add the required variables to your .bashrc file (or set them manually)
    • exit cpan, then close bash and start it again
  2. Installing cpanm for easier installation of other Perl modules
    • cpan App::cpanminus
  3. Installing TrEd

The second task for today: Test the installation and set up TrEd (extensions, fonts)

Once TrEd is installed, try it out. Download the following data:
lindat.cz: go to "Repository" and search for "Prague Dependency Treebank 2.0 - sample data"

After you unzip the data, try to open one of the .t.gz files. You should get an error message complaining about missing schemas. This is because you also need to install a TrEd extension for the given type of data:

In TrEd, go to Setup -> Manage Extensions -> Get New Extensions and search for pdt20. Check it and press "Install Selected". Now close and start TrEd again. It should be able to open the data now.
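
To see which schema a data file expects, you can peek into its header. The following is a minimal Python sketch, not part of TrEd. It assumes that a .t.gz file is gzip-compressed PML (XML) whose header contains a schema element with an href attribute; the function name pml_schema_href is made up for illustration.

```python
import gzip
import xml.etree.ElementTree as ET


def pml_schema_href(path):
    """Return the href of the first <schema> element in a gzipped XML file.

    Returns None if no such element is found. The element and attribute
    names are an assumption about the PML header layout.
    """
    with gzip.open(path, "rb") as handle:
        # Stream the file so that large treebank files are not loaded whole.
        for _event, elem in ET.iterparse(handle, events=("start",)):
            # Drop the XML namespace, if any: '{namespace}schema' -> 'schema'.
            if elem.tag.rsplit("}", 1)[-1] == "schema":
                return elem.get("href")
    return None
```

Calling pml_schema_href on one of the unzipped .t.gz files should name the schema file that TrEd could not find before the extension was installed.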

You can customize TrEd in the configuration file .tredrc. To customize fonts, see the font section of the TrEd documentation.

TrEd can also handle other types of treebanks. Try, e.g., an example from the Penn Treebank:

An example file from the Penn Treebank transformed to PML, for testing the TrEd installation; the extension for Penn Treebank files (ptb) needs to be installed first.

English data to test the TrEd installation: section 000 of PEDT

  • Download and unzip the data. It is section 000 of the Penn Treebank transformed to the format of the Prague treebank family. Each document is represented by three files: an a-file (surface syntax, the analytical layer), a t-file (deep syntax, the tectogrammatical layer), and a p-file (the original phrase-structure layer). Then try to open one of the t-files in TrEd (you will need to install the pedt extension); you can also open an a-file and a p-file.

The third task for today: Transform phrase-structure trees to dependency trees

Sample phrase structure tree (file)

S (
  NP ( N ( 'Peter' ) )
  * VP ( * V ( 'gave' )
         NP ( D   ( 'a' )
              * N ( 'flower' ) )
         PP ( * P ( 'to' )
              N   ( 'Mary' ) )
       )
)

Another sample phrase structure tree (file)

S (
  NP ( A ('Young') * N ('men'))
  * VP (* V ( 'love' )
        COORD (NP ( N('beer'))
               * CONJ ('and')
               NP ( N( 'girls' ) )
        ))
)
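
One possible way to approach the third task is sketched below in Python (the same idea carries over to Perl). It is a sketch under stated assumptions, not a reference solution. It assumes that '*' marks the head child of a phrase, that a phrase with a single child has that child as its head, and that words are numbered from 1 in surface order. The names parse and to_dependencies and the (ord, form, head) output format are chosen here for illustration only.

```python
import re

# Tokens: brackets, the head marker '*', quoted word forms, and node labels.
TOKEN_RE = re.compile(r"\(|\)|\*|'[^']*'|[^\s()*']+")

SAMPLE = """
S (
  NP ( N ( 'Peter' ) )
  * VP ( * V ( 'gave' )
         NP ( D   ( 'a' )
              * N ( 'flower' ) )
         PP ( * P ( 'to' )
              N   ( 'Mary' ) )
       )
)
"""


def parse(text):
    """Parse the bracketed notation into nested dicts."""
    tokens = TOKEN_RE.findall(text)
    pos = 0
    word_count = 0

    def node():
        nonlocal pos, word_count
        is_head = tokens[pos] == "*"
        if is_head:
            pos += 1
        label = tokens[pos]
        if tokens[pos + 1] != "(":
            raise ValueError(f"expected '(' after label {label!r}")
        pos += 2
        result = {"label": label, "is_head": is_head, "children": []}
        if tokens[pos].startswith("'"):
            # Preterminal: remember the word form and its surface position.
            word_count += 1
            result["form"] = tokens[pos][1:-1]
            result["ord"] = word_count
            pos += 1
        else:
            while tokens[pos] != ")":
                result["children"].append(node())
        pos += 1  # skip the closing bracket
        return result

    return node()


def to_dependencies(tree):
    """Return sorted (ord, form, head_ord) triples; head_ord 0 marks the root."""
    deps = []

    def lexical_head(phrase):
        # A preterminal is its own lexical head.
        if "form" in phrase:
            return phrase
        kids = phrase["children"]
        marked = [kid for kid in kids if kid["is_head"]]
        if len(marked) == 1:
            head_child = marked[0]
        elif len(kids) == 1:
            head_child = kids[0]  # assumption: an only child is the head
        else:
            raise ValueError(f"no unique head in {phrase['label']}")
        head = lexical_head(head_child)
        # The lexical head of every other child depends on this head.
        for kid in kids:
            if kid is not head_child:
                dependent = lexical_head(kid)
                deps.append((dependent["ord"], dependent["form"], head["ord"]))
        return head

    root = lexical_head(tree)
    deps.append((root["ord"], root["form"], 0))
    return sorted(deps)


if __name__ == "__main__":
    for ord_, form, head in to_dependencies(parse(SAMPLE)):
        print(ord_, form, head, sep="\t")
```

For the first sample tree, this makes 'gave' the root, attaches 'Peter', 'flower' and 'to' to 'gave', attaches 'a' to 'flower', and attaches 'Mary' to 'to'. In the second sample tree, the conjunction 'and' is marked as the head of COORD, so both conjuncts end up as its dependents.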