Lab sessions

Every other Thursday (more or less), 9.00 a.m. in SW1

Jiří Mírovský
mirovsky at ufal.mff.cuni.cz
room 422

Daniel Zeman
zeman at ufal.mff.cuni.cz
room 409

(Archive of the practical classes from 2020)

Outline

  • various formats for phrase-structure and dependency trees, transformations (Perl or another programming language)
  • mining information from the word layer and morphological layer of the Prague Dependency Treebank or the Prague English Dependency Treebank (bash, Perl or another programming language)
  • mining information from all layers of the PDT/PEDT (btred, Perl)
  • mining information from UD data (btred, Perl)
  • searching in treebanks with PML-Tree Query (PML-TQ)

Homeworks

Results of the homeworks (click here)

  • All homeworks must be committed into the https://svn.ms.mff.cuni.cz/svn/undergrads/students svn repository; do not send your homeworks by e-mail. Ask for an svn account if you do not have one in this repository yet.
  • Submit your work to your personal directory in the svn repository. Each homework has an explicit deadline, usually the second Sunday after it was assigned (so you have about 10 days to finish it).
  • If you miss the deadline, ask for an additional homework. All homeworks must be submitted in order to get the credit (zápočet). You can solve an additional homework even if you submitted the regular homework in time, i.e., you can improve your average by solving some of the additional homeworks. (You have to e-mail us to confirm that your additional homework is ready to be rated. All additional homeworks must be submitted at least one week before the credit.)

Class 00 - February 22nd, 2024

(slides from the class)

The first task for today: Install the tree editor TrEd from the TrEd home page, either on the computers in the lab or on your personal computer.

On MS Windows, use the installation package that also contains the Strawberry Perl distribution.

On Linux, follow these instructions:

  1. Setting up cpan (so that it uses local directories; http://www.perlmonks.org/?node_id=630026):
    • mkdir -p ~/.cpan/CPAN
      echo "1" >~/.cpan/CPAN/MyConfig.pm
    • perl -MCPAN -e shell
      cpan> o conf init # (choose local::lib; at the end, allow cpan to set some variables in your .bashrc file, or set them manually)
    • exit cpan, then exit bash and start it again
  2. Installing cpanm for easier installation of other Perl modules
    • cpan App::cpanminus
  3. Installing TrEd

The second task for today: Test the installation, set up TrEd (extensions, fonts)

After we have installed TrEd, let us try it - download the following data:
lindat.cz - go to "Repository" and search for Prague Dependency Treebank 2.0 - sample data

After you unzip the data, try to open one of the .t.gz files. You should get an error message complaining about missing schemas. This is because you also need to install a TrEd extension for the given type of data:

In TrEd, go to Setup -> Manage Extensions -> Get New Extensions and search for pdt20. Check it and press "Install Selected". Now close and start TrEd again. It should be able to open the data now.

You can customize TrEd in the configuration file .tredrc; to customize fonts, see the font section in the TrEd documentation.

TrEd can handle all types of treebanks - try, e.g., an example from the Penn Treebank:

An example file from the Penn Treebank transformed to PML, for testing the TrEd installation; the extension for Penn Treebank files (ptb) needs to be installed first...

English data to test the TrEd installation: section 000 of PEDT

  • Download and unzip the data. It is section 000 of the Penn Treebank transformed to the format of the Prague treebank family; in this particular case, each document is represented by three files: the analytical layer, i.e. surface syntax (a-files), the tectogrammatical layer, i.e. deep syntax (t-files), and the original phrase-structure layer (p-files). Then try to open one of the t-files in TrEd (you will need to install the pedt extension); you can also open an a-file and a p-file.

The third task for today: Transform phrase-structure trees to dependency ones

Sample phrase structure tree (file)

S (
  NP ( N ( 'Peter' ) )
  * VP ( * V ( 'gave' )
         NP ( D   ( 'a' )
              * N ( 'flower' ) )
         PP ( * P ( 'to' )
              N   ( 'Mary' ) )
       )
)

Another sample phrase structure tree (file)

S (
  NP ( A ('Young') * N ('men'))
  * VP (* V ( 'love' )
        COORD (NP ( N('beer'))
               * CONJ ('and')
               NP ( N( 'girls' ) )
        ))
)
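
One possible way to approach the third task is sketched below in Python. It is only an illustration, not the reference solution. It assumes that `*` marks the head child of each phrase and that a phrase with a single unmarked child takes that child as its head, which is how the two sample trees above appear to be annotated. The names `Node`, `parse` and `to_dependencies` are made up for this sketch.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# The first sample tree from above.
SAMPLE = """
S (
  NP ( N ( 'Peter' ) )
  * VP ( * V ( 'gave' )
         NP ( D   ( 'a' )
              * N ( 'flower' ) )
         PP ( * P ( 'to' )
              N   ( 'Mary' ) )
       )
)
"""

# A token is a quoted word, a bracket, the head marker '*', or a label.
TOKEN = re.compile(r"'[^']*'|[()*]|[^\s()*']+")


@dataclass
class Node:
    label: str
    is_head: bool = False            # True if the node was marked with '*'
    word: Optional[str] = None       # set for preterminals only
    children: List["Node"] = field(default_factory=list)


def parse(text: str) -> Node:
    """Parse the bracketed notation into a tree of Node objects."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node() -> Node:
        nonlocal pos
        is_head = tokens[pos] == "*"
        if is_head:
            pos += 1
        result = Node(label=tokens[pos], is_head=is_head)
        pos += 1
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' after {result.label}")
        pos += 1
        if tokens[pos].startswith("'"):
            result.word = tokens[pos][1:-1]      # strip the quotes
            pos += 1
        else:
            while tokens[pos] != ")":
                result.children.append(node())
        pos += 1                                 # skip the closing ')'
        return result

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing input after the tree")
    return tree


def to_dependencies(tree: Node) -> List[Tuple[int, str, int]]:
    """Return (index, word, parent index) triples; the root has parent 0."""
    words: List[str] = []
    parent = {}

    def head_of(n: Node) -> int:
        if n.word is not None:
            words.append(n.word)
            return len(words)                    # 1-based word index
        heads = [head_of(child) for child in n.children]
        marked = [i for i, child in enumerate(n.children) if child.is_head]
        if len(marked) == 1:
            h = marked[0]
        elif len(n.children) == 1:
            h = 0                                # assumption: an only child is the head
        else:
            raise ValueError(f"cannot determine the head of {n.label}")
        for i, child_head in enumerate(heads):
            if i != h:
                parent[child_head] = heads[h]    # non-head children depend on the head
        return heads[h]

    parent[head_of(tree)] = 0
    return [(i, w, parent[i]) for i, w in enumerate(words, start=1)]


if __name__ == "__main__":
    for index, word, parent_index in to_dependencies(parse(SAMPLE)):
        print(index, word, parent_index, sep="\t")
```

For the first sample tree, this makes 'gave' the root; 'Peter', 'flower' and 'to' depend on 'gave', 'a' depends on 'flower', and 'Mary' depends on 'to'. The lexical head of each phrase is passed up from its head child, and the lexical heads of the other children are attached to it.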