BTrEd/NTrEd Tutorial

Petr Pajas


Table of Contents

1. Preliminaries
2. What BTrEd and NTrEd are and why do I need them?
3. Getting started
4. Processing too slow? Use NTrEd!
4.1. Requirements
4.2. Getting started
5. Quick API reference

1. Preliminaries

Both btred and ntred are TrEd macro processors. TrEd macros are based on Perl, so you should be familiar with the basics of the Perl programming language. Most notably, you should know how to deal with Perl's basic data structures (scalars, arrays, hashes) and references. You should also know how to write Perl subroutines. Basic knowledge of Perl object-oriented programming is also required (as a minimum, you should know the $object->method(arguments) syntax).

2. What BTrEd and NTrEd are and why do I need them?

You only need these programs if you work with tree structures in FS or another supported format and you want to process those structures programmatically without (much) user interaction. btred can be used either directly from the command line or as a server which keeps the trees in memory and evaluates macros on clients' requests. ntred is an interface between the client and one or more btred servers.

3. Getting started

In the most usual case, you run btred as follows:

btred -m my_macros
      -e my_function files ...

btred first reads your macros from the file my_macros, then opens the given files one at a time and executes the function my_function on each of them.

The file my_macros may, for example, look like this:


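# Return 1 if the node has no children and its tag marks it as a verb.
sub is_childless_verb {
  my ($node) = @_;
  if (not($node->firstson()) and $node->{tag}=~/^V/) {
    return 1;
  } else {
    return 0;
  }
}

# Print the address of every childless verb in every tree of the file.
sub find_childless_verbs {
  do {{
    # walk the current tree in depth-first order
    while ($this) {
      if (is_childless_verb($this)) {
         print ThisAddress(),"\n";
      }
      $this=$this->following;
    }
  }} while (NextTree());   # move on to the next tree, if any
}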
      

To apply the above macros on a set of files, you simply type

btred -m my_macros -e find_childless_verbs *.fs

Let's see what happens in the code: find_childless_verbs iterates over all trees by calling NextTree() to move to the next tree. Within each tree it iterates over all nodes, calling the method following() on the current node $this. This method returns the next node in the recursive depth-first ordering. For each node it consults the is_childless_verb function declared above and, if this function returns 1, prints a string of the form filename.fs#tree-no.node-position generated by a default macro function called ThisAddress(). The output can be passed to TrEd as a file list, and TrEd will open each of the files at the exact tree and node.

The code in is_childless_verb checks whether the given node has no children (that is, it has no first son, since a node's children are all male in the terminology used) and is a verb according to the first letter of the attribute tag, which contains the required morphological information.
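
To make the depth-first ordering used by the traversal more concrete, here is a minimal sketch that runs outside btred. MiniNode is a hypothetical stand-in written only for this illustration; it is not TrEd's node class, and the real following method may be implemented differently. Only the ordering it produces is meant to match the description above.

```perl
use strict;
use warnings;

# MiniNode is a hypothetical stand-in for a TrEd node, defined only so
# that this sketch runs outside btred. It is not TrEd's node class.
package MiniNode;

sub new {
    my ($class, %attr) = @_;
    return bless { %attr, _children => [], _parent => undef }, $class;
}

sub add_child {
    my ($self, $child) = @_;
    $child->{_parent} = $self;
    push @{ $self->{_children} }, $child;
    return $child;
}

sub firstson { return $_[0]{_children}[0] }
sub parent   { return $_[0]{_parent} }

# Right sibling: the node after this one in its parent's child list.
sub rbrother {
    my ($self) = @_;
    my $parent = $self->{_parent} or return undef;
    my @sibs = @{ $parent->{_children} };
    for my $i (0 .. $#sibs - 1) {
        return $sibs[$i + 1] if $sibs[$i] == $self;
    }
    return undef;
}

# Next node in depth-first order: the first child if there is one,
# otherwise the nearest right sibling of this node or of an ancestor.
sub following {
    my ($self) = @_;
    return $self->firstson if $self->firstson;
    for (my $n = $self; $n; $n = $n->parent) {
        my $right = $n->rbrother;
        return $right if $right;
    }
    return undef;
}

package main;

sub is_childless_verb {
    my ($node) = @_;
    return (!$node->firstson && ($node->{tag} // '') =~ /^V/) ? 1 : 0;
}

# Build a tiny tree: root -> ( sees -> ( dog ), runs )
my $root = MiniNode->new(form => 'root', tag => 'Z');
my $sees = $root->add_child(MiniNode->new(form => 'sees', tag => 'Vb'));
$sees->add_child(MiniNode->new(form => 'dog', tag => 'NN'));
$root->add_child(MiniNode->new(form => 'runs', tag => 'Vb'));

my (@order, @hits);
for (my $node = $root; $node; $node = $node->following) {
    push @order, $node->{form};
    push @hits, $node->{form} if is_childless_verb($node);
}
print "order: @order\n";            # order: root sees dog runs
print "childless verbs: @hits\n";   # childless verbs: runs
```

The verb "sees" is visited but not reported because it has a child, which is exactly the distinction is_childless_verb draws.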

It is quite apparent that most of the code of find_childless_verbs is spent on iterating over all nodes of all trees. btred can do this for you if you use the options -T (to iterate over all trees) and -N (to iterate over all nodes within each processed tree). Command-line options which don't take arguments can be combined, so you can write the above two as -TN.

Because it is rather easy to forget to put options like -T or -N on the command line, you can write them directly into the code as shown below. It is sometimes handy to include other options, such as -e find_childless_verbs, indicating which macro should be called on each iteration. The script then looks as follows:


#!btred -TN -e find_childless_verbs
sub is_childless_verb {
   my ($node) = @_;
   if (not($node->firstson()) and $node->{tag}=~/^V/) {
     return 1;
   } else {
     return 0;
   }
}

sub find_childless_verbs {
   if (is_childless_verb($this)) {
      print ThisAddress(),"\n";
   }
}
      

and can be executed simply as

$ btred -m my_macros *.fs

If your script is so simple that you don't want to bother opening an editor, you can put your code directly on the command line. Here is an example equivalent to the code of my_macros:

$ btred -TN -e 'print ThisAddress(),"\n" unless ($this->firstson() or $this->{tag}!~/^V/)' *.fs

4. Processing too slow? Use NTrEd!

Processing a large amount of data with many incremental scripts (or just incremental versions of one script) may take a very long time. Usually, most of the time is spent not on your script but on opening, parsing (and possibly saving) the data files. This is where NTrEd offers a big improvement, because it 1) allows you to use the computing power of more than one machine, and 2) reads the files only once and keeps them in memory. As a result, depending on the situation, it may shorten the time needed for one pass from several minutes to just a few seconds.

4.1. Requirements

NTrEd requires you to have password-free access to all machines you use. There are many ways to achieve this, the most common of which are a) using ssh2 authorization keys together with an ssh-agent2, b) using Kerberos, or c) using .rhosts files.

The remote machines must have shared filesystems. As a minimum, they should share your home directory and a directory containing your data.

4.2. Getting started

The hostnames of the machines you wish to use with NTrEd can be specified on the command line, but it is more convenient to put them in a file called .ntred_serverlist in your home directory. This file should contain one or more lines of the form hostname:port. Empty lines are ignored, and lines starting with a hash sign # are treated as comments. You may specify one hostname several times, provided you use different ports. This is particularly useful for exploiting the full power of multi-processor systems. You may choose any port number above 1000, but you should try to avoid collisions with other services running on the remote machines, including btred instances run by other users.
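
The file format described above is simple enough to check with a few lines of Perl before starting the servers. The following is only a sketch of such a check, not ntred's own parser, which may differ in details such as whitespace handling. The host names are hypothetical, and parse_serverlist is a helper name made up for this example.

```perl
use strict;
use warnings;

# Parse text in the .ntred_serverlist format described above and
# return a list of [hostname, port] pairs.
# Illustration only: ntred's own parser may behave differently.
sub parse_serverlist {
    my ($text) = @_;
    my (@servers, %seen);
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;      # assumption: surrounding whitespace is harmless
        next if $line eq '';          # empty lines are ignored
        next if $line =~ /^#/;        # lines starting with # are comments
        my ($host, $port) = $line =~ /^([^:\s]+):(\d+)$/
            or die "Malformed line: '$line'\n";
        # the same host may appear several times, but only on different ports
        die "Duplicate entry: $line\n" if $seen{"$host:$port"}++;
        warn "Port $port is not above 1000\n" if $port <= 1000;
        push @servers, [$host, $port];
    }
    return @servers;
}

# Hypothetical host names; alpha is listed twice to use two processors.
my $example = <<'END';
# my btred servers
alpha.example.org:2000
alpha.example.org:2001

beta.example.org:2000
END

my @servers = parse_serverlist($example);
printf "%s port %d\n", @$_ for @servers;
```

To check a real list, the contents of the file in your home directory could be read into a string and passed to the same function.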

You may always override the list by using the --servers option.

Once you have .ntred_serverlist prepared and you have configured your system so that you can log in to all listed servers via ssh without typing a password, you can try to start the servers and load some data:

$ ntred -i *.fs

If all went well, you will see the ntred hub distributing your data (*.fs) among the btred servers. If you use a tool other than ssh to log into the servers, you can specify it using the --ssh option, for example:

$ ntred -i --ssh /usr/bin/rsh *.fs

Once all data files are distributed among the servers, you'll see the following line:

NTRED-HUB: Waiting for a new client request.

Now, open a new console (xterm, or whatever) and run the following:

$ ntred --list-files

On standard output, you'll see a list of open files, printed by each btred server. Note that the order of the files may be quite random. Standard error output contains various messages showing, e.g., which server the hub is communicating with. It's now time to run some more interesting code on the servers. We may start with the one already crafted for btred above:

$ ntred -TN -e 'print ThisAddress(),"\n" unless ($this->firstson() or $this->{tag}!~/^V/)'

You can see that this is almost identical to the btred example above, except that this time we don't have to specify any filenames, since the files are already loaded on the servers (and of course, this time it will be significantly faster, especially with a large number of data files).

Usually, btred scripts can be reused with ntred without changes. You only have to remember that:

  • Each btred server processes its own files without communicating with the others. If, for example, your script computes some statistics and prints them at the end, you'll get as many results as there are servers, and you'll have to collect the output and merge the results to obtain the overall statistics for the whole data.

  • Your scripts may modify the data on the servers. The servers can remember which files were changed by your scripts, but you have to say explicitly in your macros that you're making changes (by calling ChangingFile()). You can list the changed files with ntred --list-changed-files. The changes are kept in memory only, unless you explicitly tell the servers to save them (using ntred --save-files, which saves all files, or ntred --save-changed-files).

  • You can open the data from the servers' memory (with all changes) for inspection or manual processing in TrEd. It suffices to give the full path name of a file and precede it with the ntred:// protocol prefix. You may optionally specify a tree number after the @ sign. You can also specify the node to be made active in TrEd. For example, to see the node with recursive ordering 22 in the 10th tree in the file /my/data/foo.fs as it is stored in the servers' memory, you would issue

    $ tred ntred:///my/data/foo.fs#1.22@10

    For your convenience, there is a predefined macro Position($node) which prints the URI of the above form for the given node (or for $this if no node is specified). Here is a nice example of the power of Unix redirection and this toolkit: it shows the nodes selected by our good old macro in TrEd:

    $ ntred -TN -e 'Position unless ($this->firstson() or $this->{tag}!~/^V/)' | tred -l-

  • Most ntred commands can be used with the option -L, which allows you to specify files (or a list of files) to apply the commands to. The files must already be open on the servers. If appropriate, you may also specify individual trees or even nodes using the syntax shown above.
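
The first point in the list above, merging per-server results, can be handled by a small post-processing script. The sketch below assumes that your macro prints one line per key in the form key, tab, count; that output format is an assumption made for this example, not something ntred prescribes, and merge_counts is a made-up helper name.

```perl
use strict;
use warnings;

# Sum counts from lines of the assumed form "key<TAB>count".
# Takes an array reference of lines and returns a hash reference.
sub merge_counts {
    my ($lines) = @_;
    my %total;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        next unless $copy =~ /^(.+)\t(\d+)$/;   # skip lines in any other form
        $total{$1} += $2;
    }
    return \%total;
}

# Simulated output of two servers, each reporting on its own files only.
my @output = (
    "Vb\t120\n", "NN\t340\n",                # first server
    "Vb\t80\n",  "NN\t210\n", "JJ\t15\n",    # second server
);

my $total = merge_counts(\@output);
print "$_\t$total->{$_}\n" for sort keys %$total;
```

In practice the lines would come from the standard output of the ntred client rather than from a literal list, for instance by reading standard input into an array and passing a reference to it.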

5. Quick API reference

Here is a brief list of the macro API. More complete (yet still incomplete) information can be found in TrEd's User Manual. This list doesn't include contributed extensions for various specific projects, such as PDT.

$this

This variable contains the currently processed node.

$root

This variable contains the root of the currently processed tree.

$node->{attribute}

Get the value of node's attribute named attribute.

$node->{attribute} = "value"

Set the value of node's attribute named attribute to "value".

$node->children()

Returns a list of node's child nodes.

$node->parent()

Returns node's parent node.

$node->lbrother()

Returns node's left sibling node.

$node->rbrother()

Returns node's right sibling node.

$node->descendants()

Returns all nodes in the node's subtree not including the node itself.

$node->following()

Returns a node following a given node in the depth-first recursive ordering.

$node->previous()

Returns a node preceding a given node in the depth-first recursive ordering.

FS()

Returns the current FSFormat object. It is an object containing information about the currently defined attributes and possibly their special semantics (e.g. which attribute is responsible for the topological left-to-right ordering of the tree, which attribute is used to mark nodes as hidden, etc.).

$node->visible_children(FS())

Return all visible child nodes (i.e. not marked as hidden) of the given node.

$node->visible_descendants(FS())

Return all visible descendant nodes (i.e. not marked as hidden) of the given node.

NextTree()

Set $root and $this to the root of the next tree in the current file to be processed. Returns undef if all trees in the current file have already been processed.

ChangingFile()

Call this macro to let btred know that you're making changes.

CutPaste($node_to_cut,$target_to_paste)

Cuts the given node and pastes it on the target node so that it becomes its child. It dies if the target node is in the subtree of the source node (since in that case the cut/paste operation makes no sense).

Cut($node)

Disconnects a given node from its parent and siblings.

PasteNode($node,$new_parent)

Make a given node a new child of a given parent node.

CloneSubtree($node)

Return a new identical copy of the subtree rooted at the given node.

$node = FSNode->new()

Create a new node.

IsHidden($node)

Return true if a given node is marked as hidden.
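
Several of the calls listed above can be combined into a macro that modifies the data. The following is a minimal sketch, assuming it is run once per node (for example with -TN, as in the tutorial). The attribute name childless is hypothetical; use an attribute that your data format actually declares. The variable $this, ChangingFile() and the node methods are supplied by btred when the macro runs.

```perl
# Macro sketch intended to be loaded by btred or ntred; on its own this
# file only defines the subroutine.
# The attribute name "childless" is hypothetical.
sub mark_childless_verbs {
    return unless $this;                   # no current node
    return if $this->firstson();           # the node has children
    return unless $this->{tag} =~ /^V/;    # the node is not a verb
    ChangingFile();                        # announce the modification
    $this->{childless} = 1;                # set the (hypothetical) attribute
}
```

When run through ntred, such changes stay in the servers' memory until you ask for them to be saved, as described in section 4.2.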