Both btred and ntred are TrEd macro processors. TrEd macros are based on Perl, so you should be familiar with the basics of the Perl programming language. Most notably, you should know how to deal with Perl's basic data structures (scalars, arrays, hashes) and references. You should also know how to write Perl subroutines. Basic knowledge of Perl object-oriented programming is also required (as a minimum, you should know the $object->method(arguments) syntax).
You might find these programs useful if you need to automatically process a larger number of files that you normally edit in TrEd. By processing, we mean programmatically editing the data, searching in them, and so on, without user interaction.
btred provides a feature-rich environment for all these types of tasks. It can be used either directly from the command line or as a server which keeps the trees in memory and evaluates macros on clients' requests. ntred is the interface between the client and a cluster of one or more btred servers.
In the most usual case, you run btred as follows:
btred -m my_macros -e my_function files ...
btred first reads your macros from the file my_macros, then starts opening the given files one at a time and executes the function my_function on each of them.
The file my_macros may, for example, look like the following:
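sub is_childless_verb {
  my ($node) = @_;
  # true if the node has no children and its tag starts with V
  if (not($node->firstson()) and $node->{tag}=~/^V/) {
    return 1;
  } else {
    return 0;
  }
}
sub find_childless_verbs {
  do {{
    # walk all nodes of the current tree in depth-first order
    while ($this) {
      if (is_childless_verb($this)) {
        print ThisAddress(),"\n";
      }
      $this=$this->following;
    }
  }} while (NextTree());  # move on to the next tree, if any
}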
To apply the above macros on a set of files, you simply type
btred -m my_macros -e find_childless_verbs *.fs
Let's see what happens in the code: find_childless_verbs iterates over all trees by calling NextTree() to move to the next tree. Within each tree it iterates over all nodes, using the following method on the current node $this. This method returns the following node in the recursive depth-first ordering. For each node it consults the is_childless_verb function declared above and, if this function returns 1, prints a string of the form filename.fs#tree-no.node-position generated by a default macro function called ThisAddress(). The output can be passed to TrEd as a file-list, and TrEd will open each of the files at the exact tree and node.
The code in is_childless_verb checks whether the given node has no children (which in turn means it has no first son, as a node's children are all male in the terminology used) and whether it is a verb according to the first letter of the attribute tag, which contains the required morphological information.
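The traversal described above can be tried outside TrEd. The following is a minimal sketch, assuming mock nodes stored as plain hashes rather than TrEd's real node objects; make_node, following_node and childless_verb_positions are hypothetical helpers written for this illustration, and the tag values are made up.

```perl
use strict;
use warnings;

# Mock node: a plain hash with a tag, a list of children and a parent link.
# (Illustration only; real TrEd nodes are objects with their own methods.)
sub make_node {
    my ($tag, @children) = @_;
    my $node = { tag => $tag, children => [@children], parent => undef };
    $_->{parent} = $node for @children;
    return $node;
}

# Return the next node in depth-first order, or nothing at the end of the tree.
sub following_node {
    my ($node) = @_;
    # Descend to the first child if there is one.
    return $node->{children}[0] if @{ $node->{children} };
    # Otherwise climb until some ancestor has a right sibling.
    while (my $parent = $node->{parent}) {
        my $siblings = $parent->{children};
        for my $i (0 .. $#$siblings) {
            if ($siblings->[$i] == $node) {    # same reference
                return $siblings->[$i + 1] if $i < $#$siblings;
                last;
            }
        }
        $node = $parent;
    }
    return;
}

# Same test as in the macro: no children and a tag starting with V.
sub is_childless_verb {
    my ($node) = @_;
    return (!@{ $node->{children} } && $node->{tag} =~ /^V/) ? 1 : 0;
}

# Collect 0-based depth-first positions of all childless verbs in a tree.
sub childless_verb_positions {
    my ($root) = @_;
    my @hits;
    my $position = 0;
    for (my $node = $root; $node; $node = following_node($node)) {
        push @hits, $position if is_childless_verb($node);
        $position++;
    }
    return @hits;
}

my $tree = make_node('Vp',
    make_node('NN', make_node('AA')),
    make_node('Vf'),
    make_node('NN', make_node('Vs'), make_node('RR')),
);
print join(',', childless_verb_positions($tree)), "\n";    # prints 3,5
```

The root is a verb but has children, so only the two leaf verbs are reported. The 0-based numbering is a choice made for this sketch and need not match the node positions printed by ThisAddress().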
It is quite apparent that most of the code of find_childless_verbs is spent on iterating over all nodes of all trees. btred can do this for you if you use the options -T (to iterate over all trees) and -N (to iterate over all nodes within each of the processed trees). Command-line options which don't take arguments can be put together, so you can write the above two as -TN.
Because it is rather easy to forget to put options like -T or -N on the command line, you can write them directly into the code as shown below. It is sometimes handy to include other options, such as -e find_childless_verbs, indicating which macro should be called on each iteration. The script then looks as follows:
#!btred -TN -e find_childless_verbs
sub is_childless_verb {
  my ($node) = @_;
  if (not($node->firstson()) and $node->{tag}=~/^V/) {
    return 1;
  } else {
    return 0;
  }
}
sub find_childless_verbs {
  if (is_childless_verb($this)) {
    print ThisAddress(),"\n";
  }
}
and can be executed simply as
$ btred -m my_macros *.fs
If your script is so simple that you don't want to bother opening an editor, you can put your code directly on the command line. Here is an example equivalent to the code of my_macros.
$ btred -TN -e 'print ThisAddress(),"\n" unless ($this->firstson() or $this->{tag}!~/^V/)' *.fs
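The addresses printed by such a command can be post-processed with ordinary Perl, for instance to count the hits per file. The following is a minimal sketch, assuming the address form filename.fs#tree-no.node-position described earlier; parse_address is a hypothetical helper and the sample addresses are made up.

```perl
use strict;
use warnings;

# Split an address into file name, tree number and node position.
# Returns an empty list if the string does not look like an address.
sub parse_address {
    my ($address) = @_;
    # One or more hash signs are accepted between file name and tree number.
    my ($file, $tree, $node) = $address =~ /^(.+?)#+(\d+)\.(\d+)$/
        or return;
    return ($file, $tree, $node);
}

# Made-up sample output; in practice the lines would be read from STDIN.
my @addresses = ('a.fs#1.4', 'a.fs#3.7', 'b.fs#2.1');

my %hits_per_file;
for my $address (@addresses) {
    my ($file) = parse_address($address) or next;
    $hits_per_file{$file}++;
}
print "$_: $hits_per_file{$_}\n" for sort keys %hits_per_file;
```

With the sample data this reports two hits in a.fs and one in b.fs.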
Processing a large amount of data with many incremental scripts (or just incremental versions of one script) may take a very long time. Usually, most of the time is spent not on your script but on opening, parsing (and possibly saving) the data files. This is where NTrEd offers a big improvement, because it 1) allows you to utilize the computing power of more than one machine, and 2) reads the files only once and keeps them in memory. As a result, depending on the situation, it may shorten the time needed for one pass from several minutes to just a few seconds.
NTrEd requires you to have password-free access to all machines you use. There are many ways to achieve this, the most common of which are a) using ssh2 authorization keys together with an ssh-agent2, b) using Kerberos, or c) .rhosts files.
The remote machines must have shared filesystems. As a minimum, they should share your home directory and a directory containing your data.
The hostnames of the machines you wish to use with NTrEd can be specified on the command line, but it is more convenient to put them in a file called .ntred_serverlist in your home directory.
This file should contain one or more lines of the form hostname:port. Empty lines are ignored, and lines starting with a hash sign # are treated as comments. You may specify one hostname several times, provided you use different ports. This is particularly useful to utilize the whole power of multi-processor systems. You may choose any port number above 1024, but you should try to avoid collisions with other services running on the remote machines, including btred instances run by other users.
You may always override the list by using the --servers option.
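The file format described above is simple enough to check with a few lines of Perl before starting the servers. The following is a minimal sketch, not the parser ntred itself uses; parse_serverlist is a hypothetical helper, the host names are invented, and treating a port of 1024 or below as an error is a choice made for this sketch.

```perl
use strict;
use warnings;

# Parse server-list text into [hostname, port] pairs.
sub parse_serverlist {
    my ($text) = @_;
    my (@servers, %seen);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;    # empty lines are ignored
        next if $line =~ /^#/;       # lines starting with # are comments
        my ($host, $port) = $line =~ /^\s*([^\s:]+):(\d+)\s*$/
            or die "Malformed line: $line\n";
        die "Port ${port} is not above 1024\n" unless $port > 1024;
        # The same host may appear several times, but only with different ports.
        die "Duplicate entry ${host}:${port}\n" if $seen{"${host}:${port}"}++;
        push @servers, [$host, $port];
    }
    return @servers;
}

# Hypothetical example content of ~/.ntred_serverlist
my $example = <<'END';
# two btred servers on a dual-processor machine
node1.example.org:2001
node1.example.org:2002

node2.example.org:2001
END

my @servers = parse_serverlist($example);
printf "%s port %d\n", @$_ for @servers;
```

Listing node1.example.org twice with different ports corresponds to the multi-processor case mentioned above.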
Once you have .ntred_serverlist prepared and you have configured your system so that you can log in to all listed servers via ssh without typing a password, you can try to start the servers and load some data:
$ ntred -i *.fs
If all went well, you see the ntred hub distributing your data (*.fs) among the btred servers. If you use some tool other than ssh to log into the servers, you can specify it using the --ssh option, for example:
$ ntred -i --ssh /usr/bin/rsh *.fs
Once all data files are distributed among the servers, you'll see the following line:
NTRED-HUB: Waiting for a new client request.
Now, open a new console (xterm, or whatever) and run the following:
$ ntred --list-files
On standard output, you'll see a list of open files, printed by each btred server. Note that the order of the files may be quite random. Standard error output contains various messages showing, e.g., which server the hub is communicating with. It's now time to run some more interesting code on the servers. We may start with the one already crafted for btred above:
$ ntred -TN -e 'print ThisAddress(),"\n" unless ($this->firstson() or $this->{tag}!~/^V/)'
You may see that this is almost identical to the btred example above, except that this time we don't have to specify any filenames, since we already have the files loaded on the servers (and of course, this time it will be significantly faster, especially with a large number of data files).
Usually, btred scripts can be reused with ntred without changes. You only have to remember that:
Each btred server processes its own files without communicating with the others. If, for example, your script computes some statistics and prints them at the end, you'll get as many results as there are servers, and you'll have to collect the output and merge the results to obtain the overall statistics for the whole data.
Your scripts may modify the data on the servers. The servers can remember which files were changed by your scripts, but you have to state explicitly in your macros that you are making changes (by calling ChangingFile()). You can list the changed files with ntred --list-changed-files. The changes are kept in memory only, unless you explicitly tell the servers to save them (using ntred --save-files, which saves all files, or ntred --save-changed-files).
You can open the data from the servers' memory (with all changes) for inspection or manual processing in TrEd. It suffices to give the full path name of a file and precede it with the ntred:// protocol prefix. You may optionally specify a tree number after the @ sign. You can also specify the node to be made active in TrEd. For example, to see the node with recursive ordering 22 in the 10th tree in the file /my/data/foo.fs as it is stored in the servers' memory, you would issue
$ tred ntred:///my/data/foo.fs#1.22@10
For your convenience, there is a predefined macro Position($node) which prints the URI of the above form for the given node (or for $this if no node is specified). Here is a nice example of the power of Unix redirection and this toolkit: it shows the nodes selected by our good old macro in TrEd:
$ ntred -TN -e 'Position unless ($this->firstson() or $this->{tag}!~/^V/)' | tred -l-
Most ntred commands can be used with the option -L, which allows you to specify files (or a list of files) to apply the commands to. The files must already be open on the servers. If appropriate, you may also specify individual trees or even nodes using the syntax shown above.
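As noted in the first point above, each btred server reports its own results, so per-server statistics have to be merged on the client side. The following is a minimal sketch, assuming every server prints its counts as lines of the form key, tab, count; this output format and the helper merge_counts are assumptions made for the illustration, not something ntred prescribes.

```perl
use strict;
use warnings;

# Sum per-server "key<TAB>count" lines into overall totals.
sub merge_counts {
    my (@lines) = @_;
    my %total;
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($key, $count) = split /\t/, $copy;
        # Skip anything that is not a key followed by a non-negative integer.
        next unless defined $count && $count =~ /^\d+$/;
        $total{$key} += $count;
    }
    return %total;
}

# Made-up output of two servers; in practice these lines would be read
# from the standard output of the ntred client.
my @output = (
    "V\t120\n", "N\t300\n",    # printed by the first server
    "V\t80\n",  "N\t250\n",    # printed by the second server
);

my %total = merge_counts(@output);
print "$_\t$total{$_}\n" for sort keys %total;
```

Because the order in which the servers answer is not fixed, the sketch sums by key and does not rely on the order of the lines.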
Here is a brief list of the macro API. More complete (yet still incomplete) information can be found in TrEd's User Manual. This list doesn't include contributed extensions for various specific projects, such as PDT.
$this
This variable contains the currently processed node.
$root
This variable contains the root of the currently processed tree.
attribute)
Get the value of the node's attribute named attribute.
attribute}, value)
Set the value of the node's attribute named attribute to value.
Returns a list of node's child nodes.
Returns node's parent node.
Returns node's left sibling node.
Returns node's right sibling node.
Returns all nodes in the node's subtree not including the node itself.
Returns a node following a given node in the depth-first recursive ordering.
Returns a node preceding a given node in the depth-first recursive ordering.
Returns the current Treex::PML::FSFormat object. It is an object containing information on the currently defined attributes and possibly their special semantics (e.g. which attribute is responsible for the topological left-to-right ordering of the tree, which attribute is used to mark nodes as hidden, etc.).
Return all visible child nodes (i.e. not marked as hidden) of the given node.
Return all visible child nodes (i.e. not marked as hidden) of the given node.
NextTree()
Sets $root and $this to the root of the next tree to be processed in the current file. Returns undef if all trees in the current file have already been processed.
ChangingFile()
Call this macro to let btred know that you are making changes.
Cuts the given node and pastes it on the target node so that it becomes its child. It dies if the target node is in the subtree of the source node (since in that case the cut/paste operation makes no sense).
Disconnects a given node from its parent and siblings.
Make a given node a new child of a given parent node.
Return a new identical copy of the current sub-tree.
Create a new node.
Return true if a given node is marked as hidden.