FSNode - Simple OO interface to tree structures of Fslib.pm
$node->{attribute}
undef if none).
undef if none).
undef if none).
undef if none).
undef if none). If any descendant exists, the first one is
returned. Otherwise, the right brother is returned, if any. If the given
node has neither a descendant nor a right brother, the right brother
of the first (lowest) ancestor that has a right brother is returned.
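The traversal order described above can be sketched with plain hashes. This is only an illustration, not the FSNode implementation: the link keys `parent`, `firstson` and `rbrother` and the helpers `make_node`, `add_child` and `following` are assumptions made for this example.

```perl
use strict;
use warnings;

# Illustrative node: the link keys below are assumptions for this sketch,
# not the keys FSNode actually uses.
sub make_node {
    my ($name) = @_;
    return { name => $name, parent => undef, firstson => undef, rbrother => undef };
}

# Append $child as the last child of $parent.
sub add_child {
    my ($parent, $child) = @_;
    $child->{parent} = $parent;
    if (my $last = $parent->{firstson}) {
        $last = $last->{rbrother} while $last->{rbrother};
        $last->{rbrother} = $child;
    }
    else {
        $parent->{firstson} = $child;
    }
    return $child;
}

# First descendant if any, else the right brother, else the right brother
# of the lowest ancestor that has one; undef when nothing follows.
sub following {
    my ($node) = @_;
    return $node->{firstson} if $node->{firstson};
    while ($node) {
        return $node->{rbrother} if $node->{rbrother};
        $node = $node->{parent};
    }
    return;
}

my $root = make_node('root');
my $n_a  = add_child($root, make_node('a'));
add_child($n_a, make_node('a1'));
add_child($root, make_node('b'));

# Walk the whole tree in this order.
my @order;
for (my $n = $root; $n; $n = following($n)) {
    push @order, $n->{name};
}
print join(' ', @order), "\n";
```

Starting from the root and repeatedly asking for the following node visits every node exactly once, parents before their children.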
undef if none). A node is considered visible if it has
no hidden ancestor. Requires an FSFormat object as the first parameter.
undef if none), but not descending.
undef if none). The search described for following is used here in
reverse order.
undef if none). A node is considered visible if it has
no hidden ancestor. Requires an FSFormat object as the first parameter.
visible_children(fsformat)
visible_descendants(fsformat)
FSFormat - Simple OO interface for FS instance of Fslib.pm
initialize it with the optional values.
source must be either a GLOB or list reference.
Argument output is optional and, if given, it must be a GLOB reference.
'hide' in the FS attribute declared as @H. Return undef if no such
node exists.
Fslib::GetTree2) and return
the root of the resulting FS tree as an FSNode object.
Shadow, Hilite and XHilite depending on the
color assigned to the given attribute in the FS format instance.
FSFile - Simple OO interface for FS files.
use Fslib;
my $file="trees.fs";
my $fs = FSFile->newFSFile($file);
if ($fs->lastTreeNo<0) { die "File is empty or corrupted!\n" }
foreach my $tree ($fs->trees) {
... # do something on the trees
}
$fs->writeFile("$file.out");
initialize it with the optional values.
new but accepts name => value pairs as arguments. The
following argument names are available:
filename, format, FS, hint, patterns, tail, trees, save_status, backend
See initialize for more detail.
FS format, gzipped FS format and any non-specific format
strings as identifiers.
test methods of the modules are invoked until one of them
succeeds. That module is then used as a backend for opening and
parsing the file.
Note: this function sets noSaved to zero.
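The backend selection described above (try each module's test method and use the first module that succeeds) might look roughly like the sketch below. The two packages, the `pick_backend` helper and the filename-based tests are all hypothetical and exist only for this example; the real backends and their test arguments may differ.

```perl
use strict;
use warnings;

# Two hypothetical backends. Real backends would inspect the file itself;
# these only look at the file name to keep the sketch self-contained.
package GzSketchBackend;
sub test { my ($filename) = @_; return $filename =~ /\.gz$/ ? 1 : 0 }

package PlainSketchBackend;
sub test { return 1 }    # accepts anything

package main;

# Return the name of the first backend whose test() accepts $filename,
# or undef if none does.
sub pick_backend {
    my ($filename, @backends) = @_;
    for my $backend (@backends) {
        my $test = $backend->can('test') or next;   # skip modules without test()
        return $backend if $test->($filename);
    }
    return;
}

my @backends = qw(GzSketchBackend PlainSketchBackend);
for my $file ('trees.fs.gz', 'trees.fs') {
    print "$file: ", pick_backend($file, @backends), "\n";
}
```

The order of the backend list matters here: a permissive backend placed first would shadow all the others.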
read methods of the modules are invoked
until one of them succeeds in opening and parsing the file.
Optionally, in perl ver. >= 5.8, you may also specify file character encoding.
FS format, gzipped FS format and any
non-specific format strings as identifiers.
metaData(name)
changeMetaData(name,value)
listMetaData(name)
Unless no_tree_numbers is non-zero, prepend the resulting string with a "tree number/tree count: " prefix.
ZBackend - generic IO backend for reading/writing gz-compressed files using either the IO::Zlib module or the external zcat utility. Only the open_backend and close_backend functions are implemented, as this backend is meant to be a base class for all other backends that need to open gz-compressed files.
Optionally, in perl ver. >= 5.8, you may also specify file character encoding.
open_backend
FSBackend - IO backend for reading/writing FS files using FSFile class.
open_backend. In this case, the calling application may
need to close the handle and reopen it in order to return to the beginning
of the file after the test has read a few characters or lines from it.
Optionally, in perl ver. >= 5.8, you may also specify file character encoding.
WARNING: THIS DOCUMENTATION IS VERY OUTDATED.
YOU SHOULD RATHER USE THE OO INTERFACE DESCRIBED ABOVE AND THINK OF FSLIB AS LOW-LEVEL
Fslib.pm - Simple low-level API for treebank files in .fs format. See FSFile, FSFormat and FSNode for an object-oriented abstraction over this module that allows other formats to be represented by the same Perl data structures and objects.
use Fslib;
use locale;
use POSIX qw(locale_h);

setlocale(LC_ALL,"cs_CZ");
setlocale(LANG,"czech");

%attribs = ();
@atord = ();
@trees = ();

# read the header
%attribs=ReadAttribs(\*STDIN,\@atord,2,\@header);

# read the raw trees from the same handle the header was read from
while ($_=ReadTree(\*STDIN)) {
  if (/^\[/) {
    $root=GetTree($_,\@atord,\%attribs);  # parse the tree
    push(@trees, $root) if $root;         # store the structure
  } else {
    push(@rest, $_);                      # keep the rest of the file
  }
}

# do some changes ...

# save the tree
print @header;                                            # print header
PrintFS(\*STDOUT, \@header, \@trees, \@atord, \%attribs); # print the trees
print @rest;                                              # print the rest of the file

# destroy trees and free memory
foreach (@trees) { DeleteTree($_); }
undef @header;
This package aims to be a simple and usable Perl API for manipulating treebank files in the .fs format (which was designed by Michal Kren and is the only format supported by his Windows application GRAPH.EXE, used to interactively edit treebank analytical or tectogrammatical trees). See also Dan Zeman's review of this format in Czech at
http://ufal.mff.cuni.cz/local/doc/trees/format_fs.html
The Fslib package defines functions for parsing .fs files, extracting headers, reading trees and representing them in memory using simple hash structures, manipulating the values of node attributes (either "directly" or via the Get and Set functions) and even modifying the structure of the trees (via the Cut, Paste and DeleteTree functions or "directly").
There are many ways to benefit from this package; here I note the most typical one. Assume you want to read a .fs file from STDIN (or wherever), make some changes either to the structure of the trees or to the values of the attributes (or both), and write it out again. (Maybe you only want to examine the structure, find something of interest and write some output about it; that is up to you.) For this purpose you may use code similar to that shown in the SYNOPSIS of this document. Let's see how to manage the appropriate tasks (also watch the example in SYNOPSIS while reading):
First you should read the header of the .fs file using the ReadAttribs() function, passing it a reference to the input file descriptor (like \*STDIN) and a reference to an array that will contain the list of attribute names in positional order, and storing its return value in a hash. The returned hash will then have the attribute names as keys and their type character definitions as values (see the ReadAttribs description for details).
Note that no Read... function from this package performs any seeking; each only reads the file onward. So it is expected that you are at the beginning of the file when you call ReadAttribs, and that you have read the header before you parse the trees.
Having read the attribute definitions, you probably want to continue and parse the trees. This is done in two steps. First you call the ReadTree() function (passing it only a reference to the input file descriptor) to obtain a scalar (string) containing a linear representation of the next tree on input in the .fs format (except for line breaks). You should store it.
Now you should test that it was really a tree that was read and not something else, such as an environment or macro definition for GRAPH.EXE, which the .fs format allows. This may be done simply by matching the result of ReadTree() against the pattern /^\[/, because trees and only trees begin with the square bracket `['.
If it matches, you may continue by parsing the tree with GetTree(). This function parses the linear .fs representation of the tree and re-represents it as a structure of references to Perl hashes (this I call a tree node structure, TNS). For more about TNSs see the chapter called MODIFYING AND EXAMINING TREES and the REFERENCE item Tree Node Structure. On return from GetTree() you get a reference to the tree's TNS. You may store it (e.g. by pushing it onto an array) and continue by reading the next tree, or do any other job on it.
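The test on the leading square bracket can be tried in isolation. The sketch below does not call Fslib at all: the strings in `@records` are made-up placeholders standing in for what ReadTree() might return and are not meant to be valid .fs content.

```perl
use strict;
use warnings;

# Placeholder records standing in for ReadTree() results (hypothetical).
my @records = (
    '[root,0]([child,1])',
    '(some non-tree record)',
    '[another,0]',
);

my (@tree_sources, @rest);
for my $record (@records) {
    if ($record =~ /^\[/) {
        push @tree_sources, $record;    # would be passed on to GetTree()
    }
    else {
        push @rest, $record;            # kept so it can be written back unchanged
    }
}

printf "%d tree records, %d other records\n", scalar @tree_sources, scalar @rest;
```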
When you have finished reading the trees and have made all the changes you wanted, you may want to write the trees back. This is done using the PrintFS() function (see its description below). To create a correct .fs file, you should probably write back the header before writing the trees, and also write the remaining environment definitions after the trees are written.
TNS represents both a node and the whole subtree rooted in this
node. So whole trees are represented by their roots. TNS is actually
just a reference to a hash. The keys of the hash may be either
attribute names or some `special' keys serving to hold the tree
structure. Suppose $node is a TNS and `lemma' is an attribute defined
in the appropriate .fs file. Then $node->getAttribute("lemma")
is the value of the
attribute for the node represented by TNS $node. You may also obtain
this value as Get($node,"lemma"). From the $node TNS you may
also obtain the node's parent, sons and brothers (both left and
right). This may be done in several equivalent ways. The TNSs of a node's
relatives are stored as values of the `special' keys mentioned
above. These keys are chosen in such a way that they should not collide
with attribute names and are stored in the following scalar variables:
(You may change these variables if you want, but note, that modifying them once the trees are read may lead to problems:-)
So, to obtain $node's parent's TNS you may use either $node->{$parent} or Get($node,$parent) or even the special function Parent($node). The same holds for the first son and the left and right brothers, for which you may also prefer to use the FirstSon(), LBrother() and RBrother() functions respectively. If the node's relative (say the first son) does not exist, the value obtained in any of the mentioned ways is still defined but zero.
To create a new node, you usually create a new hash and call NewNode() passing it a reference to the new hash as a parameter.
To modify a node's value for a certain attribute (say 'lemma'), you
simply type $node->setAttribute("lemma","bar") (supposing you want the
value to become 'bar') or use the Set() function like
Set($node,"lemma","bar").
To modify the tree's structure, you may use the Cut and Paste functions as described below; to delete a whole subtree you may use the DeleteTree function, which also frees the memory used by the TNS. If you want to delete the subtree of $node but still keep its root, you may use a construct like:
DeleteTree(FirstSon($node)) while(FirstSon($node));
Note that the Cut function also removes a subtree from the tree, but it keeps the TNS in memory and returns a reference to it.
There is also a global variable $Fslib::FSTestListValidity, which may be set to 1 to make Fslib::ParseNode check whether a value assigned to a list attribute is one of the possible values declared in the FS file header. Because this may slow down parsing significantly (especially when there are many list attributes), the default value is 0 (no check is performed).
Params:
FILE - file handle reference, like \*STDIN
$aref - reference to an array
$DO_PRINT - if 1, the input read is also copied to $OUTFILE (which must be a filehandle reference, like \*STDOUT); if 0, the input read is also stored in the @$OUTFILE array (in this case $OUTFILE is a reference to an array)
$OUTFILE - output file handle or array reference, \*STDIN if omitted
Returns:
A hash having fs-attribute names as keys and strings containing characters identifying types as the corresponding values. The characters may be some of the following (as given by the .fs format):
K Key attribute
P Positional attribute
O Obligatory attribute
L List attribute
N Numerical attribute
V Value attribute (for displaying in GRAPH.EXE)
On input, $aref should be a reference to an empty array. On return the array contains the keys of the returned hash (the attributes) ordered as they are defined in the FS file, i.e. in their positional order.
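To illustrate how the returned hash and the ordered array fit together, here is a small sketch that decodes type strings into the names from the table above. It does not call ReadAttribs(): `%attribs` and `@atord` are hand-written stand-ins with made-up attribute names, `describe_types` is a hypothetical helper, and the sketch assumes each type string consists only of the single-letter codes listed above.

```perl
use strict;
use warnings;

# Type characters as listed in the table above.
my %type_names = (
    K => 'Key',
    P => 'Positional',
    O => 'Obligatory',
    L => 'List',
    N => 'Numerical',
    V => 'Value',
);

# Hand-written stand-ins for what ReadAttribs() would fill in (hypothetical).
my %attribs = (lemma => 'PV', tag => 'PL', ord => 'N');
my @atord   = qw(lemma tag ord);

# Turn a type string such as 'PV' into 'Positional, Value'.
sub describe_types {
    my ($types) = @_;
    return join ', ',
        map { exists $type_names{$_} ? $type_names{$_} : "unknown ($_)" }
        split //, $types;
}

# Walk the attributes in positional order, as @atord preserves it.
for my $attr (@atord) {
    printf "%-6s %s\n", $attr, describe_types($attribs{$attr});
}
```

The array is what preserves the declaration order; the hash alone would not, since Perl does not keep hash keys in insertion order.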
Params:
FILE - file handle, like STDIN
Returns:
A string containing the next tree read from FILE in its source form (only with concatenated lines).
Params:
$tree - the source form of a tree with concatenated lines
$aref - a reference to an array of attributes in their positional order (see ReadAttribs)
$href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
A reference to a tree hash-structure described below.
Params:
$node - a reference to a tree hash-structure
$aref - a reference to an array of attributes in their positional order (see ReadAttribs)
$href - a reference to a hash containing attributes as keys and the corresponding type strings as values
$output - output filehandle
Returns:
Not specified.
Description:
Prints the node structure referenced by $node to $output (or STDOUT) in the source format.
Params:
$node - a reference to a tree hash-structure
$aref - a reference to an array of attributes in their positional order (see ReadAttribs)
$href - a reference to a hash containing attributes as keys and the corresponding type strings as values
$output - output filehandle
Returns:
Not specified.
Description:
Prints the tree whose root node is referenced by $node to STDOUT in the source format.
RBrother($node)
Params:
$node - a reference to a tree hash-structure
Returns:
Parent, first son, left brother or right brother resp. of the node referenced by $node
Prev($node,[$top])
Params:
$node - a reference to a tree hash-structure
$top - a reference to a tree hash-structure containing the node referenced by $node
Returns:
A reference to the next or previous node of $node on the backtracking way along the tree having its root in $top. The $top parameter is optional and may be omitted. Returns zero if $top or the root of the tree is reached.
Cut($node)
Params:
$node - a reference to a node
Description:
Cuts (disconnects) $node from its parent and brothers
Returns:
$node
Paste($node,$newparent,$href)
Params:
$node - a reference to a (cut or new) node
$newparent - a reference to the new parent node
$href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Description:
Connects $node to $newparent and links it with its new brothers, placing it at the position corresponding to the value of its numerical attribute, obtained via a call to the Ord function.
Returns:
$node
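The following self-contained sketch shows one way cutting and ordered pasting can work on plain hashes. It is not the Fslib code: the key names `parent`, `firstson`, `lbrother`, `rbrother` and `ord`, and the helpers `new_node`, `cut_node`, `paste_node` and `child_ords`, are assumptions for this example. Missing relatives are stored as zero, as in the description of TNS.

```perl
use strict;
use warnings;

# Illustrative node; a missing relative is stored as zero.
sub new_node {
    my ($ord) = @_;
    return { ord => $ord, parent => 0, firstson => 0, lbrother => 0, rbrother => 0 };
}

# Disconnect $node from its parent and brothers; its own subtree stays attached.
sub cut_node {
    my ($node) = @_;
    my ($parent, $left, $right) = @{$node}{qw(parent lbrother rbrother)};
    if    ($left)   { $left->{rbrother}   = $right }
    elsif ($parent) { $parent->{firstson} = $right }
    $right->{lbrother} = $left if $right;
    @{$node}{qw(parent lbrother rbrother)} = (0, 0, 0);
    return $node;
}

# Attach $node under $newparent, keeping the children sorted by {ord}.
sub paste_node {
    my ($node, $newparent) = @_;
    my ($left, $right) = (0, $newparent->{firstson});
    while ($right && $right->{ord} <= $node->{ord}) {
        ($left, $right) = ($right, $right->{rbrother});
    }
    @{$node}{qw(parent lbrother rbrother)} = ($newparent, $left, $right);
    if ($left) { $left->{rbrother} = $node }
    else       { $newparent->{firstson} = $node }
    $right->{lbrother} = $node if $right;
    return $node;
}

# List the {ord} values of the children of $parent, left to right.
sub child_ords {
    my ($parent) = @_;
    my @ords;
    for (my $n = $parent->{firstson}; $n; $n = $n->{rbrother}) {
        push @ords, $n->{ord};
    }
    return @ords;
}

my $root = new_node(0);
my %n = map { $_ => new_node($_) } 1 .. 4;
paste_node($n{$_}, $root) for 3, 1, 4;    # pasted out of order
paste_node($n{2}, $n{4});                 # node 2 starts as a child of node 4
print join(' ', child_ords($root)), "\n";
paste_node(cut_node($n{2}), $root);       # move node 2 directly under the root
print join(' ', child_ords($root)), "\n";
```

Because the cut function returns the node, cutting and pasting can be chained as in the last step.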
Special($node,$href,$defchar)
Exported with EXPORT_OK
Params:
$node - a reference to a tree hash-structure
$href - a reference to a hash containing attributes as keys and the corresponding type strings as values
$defchar - a type string pattern
Returns:
Value of the first $node attribute of type matching $defchar pattern
ASpecial($href,$defchar)
Exported with EXPORT_OK
Params:
$href - a reference to a hash containing attributes as keys and the corresponding type strings as values
$defchar - a type string pattern
Returns:
Name of the first attribute of type matching $defchar pattern
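The lookup performed by Special and ASpecial can be imitated as in the sketch below. This is an assumption-laden illustration, not the Fslib code: `first_attr_of_type` is a hypothetical helper, the attribute names are made up, and the sketch takes an explicit ordered list of attribute names because Perl hashes have no defined key order (how Fslib itself decides which attribute is "first" is not stated here).

```perl
use strict;
use warnings;

# Hypothetical helper: name of the first attribute (in the order given by
# @$order) whose type string matches the pattern $defchar; undef if none.
sub first_attr_of_type {
    my ($defs, $order, $defchar) = @_;
    for my $attr (@$order) {
        return $attr if $defs->{$attr} =~ /$defchar/;
    }
    return;
}

# Made-up attribute definitions and a made-up node.
my %defs  = (lemma => 'PV', ord => 'N', hide => 'H');
my @order = qw(lemma ord hide);
my %node  = (lemma => 'bar', ord => 3, hide => '');

# Counterpart of ASpecial: the attribute name.
my $ord_attr = first_attr_of_type(\%defs, \@order, 'N');

# Counterpart of Special: the node's value for that attribute.
print "numerical attribute: $ord_attr, value: $node{$ord_attr}\n";
```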
Exported with EXPORT_OK
Params:
$href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Description:
These are like Ord, SentOrd, Value and Hide, except that they do not take $node as a parameter and return the attribute name rather than its value.
Ord($node,$href)
Exported with EXPORT_OK
Params:
$node - a reference to a tree hash-structure
$href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
$node's ord (the value of the attribute declared by type character N). Same as Special($node,$href,'N').
Value($node,$href)
Exported with EXPORT_OK
Params:
$node - a reference to a tree hash-structure
$href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
$node's value attribute (the value of the attribute declared by type character V). Same as Special($node,$href,'V').
SentOrd($node,$href)
Exported with EXPORT_OK
Params:
$node - a reference to a tree hash-structure
$href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
$node's sentence ord (the value of the attribute declared by type character W). Same as Special($node,$href,'W').
Hide($node,$href)
Exported with EXPORT_OK
Params:
$node - a reference to a tree hash-structure
$href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
"hide" if $node is hidden (actually the value of the attribute declared by type character H). Same as Special($node,$href,'H').
IsList($attr,$href)
Params:
$attr - an attribute name
$href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
1 if attribute $attr is declared as a list (L) in the hash of attribute definitions referenced by $href, 0 otherwise.
ListValues($attr,$href)
Params:
$attr - an attribute name
$href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
A list of allowed values for attribute $attr as defined in the hash of attribute definitions $href.
Set($node,$attribute,$value)
Params:
$node - a reference to a node
$attribute - attribute
$value - value to fill $node's $attribute with
Description:
Does the same as $node->setAttribute($attribute,$value)
Get($node,$attribute)
Params:
$node - a reference to a node
$attribute - attribute
Returns:
$node->getAttribute($attribute)
DrawTree($node,@attrs)
Params:
$node - a reference to a node
@attrs - list of attributes to display
Description:
Draws a tree on standard output using character graphics. (May be particularly useful on systems with no GUI. For a real graphical representation of FS trees, look for Michal Kren's GRAPH.EXE or the Perl/Tk based program "tred" by Petr Pajas.)
ImportBackends(@backends)
Params:
@backends - a list of backend names
Description:
Demand to load the given backends and return a list of backends for which the demand was fulfilled. These backends may then be freely used in FSFile IO calls.
Description:
TNS is a normal hash whose keys are attribute names and whose values are strings, the values of the corresponding attributes (as they are given in the FS format source).
In addition, a few other keys and values are added to each node:
"Parent" which is a reference to the parent node (or zero if N/A)
$firstson a reference to the first son's node (or zero)
"RBrother" a reference to the first right brother (or zero)
$lbrother a reference to the first left brother (or zero)
You may initialize a new node by calling NewNode($node), where $node is a reference to some (existing and rather empty) hash.