FSNode - Simple OO interface to tree structures of Fslib.pm
Create a new FSNode object. FSNode is basically a hash reference, which means
that you may simply access a node's attributes as $node->{attribute}.
This function initializes an FSNode. It is called by the constructor new.
Return the node's parent node (undef if none).
Return the node's left brother node (undef if none).
Return the node's right brother node (undef if none).
Return the node's first dependent node (undef if none).
Return the next node of the subtree in the order given by the structure
(undef if none). If the node has any descendant, the first one is returned.
Otherwise, the right brother is returned, if any. If the given node has
neither a descendant nor a right brother, the right brother of the lowest
ancestor that has a right brother is returned.
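The search order described above can be sketched on plain hashes. This is only an illustration, not the module's code: the key names (_parent, _firstson, _lbrother, _rbrother) and the helpers make_node, append_child and following_node are hypothetical, and 0 is assumed to mark a missing relative.

```perl
use strict;
use warnings;

# Hypothetical key names for this sketch; the module keeps its own
# structural keys in package variables.
my ($PARENT, $FIRSTSON, $LBROTHER, $RBROTHER) =
    qw(_parent _firstson _lbrother _rbrother);

# Create a node with no relatives (0 marks a missing relative here).
sub make_node {
    my ($name) = @_;
    my %node = (name => $name, $PARENT => 0, $FIRSTSON => 0,
                $LBROTHER => 0, $RBROTHER => 0);
    return \%node;
}

# Append $child as the rightmost son of $parent.
sub append_child {
    my ($parent, $child) = @_;
    $child->{$PARENT} = $parent;
    my $last = $parent->{$FIRSTSON};
    if (!$last) {
        $parent->{$FIRSTSON} = $child;
        return $child;
    }
    $last = $last->{$RBROTHER} while $last->{$RBROTHER};
    $last->{$RBROTHER}  = $child;
    $child->{$LBROTHER} = $last;
    return $child;
}

# Next node in structure order, never leaving the subtree rooted at $top.
sub following_node {
    my ($node, $top) = @_;
    return $node->{$FIRSTSON} if $node->{$FIRSTSON};   # first descendant
    while ($node) {
        return undef if $top && $node == $top;          # subtree exhausted
        return $node->{$RBROTHER} if $node->{$RBROTHER};
        $node = $node->{$PARENT};                       # climb to an ancestor
    }
    return undef;
}

# Tree: a has sons b and d; b has son c.
my $root = make_node('a');
my $b1   = append_child($root, make_node('b'));
append_child($b1, make_node('c'));
my $d1   = append_child($root, make_node('d'));

my @order;
for (my $n = $root; $n; $n = following_node($n, $root)) {
    push @order, $n->{name};
}
print "@order\n";   # prints "a b c d"
```

Stepping from c skips b (which has no unvisited sons) and lands on d, the right brother of the lowest ancestor that has one.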
Return the previous node of the subtree in the order given by the structure
(undef if none). The search described for following is used here in reverse
order.
FSFormat - Simple OO interface for FS instance of Fslib.pm
Create a new FS format instance object and initialize it with the optional values.
Initialize a new FS format instance with given values. See Fslib for more information about attribute hash, ordered names list and unparsed headers.
Read an FS format instance definition from the given source, optionally
echoing the unparsed input on the given output. The obligatory argument
source must be either a GLOB or a list reference. The argument output is
optional; if given, it must be a GLOB reference.
Return the names of the special attributes declared in the FS format as @W,
@N, @V and @H, respectively.
Return the lowest ancestor-or-self of the given node marked by 'hide' in the
FS attribute declared as @H. Return undef if no such node exists.
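A minimal sketch of this lookup on plain hashes follows. It is an assumption-laden illustration rather than the module's implementation: the function name lowest_hidden, the attribute name hideflag and the parent key _parent are all hypothetical.

```perl
use strict;
use warnings;

# Walk from $node upwards and return the first node whose hiding
# attribute equals 'hide'; return undef if no such node is found.
# $hide_attr and $parent_key are supplied by the caller (hypothetical names).
sub lowest_hidden {
    my ($node, $hide_attr, $parent_key) = @_;
    while ($node) {
        my $value = $node->{$hide_attr};
        return $node if defined $value && $value eq 'hide';
        $node = $node->{$parent_key};
    }
    return undef;
}

# A three-node chain: top (hidden) -> middle -> leaf.
my $top    = { form => 'top',    hideflag => 'hide', _parent => 0 };
my $middle = { form => 'middle', hideflag => '',     _parent => $top };
my $leaf   = { form => 'leaf',                       _parent => $middle };

my $hit = lowest_hidden($leaf, 'hideflag', '_parent');
print $hit->{form}, "\n";   # prints "top"
```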
Parse a given line in FS format (using Fslib::GetTree2) and return the root
of the resulting FS tree as an FSNode object.
Return a reference to the internally stored attribute hash.
Return a reference to the internally stored attribute names list.
Return a reference to the internally stored unparsed FS header. Note that this header need not correspond to the defs and attributes if any changes were made to the definitions or names by hand at run-time.
Return a list of all attribute names (in the order given by FS instance declaration).
Return the n'th attribute name (in the order given by FS instance declaration).
Return the definition string for the given attribute.
Return the number of declared attributes.
Return true if given attribute is assigned a list of all possible values.
Return the list of all possible values for the given attribute.
Return one of Shadow, Hilite and XHilite depending on the color assigned to
the given attribute in the FS format instance.
Return name of a special attribute declared in FS definition with a given letter. See also sentord, order, value, hide.
Return index of the given attribute (in the order given by FS instance declaration).
FSFile - Simple OO interface for FS files.
  use Fslib;

  open (F,"<trees.fs") || die "Cannot open trees.fs: $!\n";
  my $fs = FSFile->newFSFile(\*F);
  close (F);

  die "File is empty or corrupted!\n" if ($fs->lastTreeNo<0);

  foreach my $tree ($fs->trees) {
    ...    # do something on the trees
  }

  open (F,">trees_out.fs") || die "Cannot open trees_out.fs: $!\n";
  $fs->writeTo(\*F);
  close (F);
Create a new FS file object and initialize it with the optional values.
Initialize a FS file object. Argument description:
File name
File format identifier (user-defined string). TrEd, for example, uses the
strings FS format, gzipped FS format and any non-specific format as
identifiers.
FSFormat object associated with the file
TrEd's hint pattern definition
TrEd's display attributes pattern definition
The rest of the file, which is not parsed by Fslib, i.e. Graph's embedded macros
List of FSNode objects representing root nodes of all trees in the FSFile.
File save status indicator, 0=file is saved, 1=file is not saved (TrEd uses this field).
Read FS declaration and trees from a given file (file handle open for reading must be passed as a GLOB reference).
Write FS declaration, trees and unparsed tail to a given file (file handle open for writing must be passed as a GLOB reference).
Create a new FSFile object based on the content of a given file (file handle open for reading must be passed as a GLOB reference).
Return the FS file's file name.
Change the FS file's file name.
Return the file format identifier (user-defined string). TrEd, for example,
uses the strings FS format, gzipped FS format and any non-specific format as
identifiers.
Change the file format identifier.
Return a reference to the associated FSFormat object.
Associate FS file with a new FSFormat object.
Return the TrEd's hint pattern declared in the FSFile.
Change the TrEd's hint pattern associated with this FSFile.
Return a list of display attribute patterns associated with this FSFile.
Change the list of display attribute patterns associated with this FSFile.
Return the unparsed tail of the FS file (i.e. Graph's embedded macros).
Modify the unparsed tail of the FS file (i.e. Graph's embedded macros).
Return a list of all trees (i.e. their roots represented by FSNode objects).
Assign a new list of trees.
Return a reference to the internal array of all trees (i.e. their roots represented by FSNode objects).
Associate a new reference to a list of trees with this FSFile. The referenced array must be a list of FSNode objects representing all the new trees.
Return number of associated trees minus one.
Return/assign file saving status (this is completely user-driven).
Fslib.pm - Simple low-level API for treebank files in .fs format. See FSFile, FSFormat and FSNode for an object-oriented abstraction over this module.
  use Fslib;
  use locale;
  use POSIX qw(locale_h);

  setlocale(LC_ALL,"cs_CZ");
  setlocale(LANG,"czech");

  %attribs = ();
  @atord = ();
  @trees = ();

  # read the header
  %attribs=ReadAttribs(\*STDIN,\@atord,2,\@header);

  # read the raw trees
  while ($_=ReadTree(\*STDIN)) {
    if (/^\[/) {
      $root=GetTree($_,\@atord,\%attribs);  # parse the tree
      push(@trees, $root) if $root;         # store the structure
    } else { push(@rest, $_); }             # keep the rest of the file
  }

  # do some changes ...

  # save the trees
  print @header;                                             # print header
  PrintFS(\*STDOUT, \@header, \@trees, \@atord, \%attribs);  # print the trees
  print @rest;                                   # print the rest of the file

  # destroy trees and free memory
  foreach (@trees) { DeleteTree($_); }
  undef @header;
This package aims to be a simple and usable Perl API for manipulating treebank files in the .fs format (which was designed by Michal Kren and is the only format supported by his Windows application GRAPH.EXE, used to interactively edit analytical or tectogrammatical treebank trees). See also Dan Zeman's review of this format, in Czech, at
http://ufal.mff.cuni.cz/local/doc/trees/format_fs.html
The Fslib package defines functions for parsing .fs files, extracting headers, reading trees and representing them in memory using simple hash structures, manipulating the values of node attributes (either "directly" or via the Get and Set functions) and even modifying the structure of the trees (via the Cut, Paste and DeleteTree functions or "directly").
There are many ways to benefit from this package; the most typical one is described here. Assume you want to read a .fs file from STDIN (or elsewhere), make some changes either to the structure of the trees or to the values of the attributes (or both), and write it out again. (Maybe you only want to examine the structure, find something of interest and write some output about it; it's up to you.) For this purpose you may use code similar to that shown in the SYNOPSIS of this document. Let's see how to manage the appropriate tasks (also watch the example in SYNOPSIS while reading):
First you should read the header of the .fs file using the ReadAttribs() function. Pass it a reference to the input file descriptor (like \*STDIN) and a reference to an array that will receive the list of attribute names in positional order, and store its return value in a hash. The returned hash will then have the attribute names as keys and their type character definitions as values (see the ReadAttribs description for details).
Note that no Read... function from this package performs any seeking; each only reads the file onwards. So it is expected that you are at the beginning of the file when you call ReadAttribs, and that you have read the header before you parse the trees.
Having read the attribute definitions, you probably want to continue and parse the trees. This is done in two steps. First you call the ReadTree() function (passing it only a reference to the input file descriptor) to obtain a scalar (string) containing a linear representation of the next tree on input in the .fs format (except for line breaks). You should store it. Now you should test that it was really a tree that was read and not something else, such as an environmental or macro definition for GRAPH.EXE, which the .fs format allows. This may be done simply by matching the result of ReadTree() against the pattern /^\[/, because trees and only trees begin with the square bracket `['.
If it is a tree, you may continue by parsing it with GetTree(). This function parses the linear .fs representation of the tree and re-represents it as a structure of references to Perl hashes (called a tree node structure, or TNS, in this document). For more about TNSs see the chapter called MODIFYING AND EXAMINING TREES and the REFERENCE item Tree Node Structure. On return from GetTree() you get a reference to the tree's TNS. You may store it (e.g. by pushing it to an array) and continue by reading the next tree, or do any other job on it.
When you have finished reading the trees and have made all the changes you wanted, you may want to write the trees back. This is done using the PrintFS() function (see its description below). To create a correct .fs file, you should probably write back the header before writing the trees, and also write that messy environmental stuff after the trees are written.
A TNS represents both a node and the whole subtree rooted in this node, so
whole trees are represented by their roots. A TNS is actually just a
reference to a hash. The keys of the hash may be either attribute names or
`special' keys that hold the tree structure. Suppose $node is a TNS and
`lemma' is an attribute defined in the appropriate .fs file. Then
$node->{"lemma"} is the value of the attribute for the node represented by
the TNS $node. You may also obtain this value as Get($node,"lemma"). From
the $node TNS you may also obtain the node's parent, sons and brothers (both
left and right). This may be done in several equivalent ways. The TNSs of a
node's relatives are stored as values of the `special' keys mentioned above.
These keys are chosen so that they should not collide with attribute names,
and they are stored in the following scalar variables:
$Fslib::parent
$Fslib::firstson
$Fslib::lbrother
$Fslib::rbrother
(You may change these variables if you want, but note that modifying them once the trees are read may lead to problems :-)
So, to obtain the TNS of $node's parent you may use either $node->{$parent} or Get($node,$parent) or even the special function Parent($node). The same holds for the first son and the left and right brothers, where you may also prefer to use the FirstSon(), LBrother() and RBrother() functions respectively. If the node's relative (say the first son) does not exist, the value obtained in any of these ways is still defined, but zero.
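As an illustration of how the special keys tie nodes together, the following sketch builds three nodes by hand and lists the sons of one of them by walking from the first son along the right brothers. It does not load Fslib; the key values (_parent and so on) and the helper children are hypothetical stand-ins, and zero marks a missing relative as described above.

```perl
use strict;
use warnings;

# Stand-ins for the module's key variables; the values are made up here.
my ($parent, $firstson, $lbrother, $rbrother) =
    qw(_parent _firstson _lbrother _rbrother);

# One root with two sons, wired by hand.
my $root  = { lemma => 'be',   $parent => 0, $firstson => 0,
              $lbrother => 0,  $rbrother => 0 };
my $left  = { lemma => 'dog',  $parent => $root, $firstson => 0,
              $lbrother => 0,  $rbrother => 0 };
my $right = { lemma => 'here', $parent => $root, $firstson => 0,
              $lbrother => $left, $rbrother => 0 };
$left->{$rbrother} = $right;
$root->{$firstson} = $left;

# Collect all sons of a node, left to right.
sub children {
    my ($node) = @_;
    my @kids;
    for (my $c = $node->{$firstson}; $c; $c = $c->{$rbrother}) {
        push @kids, $c;
    }
    return @kids;
}

print join(' ', map { $_->{lemma} } children($root)), "\n";  # prints "dog here"
print $right->{$parent}{lemma}, "\n";                        # prints "be"
print "dog is a leaf\n" unless $left->{$firstson};           # zero means no son
```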
To create a new node, you usually create a new hash and call NewNode() passing it a reference to the new hash as a parameter.
To modify a node's value for a certain attribute (say 'lemma'), you simply
type $node->{"lemma"}="bar" (assuming you want the value to become 'bar') or
use the Set() function:
  Set($node,"lemma","bar");
To modify the tree's structure, you may use the Cut and Paste functions as described below. To delete a whole subtree, you may use the DeleteTree function, which also frees the memory used by the TNS. If you want to delete the subtree of $node but still keep its root, you may use a construct like:
DeleteTree(FirstSon($node)) while(FirstSon($node));
Note that the Cut function also removes a subtree from the tree, but keeps the TNS in memory and returns a reference to it.
There is also a global variable $Fslib::FSTestListValidity, which may be set to 1 to make Fslib::ParseNode check whether a value assigned to a list attribute is one of the possible values declared in the FS file header. Because this may slow down parsing significantly (especially when there are many list attributes), the default value is 0 (no check is performed).
Params:
  FILE      - file handle reference, like \*STDIN
  $aref     - reference to an array
  $DO_PRINT - if 1, the read input is also copied to $OUTFILE (which must be a filehandle reference, like \*STDOUT); if 0, the read input is also stored in the @$OUTFILE array (in this case $OUTFILE is a reference to an array)
  $OUTFILE  - output file handle or array reference, \*STDIN if omitted
Returns:
A hash having fs-attribute names as keys and strings containing type-identifying characters as the corresponding values. The characters may be some of the following (as given by the .fs format):
  K  Key attribute
  P  Positional attribute
  O  Obligatory attribute
  L  List attribute
  N  Numerical attribute
  V  Value attribute (for displaying in GRAPH.EXE)
On input, $aref should be a reference to an empty array. On return, the array contains the keys of the returned hash (the attribute names) ordered as they are defined in the FS file, i.e. in their positional order.
Params:
  FILE - file handle reference, like \*STDIN
Returns:
A string containing the next tree read from FILE in its source form (only with concatenated lines).
Params:
  $tree - the source form of a tree with concatenated lines
  $aref - a reference to an array of attributes in their positional order (see ReadAttribs)
  $href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
A reference to a tree hash-structure described below.
Params:
  $node - a reference to a tree hash-structure
  $aref - a reference to an array of attributes in their positional order (see ReadAttribs)
  $href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
Unknown.
Description:
Prints the node structure referenced by $node to STDOUT in the source format.
Params:
  $node - a reference to a tree hash-structure
  $aref - a reference to an array of attributes in their positional order (see ReadAttribs)
  $href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
Unknown.
Description:
Prints the tree whose root node is referenced by $node to STDOUT in the source format.
Params:
$node - a reference to a tree hash-structure
Returns:
Parent, first son, left brother or right brother resp. of the node referenced by $node
Params:
  $node - a reference to a tree hash-structure
  $top  - a reference to a tree hash-structure containing the node referenced by $node
Returns:
A reference to the next or previous node of $node on the backtracking way along the tree having its root in $top. The $top parameter is NOT obligatory and may be omitted. Returns zero if $top or the root of the tree is reached.
Params:
  $node - a reference to a node
Description:
Cuts (disconnects) $node from its parent and brothers.
Returns:
$node
Params:
  $node      - a reference to a (cut or new) node
  $newparent - a reference to the new parent node
  $href      - a reference to a hash containing attributes as keys and the corresponding type strings as values
Description:
Connects $node to $newparent and links it with its new brothers, placing it at the position corresponding to its numerical-attribute value obtained via a call to the Ord function.
Returns:
$node
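The linking work that cutting and pasting involve can be sketched on plain hashes as follows. This is not the module's code: cut_node, paste_node, new_node and child_ords are hypothetical helpers, the key names are made up, the ordering attribute is passed by name instead of being looked up through $href, and placing a node after siblings with an equal ord is an assumption.

```perl
use strict;
use warnings;

# Hypothetical structural keys for this sketch.
my ($PARENT, $FIRSTSON, $LBROTHER, $RBROTHER) =
    qw(_parent _firstson _lbrother _rbrother);

sub new_node {
    my ($ord) = @_;
    my %node = (ord => $ord, $PARENT => 0, $FIRSTSON => 0,
                $LBROTHER => 0, $RBROTHER => 0);
    return \%node;
}

# Disconnect $node from its parent and brothers; its own subtree stays intact.
sub cut_node {
    my ($node) = @_;
    my ($p, $l, $r) = @{$node}{$PARENT, $LBROTHER, $RBROTHER};
    $p->{$FIRSTSON} = $r if $p && $p->{$FIRSTSON} == $node;
    $l->{$RBROTHER} = $r if $l;
    $r->{$LBROTHER} = $l if $r;
    @{$node}{$PARENT, $LBROTHER, $RBROTHER} = (0, 0, 0);
    return $node;
}

# Link $node under $newparent, keeping sons sorted by the $ord_attr value.
sub paste_node {
    my ($node, $newparent, $ord_attr) = @_;
    my ($prev, $next) = (0, $newparent->{$FIRSTSON});
    while ($next && $next->{$ord_attr} <= $node->{$ord_attr}) {
        ($prev, $next) = ($next, $next->{$RBROTHER});
    }
    @{$node}{$PARENT, $LBROTHER, $RBROTHER} = ($newparent, $prev, $next);
    if ($prev) { $prev->{$RBROTHER} = $node }
    else       { $newparent->{$FIRSTSON} = $node }
    $next->{$LBROTHER} = $node if $next;
    return $node;
}

# Space-separated ord values of the sons of $parent, left to right.
sub child_ords {
    my ($parent) = @_;
    my @ords;
    for (my $c = $parent->{$FIRSTSON}; $c; $c = $c->{$RBROTHER}) {
        push @ords, $c->{ord};
    }
    return "@ords";
}

my $top = new_node(0);
my ($n1, $n2, $n3) = map { new_node($_) } 1 .. 3;
paste_node($n3, $top, 'ord');
paste_node($n1, $top, 'ord');
paste_node($n2, $top, 'ord');
print child_ords($top), "\n";   # prints "1 2 3"

cut_node($n1);
print child_ords($top), "\n";   # prints "2 3"
```

Because cut_node resets only the links of the cut node itself, the returned node can be handed straight to paste_node under another parent.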
Exported with EXPORT_OK
Params:
  $node    - a reference to a tree hash-structure
  $href    - a reference to a hash containing attributes as keys and the corresponding type strings as values
  $defchar - a type string pattern
Returns:
Value of the first $node attribute of type matching $defchar pattern
Exported with EXPORT_OK
Params:
  $href    - a reference to a hash containing attributes as keys and the corresponding type strings as values
  $defchar - a type string pattern
Returns:
Name of the first attribute of type matching $defchar pattern
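The idea of finding an attribute by its type character can be sketched as below. This is an illustration under assumptions, not the module's function: first_attr_of_type is a hypothetical helper, it takes an explicit list of names to define what "first" means, and the definition strings are simplified to bare type characters.

```perl
use strict;
use warnings;

# Return the first name in @$order whose type string matches $defchar,
# or undef if none does.
sub first_attr_of_type {
    my ($defs, $order, $defchar) = @_;
    for my $name (@$order) {
        return $name if $defs->{$name} =~ /$defchar/;
    }
    return undef;
}

# Simplified, made-up attribute definitions and their positional order.
my %defs  = (form => 'K', lemma => 'P', ord => 'N', tag => 'PV');
my @order = qw(form lemma ord tag);

print first_attr_of_type(\%defs, \@order, 'N'), "\n";   # prints "ord"
print first_attr_of_type(\%defs, \@order, 'P'), "\n";   # prints "lemma"
```

With such a helper, the value of the matching attribute for a given node is just $node->{$name}.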
Exported with EXPORT_OK
Params:
  $href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Description:
These functions behave like Ord, SentOrd, Value and Hide, except that they do not take $node as a parameter and return the attribute name rather than its value.
Exported with EXPORT_OK
Params:
  $node - a reference to a tree hash-structure
  $href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
$node's ord (value of the attribute declared by type character N). Same as Special($node,$href,'N')
Exported with EXPORT_OK
Params:
  $node - a reference to a tree hash-structure
  $href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
$node's value attribute (value of the attribute declared by type character V). Same as Special($node,$href,'V')
Exported with EXPORT_OK
Params:
  $node - a reference to a tree hash-structure
  $href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
$node's sentence ord (value of the attribute declared by type character W). Same as Special($node,$href,'W')
Exported with EXPORT_OK
Params:
  $node - a reference to a tree hash-structure
  $href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
"hide" if $node is hidden (actually the value of the attribute declared by type character H). Same as Special($node,$href,'H')
Params:
  $attr - an attribute name
  $href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
1 if attribute $attr is declared as a list (L) in the hash of attribute defs referenced by $href, 0 otherwise
Params:
  $attr - an attribute name
  $href - a reference to a hash containing attributes as keys and the corresponding type strings as values
Returns:
A list of allowed values for attribute $attr as defined in the hash of attribute defs $href
Params:
  $node      - a reference to a node
  $attribute - attribute
  $value     - value to fill $node's $attribute with
Description:
Does the same as $node->{$attribute}=$value
Params:
  $node      - a reference to a node
  $attribute - attribute
Returns:
$node->{$attribute}
Params:
  $node  - a reference to a node
  $attrs - list of attributes to display
Description:
Draws a tree on standard output using character graphics. (This may be particularly useful on systems with no GUI; for a real graphical representation of FS trees look for Michal Kren's GRAPH.EXE or the Perl/Tk based program "tred" by Petr Pajas.)
Description:
A TNS is a normal hash whose keys are attribute names and whose values are strings, the values of the corresponding attributes (as they are given in the FS format source).
In addition, a few other keys and values are added to each node:
  $parent    a reference to the parent node (or zero if N/A)
  $firstson  a reference to the first son's node (or zero)
  $rbrother  a reference to the first right brother (or zero)
  $lbrother  a reference to the first left brother (or zero)
You may initialize a new node by calling NewNode($node), where $node is a reference to some (existing and rather empty) hash.