FSNode

FSNode - Simple OO interface to tree structures of Fslib.pm


REFERENCE

new

Create a new FSNode object. FSNode is basically a hash reference, which means that you may simply access a node's attributes as $node->{attribute}.

initialize

This function initializes an FSNode. It is called by the constructor new.

parent

Return node's parent node (undef if none).

lbrother

Return node's left brother node (undef if none).

rbrother

Return node's right brother node (undef if none).

firstson

Return node's first dependent node (undef if none).

following (top?)

Return the next node of the subtree in the order given by the tree structure (undef if none). If the node has any descendant, the first one is returned. Otherwise the right brother is returned, if any. If the node has neither a descendant nor a right brother, the right brother of the lowest ancestor that has one is returned.
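
The search order described above is a depth-first (pre-order) walk. The following is a minimal sketch on plain Perl hashes, not the FSNode implementation itself: the keys parent, firstson and rbrother and the helper following_node are hypothetical names chosen for illustration, and the handling of the optional top argument is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for FSNode: each node is a plain hash with
# 'parent', 'firstson' and 'rbrother' links (missing if none).
sub following_node {
    my ($node, $top) = @_;
    # 1. A descendant exists: return the first one.
    return $node->{firstson} if $node->{firstson};
    # 2. Otherwise climb until some node has a right brother.
    while ($node) {
        # Assumption: never leave the subtree rooted in $top (if given).
        return undef if defined $top && $node == $top;
        return $node->{rbrother} if $node->{rbrother};
        $node = $node->{parent};
    }
    return undef;
}

# Build a small tree:  a -> (b -> (d), c)
my %n = map { $_ => { name => $_ } } qw(a b c d);
$n{a}{firstson} = $n{b};
$n{b}{parent}   = $n{a};
$n{b}{rbrother} = $n{c};
$n{c}{parent}   = $n{a};
$n{b}{firstson} = $n{d};
$n{d}{parent}   = $n{b};

my @order;
for (my $node = $n{a}; $node; $node = following_node($node, $n{a})) {
    push @order, $node->{name};
}
print join(' ', @order), "\n";    # prints "a b d c"
```

Node d has neither a son nor a right brother, so the walk climbs to b and continues with b's right brother c.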

previous (top?)

Return the previous node of the subtree in the order given by the tree structure (undef if none). The search described under following is applied here in reverse order.


FSFormat

FSFormat - Simple OO interface for FS instance of Fslib.pm


REFERENCE

new (attributes_hash_ref?, ordered_names_list_ref?, unparsed_header?)

Create a new FS format instance object and initialize it with the optional values.

initialize (attributes_hash_ref?, ordered_names_list_ref?, unparsed_header?)

Initialize a new FS format instance with given values. See Fslib for more information about attribute hash, ordered names list and unparsed headers.

readFrom (source,output?)

Reads FS format instance definition from given source, optionally echoing the unparsed input on the given output. The obligatory argument source must be either a GLOB or list reference. Argument output is optional and if given, it must be a GLOB reference.

sentord, order, value, hide

Return names of special attributes declared in FS format as @W, @N, @V, @H respectively.

isHidden (node)

Return the lowest ancestor-or-self of the given node marked by 'hide' in the FS attribute declared as @H. Return undef, if no such node exists.
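
The ancestor-or-self lookup can be pictured as a simple climb towards the root. This is a sketch under stated assumptions, not the module's code: nodes are plain hashes with a hypothetical parent key, the helper name lowest_hidden is invented, and TR merely stands in for whatever attribute the FS header declares as @H.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lowest ancestor-or-self of $node whose
# attribute named $hide_attr has the value 'hide', or undef if none.
sub lowest_hidden {
    my ($node, $hide_attr) = @_;
    while ($node) {
        return $node
            if defined $node->{$hide_attr} && $node->{$hide_attr} eq 'hide';
        $node = $node->{parent};    # climb one level
    }
    return undef;
}

# 'TR' is an assumed attribute name used only for this example.
my $root = { name => 'root' };
my $mid  = { name => 'mid',  parent => $root, TR => 'hide' };
my $leaf = { name => 'leaf', parent => $mid };

my $hit  = lowest_hidden($leaf, 'TR');
my $none = lowest_hidden($root, 'TR');
printf "%s\n", $hit  ? $hit->{name}  : 'none';    # prints "mid"
printf "%s\n", $none ? $none->{name} : 'none';    # prints "none"
```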

parseFSTree (line)

Parse a given line in FS format (using Fslib::GetTree2) and return the root of the resulting FS tree as an FSNode object.

defs

Return a reference to the internally stored attribute hash.

list

Return a reference to the internally stored attribute names list.

unparsed

Return a reference to the internally stored unparsed FS header. Note that this header need not correspond to the defs and attribute names if the definitions or names are changed by hand at run time.

attributes

Return a list of all attribute names (in the order given by FS instance declaration).

atno (n)

Return the n'th attribute name (in the order given by FS instance declaration).

atdef (attribute_name)

Return the definition string for the given attribute.

count

Return the number of declared attributes.

isList (attribute_name)

Return true if given attribute is assigned a list of all possible values.

listValues (attribute_name)

Return the list of all possible values for the given attribute.

color (attribute_name)

Return one of Shadow, Hilite and XHilite depending on the color assigned to the given attribute in the FS format instance.

special (letter)

Return name of a special attribute declared in FS definition with a given letter. See also sentord, order, value, hide.

indexOf (attribute_name)

Return index of the given attribute (in the order given by FS instance declaration).


FSFile

FSFile - Simple OO interface for FS files.


SYNOPSIS

  use Fslib;

  open (F,"<trees.fs") ||
    die "Cannot open trees.fs: $!\n";
  my $fs = FSFile->newFSFile(\*F);
  close (F);

  die "File is empty or corrupted!\n"
    if ($fs->lastTreeNo<0);

  foreach my $tree ($fs->trees) {

    ...    # do something on the trees

  }

  open (F,">trees_out.fs")
    || die "Cannot open trees_out.fs: $!\n";
  $fs->writeTo(\*F);
  close (F);


REFERENCE

new (name?,format?,FS?,hint_pattern?,attribs_pattern?,unparsed_tail?,trees?,save_status?)

Create a new FS file object and initialize it with the optional values.

initialize (name?,format?,FS?,hint_pattern?,attribs_patterns?,unparsed_tail?,trees?,save_status?)

Initialize a FS file object. Argument description:

name (scalar)

File name

format (scalar)

File format identifier (user-defined string). TrEd, for example, uses FS format, gzipped FS format and any non-specific format strings as identifiers.

FS (FSFormat)

FSFormat object associated with the file

hint_pattern (scalar)

TrEd's hint pattern definition

attribs_patterns (list reference)

TrEd's display attributes pattern definition

unparsed_tail (list reference)

The rest of the file, which is not parsed by Fslib, i.e. Graph's embedded macros

trees (list reference)

List of FSNode objects representing root nodes of all trees in the FSFile.

save_status (scalar)

File save status indicator, 0=file is saved, 1=file is not saved (TrEd uses this field).

readFrom (glob_ref)

Read FS declaration and trees from a given file (file handle open for reading must be passed as a GLOB reference).

writeTo (glob_ref)

Write FS declaration, trees and unparsed tail to a given file (file handle open for writing must be passed as a GLOB reference).

newFSFile (glob_ref)

Create a new FSFile object based on the content of a given file (file handle open for reading must be passed as a GLOB reference).

filename

Return the FS file's file name.

changeFilename (new_filename)

Change the FS file's file name.

fileFormat

Return the file format identifier (user-defined string). TrEd, for example, uses FS format, gzipped FS format and any non-specific format strings as identifiers.

changeFileFormat

Change the file format identifier.

FS

Return a reference to the associated FSFormat object.

changeFS

Associate FS file with a new FSFormat object.

hint

Return the TrEd's hint pattern declared in the FSFile.

changeHint

Change the TrEd's hint pattern associated with this FSFile.

patterns

Return a list of display attribute patterns associated with this FSFile.

changePatterns

Change the list of display attribute patterns associated with this FSFile.

tail

Return the unparsed tail of the FS file (i.e. Graph's embedded macros).

changeTail

Modify the unparsed tail of the FS file (i.e. Graph's embedded macros).

trees

Return a list of all trees (i.e. their roots represented by FSNode objects).

changeTrees

Assign a new list of trees.

treeList

Return a reference to the internal array of all trees (i.e. their roots represented by FSNode objects).

changeTreeList (new_trees)

Associate a new reference to a list of trees with this FSFile. The referenced array must be a list of FSNode objects representing all the new trees.

lastTreeNo

Return number of associated trees minus one.

notSaved (value?)

Return/assign file saving status (this is completely user-driven).


Fslib

Fslib.pm - Simple low-level API for treebank files in .fs format. See FSFile, FSFormat and FSNode for an object-oriented abstraction over this module.


SYNOPSIS

  use Fslib;
  use locale;
  use POSIX qw(locale_h);

  setlocale(LC_ALL,"cs_CZ");
  setlocale(LANG,"czech");

  %attribs = ();
  @atord = ();
  @trees = ();

  # read the header
  %attribs=ReadAttribs(\*STDIN,\@atord,2,\@header);

  # read the raw tree
  while ($_=ReadTree(\*STDIN)) {
    if (/^\[/) {
      $root=GetTree($_,\@atord,\%attribs);  # parse the tree
      push(@trees, $root) if $root;         # store the structure
    } else { push(@rest, $_); }             # keep the rest of the file
  }

  # do some changes 
  ...

  # save the tree
  print @header;      # print header
  PrintFS(\*STDOUT,
          \@header,
          \@trees,
          \@atord,
          \%attribs); # print the trees
  print @rest;        # print the rest of the file

  # destroy trees and free memory
  foreach (@trees) { DeleteTree($_); }
  undef @header;


DESCRIPTION

This package aims to be a simple and usable Perl API for manipulating treebank files in the .fs format (which was designed by Michal Kren and is the only format supported by his Windows application GRAPH.EXE, used to interactively edit treebank analytical or tectogrammatical trees). See also Dan Zeman's review of this format in Czech at

http://ufal.mff.cuni.cz/local/doc/trees/format_fs.html

The Fslib package defines functions for parsing .fs files, extracting headers, reading trees and representing them in memory using simple hash structures, manipulating the values of node attributes (either "directly" or via the Get and Set functions) and even modifying the structure of the trees (via the Cut, Paste and DeleteTree functions or "directly").


USAGE

There are many ways to benefit from this package; the most typical one is described here. Assume you want to read a .fs file from STDIN (or elsewhere), make some changes either to the structure of the trees or to the values of the attributes (or both), and write it out again. (Maybe you only want to examine the structure, find something of interest and write some output about it; that is up to you.) For this purpose you may use code similar to that shown in the SYNOPSIS of this document. Let's see how to manage the individual tasks (keep an eye on the SYNOPSIS example while reading):


PARSING FS FILES

First, read the header of the .fs file using the ReadAttribs() function. Pass it a reference to the input file descriptor (like \*STDIN) and a reference to an array, which will receive the list of attribute names in positional order, and store its return value in a hash. The returned hash will then have the attribute names as keys and their type character definitions as values (see the ReadAttribs description for details).

Note that no Read... function from this package performs any seeking; each only reads on from the current position. So it is expected that you are at the beginning of the file when you call ReadAttribs, and that you have read the header before you parse the trees.

Anyway, having read the attribute definitions, you probably want to continue and parse the trees. This is done in two steps. First, call the ReadTree() function (passing it only a reference to the input file descriptor) to obtain a scalar (string) containing a linear representation of the next tree on input in the .fs format (with the line breaks removed), and store it. Next, test that what was read really is a tree and not something else, such as an environment or macro definition for GRAPH.EXE, which the .fs format also allows. This may be done simply by matching the result of ReadTree() against the pattern /^\[/, because trees and only trees begin with the square bracket `['. If it matches, continue by parsing the tree with GetTree(). This function parses the linear .fs representation of the tree and re-represents it as a structure of references to Perl hashes (called here a tree node structure, or TNS). For more about TNS's, see the chapter MODIFYING OR EXAMINING TREES and the REFERENCE item THE TREE NODE-STRUCTURE (TNS). GetTree() returns a reference to the tree's TNS. You may store it (for example, by pushing it onto an array) and continue by reading the next tree, or do any other job with it.
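
The tree test on its own can be tried without any input file. In this small sketch the strings in @chunks merely stand in for what ReadTree() would return; their contents are made up for illustration and are not real FS data, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Stand-ins for the strings ReadTree() would return (invented content).
my @chunks = (
    "[root](child)",            # starts with '[': treat as a tree
    "(some non-tree block)",    # anything else: keep verbatim
    "[another](tree)",
);

my (@tree_sources, @rest);
for my $chunk (@chunks) {
    if ($chunk =~ /^\[/) {
        push @tree_sources, $chunk;    # would be handed to GetTree()
    } else {
        push @rest, $chunk;            # written back after the trees
    }
}

printf "%d trees, %d other blocks\n", scalar @tree_sources, scalar @rest;
# prints "2 trees, 1 other blocks"
```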

When you have finished reading the trees and have made all the changes you wanted, you may want to write the trees back. This is done using the PrintFS() function (see the SYNOPSIS for its usage). To create a correct .fs file, you should probably write back the header before writing the trees, and also write that messy environment stuff after the trees are written.


MODIFYING OR EXAMINING TREES

A TNS represents both a node and the whole subtree rooted in that node, so whole trees are represented by their roots. A TNS is actually just a reference to a hash. The keys of the hash are either attribute names or `special' keys that hold the tree structure. Suppose $node is a TNS and `lemma' is an attribute defined in the appropriate .fs file. Then $node->{"lemma"} is the value of that attribute for the node represented by $node. You may also obtain this value as Get($node,"lemma"). From the $node TNS you may also obtain the node's parent, sons and brothers (both left and right). This may be done in several equivalent ways. The TNS's of a node's relatives are stored as the values of the `special' keys mentioned above. These keys are chosen so that they should not collide with attribute names, and are stored in the following scalar variables:

  $parent
  $firstson
  $lbrother
  $rbrother

(You may change these variables if you want, but note that modifying them once the trees are read may lead to problems :-)

So, to obtain the TNS of $node's parent you may use $node->{$parent}, Get($node,$parent), or the special function Parent($node). The same holds for the first son and the left and right brothers, for which you may prefer the FirstSon(), LBrother() and RBrother() functions respectively. If the node's relative (say, its first son) does not exist, the value obtained in any of these ways is still defined, but zero.
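
The equivalence of these access paths can be sketched with plain hashes. Everything here is an assumption made for illustration: the key strings assigned to $parent and the other variables are invented, and get_attr and parent_node are simplified stand-ins for Get() and Parent(), not the Fslib functions themselves.

```perl
use strict;
use warnings;

# Assumed key names for illustration only; Fslib keeps the real ones in
# its own scalar variables.
my ($parent, $firstson, $lbrother, $rbrother) = qw(_P_ _S_ _L_ _R_);

# Simplified stand-ins for Fslib's Get() and Parent().
sub get_attr    { my ($node, $key) = @_; return $node->{$key} }
sub parent_node { my ($node) = @_;       return $node->{$parent} }

my $root = { lemma => 'be',  $parent => 0, $firstson => 0,
             $lbrother => 0, $rbrother => 0 };
my $son  = { lemma => 'cat', $parent => $root, $firstson => 0,
             $lbrother => 0, $rbrother => 0 };
$root->{$firstson} = $son;

# Three equivalent ways to reach the parent.
my @found = ($son->{$parent}, get_attr($son, $parent), parent_node($son));
print "same node\n" if 3 == grep { $_ == $root } @found;

# A missing relative is defined but zero, so test for truth.
print "root has no parent\n" unless parent_node($root);
```

Because a missing relative is zero rather than undef, a plain truth test is enough and defined() would not distinguish the two cases.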

To create a new node, you usually create a new hash and call NewNode() passing it a reference to the new hash as a parameter.

To modify a node's value for a certain attribute (say 'lemma'), simply write $node->{"lemma"}="bar" (supposing you want the value to become 'bar') or use the Set() function, as in Set($node,"lemma","bar");.

To modify the tree's structure, you may use the Cut and Paste functions as described below; to delete a whole subtree you may use the DeleteTree function, which also frees the memory used by the TNS. If you want to delete the whole subtree of $node but still keep its root, you may use a construct like:

  DeleteTree(FirstSon($node)) while(FirstSon($node));

Note that the Cut function also removes a subtree from the tree, but keeps the TNS in memory and returns a reference to it.
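
What cutting means for the links between nodes can be illustrated on plain hashes. This is a hypothetical sketch, not the Fslib code: the key names parent, firstson, lbrother and rbrother and the helpers new_node and cut_node are invented for the example, with zero marking a missing relative.

```perl
use strict;
use warnings;

# Hypothetical node constructor: all links start out as zero.
sub new_node {
    my ($name) = @_;
    return { name => $name, parent => 0, firstson => 0,
             lbrother => 0, rbrother => 0 };
}

# Hypothetical Cut-like operation: detach $node from its parent and
# brothers, keep its own subtree, and return it.
sub cut_node {
    my ($node) = @_;
    my ($p, $l, $r) = @{$node}{qw(parent lbrother rbrother)};
    # Re-link the neighbours around the removed node.
    $p->{firstson} = $r if $p && $p->{firstson} == $node;
    $l->{rbrother} = $r if $l;
    $r->{lbrother} = $l if $r;
    # Detach the node itself; its own subtree stays intact.
    @{$node}{qw(parent lbrother rbrother)} = (0, 0, 0);
    return $node;
}

# root -> (a, b -> (x), c)
my %n = map { $_ => new_node($_) } qw(root a b c x);
$n{root}{firstson} = $n{a};
$n{$_}{parent} = $n{root} for qw(a b c);
$n{a}{rbrother} = $n{b};  $n{b}{lbrother} = $n{a};
$n{b}{rbrother} = $n{c};  $n{c}{lbrother} = $n{b};
$n{b}{firstson} = $n{x};  $n{x}{parent}   = $n{b};

my $cut = cut_node($n{b});
printf "%s\n", $n{a}{rbrother}{name};    # prints "c"
printf "%s\n", $cut->{firstson}{name};   # prints "x"
```

After the cut, a and c are direct neighbours, while the removed node still owns its son x and can be attached elsewhere.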

There is also a global variable $Fslib::FSTestListValidity, which may be set to 1 to make Fslib::ParseNode check whether a value assigned to a list attribute is one of the possible values declared in the FS file header. Because this may slow down parsing significantly (especially when there are many list attributes), the default value is 0 (no check is performed).


REFERENCE

ReadAttribs (FILE,$aref[,$DO_PRINT[,$OUTFILE]])

 Params:

   FILE      - file handle reference, like \*STDIN
   $aref     - reference to array
   $DO_PRINT - if 1, read input is also copied to
               $OUTFILE (which must be a file handle reference, like
               \*STDOUT).
               if 2 (as in the SYNOPSIS), read input is also stored to
               the @$OUTFILE array (in this case $OUTFILE is a reference
               to an array).
   $OUTFILE  - output file handle or array reference, \*STDOUT if omitted

 Returns:
   A hash, having fs-attribute names as keys
   and strings containing characters identifying
   types as corresponding values.
   The characters may be some of the following
   (as given by the .fs format):

       K        Key attribute
       P        Positional attribute
       O        Obligatory attribute
       L        List attribute
       N        Numerical attribute
       V        Value attribute (for displaying in GRAPH.EXE)

   On input, $aref should be a reference to
   an empty array. On return the array contains
   the keys of the returned hash (the attribute names)
   ordered as they are defined in the FS file, i.e. in
   their positional order.

ReadTree (FILE)

 Params:

   FILE - file handle reference, like \*STDIN

 Returns:

   A string containing the next tree read from FILE
   in its source form (only with concatenated lines).

GetTree ($tree,$aref,$href)

 Params: 

   $tree - the source form of a tree with concatenated lines
   $aref - a reference to an array of attributes in their
           positional order (see ReadAttribs)
   $href - a reference to a hash, containing attributes as keys
           and corresponding type strings as values

 Returns:

   A reference to a tree hash-structure described below.

PrintNode ($node,$aref,$href)

 Params: 

   $node - a reference to a tree hash-structure
   $aref - a reference to an array of attributes in their
           positional order (see ReadAttribs)
   $href - a reference to a hash, containing attributes as keys
           and corresponding type strings as values

 Returns:

   Unknown.

 Description:

   Prints the node structure referenced by $node
   to STDOUT in the source format.

PrintTree ($node,$aref,$href)

 Params: 

   $node - a reference to a tree hash-structure
   $aref - a reference to an array of attributes in their
           positional order (see ReadAttribs)
   $href - a reference to a hash, containing attributes as keys
           and corresponding type strings as values

 Returns:

   Unknown.

 Description:

   Prints the tree having its root node referenced by $node
   to STDOUT in the source format.

Parent($node), FirstSon($node), LBrother($node), RBrother($node)

 Params: 

   $node - a reference to a tree hash-structure

 Returns:

   Parent, first son, left brother or right brother resp. of
   the node referenced by $node

Next($node,[$top]), Prev($node,[$top])

 Params: 

   $node - a reference to a tree hash-structure
   $top  - a reference to a tree hash-structure, containing
           the node referenced by $node

 Returns:

   Reference to the next or previous node of $node on
   the backtracking way along the tree having its root in $top.
   The $top parameter is NOT obligatory and may be omitted.
   Returns zero if $top or the root of the tree is reached.

Cut($node)

 Params: 

   $node - a reference to a node

 Description:

   Cuts (disconnects) $node from its parent and brothers

 Returns:

   $node

Paste($node,$newparent,$href)

 Params: 

   $node      - a reference to a (cut or new) node
   $newparent - a reference to the new parent node
   $href      - a reference to a hash, containing attributes as keys
                and corresponding type strings as values

 Description:

   Connects $node to $newparent and links it
   with its new brothers, placing it at the position
   corresponding to its numerical-attribute value
   obtained via a call to the Ord function.

 Returns:

   $node

Special($node,$href,$defchar)

 Exported with EXPORT_OK

 Params: 

   $node    - a reference to a tree hash-structure
   $href    - a reference to a hash, containing attributes as keys
              and corresponding type strings as values
   $defchar - a type string pattern

 Returns:

   Value of the first $node attribute of type matching $defchar pattern

ASpecial($href,$defchar)

 Exported with EXPORT_OK

 Params: 

   $href    - a reference to a hash, containing attributes as keys
              and corresponding type strings as values
   $defchar - a type string pattern

 Returns:

   Name of the first attribute of type matching $defchar pattern

AOrd, ASentOrd, AValue, AHide ($href)

 Exported with EXPORT_OK

 Params:

   $href    - a reference to a hash, containing attributes as keys
              and corresponding type strings as values

 Description:

   These are like Ord, SentOrd, Value and Hide, except that
   they do not take $node as a parameter and return the attribute
   name rather than its value.

Ord($node,$href)

 Exported with EXPORT_OK

 Params: 

   $node - a reference to a tree hash-structure
   $href - a reference to a hash, containing attributes as keys
           and corresponding type strings as values

 Returns:

   $node's ord (value of attribute declared by type character N)
   Same as Special($node,$href,'N')

Value($node,$href)

 Exported with EXPORT_OK

 Params: 

   $node - a reference to a tree hash-structure
   $href - a reference to a hash, containing attributes as keys
           and corresponding type strings as values

 Returns:

   $node's value attribute (value of attribute declared by type character V)
   Same as Special($node,$href,'V')

SentOrd($node,$href)

 Exported with EXPORT_OK

 Params: 

   $node - a reference to a tree hash-structure
   $href - a reference to a hash, containing attributes as keys
           and corresponding type strings as values

 Returns:

   $node's sentence ord (value of attribute declared by type character W)
   Same as Special($node,$href,'W')

Hide($node,$href)

 Exported with EXPORT_OK

 Params: 

   $node - a reference to a tree hash-structure
   $href - a reference to a hash, containing attributes as keys
           and corresponding type strings as values

 Returns:

   "hide" if $node is hidden (actually the value of attribute declared
   by type character H)
   Same as Special($node,$href,'H')

IsList($attr,$href)

 Params:

   $attr - an attribute name
   $href - a reference to a hash, containing attributes as keys
           and corresponding type strings as values

 Returns:

   1 if attribute $attr is declared as a list (L) in the hash of
   attribute defs referenced by $href
   0 otherwise

ListValues($attr,$href)

 Params:

   $attr - an attribute name
   $href - a reference to a hash, containing attributes as keys
           and corresponding type strings as values

 Returns:

   a list of allowed values for attribute $attr as defined in
   the hash of attribute defs $href

Set($node,$attribute,$value)

 Params: 

   $node      - a reference to a node
   $attribute - attribute
   $value     - value to fill $node's $attribute with

 Description:

   Does the same as $node->{$attribute}=$value

Get($node,$attribute)

 Params: 

   $node      - a reference to a node
   $attribute - attribute

 Returns:

   $node->{$attribute}

DrawTree($node,@attrs)

 Params: 

   $node      - a reference to a node
   @attrs     - list of attributes to display

 Description:

   Draws a tree on standard output using character graphics. (May be
   particularly useful on systems with no GUI. For a real graphical
   representation of FS trees, look for Michal Kren's GRAPH.EXE or the
   Perl/Tk based program "tred" by Petr Pajas.)

THE TREE NODE-STRUCTURE (TNS)

 Description:

 TNS is a normal hash, whose keys are names of attributes
 and whose values are strings, the values of the corresponding
 attributes (as they are given in the FS format source).

 In addition, a few other keys and values are added to each node:

   $parent    a reference to the parent node (or zero if N/A)
   $firstson  a reference to the first son's node (or zero)
   $rbrother  a reference to the first right brother (or zero)
   $lbrother  a reference to the first left brother (or zero)

 You may initialize a new node by calling NewNode($node),
 where $node is a reference to some (existing and rather empty) hash.


SEE ALSO

http://ufal.mff.cuni.cz/local/trees/format_fs.html