Treex::PML documentation

Treex::PML

Treex::PML.pm - library for tree processing

SYNOPSIS

  use Treex::PML qw(ImportBackends);

  my @IObackends = ImportBackends(qw(PML Storable));

  my $file="trees.fs";
  my $fs = Treex::PML::Document->load($file,{ backends => \@IObackends });

  if ($fs->lastTreeNo<0) { die "File is empty or corrupted!\n" }
  foreach my $tree ($fs->trees) {
     my $node = $tree;
     while ($node) {
       ...  # do something on node
       $node = $node->following; # depth-first traversal
     }
  }
  $fs->writeFile("$file.out");

DESCRIPTION

Introduction

This package provides an API for manipulating treebank files. Originally only files in the so-called FS format (designed by Michal Kren) were supported, but the current implementation features pluggable I/O backends for other data formats. The module implements a generic data model of an XML-based format called PML (http://ufal.mff.cuni.cz/pdt/pml).

Treex::PML provides, among others, the following classes:

Treex::PML::Factory

a factory class which delegates object creation to a default factory class, which can be specified by the user (defaults to Treex::PML::StandardFactory). It is important that both user and library code use the create methods from Treex::PML::Factory to create new objects rather than calling constructors of an explicit object class.

This classical Factory Pattern allows the user to replace the standard family of Treex::PML classes with customized versions by setting up a customized factory as default. Then, all objects created by the Treex::PML library and applications will be from the customized family.
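
The recommended usage can be sketched as follows. The method names createDocument and createNode are assumptions not stated on this page; consult the Treex::PML::Factory documentation for the actual list of create methods and their arguments.

```perl
use strict;
use warnings;
use Treex::PML ();

# Assumption: createDocument() and createNode() are among the create
# methods offered by Treex::PML::Factory.
my $doc  = Treex::PML::Factory->createDocument();
my $node = Treex::PML::Factory->createNode();

# The objects come from whatever factory is currently the default, so
# test the roles they perform rather than their exact class names.
print "document ok\n" if Treex::PML::does($doc,  'Treex::PML::Document');
print "node ok\n"     if Treex::PML::does($node, 'Treex::PML::Node');
```

Because no class name is hard-coded at the point of creation, the same code keeps working after a customized factory has been set up as the default.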

Treex::PML::Document

representing an FS file (consisting of a set of trees, type declarations, meta-data etc.). A Treex::PML::Document object has containers for additional (user or application defined) data structures (run-time only).

Treex::PML::Node

representing a node of a tree (including the root node, which also represents the whole tree), see "Representation of trees" in Treex::PML::Node for details.

Resource paths

Some I/O backends require additional resources (such as schemas, DTDs, configuration files, XSLT stylesheets, dictionaries, etc.). For this purpose, Treex::PML maintains a list of so-called "resource paths" which I/O backends may conveniently search for their resources.

See PACKAGE FUNCTIONS for a description of the functions related to pluggable I/O backends and the list of resource paths.

PACKAGE FUNCTIONS

Treex::PML::does ($thing,$role)

Parameters

$thing - any Perl scalar (an object, a reference or a non-reference)

$role - the name of a role: a class name or a reference type such as 'ARRAY' or 'HASH'

Description

This function is an alias for UNIVERSAL::DOES::does(), which checks whether $thing performs the interface (role) $role. If the thing is an object or class, it simply checks $thing->DOES($role) (see UNIVERSAL::DOES or UNIVERSAL in Perl >= 5.10.1). Otherwise it tells whether the thing can be dereferenced as an array/hash/etc.

Unlike UNIVERSAL::isa(), it is semantically correct to use does for something unknown and to use it for reftype.

This function also handles overloading. For example, does($thing, 'ARRAY') returns true if the thing is an array reference, or if the thing is an object with overloaded @{}.

Using this function (or UNIVERSAL::DOES::does()) is the recommended method for testing types of objects in the Treex::PML hierarchy (Treex::PML::Node, Treex::PML::Document, etc.)
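
A short sketch of reference-type checks with does follows. The helper as_list is hypothetical and serves only as an illustration.

```perl
use strict;
use warnings;
use Treex::PML ();

my $array_ref = [ 1, 2, 3 ];
my $hash_ref  = { a => 1 };

# Reference types are tested by name.
print "array ref does ARRAY\n" if Treex::PML::does($array_ref, 'ARRAY');
print "hash ref does HASH\n"   if Treex::PML::does($hash_ref,  'HASH');

# Hypothetical helper: expand anything usable as an array into a list
# and pass any other reference through as a single item.
sub as_list {
    my ($value) = @_;
    return Treex::PML::does($value, 'ARRAY') ? @$value : ($value);
}

my @items = as_list($array_ref);   # the three array elements
my @one   = as_list($hash_ref);    # the hash reference itself
```

Since does also honours overloading, as_list would expand an object with overloaded @{} in the same way as a plain array reference.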

Returns

A true value if $thing performs the role $role, a false value otherwise.

Treex::PML::UseBackends (@backends)

Parameters

@backends - a list of backend names

Description

Demand-load the given modules and use them as the initial set of I/O backends. The initial set of backends is returned by Backends(). This set is used as the default set of backends by Treex::PML::Document->load (unless a different list of backends was specified in a parameter).

Returns

In list context, the list of backends successfully loaded; in scalar context, a true value if and only if all requested backends were successfully loaded.
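
The difference between the two calling contexts can be sketched as follows, using the backend names PML and Storable from the SYNOPSIS:

```perl
use strict;
use warnings;
use Treex::PML ();

# Scalar context: true only if every requested backend was loaded.
my $all_loaded = Treex::PML::UseBackends(qw(PML Storable));
warn "Some of the requested backends could not be loaded\n"
    unless $all_loaded;

# The set now used by Treex::PML::Document->load whenever no explicit
# list of backends is passed to it.
my @defaults = Treex::PML::Backends();
print "Default backends: @defaults\n";
```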

Treex::PML::AddBackends (@backends)

Parameters

@backends - a list of backend names

Description

Demand-load those of the given modules that are not yet available and add them to the current set of I/O backends (see Backends()).

Returns

In list context, the list of requested backends that were already available or successfully loaded; in scalar context, a true value if and only if all requested backends were already available or successfully loaded.

Treex::PML::Backends ()

Description

Returns the initial set of backends. This set is used as the default set of backends by Treex::PML::Document->load.

Returns

A list of backends already available or successfully loaded.

Treex::PML::BackendCanRead ($backend)

Parameters

$backend - a name of an I/O backend

Returns

Returns true if the backend provides all methods required for reading.

Treex::PML::BackendCanWrite ($backend)

Parameters

$backend - a name of an I/O backend

Returns

Returns true if the backend provides all methods required for writing.

Treex::PML::ImportBackends (@backends)

Parameters

@backends - a list of backend names

Description

Demand-load the given modules as I/O backends and return the list of names of the backends successfully loaded. This list may then be passed to Treex::PML::Document I/O calls.

Returns

List of names of successfully loaded I/O backends.
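
As a sketch, the returned list can be combined with BackendCanRead and BackendCanWrite to inspect what each loaded backend supports. The backend names are those used in the SYNOPSIS. The exact form of the returned names is not specified here, so the code treats them as opaque strings.

```perl
use strict;
use warnings;
use Treex::PML ();

my @loaded = Treex::PML::ImportBackends(qw(PML Storable));

for my $backend (@loaded) {
    my $can_read  = Treex::PML::BackendCanRead($backend)  ? 'yes' : 'no';
    my $can_write = Treex::PML::BackendCanWrite($backend) ? 'yes' : 'no';
    print "$backend: read=$can_read, write=$can_write\n";
}

# Keep only the backends able to read; this list can be passed as
# { backends => \@readers } to Treex::PML::Document->load, as in the
# SYNOPSIS.
my @readers = grep { Treex::PML::BackendCanRead($_) } @loaded;
```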

Treex::PML::CloneValue ($scalar,$old_values?, $new_values?)

Parameters

$scalar - arbitrary Perl scalar

$old_values - array reference (optional)

$new_values - array reference (optional)

Description

Returns a deep copy of the Perl structures contained in a given scalar.

The optional argument $old_values can be an array reference consisting of values (references) that are either to be preserved (if $new_values is undefined) or mapped to the corresponding values in the array $new_values. This means that if $scalar contains (possibly deeply nested) reference to an object $A, and $old_values is [$A], then if $new_values is undefined, the resulting copy of $scalar will also refer to the object $A rather than to a deep copy of $A; if $new_values is [$B], all references to $A will be replaced by $B in the resulting copy. Note also that the effect of using [$A] as both $old_values and $new_values is the same as leaving $new_values undefined.

Returns

a deep copy of $scalar as described above
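
The three modes described above can be sketched on a small structure. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Treex::PML ();

my $shared = { name => 'shared' };
my $data   = { values => [ 1, 2, 3 ], link => $shared };

# Plain deep copy: nested structures, including $shared, are duplicated.
my $deep = Treex::PML::CloneValue($data);

# $shared listed in $old_values only: the copy still refers to $shared.
my $kept = Treex::PML::CloneValue($data, [$shared]);

# $shared mapped to $replacement: the copy refers to $replacement.
my $replacement = { name => 'replacement' };
my $mapped = Treex::PML::CloneValue($data, [$shared], [$replacement]);

# Comparing references numerically compares their addresses.
print "link duplicated\n" if $deep->{link}   != $shared;
print "link preserved\n"  if $kept->{link}   == $shared;
print "link replaced\n"   if $mapped->{link} == $replacement;
```

Preserving selected references is useful when the copied structure points to large or shared objects that should not be duplicated along with it.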

Treex::PML::ResourcePaths ()

Returns the current list of directories used by Treex::PML to search for resources.

Treex::PML::SetResourcePaths (@paths)

Parameters

@paths - a list of directory paths

Description

Specify the complete set of directories to be used by Treex::PML when looking up resources.

Treex::PML::AddResourcePath (@paths)

Parameters

@paths - a list of directory paths

Description

Add given paths to the end of the list of directories searched by Treex::PML for resources.

Treex::PML::AddResourcePathAsFirst (@paths)

Parameters

@paths - a list of directory paths

Description

Add given paths to the beginning of the list of directories searched for resources.

Treex::PML::RemoveResourcePath (@paths)

Parameters

@paths - a list of directory paths

Description

Remove given paths from the list of directories searched for resources.
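
The functions for maintaining resource paths can be combined as in the following sketch. The directory names are hypothetical and need not exist for the list manipulation itself.

```perl
use strict;
use warnings;
use Treex::PML ();

# Hypothetical directories, used only to show the resulting order.
Treex::PML::SetResourcePaths('/opt/treebank/resources');
Treex::PML::AddResourcePath('/usr/local/share/pml');
Treex::PML::AddResourcePathAsFirst('/home/user/pml');

# Call in list context to obtain the individual directories.
my @paths = Treex::PML::ResourcePaths();
print "$_\n" for @paths;

Treex::PML::RemoveResourcePath('/usr/local/share/pml');
my @remaining = Treex::PML::ResourcePaths();
```

Going by the descriptions above, the first listing is expected to start with /home/user/pml and end with /usr/local/share/pml, and two directories should remain after the removal.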

Treex::PML::FindInResourcePaths ($filename, \%options?)

Parameters

$filename - a relative path to a file

Description

If a given filename is a relative path of a file found in the resource paths, return:

If the option 'all' is true, a list of absolute paths to all occurrences found (may be empty).

If the option 'strict' is true, an absolute path to the first occurrence or an empty list if there is no occurrence of the file in the resource paths.

Otherwise act as with 'strict', but return unmodified $filename if no occurrence is found.

If $filename is an absolute path, it is always returned unmodified as a single return value.

Options are passed in an optional second argument as key-value pairs of a HASH reference:

  FindInResources($filename, {
    # 'strict' => 0 or 1
    # 'all'    => 0 or 1
  });
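
The lookup modes can be tried out with a temporary directory, as in this sketch. The file names are made up for the example; File::Temp and File::Spec are core Perl modules.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use Treex::PML ();

# Create a throw-away resource directory containing a single file.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'example_schema.xml');
open my $fh, '>', $path or die "Cannot create $path: $!";
print {$fh} "<example/>\n";
close $fh;

Treex::PML::SetResourcePaths($dir);

# Default mode: the first occurrence, or the unmodified argument if the
# file is not found.
my $found    = Treex::PML::FindInResourcePaths('example_schema.xml');
my $fallback = Treex::PML::FindInResourcePaths('no_such_file.xml');

# 'strict': an empty list when the file is not found.
my @strict = Treex::PML::FindInResourcePaths('no_such_file.xml', { strict => 1 });

# 'all': every occurrence found in the resource paths.
my @all = Treex::PML::FindInResourcePaths('example_schema.xml', { all => 1 });
```

The 'strict' mode is the one to use when the caller needs to distinguish "not found" from a successful lookup, since the default mode returns a filename in either case.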

Treex::PML::FindInResources ($filename)

Alias for FindInResourcePaths($filename).

Treex::PML::FindDirInResourcePaths ($dirname)

Parameters

$dirname - a relative path to a directory

Description

If the given directory name is a relative path of a subdirectory located in one of the resource directories, return an absolute path to that subdirectory. Otherwise return $dirname unmodified.

Treex::PML::FindDirInResources ($filename)

Alias for FindDirInResourcePaths($filename).

Treex::PML::ResolvePath ($ref_filename,$filename,$search_resource_paths?)

Parameters

$ref_filename - a reference filename

$filename - a relative path to a file

$search_resource_paths - 0 or 1

Description

If the given filename is a relative path, try to find the file in the same directory as $ref_filename. On success, return a path composed of the directory part of $ref_filename and $filename. If the file cannot be located this way and the $search_resource_paths argument is true, return the value of FindInResourcePaths($filename).

EXPORTED SYMBOLS

For backward compatibility reasons only, Treex::PML exports by default the following function symbol:

ImportBackends

To avoid this implicit import, it is recommended to load Treex::PML as:

  use Treex::PML ();

The following function symbols can be imported on demand:

ImportBackends, CloneValue, ResourcePaths, FindInResources, FindDirInResources, FindDirInResourcePaths, ResolvePath, AddResourcePath, AddResourcePathAsFirst, SetResourcePaths, RemoveResourcePath

SEE ALSO

Tree editor TrEd: http://ufal.mff.cuni.cz/~pajas/tred

Prague Markup Language (PML) format: http://ufal.mff.cuni.cz/jazz/PML/

Description of FS format: http://ufal.mff.cuni.cz/pdt/Corpora/PDT_1.0/Doc/fs.html

Related packages: Treex::PML::Schema, Treex::PML::Instance, Treex::PML::Document, Treex::PML::Node, Treex::PML::Factory

COPYRIGHT AND LICENSE