Treex::PML.pm - library for tree processing
    use Treex::PML qw(ImportBackends);

    my @IObackends = ImportBackends(qw(PML Storable));
    my $file = "trees.fs";
    my $fs = Treex::PML::Document->load($file, { backends => \@IObackends });
    if ($fs->lastTreeNo < 0) { die "File is empty or corrupted!\n" }
    foreach my $tree ($fs->trees) {
      my $node = $tree;
      while ($node) {
        ...                        # do something on node
        $node = $node->following;  # depth-first traversal
      }
    }
    $fs->writeFile("$file.out");
This package provides an API for manipulating treebank files. Originally, only files in the so-called FS format (designed by Michal Kren) were supported, but the current implementation features pluggable I/O backends for other data formats. The module implements a generic data model of an XML-based format called PML (http://ufal.mff.cuni.cz/pdt/pml).
Treex::PML provides, among others, the following classes:
Treex::PML::Factory - a factory class which delegates object creation to a default factory class, which can be specified by the user (defaults to Treex::PML::StandardFactory). It is important that both user and library code use the create methods from Treex::PML::Factory to create new objects rather than calling constructors of an explicit object class.
This classical Factory Pattern allows the user to replace the standard family of Treex::PML classes with customized versions by setting up a customized factory as default. Then, all objects created by the Treex::PML library and applications will be from the customized family.
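The delegation described above can be sketched in plain Perl. This is only an illustration of the pattern, not the Treex::PML implementation: every package and method name below (My::Factory, My::StandardFactory, My::CustomFactory, make_default, createNode) is hypothetical and chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical classes for illustration only; they are not part of Treex::PML.
package My::StandardFactory;
sub new        { my ($class) = @_; return bless {}, $class }
sub createNode { my ($self, %attrs) = @_; return bless {%attrs}, 'My::Node' }

package My::CustomFactory;
our @ISA = ('My::StandardFactory');
# Override object creation so that new nodes come from a customized class.
sub createNode { my ($self, %attrs) = @_; return bless {%attrs}, 'My::CustomNode' }

package My::Factory;
# The front-end factory only delegates to whichever factory is the current default.
my $default = My::StandardFactory->new;
sub make_default { my ($class, $factory) = @_; $default = $factory; return $default }
sub createNode   { my ($class, @args) = @_; return $default->createNode(@args) }

package main;

# Code that always goes through My::Factory never names a concrete node class.
my $plain = My::Factory->createNode(form => 'dog');

# Installing a customized factory changes what all later create calls return.
My::Factory->make_default(My::CustomFactory->new);
my $custom = My::Factory->createNode(form => 'cat');

print ref($plain),  "\n";   # My::Node
print ref($custom), "\n";   # My::CustomNode
```

Because callers never invoke a constructor of a concrete class, swapping the default factory is enough to switch the whole family of objects, which is the reason the text asks both user and library code to use the factory's create methods.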
Treex::PML::Document - a class representing an FS file (consisting of a set of trees, type declarations, meta-data, etc.). A Treex::PML::Document object has containers for additional (user- or application-defined) data structures (run-time only).
Treex::PML::Node - a class representing a node of a tree (including the root node, which also represents the whole tree); see "Representation of trees" in Treex::PML::Node for details.
Some I/O backends require additional resources (such as schemas, DTDs, configuration files, XSLT stylesheets, dictionaries, etc.). For this purpose, Treex::PML maintains a list of so-called "resource paths", which I/O backends may conveniently search for their resources.
See PACKAGE FUNCTIONS for a description of the functions related to pluggable I/O backends and the list of resource paths.
$thing
- any Perl scalar (an object, a reference or a non-reference)
This function is an alias for a very useful function, UNIVERSAL::DOES::does(), which checks whether $thing performs the interface (role) $role. If the thing is an object or class, it simply checks $thing->DOES($role) (see UNIVERSAL::DOES or UNIVERSAL in Perl >= 5.10.1).
Otherwise it tells whether the thing can be dereferenced as an array/hash/etc.
Unlike UNIVERSAL::isa(), it is semantically correct to use does for something unknown and to use it for reftype.
This function also handles overloading. For example, does($thing, 'ARRAY') returns true if the thing is an array reference, or if the thing is an object with overloaded @{}.
Using this function (or UNIVERSAL::DOES::does()) is the recommended method for testing types of objects in the Treex::PML hierarchy (Treex::PML::Node, Treex::PML::Document, etc.).
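To make the behaviour described above concrete, here is a minimal sketch of a role check that handles plain references, objects, and overloaded dereferencing. The helper performs_role and the class My::List are hypothetical names for this example; the code only approximates the description and is not the implementation of UNIVERSAL::DOES::does(), which may differ in edge cases (class names passed as strings are not handled here).

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);
use overload ();

# Overload keys corresponding to each dereferenceable reference type.
my %deref_op = (
    ARRAY  => '@{}',
    HASH   => '%{}',
    SCALAR => '${}',
    CODE   => '&{}',
    GLOB   => '*{}',
);

# Hypothetical helper approximating the checks described in the text.
sub performs_role {
    my ($thing, $role) = @_;
    if (blessed($thing)) {
        # Objects: ask the class first (DOES needs Perl 5.10 or later).
        return 1 if $thing->DOES($role);
        # Then look for an overloaded dereference operator matching the role.
        my $op = $deref_op{$role};
        return 1 if $op && overload::Method($thing, $op);
        return 0;
    }
    # Plain references: compare the underlying reference type.
    return (ref($thing) && reftype($thing) eq $role) ? 1 : 0;
}

# A code-ref based object that can nevertheless be dereferenced as an array.
package My::List;
use overload '@{}' => sub { $_[0]->() }, fallback => 1;
sub new {
    my ($class, @items) = @_;
    my $data = [@items];
    return bless sub { $data }, $class;
}

package main;

my $list = My::List->new(1, 2, 3);
print performs_role([1, 2], 'ARRAY'),   "\n";   # 1: a plain array reference
print performs_role({},     'ARRAY'),   "\n";   # 0: a hash reference
print performs_role($list,  'ARRAY'),   "\n";   # 1: overloaded @{}
print performs_role($list,  'My::List'), "\n";  # 1: class membership
```

A check based on ref($thing) eq 'ARRAY' would reject $list even though @$list works, which is the case the overloading remark above is about.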
@backends
- a list of backend names
Loads the given modules on demand and uses them as the initial set of I/O backends. The initial set of backends is returned by Backends(). This set is used as the default set of backends by Treex::PML::Document->load (unless a different list of backends was specified in a parameter).
In list context, the list of backends successfully loaded; in scalar context, a true value if and only if all requested backends were successfully loaded.
@backends
- a list of backend names
In list context, the list of backends already available or successfully loaded; in scalar context, a true value if and only if all requested backends were already available or successfully loaded.
A list of backends already available or successfully loaded.
Returns the initial set of backends. This set is used as the default set of backends by Treex::PML::Document->load.
A list of backends already available or successfully loaded.
$backend
- a name of an I/O backend
Returns true if the backend provides all methods required for reading.
$backend
- a name of an I/O backend
Returns true if the backend provides all methods required for writing.
@backends
- a list of backend names
Loads the given modules on demand as I/O backends and returns a list of the backend names successfully loaded. This list may then be passed to Treex::PML::Document I/O calls.
List of names of successfully loaded I/O backends.
$scalar
- arbitrary Perl scalar
$old_values
- array reference (optional)
$new_values
- array reference (optional)
Returns a deep copy of the Perl structures contained in a given scalar.
The optional argument $old_values can be an array reference consisting of values (references) that are either to be preserved (if $new_values is undefined) or mapped to the corresponding values in the array $new_values. This means that if $scalar contains a (possibly deeply nested) reference to an object $A, and $old_values is [$A], then if $new_values is undefined, the resulting copy of $scalar will also refer to the object $A rather than to a deep copy of $A; if $new_values is [$B], all references to $A will be replaced by $B in the resulting copy. Note also that the effect of using [$A] as both $old_values and $new_values is the same as leaving $new_values undefined.
a deep copy of $scalar as described above
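The preserve-or-map behaviour of $old_values and $new_values is easier to see in code. The sketch below reimplements the idea in plain Perl for nested hashes and arrays only; clone_with_map and _clone are hypothetical names, and this is a simplified model of the description above, not the library's own code.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype refaddr);

# Hypothetical helper: deep-copies nested hashes and arrays. References listed
# in $old are kept as they are, or swapped for the matching entry of $new.
sub clone_with_map {
    my ($value, $old, $new) = @_;
    my %seen;
    if ($old) {
        $new ||= $old;    # no replacements given: preserve the originals
        $seen{ refaddr($old->[$_]) } = $new->[$_] for 0 .. $#$old;
    }
    return _clone($value, \%seen);
}

sub _clone {
    my ($value, $seen) = @_;
    return $value unless ref($value);                 # plain scalars are copied by value
    my $addr = refaddr($value);
    return $seen->{$addr} if exists $seen->{$addr};   # preserved, mapped, or already copied
    my $type = reftype($value);
    my $copy;
    if ($type eq 'HASH') {
        $copy = $seen->{$addr} = {};
        $copy->{$_} = _clone($value->{$_}, $seen) for keys %$value;
    }
    elsif ($type eq 'ARRAY') {
        $copy = $seen->{$addr} = [];
        push @$copy, _clone($_, $seen) for @$value;
    }
    else {
        return $value;    # simplification: other reference types are shared
    }
    return blessed($value) ? bless($copy, blessed($value)) : $copy;
}

my $shared = { name => 'schema' };
my $data   = { items => [1, 2, { ref => $shared }], owner => $shared };
my $other  = { name => 'replacement' };

my $deep      = clone_with_map($data);                       # everything copied
my $preserved = clone_with_map($data, [$shared]);            # $shared kept as is
my $mapped    = clone_with_map($data, [$shared], [$other]);  # $shared replaced by $other
```

In this model, passing [$shared] as both lists gives the same result as passing it only as the first list, matching the note at the end of the description.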
Returns the current list of directories used by Treex::PML to search for resources.
@paths
- a list of directory paths
Specify the complete set of directories to be used by Treex::PML when looking up resources.
@paths
- a list of directory paths
Add given paths to the end of the list of directories searched by Treex::PML for resources.
@paths
- a list of directory paths
Add given paths to the beginning of the list of directories searched for resources.
@paths
- a list of directory paths
Remove given paths from the list of directories searched for resources.
$filename
- a relative path to a file
If a given filename is a relative path of a file found in the resource paths, return:
If the option 'all' is true, a list of absolute paths to all occurrences found (may be empty).
If the option 'strict' is true, an absolute path to the first occurrence or an empty list if there is no occurrence of the file in the resource paths.
Otherwise act as with 'strict', but return unmodified $filename if no occurrence is found.
If $filename is an absolute path, it is always returned unmodified as a single return value.
Options are passed in an optional second argument as key-value pairs of a HASH reference:
    FindInResources($filename, {
      # 'strict' => 0 or 1
      # 'all'    => 0 or 1
    });
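The three lookup modes can be compared side by side with a small stand-in written in plain Perl. The function find_in_paths and the explicit $paths argument are hypothetical (the real function consults the module's own resource path list); the sketch only mirrors the rules stated above.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical stand-in for the lookup described above; $paths plays the role
# of the resource path list.
sub find_in_paths {
    my ($paths, $filename, $opts) = @_;
    $opts ||= {};
    # Absolute paths are always returned unmodified.
    return $filename if File::Spec->file_name_is_absolute($filename);
    my @found = grep { -f $_ } map { File::Spec->catfile($_, $filename) } @$paths;
    return @found                   if $opts->{all};     # every occurrence, possibly none
    return @found ? $found[0] : ()  if $opts->{strict};  # first occurrence or empty list
    return @found ? $found[0] : $filename;               # first occurrence or the input
}

# Two temporary resource directories, each holding a file named schema.xml.
my @paths = (tempdir(CLEANUP => 1), tempdir(CLEANUP => 1));
for my $dir (@paths) {
    open my $fh, '>', File::Spec->catfile($dir, 'schema.xml') or die $!;
    close $fh;
}

my @all     = find_in_paths(\@paths, 'schema.xml',  { all    => 1 });
my @strict  = find_in_paths(\@paths, 'missing.xml', { strict => 1 });
my $default = find_in_paths(\@paths, 'missing.xml');

print scalar(@all),    "\n";   # 2
print scalar(@strict), "\n";   # 0
print "$default\n";            # missing.xml
```

The default mode is convenient when the result is passed straight on to code that opens the file, while 'strict' lets the caller distinguish "not found" from a found path.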
Alias for FindInResourcePaths($filename).
$dirname
- a relative path to a directory
If a given directory name is a relative path of a sub-directory located in one of the resource directories, return an absolute path for that sub-directory. Otherwise return $dirname.
Alias for FindDirInResourcePaths($dirname).
$ref_filename
- a reference filename
$filename
- a relative path to a file
$search_resource_paths
- 0 or 1
If a given filename is a relative path, try to find the file in the same directory as $ref_filename. In case of success, return a path based on the directory part of $ref_filename and $filename. If the file can't be located in this way and the $search_resource_paths argument is true, return the value of FindInResourcePaths($filename).
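The resolution order can be sketched as follows. The function resolve_relative is a hypothetical name, and the resource path lookup is replaced by a caller-supplied callback so that the example stays self-contained. The text above does not say what is returned when the file is not found and $search_resource_paths is false; returning the filename unchanged in that case is an assumption of this sketch.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical sketch of the resolution order described above. $fallback is a
# code reference standing in for the resource path lookup (or undef for none).
sub resolve_relative {
    my ($ref_filename, $filename, $fallback) = @_;
    return $filename if File::Spec->file_name_is_absolute($filename);
    # First try the directory that contains the reference file.
    my $candidate = File::Spec->catfile(dirname($ref_filename), $filename);
    return $candidate if -f $candidate;
    # Assumption: without a fallback, the filename is returned unchanged.
    return $fallback ? $fallback->($filename) : $filename;
}

# A temporary directory containing schema.xml next to an (imaginary) document.
my $dir    = tempdir(CLEANUP => 1);
my $ref    = File::Spec->catfile($dir, 'document.pml');
my $schema = File::Spec->catfile($dir, 'schema.xml');
open my $fh, '>', $schema or die $!;
close $fh;

my $sibling      = resolve_relative($ref, 'schema.xml');
my $via_fallback = resolve_relative($ref, 'other.xml', sub { "/resources/$_[0]" });
my $unresolved   = resolve_relative($ref, 'other.xml');
```

Here $sibling is the path to schema.xml inside the temporary directory, while $via_fallback is whatever the callback produced for a file that does not sit next to the reference file.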
For backward compatibility reasons only, Treex::PML exports by default the following function symbol:
ImportBackends
For this reason, it is recommended to load Treex::PML as:
use Treex::PML ();
The following function symbols can be imported on demand:
ImportBackends, CloneValue, ResourcePaths, FindInResources, FindDirInResources, FindDirInResourcePaths, ResolvePath, AddResourcePath, AddResourcePathAsFirst, SetResourcePaths, RemoveResourcePath
Tree editor TrEd: http://ufal.mff.cuni.cz/~pajas/tred
Prague Markup Language (PML) format: http://ufal.mff.cuni.cz/jazz/PML/
Description of FS format: http://ufal.mff.cuni.cz/pdt/Corpora/PDT_1.0/Doc/fs.html
Related packages: Treex::PML::Schema, Treex::PML::Instance, Treex::PML::Document, Treex::PML::Node, Treex::PML::Factory
Copyright (C) 2006-2010 by Petr Pajas
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.2 or, at your option, any later version of Perl 5 you may have available.