NAME

TectoMT::Document


DESCRIPTION

A document consists of a sequence of bundles, mirroring a sequence of natural language sentences (typically, but not necessarily, originating from the same text). Attributes (attribute-value pairs) can be attached to a document as a whole.


METHODS

Constructor

my $new_document = TectoMT::Document->new();

Creates a new empty document object.

my $new_document = TectoMT::Document->new( { 'fsfile' => $fsfile } );

Creates a TectoMT document corresponding to the specified Fsfile object.

my $new_document = TectoMT::Document->new( { 'filename' => $filename } );

Loads the tmt file and creates a TectoMT document corresponding to its content.

Access to the underlying Fslib representation

$document->tie_with_fsfile($fsfile);

Associates the given document with an FSFile object, which will be used as its underlying representation. For each FSFile sentence, a new TectoMT::Bundle object is created, and for each tree in the FSFile sentence representation, a new tree of TectoMT::Node objects is built. The two representations are linked in both directions.

$document->untie_from_fsfile();

Deletes the mutual references between the tied fsfile and its TectoMT mirror.

$document->get_fsfile_name();

Returns the name of the file with the underlying Fslib representation.

my $fsfile = $document->get_tied_fsfile();

Returns the associated FSFile object used as the document's underlying representation. A fatal error is raised if no such object is associated.

Accessing directly the PML files

open, save, save_as: not implemented yet.
my $filename = $fsfile->get_fsfile_name();

Access to attributes

my $value = $document->get_attr($name);

Returns the value of the document attribute of the given name.

$document->set_attr($name,$value);

Sets the given document attribute to the given value. If the attribute name is 'id', then the document's indexing table is updated.

Access to generic attributes and trees

Besides document attributes with names statically predefined in the TectoMT PML schema (such as 'czech_source_text'), one can use generic attributes, which are parametrized by language (using ISO 639 codes) and direction (S for source, T for target). An attribute name then looks like, e.g., 'Sar text' (source-side Arabic text).

my $value = $document->get_generic_attr($name);
$document->set_generic_attr($name,$value);
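
The naming scheme for generic attributes can be illustrated with a small stand-alone sketch. It assumes the pattern suggested by the 'Sar text' example (direction letter, language code, a space, then the base name); the helper generic_attr_name is hypothetical and is not part of TectoMT.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TectoMT): builds a generic attribute name
# from a direction (S or T), a language code, and a base name.
# Assumption: names follow the pattern of the 'Sar text' example.
sub generic_attr_name {
    my ($direction, $lang_code, $base_name) = @_;
    die "direction must be 'S' or 'T'\n"
        unless $direction =~ /^[ST]$/;
    die "language code must be two or three lowercase letters\n"
        unless $lang_code =~ /^[a-z]{2,3}$/;
    return $direction . $lang_code . ' ' . $base_name;
}

my $attr_name = generic_attr_name('S', 'ar', 'text');
print "$attr_name\n";    # prints "Sar text"
```

A name built this way could then be passed to get_generic_attr or set_generic_attr in place of a hand-written string.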

Access to the contained bundles

my @bundles = $document->get_bundles();

Returns the array of bundles contained in the document.

my $new_bundle = $document->new_bundle();

Creates a new empty bundle and appends it at the end of the document.

my $new_bundle = $document->new_bundle_before($existing_bundle);

Creates a new empty bundle and inserts it in front of the existing bundle.

my $new_bundle = $document->new_bundle_after($existing_bundle);

Creates a new empty bundle and inserts it after the existing bundle.
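
The ordering effect of the three bundle-creating methods can be sketched with a plain Perl array. This is only a model of the resulting sequence, not the TectoMT implementation: strings stand in for bundle objects, and position_of, insert_before and insert_after are hypothetical helpers.

```perl
use strict;
use warnings;

# Stand-in for a document's bundle sequence; strings play the role of bundles.
my @bundle_seq = ('b1', 'b2');

# Returns the position of an existing item, or dies if it is not present.
sub position_of {
    my ($seq, $existing) = @_;
    for my $i (0 .. $#{$seq}) {
        return $i if $seq->[$i] eq $existing;
    }
    die "item '$existing' is not in the sequence\n";
}

# Inserts $new directly in front of $existing.
sub insert_before {
    my ($seq, $existing, $new) = @_;
    splice @{$seq}, position_of($seq, $existing), 0, $new;
    return $new;
}

# Inserts $new directly behind $existing.
sub insert_after {
    my ($seq, $existing, $new) = @_;
    splice @{$seq}, position_of($seq, $existing) + 1, 0, $new;
    return $new;
}

push @bundle_seq, 'b3';                      # analogous to new_bundle()
insert_before(\@bundle_seq, 'b2', 'b1a');    # analogous to new_bundle_before()
insert_after(\@bundle_seq, 'b2', 'b2a');     # analogous to new_bundle_after()

print join(' ', @bundle_seq), "\n";    # prints "b1 b1a b2 b2a b3"
```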

Node indexing

$document->index_node_by_id($id,$node);

Adds the node to the id2node hash table. As mentioned above, this is done automatically in $node->set_attr() if the attribute name is 'id'. If undef is passed as the second argument, the entry for the given id is deleted from the hash.

my $node = $document->get_node_by_id($id);

Returns the node whose 'id' attribute has the value $id, regardless of which tree and which bundle of the document the node belongs to.

TectoMT prohibits IDs that point outside the current document. In the rare cases where your data contains such links, we recommend splitting the documents differently or working around the problem by dropping the problematic links.

$document->id_is_indexed($id);

Returns true if the given id is already present in the indexing table.

$document->get_all_node_ids();

Returns the array of all node identifiers indexed in the document.
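
The indexing behaviour described in this section can be modelled with an ordinary hash. The following is a minimal stand-alone sketch, not TectoMT code: the function names are hypothetical, hash references stand in for node objects, and returning undef for an unknown id is an assumption of the sketch rather than documented behaviour.

```perl
use strict;
use warnings;

# Stand-in index (not TectoMT code): maps node ids to node objects.
my %id2node;

# Registers $node under $id; an undefined $node removes the entry instead.
sub index_by_id {
    my ($id, $node) = @_;
    if (defined $node) {
        $id2node{$id} = $node;
    }
    else {
        delete $id2node{$id};
    }
    return;
}

# Looks up a node by id (assumption: undef for an unknown id).
sub node_by_id {
    my ($id) = @_;
    return $id2node{$id};
}

# True if the id currently has an entry in the index.
sub is_indexed {
    my ($id) = @_;
    return exists $id2node{$id} ? 1 : 0;
}

# All indexed ids, sorted here only to make the output stable.
sub all_ids {
    my @ids = sort keys %id2node;
    return @ids;
}

index_by_id('n1', { form => 'hello' });
index_by_id('n2', { form => 'world' });
index_by_id('n1', undef);    # removes the entry for 'n1'

print join(',', all_ids()), "\n";    # prints "n2"
```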


COPYRIGHT

Copyright 2006 Zdenek Zabokrtsky. This file is distributed under the GNU General Public License v2. See $TMT_ROOT/README