UFALMorfo analysis interface for Perl


NAME

Morfo::Analysis::Run - Interface to do morphological analysis with UFALMorfo tools.


SYNOPSIS

use MorfoWrap;


DESCRIPTION

This module provides an interface to UFALMorfo morphological analysis. It uses the C library directly.


METHODS

new

Create a new analysis object. The analysis object forms all necessary context for analysis. It holds intermediate results and internal analysis configuration.

  my $a = Morfo::Analysis::Run::new($ma);

ma

The analyzer to use. The analyzer is a data structure used for analysis. Since it is only read during analysis, it may be shared among parallel uses.

run - execute the analysis

Run the analysis and leave the results untouched for further usage.

 $a->run("nesu");

The results are stored in the ms structure. They may be manipulated and/or brought into Perl with several functions described below.

It may be considered to be a middle-level interface.

sortL - sort lemmas

The function sortL sorts the internally stored results by the lemma string.

 $a->sortL();

It may be considered to be a middle-level interface.

sortT - sort tags

The function sortT sorts the tag lists stored in the ms structure with each lemma result.

 $a->sortT();

It may be considered to be a middle-level interface.

sort

The function sort sorts the results both in lemmas and tags. It is equivalent to calling both sortL and sortT.

 $a->sort();

It may be considered to be a middle-level interface.

ll - get Lemma List

The function ll returns a reference to a newly created array containing a list of lemmas retrieved from the internal storage of the ms structure.

 my $lemma_list = $a->ll();

It may be considered to be a middle-level interface.

sf - get Simple form of Full results

The function sf returns a reference to a newly created array containing the full description of the analysis results, retrieved from the internal storage of the ms structure.

 my $simple_full = $a->sf();

The simple form means that lemma attributes and tags are returned in their atom/number representation. A few of the functions described below may be used to inspect them.

 my $sf = $a->sf();

Returned structure

Each lemma+attributes pair has a hash reference entry in the array. The hash has the following keys:

lemma

for the lemma string

tags

for the list of tags. It is a reference to an array of hashes. Each hash has the following keys:

atom

the tag atom number

src

a number identifying the source of the analysis

attr

a number identifying the attributes tied to the lemma. See attr2hash to expand this atom into a hash structure.

It may be considered to be a middle-level interface.
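
The layout described above can be walked with ordinary Perl dereferencing. The following is a minimal sketch that uses a hand-built structure in place of a real sf result, so it runs without the module. The lemma strings and all numbers are made-up placeholders, the helper name describe_sf is hypothetical, and attr is assumed to sit at the lemma-entry level next to lemma and tags.

```perl
use strict;
use warnings;

# Stand-in for the array reference returned by sf(); every value below is
# a made-up placeholder, not real analyzer output.
my $sf = [
    {
        lemma => 'lemma-one',
        attr  => 7,
        tags  => [
            { atom => 101, src => 0 },
            { atom => 102, src => 0 },
        ],
    },
    {
        lemma => 'lemma-two',
        attr  => 0,
        tags  => [ { atom => 205, src => 3 } ],
    },
];

# Hypothetical helper: flatten the structure into one line per tag.
sub describe_sf {
    my ($entries) = @_;
    my @lines;
    for my $entry (@$entries) {
        for my $tag (@{ $entry->{tags} }) {
            push @lines, sprintf('%s attr=%d atom=%d src=%d',
                $entry->{lemma}, $entry->{attr},
                $tag->{atom},    $tag->{src});
        }
    }
    return \@lines;
}

my $lines = describe_sf($sf);
print "$_\n" for @$lines;
```

With a real analysis object, the atom and attr numbers would then be passed to tag2pos and attr2hash to obtain readable values.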

lemmatize

Run the analysis and get the lemma list from it. An array reference is returned.

It is equivalent to running the analysis, sorting the results, retrieving them into Perl with ll and resetting the internal state.

This function is considered to be a high-level interface.

simple_full

Run the analysis and get the full representation of its results in a simple form. The simple form means that lemma attributes and tags are returned in their atom/number representation. An array reference is returned.

 my $r = $a->simple_full("nesu");

It is equivalent to running the analysis, sorting the results, retrieving them into Perl with sf and resetting the internal state.

This function is considered to be a high-level interface. See the sf function for a description of the resulting structure.

attr2hash

Attributes are internally identified by a number so that they can be handled efficiently. The method attr2hash creates a hash containing the full structure and data of the attributes.

 my $h = $a->attr2hash($attr_atom);

The created hash contains only the keys for attributes that are actually present. The possible keys are

syn

for syntactic flags,

sem

for semantic flags,

sty

for style flags,

der

for derivation information and

com

for comments.

Every key has an array as its value. The array contains a list of strings, the values of the attributes.
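
Because only the keys that are present appear in the hash, code reading it should test for each key before use. A minimal sketch, using a hand-built hash in place of a real attr2hash result: the values are made-up placeholders, and each value is assumed to be an array reference.

```perl
use strict;
use warnings;

# Stand-in for a hash returned by attr2hash(); the keys follow the list
# above, the values are made-up placeholders.
my $h = {
    sty => [ 'placeholder-style' ],
    com => [ 'first comment', 'second comment' ],
};

# Visit the documented keys in a fixed order and skip the absent ones.
my @report;
for my $key (qw(syn sem sty der com)) {
    next unless exists $h->{$key};
    push @report, "$key: " . join(', ', @{ $h->{$key} });
}
print "$_\n" for @report;
```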

tag2pos

Tags are internally identified by a number so that they can be handled efficiently. The method tag2pos creates a string containing the positional representation of the tag.

 my $t = $a->tag2pos($tag_atom);


CONSTANTS

The following constant values are used.

@TagSrc

The array TagSrc holds the names of tag sources. It is intended to map a tag source number returned from the analysis to a short, readable name.
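
The lookup itself is a plain array index. The sketch below uses a stand-in array with placeholder names, since the actual contents of @TagSrc and the way it is exported are not shown in this document. The helper tag_src_name is hypothetical and adds a fallback for numbers outside the table.

```perl
use strict;
use warnings;

# Stand-in for @TagSrc; the real names are not shown in this document.
my @tag_src_names = ('source-0', 'source-1', 'source-2');

# Hypothetical helper: return a readable name, or a fallback string for
# values that are not valid indexes into the table.
sub tag_src_name {
    my ($src) = @_;
    return "unknown($src)"
        if $src !~ /^\d+$/ || $src > $#tag_src_names;
    return $tag_src_names[$src];
}

print tag_src_name(1), "\n";
print tag_src_name(9), "\n";
```

In real use, the src value of each tag hash returned by sf or simple_full would be the argument.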

Result Tag Source

$rtsDict

A number used to mark tags that describe an analysis of a form recognized in the dictionary.

$rtsForceNG

A number used to mark tags that describe an analysis of a form recognized in the dictionary even though the dictionary does not indicate the possibility of negation and/or grading.

$rtsPrefix

A number used to mark tags that describe an analysis of a form recognized in the dictionary after a prefix was removed. Tags generated by the prefix module for which negation and/or grading is not allowed are also marked with this number.

$rtsFallback

A number used to mark tags indicating that the form was not recognized by the dictionary.

$rtsNumber

A number used to mark tags indicating that the form was recognized as a number.