Prospector

  • Automatic search in experiment configurations.
  • Several available search algorithms.
  • Uses eman to create/clone/manage individual experiments.
  • Independent of the machine translation playground.
  • Already usable but still under development.

Configuration

In order to run Prospector, the user has to create a configuration directory containing the following files:

  • rules
  • vars
  • traceback
  • score

The file score is an executable that produces a number corresponding to the score or "fitness" of a given final eman step.

Apart from these files, the directory also has to contain a subdirectory chunks with parts of eman tracebacks.

Traceback

This file defines how the individual chunks should be combined:

eval
  mert
    tm
      align
    lm

The lines correspond to names of files in the directory chunks and indentation defines which steps depend on each other. Multiple eman steps can be defined in one file. This file exists to add flexibility to configuration. In the future, we would like to extend it to allow alternative/conditional definitions.
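The indentation convention can be illustrated with a minimal Python sketch that reads the example above into a dependency map. This is not Prospector's own parser: the function name parse_traceback is hypothetical, and the sketch assumes indentation uses spaces, that each chunk name appears only once, and that a step depends on the lines indented directly beneath it.

```python
def parse_traceback(text):
    """Map each chunk name to the chunk names indented directly beneath it."""
    deps = {}
    stack = []  # (indent, name) pairs on the path from the top-level step
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip(" "))
        name = line.strip()
        deps.setdefault(name, [])
        # Drop entries that are not ancestors of the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            deps[stack[-1][1]].append(name)
        stack.append((indent, name))
    return deps


example = """\
eval
  mert
    tm
      align
    lm
"""
tree = parse_traceback(example)
```

For the example traceback, tree records that eval sits above mert, mert above tm and lm, and tm above align.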

The chunks can contain variable slots, each delimited by '#' characters:

+- s.align.e357fb70.20120221-1115
|  | ALILABEL=en-#SRCFACTOR#-cs-#TGTFACTOR#
|  | ALISYMS=gdfa
|  | CORPUS=czeng-news
|  | GIZASTEP=s.mosesgiza.fcfbe812.20120221-1114
|  | SRCALIAUG=en+#SRCFACTOR#
|  | TGTALIAUG=cs+#TGTFACTOR#
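How a slot might be filled can be sketched in a few lines of Python. This only illustrates the substitution idea and is not Prospector's implementation: the helper fill_slots is hypothetical, slot names are assumed to consist of letters, digits, and underscores, and the values lemma and tag are made up for the example.

```python
import re

# Assumption: a slot is a name made of letters, digits, or underscores
# enclosed in '#' characters, as in #SRCFACTOR#.
SLOT = re.compile(r"#([A-Za-z0-9_]+)#")


def fill_slots(chunk_text, values):
    """Replace every #NAME# slot in chunk_text with values[NAME]."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for slot {name}")
        return values[name]
    return SLOT.sub(replace, chunk_text)


line = "ALILABEL=en-#SRCFACTOR#-cs-#TGTFACTOR#"
# Hypothetical values, chosen only for illustration.
filled = fill_slots(line, {"SRCFACTOR": "lemma", "TGTFACTOR": "tag"})
```

With these values, filled is the string ALILABEL=en-lemma-cs-tag. A slot without a value raises an error in this sketch rather than being left in the text.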

Variables

Each line of this file defines a variable and its values. The first column is the variable name, which must match a slot name in the traceback chunks. The second column, separated from the first by any number of spaces, lists the possible values for that variable, separated by commas.
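A minimal Python sketch of reading such a file and enumerating every combination of values (the space that an exhaustive search would cover) might look as follows. The function names and the example values form, lemma, and tag are assumptions made for illustration; only the two-column layout comes from the description above.

```python
import itertools


def parse_vars(text):
    """Return {variable name: list of possible values} from vars-file text."""
    variables = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # First column: name; rest of the line: comma-separated values.
        name, values = line.split(None, 1)
        variables[name] = [v.strip() for v in values.split(",")]
    return variables


def all_configurations(variables):
    """Yield one dict per combination of variable values."""
    names = list(variables)
    for combo in itertools.product(*(variables[n] for n in names)):
        yield dict(zip(names, combo))


# Hypothetical vars file for the slots used in the chunk example.
example_vars = """\
SRCFACTOR   form,lemma
TGTFACTOR   form,lemma,tag
"""
variables = parse_vars(example_vars)
configs = list(all_configurations(variables))
```

Here configs holds six configurations, one for each pairing of the two SRCFACTOR values with the three TGTFACTOR values.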

Rules

This file places restrictions on possible combinations of variable values. The user can optionally define these to avoid evaluating nonsensical configurations or to direct the search. It contains lines with 4 space-delimited columns, for example:

TMSRCAUG   /\+/       STEPS   t0a1-0
TMSRCAUG   /^[^+]*$/  STEPS   t0-0

Each line can be viewed as an if-statement: if the value of the variable named in column 1 matches the regular expression in column 2, then the variable named in column 3 must equal the value in column 4. The second column is evaluated as a Perl regular expression. The fourth column must contain an exact value.
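The if-statement reading can be made concrete with a short Python sketch that checks one configuration against the two example rules. It is an approximation under stated assumptions: Python's re module stands in for Perl regular expressions (the two example patterns behave the same in both), the helper names are hypothetical, a rule whose first variable is absent from the configuration is skipped, and the values en and en+lemma are invented for the example.

```python
import re


def parse_rules(text):
    """Parse rule lines into (variable, regex, constrained variable, value)."""
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        cond_var, pattern, target_var, required = line.split()
        # Remove the surrounding slashes of the /regex/ notation.
        if pattern.startswith("/") and pattern.endswith("/"):
            pattern = pattern[1:-1]
        rules.append((cond_var, re.compile(pattern), target_var, required))
    return rules


def satisfies(config, rules):
    """Return True if the configuration violates none of the rules."""
    for cond_var, regex, target_var, required in rules:
        value = config.get(cond_var)
        if value is None:
            continue  # assumption: rules on unset variables do not apply
        if regex.search(value) and config.get(target_var) != required:
            return False
    return True


RULES = parse_rules(r"""
TMSRCAUG   /\+/       STEPS   t0a1-0
TMSRCAUG   /^[^+]*$/  STEPS   t0-0
""")

ok = satisfies({"TMSRCAUG": "en+lemma", "STEPS": "t0a1-0"}, RULES)
bad = satisfies({"TMSRCAUG": "en+lemma", "STEPS": "t0-0"}, RULES)
```

In this sketch ok is True and bad is False: a TMSRCAUG value containing '+' requires STEPS to be t0a1-0, while a value without '+' requires t0-0.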

Prediction

The user can optionally also include a file predict. It has to be an executable that, given the path to the final step, outputs the complexity or cost estimate for such an experiment. If the user also specifies a limit (see max-allowed-prediction below), Prospector queries this program before running each experiment and discards experiments whose estimated cost exceeds the limit.
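The filtering step could be sketched in Python roughly as follows. The calling convention is an assumption, since the text does not specify it: here the step path is passed as the only command-line argument and the estimate is read as a single number from standard output. All function names are hypothetical.

```python
import subprocess


def parse_cost(output):
    """Interpret the predict program's output as a single number."""
    return float(output.strip())


def predicted_cost(predict_path, final_step_path):
    """Run the predict executable on a final step and return its estimate.

    Assumption: the step path is the sole argument and the estimate is
    printed on standard output.
    """
    result = subprocess.run(
        [predict_path, final_step_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_cost(result.stdout)


def within_limit(cost, limit):
    """Keep an experiment unless its cost is over the limit (if one is set)."""
    return limit is None or cost <= limit
```

An experiment whose cost equals the limit is kept in this sketch, because the text only says that experiments over the threshold are discarded.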

Running Prospector

Usage:

prospector [options] config-directory

Command-Line Arguments

max-running
    Maximum number of experiments running in parallel.
search
    Search type. Possible values:
      • genetic
      • exhaustive
      • line
      • random
genetic-population-size
    Size of one generation in genetic search.
genetic-nbest
    The number of best configurations considered as parents for the next generation.
genetic-mutation-prob
    Probability of mutation in genetic search.
random-limit
    The total number of experiments created in random search.
max-allowed-prediction
    Threshold for the cost estimate produced by predict; experiments whose estimate exceeds it are discarded.
verbose
    Be verbose.
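To show how the three genetic-search options could interact, here is a textbook-style Python sketch of producing one new generation. It is not taken from Prospector's source: the function next_generation, uniform crossover, per-variable mutation, and the convention that a higher score is better are all assumptions made for illustration.

```python
import random


def next_generation(scored, variables, population_size, nbest,
                    mutation_prob, rng):
    """Build a new generation from scored configurations.

    scored: list of (configuration dict, score) pairs.
    variables: {variable name: list of possible values}.
    Assumption: a higher score means a better configuration.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    parents = [config for config, _ in ranked[:nbest]]
    children = []
    while len(children) < population_size:
        mother, father = rng.choice(parents), rng.choice(parents)
        child = {}
        for name, values in variables.items():
            # Uniform crossover: take each value from either parent.
            child[name] = rng.choice((mother[name], father[name]))
            # Mutation: occasionally pick a fresh value for this variable.
            if rng.random() < mutation_prob:
                child[name] = rng.choice(values)
        children.append(child)
    return children


# Hypothetical variables and scores, for illustration only.
variables = {"A": ["1", "2", "3"], "B": ["x", "y"]}
scored = [
    ({"A": "1", "B": "x"}, 0.1),
    ({"A": "2", "B": "y"}, 0.9),
    ({"A": "3", "B": "x"}, 0.5),
]
children = next_generation(scored, variables, population_size=4, nbest=2,
                           mutation_prob=0.0, rng=random.Random(0))
```

With nbest set to 2 and no mutation, every child combines values from the two best-scoring configurations only. The sketch does not remove duplicate children or check them against the rules file.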