MasKIT User's Manual

This section describes the command line tool. The REST API is described on the API Reference page.

1. Running MasKIT

The simplest way to run MasKIT is to pass plain text on standard input and read the result, in txt format, from standard output.

./maskit.pl --stdin

The input is assumed to be UTF-8 encoded. It can be either plain text or, with the --input-format presegmented switch, pre-segmented text, i.e. one sentence per line.
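
For scripted use, the stdin invocation above can be wrapped in a few lines of Python. This is only a minimal sketch: the function name run_maskit is hypothetical, the script is assumed to sit in the current directory as ./maskit.pl, and the output is assumed to be UTF-8 encoded like the input (the manual states the encoding only for the input).

```python
import subprocess


def run_maskit(text, command=("./maskit.pl", "--stdin")):
    """Send `text` to MasKIT on standard input and return its standard output.

    `command` defaults to the invocation shown in this manual; the path to
    maskit.pl is an assumption and may differ in your installation.
    """
    completed = subprocess.run(
        list(command),
        input=text.encode("utf-8"),   # the manual says input is assumed to be UTF-8
        stdout=subprocess.PIPE,
        check=True,                   # raise CalledProcessError on a non-zero exit code
    )
    # Assumption: the output uses the same encoding as the input.
    return completed.stdout.decode("utf-8")
```

Calling run_maskit("some text") then returns whatever MasKIT prints in the default txt format.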

The following command runs MasKIT on a presegmented (one sentence per line) plain text file and writes the result in the CoNLL-U format to standard output (see Section 1.2 for the current status of this format).

./maskit.pl --input-file [input_file_name] --input-format presegmented --output-format conllu

The result in the selected output format always goes to standard output. Additionally, for logging purposes, the result can be stored in a file. For example, the following command sends the result in HTML to standard output and also stores it in the CoNLL-U format in a file.

./maskit.pl --input-file [input_file_name] --output-format html --store-format conllu
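
When such commands are assembled by a script, it can help to check the format names before MasKIT is started. The sketch below builds the argument list for the file-based invocations shown above. The function name build_command and the ./maskit.pl path are hypothetical; the option names and allowed values are taken from this manual.

```python
INPUT_FORMATS = ("txt", "presegmented")
OUTPUT_FORMATS = ("txt", "html", "conllu")


def build_command(input_file, input_format="txt", output_format="txt",
                  store_format=None, executable="./maskit.pl"):
    """Return the argument list for one MasKIT run on a file."""
    if input_format not in INPUT_FORMATS:
        raise ValueError(f"unsupported input format: {input_format}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    command = [
        executable,
        "--input-file", input_file,
        "--input-format", input_format,
        "--output-format", output_format,
    ]
    if store_format is not None:
        # --store-format accepts the same values as --output-format
        if store_format not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported store format: {store_format}")
        command += ["--store-format", store_format]
    return command
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with file names that contain spaces.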

The full command syntax for running MasKIT:

Usage: maskit.pl [options]
options:  -i|--input-file [input text file name]
         -si|--stdin (input text provided via stdin)
         -if|--input-format [input format: txt (default) or presegmented]
         -rf|--replacements-file [replacements file name]
          -r|--randomize (if used, the replacements are selected in random order)
          -c|--classes (if used, classes are used as replacements)
         -of|--output-format [output format: txt (default), html, conllu]
          -d|--diff (display the original expressions next to the anonymized versions)
         -ne|--named-entities [scope: 1 - add NameTag marks to the anonymized versions, 2 - to all recognized tokens]
         -os|--output-statistics (add MasKIT statistics to output; if present, output is JSON with two items: data (in output-format) and stats (in HTML))
         -sf|--store-format [format: log the output in the given format: txt, html, conllu]
         -ss|--store-statistics (log statistics to an HTML file)
          -v|--version (prints the version of the program and ends)
          -n|--info (prints the program version and supported features as JSON and ends)
          -h|--help (prints a short help and ends)
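
With --output-statistics, the output is described above as JSON with two items, data and stats. A minimal sketch of separating the two in Python follows. It assumes that both items are top-level keys of a single JSON object, which the manual does not spell out, and the function name split_output is hypothetical.

```python
import json


def split_output(raw_json):
    """Split MasKIT --output-statistics output into (data, stats).

    Assumption: the output is one JSON object with the top-level keys
    "data" (the result in the selected output format) and "stats"
    (the statistics as HTML).
    """
    payload = json.loads(raw_json)
    return payload["data"], payload["stats"]
```

A KeyError from this function means that the assumed layout does not match the actual output.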

1.1. Input Formats

The input format can be specified using the --input-format option. Currently supported input formats are:

  • txt (default): the input is plain text
  • presegmented: the input is presegmented plain text, i.e. each sentence is on a single line; empty lines mark paragraph breaks
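
The presegmented layout described above (one sentence per line, an empty line between paragraphs) is easy to produce and read back in a script. The following sketch shows one way to do it; the function names are hypothetical, and sentence splitting itself is assumed to have been done elsewhere.

```python
def to_presegmented(paragraphs):
    """Turn a list of paragraphs (each a list of sentences) into presegmented text."""
    blocks = []
    for sentences in paragraphs:
        # Collapse internal whitespace so that every sentence stays on one line.
        lines = [" ".join(sentence.split()) for sentence in sentences]
        lines = [line for line in lines if line]
        if lines:
            blocks.append("\n".join(lines))
    # An empty line separates paragraphs.
    return "\n\n".join(blocks) + "\n" if blocks else ""


def from_presegmented(text):
    """Read presegmented text back into a list of paragraphs."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # An empty line closes the current paragraph.
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs
```

The text returned by to_presegmented can be written to a UTF-8 file and passed to MasKIT with --input-format presegmented.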

1.2. Output Formats

The output format is specified using the --output-format option. Currently supported output formats are:

  • txt (default): the output is plain text; the original texts (if present in the output thanks to the --diff option) are displayed next to the replacements (separated from the replacement by an underscore and enclosed in square brackets).
  • html: the output is HTML; the replacements are colour-marked, and the original texts (if present in the output thanks to the --diff option) are in subscript, enclosed in square brackets and struck through.
  • conllu: the CoNLL-U format - not yet implemented.

2. Running the MasKIT REST Server

TODO