PONK User's Manual

This section describes the command-line tool; the REST API is described on the API Reference page.

1. Running PONK

The simplest way to run PONK is to provide plain text on the standard input and get the result in HTML format on the standard output:

./ponk.pl --stdin
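Any standard shell redirection or pipe can feed the standard input. Below is a minimal sketch, assuming ponk.pl is in the current directory. The function ponk_text is a hypothetical helper (not part of PONK), and the file names in the comments are placeholders:

```shell
# Hypothetical helper (not part of PONK): run PONK on the text given as the
# first argument; any remaining arguments are passed on to ponk.pl unchanged.
ponk_text() {
    ponk_input=$1
    shift
    # printf '%s\n' passes the (UTF-8) text through byte for byte
    printf '%s\n' "$ponk_input" | ./ponk.pl --stdin "$@"
}

# Usage:
#   ponk_text 'Text to be checked.' > result.html
#   ponk_text 'Text to be checked.' --output-format conllu > result.conllu
# An existing UTF-8 text file can be redirected to the standard input directly:
#   ./ponk.pl --stdin < input.txt > result.html
```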

The input is assumed to be UTF-8 encoded and can be either plain text or (with the --input-format md option) MarkDown text.

The following command runs PONK on a MarkDown text file and writes the result in the CoNLL-U format to the standard output:

./ponk.pl --input-file [input_file_name] --input-format md --output-format conllu

The input file can also be an MS Word DOCX file, as in the following example; in that case, it is internally converted to MarkDown (using a locally installed Pandoc) and processed as such. The example also sets the output format to HTML and the logging level to 0, i.e. full logging.

./ponk.pl --input-file ../data/pokus.docx --input-format docx --output-format html --logging-level 0
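To see roughly what PONK works with after the conversion, the DOCX file can be converted by hand with Pandoc and the result processed as MarkDown input. This is only a sketch: the Pandoc options PONK uses internally are not documented here and may differ, so this two-step route is not guaranteed to give identical results. The function docx_to_md is a hypothetical helper:

```shell
# Hypothetical helper (not part of PONK): convert a DOCX file to MarkDown with
# Pandoc and print the name of the MarkDown file that was written.
docx_to_md() {
    command -v pandoc >/dev/null 2>&1 || { echo "pandoc not found" >&2; return 1; }
    md_file="${1%.docx}.md"
    pandoc --from docx --to markdown --output "$md_file" "$1" || return 1
    echo "$md_file"
}

# Usage: convert, then process the MarkDown file with PONK
#   md=$(docx_to_md ../data/pokus.docx) &&
#     ./ponk.pl --input-file "$md" --input-format md --output-format html
```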

The result in the selected output format always goes to the standard output. Additionally, for logging purposes, the result can be stored in a file; e.g. the following command sends the result in HTML to the standard output and also stores the result in the CoNLL-U format in a file:

./ponk.pl --input-file [input_file_name] --output-format html --store-format conllu

The full command syntax for running PONK:

Usage: ponk.pl [options]
options:  -i|--input-file [input text file name]
         -si|--stdin (input text provided via stdin)
         -if|--input-format [input format: txt (default), md, docx]
         -of|--output-format [output format: html (default), txt, md, conllu]
         -os|--output-statistics (add PONK statistics to output; if present, output is JSON with two items: data (in output-format) and stats (in HTML))
         -sf|--store-format [format: log the output in the given format: txt, html, conllu]
         -ss|--store-statistics (log statistics to an HTML file)
         -ll|--logging-level (override the default (anonymous) logging level (0=full, 1=limited, 2=anonymous))
          -v|--version (prints the version of the program and ends)
          -n|--info (prints the program version and supported features as JSON and ends)
          -h|--help (prints a short help and ends)

1.1. Input Formats

The input format can be specified using the --input-format option. Currently supported input formats are:

  • txt (default): the input is plain text
  • md: the input is MarkDown text
  • docx: the input is an MS Word DOCX file
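When PONK is called from a script, the --input-format value can be derived from the file name. A minimal sketch; ponk_input_format is a hypothetical helper, and its extension-to-format mapping is an assumption, not something PONK does itself:

```shell
# Hypothetical helper (not part of PONK): pick the --input-format value from the
# file-name extension. Matching is case-sensitive; anything that is not .md or
# .docx falls back to txt, the PONK default.
ponk_input_format() {
    case $1 in
        *.md) echo md ;;
        *.docx) echo docx ;;
        *) echo txt ;;
    esac
}

# Usage:
#   f=../data/pokus.docx
#   ./ponk.pl --input-file "$f" --input-format "$(ponk_input_format "$f")"
```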

1.2. Output Formats

The output format is specified using the --output-format option. Currently supported output formats are:

  • html: the output in HTML
  • txt: the output in plain text
  • md: the output in MarkDown (not yet implemented)
  • conllu: the output in the CoNLL-U format (not available via the API)
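To compare the output formats on one document, the command-line tool can be run once per implemented format (md is skipped because it is not implemented yet; conllu is included because the restriction applies only to the API). A sketch under stated assumptions: ponk_all_formats is a hypothetical helper, and the output-file naming scheme is an arbitrary choice:

```shell
# Hypothetical helper (not part of PONK): write the result for one input file
# in each implemented output format. Extra arguments (e.g. --input-format docx)
# are passed on to ponk.pl unchanged.
ponk_all_formats() {
    ponk_in=$1
    shift
    for fmt in html txt conllu; do
        # Append the format to the full input name (e.g. doc.txt.html) so that
        # the txt output can never overwrite a .txt input file.
        ./ponk.pl --input-file "$ponk_in" "$@" --output-format "$fmt" > "$ponk_in.$fmt" || return 1
    done
}

# Usage:
#   ponk_all_formats ../data/pokus.docx --input-format docx
#   writes pokus.docx.html, pokus.docx.txt and pokus.docx.conllu next to the input
```

Appending the format to the complete input name, rather than replacing the extension, avoids truncating a .txt input before PONK reads it.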