ntred documentation

ntred

ntred - controller/hub/client interface to a cluster of btred servers

SYNOPSIS

To query the servers:

  ntred [-m|-I macro-file] [-e code] [--hub hub] [--port port]
        [-N|-H] [-T] [--key-file file] [--filelist file-list ] [-L files]
        -- script-arguments

To start remote servers and a hub:

  ntred -i [--servers server[,server,...]] [--serverlist server-list]
           [--filelist file-list] [--max-servers num]
           [--allow-no-trees] [--no-secondary-files]
           [--no-server-check] [--no-server-start] [--no-hub-start]
           [--old-dist-method] [--safe-mode] [--server-debug]
           [-m|-I macro-file] [--disable-extensions list] [--enable-extensions list]
           [-M module] [--btred path-to-btred]
           [--ssh ssh-command] [--local]
           [--key-file] [--hub hub] [--port port] [file [...]]

To close all remote servers and a hub:

  ntred --quit|--ps-quit

To kill all remote servers and a hub:

  ntred --kill|--ps-kill

To manage files on the servers:

  ntred --list-files|--list-changed-files

  ntred --reload-files [--disable-extensions list] [--enable-extensions list]

  ntred --reload-macros [-m macro-file]

  ntred --load-files [--filelist file-list] [file [...]]

  ntred --close-files

  ntred --save-files|--save-changed-files [-s strip-sfx]
       [-a append-sfx] [-p strip-prefix] [-r add-prefix] [-f out-fmt]

  ntred --dump-files [--filelist file-list] [file [...]]

  ntred --upload-file filename < fs-file

Get help:

  ntred -u          for usage (synopsis)
  ntred -h          for help
  ntred --man       for the manual page

DESCRIPTION

This program can start one or more btred servers on a set of host machines over SSH, create a proxy hub that handles the communication between the servers and a client, distribute given files over the servers (provided the servers can load the files by their filenames, e.g. because they share them over NFS), query the servers using a btred macro, and collect the answers. It is highly recommended to use a password-free authentication method for SSH (e.g. Kerberos or ssh-agent), so that a password is not required each time an SSH connection is made (at least two connections are made per host).

In client mode, the standard output of the macro is printed to the client's STDOUT. STDERR is reserved for debugging and information messages, as well as for error messages caused by the macros on the servers. The rest of a server's error output is stored in the file <logdir>/ntred-server-<host>.log, where <logdir> can be specified using --logdir or the NTRED_LOGDIR, TMP, or TEMP environment variable, and defaults to /tmp if none of these is set.
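The log-directory lookup described above can be sketched in Python as follows. This is a minimal illustration, not ntred's own code: the function names are hypothetical, and the sketch assumes the sources are consulted in the order listed (--logdir first) and that an empty environment variable counts as unset.

```python
import os
from typing import Mapping, Optional


def resolve_logdir(cli_logdir: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the log directory: --logdir, then NTRED_LOGDIR, TMP, TEMP, else /tmp."""
    if env is None:
        env = os.environ
    if cli_logdir:
        return cli_logdir
    for var in ("NTRED_LOGDIR", "TMP", "TEMP"):
        value = env.get(var)
        if value:  # assumption: an empty variable is treated as unset
            return value
    return "/tmp"


def server_log_path(host: str, logdir: str) -> str:
    """Path of the per-server log file: <logdir>/ntred-server-<host>.log."""
    return os.path.join(logdir, f"ntred-server-{host}.log")


print(server_log_path("node1", resolve_logdir(env={"TEMP": "/var/tmp"})))
# prints /var/tmp/ntred-server-node1.log
```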

CONFIG FILE

Options for ntred can be specified on the command line or in a config file (the default is ~/.ntredrc, but the --ntred-config-file option can be used to set a custom config file). The format of this file is simple: each non-empty line should have the form:

 option = value

with leading and trailing white-space ignored and with all characters following a '#' ignored. Option names for the config file are the same as command-line options, but without leading - or --.
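A minimal Python sketch of a parser for this format follows. The function name parse_ntredrc is hypothetical; the sketch assumes that a line without '=' is an error and that a later occurrence of an option overrides an earlier one, neither of which is specified above (options such as -M may be given several times, and ntred may treat repeated options differently).

```python
from typing import Dict


def parse_ntredrc(text: str) -> Dict[str, str]:
    """Parse 'option = value' lines; everything after '#' is a comment."""
    options: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop the comment, trim white-space
        if not line:
            continue  # empty or comment-only line
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'option = value'")
        options[name.strip()] = value.strip()  # assumption: last value wins
    return options


example = """
# cluster settings
serverlist = /home/me/servers.txt
port = 1501   # hub port
"""
print(parse_ntredrc(example))
# prints {'serverlist': '/home/me/servers.txt', 'port': '1501'}
```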

OPTIONS

QUERY MODE OPTIONS

  ntred [-m macro-file] [-e code] [--hub hub] [--port port]
        [-N|-H] [-T] [--key-file file] -- script-arguments

--execute|-e code

This is the query to be executed on the remote servers. It is usually either the name of a macro defined in the file given as --macro-file or a Perl one-liner. If omitted, it defaults to 'autostart()', and a macro with this name must be defined in the provided macro file.

--macro-file|-m filename

A file containing macros that are to be sent to the servers and preloaded on the servers before the particular query code is evaluated. The query code to be evaluated must be specified in --execute.

Note that the servers already have the set of macros they obtained on startup (i.e. with --init): namely either the default set of macros (tred.mac) or the set specified via --macro-file, plus the set specified via --include-macro-file, if any.

--include-macro-file|-I filename

A file containing an additional set of macros to be sent to the servers. In query mode, this option behaves just like --macro-file and is only provided for compatibility with btred. If both options are used, both sets of macros are loaded.

--hub hostname

The hostname of the hub to connect to (defaults to localhost).

--port port

The port on which the hub is listening (defaults to 1500).

--all-trees|-T

Run the query code on all trees (this wraps the code into an if ($root) { do {{ CODE }} while TredMacro::NextTree() } loop).

--all-nodes|-N

Run the query code on all nodes (you still must use --all-trees or -T to process all trees). This wraps the code into a while ($this) { CODE; $this=$this->following } loop.

--all-nonhidden-nodes|-H

Run the query code on all nodes that are not hidden (you still must use --all-trees or -T to process all trees). This wraps the code into a while ($this) { CODE; $this=$this->following_visible(FS()) } loop.
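The wrapping performed by -T, -N and -H can be illustrated with a small Python function that builds the Perl code as a string. The function name wrap_query is hypothetical, and the nesting (the node loop placed inside the tree loop when -T is combined with -N or -H) is our reading of the descriptions above rather than something taken from the ntred source.

```python
def wrap_query(code: str, all_trees: bool = False, all_nodes: bool = False,
               nonhidden_nodes: bool = False) -> str:
    """Return the Perl query code wrapped as -N/-H and -T describe (sketch)."""
    if all_nodes and nonhidden_nodes:
        raise ValueError("-N and -H are alternatives")
    if all_nodes:  # -N: iterate over every node of the current tree
        code = f"while ($this) {{ {code}; $this=$this->following }}"
    elif nonhidden_nodes:  # -H: iterate over visible nodes only
        code = f"while ($this) {{ {code}; $this=$this->following_visible(FS()) }}"
    if all_trees:  # -T: repeat for every tree in every file
        code = f"if ($root) {{ do {{{{ {code} }}}} while TredMacro::NextTree() }}"
    return code


print(wrap_query('print $this->{form}, "\\n"', all_trees=True, all_nodes=True))
```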

--listed-files|-L

Run the query code only on files specified on the command line (provided they are already present on some server, i.e. this option does not make servers load files they don't already have). Filenames may contain ordinary TrEd suffixes of the form ##tree or ##tree.node to indicate that the processing should apply only on a single tree (use in combination with -N) or a single node (use without -T and -N). NTrEd URIs of the form ntred:// are also allowed. Relative file-names are expanded according to the current working directory before they are sent to ntred servers.
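A sketch of how such a filename argument can be decomposed, written in Python with hypothetical names. It assumes the suffix is exactly ##<tree> or ##<tree>.<node> with decimal numbers at the very end of the argument, leaves ntred:// URIs untouched, and makes other relative paths absolute using the current working directory, as described above.

```python
import os
import re
from typing import NamedTuple, Optional


class FileSpec(NamedTuple):
    path: str
    tree: Optional[int]  # tree position (from one), or None for the whole file
    node: Optional[int]  # node position as written after the dot, or None


# Non-greedy path followed by an optional ##tree or ##tree.node suffix.
_SPEC = re.compile(r"^(?P<path>.*?)(?:##(?P<tree>\d+)(?:\.(?P<node>\d+))?)?$")


def parse_file_spec(spec: str, cwd: Optional[str] = None) -> FileSpec:
    m = _SPEC.match(spec)  # always matches: the suffix group is optional
    path = m.group("path")
    if not path.startswith("ntred://") and not os.path.isabs(path):
        path = os.path.join(cwd or os.getcwd(), path)  # expand relative names
    tree = int(m.group("tree")) if m.group("tree") else None
    node = int(m.group("node")) if m.group("node") else None
    return FileSpec(path, tree, node)


print(parse_file_spec("data/a.fs##3.7", cwd="/work"))
```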

--regexp-files|-R regular-expression

Run the query code only on files whose filenames match a given regular expression.

--filelist|-l filename

Like --listed-files|-L but this time the files to be processed are listed in the given file rather than on the command line. Both options may be used together in which case the file-lists are joined.

--key-file

This option may be used to provide a file with a session-key which is necessary for the authentication to the running hub. This defaults to `~/.ntred_session_key'.

HUB AND SERVER MODE OPTIONS

  ntred -i [--servers server[,server,...]] [--serverlist server-list]
           [--filelist file-list] [--max-servers num]
           [--no-secondary-files] [--no-server-check]
           [--no-server-start] [--no-hub-start] [--old-dist-method]
           [--safe-mode] [--server-debug] [--max-retries num]
           [-m macro-file] [-M module] [--btred path-to-btred]
           [--ssh ssh-command] [--local] [--key-file file]
           [--hub hub] [--port port] [file [...]]

--servers list of hosts

A comma-separated list of hosts on which to run btred servers. Each hostname may optionally be followed by a colon and a port number, making it possible to run several btred servers on one host. If the port number is omitted, it defaults to 1600. See also --serverlist.

Ports may be given as a range, e.g. 1600-1800; in this case, the btred server uses the first free port in the range.
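The --servers syntax can be parsed roughly as follows. This Python sketch uses hypothetical names and assumes the host and port are separated by a colon (as in the --servers localhost:XY example given under --local); it does not handle the qsub:// form described next.

```python
from typing import List, NamedTuple, Tuple

DEFAULT_PORT = 1600  # documented default btred server port


class ServerSpec(NamedTuple):
    host: str
    ports: Tuple[int, int]  # inclusive range; (p, p) for a single port


def parse_servers(arg: str) -> List[ServerSpec]:
    """Parse 'host[:port[-port]],...' (qsub:// entries are not handled)."""
    specs: List[ServerSpec] = []
    for item in arg.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.partition(":")
        if not sep:  # no port given: use the default
            specs.append(ServerSpec(host, (DEFAULT_PORT, DEFAULT_PORT)))
            continue
        lo, dash, hi = port.partition("-")
        first = int(lo)
        last = int(hi) if dash else first
        if last < first:
            raise ValueError(f"empty port range in {item!r}")
        specs.append(ServerSpec(host, (first, last)))
    return specs


print(parse_servers("alpha,beta:1601,gamma:1600-1800"))
```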

A special syntax can be used to start btred servers on an SGE cluster using the task queue (the qsub and qdel commands must be in PATH):

  qsub://<job_prefix>:<port_range>

In this case ntred schedules one btred server task on the SGE queue for each port in the range. It waits at most two seconds times --max-retries for the first server to start. Tasks of the servers that are not running by that time are abandoned and deleted from the queue.
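The bookkeeping implied by the qsub:// form (one SGE task per port and a waiting budget of two seconds per retry) can be sketched in Python. The names are hypothetical, the sketch only computes the plan (it does not call qsub or qdel), and it assumes that a single port is accepted in place of a range.

```python
from typing import List, NamedTuple


class QsubPlan(NamedTuple):
    job_prefix: str
    ports: List[int]     # one SGE task is scheduled per port
    wait_seconds: float  # upper bound on waiting for the first server


def plan_qsub(spec: str, max_retries: int) -> QsubPlan:
    scheme = "qsub://"
    if not spec.startswith(scheme):
        raise ValueError("not a qsub:// server specification")
    prefix, sep, port_range = spec[len(scheme):].rpartition(":")
    if not sep or not prefix:
        raise ValueError("expected qsub://<job_prefix>:<port_range>")
    lo, dash, hi = port_range.partition("-")
    first = int(lo)
    last = int(hi) if dash else first  # assumption: a single port is allowed
    return QsubPlan(prefix, list(range(first, last + 1)), 2.0 * max_retries)


print(plan_qsub("qsub://myjob:1600-1603", max_retries=10))
```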

--serverlist filename

This provides a more convenient way to specify servers: a file containing a list of servers, one per line. If neither --servers nor --serverlist is provided, the list of servers is read from ~/.ntred_serverlist.

--filelist filename

A list of files to distribute between servers (one filename per line). Additional files may be given as command-line arguments.

--no-secondary-files

Don't load "secondary" files along with normal files (a file may require other - secondary - file to load along with it; this is typical for stand-off annotation where one tree is built upon another).

--allow-no-trees|-0

Allow files with no trees (normally such files are considered broken). Note: the short flag is -0 (the digit zero).

--max-servers number

Limit the number of servers to start, even if the list of servers contains more.

--max-retries number

Specifies how many times the hub tries to connect to a btred server before it gives up.

--no-server-check

Skip an initial check for server hosts availability (a dummy attempt for SSH connection).

--no-server-start

Don't start new btred servers on the remote hosts. Instead, start a hub and try to connect to the btred servers already running. This requires a server-session key to be given on the standard input.

--no-hub-start

Start servers on the remote hosts but don't start a hub. The server session key required for authentication to the servers will be printed on the standard output.

--safe-mode|-F

Run btred servers in safe mode, in which all macros are processed in a Safe compartment with some security restrictions. This mode appears to make btred servers prone to memory leaks.

In the safe mode, only the following opcodes and opcode-sets are allowed (see documentation for the Opcode Perl module):

  :base_core :base_mem :base_loop :base_math
  entereval caller dofile
  print entertry leavetry tie untie bless
  sprintf localtime gmtime sort require

plus :base_orig, except for the following opcodes, which are forbidden:

  getppid getpgrp setpgrp getpriority setpriority pipe_op sselect
  select dbmopen dbmclose tie untie

--server-debug

Run btred servers with the -D flag to get more debugging information.

--macro-file|-m filename

A file containing the default set of macros to be prepared on btred servers. The file (with exactly the same path) must be visible from all server hosts.

--include-macro-file|-I filename

A file containing an additional set of macros to be prepared on btred servers. This option is typically used instead of --macro-file to load macros both from the given file and from the default macro set (tred.mac). --macro-file can still be combined with --include-macro-file to supply a replacement for tred.mac.

--enable-extensions list

Give a comma-separated list of installed TrEd extension names to temporarily enable if disabled in the extension configuration. Use '*' to enable all installed extensions.

This option can only be used with --init or --reload-macros.

--disable-extensions list

Give a comma-separated list of installed TrEd extension names to temporarily disable if enabled in the extension configuration. Use '*' to disable all currently enabled extensions.

This option can only be used with --init or --reload-macros.

--key-file

Specifies the file where the session key for client authentication will be stored. It defaults to ~/.ntred_session_key.

--terminal-encoding|-d encoding

Automatically applies a given character encoding to all stdout output operations on the servers and command-line arguments. Only works with Perl >= 5.8.

--hub hostname

The hostname of the local machine the hub will listen on (defaults to localhost but a machine's hostname may be given to allow remote access to the hub).

--port port

The port number the hub will be listening on (defaults to 1500).

--preload-module|-M module-name

This option is passed to the btred command line when starting a btred server. It makes btred preload the given Perl module at startup so that it is available to all macros (DOES NOT WORK WITH RESTRICTED MODE). This option may be specified more than once with different modules.

--old-dist-method

Use the old benchmark-based distribution method.

HUB CONTROL OPTIONS

  ntred --list-files|--list-changed-files

  ntred --reload-files [--filelist file-list] [--listed-files file [...]]

  ntred --reload-changed-files

  ntred --reload-macros [-m macro_file]

  ntred --load-files [--filelist file-list] [file [...]]

  ntred --close-files

  ntred --save-files|--save-changed-files [-s strip-sfx]
       [-a append-sfx] [-p strip-prefix] [-r add-prefix] [-f out-fmt] [--knit]

  ntred --quit

  ntred --kill|--ps-kill [--servers server[,server,...]] [--serverlist server-list]

  ntred --break|--ps-break

  ntred --dump-files [--filelist file-list] [file [...]]

  ntred --upload-file filename < fs-file




--init|-i

Start remote btred servers and a hub. See HUB AND SERVER MODE OPTIONS.

--keep-empty-servers|-E

Keep alive those servers which do not get any file during the initial file distribution (by default, unused servers are automatically closed).

--quit

Sends the hub and all the servers a command to quit.

--break

Tells the ntred hub to send a USR1 signal to all running btred servers. Upon receiving this signal, the servers should stop the current processing and return to awaiting requests.

--ps-break

Similar to --break but tries to identify btred server processes by looking at the output of the system command ps x -o pid,command. This may help if killall -USR1 btred doesn't work.

--kill

Runs killall -9 ntred on the local machine and killall -9 btred on the server hosts listed with --servers, --serverlist, or in ~/.ntred_serverlist.

--ps-kill

Similar to --kill but tries to identify btred server processes by looking at the output of the system command ps x -o pid,command. This may help if killall -TERM btred doesn't work.

--list-files

List all files currently open on servers.

--list-changed-files

List all files that have been changed by some macro. Note that a macro has to declare that the file was changed by setting the $TredMacro::FileChanged variable to 1; otherwise the btred server never notices.

--listed-files|-L

Apply the request only to the listed files (currently this only works for queries and the --reload-files request).

--count-files

Query the number of files open on each server.

--reload-files

Send a command to the btred servers to reload all open files. If the --filelist or --listed-files options are given, reload only the files occurring in the given lists (all other files remain intact in the servers' memory).

--reload-changed-files

Send a command to the btred servers to reload files that have been modified since they were (re)loaded.

--reload-macros

Send a command to the btred servers to reload the initial macro file. If -m (--macro-file) is specified, the servers use the given macro file instead of the original one (specified when initializing the btred servers). Note that the file (with exactly the same path) must be visible from all server hosts.

Options --disable-extensions and --enable-extensions can be used to define a set of extensions to use.

--load-files

Send a command to the hub to distribute to the servers the files given on the command line, or those specified using --filelist or in ~/.ntred_filelist.

Warning: Do not load files that are already in the servers' memory; otherwise you may get duplicates (depending on which server the files get distributed to). Use --list-files to see which files are already loaded.

See also --reload-files.

--close-files

Send a command to the btred servers to close all open files.

--save-files

Send a command to the servers to save all open files. The filenames of the saved files may be modified using --add-prefix, --strip-prefix, --strip-suffix, --append-suffix.

--save-changed-files

Same as --save-files, except that only files that have been changed by some macro are saved. Note that a macro has to declare that the file was changed by setting the $TredMacro::FileChanged variable to 1; otherwise the btred server never notices. See also --list-changed-files.

--knit|-K ALL|NONE|name1,name2,...

If a file is saved, also save/update the listed types of reffiles the file pulled data from. For the moment, this only makes sense with the PML backend, which supports so-called knitting, i.e. a method to pull certain data from external resources and push it back (with all changes) to its original position in the resource when the file is saved. This option lists the types of resources (in PML, the types are the reference names listed in the PML schema) that should be saved. The default is NONE. These resource types do not include the so-called secondary files.

--dump-files

Dumps given files or trees to standard output in FS format as they are in memory of the btred servers. Individual dumps are separated with `//FSEND' preceded and followed by two newline characters. To output a single tree, follow the filename with ##n suffix where n is the absolute position of the tree in the file (starting from one). The following example shows how csplit command can be used to save individual dumps into separate files:

  ntred --dump <files> | csplit -z -f out -b '%d.fs' - '/\/\/FSEND/+2' '{*}'

To merge these separate files into one huge FS file, use

  any2any -m hugeout.fs out*.fs
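Where csplit is not available, the dump can also be split with a short Python script. This is a sketch that assumes each separator is a line consisting exactly of //FSEND; the function names are hypothetical, and the data is handled as bytes so that the encoding of the FS files does not matter. The pieces are written as out0.fs, out1.fs, ..., mirroring the csplit example.

```python
import re
from pathlib import Path
from typing import List

# A separator is a line that consists exactly of //FSEND.
_SEPARATOR = re.compile(rb"^//FSEND$", re.MULTILINE)


def split_dump(data: bytes) -> List[bytes]:
    """Split ntred --dump-files output into individual FS documents."""
    pieces = (p.strip(b"\n") for p in _SEPARATOR.split(data))
    return [p + b"\n" for p in pieces if p]  # drop empty leftovers


def write_pieces(pieces: List[bytes], prefix: str = "out") -> List[Path]:
    """Write pieces to <prefix>0.fs, <prefix>1.fs, ... and return the paths."""
    paths = []
    for i, piece in enumerate(pieces):
        path = Path(f"{prefix}{i}.fs")
        path.write_bytes(piece)
        paths.append(path)
    return paths
```

A script calling write_pieces(split_dump(sys.stdin.buffer.read())) could then be placed at the end of the ntred --dump-files pipeline in place of csplit.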

--upload-file

This command is essentially the reverse of --dump-files. It takes a filename from the first command-line argument and an FS file from the standard input and sends them to the btred servers. The server holding the file with the given filename replaces its in-memory copy of the file with the one provided on the standard input. To update a single tree, follow the filename with a ##n suffix, where n is the absolute position of the tree in the file (starting from one).

--no-secondary-files

Ignore "secondary" files even if loaded. This only affects some commands, such as --list-changed-files, in which case it means that a file whose secondary file was changed is not reported as changed unless the (primary) file itself was marked as changed.

--allow-no-trees|-0

Allow files with no trees (normally such files are considered broken). Note: the short flag is -0 (the digit zero).

SAVING OPTIONS

--strip-prefix|-p regexp

Remove strings matching given regexp from the beginning of filenames before saving.

--add-prefix|-r prefix

Prepend output filenames with the given prefix when saving.

--strip-suffix|-s regexp

Strip strings matching given regexp from the end of filenames when saving.

--add-suffix|-a suffix

Append given suffix to the filenames when saving.
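The effect of the four prefix/suffix options on a filename can be sketched in Python. The function output_filename is hypothetical, Python regular expressions stand in for the Perl ones ntred actually uses, and the order of application (strip the prefix and suffix first, then add the new ones) is an assumption.

```python
import re
from typing import Optional


def output_filename(name: str, strip_prefix: Optional[str] = None,
                    add_prefix: str = "", strip_suffix: Optional[str] = None,
                    add_suffix: str = "") -> str:
    """Apply -p, -s (regexps), then -r and -a (literal strings) to a filename."""
    if strip_prefix:  # -p: remove a match anchored at the beginning
        name = re.sub(f"^(?:{strip_prefix})", "", name, count=1)
    if strip_suffix:  # -s: remove a match anchored at the end
        name = re.sub(f"(?:{strip_suffix})$", "", name, count=1)
    return add_prefix + name + add_suffix  # -r and -a


print(output_filename("/data/in/a.fs", strip_prefix="/data/in/",
                      add_prefix="/data/out/", strip_suffix=r"\.fs",
                      add_suffix=".pml"))
# prints /data/out/a.pml
```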

--output-format|-f [fs|csts|trxml|tei|storable]

Format to use for files being saved.

--no-secondary-files

Don't save "secondary" files (not even if changed). Normally, secondary files (if loaded) are saved along with their primary files (exactly the same file-name prefix/suffix processing and format apply to both the primary and the secondary files).

GENERAL OPTIONS

--glob|-g

Apply the Perl glob function to the filename patterns given on the command line. This expands wildcard patterns in each filename argument as the standard Unix shell /bin/csh would. This not only helps when the shell used doesn't support wildcard expansion, but can also reduce the number of command-line arguments passed to the process when the argument list would exceed a system limit after shell expansion. Note that expansion is currently performed on the client regardless of the type of request. This may change in future versions.

--usage|-u

Prints a brief usage message and exits.

--help|-h

Prints the help page and exits.

--man

Displays the help as manual page.

--quiet|-q

Suppress all NTRED-CLIENT/NTRED-HUB messages on error output.

--really-quiet|-Q

Redirect all std error output to /dev/null.

--ssh command

This option may be used to specify the ssh/rsh command to use to connect to remote servers. Default value is `ssh -o ConnectTimeout=10'.

--local

Run btred servers locally. This ignores all non-local entries in .ntred_serverlist (i.e. entries not matching the local machine's hostname). Several btred instances may still be started, provided they use different ports. In this case, SSH is not used.

Use --servers localhost:XY (where XY is port number) if you wish to run a single server on the loopback interface.

--btred command

This option may be used to specify the command to use to start a btred server on a remote host. The command must accept any btred parameters.

SECURITY ISSUES

USE AT YOUR OWN RISK. IF SECURITY IS A CRITICAL ISSUE OR IF IN DOUBT, DON'T USE IT AT ALL.

Why is security an issue here? Because btred servers execute almost arbitrary Perl code provided by the client. Unless the servers run in safe mode (see --safe-mode), such code may contain arbitrary commands such as system() or open(). It is therefore desirable that the servers are not open to all parties.

The following precautions have been taken to lower the potential security risks:

1) Both the btred servers and the hub require authorization based on verifying an MD5 signature of a random data block (generated by the server for hub-to-btred-server communication, and by the hub for client-to-hub communication) XOR-ed with an authorization key known to both parties. Although the communication is unencrypted, the client must send with each request an MD5 checksum of the request XOR-ed with the secret authorization key. Only requests whose signature is verified by the server are answered.
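One possible reading of this signature scheme is sketched below in Python: the MD5 digest of the message is XOR-ed byte-wise with a 16-byte shared key, and the receiver recomputes and compares it. The exact construction used by ntred may differ (the description also admits hashing the data after XOR-ing it with the key), and the names are hypothetical. Under this reading, an eavesdropper who sees one message and its signature can recover the key by XOR-ing the signature with the message digest; a standard MAC such as HMAC (Python's hmac module) does not have this weakness.

```python
import hashlib
import hmac
import os

KEY_LEN = 16  # assumption: the key is as long as an MD5 digest


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def sign(message: bytes, key: bytes) -> bytes:
    """MD5 digest of the message, XOR-ed with the shared key (one reading)."""
    return xor_bytes(hashlib.md5(message).digest(), key)


def verify(message: bytes, signature: bytes, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(message, key), signature)


key = os.urandom(KEY_LEN)
challenge = os.urandom(32)  # random data block sent by the verifying side
response = sign(challenge, key)
print(verify(challenge, response, key))  # prints True
```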

2) There may be only one connection from a hub to a server. As soon as it is closed, the server terminates.

3) If the servers are started by the hub itself (using --init), the authorization key is created by the hub and passed to the btred server through an SSH-encrypted pipe.

4) For the client's use, the authorization key is stored in the user's home directory as ~/.ntred_session_key with permissions set to 600 (only the user can read or write it). In theory (depending on the general security of the system), this limits access to the hub (and thus to the servers) to the user running the hub. It can, however, obviously be abused from the local root account to execute arbitrary Perl code on all btred server hosts. This may be especially undesirable if the hub runs on a machine whose administrator would normally have no user access to the machines running btred servers. Another security issue may arise if the user's home directory is on a remote NFS server, so that the key file is read over NFS. Since NFS uses an unencrypted protocol, network sniffing techniques may be used to obtain the authorization key and hence run arbitrary code on btred hosts. If such situations are likely (e.g. in a large network), it is advisable to use a different location for the authorization key (see --key-file), e.g. /tmp.
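Creating a key file that only its owner can read or write, and refusing to use one whose permissions are looser, can be sketched in Python as follows. The function names and the hex key format are hypothetical; the sketch only mirrors the 600-permission policy described above and is not ntred's implementation. It requires a POSIX system (os.fchmod).

```python
import os
import secrets
import stat
from pathlib import Path


def new_session_key() -> str:
    """Hypothetical key format: 32 random hex digits."""
    return secrets.token_hex(16)


def write_session_key(path: Path, key: str) -> None:
    """Create or overwrite the key file with mode 600 (owner read/write only)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="ascii") as f:
        # The mode given to os.open only applies to a newly created file and
        # is reduced by the umask, so set it explicitly as well.
        os.fchmod(f.fileno(), 0o600)
        f.write(key)


def read_session_key(path: Path) -> str:
    """Read the key, refusing files that group or others can access."""
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} has mode {mode:o}; expected 600")
    return path.read_text(encoding="ascii").strip()
```

For example, write_session_key(Path.home() / ".ntred_session_key", new_session_key()) would produce a file matching the policy described above; pointing it at another directory corresponds to the --key-file advice.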

5) Perl code evaluated on the servers can be restricted to a safer compartment in which some critical Perl commands are disabled. In some cases these restrictions may not be sufficient; in others they may be too strict. Some memory leaks can appear when the Safe compartment is used. See --safe-mode above for more discussion.

6) Unless --hub option is used, the hub runs on localhost and as such is not (under normal circumstances) open for connections from the outside world. If you are considering making the hub listen on a non-local interface, note that it is a much better option to configure a secure SSH tunnel.

FILES

~/.ntred_serverlist - default list of servers to use

~/.ntred_filelist - default list of files to load on servers

~/.ntred_session_key - client/hub session key

LICENSE

This software is distributed under the GPL (GNU General Public License). The full text of the GPL can be found in the LICENSE file distributed with this program and online at http://www.gnu.org/copyleft/gpl.html.

AUTHORS

Petr Pajas <pajas@matfyz.cz>

Zdenek Zabokrtsky <zabokrtsky@ufal.mff.cuni.cz>

Copyright 2003-2008 Petr Pajas and Zdenek Zabokrtsky, All rights reserved.