HTML page annotation tool HOWTO

The tool consists of several programs written in different programming languages.

Annotation interface

Annotation is done in a web browser. After you open the URL of the directory where the tool is installed, you should see a login screen. There are two modes of access: "demo mode" and "regular access mode".

Demo mode

In demo mode, you can type in the URL of any web page, load it ("Load" button), annotate it, and view the result of your annotation ("View" button).

The annotation type is selected by clicking one of the buttons in the top-left corner of the browser window. The annotation itself is made by clicking on the beginning and on the end of the text to be annotated.
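Here is what the two clicks produce, as a minimal Python sketch. It treats the page text as plain text and the clicks as character offsets into it. The function name mark_range, the offset model, and storing the annotation type in a class attribute are all assumptions made for illustration. The only thing this HOWTO says about the markup is that annotations end up as <span> elements.

```python
from html import escape


def mark_range(text: str, click_a: int, click_b: int, ann_type: str) -> str:
    """Wrap text[start:end] in a <span> labelled with the annotation type."""
    # The two clicks may come in either order, so normalise them first.
    start, end = sorted((click_a, click_b))
    if not 0 <= start <= end <= len(text):
        raise ValueError("click offsets fall outside the text")
    # Hypothetical markup: the annotation type is stored in the class attribute.
    return (escape(text[:start])
            + f'<span class="{escape(ann_type)}">'
            + escape(text[start:end])
            + "</span>"
            + escape(text[end:]))


print(mark_range("Title. Body text here.", 6, 0, "header"))
# <span class="header">Title.</span> Body text here.
```

Normalising the click order means it does not matter whether the end of the range is clicked before its beginning.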

Regular access mode

After logging in, you can select a page group and a page to annotate from the drop-down menu. An asterisk before a page name marks a page that has already been (at least partially) annotated.

As in demo mode, the annotation type is selected by clicking the buttons. The annotation is saved automatically when you switch to another page using the menu, or when you click the Save button. Alternatively, you can click the Next button, which saves the annotation and loads the next page in the page list.
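The page menu and the Next button behave as sketched below in Python. The function names and the save/load callbacks are hypothetical stand-ins for whatever the interface actually calls. Two details are also assumptions: the asterisk is placed directly before the name with no space, and Next does nothing further at the end of the list.

```python
from typing import Callable, Optional


def menu_label(page: str, annotated: set[str]) -> str:
    # An asterisk before the name marks an (at least partially) annotated page.
    return ("*" + page) if page in annotated else page


def next_page(pages: list[str], current: str) -> Optional[str]:
    # The page following `current` in the list, or None at the end (assumption).
    i = pages.index(current)
    return pages[i + 1] if i + 1 < len(pages) else None


def on_next(pages: list[str], current: str,
            save: Callable[[str], None],
            load: Callable[[str], None]) -> Optional[str]:
    # "Next": save the current annotation first, then load the following page.
    save(current)
    following = next_page(pages, current)
    if following is not None:
        load(following)
    return following
```

Saving before loading ensures that the annotation of the current page is stored even when it is the last page in the list.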

Key shortcuts

For faster navigation, you can use keys instead of clicking on the buttons, as summarized in the following table.

Action                                        Button   Key
switch annotation type to Header              Header   h or 1
switch annotation type to Text                Text     t or 2
switch annotation type to Other               Other    o or 3
save page and load next page (regular mode)   Next     n
save page (regular mode)                      Save     s
undo last annotation                          N/A      u
redo last annotation                          N/A      r
cancel current annotation                     N/A      Esc
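The shortcut table can be expressed as a lookup table, as in the Python sketch below. The action names (set_type, save_and_next and so on) are hypothetical. "Escape" is assumed as the key name for Esc, following the browser's KeyboardEvent.key convention. The check that Save and Next only work in regular access mode follows the table above.

```python
from typing import Optional

# Mirrors the shortcut table; the action names are hypothetical.
KEYMAP: dict[str, tuple[str, Optional[str]]] = {
    "h": ("set_type", "Header"), "1": ("set_type", "Header"),
    "t": ("set_type", "Text"),   "2": ("set_type", "Text"),
    "o": ("set_type", "Other"),  "3": ("set_type", "Other"),
    "n": ("save_and_next", None),
    "s": ("save", None),
    "u": ("undo", None),
    "r": ("redo", None),
    "Escape": ("cancel", None),
}


def dispatch(key: str, regular_mode: bool) -> Optional[tuple[str, Optional[str]]]:
    """Return the action bound to a key, or None if the key does nothing."""
    action = KEYMAP.get(key)
    if action is None:
        return None
    # Save and Next exist only in regular access mode.
    if action[0] in ("save", "save_and_next") and not regular_mode:
        return None
    return action
```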

Administration interface

Administration is done using a command-line interface.

There are two directories with annotation-related files. The first one (the "web directory") must be inside the web-accessible tree. It contains all scripts and programs, as well as a configuration file.

The second one (the "data directory") should be placed outside the web-accessible tree. It holds the user and group configuration files, as well as the actual annotation data.

Configuration

The basic configuration file resides in the web directory and specifies the basic directory paths. It is a plain text file.

Users

Users are defined in the file users in the data directory. Each line defines one user. Passwords may be stored either in plain text or as a SHA1 hash. The format of a line is:

login password group1 group2 ...

e.g.

qiq secret new_qiq_group1
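A users file in this format could be read and checked as in the Python sketch below. The names User, parse_users and check_password are hypothetical. The rule for telling the two password formats apart is an assumption: a 40-digit hexadecimal string is taken to be a SHA1 hash of the password, and anything else is treated as plain text. Under that rule, a plain-text password that happens to be 40 hex digits would be misread as a hash.

```python
import hashlib
import hmac
import re
from dataclasses import dataclass, field

# Assumption: a stored password of exactly 40 hex digits is a SHA1 hash.
SHA1_HEX = re.compile(r"[0-9a-fA-F]{40}")


@dataclass
class User:
    login: str
    password: str
    groups: list[str] = field(default_factory=list)


def parse_users(text: str) -> dict[str, User]:
    """Parse lines of the form: login password group1 group2 ..."""
    users: dict[str, User] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or incomplete lines (assumption)
        login, password, *groups = parts
        users[login] = User(login, password, groups)
    return users


def check_password(user: User, candidate: str) -> bool:
    stored = user.password
    if SHA1_HEX.fullmatch(stored):
        # Compare the SHA1 hex digest of the candidate with the stored hash.
        digest = hashlib.sha1(candidate.encode("utf-8")).hexdigest()
        return hmac.compare_digest(digest, stored.lower())
    # Otherwise the stored value is the plain-text password.
    return hmac.compare_digest(candidate.encode("utf-8"), stored.encode("utf-8"))
```

For the example line above, parse_users would return a user qiq with the password secret and the single group new_qiq_group1.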

Page groups

Pages are organized into so-called page groups. Each group is defined by a group file in the groups directory (within the data directory). Every line of a group file describes one page to be annotated. The last field of a line holds either a path or an error message, depending on the status. The format of a line is:

URL status path/error message

e.g.

http://google.com/ OK d7abb22246/http/google.com_80/_._.html
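A group file line can be parsed as in the Python sketch below. The names PageEntry and parse_group are hypothetical. The sketch assumes that the status OK means the last field is a path and that any other status means it is an error message. The sample status ERROR used in the usage check is invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageEntry:
    url: str
    status: str
    path: Optional[str] = None   # filled in when the status is OK
    error: Optional[str] = None  # filled in for any other status (assumption)


def parse_group(text: str) -> list[PageEntry]:
    """Parse lines of the form: URL status path/error message"""
    entries = []
    for line in text.splitlines():
        # Split into at most three fields so an error message keeps its spaces.
        fields = line.strip().split(None, 2)
        if len(fields) < 3:
            continue  # blank or incomplete line; skipping it is an assumption
        url, status, rest = fields
        if status == "OK":
            entries.append(PageEntry(url, status, path=rest))
        else:
            entries.append(PageEntry(url, status, error=rest))
    return entries
```

Limiting the split to three fields matters because an error message may contain spaces, while the URL and status fields do not.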

URL Import

In order to import new pages, follow these steps:

You can restrict the script to particular page groups by appending one or more group names as the last arguments (e.g. ./status.pl new_qiq_group1).
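Restricting a script to some groups, as described above, can be sketched in Python as follows. The helper names list_groups and select_groups are hypothetical. Two details are assumptions: every regular file in the groups directory counts as one group, and an unknown group name is treated as an error.

```python
from pathlib import Path


def list_groups(groups_dir: Path) -> list[str]:
    # Assumption: every regular file in the groups directory is one page group.
    return sorted(p.name for p in groups_dir.iterdir() if p.is_file())


def select_groups(available: list[str], requested: list[str]) -> list[str]:
    """Process all groups by default, or only the ones named on the command line."""
    if not requested:
        return list(available)
    unknown = [g for g in requested if g not in available]
    if unknown:
        # Rejecting unknown names is an assumption, not documented behaviour.
        raise ValueError("unknown group(s): " + ", ".join(unknown))
    return list(requested)
```

A script would call this as select_groups(list_groups(groups_dir), sys.argv[1:]), where groups_dir is the groups directory inside the data directory.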

Getting annotated data

The result of the annotation process is the original file (possibly converted to UTF-8 and tidied up by HTML Tidy) with the annotations marked using <span> elements. You can obtain these files by running the ./export.pl script. As with ./status.pl, you can specify one or more groups for the script to act on. The script outputs a list of paths to the resulting HTML files.
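Post-processing the exported files could look like the Python sketch below. It runs ./export.pl, reads the printed paths, and collects the text of each annotated <span>. This HOWTO does not document which attributes the exported <span> elements carry. The sketch therefore assumes that the annotation type is stored in the class attribute and that the script prints one path per line. The names SpanCollector, annotations_in and exported_paths are hypothetical.

```python
import subprocess
from html.parser import HTMLParser
from typing import Optional


class SpanCollector(HTMLParser):
    """Collect the text of every <span> that carries a class attribute."""

    def __init__(self) -> None:
        super().__init__()
        self._open: list[Optional[int]] = []  # one entry per open <span>
        self.found: list[list[str]] = []      # [class, text] pairs

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        cls = dict(attrs).get("class")  # assumption: class holds the type
        if cls is None:
            self._open.append(None)  # plain span, not an annotation
        else:
            self._open.append(len(self.found))
            self.found.append([cls, ""])

    def handle_endtag(self, tag):
        if tag == "span" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Attribute text to the innermost open annotated span, if any.
        for idx in reversed(self._open):
            if idx is not None:
                self.found[idx][1] += data
                break


def annotations_in(html: str) -> list[tuple[str, str]]:
    parser = SpanCollector()
    parser.feed(html)
    parser.close()
    return [(cls, text) for cls, text in parser.found]


def exported_paths(groups: list[str]) -> list[str]:
    # Assumption: ./export.pl prints one path per line on standard output.
    result = subprocess.run(["./export.pl", *groups],
                            capture_output=True, text=True, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

To use it, run exported_paths(["new_qiq_group1"]) from the web directory. Then open each returned path with encoding="utf-8" and pass its contents to annotations_in. Choosing UTF-8 relies on the export having converted the file, which the text above only says happens "possibly".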