Status: to be released
OS: Single-page web application

DocMarker

DocMarker is an annotation tool for creating training data for the text-to-form information retrieval NLP task. Suppose you have free-form text (possibly rich text) containing information that should be entered into a structured form. DocMarker lets you record and annotate this form-filling process.

This tool is built to be extensible so that you can provide your own forms in the JSON Forms schema. You can also add custom form controls. An extension of DocMarker is called a customization. Check out the customization for the RES-Q+ project here:

To see what DocMarker does on its own without being extended, check out this demo:

To try out the demo, open the link, click on "Create New File", and select the default form. DocMarker works in three modes. You start in the text-editing mode, where you can copy and paste any rich text into the tool. In the upper right corner you can switch to the anonymization mode, where you drag over a text region to have it anonymized (replaced by asterisks when the file is saved). In the last mode (annotation mode), you fill out the structured form on the right, one field at a time, and annotate the corresponding text on the left: first select a form field by clicking on it, then drag over the text region to create the link.
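The anonymization and annotation steps can be pictured with a small TypeScript sketch. This is only an illustration under stated assumptions: the names `TextRange`, `FieldLink`, and `anonymize` are hypothetical and are not taken from DocMarker's code base, and the tool's real file format may store ranges and links differently.

```typescript
// Hypothetical types for illustration; not DocMarker's actual data model.
interface TextRange {
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
}

// A link between one form field and the text region that supports it.
interface FieldLink {
  fieldPath: string; // assumed to be a JSON pointer into the form data
  range: TextRange;
}

// Replace every character inside the given ranges with an asterisk.
// The text length stays the same, so offsets of other ranges remain valid.
function anonymize(text: string, ranges: TextRange[]): string {
  const chars = text.split("");
  for (const { start, end } of ranges) {
    const from = Math.max(0, start);
    const to = Math.min(chars.length, end);
    for (let i = from; i < to; i++) {
      chars[i] = "*";
    }
  }
  return chars.join("");
}

const report = "Patient John Smith was admitted on Monday.";

// Anonymization mode: the user dragged over "John Smith".
const anonymized = anonymize(report, [{ start: 8, end: 18 }]);

// Annotation mode: the user selected a field, then dragged over "Monday".
const link: FieldLink = {
  fieldPath: "#/properties/admissionDay",
  range: { start: 35, end: 41 },
};
const linkedText = anonymized.slice(link.range.start, link.range.end);
```

Because the sketch replaces characters one-for-one, a link created before anonymization still points at the same text afterwards, which is one plausible reason to use asterisks rather than deleting the region.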

The tool is developed as a single-page web application using React and JSON Forms, and is built with Parcel. To create your own customization, copy the example customization folder and follow the instructions there. When Parcel compiles your customization, it produces a folder with index.html and a number of CSS and JS files that you can upload to any web-hosting service to make the application available. It is a frontend-only application, so no data leaves your web browser (which is ideal for sensitive data), and as a bonus you don't need to run a backend server with a database.
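For readers new to JSON Forms, a form is generally described by two objects: a data schema (plain JSON Schema) and a UI schema that lays out the controls. The sketch below shows a minimal pair written as TypeScript objects. The field names and the helper `unresolvedScopes` are made up for illustration, and how a customization actually registers its forms with DocMarker is described in the example customization folder, not here.

```typescript
// Loose local types for this sketch; JSON Forms ships its own, richer types.
interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: string; [key: string]: unknown }>;
  required?: string[];
}

interface ControlElement {
  type: "Control";
  label?: string;
  scope: string; // JSON pointer to a property of the data schema
}

interface VerticalLayout {
  type: "VerticalLayout";
  elements: ControlElement[];
}

// Data schema: which values the form collects (hypothetical fields).
const dataSchema: ObjectSchema = {
  type: "object",
  properties: {
    age: { type: "integer", minimum: 0 },
    admissionDay: { type: "string" },
    smoker: { type: "boolean" },
  },
  required: ["age"],
};

// UI schema: one control per field, stacked vertically.
const uiSchema: VerticalLayout = {
  type: "VerticalLayout",
  elements: [
    { type: "Control", label: "Age", scope: "#/properties/age" },
    { type: "Control", label: "Admission day", scope: "#/properties/admissionDay" },
    { type: "Control", label: "Smoker", scope: "#/properties/smoker" },
  ],
};

// Hypothetical sanity check: list control scopes that do not point
// at a property declared in the data schema.
function unresolvedScopes(schema: ObjectSchema, ui: VerticalLayout): string[] {
  const prefix = "#/properties/";
  return ui.elements
    .map((element) => element.scope)
    .filter((scope) => {
      if (scope.indexOf(prefix) !== 0) return true;
      const name = scope.slice(prefix.length);
      return !Object.prototype.hasOwnProperty.call(schema.properties, name);
    });
}

const problems = unresolvedScopes(dataSchema, uiSchema);
```

A check like this is optional, but a mistyped scope is an easy mistake to make when writing a form by hand, so it can be worth running before building the customization.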

GitHub: https://github.com/Jirka-Mayer/doc-marker

 

Screenshot: