Prague Markup Language (PML)


PML is a generic XML-based data format for storing linguistically annotated data, such as the Prague Dependency Treebank, as well as annotation lexicons and similar resources.



PML-related tools

pml_schema.rng - Relax NG grammar (XML) for PML schema files (contains some embedded Schematron rules)

pml_simplify - a tool converting modular PML schemata to simple PML schemata for easier processing (Perl)

pml_simplify.xsl - a newer implementation of pml_simplify in XSLT 2.0

pml2rng.xsl - XSLT stylesheet generating a Relax NG grammar from a simple PML schema

pml_common.rng - common Relax NG file included by all Relax NG grammars generated by pml2rng.xsl

validate_pml - a Perl script automating validation of PML instances based on Relax NG (uses pml_simplify and the XSLT stylesheet listed above).

validate_pml_stream - like validate_pml, but better suited to huge files; requires Jing, or Sun's Multi-Schema Validator together with the msv shell script wrapper
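
Before an instance can be validated with the tools listed above, the PML schema it refers to has to be located. The following is a minimal sketch of that lookup in Python, not the method used by validate_pml itself. It assumes that a PML instance lives in the namespace http://ufal.mff.cuni.cz/pdt/pml/ and names its schema in a head/schema element with an href attribute; neither detail is stated on this page. The sample document, the file name example_schema.xml and the function schema_href are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of PML instances (not stated on this page).
PML_NS = {"pml": "http://ufal.mff.cuni.cz/pdt/pml/"}

# Hypothetical miniature instance, for illustration only.
SAMPLE = """<example xmlns="http://ufal.mff.cuni.cz/pdt/pml/">
  <head>
    <schema href="example_schema.xml"/>
  </head>
</example>
"""


def schema_href(xml_text):
    """Return the href of the head/schema element, or None if it is absent."""
    root = ET.fromstring(xml_text)
    # Look for <head><schema href="..."/></head> directly under the root.
    node = root.find("pml:head/pml:schema", PML_NS)
    return None if node is None else node.get("href")


if __name__ == "__main__":
    print(schema_href(SAMPLE))
```

The returned value is the schema reference exactly as written in the instance; resolving it against the location of the instance file is left out of the sketch.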

Querying over PML

PML-TQ - PML Tree Query Language and Engine


Perl libraries for loading and writing PML are included in the TrEd distribution, but can also be obtained separately using SVN:

svn co svn://
svn co svn://
svn co svn://
They are organized as follows:

Other tools

Sun Multi-Schema XML Validator (MSV) contains a validator of Relax NG grammars with embedded Schematron rules.

TrEd - tree editor with built-in PML support; the distribution also contains PML-related libraries for Perl

old2pml.btred - btred macro converting PDT 1.0 files to PML

adata2csts.btred - btred macro converting a-layer (and lower layers) from PML to CSTS

mdata2csts.xsl - XSLT stylesheet transforming m-layer (and w-layer) from PML to CSTS

conll2pml - conversion script from CoNLL-X format to PML

Up-to-date PML schemas for PDT 2.0 annotation






The development of PML is part of the project "Integration of language resources for information extraction from natural texts" (Information Society, Grant Agency of the Academy of Sciences of the Czech Republic, grant 1ET101120503).

Petr Pajas, 2006