Prague Markup Language (PML)

The Prague Markup Language (PML) is a generic XML-based data format built on abstract data types and intended primarily for the interchange of linguistic annotations. It is independent of any particular annotation schema. It can capture simple linear annotations as well as annotations with one or more richly structured, interconnected layers, including dependency and constituency trees.

A concrete PML-based format for a specific annotation is defined by describing the data layout and XML vocabulary in a special file called a PML schema. Individual data files (instances) then refer to this schema file. The schema can be used to validate the instances. Applications also use it to "understand" the structure of the data and to choose a suitable in-memory representation. Because PML is generic, data from other formats can be converted to it without loss of information.
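The link between an instance and its schema can be illustrated with a short Python sketch. It is not taken from any PML tool. It assumes that an instance names its schema in a head/schema element with an href attribute, and that instances use the namespace shown in the code; both should be checked against the PML specification. The root element name, the schema file name, and the sentence/w payload are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for PML instance files; verify against the PML specification.
PML_NS = "http://ufal.mff.cuni.cz/pdt/pml/"

# A made-up, minimal instance. The root element name, the schema file name
# and the sentence/w payload are hypothetical and only illustrate the idea.
INSTANCE = """<?xml version="1.0" encoding="utf-8"?>
<example_data xmlns="http://ufal.mff.cuni.cz/pdt/pml/">
  <head>
    <schema href="example_schema.xml"/>
  </head>
  <sentence id="s1">
    <w id="s1w1">Hello</w>
    <w id="s1w2">world</w>
  </sentence>
</example_data>
"""


def schema_href(xml_text):
    """Return the schema file an instance refers to, or None if it names none."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # Look for <head><schema href="..."/></head> directly under the root element.
    node = root.find(f"{{{PML_NS}}}head/{{{PML_NS}}}schema")
    return None if node is None else node.get("href")


if __name__ == "__main__":
    print(schema_href(INSTANCE))
```

An application would typically load the referenced schema first and only then interpret the rest of the instance, since the schema is what tells it how the data is laid out.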

PML was developed at the Institute of Formal and Applied Linguistics of Charles University in Prague. It was first used in the Prague Dependency Treebank 2.0 and has been used in many other treebanks since. Conversion tools for various existing treebank formats are also available.