The Linguistic Annotation Workshop (The LAW)

A Merger of NLPXML 2007 and FLAC 2007

Linguistically annotated corpora play a major role in parsing, information extraction, question answering, machine translation and many other areas of computational linguistics, and provide an empirical testbed for theoretical linguistics research. This has led to a proliferation of annotation systems, frameworks, formats, and schemes. The need to harmonize annotation practices and frameworks has become increasingly critical, as witnessed by numerous workshops dealing with different aspects of linguistic annotation over the past few years.

The Linguistic Annotation Workshop (The LAW) will provide the first single forum for consideration of these different aspects by merging NLPXML: Natural Language Processing and XML and FLAC: Frontiers in Linguistically Annotated Corpora, which is itself a merger between Linguistically Interpreted Corpora (LINC) and Frontiers in Corpus Annotation (FCA). In total, the LAW will be the convergence of 14 previous workshops (5 NLPXML, 1 FLAC, 6 LINC and 2 FCA).

The goals of this workshop include:

(1) The exchange and propagation of research results with respect to the annotation, manipulation and exploitation of corpora, taking into account different applications and theoretical investigations in the field of language technology and research;
(2) Working towards harmonization and interoperability among the increasingly large number of tools and frameworks that support the creation, instantiation, manipulation, querying, and exploitation of annotated resources;
(3) Working towards a consensus on all issues crucial to the advancement of the field of corpus annotation.

The workshop will include presentations of long (8-page) and short (4-page) papers, demonstrations of annotation tools and invited presentations by "working groups", as discussed here, followed by an open discussion. Long papers should reflect work in an advanced state, but short papers may describe more preliminary work and pilot studies. Paper topics may cover any aspect of linguistic annotation, including:

  1. New and innovative annotation schemes
  2. Machine learning and knowledge-based methods for automation of corpus annotation
  3. Linguistic considerations for merging of annotation of distinct phenomena
  4. Comparison of annotation schemes
  5. Evaluation considerations for corpus annotation
  6. Comparison and/or evaluation of existing annotation systems, including functionality, common/missing features, accommodation of different input/output formats and resource types (lexicons, knowledge bases, ontologies, etc.)
  7. Creation, maintenance, and interactive exploration of annotation structures and annotated data
  8. Representation formats/structures for merged annotations of different phenomena, and means to explore/manipulate them
  9. Assessment of, and potential means to achieve, interoperability of annotation formats/frameworks among different systems as well as different tasks, frameworks, modalities, and languages

The workshop will also include a one-hour demonstration session for annotation systems and tools. Proposals for system demonstrations should follow the short paper submission format. The proposal should provide an overview of the system to be demonstrated, including functionality, supported input/output formats or structures, supported languages and modalities, etc. Accepted proposals will appear in the proceedings and are intended to provide background for the demonstration.

We will also give an Innovative Student Annotation Award to one student presenter; please indicate if your paper is a student paper. The award includes a waiver of the workshop fee for one student.

Target Audience

The workshop is aimed at those interested in creating and using annotated corpora and other language resources, both existing and future. This includes annotators, lexicographers, system developers, and those designing NLP system evaluation tasks for the NLP community.