Our preliminary topics include:
(a) selection of diverse or balanced corpora with few licensing
restrictions for common annotation by the community. Possible corpora
include the "open" portion of the American National Corpus and
Wikipedia XML, a freely available, cleaned-up corpus derived from
Wikipedia;
(b) approaches to discourse coherence, especially as it emerges from
the interaction of different annotation layers, and their applications
to computational linguistics;
(c) annotation systems/frameworks and interoperability, including
the feasibility of applying a common annotation framework to various
annotation types, language processing tasks, modalities, and
languages, especially insofar as such a framework could enable merging
annotations of diverse phenomena produced by different systems.
We will attempt to lay out clearly and precisely the assumptions that members of the annotation community hold on these topics. In doing so, we hope both (1) to lay the foundations for the meaningful integration of annotation resources and (2) to assess the limitations of integrated approaches.