CAPEK

Čapek: an annotation editor for schoolchildren

We have been developing Čapek, an annotation editor tailored to schoolchildren. It lets them practice morphology and syntax much as they normally do at (a Czech) school with paper and pencil. The editor is designed to be language independent and is an extension of the STYX system, an electronic exercise book of Czech morphology and syntax.

Details on Čapek

Tokenization: Tokenizing a sentence is, in general, quite a complex task. At school, children split sentences into tokens more or less intuitively and without hesitation, mainly because the sentences they work with are usually quite simple. We therefore decided not to force users to tokenize sentences manually; instead, the system tokenizes them automatically.
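To illustrate why simple school sentences are easy to tokenize automatically, here is a minimal sketch in Python. It is not Čapek's actual tokenizer, which is not described here; the name tokenize and the rule "a run of letters or digits is a token, each punctuation mark is a token" are assumptions made for illustration only.

```python
import re

# Assumed rule: a token is either a run of word characters (letters, digits,
# underscore) or a single character that is neither a word character nor
# whitespace. In Python 3, \w matches Unicode letters such as "Č".
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)


if __name__ == "__main__":
    print(tokenize("Yellow kingcups blossomed out near by stream."))
```

A rule this simple breaks down on abbreviations, numbers with decimal points and similar cases, which is why tokenization is a complex task in general.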
Practicing morphology: After tokenization, the sentence is treated as a sequence of tokens. The user analyzes each token morphologically using a set of combo boxes or context menus. The set of part-of-speech classes and morphological categories can be configured for any language simply by providing a configuration file that corresponds to the given annotation scheme.
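The idea of a configurable annotation scheme can be sketched as follows. The JSON layout, the category names and the functions load_scheme and check_annotation are all hypothetical; the format of Čapek's real configuration file is not specified here.

```python
import json

# Hypothetical configuration: each part of speech maps its morphological
# categories to the values a user may choose (e.g. in a combo box).
CONFIG_TEXT = """
{
  "noun": {"case": ["nominative", "genitive", "accusative"],
           "number": ["singular", "plural"]},
  "verb": {"person": ["1", "2", "3"],
           "number": ["singular", "plural"]}
}
"""


def load_scheme(text):
    """Parse the (assumed) JSON configuration into a dictionary."""
    return json.loads(text)


def check_annotation(scheme, pos, features):
    """Return a list of problems; an empty list means the annotation fits."""
    if pos not in scheme:
        return [f"unknown part of speech: {pos}"]
    allowed = scheme[pos]
    problems = []
    for category, value in features.items():
        if category not in allowed:
            problems.append(f"{pos} has no category {category}")
        elif value not in allowed[category]:
            problems.append(f"{value} is not a valid {category}")
    return problems
```

Under this assumption, supporting another language means supplying a different configuration text; the checking code stays the same.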
Practicing syntax: By default, the editor supports practicing dependency-based syntax; a module for practicing phrase-based syntax can be added. Čapek lets the user manipulate nodes through the following drag-and-drop operations:
1. making a node dependent on another node,
2. labeling a node with a syntactic function (as in the case of the morphological analysis, a set of syntactic functions is configurable for any language simply by providing a configuration file corresponding to the given annotation scheme),
3. joining the nodes,
4. splitting a node.
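The four operations above can be sketched on a small in-memory tree. This is an illustration, not Čapek's implementation: the Node class, the function names, and in particular the exact semantics of joining and splitting (forms are concatenated with a space; a split cuts the form at a character position) are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity so list.remove() finds the exact node
class Node:
    form: str
    parent: Optional["Node"] = None
    function: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def detach(node):
    """Remove a node from its current head, if it has one."""
    if node.parent is not None:
        node.parent.children.remove(node)
        node.parent = None


def attach(child, head):
    """1. Make `child` dependent on `head`, refusing moves that create a cycle."""
    ancestor = head
    while ancestor is not None:
        if ancestor is child:
            raise ValueError("a node cannot depend on itself or its descendant")
        ancestor = ancestor.parent
    detach(child)
    child.parent = head
    head.children.append(child)


def label(node, function, allowed):
    """2. Label a node with a syntactic function from the configured set."""
    if function not in allowed:
        raise ValueError(f"unknown syntactic function: {function}")
    node.function = function


def join(first, second):
    """3. Merge `second` into `first` (assumed semantics)."""
    ancestor = first
    while ancestor is not None:
        if ancestor is second:
            raise ValueError("cannot join a node with itself or into its descendant")
        ancestor = ancestor.parent
    for child in list(second.children):  # copy: attach() mutates the list
        attach(child, first)
    detach(second)
    first.form = f"{first.form} {second.form}"


def split(node, index):
    """4. Cut the form at `index`; the right part becomes a sibling node."""
    if not 0 < index < len(node.form):
        raise ValueError("split position must fall inside the form")
    right = Node(node.form[index:])
    node.form = node.form[:index]
    if node.parent is not None:
        attach(right, node.parent)
    return right
```

The cycle check in attach is the one piece any drag-and-drop tree editor needs in some form, since dropping a head onto its own dependent would otherwise leave a structure that is no longer a tree.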
The editor is named after Czech writer Karel Čapek.
We are implementing Čapek for iOS; see degustation.

Download

Unpack the archive.
Run capek\bin\capek.exe on MS Windows; run capek/bin/capek on Linux.

What we are interested in

In the Czech Republic, children at both the elementary and high school levels are required, as a very important part of their Czech language curriculum, to parse sentences into dependency trees (see the analysis of "Yellow kingcups blossomed out near by stream." below).

We would like to find out whether children in your country face similar requirements, that is, whether we could broaden our perspective to other languages or have to limit ourselves to Czech. Could you please help us with the following questions?

A case study

We use the editor to explore a new way of obtaining more morphologically and syntactically annotated data: transforming schoolchildren's annotations into a more linguistically appropriate schema. We have been testing this approach on Czech and the Prague Dependency Treebank annotation schema.

CONTACT styx@ufal.mff.cuni.cz

Jirka Hana, Barbora Hladká, Marie Konárová
Institute of Formal and Applied Linguistics, Charles University in Prague