s

A sentence. Sentence boundaries are identified at tokenization time, unless they are marked in the source, which is almost never the case. The algorithm for sentence boundary identification used in the CNC is very rudimentary: it is correct only about 95-98% of the time for general texts, and its accuracy depends very heavily on the type of text.

Sentences are identified uniquely within the CNC corpus (as they should be in any corpus). The identification consists of the

The full sentence identification is typically recorded at each sentence in the data, in the id attribute.

Content


ATTRIBUTES
CONTENT DECLARATION

Tag Minimization
Open Tag: REQUIRED
Close Tag: OPTIONAL
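
Because the close tag is optional, a consumer of the data cannot rely on finding </s>; the next <s> open tag also ends the current sentence. The following is a minimal Python sketch of that reading rule, not the tooling used for the CNC. The function name split_sentences, the double-quoted id syntax, and the id values "x-1" and "x-2" are hypothetical placeholders chosen for illustration; they do not reflect the real CNC identification format. The sketch also ignores any markup nested inside the sentence and the close tag of the parent element, which a real SGML parser would handle.

```python
import re

# Matches an <s ...> open tag and captures its id attribute value.
# Assumption: the id attribute is present and double-quoted.
S_OPEN = re.compile(r'<s\s+id="([^"]*)"\s*>', re.IGNORECASE)
# Matches an explicit </s> close tag, which may or may not be present.
S_CLOSE = re.compile(r'</s\s*>', re.IGNORECASE)


def split_sentences(markup):
    """Return a list of (id, text) pairs, one per s element in markup."""
    opens = list(S_OPEN.finditer(markup))
    result = []
    for i, match in enumerate(opens):
        # The element runs until the next <s ...> open tag or the end of input.
        end = opens[i + 1].start() if i + 1 < len(opens) else len(markup)
        body = markup[match.end():end]
        # If an explicit close tag is present, cut the body there.
        close = S_CLOSE.search(body)
        if close:
            body = body[:close.start()]
        result.append((match.group(1), body.strip()))
    return result


# Hypothetical sample: the first sentence omits its close tag, the second has one.
sample = '<s id="x-1">First sentence.\n<s id="x-2">Second sentence.</s>\n'
pairs = split_sentences(sample)
```

Both sentences are recovered the same way whether or not the close tag was written, which is the practical consequence of the minimization rule above.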

Parent Elements




csts DTD