Language Technologies for Research in Humanities


NPFL131 / ATKL00349

Pavel Straňák

stranak@ufal.mff.cuni.cz

Friday 12:30–14:00
Palachovo nám. 2, room S131

21. 4. 2023

Language Corpora

Corpus: a large collection of texts, processed in a uniform way for searching and statistics.

Major Corpus Managers 1/2

All the following corpus managers use the same query language for advanced queries: CQP / CQL. It is quite useful to learn some basics of CQL.

Major Corpus Managers 2/2

There are many other corpus interfaces, each with its own corpora, but the Kontext and TEITOK corpus managers are among the best in the world.

Corpora of Chinese 1/2

Corpora of Chinese 2/2

At LINDAT we currently host Universal Dependencies and a few treebanks outside it, such as the Penn Chinese Treebank. We are happy to host other corpora; just get in touch.

Unix shell – for loop

A for loop repeats some processing for each item in a list, for example processing a number of files in the same way. To begin with, we will just print some file names to the screen in a few different ways.
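
A minimal sketch of what this could look like, not necessarily the exact commands used in class. The scratch directory and the file names (alpha.txt, beta.txt, gamma.txt) are made up for illustration; in practice you would loop over your own files.

```shell
# Assumption: we create a throwaway directory with three empty sample files,
# so the example does not depend on any files you already have.
workdir=$(mktemp -d)
touch "$workdir/alpha.txt" "$workdir/beta.txt" "$workdir/gamma.txt"

# 1) Loop over an explicit list of words.
for name in alpha beta gamma; do
    echo "item: $name"
done

# 2) Loop over all files matching a pattern and print their full paths.
for file in "$workdir"/*.txt; do
    echo "full path: $file"
done

# 3) Print only the file name, without the directory part.
for file in "$workdir"/*.txt; do
    basename "$file"
done

# Clean up the sample files.
rm -r "$workdir"
```

The quotes around "$file" matter once file names contain spaces; without them the shell would split such a name into several words.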