Corpora

	Title	Type
	Automated Speech Scoring in Czech	Project
	Bengali Visual Genome	Project
	CorefUD	Project
	Czech Academic Corpus	Project
	Czech Legal Text Treebank	Project
	Czech Malach Cross-lingual Speech Retrieval Test Collection	Project
	Czech Named Entity Corpus	Project
	Czech RST Discourse Treebank 1.0	Project
	CzeDLex - A Lexicon of Czech Discourse Connectives	Project
	CzEng	Project
	CzEngVallex - Czech and English verbal valency	Project
	CzeSL	Project
	Deep Universal Dependencies	Project
	Deltacorpus	Project
	ELITR Minuting Corpus	Project
	EngVallex - English valency lexicon linked to corpora	Project
	European Language Grid	Project
	EUROSAI Corpus	Project
	EVALD 3.0 (Evaluator of Discourse)	Project
	HamleDT	Project
	Hausa Visual Question Answering Dataset	Project
	HindEnCorp	Project
	Hindi Visual Genome	Project
	Implicit relations in text coherence	Project
	Interset	Project
	Lindat KonText	Project
	Malayalam Visual Genome	Project
	Medieval Charter Sections Corpus	Project
	Methods for rapid discourse annotation in selected corpora	Project
	Modeling of Complexity in Czech Literary Texts	Project
	MorfFlex CZ	Project
	Multilingual Corpus Annotation as a Support for Language Technologies	Project
	NomVallex: Valency Lexicon of Czech Nouns and Adjectives	Project
	Odia Visual Genome	Project
	OdiEnCorp	Project
	OVQA	Project
	ParCzech	Project
	PARSEME	Project
	PAWS (Parallel Anaphoric Wall Street Journal)	Project
	PDT-C	Project
	PDT-Vallex: Valency Lexicon Linked to Czech Corpora	Project
	PDTSC 2.0	Project
	PML-Tree Query	Project
	Prague Czech-English Dependency Treebank	Project
	Prague Czech-English Dependency Treebank 2.0 Coref	Project
	Prague Czech-English Dependency Treebank 3.0	Project
	Prague Database of Spoken Language 1.0	Project
	Prague Dependency Treebank	Project
	Prague Dependency Treebank 3.0	Project
	Prague Dependency Treebank 3.5	Project
	Prague Discourse Treebank 1.0	Project
	Prague Discourse Treebank 2.0	Project
	Prague Discourse Treebank 3.0	Project
	Prague Discourse Treebank 4.0	Project
	Prague English Dependency Treebank	Project
	Prague Markup Language (PML)	Project
	PRAVIDCO	Project
	QT21	Project
	ROMi 1.0	Project
	Semantic Pattern Recognition	Project
	Sentiment Analysis in Czech	Project
	Shallow discourse parsing in Czech	Project
	Slovakoczech NLP workshop	Project
	SumeCzech	Project
	SynSemClass (formerly CzEngClass)	Project
	UFAL Medical Corpus	Project
	UFAL Parallel Corpus of North Levantine	Project
	UniDive	Project
	Uniform Meaning Representation for Czech	Project
	Uniform Meaning Representation for Latin	Project
	Universal Dependencies	Project
	UrMonoCorp	Project
	VPS-30-En: Verb Pattern Sample - 30 English	Project
	VPS-GradeUp	Project
	W2C	Project
	Working with the Penn Discourse Treebank	Project
	Working with the RST-DT and the RST-SC	Project
	A comparison of Czech and English verbal valency based on corpus material (theory and practice)	Grant
	A data-intensive study of word-formation and inflection of nouns and adjectives in four European languages	Grant
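	Asistent přístupné úřední komunikace [Assistant for Accessible Official Communication]	Grant
	Automatická analýza diskurzních vztahů v češtině [Automatic Analysis of Discourse Relations in Czech]	Grant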
	Automatické hodnocení mluveného projevu v češtině [Automated Speech Scoring in Czech]	Grant
	Centre for Language Research Infrastructure in the Czech Republic	Grant
	Čeština ve věku strojového překladu [Czech in the Age of Machine Translation]	Grant
	Common Language Resources and their Applications - a Marie Curie ITN	Grant
	Complexity of inflection and word-formation: An intra- and cross-linguistic perspective	Grant
	Computational Literary Studies Infrastructure	Grant
	Computational Models of Competition in Natural Languages	Grant
	Contextually-based synonymy and valency of verbs in a bilingual setting	Grant
	Coreference, Discourse Relations and Information Structure in a Contrastive Perspective	Grant
	Corpus-based Valency Lexicon of Czech Nouns	Grant
	Cross-lingual approaches to coreference resolution	Grant
	Deep Syntactic Representation across Languages	Grant
	Development of statistical methods for spoken dialogue systems	Grant
	Digital Analysis of Chant Transmission	Grant
	Empowering Healthcare with Large Language Models: Reducing Clinicians' Workload and Improving Stroke Patient Care	Grant
	Epistemic and Evidential Markers in Czech	Grant
	Establishing and operating the Czech node of pan-European infrastructure for research (Vybudování a provoz českého uzlu pan-evropské infrastruktury pro výzkum)	Grant
	EuroMatrix	Grant
	European Language Grid	Grant
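	Explicitní popis jazyka a anotovaná data se zřetelem na češtinu [Explicit Description of Language and Annotated Data with a Focus on Czech]	Grant
	Generování české poezie v edukačním a multimediálním prostředí [Generating Czech Poetry in Educational and Multimedia Environments]	Grant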
	Global Coherence of Czech Texts in the Corpus-Based Perspective	Grant
	High Performance Language Technologies	Grant
	Implicit Relations in Text Coherence	Grant
	LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	Grant
	LINDAT/CLARIN - Research infrastructure for language technologies – extension of the repository and its computational power	Grant
	Linguistic Factors of Readability in Czech Administrative and Educational Texts	Grant
	Merlin	Grant
	Metody pro rychlou diskurzní anotaci ve vybraných korpusech [Methods for Rapid Discourse Annotation in Selected Corpora]	Grant
	Modelling dependency syntax across languages	Grant
	Modelování komplexity českých literárních textů [Modeling of Complexity in Czech Literary Texts]	Grant
	Morphological and lexical analysis of internationalisms in five languages	Grant
	Morphological complexity of the verbal lexicon in four languages: Quantitative research based on corpus data	Grant
	Morphologically and Syntactically Annotated Corpora of Many Languages	Grant
	Multilingual Corpus Annotation as a Support for Language Technologies	Grant
	Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives	Grant
	Národní centrum umělé inteligence [National Centre for Artificial Intelligence]	Grant
	On Linguistic Structure of Evaluative Meaning in Czech	Grant
	OPJAK LINDAT/CLARIAH-CZ Přístrojové vybavení [OPJAK LINDAT/CLARIAH-CZ Equipment]	Grant
	Reviving Zellig S. Harris: More linguistic information for distributional lexical analysis of English and Czech	Grant
	Sentence-Level Polarity Detection in a Computer Corpus	Grant
	Strojový překlad se sémantickou informací [Machine Translation with Semantic Information]	Grant
	Structure of coreferential chains in parallel language data	Grant
	Subcategorization of adverbial meanings based on corpus data	Grant
	TextLink: Skladba diskurzu v evropských jazycích [TextLink: Discourse Structure in European Languages]	Grant
	TextLink: Structuring Discourse in Multilingual Europe	Grant
	Tools and data for Machine Translation between Related Languages	Grant
	Towards a Computational Analysis of Text Structure	Grant
	Transatlantic Collaboration between LAPPS and CLARIN: Semantic, Technical and Infrastructural Interoperability of Services	Grant
	Uniform Meaning Representation (UMR)	Grant
	Uniform Meaning Representation for a low-resource language (Persian)	Grant
	Universal morphosyntactic annotation of language data	Grant
	Valency of Non-verbal Predicates. An Extension of Valency Studies to Adjectives and Deadjectival Nouns.	Grant
	Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns	Grant
	Zpřístupnění a obohacení knihovních sbírek pro digitální humanitní vědy a jazykový výzkum [Making Library Collections Accessible and Enriched for Digital Humanities and Language Research]	Grant
	ForFun 1.0	Tool
	Netgraph	Tool
	PML-TQ	Tool

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics