Grants

Dialog
	Duration	Provider
EDU-AI: AI asistent pro žáky a učitele	04/2021-12/2023	TAČR
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
ECSS: Evaluation of conversational speech synthesis	2022-2024	GAUK
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals.	2020-2023	H2020
NG-NLG: Next-Generation Natural Language Generation	2022-2027	Horizon Europe, ERC
CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti	09/2024-12/2029	TAČR

	Duration	Provider
EDU-AI: AI asistent pro žáky a učitele	04/2021-12/2023	TAČR
ECSS: Evaluation of conversational speech synthesis	2022-2024	GAUK
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support	2021-2024	TAČR
NG-NLG: Next-Generation Natural Language Generation	2022-2027	Horizon Europe, ERC
CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti	09/2024-12/2029	TAČR
The Anthropology of Artificial Intelligence: Ethics, Understanding, Human Nature	2023-2024	ETF UK

Information Retrieval
	Duration	Provider
EDU-AI: AI asistent pro žáky a učitele	04/2021-12/2023	TAČR
CEDMO 2.0 NPO	1.9. 2024 - 30. 4. 2026	MPO
DACT: Digital Analysis of Chant Transmission	2023-2029	Social Sciences and Humanities Research Council of Canada
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support	2021-2024	TAČR
CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti	09/2024-12/2029	TAČR

Annotations
	Duration	Provider
A data-based approach to competition in word-formation: selected semantic categories across seven languages	2021-2023	START
PONK: Asistent přístupné úřední komunikace	9/2023-12/2025	TAČR
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
CLS Infra: Computational Literary Studies Infrastructure	2021-2025	H2020
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior	2025-2027	GAČR
HVar: Disagreement in corpus annotation and variation of human understanding of text	2024-2026	GAČR
SEEM-CZ: Epistemic and Evidential Markers in Czech	2023-2025	GAČR
ForFun2: ForFun2: Functions and Forms of Circumstantial Modifications	2023-2025	GAČR
EduPo: Generování české poezie v edukačním a multimediálním prostředí	09/2023 - 11/2026	TAČR
Global Coherence: Global Coherence of Czech Texts in the Corpus-Based Perspective	2020 - 2023	GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
OP VVV LINDAT: LINDAT/CLARIN - Research infrastructure for language technologies – extension of the repository and its computational power	2017–2019	MŠMT - OP VVV
RapiDisc: Metody pro rychlou diskurzní anotaci ve vybraných korpusech	2022-2024	GAČR
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives	2024 - 2029	UK
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals.	2020-2023	H2020
OmniOMR: OmniOMR - optical music recognition using machine learning for digital libraries	2023-2027	NAKI
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns	2022-2024	GAČR

Data
	Duration	Provider
A data-based approach to competition in word-formation: selected semantic categories across seven languages	2021-2023	START
Adapting Uniform Meaning Representation (UMR) for the Italic/Romance languages	2024-2026	GAUK
Automatické hodnocení mluveného projevu v češtině [Automated Speech Scoring in Czech]	2023–2027	NAKI
CEDMO 2.0 NPO	1.9. 2024 - 30. 4. 2026	MPO
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
CLS Infra: Computational Literary Studies Infrastructure	2021-2025	H2020
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior	2025-2027	GAČR
DACT: Digital Analysis of Chant Transmission	2023-2029	Social Sciences and Humanities Research Council of Canada
HVar: Disagreement in corpus annotation and variation of human understanding of text	2024-2026	GAČR
SEEM-CZ: Epistemic and Evidential Markers in Czech	2023-2025	GAČR
ECSS: Evaluation of conversational speech synthesis	2022-2024	GAUK
Global Coherence: Global Coherence of Czech Texts in the Corpus-Based Perspective	2020 - 2023	GAČR
HPLT: High Performance Language Technologies	2022-2025	HE
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
OP VVV LINDAT: LINDAT/CLARIN - Research infrastructure for language technologies – extension of the repository and its computational power	2017–2019	MŠMT - OP VVV
RapiDisc: Metody pro rychlou diskurzní anotaci ve vybraných korpusech	2022-2024	GAČR
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives	2024 - 2029	UK
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals.	2020-2023	H2020
OmniOMR: OmniOMR - optical music recognition using machine learning for digital libraries	2023-2027	NAKI
EdUKate: Promoting digital education of foreign-language children through machine translation	2023-2026	TAČR
Mashcima: Synthetic training data generation and other methods for handwritten music recognition	2023-2025	GAUK
Uniform Meaning Representation (UMR)	1.3.2023 - 30.9.2027	MŠMT

Lexicons
	Duration	Provider
A data-based approach to competition in word-formation: selected semantic categories across seven languages	2021-2023	START
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior	2025-2027	GAČR
SEEM-CZ: Epistemic and Evidential Markers in Czech	2023-2025	GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
Modeling Morpheme Flow among Languages	Jan 2024 - Dec 2026	GAUK
Uniform Meaning Representation (UMR)	1.3.2023 - 30.9.2027	MŠMT
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns	2022-2024	GAČR

Morphology
	Duration	Provider
A data-based approach to competition in word-formation: selected semantic categories across seven languages	2021-2023	START
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
Compound Identification and Splitting in Four Languages: A Deep Learning Approach	2022-2024	GAUK
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
Modeling Morpheme Flow among Languages	Jan 2024 - Dec 2026	GAUK
Morphological complexity of the verbal lexicon in four languages: Quantitative research based on corpus data	2023-2025	GAUK

Multilingual
	Duration	Provider
A data-based approach to competition in word-formation: selected semantic categories across seven languages	2021-2023	START
Babel Octopus: Robust Multi-Source Speech Translation	2021-2023	START
Better Tokenization for Multilingual Language Models and Machine Translation	2025-2027	GAČR
Compound Identification and Splitting in Four Languages: A Deep Learning Approach	2022-2024	GAUK
CLS Infra: Computational Literary Studies Infrastructure	2021-2025	H2020
Empowering Healthcare with Large Language Models: Reducing Clinicians' Workload and Improving Stroke Patient Care	2025 - 2027	GAUK
General-purpose Language Models for low-resourced languages	2025-2028	GAUK
HPLT: High Performance Language Technologies	2022-2025	HE
Language Neutral and Culturally Aware Multilingual Neural Sentence Representations	2023-2026	UK
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
Modeling Morpheme Flow among Languages	Jan 2024 - Dec 2026	GAUK
LangTech: Modernizace oboru Matematická lingvistika		MŠMT - OP VVV
Morphological complexity of the verbal lexicon in four languages: Quantitative research based on corpus data	2023-2025	GAUK
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives	2024 - 2029	UK
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals.	2020-2023	H2020
NEUREM3: Neuronové reprezentace v multimodálním a mnohojazyčném modelování (Neural Representations in Multi-modal and Multi-lingual Modelling)	2019-2023	GAČR
EdUKate: Promoting digital education of foreign-language children through machine translation	2023-2026	TAČR
Uniform Meaning Representation (UMR)	1.3.2023 - 30.9.2027	MŠMT

Semantics
	Duration	Provider
A data-based approach to competition in word-formation: selected semantic categories across seven languages	2021-2023	START
Adapting Uniform Meaning Representation (UMR) for the Italic/Romance languages	2024-2026	GAUK
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
CSCCL: Comprehensibility and Semantic Consistency of Czech Legislation	2025-2027	GAUK
CLS Infra: Computational Literary Studies Infrastructure	2021-2025	H2020
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior	2025-2027	GAČR
HVar: Disagreement in corpus annotation and variation of human understanding of text	2024-2026	GAČR
SEEM-CZ: Epistemic and Evidential Markers in Czech	2023-2025	GAČR
ForFun2: ForFun2: Functions and Forms of Circumstantial Modifications	2023-2025	GAČR
Global Coherence: Global Coherence of Czech Texts in the Corpus-Based Perspective	2020 - 2023	GAČR
LUSyD: Language Understanding: from Syntax to Discourse	2020–2024	GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support	2021-2024	TAČR
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals.	2020-2023	H2020
NG-NLG: Next-Generation Natural Language Generation	2022-2027	Horizon Europe, ERC
Uniform Meaning Representation (UMR)	1.3.2023 - 30.9.2027	MŠMT
Uniform Meaning Representation for a low-resource language (Persian)	2025-2027	GAUK
Using Auxiliary Subtasks for Learning Constraints in NLP	2023-2025	GAUK

	Duration	Provider
ATRIUM: Advancing FronTier Research In the Arts and hUManities	2024 - 2027	HE
HumanAId: AI zaměřená na člověka pro udržitelnou a adaptabilní společnost	1. 3. 2025 - 31. 12. 2028	MŠMT - OP JAK
AIAI: AI: Authorship and Interpretation	2025-2027	GAČR
CEDMO 2.0 EU: Central European Digital Media Observatory 2.0	1.1.2024-31.10.2026	EC Digital Europe Programme (DIGITAL)
RES-Q Plus: Comprehensive solutions of healthcare improvement based on the global Registry of Stroke Care Quality	2022-2026	HE
ELE 2: European Language Equality 2	2022-2023	PPPA (EU)
EVERSE: European Virtual Institute for Research Software Excellence	2024-2027	HE
HumanE-AI-Net: HumanE AI Network	1. 9. 2020 - 31. 8. 2024	H2020
Identification and Prevention of Unwanted Gender Bias in Neural Language Models	2023-2024	GAČR
Improving stomach examinations with Artificial Intelligence: A deep learning approach for assisted gastroscopy	1. 7. 2024 - 31. 12. 2026	MŠMT
InCroMin: Interactive Crosslingual Minutes	2024	HE
Jazykověda, umělá inteligence a jazykové a řečové technologie: od výzkumu k aplikacím	1. 1. 2025 - 31. 12. 2028	MŠMT - OP JAK
Methods for improving neural machine translation of diverse texts	2023-2025	GAUK
OpenEuroLLM: Open European Family of Large Language Models	36 months	Digital Europe Programme
MEMORISE: Virtualisation and Multimodal Exploration of Heritage on Nazi Persecution	2022-2026	HE

Corpora
	Duration	Provider
PONK: Asistent přístupné úřední komunikace	9/2023-12/2025	TAČR
Automatické hodnocení mluveného projevu v češtině [Automated Speech Scoring in Czech]	2023–2027	NAKI
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
CLS Infra: Computational Literary Studies Infrastructure	2021-2025	H2020
DACT: Digital Analysis of Chant Transmission	2023-2029	Social Sciences and Humanities Research Council of Canada
Empowering Healthcare with Large Language Models: Reducing Clinicians' Workload and Improving Stroke Patient Care	2025 - 2027	GAUK
SEEM-CZ: Epistemic and Evidential Markers in Czech	2023-2025	GAČR
EduPo: Generování české poezie v edukačním a multimediálním prostředí	09/2023 - 11/2026	TAČR
Global Coherence: Global Coherence of Czech Texts in the Corpus-Based Perspective	2020 - 2023	GAČR
HPLT: High Performance Language Technologies	2022-2025	HE
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
OP VVV LINDAT: LINDAT/CLARIN - Research infrastructure for language technologies – extension of the repository and its computational power	2017–2019	MŠMT - OP VVV
RapiDisc: Metody pro rychlou diskurzní anotaci ve vybraných korpusech	2022-2024	GAČR
Morphological complexity of the verbal lexicon in four languages: Quantitative research based on corpus data	2023-2025	GAUK
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives	2024 - 2029	UK
Uniform Meaning Representation (UMR)	1.3.2023 - 30.9.2027	MŠMT
Uniform Meaning Representation for a low-resource language (Persian)	2025-2027	GAUK
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns	2022-2024	GAČR

Machine Learning
	Duration	Provider
PONK: Asistent přístupné úřední komunikace	9/2023-12/2025	TAČR
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
Compound Identification and Splitting in Four Languages: A Deep Learning Approach	2022-2024	GAUK
DACT: Digital Analysis of Chant Transmission	2023-2029	Social Sciences and Humanities Research Council of Canada
Empowering Healthcare with Large Language Models: Reducing Clinicians' Workload and Improving Stroke Patient Care	2025 - 2027	GAUK
ECSS: Evaluation of conversational speech synthesis	2022-2024	GAUK
General-purpose Language Models for low-resourced languages	2025-2028	GAUK
HPLT: High Performance Language Technologies	2022-2025	HE
Language Neutral and Culturally Aware Multilingual Neural Sentence Representations	2023-2026	UK
LUSyD: Language Understanding: from Syntax to Discourse	2020–2024	GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
LangTech: Modernizace oboru Matematická lingvistika		MŠMT - OP VVV
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support	2021-2024	TAČR
NEUREM3: Neuronové reprezentace v multimodálním a mnohojazyčném modelování (Neural Representations in Multi-modal and Multi-lingual Modelling)	2019-2023	GAČR
NG-NLG: Next-Generation Natural Language Generation	2022-2027	Horizon Europe, ERC
OmniOMR: OmniOMR - optical music recognition using machine learning for digital libraries	2023-2027	NAKI
CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti	09/2024-12/2029	TAČR
Reliable and Explainable Large Language Models for Text Generation	2025-2027	GAUK
Mashcima: Synthetic training data generation and other methods for handwritten music recognition	2023-2025	GAUK
Using Auxiliary Subtasks for Learning Constraints in NLP	2023-2025	GAUK

Discourse
	Duration	Provider
Automatické hodnocení mluveného projevu v češtině [Automated Speech Scoring in Czech]	2023–2027	NAKI
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
Global Coherence: Global Coherence of Czech Texts in the Corpus-Based Perspective	2020 - 2023	GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
RapiDisc: Metody pro rychlou diskurzní anotaci ve vybraných korpusech	2022-2024	GAČR
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives	2024 - 2029	UK

Monolingual
	Duration	Provider
Automatické hodnocení mluveného projevu v češtině [Automated Speech Scoring in Czech]	2023–2027	NAKI
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior	2025-2027	GAČR
EduPo: Generování české poezie v edukačním a multimediálním prostředí	09/2023 - 11/2026	TAČR
HPLT: High Performance Language Technologies	2022-2025	HE
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti	09/2024-12/2029	TAČR
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns	2022-2024	GAČR

Tools
	Duration	Provider
Automatické hodnocení mluveného projevu v češtině [Automated Speech Scoring in Czech]	2023–2027	NAKI
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
Compound Identification and Splitting in Four Languages: A Deep Learning Approach	2022-2024	GAUK
CLS Infra: Computational Literary Studies Infrastructure	2021-2025	H2020
DACT: Digital Analysis of Chant Transmission	2023-2029	Social Sciences and Humanities Research Council of Canada
EduPo: Generování české poezie v edukačním a multimediálním prostředí	09/2023 - 11/2026	TAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
OP VVV LINDAT: LINDAT/CLARIN - Research infrastructure for language technologies – extension of the repository and its computational power	2017–2019	MŠMT - OP VVV
Mashcima: Synthetic training data generation and other methods for handwritten music recognition	2023-2025	GAUK
The Anthropology of Artificial Intelligence: Ethics, Understanding, Human Nature	2023-2024	ETF UK

Machine Translation
	Duration	Provider
Babel Octopus: Robust Multi-Source Speech Translation	2021-2023	START
Better Tokenization for Multilingual Language Models and Machine Translation	2025-2027	GAČR
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
HPLT: High Performance Language Technologies	2022-2025	HE
LUSyD: Language Understanding: from Syntax to Discourse	2020–2024	GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support	2021-2024	TAČR
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals.	2020-2023	H2020
EdUKate: Promoting digital education of foreign-language children through machine translation	2023-2026	TAČR
Using Auxiliary Subtasks for Learning Constraints in NLP	2023-2025	GAUK

Speech Recognition
	Duration	Provider
Babel Octopus: Robust Multi-Source Speech Translation	2021-2023	START
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals.	2020-2023	H2020

Information Structure
	Duration	Provider
CEDMO 2.0 NPO	1.9. 2024 - 30. 4. 2026	MPO
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support	2021-2024	TAČR
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives	2024 - 2029	UK

Multi-modality
	Duration	Provider
CEDMO 2.0 NPO	1.9. 2024 - 30. 4. 2026	MPO
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
DACT: Digital Analysis of Chant Transmission	2023-2029	Social Sciences and Humanities Research Council of Canada
Language Neutral and Culturally Aware Multilingual Neural Sentence Representations	2023-2026	UK
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals.	2020-2023	H2020
NEUREM3: Neuronové reprezentace v multimodálním a mnohojazyčném modelování (Neural Representations in Multi-modal and Multi-lingual Modelling)	2019-2023	GAČR
EdUKate: Promoting digital education of foreign-language children through machine translation	2023-2026	TAČR
CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti	09/2024-12/2029	TAČR

Coreference
	Duration	Provider
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
Coreference Resolution and Representation in Deep Universal Dependencies	2025 - 2027	GAUK
LUSyD: Language Understanding: from Syntax to Discourse	2020–2024	GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
Using Auxiliary Subtasks for Learning Constraints in NLP	2023-2025	GAUK

Linked data
	Duration	Provider
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior	2025-2027	GAČR
DACT: Digital Analysis of Chant Transmission	2023-2029	Social Sciences and Humanities Research Council of Canada
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals.	2020-2023	H2020
NG-NLG: Next-Generation Natural Language Generation	2022-2027	Horizon Europe, ERC
Uniform Meaning Representation (UMR)	1.3.2023 - 30.9.2027	MŠMT

Parsers
	Duration	Provider
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
CLS Infra: Computational Literary Studies Infrastructure	2021-2025	H2020
LUSyD: Language Understanding: from Syntax to Discourse	2020–2024	GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
RapiDisc: Metody pro rychlou diskurzní anotaci ve vybraných korpusech	2022-2024	GAČR
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals.	2020-2023	H2020

Publications
	Duration	Provider
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury

Taggers
	Duration	Provider
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
CLS Infra: Computational Literary Studies Infrastructure	2021-2025	H2020
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury

Valency
	Duration	Provider
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior	2025-2027	GAČR
LUSyD: Language Understanding: from Syntax to Discourse	2020–2024	GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
Uniform Meaning Representation (UMR)	1.3.2023 - 30.9.2027	MŠMT
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns	2022-2024	GAČR

Teaching
	Duration	Provider
CLS Infra: Computational Literary Studies Infrastructure	2021-2025	H2020
LCT: European Masters Program Language and Communication Technologies	IX.2007-VIII.2013, IX.2013-VIII.2019, IX.2019-VIII.2025	EU ERASMUS MUNDUS
EduPo: Generování české poezie v edukačním a multimediálním prostředí	09/2023 - 11/2026	TAČR
LangTech: Modernizace oboru Matematická lingvistika		MŠMT - OP VVV

Syntax
	Duration	Provider
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior	2025-2027	GAČR
ForFun2: ForFun2: Functions and Forms of Circumstantial Modifications	2023-2025	GAČR
LUSyD: Language Understanding: from Syntax to Discourse	2020–2024	GAČR
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
Uniform Meaning Representation (UMR)	1.3.2023 - 30.9.2027	MŠMT
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns	2022-2024	GAČR

Psycholinguistics
	Duration	Provider
HVar: Disagreement in corpus annotation and variation of human understanding of text	2024-2026	GAČR

Multiword Expressions
	Duration	Provider
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury
Uniform Meaning Representation (UMR)	1.3.2023 - 30.9.2027	MŠMT

Speech Retrieval
	Duration	Provider
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury

Spellcheckers
	Duration	Provider
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury

Provider: Digital Europe Programme
	Duration	Provider	Grant ID	PI	Area
OpenEuroLLM: Open European Family of Large Language Models	36 months	Digital Europe Programme	101195233	Jan Hajič

Provider: HE
	Duration	Provider	Grant ID	PI	Area
EVERSE: European Virtual Institute for Research Software Excellence	2024-2027	HE	101129744	Pavel Straňák
ATRIUM: Advancing FronTier Research In the Arts and hUManities	2024 - 2027	HE	101132163	Pavel Straňák
InCroMin: Interactive Crosslingual Minutes	2024	HE	101070631	Ondřej Bojar
MEMORISE: Virtualisation and Multimodal Exploration of Heritage on Nazi Persecution	2022-2026	HE	101061016	Pavel Pecina
RES-Q Plus: Comprehensive solutions of healthcare improvement based on the global Registry of Stroke Care Quality	2022-2026	HE	101057603	Pavel Pecina
HPLT: High Performance Language Technologies	2022-2025	HE	101070350	Jan Hajič	Corpora, Data, Machine Learning, Machine Translation, Monolingual, Multilingual

Provider: Social Sciences and Humanities Research Council of Canada
	Duration	Provider	Grant ID	PI	Area
DACT: Digital Analysis of Chant Transmission	2023-2029	Social Sciences and Humanities Research Council of Canada	895-2023-1002	Jan Hajič jr.	Corpora, Data, Information Retrieval, Linked data, Machine Learning, Multi-modality, Tools

Provider: ETF UK
	Duration	Provider	Grant ID	PI	Area
The Anthropology of Artificial Intelligence: Ethics, Understanding, Human Nature	2023-2024	ETF UK	247002	Rudolf Rosa	Tools

Provider: Horizon Europe, ERC
	Duration	Provider	Grant ID	PI	Area
NG-NLG: Next-Generation Natural Language Generation	2022-2027	Horizon Europe, ERC	101039303	Ondřej Dušek	Dialog, Linked data, Machine Learning, Semantics

Provider: PPPA (EU)
	Duration	Provider	Grant ID	PI	Area
ELE 2: European Language Equality 2	2022-2023	PPPA (EU)	LC-01884166 (Project 101075356)	Jan Hajič

MŠMT - velké infrastruktury
	Duration	Provider	Grant ID	PI	Area
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic	2016 - 2019	MŠMT - velké infrastruktury	LM2015071	Jan Hajič	Annotations, Coreference, Corpora, Data, Dialog, Discourse, Lexicons, Linked data, Machine Learning, Machine Translation, Morphology, Multi-modality, Parsers, Publications, Semantics, Speech Recognition, Taggers, Tools, Valency
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure	(2016-)2023-2026	MŠMT - velké infrastruktury	LM2023062	Jan Hajič	Annotations, Coreference, Corpora, Data, Dialog, Discourse, Information Structure, Lexicons, Linked data, Machine Learning, Machine Translation, Monolingual, Morphology, Multi-modality, Multilingual, Multiword Expressions, Parsers, Publications, Semantics, Speech Recognition, Speech Retrieval, Spellcheckers, Syntax, Taggers, Tools, Valency

Provider: MPO
	Duration	Provider	Grant ID	PI	Area
CEDMO 2.0 NPO	1.9. 2024 - 30. 4. 2026	MPO	MPO 60273/24/21300/21000	Ondřej Bojar	Data, Information Retrieval, Information Structure, Multi-modality

Provider: EC Digital Europe Programme (DIGITAL)
	Duration	Provider	Grant ID	PI	Area
CEDMO 2.0 EU: Central European Digital Media Observatory 2.0	1.1.2024-31.10.2026	EC Digital Europe Programme (DIGITAL)	101158609	Václav Moravec

Provider: MŠMT - OP JAK
	Duration	Provider	Grant ID	PI	Area
HumanAId: AI zaměřená na člověka pro udržitelnou a adaptabilní společnost	1. 3. 2025 - 31. 12. 2028	MŠMT - OP JAK	CZ.02.01.01/00/23_025/0008691	Barbora Vidová Hladká
Jazykověda, umělá inteligence a jazykové a řečové technologie: od výzkumu k aplikacím	1. 1. 2025 - 31. 12. 2028	MŠMT - OP JAK	CZ.02.01.01/00/23_020/0008518	Jan Hajič

Institutional support for research at Charles University
	Duration	Provider	Grant ID	PI	Area
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives	2024 - 2029	UK	UNCE/24/SSH/009	Zdeněk Žabokrtský	Annotations, Corpora, Data, Discourse, Information Structure, Multilingual
Language Neutral and Culturally Aware Multilingual Neural Sentence Representations	2023-2026	UK	PRIMUS/23/SCI/023	Jindřich Libovický	Machine Learning, Multi-modality, Multilingual

Horizon 2020 - European Commission
	Duration	Provider	Grant ID	PI	Area
CLS Infra: Computational Literary Studies Infrastructure	2021-2025	H2020	101004984	Silvie Cinková	Annotations, Corpora, Data, Multilingual, Parsers, Semantics, Taggers, Teaching, Tools
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals.	2020-2023	H2020	870930	Pavel Pecina	Annotations, Data, Dialog, Linked data, Machine Translation, Multi-modality, Multilingual, Parsers, Semantics, Speech Recognition
HumanE-AI-Net: HumanE AI Network	1. 9. 2020 - 31. 8. 2024	H2020	952026	Jan Hajič

EU ERASMUS MUNDUS
	Duration	Provider	Grant ID	PI	Area
LCT: European Masters Program Language and Communication Technologies	IX.2007-VIII.2013, IX.2013-VIII.2019, IX.2019-VIII.2025	EU ERASMUS MUNDUS	610622-EPP-1-2019-1-DE-EPPKA1-JMD-MOB	Vladislav Kuboň	Teaching

MŠMT - OP VVV
	Duration	Provider	Grant ID	PI	Area
OP VVV LINDAT: LINDAT/CLARIN - Research infrastructure for language technologies – extension of the repository and its computational power	2017–2019	MŠMT - OP VVV	CZ.02.1.01/0.0/0.0/16_013/0001781	Jan Hajič	Annotations, Corpora, Data, Tools
LangTech: Modernizace oboru Matematická lingvistika		MŠMT - OP VVV	CZ.02.2.69/0.0/0.0/16_018/0002373	Zdeněk Žabokrtský	Machine Learning, Multilingual, Teaching

Technology Agency (Czech Republic)
	Duration	Provider	Grant ID	PI	Area
PONK: Asistent přístupné úřední komunikace	9/2023-12/2025	TAČR	TQ01000526	Barbora Vidová Hladká	Annotations, Corpora, Machine Learning
EdUKate: Promoting digital education of foreign-language children through machine translation	2023-2026	TAČR	TQ01000458	Lucie Poláková	Data, Machine Translation, Multi-modality, Multilingual
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support	2021-2024	TAČR	FW03010656	Pavel Pecina	Information Retrieval, Information Structure, Machine Learning, Machine Translation, Semantics
CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti	09/2024-12/2029	TAČR	TQ12000040	Martin Popel	Dialog, Information Retrieval, Machine Learning, Monolingual, Multi-modality
EduPo: Generování české poezie v edukačním a multimediálním prostředí	09/2023 - 11/2026	TAČR	TQ01000153	Rudolf Rosa	Annotations, Corpora, Monolingual, Teaching, Tools
EDU-AI: AI asistent pro žáky a učitele	04/2021-12/2023	TAČR	TL05000236	Ondřej Dušek	Dialog, Information Retrieval

Czech Science Foundation
	Duration	Provider	Grant ID	PI	Area
Better Tokenization for Multilingual Language Models and Machine Translation	2025-2027	GAČR	25-16242S	Jindřich Libovický	Machine Translation, Multilingual
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior	2025-2027	GAČR	25-16716S	Veronika Kolářová	Annotations, Data, Lexicons, Linked data, Monolingual, Semantics, Syntax, Valency
AIAI: AI: Authorship and Interpretation	2025-2027	GAČR	25-14501L	Rudolf Rosa
HVar: Disagreement in corpus annotation and variation of human understanding of text	2024-2026	GAČR	24-11132S	Šárka Zikánová	Annotations, Data, Psycholinguistics, Semantics
ForFun2: Functions and Forms of Circumstantial Modifications	2023-2025	GAČR	23-05238S	Marie Mikulová	Annotations, Semantics, Syntax
SEEM-CZ: Epistemic and Evidential Markers in Czech	2023-2025	GAČR	23-05240S	Barbora Štěpánková	Annotations, Corpora, Data, Lexicons, Semantics
Identification and Prevention of Unwanted Gender Bias in Neural Language Models	2023-2024	GAČR	23-06912S	David Mareček
RapiDisc: Metody pro rychlou diskurzní anotaci ve vybraných korpusech	2022-2024	GAČR	22-03269S	Jiří Mírovský	Annotations, Corpora, Data, Discourse, Parsers
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns	2022-2024	GAČR	22-20927S	Veronika Kolářová	Annotations, Corpora, Lexicons, Monolingual, Syntax, Valency
LUSyD: Language Understanding: from Syntax to Discourse	2020–2024	GAČR	GX20-16819X	Jan Hajič	Coreference, Machine Learning, Machine Translation, Parsers, Semantics, Syntax, Valency
Global Coherence: Global Coherence of Czech Texts in the Corpus-Based Perspective	2020 - 2023	GAČR	20-09853S	Lucie Poláková	Annotations, Corpora, Data, Discourse, Semantics
NEUREM3: Neuronové reprezentace v multimodálním a mnohojazyčném modelování (Neural Representations in Multi-modal and Multi-lingual Modelling)	2019-2023	GAČR	19-26934X	Ondřej Bojar	Machine Learning, Multi-modality, Multilingual

Ministry of Education, Youth and Sport (Czech Republic)
	Duration	Provider	Grant ID	PI	Area
Uniform Meaning Representation (UMR)	1.3.2023 - 30.9.2027	MŠMT	LUAUS23283	Jan Hajič	Corpora, Data, Lexicons, Linked data, Multilingual, Multiword Expressions, Semantics, Syntax, Valency
Improving stomach examinations with Artificial Intelligence: A deep learning approach for assisted gastroscopy	1. 7. 2024 - 31. 12. 2026	MŠMT	LUABA24136	Pavel Pecina

Ministry of Culture
	Duration	Provider	Grant ID	PI	Area
Automatické hodnocení mluveného projevu v češtině [Automated Speech Scoring in Czech]	2023–2027	NAKI	DH23P03OVV037	Kateřina Rysová	Corpora, Data, Discourse, Monolingual, Tools
OmniOMR: OmniOMR - optical music recognition using machine learning for digital libraries	2023-2027	NAKI	DH23P03OVV008	Jan Hajič jr.	Annotations, Data, Machine Learning

Program START (UK - OP VVV)
	Duration	Provider	Grant ID	PI	Area
A data-based approach to competition in word-formation: selected semantic categories across seven languages	2021-2023	START	START/HUM/010		Annotations, Data, Lexicons, Morphology, Multilingual, Semantics
Babel Octopus: Robust Multi-Source Speech Translation	2021-2023	START	START/SCI/089	Peter Polák	Machine Translation, Multilingual, Speech Recognition

Grant Agency of the Charles University
	Duration	Provider	Grant ID	PI	Area
Modeling Morpheme Flow among Languages	Jan 2024 - Dec 2026	GAUK	101924	Abishek Stephen	Lexicons, Morphology, Multilingual
General-purpose Language Models for low-resourced languages	2025-2028	GAUK	302425	Nalin Kumar	Machine Learning, Multilingual
Reliable and Explainable Large Language Models for Text Generation	2025-2027	GAUK	252986	Patrícia Schmidtová	Machine Learning
CSCCL: Comprehensibility and Semantic Consistency of Czech Legislation	2025-2027	GAUK	393225	Tomáš Polák	Semantics
Coreference Resolution and Representation in Deep Universal Dependencies	2025 - 2027	GAUK	105124	Dima Taji	Coreference
Empowering Healthcare with Large Language Models: Reducing Clinicians' Workload and Improving Stroke Patient Care	2025 - 2027	GAUK	284125	Vojtěch Lanz	Corpora, Machine Learning, Multilingual
Uniform Meaning Representation for a low-resource language (Persian)	2025 - 2027	GAUK	394625		Corpora, Semantics
Adapting Uniform Meaning Representation (UMR) for the Italic/Romance languages	2024-2026	GAUK	104924	Federica Gamba	Data, Semantics
Morphological complexity of the verbal lexicon in four languages: Quantitative research based on corpus data	2023-2025	GAUK	246723	Hana Hledíková	Corpora, Morphology, Multilingual
Using Auxiliary Subtasks for Learning Constraints in NLP	2023-2025	GAUK	272323	Dávid Javorský	Coreference, Machine Learning, Machine Translation, Semantics
Methods for improving neural machine translation of diverse texts	2023-2025	GAUK	244523	Josef Jon
Mashcima: Synthetic training data generation and other methods for handwritten music recognition	2023-2025	GAUK	289623	Jiří Mayer	Data, Machine Learning, Tools
Compound Identification and Splitting in Four Languages: A Deep Learning Approach	2022-2024	GAUK	128122	Emil Svoboda	Machine Learning, Morphology, Multilingual, Tools
ECSS: Evaluation of conversational speech synthesis	2022-2024	GAUK	40222		Data, Dialog, Machine Learning

Ministry of Education, Youth and Sport (Czech Republic)
	Duration	Provider	Area
INTERCOST-Readability: Modelování komplexity českých literárních textů	VI 2018 - X 2021	MŠMT	Annotations, Corpora, Data, Discourse, Information Structure, Semantics, Syntax, Teaching
Multilingual Corpus Annotation as a Support for Language Technologies	2014-2016	MŠMT	Annotations, Coreference, Corpora, Data, Discourse
MOBAme: Modern Bayesian methods in machine learning	2013-2013	MŠMT	Teaching
VYSTADIAL: Development of statistical methods for spoken dialogue systems	2012-2016	MŠMT	Corpora, Dialog, Speech Recognition, Tools
KontaktII: Strojový překlad se sémantickou informací	2012-2014	MŠMT	Annotations, Corpora, Data, Lexicons, Machine Translation, Semantics, Valency
LINDAT/Clarin: Establishing and operating the Czech node of pan-European infrastructure for research (Vybudování a provoz českého uzlu pan-evropské infrastruktury pro výzkum)	2010-2015	MŠMT	Annotations, Coreference, Corpora, Data, Dialog, Discourse, Lexicons, Linked data, Machine Learning, Machine Translation, Morphology, Multi-modality, Parsers, Publications, Semantics, Speech Recognition, Taggers, Tools, Valency
Kontakt: Towards a Computational Analysis of Text Structure	2010 - 2012	MŠMT	Annotations, Coreference, Corpora, Data, Discourse
TextLink-cz: TextLink: Skladba diskurzu v evropských jazycích	1.11.2015 - 31.12.2017	MŠMT	Annotations, Corpora, Data, Discourse, Lexicons, Linked data, Monolingual
LD-Parseme: PARSEME: Parsing a víceslovné výrazy – k jazykovědné přesnosti a výpočetní efektivitě ve zpracování přirozeného jazyka	04-2014 – 03-2017	MŠMT	Lexicons, Multiword Expressions, Semantics, Valency

National Science Foundation
	Duration	Provider	Area
PIRE: Partnership for International Research and Education	till 2014	NSF	Machine Translation, Semantics, Speech Recognition, Teaching

Horizon 2020 - European Commission
	Duration	Provider	Area
CLARIN-PLUS	September 2015 – August 2017	H2020
QT21: Quality Translation 21	II.2015-I.2018	H2020	Data, Lexicons, Linked data, Machine Learning, Machine Translation, Tools
SSHOC: Social Sciences & Humanities Open Cloud	2019-30/04/2022	H2020
ELG: European Language Grid	2019-2021	H2020	Annotations, Corpora, Data, Linked data, Machine Translation, Multilingual, Parsers, Semantics, Speech Recognition, Syntax, Taggers, Tools
Bergamot: Browser-based Multilingual Translation	2019-2021	H2020	Machine Translation
ELITR: European Live Translator	2019-2021	H2020	Machine Translation, Speech Recognition
KConnect: Khresmoi Multilingual Medical Text Analysis, Search and Machine Translation Connected in a Thriving Data-Value Chain	2015-2017	H2020	Information Retrieval, Machine Translation, Semantics
HimL: Health in my Language	2.2015–1.2018	H2020	Data, Lexicons, Machine Translation, Morphology
CRACKER: Cracking the Language Barrier: Coordination, Evaluation and Resources for European MT Research	1.2015-12.2017	H2020	Data, Machine Translation

FP6: Research - European Commission
	Duration	Provider	Area
EuroMatrix	IX.2006-II.2009	FP6	Annotations, Corpora, Machine Translation, Tools, Valency

Technology Agency (Czech Republic)
	Duration	Provider	Area
THEaiTRE: THEAITRE: Umělá inteligence autorem divadelní hry?	April 2020 - September 2022	TAČR	Dialog, Machine Learning, Tools
INTLIB: Intelligent library	2012-2015	TAČR	Data, Linked data, Tools

Institutional support for research at the Charles University
	Duration	Provider	Area
AIvK Exponát Didaktikon: Život s umělou inteligencí: upgrade	2023-09-01 - 2023-12-31	UK	Teaching, Tools
NaMuDDiS: Natural multi-domain dialogue systems	2019-2021	UK	Dialog, Discourse, Teaching
UNCE VITRI: Center for the Transdisciplinary Research of Violence, Trauma and Justice	2018-2023	UK	Data, Discourse, Multi-modality, Semantics
PROGRES Q18 - Společenské vědy: Programy progres	2017-2021	UK
PROGRES Q48 - Informatika: Programy progres	2017-2021	UK
PRVOUK: Programy rozvoje vědních oblastí na Univerzitě Karlově - Informatika	2012-2016	UK

Grant Agency of the Charles University
	Duration	Provider	Area
Arithmetic Properties in the space of Language Model Prompts	2023	GAUK	Machine Learning
Independent component analysis of continuous word representations	2021–2022	GAUK	Annotations, Machine Learning, Semantics
Dialogue systems focused on combining tasks and chit-chat	2021-2023	GAUK	Dialog, Machine Learning
Controllable NLG: Controllable Natural Language Generation	2021-2023	GAUK
Exploring Multilingual Representations of Language Units in Neural Networks	2021 - 2023	GAUK	Information Structure, Machine Learning, Multilingual
Named Entity Linking	2020-2022	GAUK	Data, Machine Learning, Multilingual, Taggers
Machine Translation of Interpreted Speech	2020-2022	GAUK	Machine Translation, Multi-modality, Speech Recognition
Domain Adaptation for Natural Language Generation	2020-2022	GAUK	Data, Machine Learning
Low resource methods for dialogue systems applications	2020 - 2022	GAUK	Dialog, Discourse, Machine Learning
Neural machine translation for low-resource languages	2019-2021	GAUK	Machine Translation, Monolingual
Developing derivational networks for multiple languages	2019-2021	GAUK	Data, Morphology, Multilingual
Vektorová reprezentace textu založená na neuronových sítích	2019 - 2021	GAUK	Information Retrieval, Machine Learning, Machine Translation
Research of Methods of Neural Machine Translation Evaluation	2018-2020	GAUK	Machine Translation
Utilising Linguistic Knowledge in Neural Machine Translation	2018 - 2020	GAUK	Machine Translation
Multimodal Optical Music Recognition using Deep Learning	2017-2019	GAUK	Machine Learning, Multi-modality
Universal morphosyntactic annotation of language data	2017-2019	GAUK	Annotations, Corpora, Machine Learning, Multilingual, Parsers
DeepSynt: Deep Syntactic Representation across Languages	2017-2018	GAUK	Corpora, Data, Multilingual
Open domain dialog management with knowledge graphs	2016-2018	GAUK	Data, Dialog, Machine Learning
open-domain SLU: Spoken Language Understanding in open-domain environment	2016-2018	GAUK	Dialog, Information Retrieval, Linked data, Machine Learning, Semantics
ANNMT: Utilization of artificial neural networks in machine translation	2016-2018	GAUK	Machine Translation
Using Language Knowledge in Scene Text Recognition	2015-2017	GAUK	Multi-modality
cross-coref: Cross-lingual approaches to coreference resolution	2015-2017	GAUK	Annotations, Coreference, Corpora, Data, Machine Learning, Machine Translation, Multilingual
DiaMine: Information mining from spoken dialogue	2015-2017	GAUK	Data, Dialog, Machine Learning, Speech Recognition
Čapek GAUK: An alternative way of getting more annotated linguistic data	2014-2016	GAUK	Annotations, Tools
AdaNLG: An adaptive natural language generator	2014-2016	GAUK	Dialog, Multilingual, Semantics
croSSSynt: Modelling dependency syntax across languages	2014-2016	GAUK	Annotations, Corpora, Data, Multilingual, Parsers
MSDS: Modern Spoken Dialog Systems	2014, 2015, 2016	GAUK	Data, Dialog, Machine Learning, Speech Recognition
DepRefSet: Utilizing a Multitude of References in Machine Translation	2013-2015	GAUK	Data, Machine Translation
Interactive information retrieval in audiovisual dialogue corpora	2013-2015	GAUK	Information Retrieval, Speech Retrieval
Tools and data for Machine Translation between Related Languages	2012-2013	GAUK	Corpora, Data, Machine Translation, Tools, Valency
Utilization of coreference in MT: Utilization of coreference in Machine Translation	2011-2013	GAUK	Linked data, Machine Translation
Sentence-Level Polarity Detection in a Computer Corpus	2011-2013	GAUK	Annotations, Corpora, Data, Lexicons, Tools

Ministry of Culture
	Duration	Provider	Area
Prameny Krkonoš: Prameny Krkonoš. Vývoj systému evidence, zpracování a prezentace pramenů k historii a kultuře Krkonoš a jeho využití ve výzkumu a edukaci	2020-2022	NAKI
ÚSTR: Systém pro trvalé uchování dokumentace a prezentaci historických pramenů z období totalitních režimů	2016-2019	NAKI
VIADAT: Virtuální asistent pro zpřístupnění historických audiovizuálních dat	2016-2019	NAKI	Annotations, Speech Recognition, Tools
AMALACH	2012-2015	NAKI	Information Retrieval, Machine Translation, Multi-modality, Speech Recognition, Speech Retrieval, Teaching
EVALD (Evaluator of Discourse): Automatic Evaluation of Text Coherence in Czech	1. 3. 2016 – 31. 12. 2019	NAKI	Coreference, Discourse, Information Structure

CELSA
	Duration	Provider	Area
CELL: Contextual Machine Learning of Language Translations	2020-2022	CELSA	Machine Learning, Machine Translation, Multi-modality, Multilingual

Czech Science Foundation
	Duration	Provider	Area
Word-formation structure of Czech words: a data-based research	2019-2021	GAČR	Data, Morphology
NomVallex II.: Valency of Non-verbal Predicates. An Extension of Valency Studies to Adjectives and Deadjectival Nouns.	2019-2021	GAČR	Corpora, Lexicons, Valency
LiFR: Linguistic Factors of Readability in Czech Administrative and Educational Texts	2019-2021	GAČR	Annotations, Corpora, Data, Discourse, Information Structure, Monolingual, Semantics, Syntax
CzeDParse: Automatická analýza diskurzních vztahů v češtině	2019-2021	GAČR	Annotations, Corpora, Data, Discourse, Lexicons, Parsers
Mnohojazyčný strojový překlad	2018-2020	GAČR	Machine Learning, Machine Translation, Multilingual
LSD: Linguistic Structure Representation in Neural Networks	2018-2020	GAČR	Machine Learning, Machine Translation, Morphology, Multilingual, Parsers, Syntax, Taggers
VALLEX - Between Reciprocity and Reflexivity: The Case of Czech Reciprocal Constructions	2018-2020	GAČR	Data, Lexicons, Monolingual, Semantics, Syntax, Valency
AnaConn: Anaphoricity in Connectives: Lexical Description and Bilingual Corpus Analysis	2017–2019	GAČR	Discourse, Lexicons, Multilingual
ForFun: Subcategorization of adverbial meanings based on corpus data	2017-2019	GAČR	Annotations, Corpora, Data, Monolingual, Semantics
IRTC: Implicit Relations in Text Coherence	2017-2019	GAČR	Annotations, Corpora, Data, Discourse, Psycholinguistics
CzEngClass: Contextually-based synonymy and valency of verbs in a bilingual setting	2017-2019	GAČR	Annotations, Corpora, Data, Lexicons, Semantics, Valency
CorefChains: Structure of coreferential chains in parallel language data	2016-2018	GAČR	Annotations, Coreference, Corpora, Data
NomVallex: Corpus-based Valency Lexicon of Czech Nouns	2016-2018	GAČR	Corpora, Lexicons, Valency
DerInfMorph: An Integrated Approach to Derivational and Inflectional Morphology of Czech	2016-2018	GAČR	Data, Monolingual, Morphology
Manyla: Morphologically and Syntactically Annotated Corpora of Many Languages	2015–2017	GAČR	Annotations, Corpora, Data, Morphology, Multilingual, Parsers, Taggers
zelligharris: Reviving Zellig S. Harris: More linguistic information for distributional lexical analysis of English and Czech	2015-2017	GAČR	Annotations, Corpora, Data, Semantics, Taggers
On Linguistic Structure of Evaluative Meaning in Czech	2015-2017	GAČR	Annotations, Corpora, Data, Lexicons, Semantics
Combining Words: Syntactic Properties of Czech Multiword Expressions with Light Verbs	2015-2017	GAČR	Annotations, Data, Lexicons, Multiword Expressions, Valency
LiStr: Sentence structure induction without annotated corpora	2014 - 2016	GAČR	Machine Learning, Multilingual, Parsers
CzEngVallex: A comparison of Czech and English verbal valency based on corpus material (theory and practice)	2013-2015	GAČR	Annotations, Corpora, Data, Lexicons
Vybrané derivační vztahy pro automatické zpracování češtiny	2012–2014	GAČR	Morphology
VALLEX: Delving Deeper: Lexicographic Description of Syntactic and Semantic Properties of Czech Verbs	2012-2015	GAČR	Annotations, Data, Lexicons, Semantics, Syntax, Valency
Systematic, economical and corpus-based description of valency properties of Czech deverbal nouns (theory and practice)	2012-2014	GAČR	Lexicons, Valency
CEMI: Center for large-scale multi-modal data interpretation	2012 - 2019	GAČR	Multi-modality
CorefDisk: Coreference, Discourse Relations and Information Structure in a Contrastive Perspective	2012 - 2015	GAČR	Annotations, Coreference, Corpora, Data, Discourse, Information Structure
CZECHMATE: Čeština ve věku strojového překladu	2011 – 2013	GAČR	Annotations, Corpora, Data, Machine Translation, Morphology, Parsers
NoSCoM: Non-Standard Computational Models and Their Applications in Complexity, Linguistics, and Learning	2010-2014	GAČR
Komputační lingvistika: Explicitní popis jazyka a anotovaná data se zřetelem na češtinu	2010-2013	GAČR	Annotations, Coreference, Corpora, Data, Discourse, Information Structure

OP Praha – Pól růstu ČR
	Duration	Provider	Area
MTviet: Machine Translation from Vietnamese into Czech for the Purposes of the Police of the Czech Republic	2017-2018	Praha OP PPR	Machine Translation

Mellon Foundation (USA)
	Duration	Provider	Area
LAPPS-CLARIN: Transatlantic Collaboration between LAPPS and CLARIN: Semantic, Technical and Infrastructural Interoperability of Services	2016-2018, 2019-2021	Mellon Foundation (USA)	Annotations, Corpora, Data, Tools

FP7: Research - European Commission
	Duration	Provider	Area
TextLink: TextLink: Structuring Discourse in Multilingual Europe	2014 - 2017	FP7	Coreference, Corpora, Discourse, Linked data, Multilingual
QTLeap: Quality Translation by Deep Language Engineering Approaches	2013–2016	FP7	Linked data, Machine Translation
PARSEME: PARSEME: Parsing and Multiword Expressions	2013-2017	FP7	Lexicons, Multiword Expressions, Semantics, Valency
MosesCore	2012-2015	FP7	Data, Machine Translation, Teaching, Tools
EUDAT: EUDAT: European Data Infrastructure	2011–2014	FP7	Data
FAUST: Feedback Analysis for User adaptive Statistical Translation	2010–2013	FP7	Machine Translation
KHRESMOI: Medical information analysis and retrieval	2010-2014	FP7	Information Retrieval, Machine Translation
CLARA: Common Language Resources and their Applications - a Marie Curie ITN	2009-2013	FP7	Annotations, Corpora, Data, Machine Translation, Teaching
EuroMatrixPlus	2009-2012	FP7	Machine Translation

EU Lifelong Learning Programme
	Duration	Provider	Area
Merlin	2012-2014	LLP	Annotations, Corpora, Data

MVČR
	Duration	Provider	Area
PoliSys: Systém pro analýzu policejních dat pro potřeby Policie ČR	03/2017-03/2018	MVČR	Data, Information Retrieval, Machine Learning, Morphology

Inspire
	Duration	Provider	Area
INSPIRE: INSPIRE in Pocket		Inspire	Machine Translation

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
