Text readability is the comprehensibility of a written text. It depends on the content as well as on a number of linguistic and typographical features. A readable text is a text that the intended reader can understand well without excessive effort.
Readability is associated with clarity, honesty, and respect for the intended reader. This is particularly important in domains such as law, public administration, finance, health care, and customer service, where the communication style reflects the culture of public life in the given community. For instance, army instructions should be written so that all soldiers can understand them, informed consent forms should be optimized for all patients, and recommended reading can be selected to match the appropriate school age or language proficiency level. The English-speaking community has been strongly promoting and enforcing so-called plain English in these domains (e.g. Garner, 2001; Tiersma, 1999; Cutts, 2013), with other language communities following, in particular the Nordic and German-speaking countries. The concept of plain language requires communication free of jargon and of convoluted, underspecified, and deceptive formulations.
Quantitative readability assessment helps maintain the optimum readability level for the given audience. The first widely used readability formula was the Flesch Reading Ease (Flesch, 1948). Numerous English readability metrics have been established since then. Older formulas were based on crude quantitative measures, such as the number of syllables, and were thus supposedly language-independent. Modern metrics (e.g. McNamara et al., 2014) use linguistically informed features, which makes them clearly language-dependent.
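As an illustration of the count-based formulas mentioned above, the following minimal sketch computes the Flesch Reading Ease score, 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The function names, the regular-expression tokenization, and the vowel-group syllable counter are assumptions made for this illustration only; they are crude, English-oriented heuristics and not part of Flesch's original procedure.

```python
import re


def flesch_reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Flesch Reading Ease computed from raw counts (Flesch, 1948)."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def count_syllables(word: str) -> int:
    """Assumed heuristic: one syllable per run of vowel letters, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def score_text(text: str) -> float:
    """Score plain English text using naive sentence and word splitting."""
    # Assumption: sentences end at '.', '!' or '?'; words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), len(sentences), syllables)


if __name__ == "__main__":
    print(round(score_text("The cat sat. The dog ran."), 2))
```

Higher scores indicate easier text. Even this simple syllable counter encodes assumptions about English orthography, which hints at why such formulas may not transfer to other languages unchanged.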
A different approach to quantitative readability assessment was pursued in Germany in the 1970s: the so-called Hamburg Readability Concept (Hamburger Verständlichkeitskonzept, Langer et al., 1973, 2013) is based on four text qualities that have been shown experimentally to affect reading comprehension, regardless of the intelligence or education of the reader: simplicity, ordering/structure, brevity/succinctness, and stimulating elements. The authors argued that reading comprehension measured on a large number of respondents is a more reliable measure of readability than any quantitative information on formal text elements could be. The Hamburg Readability Concept has produced one of the most influential style training programs for German to date, training writers to be sensitive to the four text qualities through a series of exercises.
In contrast to international research, the study of readability in Czech has attracted little attention. Czech has neither a continuous tradition of readability research nor established plain-language planning, despite extensive stylistics research that also covers the administrative style (e.g. Exner, 1992; Panevová and Sgall, 2014; Těšitelová et al., 1980, 1983, 1985; Beneš, 2016; Smolík, 2009; Mistrík, 1997 for Slovak). Šlerka and Smolík performed a first exploration of some readability formulas on Czech but did not elaborate further on the topic (Šlerka and Smolík, 2010). Most recently, Rysová et al. (2017) released a text-cohesion assessment tool for Czech, and Straka et al. (to appear 2018) a data set for training summarization of Czech news. Burešová (2017) defended her master's thesis on Automatic Simplification of Czech.
The proposed project seeks to fill this gap through systematic research into the readability of Czech texts, making use of methods from theoretical linguistics and natural language processing, as well as international readability research. Combining theoretical study with natural language processing allows us to implement evaluation schemes based on exact statistical analysis. Our research focuses on exploring the linguistic features that affect human comprehension.