Silvie Cinková
Main Research Interests
- Lexical Semantics
- Corpus Linguistics
- Linguistic Annotation
- Computational Lexicography
- Germanic languages (English, German, Swedish, and Icelandic)
- Digital Humanities
- Readability
Projects
Current projects
Readability
Readability is the ease with which a reader comprehends a written text. It can be a matter of life and death: how long may it sensibly take you to learn how to use a public-access defibrillator?
Poor understanding of legally binding texts can ruin one's living conditions. Obscure texts often conceal fraud. Kafkaesque forms suggest incompetence or dishonesty on the part of your local administration. Any impairment of comprehension makes you particularly vulnerable: are you a second-language speaker? Do you wear thick glasses or a hearing aid? Do you or anyone close to you suffer from dyslexia or even a slight mental handicap? You may be exposed, and readability ought to be on your agenda, too. If, on the other hand, you run an office, your work may be piling up because of endless exchanges with clients that lead nowhere. Try tailoring your documents to your grandmother. Readable documents make for better administration, as shown, for example, by a 1996 study by G. Mills and M. Duckworth (http://www.clarity-international.net/wp-content/uploads/2014/05/Gains-from-Clarity.pdf).
The readability of textbooks influences motivation to study from an early age. Too few young adults eager to take up MINT (STEM) subjects? Poor PISA results in reading? If you as a parent still struggle to help your kid with biology homework after consulting their school textbook, do not be surprised that they are not exactly eager to delve deeper into biology, ever. And no, abbreviating a college textbook never produces a remotely satisfactory textbook for younger teenagers.
Past projects
Manual annotations
As my first project at UFAL, I coordinated the manual deep-syntax ("tectogrammatical") annotations of the Prague English Dependency Treebank and later the Prague DaTabase of Spoken English.
Recently, I coordinated and performed the manual annotation of a sample of English verbs according to Corpus Pattern Analysis, to explore how high an interannotator agreement we could achieve with this approach. For more detail and further experiments with lexical semantics, see our Semantic Pattern Recognition project page or browse our sample directly.
In the CEMI project, I performed pilot annotations and wrote annotation instructions for the Image Text Understanding task.
Until 2015 I was in charge of the Czech-Swedish parallel corpus in the Intercorp project.
Rule-based automatic annotations
As part of my dissertation, I created a rule-based Swedish lemmatizer (not maintained since 2009) and word-sketch definitions to find verbs and their relevant noun collocates, including their modifiers and several other structures. These rules were later adopted in the Sketch Engine.
More linguistic information for distributional lexical analysis of English and Czech
- What makes two word senses hard to tell apart? Experiments with interannotator agreement in a semantic task based on Corpus Pattern Analysis.
- Which linguistic information improves the performance of the word2vec word embedding model? An experiment with morphosyntactic derivations.
For project details and documentation, see https://ufal.mff.cuni.cz/grants/zelligharris.
Service
- member of the editorial board of Orð og tunga
- Czech national coordinator of the DARIAH CLARIN Digital Humanities Course Registry
Curriculum Vitae
Structured CV in Czech
Structured CV in English
Teaching
Quantitative linguistics and R programming for linguists and students of humanities
I fell for R in 2014. Because my purely scholarly background made me learn it all the hard way, I am a very empathetic teacher. If you are a humanities student and need a really gentle start in data visualization, data wrangling, and (simple) statistical computing, come and check out http://ufal.github.io/NPFL112 and http://ufal.mff.cuni.cz/courses/r-for-humanities/english (taught together with Václav Cvrček every summer term, in Czech or English on demand). Disclaimer: the course is too slow for students of computer science!
Selected Bibliography
- Google Scholar
- ORCID: 0000-0003-4526-3915
- Scopus ID: 26664407500
- Researcher ID: J-3520-2012



