Main Research Interests
- Lexical Semantics
- Knowledge Representation
- Corpus Linguistics
- Linguistic Annotation
- Computational Lexicography
- Germanic languages (English, German, Swedish, and Icelandic)
- Digital Humanities
- Distant Reading
Readability is the ease with which a reader comprehends a written text. It can be a matter of life and death: how long may it sensibly take you to learn how to use a public-access defibrillator?
Poor understanding of legally binding texts can ruin your living conditions. Obscure texts often conceal fraud. Kafkaesque forms suggest incompetence or dishonesty on the part of your local administration. Any impairment of understanding makes you particularly vulnerable: are you a second-language speaker? Do you wear thick glasses or a hearing aid? Do you or anyone close to you have dyslexia or even a mild intellectual disability? You may be exposed, and readability ought to be on your agenda, too. If, on the other hand, you run an office, your work may be piling up because of endless exchanges with clients that lead nowhere. Try tailoring your documents to your grandmother. Readable documents make for better administration, as shown, for example, by a 1996 study by G. Mills and M. Duckworth (http://www.clarity-international.net/wp-content/uploads/2014/05/Gains-from-Clarity.pdf).
The readability of textbooks influences motivation to study from an early age. Too few young adults eager to take up STEM (MINT) subjects? Poor PISA results in reading? If you as a parent struggle to help your kid with their biology homework even though you have just consulted their school textbook, do not be surprised that they are not exactly crazy about delving deeper into biology, ever. And no, abridging a college textbook never makes a remotely satisfactory textbook for younger teenagers.
Check out my first readability project here. This one is about texts in literary studies.
This is a short-term project with IBM Watson Labs, started in fall 2017.
More linguistic information for distributional lexical analysis of English and Czech
- What makes two word senses hard to tell apart? Experiments with interannotator agreement in a semantic task based on Corpus Pattern Analysis.
- Which linguistic information improves the performance of the word2vec word embedding model? An experiment with morphosyntactic derivations.
For details, see the project documentation at https://ufal.mff.cuni.cz/grants/zelligharris.
In my first project at UFAL, I coordinated the manual deep-syntax ("tectogrammatical") annotations of the Prague English Dependency Treebank and later the Prague DaTabase of Spoken English.
Recently, I coordinated and performed the manual annotation of a sample of English verbs according to Corpus Pattern Analysis, to explore how high an interannotator agreement we could achieve with this approach. For more details and further experiments with lexical semantics, see our Semantic Pattern Recognition project page or browse our sample directly.
In the CEMI project, I carried out pilot annotations and wrote annotation guidelines for the Image Text Understanding task.
Until 2015 I was in charge of the Czech-Swedish parallel corpus in the Intercorp project.
Rule-based automatic annotations
As part of my dissertation, I created a rule-based Swedish lemmatizer (not maintained since 2009) and word-sketch definitions to find verbs and their relevant noun collocates, including their modifiers and several other structures. These rules were later adopted in the Sketch Engine.
- member of the editorial board of Orð og tunga
- member of the management committee of the Czech Association for Digital Humanities
- member of the executive committee of the European Association for Digital Humanities (elected at the end of June 2018)
- one of the national coordinators for Czechia of the COST Action CA 16204 Distant Reading for European Literary History
- Czech national coordinator of the DARIAH CLARIN Digital Humanities Course Registry
Structured CV in Czech
Structured CV in English
Quantitative linguistics and R programming for linguists and students of humanities
I fell for R in 2014. Because my purely scholarly background made me learn all this the hard way, I am a very empathetic teacher. If you are a humanities student and need a really gentle start in data visualization, data wrangling, and (simple) statistical computing, come and check out http://ufal.mff.cuni.cz/courses/r-for-humanities/english (taught together with Václav Cvrček every summer term, in Czech or English on demand). Disclaimer: the course is too slow for students of computer science!
- Cinková Silvie, Hlávka Zdeněk: Modeling Semantic Distance in the Pattern Dictionary of English Verbs. In: Jazykovedný časopis / Journal of Linguistics, Vol. 68, No. 2, Copyright © SAP – Slovak Academic Press, Bratislava, Slovakia, ISSN 0021-5597, pp. 122-135, Jan 2018
- Baisa Vít, Cinková Silvie, Krejčová Ema, Vernerová Anna: VPS-GradeUp: Graded Decisions on Usage Patterns. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Copyright © European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1, pp. 823-827, 2016
- Cinková Silvie: WordSim353 for Czech. In: Lecture Notes in Computer Science, No. 9924, Text, Speech, and Dialogue: 19th International Conference, TSD 2016, Copyright © Springer International Publishing, Cham / Heidelberg / New York / Dordrecht / London, ISBN 978-3-319-45509-9, ISSN 0302-9743, pp. 190-197, 2016
- Cinková Silvie, Krejčová Ema, Vernerová Anna, Baisa Vít: What Do Graded Decisions Tell Us about Verb Uses. In: Proceedings of the XVII EURALEX International Congress: Lexicography and Linguistic Diversity, Copyright © Tbilisi University Press, Tbilisi, Georgia, ISBN 978-9941-13-542-2, pp. 318-328, 2016
- Oepen Stephan, Kuhlmann Marco, Miyao Yusuke, Zeman Daniel, Cinková Silvie, Flickinger Dan, Hajič Jan, Ivanova Angelina, Urešová Zdeňka: Towards Comparability of Linguistic Graph Banks for Semantic Parsing. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Copyright © European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1, pp. 3991-3995, 2016
- Baisa Vít, Bradbury Jane, Cinková Silvie, El Maarouf Ismail, Kilgarriff Adam, Popescu Octavian: SemEval-2015 Task 15: A CPA dictionary-entry-building task. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Copyright © Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-941643-40-2, pp. 315-324, 2015
- Oepen Stephan, Kuhlmann Marco, Miyao Yusuke, Zeman Daniel, Cinková Silvie, Flickinger Dan, Hajič Jan, Urešová Zdeňka: SemEval 2015 Task 18: Broad-Coverage Semantic Dependency Parsing. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Copyright © Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-941643-40-2, pp. 915-926, 2015
- Bojar Ondřej, Cinková Silvie, Hajič Jan, Hladká Barbora, Kuboň Vladislav, Mírovský Jiří, Panevová Jarmila, Peterek Nino, Spoustová Johanka, Žabokrtský Zdeněk: The Czech Language in the Digital Age. Copyright © Springer-Verlag Berlin Heidelberg, Berlin, Germany, ISBN 978-3-642-30705-8, 79 pp., Sep 2012
- Cinková Silvie, Holub Martin, Kríž Vincent: Managing Uncertainty in Semantic Tagging. In: Proceedings of 13th Conference of the European Chapter of the Association for Computational Linguistics, Copyright © Association for Computational Linguistics, Avignon, France, ISBN 978-1-937284-19-0, pp. 840-850, 2012
- Hajič Jan, Hajičová Eva, Panevová Jarmila, Sgall Petr, Bojar Ondřej, Cinková Silvie, Fučíková Eva, Mikulová Marie, Pajas Petr, Popelka Jan, Semecký Jiří, Šindlerová Jana, Štěpánek Jan, Toman Josef, Urešová Zdeňka, Žabokrtský Zdeněk: Announcing Prague Czech-English Dependency Treebank 2.0. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), Copyright © European Language Resources Association, İstanbul, Turkey, ISBN 978-2-9517408-7-7, pp. 3153-3160, 2012
- Holub Martin, Kríž Vincent, Cinková Silvie, Bick Eckhard: Tailored Feature Extraction for Lexical Disambiguation of English Verbs Based on Corpus Pattern Analysis. In: Proceedings of the 24th International Conference on Computational Linguistics (Coling 2012), Copyright © Coling 2012 Organizing Committee, Mumbai, India, pp. 1195-1209, 2012
- Ptáček Jan, Ircing Pavel, Spousta Miroslav, Romportl Jan, Loose Zdeněk, Cinková Silvie, Gil José Relaño, Santos Raúl: Integration of Speech and Text Processing Modules into a Real-Time Dialogue System. In: Lecture Notes in Computer Science, Vol. 6231, No. 6231/2010, Text, Speech and Dialogue. 13th International Conference, TSD 2010, Brno, Czech Republic, September 6-10, 2010. Proceedings, Copyright © Springer, Berlin / Heidelberg, ISBN 978-3-642-15759-2, ISSN 0302-9743, pp. 552-559, 2010
- Cinková Silvie: Words that Matter: Towards a Swedish-Czech Colligational Dictionary of Basic Verbs. Copyright © UFAL, Malostranské nám. 25, 118 00 Praha 1, ISBN 978-80-904175-3-3, 256 pp., Dec 2009