SIS code: 
Sylvie Archaimbault, Martin Hájek, Jana Plaňavová Latanowicz

Data Analytics for Students of Social Studies and Humanities

  • Time: Tuesdays 12:20-13:50; first lecture on February 14, 2023
  • Place: hybrid, in person (Matfyz, Malostranské nám. 25, 118 00 Prague) and online (Zoom)
  • Language: English
  • Lecture videos from 2021/22: YouTube

Aim of the course

We encourage students to use data in their projects.

This course is a gentle, programming-free combination of lectures and practical demonstrations of real-life data workflows in various Social Studies and Humanities (SSH) research areas. It aims to motivate SSH students to develop their digital literacy further in more advanced data analytics courses. The curriculum was developed jointly by Charles University (CU), the University of Warsaw (UW), and Sorbonne University (SU).

This course does not require any prior data analysis or computer science experience. All you need to get started is basic computer literacy.

You will learn how to tell data stories and captivate your future audiences with Tableau Public, how to use the Transkribus and Pero systems to digitize historical documents, and how to annotate texts in TEITOK. We will acquaint you with André Mazon's digitized correspondence archive and with the migrant stories published at i am a migrant.

Calendar 2022/23

No. Date Topic Teaching materials
1. Feb 14  Introduction
    -- course organization, motivation, outline
    -- basic terminology

Zoom link: TBA

2. Feb 21 Collection of André Mazon's correspondence I
    -- Mazon’s correspondence
    -- digitization
3. Feb 28 Beginner's guide to data analysis with Excel  
4. Mar 7 Collection of André Mazon's correspondence II
    -- analysis of metadata using the Tableau system
5. Mar 14 Collection of André Mazon's correspondence III
    -- analysis of letters (images and transcriptions)
    -- Optical Character Recognition, Handwritten Text Recognition
    -- Transkribus and Pero systems
6. Mar 21 Introduction to the Universal Dependencies framework
7. Mar 28 Collection of André Mazon's correspondence IV
    -- annotating data
    -- linguistic processing using the UDPipe and NameTag tools
    -- searching and querying data in TEITOK
8. Apr 4 Quantitative textual analysis in Sociology
9. Apr 11 Quantitative textual analysis in Sociology
10. Apr 18 Collection of André Mazon's correspondence V
    -- visualization in Gephi, part I
11. Apr 25 Collection of André Mazon's correspondence V
    -- visualization in Gephi, part II
12. May 2 Introduction to Machine Learning
13. May 9 Student presentations
14. May 16 Student presentations  

Literature

  1. Brett, M.R. Topic Modeling: A Basic Introduction. Journal of Digital Humanities 2(1): 12-16. 2012.
  2. Corrales Compagnucci, Marcelo. Big Data, Databases and "Ownership" Rights in the Cloud. 2020.
  3. Erjavec, T., Ogrodniczuk, M., Osenova, P. et al. The ParlaMint corpora of parliamentary proceedings. Language Resources and Evaluation. 2022.
  4. Foster, Ian, Ghani, Rayid, Jarmin, R.S., Kreuter, F. and Lane, J. (eds.). Big Data and Social Science: A Practical Guide to Methods and Tools (Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences). 2017.
  5. Hladká, Barbora, Holub, Martin. A Gentle Introduction to Machine Learning for Natural Language Processing: How to Start in 16 Practical Steps. Language and Linguistics Compass 9(2): 55-76. 2015.
  6. Jurafsky, Dan, Martin, James H. Speech and Language Processing. 2021.
  7. Piotrowski, Michael. Natural Language Processing for Historical Texts. Morgan & Claypool Publishers. 2012.
  8. Glossary of common terms used in the course.


Courtesy of DataCamp, you will receive six months of access to their e-learning materials, which will help you master Tableau Public to whatever level you wish.

The dataset of André Mazon's correspondence is made available for course activities under the Partnership Agreement between the Center of Slavic Studies (Sorbonne University) and the Institute of Formal and Applied Linguistics (Charles University).

This course is funded by the 4EU+ Alliance under grant agreement No. 2021_F3_10.