Sylvie Archaimbault, Martin Hájek, Jana Plaňavová Latanowicz

Data Analytics for Students of Social Studies and Humanities

  • Time: Tue 10:40-12:10
  • Place: on-line, on the Zoom platform
  • Language: English
  • The classes will be recorded. The video files will be available here.
Aim of the course

We encourage students to use data in their projects.

This course is a gentle, programming-free combination of lectures and practical demonstrations of real-life data workflows in various Social Studies and Humanities (SSH) research areas. It aims to motivate SSH students to develop their digital literacy further in more advanced data analytics courses. The curriculum was developed jointly by Charles University (CU), the University of Warsaw (UW), and Sorbonne University (SU).

This course does not require any prior data analysis or computer science experience. All you need to get started is basic computer literacy.

You will learn how to tell data stories and captivate your future audiences with Tableau Public, how to digitize historical documents with Transkribus and PERO, and how to annotate texts in TEITOK. We will also acquaint you with André Mazon's digitized correspondence archive and with the migrant stories published at i am a migrant.


No. Date Topic Teaching materials
1. Feb 15  Introduction (CU)
    -- course organization, motivation, outline
    -- basic terminology

Zoom link:

 - npfl134-lec-1.pdf (last update: Feb 14)
 - Lecture video: TBA
2. Feb 22  André Mazon (SU, CU)
    -- his correspondence
    -- getting data: digitization

Zoom link:

 - npfl134-lec-2.pdf (last update: Feb 24)
 - lecture video
3. Mar 1

 Collection of André Mazon's correspondence I (CU)
    -- analysis of metadata using the Tableau system

  HW #1 assignment

Zoom link:

 - npfl134-lec-3: Mindmap of Tableau and the contents of this lecture
 - Download Mazon collection metadata and corresponding geocoding data
 - Download Silvie's Tableau Workbook file

Lecture videos:
 - First inspection of Mazon metadata in a spreadsheet editor (NPFL134/3a)
 - Getting Data into Tableau  (NPFL134/3b, Tableau I)
 - Plot a categorical variable (NPFL134/3c, Tableau II)
 - Filters (NPFL134/3d, Tableau III)
 - Plots and cumulative sums in Tableau (NPFL134/3e, Tableau IV)
 - Calculated variables in Tableau (NPFL134/3f, Tableau V)
 - More about calculated variables in Tableau (NPFL134/3g, Tableau VI)

4. Mar 8

 Collection of André Mazon's correspondence II (CU)
    -- analysis of letters (images and transcriptions)
    -- Optical Character Recognition
    -- Handwritten Text Recognition
    -- Transkribus and Pero systems

 HW #2 assignment

Zoom link:  

 - npfl134-transkribus.pdf

Lecture videos:
 - Introductory presentation (NPFL134/4)
 - How to process a document in Transkribus (NPFL134/4a)
 - How to process a document in PERO (NPFL134/4b)

5. Mar 15

 Collection of André Mazon's correspondence III (CU)
    -- images and transcriptions in TEITOK
    -- annotating data: linguistic processing
    -- UDPipe and NameTag tools
    -- searching and querying data in TEITOK

   HW #3 assignment

Zoom link:  

 - lecture video

 - TEITOK login page

 - PERO (you may ignore this link)

6. Mar 22

 Licensing data (UW)
    -- legal and ethical aspects
    -- open access, open science, licenses

Zoom link:

 - npfl134-lec-6.pdf (last update: Mar 25)
 - lecture video

7. Mar 29

 Sharing data in repositories (CU)
    -- e.g., the LINDAT repository

   HW #4 assignment

Zoom link:

 - npfl134-lec-7.pdf (last update: Mar 29) 
 - lecture video

8. Apr 5

 Introduction to Machine Learning (CU)
    -- using data
    -- basic idea, terminology
    -- Natural Language Processing

   HW #5 assignment

Zoom link:

 - npfl134-lec-8.pdf (last update: Apr 5) 
 - lecture video

9. Apr 12  Quantitative textual analysis in Sociology (CU)
    -- analysis of social discourses
    -- media reporting and web presentations

   HW #6 assignment

Zoom link:

 - npfl134-lec-9.pdf (last update: Apr 11), on-line
 - lecture video

10. Apr 19  Quantitative textual analysis in Sociology (CU)
      -- word co-occurrence analysis of social discourses
      -- autobiographical narratives 

   Zoom link:
 - npfl134-lec-10.pdf (last update: Apr 19), on-line
 - lecture video
11. Apr 26

 Readability of legal texts (UW, CU)
    -- HW #5 evaluation
    -- measuring readability of legal texts

 Zoom link:

 - npfl134-lec-11.pdf   (last update: Apr 26)
 - lecture video
12. May 3

 Searching in the Mazon collection with the PML-TQ (CU)
 Searching the ParlaMint corpus (CU)

 Zoom link:

Part A
  - lecture video

Part B
  -  npfl134-lec-12b.pdf  (last update: May 5)
  - ParlaMint-GB 2.1 and gentle programming in R 
     - gb-eu.csv, gbeu.Rs-fr.csv
  - lecture video

13. May 10

 Tableau data visualisation, part I (CU)
   -- student presentations
     -- data insight

Zoom link:

   - TEITOK - DraCor Shakespeare's Dramas Corpus

 UD references
   - parts of speech (upos)
   - features (feats)
   - dependency relations (deprel)
   - UDPipe GUI

   - PMLTQ 1
   - PMLTQ 2

14. May 17

  Tableau data visualisation, part II (CU)
   -- student presentations
     -- data insight

Zoom link:

  June 15-17, in Prague
  Workshop: a follow-up to the course



  1. Brett, M.R. Topic Modeling: A Basic Introduction. The Journal of Digital Humanities 2(1): 12-16. 2012. on-line
  2. Corrales Compagnucci, Marcelo. Big Data, Databases and "Ownership" Rights in the Cloud. 2020.
  3. Erjavec, T., Ogrodniczuk, M., Osenova, P. et al. The ParlaMint corpora of parliamentary proceedings. Language Resources and Evaluation. 2022.
  4. Foster, Ian, Ghani, Rayid, Jarmin, R.S., Kreuter, F. and Lane, J. (eds.). Big Data and Social Science: A Practical Guide to Methods and Tools (Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences). 2017.
  5. Hladká, Barbora, Holub, Martin. A Gentle Introduction to Machine Learning for Natural Language Processing: How to Start in 16 Practical Steps. Language and Linguistics Compass 9(2): 55-76. 2015.
  6. Jurafsky, Dan, Martin, James H. Speech and Language Processing. 2021. url
  7. Piotrowski, Michael. Natural Language Processing for Historical Texts. Morgan & Claypool Publishers. 2012. pdf
  8. Glossary of common terms used in the course: url


Courtesy of DataCamp, you will receive six months of access to their e-learning materials. These will help you master Tableau Public to the level you wish.

The dataset of André Mazon's correspondence is available for the course's activities under the Partnership Agreement between the Center of Slavic Studies (Sorbonne University) and the Institute of Formal and Applied Linguistics (Charles University).

This course is funded by the 4EU+ Alliance under grant agreement No 2021_F3_10, visit this site.