Workshop 2022: a follow-up to the course

  • workshop participants are not required to take the course
  • if you are interested, please contact Barbora Hladka at hladka@ufal.mff.cuni.cz

Programme

Wednesday Jun 15

Thursday Jun 16

  • 9:00-10:30
    Jana Plaňavová Latanowicz, University of Warsaw
    Analysing readability of legal documents (guided exercise) 

    An old principle of Roman law holds that ignorance of the law is no excuse. Yet as the body of legislation written in legalese keeps growing, the law is becoming intellectually inaccessible to citizens, and the expectation that citizens understand even the most basic acts of law looks increasingly like wishful thinking. This workshop class teaches how to use tools for measuring the readability of documents and guides participants through very simple exercises.

  • 10:45-12:15
    Radim Hladík, Czech Academy of Sciences
    From Data to Insights with Open Science Tools: Exploration of Theatrical Plays with DraCor and R
    Presentation

    The talk aims to demonstrate the research value of well-curated datasets and open-source tools. We will begin by introducing the statistical programming language R and showing how its ecosystem of open-source packages lowers the barrier to entry into data-driven analysis. The example will rely on DraCor (https://dracor.org/), an easily accessible corpus of theatrical works accompanied by a rich set of metadata. We will show how, in several steps, we can move from obtaining the data to transforming it and deriving insights from visualizations of descriptive statistics. Apart from an appreciation of FAIR data, the students should gain a basic understanding of how a computational analysis proceeds and what advantages it offers.

  • 12:15-13:00
    Lunch - Sandwich delivery from Bread Gap
     
  • 13:00-14:30
    Tomáš Musil, Charles University
    THEaiTRE: Generating a Theatre Play Script
    Presentation

    In February 2021, the experimental theatre play “AI: When a Robot Writes a Play”, 90% of whose script was automatically generated by artificial intelligence, premiered at the Švanda Theatre in Prague. In this class, you will learn how the play was created and how to make the computer generate a new script from your own input.
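For the class "Analysing readability of legal documents" above, here is a minimal sketch of one widely used readability measure, the Flesch Reading Ease score: 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), where higher scores mean easier text. The tools used in the class are not specified here, so this is only an illustration; the helper names are hypothetical, it assumes English text, and it counts syllables with a rough vowel-group heuristic that real readability tools replace with something more careful.

```python
import re

def count_syllables(word: str) -> int:
    """Rough, hypothetical heuristic: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    count = len(groups)
    # A trailing silent 'e' (as in "make") usually adds no syllable.
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

plain = "The cat sat on the mat. It was warm."
legal = ("Notwithstanding the aforementioned provisions, the obligations "
         "hereinafter enumerated shall remain enforceable.")
print(round(flesch_reading_ease(plain), 1))  # 117.7
print(round(flesch_reading_ease(legal), 1))  # -65.8
```

The long sentence and many polysyllabic words push the legal-style sentence far below the plain one, which is exactly the effect the formula is designed to capture.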

Friday Jun 17

  • 9:00-10:30
    Tomáš Musil, Charles University
    The Geometry of Meaning: Analysing Text with Word Embeddings
    Presentation

    Word embeddings are representations of words as vectors in a multidimensional space. In this class, you will learn how to create word embeddings using various methods and how to use them to analyse text and create interesting visualisations.

  • 10:45-12:15
    Martin Holub, Jakub Genči, Charles University
    Automatic Text Categorization: The Case of Authorship Detection
    Presentation

    This lecture follows up on the introductory lesson on natural language processing and machine learning given earlier in the course (see Class #8, April 5, 2022). The main general principles of machine learning will be recapitulated, and we will demonstrate how machine learning can be applied to text categorization. In particular, we will focus on recognizing an author's style as a special case of automatic text categorization. Successful models for authorship prediction rely on statistical analysis of n-gram distributions. We will present both the theoretical principles and experimental results.

  • 12:15-13:00
    Light lunch at Kafe & Hrnky 
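For the class "The Geometry of Meaning: Analysing Text with Word Embeddings", here is a minimal sketch of one common way embeddings are used to analyse text: comparing words by the cosine similarity of their vectors. The vectors, the vocabulary, and the helper names `cosine_similarity` and `most_similar` are hypothetical and invented for illustration; embeddings created with real methods (for example word2vec or GloVe) are learned from large corpora and typically have hundreds of dimensions.

```python
import math

# Hypothetical toy 3-dimensional vectors, invented for illustration only.
EMBEDDINGS = {
    "king":  [0.80, 0.65, 0.10],
    "queen": [0.78, 0.70, 0.15],
    "apple": [0.05, 0.10, 0.90],
    "pear":  [0.07, 0.12, 0.85],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, embeddings):
    """Return the other vocabulary word whose vector is closest by cosine similarity."""
    target = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))

print(most_similar("king", EMBEDDINGS))   # queen
print(most_similar("apple", EMBEDDINGS))  # pear
```

Cosine similarity compares only the direction of two vectors and ignores their length, which is why it is a common default for measuring how related two words are in an embedding space.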

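For the lecture "Automatic Text Categorization: The Case of Authorship Detection", here is a deliberately simple illustration of what statistical analysis of n-gram distributions can look like. Each candidate author gets a character-trigram frequency profile, and an unknown text is attributed to the author whose profile is most similar by cosine similarity. This nearest-profile approach is an illustrative choice, not necessarily the models presented in the lecture. The function names and sample texts are hypothetical, and real experiments need far more text per author.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Frequency profile of overlapping character n-grams (lower-cased, whitespace normalised)."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def profile_similarity(p, q):
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm = (math.sqrt(sum(c * c for c in p.values()))
            * math.sqrt(sum(c * c for c in q.values())))
    return dot / norm if norm else 0.0

def attribute_author(unknown, samples, n=3):
    """Hypothetical helper: samples maps each candidate author to texts known to be theirs."""
    profiles = {
        author: sum((char_ngrams(t, n) for t in texts), Counter())
        for author, texts in samples.items()
    }
    target = char_ngrams(unknown, n)
    return max(profiles, key=lambda author: profile_similarity(target, profiles[author]))

# Invented sample texts for two hypothetical authors.
samples = {
    "author_a": ["The sea was calm and the sky was grey.",
                 "The ship sailed on the calm sea."],
    "author_b": ["Quantum fluctuations perturb the vacuum.",
                 "Fluctuations in quantum fields are measurable."],
}
print(attribute_author("A calm grey sea lay before the ship.", samples))  # author_a
```

Here the shared vocabulary dominates the decision. Practical authorship models tend to rely on signals such as function-word and character n-gram frequencies that persist across topics, combined with a trained classifier rather than a single nearest profile.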
Acknowledgement

This course is funded by the 4EU+ Alliance under grant agreement No 2021_F3_10.