Workshop 2023: a follow-up to the course

  • June 14-16, 2023 (Wed-Fri)
  • Malostranske nam. 25, lab SW2
  • Charles University, Faculty of Mathematics and Physics
  • Prague, Czech Republic


Wednesday June 14

  • 12:00
  • 13:00-14:30
    Shivam Sen, Charles University
    Text Based Semantic Network Analysis in the Digital Age: An Overview
    With the advent of the digital age, enormous amounts of textual data are now readily available over the internet, awaiting analysis by curious scientists. Moreover, advances in computing technology have made it possible to model these unstructured data as networks of words, revealing the meanings of concepts and the relations between them like never before. This presentation provides an overview of the methodology behind this kind of analysis. First, it introduces some common sources of text data available to researchers today, along with techniques for extracting them. Then, it presents the workflow for processing these data and constructing a semantic network from them. Finally, the presentation concludes with a brief look at how to analyse such networks and at their possible applications in the social sciences and humanities.

Thursday June 15

  • 9:00-10:30
    Jan Hajič, jr., Charles University
    0.5 Million Gregorian Chants on a Computational Playground
    Gregorian chant has been the foundational musical tradition of Latin Europe (and the billion-strong Roman Catholic church) since the 9th century. It thus pre-dates nearly all concepts of music that we work with today (such as harmony, major and minor keys, or beat), and therefore presents a challenge for musicological understanding. At the same time, it is a rich tradition in terms of surviving sources, and digital chant scholarship has, in a remarkable feat of collaboration, amassed a public database of more than 500 000 records of individual chants in manuscripts across all of Europe. Gregorian chant therefore presents an excellent opportunity for computational research. In this workshop, we introduce chant data and showcase several studies: on the pre-major/minor modality of chant melodies, on unexpected properties of chant repertoire, and on how bioinformatics can help study cultural networks of chant across Europe.
  • 10:45-12:15
    Ondřej Fúsik, Charles University
    Building Corpora: The Old Norse Case Study
    This lecture will delve into the fundamental aspects of corpus compilation, using the development of an Old Norse corpus as a case study. It will review the integration of existing Old Norse databases into a larger corpus, discuss text selection and tagging methodologies, and address licensing and availability concerns. Additionally, it will explore the complexities of defining Old Norse as a subject.
  • 12:15-13:00
  • 13:00-14:30
    Dominika Kovaříková, Charles University
    Exploring the Czech National Corpus: Language Dynamics & Interdisciplinary Applications
    The Czech National Corpus offers a wide range of contemporary Czech texts and parallel corpora in 40 different languages. By using tools like KonText, participants will gain insights into its extensive capabilities. This presentation is aimed at international students and provides examples in English, German, Italian, French, Polish, and other languages. Participants will have the opportunity to explore the potential of the corpus data in various tasks, such as finding suitable collocations and identifying optimal translation equivalents in context, as well as conducting non-linguistic research in the social sciences. The presentation aims to provide a valuable experience for those interested in deepening their understanding of language dynamics and interdisciplinary applications.

Friday June 16

  • 9:00-10:30
    Rudolf Rosa, Charles University
    THEaiTRE: Generating Theatre Play Scripts with GPT Models
    In February 2021, the THEaiTRE team staged the first theatre play for which 90% of the script was automatically generated by an artificial intelligence system. The THEaiTRobot system was based on the GPT-2 language model (then state of the art), created by OpenAI, complemented with automated translation. We will look at how a generative language model works and what its strengths and limitations are. We will also see what models are currently available and try out some of them in practice.
  • 10:45-12:15
    Petr Plecháč, Czech Academy of Sciences
    Authorship recognition: the case of Henry VIII
    In the first collection of William Shakespeare’s works, published in 1623 (the so-called First Folio), a play entitled The Famous History of the Life of King Henry the Eight appears for the very first time. While the stylistic dissimilarity of Henry VIII to Shakespeare’s other plays had been pointed out before, it was not until the mid-nineteenth century that Shakespeare’s sole authorship was called into question by James Spedding. Since then, various hypotheses have been raised about who was involved in writing the play and what the precise shares of the authorial contributions were. This class will discuss the results of a machine-learning-driven analysis based on the frequencies of words and of rhythmic patterns.
  • 13:00-15:00
    Lunch, Kuchyň restaurant 


This course is funded by the 4EU+ Alliance.