JEB133 Data analysis

Barbora Hladká, March 2024

The lecture and seminars on Data Analysis within the Economy History course (No. JEB133) will be organized as follows. The lecture on 03/15 will be replaced by an on-line self-guided tutorial demonstrating a parliamentary data analysis. Please study this tutorial in detail. The seminars on 03/15 and 03/22 will be replaced by an on-line Zoom session on 03/22 at 2:30 p.m. to address students' questions related to the assignment (see below). Join the session via this Zoom link: https://cesnet.zoom.us/j/2592348083?omn=93750870963 (if no one joins by 2:45 p.m., I will close the session).

The students must complete a project assignment. The motivation behind the assignment is to encourage students to use data in their projects. The assignment is practical hands-on training in which the students will use the collections of parliamentary data compiled in the ParlaMint project. Please check the list of ParlaMint collections available in KonText https://www.clarin.si/kontext/corpora/corplist > ParlaMint.

Project assignment

Answer the following questions

  1. How many times have the speakers in a parliament of your choice talked about a topic of your choice?
    E.g. I am interested in the debates in the Czech parliament on the topic of vaccination (očkování in Czech). Therefore, I choose the collection ParlaMint-CZ 4.0 (Czech parliament) and run the query [lemma="očkování"].
  2. How did the overall frequency of the topic change over the months?
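
A query for a topic that spans several words can be assembled one token at a time. The sketch below is only an illustration: the helper name lemmas_to_cql is hypothetical and not part of KonText, and it assumes the lemmas contain no quotation marks or other characters that would need escaping in a query.

```python
def lemmas_to_cql(lemmas):
    """Build a query string with one [lemma="..."] condition per token.

    Hypothetical helper for illustration only; it does not escape
    special characters inside the lemmas.
    """
    return "".join(f'[lemma="{lemma}"]' for lemma in lemmas)


# A single-word topic, as in the vaccination example above.
single = lemmas_to_cql(["očkování"])

# A multi-word topic: each word gets its own bracketed condition.
phrase = lemmas_to_cql(["leave", "the", "European", "Union"])

print(single)
print(phrase)
```

The resulting string can be pasted into the KonText query field; the lemma of each word still has to be chosen by hand.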

Proceed analogously to the solution demonstrated in the tutorial

  1. Choose the collection in KonText - work with its version 4.0 (or 2.1).
  2. Create a query to answer your questions (1) and (2).
  3. Extract data from KonText and upload them to a Google spreadsheet.
  4. Create a pivot table to answer question (1) and a plot to answer question (2) - see pp. 15-18 in the tutorial.
  5. Create one more sheet in your spreadsheet and describe your results. Your description should not be longer than 500 characters (including spaces).
  6. Share your spreadsheet with vidohlad@gmail.com by April 2, 2024. Then I will score your results: the maximum score is 6 points and the minimum score is 1 point. 
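
For those who want to cross-check their spreadsheet, the two counts from step 4 can also be sketched in Python. This is not the tutorial's method, only a rough equivalent under stated assumptions: the sample rows are invented, and the column names name (speaker) and date (in yyyy-mm-dd form) are assumptions that may differ from your KonText export.

```python
import csv
import io
from collections import Counter

# Invented sample standing in for data exported from KonText.
# The column names `name` and `date` are assumptions; adjust to your export.
SAMPLE_CSV = """name,date
Speaker A,2021-01-12
Speaker B,2021-01-20
Speaker A,2021-02-03
Speaker A,2021-02-17
Speaker C,2021-03-09
"""


def count_by_speaker(rows):
    """Return (speaker, hits) pairs sorted by hits in descending order."""
    counts = Counter(row["name"] for row in rows)
    return counts.most_common()


def count_by_month(rows):
    """Return (yyyy-mm, hits) pairs in chronological order."""
    counts = Counter(row["date"][:7] for row in rows)
    return sorted(counts.items())


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
total_hits = len(rows)                  # overall number of matches
speaker_table = count_by_speaker(rows)  # counterpart of the pivot table
month_table = count_by_month(rows)      # data behind the monthly plot

print(total_hits)
print(speaker_table)
print(month_table)
```

The speaker table plays the role of the pivot table sorted in descending order, and the month table holds the values that the plot shows over time.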

Feedback

March 26, 2024

I have looked at several submitted Google spreadsheets (Thank you!) and based on that I have a few comments:

  1. Don't forget to sort the pivot table with the speakers by the COUNTA of name attribute in descending order (see p. 16 in the tutorial).
  2. Check that you have the correct axis labels in the plot: x-axis yyyy-mm and y-axis COUNTA-of-yyyy-mm (see p. 18 in the tutorial).
  3. Edit the chart title as follows: list the corpus you used, e.g. GB-2.1, and the query you submitted, e.g. [lemma = "leave"][lemma = "the"][lemma = "European"][lemma = "Union"] (see p. 18 in the tutorial).
  4. Please save your charts in PNG format (see p. 19 in the tutorial) and upload them to a shared Google Photo Album.

Micro Gallery