NPFL070 – Language Data Resources

About

SIS code: NPFL070
Semester: summer
E-credits: 5
Examination: 1/2 MC (KZ)
Instructors: Martin Popel, Zdeněk Žabokrtský

  • the classes combine lectures and practicals

Timespace Coordinates

Tuesday 14.00–16.20, Linux lab SU1

Course prerequisites

Course passing requirements

To pass the course, you will need to submit homework assignments and take a written test. See Grading for more details.

Lectures

1. Introduction

2. Corpora - Case Study: the Czech National Corpus

Using annotated data for evaluation

Treebanking

Universal Dependencies, Udapi (by Martin Popel)

Parsing and practical applications (by Martin Popel)

1. Introduction

  • Course overview
  • Prerequisites: Make sure you have a valid account for accessing the Czech National Corpus. If not, see the CNC registration page.
  • Overview of language data resources: slides

2. Corpora - Case Study: the Czech National Corpus

Using annotated data for evaluation

Treebanking

Universal Dependencies, Udapi (by Martin Popel)

Parsing and practical applications (by Martin Popel)

Latent Dirichlet Allocation

 Deadline: Nov 14 23:59  10 points  Duration: 2h

Unsupervised text segmentation

 Deadline: Dec 5 23:59  10 points  Duration: 2h

Finding motifs in DNA

 Deadline: Dec 19 23:59  10 points  Duration: 2h

Homework assignments

  • There will be 8–12 homework assignments.
  • For most assignments, you will get points, up to a given maximum (the maximum is specified with each assignment).
    • If your submission is especially good, you can get extra points (up to +10% of the maximum).
  • Most assignments will have a fixed deadline (usually in 1 week).
  • If you submit the assignment after the deadline, you will get:
    • up to 50% of the maximum points if it is less than 2 weeks after the deadline;
    • 0 points if it is more than 2 weeks after the deadline.
  • Once we check the submitted assignments, you will see the points you got and the comments from us in:

Test

Grading

Your grade is based on your average performance in the test and the homework assignments, which are weighted 1:1. The resulting percentage determines your grade:

  1. ≥ 90%
  2. ≥ 70%
  3. ≥ 50%
  4. < 50%

For example, if you get 600 out of 1000 points for homework assignments (60%) and 36 out of 40 points for the test (90%), your total performance is 75% and you get a 2.
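The grading rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not an official grading script: the function names `total_performance` and `grade` are hypothetical, and the sketch assumes that homework percentages are not capped (the text does not say how extra points above the maximum are handled).

```python
def total_performance(hw_points, hw_max, test_points, test_max):
    """Average of homework and test percentages, weighted 1:1."""
    hw_pct = 100 * hw_points / hw_max
    test_pct = 100 * test_points / test_max
    # Assumption: no capping at 100%, since the source does not specify it.
    return (hw_pct + test_pct) / 2


def grade(total_pct):
    """Map a total percentage to a grade using the thresholds listed above."""
    if total_pct >= 90:
        return 1
    if total_pct >= 70:
        return 2
    if total_pct >= 50:
        return 3
    return 4


# The worked example from the text: 600/1000 for homework, 36/40 for the test.
example_total = total_performance(600, 1000, 36, 40)
example_grade = grade(example_total)
print(example_total, example_grade)
```

With the numbers from the example, the homework contributes 60% and the test 90%, so the average is 75% and the grade is 2.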

No cheating

  • Cheating is strictly prohibited and any student found cheating will be punished. The punishment can involve failing the whole course, or, in grave cases, being expelled from the faculty.
  • Discussing homework assignments with your classmates is OK. Sharing code is not OK (unless explicitly allowed); by default, you must complete the assignments yourself.
  • All students involved in cheating will be punished. E.g. if you share your assignment with a friend, both you and your friend will be punished.