- Course overview
- Prerequisites: Make sure you have a valid account for accessing the Czech National Corpus. If not, see the CNC registration page.
- Overview of language data resources: slides
2. Corpora - Case Study: the Czech National Corpus
Using annotated data for evaluation
Universal dependencies, Udapi (by Martin Popel)
Parsing and practical applications (by Martin Popel)
Latent Dirichlet Allocation
Deadline: Nov 14 23:59
Unsupervised text segmentation
Deadline: Dec 5 23:59
Finding motifs in DNA
Deadline: Dec 19 23:59
- There will be 8–12 homework assignments.
- For most assignments, you will get points,
up to a given maximum
(the maximum is specified with each assignment).
- If your submission is especially good,
you can get extra points (up to +10% of the maximum).
- Most assignments will have a fixed deadline (usually in 1 week).
- If you submit the assignment after the deadline, you will get:
- up to 50% of the maximum points if it is less than 2 weeks after the deadline;
- 0 points if it is more than 2 weeks after the deadline.
- Once we check the submitted assignments, you will see the points you got and
the comments from us in:
Your grade is based on the average of your performance;
the test and the homework assignments are weighted 1:1.
- 1: ≥ 90%
- 2: ≥ 70%
- 3: ≥ 50%
- 4: < 50%
For example, if you get
600 out of 1000 points for homework assignments (60%)
and 36 out of 40 points for the test (90%),
your total performance is 75% and you get a 2.
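
The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration, not an official grading script: the function names `total_performance` and `grade` are made up here, and the mapping of the four thresholds to grades 1, 2, 3, and 4 (in that order) is an assumption inferred from the example, where 75% yields a 2.

```python
def total_performance(hw_points, hw_max, test_points, test_max):
    """Average the homework and test percentages with equal (1:1) weight."""
    hw_pct = 100.0 * hw_points / hw_max
    test_pct = 100.0 * test_points / test_max
    return (hw_pct + test_pct) / 2


def grade(performance):
    """Map a percentage to a grade.

    Assumption: the thresholds >= 90, >= 70, >= 50, and < 50 correspond
    to grades 1, 2, 3, and 4 respectively.
    """
    if performance >= 90:
        return 1
    if performance >= 70:
        return 2
    if performance >= 50:
        return 3
    return 4


# The example from the text: 600/1000 homework points, 36/40 test points.
perf = total_performance(600, 1000, 36, 40)
print(perf, grade(perf))  # 75.0 2
```

Note that the two parts are averaged as percentages, so the homework and the test count equally regardless of how many raw points each one offers.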
- Cheating is strictly prohibited and any student found cheating will be punished.
The punishment can involve failing the whole course, or, in grave cases,
being expelled from the faculty.
- Discussing homework assignments with your classmates is OK. Sharing code is
not OK (unless explicitly allowed); by default, you must complete the assignments yourself.
- All students involved in cheating will be punished. E.g. if you share
your assignment with a friend, both you and your friend will be punished.