PFL043 Syllabus: Statistical NLP (2002/2003)
[Statistické metody zpracování přirozených jazyků, 2002/2003]
Winter semester (ZS) 2/2, credit and exam (Z,Zk); summer semester (LS) 2/2, credit and exam (Z,Zk)

Where: Malostranske nam. 25, room 64 (2nd floor)
When: Thu 14:30 - 17:10

Instructor: Jan Hajic
Office: MFF UK Malostranske nam., 2nd floor, rm 66
Phone: +420 221 914 257
Office hours: Thu 13:00-14:00

Prerequisites & Relation to Other Courses:

Students should have substantial programming experience in C, C++, Java, and/or Perl, and should preferably have taken Data Structures (Datove struktury, TIN005), Unix (SWI015), and Intro to Probability (MAI016) or their equivalents, although all the probability theory needed will be re-explained. Knowledge of Perl, or a willingness to learn its basics as you go (and on your own), is also important. One benefit of the course is that, although the lectures are given in Czech, all the materials are in English and all current English terminology will be explained, to make it easier to read the current NLP literature, which is almost exclusively in English.

The material covered in this course is selected in such a way that at its completion you should be able to understand papers in the field of Natural Language Processing, and it should also make your life easier when taking more advanced courses either at UFAL MFF UK or elsewhere.

No background in NLP is necessary.


Assignments & Due Dates:

Unix lab accounts

For MFF UK students, please see For others, please visit

Turning in the Assignments

  • How (technically)

    The Assignments

    No.  Due date       Task                                     Resources
    1    Dec. 02, 2002  Exploring Entropy and Language Modeling  TEXTEN1.txt (large!), TEXTCZ1.txt (large!)
    2    Feb. 24, 2003  Word Classes                             TEXTEN1.txt (large!), TEXTCZ1.txt (large!), TEXTEN1.ptg (large!), TEXTCZ1.ptg (large!)
    3    Apr. 22, 2003  Tagging                                  texten2.ptg (large!), textcz2.ptg (large!)
    4    May 16, 2003   TBA

    Status key: Open to submissions / Tentative / Closed to submissions

    Additional resources:

    Tentative Course Schedule:

    For the moment, please see last year's page to get an idea.

    Grading Weights:

    Assignments (4) 50%
    Final exam 50%

    Each assignment is graded out of 100 points, so the four assignments total 400 points. To carry the same weight, the final exam is therefore worth 400 points.
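The point arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `total_fraction` is hypothetical, and because the syllabus does not say how points map to a grade, the sketch stops at the fraction of all available points earned.

```python
ASSIGNMENT_MAX = 100   # from the syllabus: at most 100 points per assignment
NUM_ASSIGNMENTS = 4    # from the syllabus: four assignments
# The exam is worth as much as all assignments together (50:50 weighting).
EXAM_MAX = ASSIGNMENT_MAX * NUM_ASSIGNMENTS  # 400


def total_fraction(assignment_points, exam_points):
    """Return the fraction (0.0 to 1.0) of all available points earned.

    Hypothetical helper; the mapping from this fraction to a grade is
    not given in the syllabus and is deliberately left out.
    """
    if len(assignment_points) != NUM_ASSIGNMENTS:
        raise ValueError("expected exactly four assignment scores")
    if any(not 0 <= p <= ASSIGNMENT_MAX for p in assignment_points):
        raise ValueError("each assignment score must be between 0 and 100")
    if not 0 <= exam_points <= EXAM_MAX:
        raise ValueError("exam score must be between 0 and 400")
    earned = sum(assignment_points) + exam_points
    return earned / (NUM_ASSIGNMENTS * ASSIGNMENT_MAX + EXAM_MAX)


# Example: 340 assignment points plus 300 exam points out of 800 in total.
print(total_fraction([80, 90, 70, 100], 300))
```

With perfect assignments and an empty exam the function returns exactly one half, which is the 50:50 split in numbers.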


    Exam                               Date, Time             Where
    Mid-term (Questionnaire, Answers)  Dec. 5, 2002, 2-2:30   Room 64, 2nd floor, MS
    Final                              TBA                    TBA

    Both the mid-term and the final exams will be written (not oral), with about 6 major questions and some subquestions. You will have 30 minutes for the mid-term, and up to 2 hours for the final exam to write down the answers.

    To get an idea of the type of exam questions, please see the questionnaire for a previous year's mid-term exam, again on the original course page.

    As stated above, your final grade (or pass/fail for PhD students) will be determined by both the final exam and your assignment results in a 50:50 ratio.

    In special circumstances (long-term absence etc.), a different schedule and grading scheme can be worked out individually, but please try hard to hand in all assignments on time and to come to the final exam on the regular date.