NPFL067/068 Syllabus 2012-2013

Statistical NLP I., II. [Statistické metody zpracování přirozených jazyků]
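NPFL067: fall semester (ZS), 2/2 hours/week of lectures/seminars, credit and exam (Z, Zk) - NPFL068: spring semester (LS), 2/2, credit and exam (Z, Zk)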

Lectures (fall semester):
Where: Malostranské nam. 25, 4th floor, S1
When: Thu 12:20 - 13:50

Seminars (fall semester):
Where: Malostranské nam. 25, 4th floor, S1
When: Thu 14:00 - 15:30

Instructor: Jan Hajič/Pavel Pecina
Email: hajic@ufal.mff.cuni.cz
WWW: http://ufal.mff.cuni.cz/~hajic
Office: MFF UK Malostranské nam., 4th floor, rm 420/422

Prerequisites & Relation to Other Courses

Students should have substantial programming experience in C, C++, Java and/or Perl, and should preferably have taken Data Structures (Datové struktury, NTIN060), Unix (NSWI095), and Intro to Probability (NMAI059) or their equivalents, even though all the probability theory needed will be re-explained. Knowledge of Perl, or the willingness to learn its basics as you go (and on your own), is also important. One benefit of the course is that it is taught in English; this should help you read the current NLP literature more easily, since that literature is almost exclusively in English. Czech terminology will be explained for those interested.

The material covered in this course is selected in such a way that at its completion you should be able to understand papers in the field of Natural Language Processing, and it should also make your life easier when taking more advanced courses either at UFAL MFF UK or elsewhere.

No background in NLP is necessary.


Readings

Required

Manning, C. D. and H. Schütze: Foundations of Statistical Natural Language Processing. The MIT Press. 1999. ISBN 0-262-13360-1.

Eight copies of this book are available at the CS library for borrowing. Please be considerate to other students and do not keep the book(s) longer than absolutely necessary.

Recommended & Reference Readings:

Jurafsky, D. and J. H. Martin: Speech and Language Processing. Prentice-Hall. 2000. ISBN 0-13-095069-6.

Three copies of Jurafsky's book are available at UFAL's library.
Wall, L., Christiansen, T. and R. L. Schwartz: Programming Perl, 3rd ed. O'Reilly. 1996. ISBN 0-596-00027-8.
Allen, J.: Natural Language Understanding. The Benjamin/Cummings Publishing Company Inc. 1994. ISBN 0-8053-0334-0.
Cover, T. M. and J. A. Thomas: Elements of Information Theory. Wiley. 1991. ISBN 0-471-06259-6.
Charniak, E.: Statistical Language Learning. The MIT Press. 1996. ISBN 0-262-53141-0.
Jelinek, F.: Statistical Methods for Speech Recognition. The MIT Press. 1998. ISBN 0-262-10066-5. Four copies of Jelinek's book are available at UFAL's library, but they are primarily reserved for those taking Nino Peterek's and/or Filip Jurcicek's courses.

Proceedings of major conferences (related to Natural Language Processing):

Some of the proceedings are available at UFAL's library, in print and/or in electronic form. Most of them, however, are available through the ACL Anthology.

Other Resources:


Assignments & Due Dates

Unix lab accounts

For MFF UK students, please see http://www.ms.mff.cuni.cz/labs/unix. For others, please visit http://www.ms.mff.cuni.cz/students/externisti.html.

Turning in the Assignments

Plagiarism

No plagiarism will be tolerated. The assignments are to be worked on individually; please respect this. If the instructor finds substantial similarities between submissions, beyond what could plausibly occur by chance, he will call in the two (or more) students involved to explain them, and possibly to take an immediate test (or assignment, at the instructor's discretion, not exceeding four hours of work) to determine each student's abilities related to the work in question. *All* cases of confirmed plagiarism will be reported to the Student Office.

Lateness

For each day your submission is late, 5 points will be subtracted from the points awarded to the solution (or to a part of it), up to a maximum of 50 points per homework. Submissions received less than 4 weeks before the closing date of the term will not be graded and will be awarded 0 points.
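A minimal sketch of the lateness rule above, in Python, assuming the delay is counted in whole days and the deduction applies to the whole homework (the rule also allows it to be applied to just part of a solution, which this sketch does not model). The names `late_penalty` and `score_after_penalty` are hypothetical, and clamping the result at zero is an assumption, not something the policy states.

```python
def late_penalty(days_late: int) -> int:
    """Points deducted from one homework submitted `days_late` whole days late.

    Rule from the syllabus: 5 points per day, capped at 50 points per homework.
    """
    if days_late <= 0:
        return 0
    return min(5 * days_late, 50)


def score_after_penalty(points_awarded: float, days_late: int) -> float:
    # Assumption (not stated in the policy): the score never drops below zero.
    return max(points_awarded - late_penalty(days_late), 0)


print(score_after_penalty(80, 3))  # 65
print(late_penalty(12))            # 50 (the cap)
```

For example, a solution awarded 80 points and submitted three days late would keep 65 points; from the tenth late day on, the deduction stays at the 50-point cap.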

The Assignments

#1 (NPFL067), due February 28, 2013: Exploring Entropy and Language Modeling. Resources: TEXTEN1.txt (large!), TEXTCZ1.txt (large!)
#2 (NPFL068), due April 30, 2013: Word Classes. Resources: TEXTEN1.txt (large!), TEXTCZ1.txt (large!), TEXTEN1.ptg (large!), TEXTCZ1.ptg (large!)
#3 (NPFL068), due June 30, 2013: Tagging. Resources: texten2.ptg (large!), textcz2.ptg (large!)

Additional resources


Tentative Course Schedule:

For the moment, please see the original Johns Hopkins University course page (listed under Archive below) to get an idea.


Exam

NPFL067: Jan. 10, 2013, 12:20 - 13:30, room S1
NPFL068: May 14, 2013, 12:20 - 14:15, room S9

Both the mid-term and the final exams will be written (not oral), with about 6 major questions and some subquestions. You will have 60 minutes for the mid-term, and up to 90 minutes for the final exam to write down the answers.

To get an idea of the type of exam questions, please see the questionnaire from one of the previous years' final exams (Questionnaire).

Your final grade (or pass/fail for PhD students) will be determined by the final exam and your assignment results in a 50:50 ratio for NPFL067, and in a 1:1:1 ratio (in other words, roughly 33:33:33) for NPFL068.
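Both ratios amount to a plain average of equally weighted components. The sketch below assumes that all components are scored on the same scale, and that for NPFL068 the three parts are the final exam and assignments #2 and #3; `final_score` is a hypothetical name, and the mapping from score to grade is not shown because it is not given here.

```python
from typing import Sequence


def final_score(exam: float, assignments: Sequence[float]) -> float:
    """Equally weighted average of the exam and the course's assignments.

    Assumption: every component uses the same point scale (e.g. 0-100).
    NPFL067: one assignment  -> exam : assignment #1   = 50:50
    NPFL068: two assignments -> exam : #2 : #3          = 1:1:1
    """
    components = [exam, *assignments]
    return sum(components) / len(components)


# NPFL067: exam 80, assignment #1 60
print(final_score(80, [60]))      # 70.0
# NPFL068: exam 90, assignments #2 and #3 at 60 each
print(final_score(90, [60, 60]))  # 70.0
```

Treating the exam as just one more component keeps the same function valid for both courses: the number of assignments alone determines whether the split is 50:50 or 1:1:1.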

In special circumstances (long-term absence, etc.), a different schedule and grading scheme may be worked out individually, but please try hard to hand in all assignments on time and to come to the final exam on the regular date.

NPFL067 Grades

The official, 'usual' grading table is now available here, not on Pavel's pages. You will need a username and password to access it; I will email them to you.

NPFL068 Grades

The official, 'usual' grading table is now available here, not on Pavel's pages. You will need a username and password to access it, the same as above (they have been mailed to you).

Archive

The web pages from 2011/2012, including grading (password needed as usual) are available at http://ufal.mff.cuni.cz/~hajic/courses/archive/npfl067/1011/syllabus.html.

The original web pages for this course are also still active at http://www.cs.jhu.edu/~hajic/courses/cs465/syllabus.html.