NPFL067/068 Syllabus 2012-2013
Statistical NLP I., II. [Statistické metody zpracování přirozených jazyků]
NPFL067 ZS 2/2 Z,Zk - NPFL068 LS 2/2 Z,Zk
Lectures (fall semester):
Where: Malostranské nam. 25, 4th floor, S1
When: Thu 12:20 - 13:50
Seminars (fall semester):
Where: Malostranské nam. 25, 4th floor, S1
When: Thu 14:00 - 15:30
Instructor: Jan Hajič/Pavel Pecina
Email: hajic@ufal.mff.cuni.cz
WWW: http://ufal.mff.cuni.cz/~hajic
Office: MFF UK Malostranské nam., 4th floor, rm 420/422
Prerequisites & Relation to Other Courses
Students should have substantial programming experience in C, C++, Java, and/or Perl, and should preferably have taken Data Structures (Datove struktury, NTIN060), Unix (NSWI095), and Intro to Probability (NMAI059) or their equivalents, even though all the probability theory needed will be re-explained. Knowledge of Perl, or a willingness to learn its basics as you go (and on your own), is also important. One of the benefits of the course is that it is given in English; it should enable you to read current literature on NLP more smoothly, since the literature is almost exclusively in English. Czech terminology will be explained for those interested.
The material covered in this course is selected in such a way that at its completion you should be able to understand papers in the field of Natural Language Processing, and it should also make your life easier when taking more advanced courses either at UFAL MFF UK or elsewhere.
No background in NLP is necessary.
Readings
Required
Manning, C. D. and H. Schütze: Foundations of Statistical Natural Language Processing. The MIT Press. 1999. ISBN 0-262-13360-1.
Eight copies of this book are available at the CS library for borrowing. Please be considerate to other students and do not keep the book(s) longer than absolutely necessary.
Recommended & Reference Readings:
- Jurafsky, D. and J. H. Martin: Speech and Language Processing. Prentice-Hall. 2000. ISBN 0-13-095069-6. Three copies of Jurafsky's book are available at UFAL's library.
- Wall, L., Christiansen, T. and R. L. Schwartz: Programming Perl, 3rd ed. O'Reilly. 1996. ISBN 0-596-00027-8.
- Allen, J.: Natural Language Understanding. The Benjamin/Cummings Publishing Company Inc. 1994. ISBN 0-8053-0334-0.
- Cover, T. M. and J. A. Thomas: Elements of Information Theory. Wiley. 1991. ISBN 0-471-06259-6.
- Charniak, E.: Statistical Language Learning. The MIT Press. 1996. ISBN 0-262-53141-0.
- Jelinek, F.: Statistical Methods for Speech Recognition. The MIT Press. 1998. ISBN 0-262-10066-5. Four copies of Jelinek's book are available at UFAL's library, but they are primarily reserved for those taking Nino Peterek's and/or Filip Jurcicek's courses.
Proceedings of major conferences (related to Natural Language Processing):
Some of the Proceedings are available at UFAL's library, physically and/or in electronic form. Most of them are, however, available through the ACL Anthology.
- ACL (Association for Computational Linguistics)
- European Chapter of the ACL
- North American Chapter of the ACL
- EMNLP (Empirical Methods in Natural Language Processing)
- COLING (International Committee of Computational Linguistics)
- ANLP (Applied Natural Language Processing, by ACL)
- ACL SIGDAT, SIGNLL, and other SIG (Special Interest Group) Workshops, such as WVLC (Workshop on Very Large Corpora)
- DARPA HLT (Defense Advanced Research Projects Agency Human Language Technology Workshops)
Other Resources:
- CLSP Workshops: Language Engineering for Students and Professionals Integrating Research and Education
Assignments & Due Dates
Unix lab accounts
For MFF UK students, please see http://www.ms.mff.cuni.cz/labs/unix. For others, please visit http://www.ms.mff.cuni.cz/students/externisti.html.
Turning in the Assignments
- Use a separate directory for each assignment. Create a main web page called index.html or index.htm in that directory. Create as many other web pages as necessary. Put all the other necessary files (.ps files, pictures, source code, ...) into the same directory and make relative links to them from your main or other linked web pages. If you use "content creation" tools related to Microsoft software, please make sure the references use the correct case (matching uppercase/lowercase).
- Pack everything into a single .tgz file:
    cd <your assignment #x directory>
    tar -czvf ~/username.assignx.tgz ./*
- Send the resulting file by e-mail (as an attachment) to hajic@ufal.mff.cuni.cz with the following subject line:
    Subject: <your name in the form (no accents): First.Last> <number of the assignment>
  e.g.
    Subject: Jan.Novak 2
  for Jan Novák, turning in the second assignment.
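The subject-line convention above (First.Last, no accents, followed by the assignment number) can be sketched in a few lines of Python. This is only an illustration, not part of the official submission procedure; the helper names `strip_accents` and `subject_line` are hypothetical, and the sketch assumes that every accented letter in the name decomposes under Unicode NFKD normalization, which holds for Czech diacritics.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after NFKD decomposition (e.g. 'á' -> 'a')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def subject_line(first: str, last: str, assignment_no: int) -> str:
    """Build the e-mail subject in the form 'Subject: First.Last N'."""
    name = f"{strip_accents(first)}.{strip_accents(last)}"
    return f"Subject: {name} {assignment_no}"


if __name__ == "__main__":
    # Jan Novák turning in the second assignment
    print(subject_line("Jan", "Novák", 2))  # prints "Subject: Jan.Novak 2"
```

Letters that have no Unicode decomposition (for example "ł") pass through unchanged and would need to be replaced by hand.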
Plagiarism
No plagiarism will be tolerated. The assignments are to be completed on your own; please respect this. If the instructor determines that there are substantial similarities exceeding the likelihood of such an event, he will call the two (or more) students to explain them and possibly to take an immediate test (or assignment, at the discretion of the instructor, not to exceed four hours of work) to determine the student's abilities related to the offending work. *All* cases of confirmed plagiarism will be reported to the Student Office.
Lateness
For each day your submission is late, 5 points will be subtracted from the points awarded to the solution or a part of it, up to a maximum of 50 points per homework. Submissions received less than 4 weeks before the closing date of the term will not be graded and will be awarded 0 points.
The Assignments
No. | Course | Due date | Task | Resources |
---|---|---|---|---|
#1 | NPFL067 | February 28, 2013 | Exploring Entropy and Language Modeling | TEXTEN1.txt (large!) TEXTCZ1.txt (large!) |
#2 | NPFL068 | April 30, 2013 | Word Classes | TEXTEN1.txt (large!) TEXTCZ1.txt (large!) TEXTEN1.ptg (large!) TEXTCZ1.ptg (large!) |
#3 | NPFL068 | June 30, 2013 | Tagging | texten2.ptg (large!) textcz2.ptg (large!) |
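To estimate how the lateness rule stated before this table affects a submission, the arithmetic can be sketched as follows. The function names `late_penalty` and `adjusted_points` are hypothetical, and clamping the result at zero is an assumption: the syllabus gives only the 5-points-per-day rate and the 50-point cap. The separate rule about submissions arriving less than 4 weeks before the end of the term is not modeled here.

```python
def late_penalty(days_late: int) -> int:
    """Penalty of 5 points per late day, capped at 50 points per homework."""
    return min(5 * max(days_late, 0), 50)


def adjusted_points(awarded: int, days_late: int) -> int:
    """Points after the lateness deduction.

    Assumption: the result is not allowed to drop below zero;
    the syllabus does not say what happens in that case.
    """
    return max(awarded - late_penalty(days_late), 0)


if __name__ == "__main__":
    print(late_penalty(3))          # 3 days late: 15 points deducted
    print(late_penalty(20))         # cap reached: 50 points deducted
    print(adjusted_points(80, 3))   # 80 awarded, 3 days late: 65
```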
Additional resources
- Eric Brill's short guide to Perl.
Tentative Course Schedule:
For the moment, please see the Hopkins page to get an idea.
Exam
Exam | Date, Time | Where |
---|---|---|
NPFL067 | Jan. 10, 2013, 12:20 - 13:30 | S1 |
NPFL068 | May 14, 2013, 12:20 - 14:15 | S9 |
Both the mid-term and the final exams will be written (not oral), with about 6 major questions and some subquestions. You will have 60 minutes for the mid-term, and up to 90 minutes for the final exam to write down the answers.
To get an idea of the type of exam questions, please see the questionnaire from one of the previous years' final exams (Questionnaire).
As stated above, your final grade (or pass/fail for PhD students) will be determined by both the final exam and your assignment results in a 50:50 ratio (NPFL067), and 1:1:1 (or in other words, roughly 33:33:33) for NPFL068.
In special circumstances (long-term absence etc.), some other schedule and grading scheme could be worked out individually, but please try hard to hand in all assignments on time and come to the final exam on the regular date.
NPFL067 Grades
The official, 'usual' grading table is now available here, not on Pavel's pages. You will need a username and password to access it - I will email it to you.
NPFL068 Grades
The official, 'usual' grading table is now available here, not on Pavel's pages. You will need a username and password to access it - same as above (it has been mailed to you).
Archive
The web pages from 2011/2012, including grading (password needed as usual) are available at http://ufal.mff.cuni.cz/~hajic/courses/archive/npfl067/1011/syllabus.html.
The original web pages for this course are also still active at http://www.cs.jhu.edu/~hajic/courses/cs465/syllabus.html.