NPFL103 – Information Retrieval

"Information retrieval is the task of searching a body of information for objects that satisfy an information need."

This course is offered at the Faculty of Mathematics and Physics to graduate students interested in information retrieval, web search, document classification, and related areas. It is based on the book Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. The course covers both the foundations of information retrieval and some more advanced topics.

About

SIS code: NPFL103
Semester: winter
E-credits: 6
Examination: 2/2 C+Ex
Lecturer: Pavel Pecina, pecina@ufal.mff.cuni.cz

Timespace Coordinates

  • Mondays, 15:40-17:10 (18:50)
  • S10, Malostranské nám. 25

News

  • The course will start on Oct 2 at 15:40.
  • The exam will take place on Jan 15 at 15:40 in S1.

Lectures

1. Introduction, Boolean retrieval, Inverted index, Text preprocessing Slides

2. Dictionaries, Tolerant retrieval, Spelling correction Slides

3. Index construction and index compression Slides

4. Ranked retrieval, Term weighting, Vector space model Slides

5. Ranking, Complete search system, Evaluation, Benchmarks Slides (Assignment 1: Vector space models)

6. Result summaries, Relevance Feedback, Query Expansion Slides

7. Probabilistic information retrieval Slides

8. Language models, Text classification Slides

9. Vector space classification Slides

10. Document clustering Slides

11. Latent Semantic Indexing Slides (Assignment 2: Retrieval frameworks)

12. Web search, Crawling, Duplicate and spam detection, PageRank Slides

13. Review session, Questions and Answers

14. Exam


Assignments

1. Vector space models

2. Retrieval frameworks


Prerequisites

Students should have substantial programming experience and be familiar with basic algorithms, data structures, and statistical/probabilistic concepts.

Passing Requirements

To pass the course, students need to complete two homework assignments and pass a written test. See Grading for more details.

1. Introduction, Boolean retrieval, Inverted index, Text preprocessing

 Oct 02 Slides


2. Dictionaries, Tolerant retrieval, Spelling correction

 Oct 09 Slides


3. Index construction and index compression

 Oct 23 Slides


4. Ranked retrieval, Term weighting, Vector space model

 Oct 30 Slides


5. Ranking, Complete search system, Evaluation, Benchmarks

 Nov 06 Slides (Assignment 1: Vector space models)


6. Result summaries, Relevance Feedback, Query Expansion

 Nov 13 Slides


7. Probabilistic information retrieval

 Nov 20 Slides


8. Language models, Text classification

 Nov 27 Slides


9. Vector space classification

 Dec 04 Slides


10. Document clustering

 Dec 11 Slides


11. Latent Semantic Indexing

 Dec 18 Slides (Assignment 2: Retrieval frameworks)


12. Web search, Crawling, Duplicate and spam detection, PageRank

 Jan 08 Slides


13. Review session, Questions and Answers

 Jan 08


14. Exam

 Jan 15

1. Vector space models

 Deadline: Dec 31 23:59  100 points

Design, develop and evaluate your own retrieval system based on vector space models.

2. Retrieval frameworks

 Deadline: Jan 31 23:39  100 points

Design, develop and evaluate a state-of-the-art retrieval system using one of the off-the-shelf retrieval frameworks.

Homework assignments

  • There are two homework assignments during the semester.
  • The assignments require a substantial amount of programming, experimentation, and reporting to complete.
  • Students present their solutions during the practicals.
  • Each assignment is graded on a scale of 0-100 points and must be completed independently.

Exam

  • The exam takes the form of a written test, scheduled at the end of the semester.
  • The test includes approximately 20 short-answer questions covering the topics discussed during the course.
  • The maximum duration of the test is 90 minutes.
  • The test is graded on a scale of 0-100 points and must be completed independently.

Grading

  • Passing the course requires both the homework assignments and the exam.
  • Students must earn at least 50 points on each assignment and at least 50 points on the test.
  • The final grade will be based on the average results of the exam and homework assignments weighted equally:
    1. 100% - 90%
    2. 89% - 70%
    3. 69% - 50%
    4. 49% - 0%
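
The grading rules above can be read as a small calculation. The following is a minimal sketch, not an official grading tool. It assumes that "weighted equally" means each of the three components (two assignments and the exam) counts for one third, and that a fractional average belongs to the band whose lower bound it reaches (for example, 89.5 counts as grade 2). The names final_grade and PASS_THRESHOLD are hypothetical.

```python
from statistics import mean

# Minimum points required on every component (taken from the Grading rules).
PASS_THRESHOLD = 50


def final_grade(assignment1: float, assignment2: float, exam: float) -> int:
    """Return the grade band (1 = best, 4 = lowest) for one student.

    Assumption: the three components are averaged with equal weight (1/3 each).
    """
    components = [assignment1, assignment2, exam]
    if any(not 0 <= points <= 100 for points in components):
        raise ValueError("each component is graded on a 0-100 point scale")

    # Missing the 50-point minimum on any single component puts the student
    # in the lowest band, regardless of the overall average.
    if any(points < PASS_THRESHOLD for points in components):
        return 4

    average = mean(components)
    if average >= 90:
        return 1
    if average >= 70:
        return 2
    # Every component is at least 50 here, so the average is at least 50.
    return 3
```

For instance, final_grade(80, 95, 92) averages to 89 and lands in band 2, while final_grade(100, 100, 49) yields 4 because the exam score is below the 50-point minimum.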

Plagiarism

  • No plagiarism will be tolerated.
  • All cases of plagiarism will be reported to the Student Office.

Introduction to Information Retrieval

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008, ISBN: 978-0521865715.

Available online.




Information Retrieval, Algorithms and Heuristics

David A. Grossman and Ophir Frieder, Springer, 2004, ISBN: 978-1402030048.





Modern Information Retrieval

Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Addison Wesley, 1999, ISBN: 978-0201398298.