Information Retrieval - 2017-18 - Syllabus

Summary

  • Boolean systems
  • Vocabularies and dictionaries
  • Indexing
  • Measures and weights for terms
  • Vector Space Model
  • Evaluation of information retrieval systems
  • Topic modeling, probabilistic systems, text classification
  • Clustering
  • Matrix decomposition and latent semantic analysis

Book

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval (Vol. 1, p. 496). Cambridge: Cambridge University Press. (http://nlp.stanford.edu/IR-book/)

Detailed Program

Course program: first notions

Lesson (book chapters)
Course program. Introduction to information retrieval by means of Boolean retrieval. (Ch. 1)
Tokenization and normalization. Term vocabulary. Dictionaries. Tolerant retrieval. (Ch. 2, 3)
  • Tech: NLTK and MongoDB.
  • Practice: implementation of a real Boolean system.

Course program: the vector space model

Lesson (book chapters)
Scores and weights. Co-occurrences, mutual information, and specific language. (Ch. 6, 7)
The vector space model. (Ch. 6)
Evaluation in information retrieval. (Ch. 8)
Relevance feedback and query expansion. (Ch. 9)
  • Tech: introduction to Elasticsearch.
  • Practice: construction of a vector system.

Course program: probabilistic approaches

Lesson (book chapters)
Probabilistic information retrieval. Language models. (Ch. 11, 12)
Text classification and vector space classification. (Ch. 13, 14)
Support vector machines and machine learning. (Ch. 15)
Relevance feedback and query expansion. (Ch. 9)
  • Practice: classification of real documents.

Course program: linking, matching, and clustering

Lesson (book chapters)
Data linking and matching.
Flat clustering and hierarchical clustering. (Ch. 16, 17)
Other clustering approaches.
Relevance feedback and query expansion. (Ch. 9)
  • Tech: clustering with Python.
  • Practice: clustering of real documents.

Course program: topic modeling

Lesson (book chapters)
Matrix decomposition and latent semantic indexing. (Ch. 18)
Topic modeling and Latent Dirichlet Allocation (LDA) (part I).
Topic modeling and Latent Dirichlet Allocation (LDA) (part II).
  • Tech: gensim.
  • Practice: topic discovery.
Notices posted on the websites are to be considered up to date, and students are strongly urged NOT TO SEND emails asking for confirmation of dates or times.
Previous edition of the course
Information on the course for the academic year 2015-16 is available in the archive.

This web page integrates the information available on the official course page on the website of the Department of Computer Science.
In case of technical problems write to the ISLab web admin (web [at] islab.di.unimi.it).