Spring 2024: COMP 631 - Introduction to Information Retrieval



Meeting times and location: 2:00 PM - 3:15 PM, MW, DCH 1070

Course Description and Prerequisites:

It is the information age! We face challenges brought by large-scale, unstructured information on open systems such as the Web and social media. In this course, we'll study the theory, design, and implementation of text-based and Web-based information retrieval systems, including the web and social media mining algorithms and techniques at the core of modern search, data mining, and LLM applications.

Prerequisites: This is a graduate-level course. While there are no official prerequisites, previous exposure to linear algebra and basic probability theory is beneficial. You should be able to design and develop large programs and learn new software libraries on your own.

Learning Outcomes:

The goal of this course is to develop a comprehensive understanding of the fundamental issues, techniques, applications, and future directions of Information Retrieval. In particular, by the end of the semester students will be able to:

  • Define and explain the key concepts and models relevant to information storage and retrieval, including efficient text indexing; Boolean, vector space, and probabilistic retrieval models; relevance feedback; document clustering and text categorization; and Web search, including crawling, indexing, and link-based algorithms like PageRank.
  • Design, implement, and evaluate the core algorithms underlying a fully functional web search / data mining system, including the indexing, retrieval, and ranking components, as well as advanced algorithms like document clustering and text categorization.
  • Identify the salient features and apply recent research results in web search and data mining, including topics such as collaborative filtering, adversarial information retrieval, location-based services, and social information management.

Instructor Information:

Name: Xia "Ben" Hu
Telephone number: 979-845-8873
Email address: xia.hu@rice.edu
Office hours: Thu 10-11am
Office location: See syllabus on Canvas

TA Information:

Textbook and/or Resource Material:

Primary Textbook:

"Introduction to Information Retrieval"
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze
Cambridge University Press. 2008. Available at Cambridge University Press, at Amazon, and other fine booksellers.

Other Textbooks:
"Mining of Massive Datasets"
Anand Rajaraman and Jeffrey D. Ullman

"Data-Intensive Text Processing with MapReduce"
Jimmy Lin and Chris Dyer

"Networks, Crowds, and Markets: Reasoning About a Highly Connected World"
David Easley and Jon Kleinberg

Grading Policies:

Grading Scale: A = 90-100%, B = 80-90%, C = 70-80%, D = 60-70%, F = below 60%.

The course grading policy is as follows:

  • Class participation and quizzes - 5%
  • Homework assignments - 20%
  • Project - 30%
  • Exams - 45%
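
As an illustration of how the weights above combine, here is a minimal Python sketch. The component scores in `example` are made up, the function and variable names are hypothetical, and the handling of boundary scores (for instance, exactly 90% counting as an A) is an assumption, since the grading scale lists overlapping ranges. It is not an official grade calculator.

```python
# Weights taken from the grading policy above (they sum to 100%).
WEIGHTS = {
    "participation_and_quizzes": 0.05,
    "homework": 0.20,
    "project": 0.30,
    "exams": 0.45,
}


def course_percentage(scores):
    """Return the weighted course percentage from per-component scores (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a percentage to a letter grade.

    Assumption: a score exactly on a boundary earns the higher grade.
    """
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percentage >= cutoff:
            return letter
    return "F"


# Hypothetical component scores, each out of 100.
example = {
    "participation_and_quizzes": 100,
    "homework": 85,
    "project": 90,
    "exams": 80,
}
total = course_percentage(example)
print(f"{total:.1f}% -> {letter_grade(total)}")
```

With these made-up scores, the weighted total is 5 + 17 + 27 + 36 = 85%, which falls in the B range.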

Attendance and Make-up Policies:

Ten quizzes will be given at random times during the semester and also serve as the attendance record. Seven completed quizzes are required for a full score. If you complete at least seven quizzes, no further documentation is needed; otherwise, an excused absence is required. If you have fewer than seven recorded attendances, one point will be deducted for each absence. The specific excused absences and rules can be found at https://ga.rice.edu/undergraduate-students/academic-policies-procedures/attendance-excused-absences/

Course Topics:

This course will mainly cover the following topics:

  • Statistical properties of text
  • Vector space model
  • Statistical language models
  • Learning to rank
  • Recent evaluation methods, NDCG, using clickthrough data
  • Network essentials, network measures, hubs and authorities, PageRank
  • Homophily, social influence, reciprocity
  • Classification, naive Bayes, kNN, SVM
  • Clustering, K-means, community detection
  • Recommender systems

Other Pertinent Course Information:

Homework: In addition to some regular homework exercises (assignments and quizzes), students are encouraged to participate in classroom discussions and Q&A.

Project: Students are expected to work on programming projects. We will discuss the format in our first class. The project evaluation consists of a progress report, a project presentation and/or demonstration, and a written report.

Americans with Disabilities Act (ADA):

The Americans with Disabilities Act (ADA) is a federal anti-discrimination statute that provides comprehensive civil rights protection for persons with disabilities. Among other things, this legislation requires that all students with disabilities be guaranteed a learning environment that provides for reasonable accommodation of their disabilities. If you believe you have a disability requiring an accommodation, please contact Rice’s Disability Resource Center. For additional information, visit https://policy.rice.edu/402

Academic Integrity:

"On my honor, I have neither given nor received any unauthorized aid on this (exam, quiz, paper)."

Upon accepting admission to Rice University, a student immediately assumes a commitment to uphold the Honor Code, to accept responsibility for learning, and to follow the philosophy and rules of the Honor System. Students will be required to state their commitment on examinations, research papers, and other academic work. Ignorance of the rules does not exclude any member of the Rice community from the requirements or the processes of the Honor System. For additional information please visit: https://oaa.rice.edu/policies-and-procedures/honor-code



Acknowledgement:

The course materials have been copied or adapted from the previous editions of CSCE 670, taught by Professor James Caverlee of Texas A&M University.