Vision and Language

COMP 646: Deep Learning for Vision and Language | Spring 2025

Instructor: Vicente Ordóñez-Román (vicenteor at rice.edu), Office Hours: Thursdays 10am-11am (DH 2080).

TA: Jaywon Koo (jk125 at rice.edu), Office Hours: Tuesdays 11am-noon (DH 3075).

TA: Catherine He (ch151 at rice.edu), Office Hours: Wednesdays 2:30pm-3:30pm (DH 3075).

TA: Jason Uwaeze (ju6 at rice.edu), Office Hours: Wednesdays 10am-11am (DH 3109).

TA: Ruidi Chang (rc151 at rice.edu), Office Hours: Fridays 1pm-2pm (DH 3110).

Class Time: Tuesdays and Thursdays from 4pm to 5:15pm. Location: Keck Hall 100.

Piazza: [Take me to piazza]

Course Description: Visual recognition and language understanding are two fundamental tasks in the quest toward Artificial Intelligence. In this course we will study and acquire the skills to build machine learning and deep learning models that can reason about images and text for generating image descriptions, finding objects in images, generating images from text, image generation and synthesis, and other general tasks involving both text and images. On the technical side we will leverage models such as convolutional neural networks (CNNs), Transformer networks (e.g. BERT, LLaMA, ViTs), and generative models (e.g. Latent Diffusion, DiTs, VAEs), among others. Emphasis will also be placed on re-using multimodal foundation models such as CLIP, SDXL, LLaMA-3, etc.

Learning Objectives: (a) Develop intuitions about the connections between language and vision, (b) Understand concepts in representation learning for both images and text, (c) Become familiar with state-of-the-art models for tasks in vision and language, (d) Obtain practical experience in the implementation and adaptation of these models.

Prerequisites: There are no strict formal prerequisites for this class. We will review the basics of machine learning at the beginning of the class. Students should, however, have a basic knowledge of linear algebra, differential calculus, and basic statistics and probability. Moreover, students are expected to have attained some level of proficiency in Python programming or be willing to learn Python programming. Students are encouraged to complete the following activity before the first lecture: [Primer on Image Processing].

Schedule

Date	Topic
Tue, Jan 14	Introduction to vision and language [pptx] [pdf] #welcome
Thu, Jan 16	Supervised vs unsupervised learning and linear classifiers [pptx] [pdf] #machine-learning
Tue, Jan 21	No Class this Day due to Weather
Thu, Jan 23	Stochastic Gradient Descent / Generalization [pptx] [pdf] #machine-learning
Assignment on PyTorch + Image Classification [colab] Due Monday February 10th, 11:59pm (CT).
Tue, Jan 28	Regularization / Softmax / Multi-layer Perceptrons and Backpropagation [pptx] [pdf] #machine-learning
Thu, Jan 30	The Convolutional Operator, Image Filtering and Convolutional Neural Networks [pptx] [pdf] #computer-vision
Tue, Feb 4	Convolutional Neural Network Architectures I: LeNet, AlexNet, VGG [pptx] [pdf] #computer-vision
Thu, Feb 6	Convolutional Neural Network Architectures II: InceptionNets, ResNets [pptx] [pdf] #computer-vision
Assignment on Image Question Answering [colab] Due Monday February 24th, 11:59pm (CT).
Tue, Feb 11	Introduction, Bag of Words Representations [pptx] [pdf] #natural-language-processing
Thu, Feb 13	#Spring-Recess (No Scheduled Classes)
Tue, Feb 18	Word Representations, Sequence Models, Language Modeling [pptx] [pdf] #natural-language-processing
Thu, Feb 20	Transformers: BERT, GPT-2, ViT, CLIP [pptx] [pdf] #natural-language-processing
Assignment on Generative Multimodal AI [colab] Due Monday March 10th, 11:59pm (CT).
Tue, Feb 25	Instruction Tuning, Learning from Human Feedback, Efficient Finetuning #natural-language-processing
Thu, Feb 27	#Quiz
Tue, Mar 4	AutoEncoders (AEs) and Variational AutoEncoders (VAEs) [pptx] [pdf] #generative-ai
Thu, Mar 6	Diffusion Models I [pptx] [pdf] #generative-ai
Tue, Mar 11	Diffusion Models II #generative-ai
Thu, Mar 13	Convolutional Neural Networks for Object Detection and Segmentation [pptx] [pdf] #computer-vision
Tue, Mar 18	#Spring-Break (No Scheduled Classes)
Thu, Mar 20	#Spring-Break (No Scheduled Classes)
Tue, Mar 25	Training large-scale jobs in practice (Wandb, Containers, SLURM) #practical-session
Thu, Mar 27	Instruction Tuning and Multimodality #computer-vision #natural-language-processing
Tue, Apr 1	Working with user interfaces and model deployment (Flask, Gradio, Replit) #practical-session
Thu, Apr 3	Guest Lecture: Recent Trends in Multimodal AI. Lisa Anne Hendricks, Research Scientist at Google DeepMind, part of the team that produced the Google Gemini family of multimodal models.
Tue, Apr 8	Recent Trends #computer-vision #natural-language-processing
Thu, Apr 10	Feature Inversion #practical-session
Tue, Apr 15	Recent work in Multimodal AI I #computer-vision #natural-language-processing
Thu, Apr 17	Recent work in Multimodal AI II #computer-vision #natural-language-processing
Tue, Apr 22	Course Recap #computer-vision #natural-language-processing
Thu, Apr 24	Final Project Presentation

Disclaimer: The professor reserves the right to make changes to the syllabus, including assignment due dates. These changes will be announced as early as possible.

Grading: Assignments: 30% (3 assignments), Class Project: 60%, Quiz: 10%. Grade cutoffs will be no stricter than the following: A [between 90% and 100%], B [between 80% and 90%), C [between 70% and 80%), D [between 55% and 70%), F (less than 55%).
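As an illustration of how the weights above combine, here is a minimal Python sketch. It is not an official grade calculator: the function names are hypothetical, all scores are assumed to be on a 0-100 scale, and the three assignments are assumed to count equally (the syllabus only states that together they are worth 30%). Because the actual cutoffs may be more lenient than the ones listed, the letter returned is a lower bound.

```python
# Weights in percent, as stated in the grading policy.
WEIGHTS = {"assignments": 30, "project": 60, "quiz": 10}

# Lower bounds of the listed cutoffs; actual cutoffs will be no stricter than these.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (55, "D")]


def final_score(assignment_scores, project, quiz):
    """Weighted final score on a 0-100 scale.

    Assumption (not stated in the syllabus): the three assignments
    are weighted equally within their 30% share.
    """
    if len(assignment_scores) != 3:
        raise ValueError("expected exactly 3 assignment scores")
    assignments_avg = sum(assignment_scores) / len(assignment_scores)
    total = (
        WEIGHTS["assignments"] * assignments_avg
        + WEIGHTS["project"] * project
        + WEIGHTS["quiz"] * quiz
    )
    return total / 100


def letter_grade(score):
    """Letter grade under the listed cutoffs (the strictest case)."""
    for lower_bound, letter in CUTOFFS:
        if score >= lower_bound:
            return letter
    return "F"


if __name__ == "__main__":
    score = final_score([100, 90, 80], project=85, quiz=70)
    print(score, letter_grade(score))
```

Since the class project carries 60% of the grade, a ten-point change in the project score moves the final score by six points, twice the effect of the same change in the assignment average.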

COVID-19 Notice: If you have any flu-like symptoms, you should stay home. There is no grade for attendance in this class.

Late Submission Policy: No late assignments will be accepted in this class unless the student has procured special accommodations for warranted circumstances or exceptional personal situations. If you think this might apply to you, please contact the instructor directly as early as possible. If you contact the instructor on the day of the deadline, before the deadline has passed, you are also required to submit a copy of your notebook with the progress you have made so far in order to make your request. In general, unless a medical condition or other serious situation is affecting you, please do not email the instructor or TAs requesting special consideration. If you need special accommodations regarding a disability, contact the Disability Resource Center at Rice and follow the advice in that section of this syllabus.

Honor Code and Academic Integrity: "In this course, all students will be held to the standards of the Rice Honor Code, a code that you pledged to honor when you matriculated at this institution. If you are unfamiliar with the details of this code and how it is administered, you should consult the Honor System Handbook at http://honor.rice.edu/honor-system-handbook/. This handbook outlines the University's expectations for the integrity of your academic work, the procedures for resolving alleged violations of those expectations, and the rights and responsibilities of students and faculty members throughout the process." For this class: If assignments are individual, then no collaboration is expected and no two students should submit the same source code. Regardless of circumstances, I will assume that any source code, text, or images submitted alongside reports or projects are authored by the students unless otherwise explicitly stated through appropriate means. Any missing information regarding sources will potentially be regarded as a failure to abide by the academic integrity statement, even if that was not the intent. Please be careful about citing sources and clearly stating what is your original work and what is not in all assignments and projects. In particular, avoid vague statements such as "we built our model based on X" that make it difficult to distinguish what you did from what was done by others; instead be explicit, e.g. "we downloaded X and modified the encoder so that it can work with videos instead of images by adding three more layers". Sometimes great projects consist of simply putting together two existing components that someone else developed; however, this has to be clearly acknowledged as such.

Title IX Support: Rice University cares about your wellbeing and safety. Rice encourages any student who has experienced an incident of harassment, pregnancy discrimination, gender discrimination, or relationship, sexual, or other forms of interpersonal violence to seek support through The SAFE Office. Students should be aware when seeking support on campus that most employees, including myself, as the instructor/TA, are required by Title IX to disclose all incidents of non-consensual interpersonal behaviors to Title IX professionals on campus who can act to support that student and meet their needs. For more information, please visit safe.rice.edu or email titleixsupport@rice.edu.

Disability Resource Center: "If you have a documented disability or other condition that may affect academic performance you should: 1) make sure this documentation is on file with the Disability Resource Center (Allen Center, Room 111 / adarice@rice.edu / x5841) to determine the accommodations you need; and 2) talk with me to discuss your accommodation needs."