Deep Learning for Visual Recognition

Course Description: How can we use computers to recognize objects, people, actions, animals, places, etc. from images? This seemingly trivial task, which people perform without much effort, has remained one of the core problems in Computer Vision. Recent advances in representation learning using multiple layers of abstraction (deep learning) have proven to be important for designing artificial systems for visual recognition. In this class we will study, conceive, and implement deep learning models and learning algorithms for computational visual recognition. After this class you will be able to understand, design, implement, and assess the impact of deep learning techniques for a diverse range of visual recognition tasks.

Learning Objectives: (a) Develop intuitions about the connections between human vision and computer vision, (b) Understand foundational concepts for representation learning using neural networks, (c) Become familiar with state-of-the-art models for tasks such as image classification, object detection, image segmentation, scene recognition, etc., and (d) Obtain practical experience in the implementation of visual recognition models using deep learning.

Prerequisites: This course requires no previous background in computer vision or machine learning, but knowledge of either will be helpful. You need to know about matrices, calculating derivatives, and probabilities (Bayes' rule). You will also need to be at least a moderately proficient programmer in Python. There will be several lab assignments. These assignments will show you the basics of modern general visual recognition algorithms and models, and will give you the tools for implementing more advanced ones. We will also have a class project where you will be able to work on something beyond your assignments, with more freedom to pursue a focused problem that interests you and better matches your background. Finally, we will be using Python/PyTorch in the lecture notes, so completing a few projects in Python before the class starts is helpful. You should install Python, Jupyter, and PyTorch, and complete the following notebook [pytorch_tensors].

Syllabus

Date	Topic
Mon, January 13th	Introduction to Visual Recognition [pptx] [pdf] + Primer on Image Processing [link]
Assignment on Image Processing and Manipulation [Colab]. Due January 26th 5pm EST.
Wed, January 15th	Image Processing and Image Manipulations [pptx] [pdf]
Mon, January 20th	MLK Holiday -- no class this day
Assignment on Image Classification [Colab]. Due February 3rd 11:59pm EST.
Wed, January 22nd	Softmax Classifier + Stochastic Gradient Descent [pptx] [pdf]
Mon, January 27th	Shallow Image Features and the Bag of Features model [pptx] [pdf]
Assignment on Deep Learning Basics [Colab]. Due February 10th 11:59pm EST.
Wed, January 29th	Neural Networks and the Multi-layer Perceptron Model [pptx] [pdf]
Mon, February 3rd	Convolutional Neural Networks (CNNs) [pptx] [pdf]
Assignment on Convolutional Neural Networks [Colab]. Due February 24th 11:59pm EST.
Wed, February 5th	Guest Lecture: Neuromorphic Computing. Speaker: Dr. Catherine Schuman (Oak Ridge National Laboratory). More information: Dr. Catherine Schuman is a Research Scientist at the Oak Ridge National Laboratory (ORNL) in Tennessee, working on neuromorphic computing and spiking neural networks. These models function in some ways more like processes in the brain and appear promising in terms of efficiency.
Mon, February 10th	Convolutional Neural Network Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet [pptx] [pdf]
Wed, February 12th	Deep Learning-based Object Detection [pptx] [pdf]
Mon, February 17th	Deep Learning-based Semantic Image Segmentation [pptx] [pdf]
Wed, February 19th	Generative Adversarial Networks (GANs) [pptx] [pdf]
Mon, February 24th	Paper Review: CNNs as Features for Transfer Learning CNN Features off-the-shelf: an Astounding Baseline for Recognition. Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, Stefan Carlsson. CVPR 2014 Workshops. [arxiv] (Presented by Ziyan Yang) Do Better ImageNet Models Transfer Better? Simon Kornblith, Jonathon Shlens, Quoc V. Le CVPR 2019 [arxiv] (Presented by Paola Cascante-Bonilla)
Wed, February 26th	Recurrent Neural Networks (RNNs) [pptx] [pdf]
Mon, March 2nd	Paper Review: Face Recognition and Pose Estimation Deep Face Recognition. Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman. BMVC 2015. [pdf] (Presented by Nazanin and Navreet) Deep High-Resolution Representation Learning for Human Pose Estimation. Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang. CVPR 2019 [arxiv] (Presented by Minjie and Leizhen).
Wed, March 4th	Paper Review: Recent Methods for Object Detection and Instance Segmentation. Mask R-CNN. Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick. ICCV 2017 [arxiv] (Presented by Andrew and Soneya). CornerNet: Detecting Objects as Paired Keypoints. Hei Law, Jia Deng. ECCV 2018. [arxiv] (Presented by Fazlay and Matthew).
Mon, March 9th	Spring recess -- no class this day
Wed, March 11th	Spring recess -- no class this day
Mon, March 16th	Extended Spring recess due to COVID-19 -- no class this day -- stay safe!
Wed, March 18th	Extended Spring recess due to COVID-19 -- no class this day -- stay safe!
Mon, March 23rd	Paper Review: Interpreting and Explaining Deep Neural Networks Network Dissection: Quantifying Interpretability of Deep Visual Representations. David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba. CVPR 2017 [arxiv] (Presented by Zhe and Will). Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra. ICCV 2017. [arxiv] (Presented by Ruipeng and Zhiming).
Wed, March 25th	Paper Review: Image to Text: Image Captioning Show and Tell: A Neural Image Caption Generator. Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. CVPR 2015 [arxiv] (Presented by Jacob and Ahsan). Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang. CVPR 2018. [arxiv] (Presented by Jiaxin and Zheng).
Mon, March 30th	Paper Review: Structured Prediction with Partial Labels + Efficient NNs I Learning Structured Inference Neural Networks with Label Relations. Hexiang Hu, Guang-Tong Zhou, Zhiwei Deng, Zicheng Liao, Greg Mori CVPR 2016 [arxiv] (Presented by Anshuman and Kamya). Feedback-prop: Convolutional Neural Network Inference under Partial Evidence Tianlu Wang, Kota Yamaguchi, Vicente Ordonez. CVPR 2018. [arxiv] (Presented by Zijie and Lulu). Efficient NNs I: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. [arxiv] (Presented by Arjit and Gaurav).
Wed, April 1st	Paper Review: Conditional Generative Adversarial Networks (GANs) Image-to-Image Translation with Conditional Adversarial Networks. By Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros. CVPR 2017 [arxiv] (Presented by Shivani and Akhila). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros. ICCV 2017. [arxiv] (Presented by Sanchit and Rishab).
Mon, April 6th	Paper Review: Avoiding Visual Bias in Computer Vision Women also Snowboard: Overcoming Bias in Captioning Models. By Kaylee Burns, Lisa Anne Hendricks, Kate Saenko, Trevor Darrell, Anna Rohrbach. ECCV 2018 [arxiv] (Presented by Tina and Junyu). Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations. Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, Vicente Ordonez ICCV 2019. [arxiv] (Presented by Nidhi and Daniel C.).
Wed, April 8th	Paper Review: Video Recognition + Efficient NNs II Efficient NNs II: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun. ECCV 2018 [arxiv] (Presented by Fuxiao and Dexuan). Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. Joao Carreira, Andrew Zisserman. CVPR 2017. [arxiv] (Presented by Mustafa and Will). SlowFast Networks for Video Recognition. Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He. ICCV 2019. [link] (Presented by Tx and yf7da).
Mon, April 13th	Paper Review: Transformer Networks ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee NeurIPS 2019 [arxiv] (Presented by Mofijul and Arash). VisualBERT: A Simple and Performant Baseline for Vision and Language. Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang . [arxiv] (Presented by Sanxing and Zhe).
Wed, April 15th	Paper Review: Self-supervised Learning Self-Supervised Learning of Pretext-Invariant Representations. Ishan Misra, Laurens van der Maaten . [arxiv] (Presented by Martin and Leticia). Momentum Contrast for Unsupervised Visual Representation Learning. Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick. [arxiv] (Presented by Rasool and Seyed).
Mon, April 20th	Paper Review: Colorization and Super-resolution ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang. ECCV 2018 Workshops [arxiv] (Presented by Aniruddha and Akhil). Learning Diverse Image Colorization. Aditya Deshpande, Jiajun Lu, Mao-Chuang Yeh, Min Jin Chong, David Forsyth. CVPR 2017. [arxiv] (Presented by Phillip and Colin).
Wed, April 22nd	Paper Review: Neural Architecture Design and Search Exploring Randomly Wired Neural Networks for Image Recognition. Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He. [arxiv] (Presented by ki5hd and zm8bh). DARTS: Differentiable Architecture Search. Hanxiao Liu, Karen Simonyan, Yiming Yang. ICLR 2019. [arxiv] (Presented by Aashikur and Daniel W.).
Mon, April 27th	Course Re-cap and Good Bye! [slides]

Disclaimer: The professor reserves the right to make changes to the syllabus, including assignment due dates. These changes will be announced as early as possible.

Grading: Due to the COVID-19 state of emergency, we have exceptionally changed the grading scheme to use by default the following distribution: Assignments: 600pts (4 assignments: 150pts + 150pts + 150pts + 150pts), Class Project: 200pts, Reading Summaries: 100pts, Class Presentation: 100pts. Letter grades to be decided as follows: A+ (950pts), A (850pts), A- (800pts), B+ (770pts), B (750pts), B- (730pts), C+ (700pts), C (670pts), C- (650pts), D+ (630pts), D (600pts), D- (570pts).

Note: The old grading scheme will be applied if this leads to a more favorable grade for the student: Assignments: 400pts (4 assignments: 100pts + 100pts + 100pts + 100pts), Class Project: 400pts, Reading Summaries: 100pts, Class Presentation: 100pts. Letter grades to be decided as follows: A+ (1000pts), A (930pts), A- (900pts), B+ (870pts), B (830pts), B- (800pts), C+ (770pts), C (730pts), C- (700pts), D+ (670pts), D (630pts), D- (600pts).
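As an illustration only, the rule of applying whichever of the two schemes above is more favorable can be sketched in Python. This is not an official grade calculator. It assumes that the listed points are inclusive minimums for each letter, that each component is entered as a fraction of its maximum (0.0 to 1.0), and that anything below D- is an "F", which the text above does not state. All function and variable names are hypothetical.

```python
# Minimum points for each letter grade, copied from the two schemes above.
NEW_THRESHOLDS = [("A+", 950), ("A", 850), ("A-", 800), ("B+", 770),
                  ("B", 750), ("B-", 730), ("C+", 700), ("C", 670),
                  ("C-", 650), ("D+", 630), ("D", 600), ("D-", 570)]
OLD_THRESHOLDS = [("A+", 1000), ("A", 930), ("A-", 900), ("B+", 870),
                  ("B", 830), ("B-", 800), ("C+", 770), ("C", 730),
                  ("C-", 700), ("D+", 670), ("D", 630), ("D-", 600)]

# Maximum points per component under each scheme.
NEW_MAX = {"assignments": 600, "project": 200,
           "summaries": 100, "presentation": 100}
OLD_MAX = {"assignments": 400, "project": 400,
           "summaries": 100, "presentation": 100}

# Letters from best to worst. "F" is an assumption, not stated in the syllabus.
ORDER = [letter for letter, _ in NEW_THRESHOLDS] + ["F"]


def total_points(fractions, max_points):
    """Scale each component's earned fraction by its maximum points and sum."""
    return sum(max_points[name] * fractions[name] for name in max_points)


def letter_grade(points, thresholds):
    """Return the highest letter whose minimum is met (assumed inclusive)."""
    for letter, minimum in thresholds:
        if points >= minimum:
            return letter
    return "F"


def final_grade(fractions):
    """Compute the letter under both schemes and keep the more favorable one."""
    new = letter_grade(total_points(fractions, NEW_MAX), NEW_THRESHOLDS)
    old = letter_grade(total_points(fractions, OLD_MAX), OLD_THRESHOLDS)
    return min(new, old, key=ORDER.index)


if __name__ == "__main__":
    strong_assignments = {"assignments": 0.95, "project": 0.70,
                          "summaries": 1.0, "presentation": 0.9}
    strong_project = {"assignments": 0.52, "project": 1.0,
                      "summaries": 1.0, "presentation": 1.0}
    print(final_grade(strong_assignments))
    print(final_grade(strong_project))
```

In the first hypothetical case the new scheme gives 900pts (A) and the old scheme 850pts (B), so the new scheme applies. In the second, the new scheme gives about 712pts (C+) and the old scheme about 808pts (B-), so the old scheme applies.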

Late Submission Policy: No late assignments will be accepted in this class unless the student has procured special accommodations.

Academic Integrity Statement: "The School of Engineering and Applied Science relies upon and cherishes its community of trust. We firmly endorse, uphold, and embrace the University’s Honor principle that students will not lie, cheat, or steal, nor shall they tolerate those who do. We recognize that even one honor infraction can destroy an exemplary reputation that has taken years to build. Acting in a manner consistent with the principles of honor will benefit every member of the community both while enrolled in the Engineering School and in the future. Students are expected to be familiar with the university honor code, including the section on academic fraud."

Accessibility Statement: "The University of Virginia strives to provide accessibility to all students. If you require an accommodation to fully access this course, please contact the Student Disability Access Center (SDAC) at (434) 243-5180 or sdac@virginia.edu. If you are unsure if you require an accommodation, or to learn more about their services, you may contact the SDAC at the number above or by visiting their website at https://www.studenthealth.virginia.edu/student-disability-access-center/about-sdac."

Other similar courses or courses with useful related material:

Introduction to Deep Learning (Joseph Redmon and Ali Farhadi, University of Washington)
Deep Learning for Perception (Dhruv Batra, Virginia Tech)
Visual Recognition (Yong Jae Lee, UC Davis)
Introduction to Computer Vision (James Hays, Brown University / Georgia Tech)
Convolutional Neural Networks for Visual Recognition (Fei-Fei Li, Andrej Karpathy and Justin Johnson, Stanford University)
Machine Learning (Nando de Freitas, University of Oxford)
Visual Recognition (Adriana Kovashka, University of Pittsburgh)
Multimodal Learning with Vision, Language and Sound (Leonid Sigal, University of British Columbia)
Recognizing People, Objects and Actions (Tamara L. Berg, UNC Chapel Hill)

CS 6501-003: Deep Learning for Visual Recognition

Instructor: Vicente Ordóñez-Román (vicente at virginia.edu). Office Hours: Tuesdays 3 to 5pm (Rice 310)

Teaching Assistant: Ziyan Yang (zy3cx at virginia.edu) -- Office Hours: Thursdays 3 to 5pm (Rice 442)

Teaching Assistant: Paola Cascante-Bonilla (pc9za at virginia.edu) -- Office Hours: Fridays 2 to 4pm (Rice 442)

Class Time: Mondays & Wednesdays between 3:30PM and 4:45PM, at Olsson Hall 005.

Discussion Forum: Piazza
