COMP 480/580 Probabilistic Algorithms and Data Structures
- Instructor: Anshumali Shrivastava (anshumali AT rice)
- TA and Office Hours: Zhen (zc36) -- Monday 10 - 11am (DH 3061), Senthil (sr79) -- Tuesdays 11-12 (DH 3097), Gaurav(gg29) -- Thursday 3-4pm (Abercrombie A216), Itzel (ico1) -- Fridays 4:30-5:30 PM (DH 3059)
- Class Timings: Tue/Thu 01:00PM - 02:15PM
- Location: HRZ 210
- Instructor Office Hours: Anshu: Tue 2:15PM - 3:15PM (DH 3118)
Recent Announcement
Assignment 2 released, Due Date March 3rd HW2
Piazza Link for the Discussions piazza
Assignment 1 released, Due Date Feb 13th HW1
List of sample project topics link
Please sign up for lectures you want to scribe at excel
Overview
This course will be ideal for someone wanting to build a strong foundation in the theory and practice of algorithms for
processing Big-Data. We will discuss advanced data structures and algorithms going beyond deterministic setting and emphasize the role
of randomness in getting significant, often exponential, improvements in computations and memory.
Grading
Prerequisite
COMP 182 or equivalent required. COMP 382 is useful but not required. Basic Knowledge of
Probability. Knowledge of programming is required. Capability to manipulate primitive data structures such as arrays, list, etc. will
be needed for assignments.
Materials
Most of the materials (scribes and slides) needed will be posted on this website.
Some of the materials are fairly new and textbook is yet to be written.
A Nice Book to have is "Probability and Computing: Randomized Algorithms and Probabilistic Analysis
" by Michael Mitzenmacher and Eli Upfal.
Schedule
- 01/14 : Introduction, Logistics, Mark and Re-capture Estimation.Scribe Slide
- 01/16 : Brief Pseudo Randomness, Universal Hashing, Chaining, and Linear Probing.Slide
- 01/21 : Compressed Cache and Bloom Filters. (Why caching does not kill your memory) Slide
- 01/23 : Markov, Chebyshev's, and Chernoff Bounds. Slide Scribe
- 01/28 : Analysis of Hashing, Chaining and Probing. Slide Scribe
- 01/30 : Resizing Hash Tables: Consistent Hashing and balanced allocations.(The idea behind Akamai Technologies)Slide Scribe
- 02/04 : SPOCA: A Stateless, Proportional, Optimally-Consistent Addressing Algorithm. (Best paper in USENIX 2011)Slide Scribe
- 02/06 : Introduction to Stream Computing and Reservoir Sampling. (Algorithms at the Edges)Slide Scribe
- 02/11 : Stream Estimation 1: Count-Min Sketch (Generalized Bloom Filters)Slide Scribe
- 02/13 : Spring Break.
- 02/18 : Stream Estimation 2: Count-Sketches Scribe
- 02/20 : Stream Estimation 3: Count Distinct and Norm Estimation on Streams.Scribe
- 02/25 : Minwise Hashing, Near-Duplicate Detection and LSH 1. (What is "hash function" in this NY-Times article)Slide Scribe
- 02/27 : Minwise Hashing, Near-Duplicate Detection and LSH 2. Scribe
- 03/03 : Basic Sampling. (Beyond Random Sampling)Scribe
- 03/05 : More LSH (Eliminate Pairwise Comparisons)
- 03/10 : Extended Spring Break (Classes Cancelled)
- 03/12 : Extended Spring Break (Classes Cancelled)
- 03/17 : Spring Break.
- 03/19 : Spring Break.
- 03/24 : Advanced Sampling. LSH as Computationally Efficient Importance Samplers. (Adaptive Sampling at the Cost of Random Sampling)
- 03/26 : DNF Counting(A closer Look at Estimation via Sampling).Scribe
- 03/30 : Markov Chains, Stationary distributions, MCMC 1 (Sampling in Complex Spaces).Scribe
- 04/02 : In Class Mid-Term Exam.
- 04/07 : Markov Chains, Stationary distributions, MCMC 2. Scribe
- 04/09 : Markov Chains, Stationary distributions, MCMC 3. Scribe
- 04/14 : Markov Chains, Stationary distributions, MCMC 4.Scribe
- 04/16 : Randomized Routing Scribe
- 04/21 : Estimation on Graphs
- 04/23 : ACE and RACE Density Estimation (Analytics over Edge)
Students with Disability
If you have a documented disability that may affect academic performance, you should: 1) make sure this documentation is on file with Disability Support Services (Allen Center, Room 111 / adarice@rice.edu / x5841) to determine the accommodations you need; and 2) meet with me to discuss your accommodation needs.