-- remark by Antonio Torralba (after his third beer)
Overview
A graduate seminar course in Computer Vision with emphasis on using large amounts of real data (images, video, textual annotations, user preferences, etc.) to learn the structure of our visual world toward the ultimate goal of Image Understanding. We will be reading an eclectic mix of classic and recent papers on topics including: theories of perception, low-level vision (color, texture), mid-level vision (grouping and segmentation), object and scene recognition, image parsing, words and pictures models, image manifolds, etc.
Prerequisite: 16-720 or similar Computer Vision course
We will meet on Tuesdays and Thursdays from 10:30am-11:50am in NSH 3002.
Instructor: Alexei (Alyosha) Efros, Assistant Professor, 4207 Newell-Simon Hall.
Office Hours: Tuesdays at Noon, Thursdays at 1:30pm
TA: Jean-Francois Lalonde, A521 Newell-Simon Hall.
Office Hours: Monday 1:30pm and Wednesday 1:30pm (also by appointment if you can't make it: jlalonde at cs)
Projects
Check out this list of data sources for some ideas on where to get images to work with.
Challenges:
Class Schedule
A list of suggested papers to present is available here.
If you want to change your presentation date, please arrange a swap with another student and notify the instructor and the TA at least two weeks in advance.
Introduction
Date | Presenter | Paper title | Slides |
Jan. 16 | Alyosha Efros | Introduction, Vision: Measurement vs. Perception. Administrative stuff, overview of the course, datasets | Intro ppt |
Jan. 18 | Alyosha Efros | Overview lecture on theories of Visual Perception. 1. Cavanagh, P. (1995) Vision is getting easier every day 2. Cavanagh, P. (1991) What's up in top-down processing? 3. Cavanagh, P. (2005) The Artist as Neuroscientist. Suggested reading: Nakayama, K. (1998) Vision fin-de-siecle - a reductionistic explanation of perception for the 21st century? | Theories ppt |
Jan. 23 | Alyosha Efros | Overview lecture on the physiology of vision. 4. Adelson, E.H. & Bergen, J.R. (1991) The Plenoptic Function and the Elements of Early Vision | Physiology ppt |
Part 1: Images
Learning Features from Data
Date | Presenter | Paper title | Slides |
Jan. 25 | Byron (Evaluator: Eakta) | 5. Olshausen, B. & Field, D. (1996) Wavelet-like receptive fields emerge from a network that learns sparse codes for natural images, Nature (Byron) | Coming soon... |
Jan. 30 | Byron, Andrew (Evaluator: Eakta) | We will first finish the Olshausen & Field paper from last class. 6. Serre, T., Wolf, L. and Poggio, T. (2005) Object recognition with features inspired by visual cortex, CVPR (Andrew) | Serre ppt |
Distributions of Features
Date | Presenter | Paper title | Slides |
Feb. 1 | Frederik, Jean-Francois | 7. Rubner, Y., Tomasi, C. and Guibas, L.J. (2000) The Earth Mover's Distance as a Metric for Image Retrieval, IJCV (Frederik) | Rubner ppt, Martin pdf |
Images as Texture ("Bag of Words" models)
Date | Presenter | Paper title | Slides |
Feb. 6 | Alyosha | 9. Renninger, L.W. & Malik, J. (2004) When is scene recognition just texture recognition?, Vision Research (Alyosha) 10. Csurka, G., Bray, C., Dance, C., and Fan, L. (2004) Visual categorization with bags of keypoints (Alyosha) 11. Winn, J., Criminisi, A. and Minka, T. (2005) Object Categorization by Learned Universal Visual Dictionary (Alyosha) | Coming soon... |
Images as Scenes
Date | Presenter | Paper title | Slides |
Feb. 8 | Sebastian | 12. Torralba, A. and Oliva, A. (2003) Statistics of Natural Image Categories, Network: Computation in Neural Systems (Sebastian) 13. Torralba, A. and Oliva, A. (2002) Depth estimation from image structure, PAMI (Sebastian) 14. Oliva, A. and Torralba, A. (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope, IJCV (Sebastian) | Gist pdf |
Images as Feature Vectors
Date | Presenter | Paper title | Slides |
Feb. 13 | Google talk! (Henry Rowley) | ||
Feb. 15 | Alyosha | 15. Roweis, S. & Saul, L. (2000) Nonlinear dimensionality reduction by locally linear embedding, Science (Presenter: Alyosha, Evaluator: Ankur) 16. Tenenbaum, J.B., De Silva, V. and Langford, J.C. (2000) A global geometric framework for nonlinear dimensionality reduction, Science (Presenter: Alyosha, Evaluator: Ankur) | Manifolds ppt |
Feb. 20 | Devi (Evaluator: Ankur) | Ankur will evaluate papers 15 and 16. Additional applications | Isomap applications ppt |
Feb. 22 | Ralph | 17. Tenenbaum & Freeman (2000) Separating Style and Content with Bilinear Models, Neural Computation (Ralph) | Coming soon... |
Image matching (Distance Transforms)
Date | Presenter | Paper title | Slides |
Feb. 27 | Alyosha (Evaluator: Minh) | 18. Learned-Miller, E. (2005) Data Driven Image Models through Continuous Joint Alignment, PAMI (Alyosha) | Registration ppt |
Mar. 1 | Ankur (Evaluator: Byron) | 19. Huttenlocher, D., Klanderman, G. and Rucklidge, W. (1993) Comparing Images Using the Hausdorff Distance, PAMI (Ankur) 20. Borgefors, G. (1988) Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm, PAMI (Ankur) | Comparison ppt |
Image Correspondence (Caltech-101-fest!)
Date | Presenter | Paper title | Slides |
Mar. 6 | Ross, Alyosha | 21. Zhang, H., Berg, A., Maire, M. and Malik, J. (2006) SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition, CVPR (Ross) 22. Frome, A., Singer, Y. and Malik, J. (2006) Image Retrieval and Recognition Using Local Distance Functions, NIPS (to appear) (Ross) 23. Berg, A., Berg, T. and Malik, J. (2005) Shape Matching and Object Recognition using Low Distortion Correspondences, CVPR (Alyosha) | SVM-KNN ppt |
Mar. 8 | Special lecture by Andrew Zisserman! |
Lots of Data is Fun!
Date | Presenter | Paper title | Slides |
Mar. 13 | No class: Spring break! | ||
Mar. 15 | No class: Spring break! | ||
Mar. 20 | Hongwen, Ross | 24. Lazebnik, S., Schmid, C. and Ponce, J. (2006) Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, CVPR (Ross) 28. Nistér, D. and Stewénius, H. (2006) Scalable Recognition with a Vocabulary Tree (Hongwen) | Coming soon... |
Mar. 22 | Alyosha, Ralph | 25. Zitnick & Kanade (2003) Content-free image retrieval, unpublished (Alyosha) 26. Berg, T., Berg, A., Edwards, J., Maire, M., White, R., Teh, Y.W., Learned-Miller, E. and Forsyth, D.A. (in submission) Names and Faces (Ralph) | Coming soon... |
Mar. 27 | Devi, Jean-Francois | 27. Dalal and Triggs (2005) Histograms of Oriented Gradients for Human Detection, CVPR (Devi) 29. Snavely, N., Seitz, S.M. and Szeliski, R. (2006) Photo tourism: Exploring photo collections in 3D, SIGGRAPH (webpage) (Jean-Francois) | Coming soon... |
Boosting Background
Date | Presenter | Paper title | Slides |
Mar. 29 | Sebastian, Minh | 30. AdaBoost background (Sebastian) 31. Friedman, J.H., Hastie, T. and Tibshirani, R. (1998) Additive Logistic Regression: a Statistical View of Boosting (Sebastian) 32. Schneiderman, H. and Kanade, T. (2004) Object Detection Using the Statistics of Parts, IJCV (Presenter: Minh, Evaluator: Andrew) 33. Viola, P. and Jones, M. (2001) Robust Real-time Object Detection, Second International Workshop on Statistical and Computational Theories of Vision (Presenter: Minh, Evaluator: Andrew) | Obj. detection ppt, Evaluation ppt |
Part 2: Objects and Parts
Segmentation
Date | Presenter | Paper title | Slides |
Apr. 3-5 | Alyosha, Fred | 34. Wertheimer, M. (1923) Laws of Organization in Perceptual Forms (Alyosha) 35. Weiss, Y. (1999) Segmentation using eigenvectors: a unifying view, ICCV (Fred) 36. Ng, A.Y., Jordan, M.I. and Weiss, Y. (2001) On Spectral Clustering: Analysis and an algorithm, NIPS (Fred) | Coming soon... |
Apr. 10 | Ross Jean-Francois Evaluator: Hongwen | 37. Tu and Zhu (2002) Image Segmentation by Data-Driven Markov Chain Monte Carlo, PAMI (Ross) | Coming soon... |
Apr. 12 | Jean-Francois (Evaluator: Hongwen) | 38. Boykov and Jolly (2001) Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in ND Images, ICCV (Jean-Francois) | Coming soon... |
Grouping Repeated Structures
Date | Presenter | Paper title | Slides |
Apr. 17 | Ankur, Eakta | 39. Boiman, O. and Irani, M. (2006) Similarity by Composition, NIPS (Ankur) | Coming soon... |
Apr. 19 | No classes (from academic calendar) | ||
Apr. 24 | Eakta, Alyosha (Evaluator: Fred) | 40. Kannan, A., Winn, J. and Rother, C. (2006) Clustering appearance and shape by learning jigsaws, NIPS (Eakta) 41. Ren, X. and Malik, J. (2003) Learning a Classification Model for Segmentation, ICCV | Coming soon... |
From Features to Objects
Date | Presenter | Paper title | Slides |
Apr. 26 | Hongwen | 44. Torralba, A., Murphy, K.P. and Freeman, W.T. (in press) Sharing visual features for multiclass and multiview object detection, PAMI (Hongwen) 45. Opelt, A., Pinz, A. and Zisserman, A. (2006) Incremental learning of object detectors using a visual shape alphabet, CVPR 46. Ferrari, V., Fevrier, L., Jurie, F. and Schmid, C. (2006) Groups of Adjacent Contour Segments for Object Detection, INRIA Technical Report 47. Leibe, B., Leonardis, A. and Schiele, B. (2004) Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV'04 Workshop on Statistical Learning in Computer Vision (Hongwen) 48. Leibe, B., Seemann, E. and Schiele, B. (2005) Pedestrian Detection in Crowded Scenes, CVPR | Coming soon... |
Scenes, Context, and Image Parsing
Date | Presenter | Paper title | Slides |
May 1 | Byron, Alyosha | 66. Saxena, A., Chung, S. and Ng, A.Y. (2005) Learning Depth from Single Monocular Images, NIPS (Byron) 64. Hoiem, D., Efros, A.A. and Hebert, M. (2005) Geometric Context from a Single Image, ICCV (Alyosha) 67. Tu, Z., Chen, X., Yuille, A. and Zhu, S.C. (2005) Image Parsing: Unifying Segmentation, Detection, and Recognition, IJCV 68. Ren, X., Fowlkes, C. and Malik, J. (2006) Figure/Ground Assignment in Natural Images, ECCV 69. Cornelis, N., Leibe, B., Cornelis, K. and Van Gool, L. (2006) 3D City Modeling Using Cognitive Loops, 3DPVT | Coming soon... |
Face Modeling / Recognition
Date | Presenter | Paper title | Slides |
May 3 | Andrew, Minh (Evaluator: Ralph) | 70. Sinha, P., Balas, B.J., Ostrovsky, Y., and Russell, R. (under review) Face recognition by humans: 20 results all computer vision researchers should know about (Andrew) 71. Cootes, T.F., Edwards, G.J. and Taylor, C.J. (1998) Active Appearance Models, ECCV (Minh) | Coming soon... |
Final project presentations
Date | Information |
May 7 | The presentations will be from 1:00 to 4:00 pm. The location is PH226A (that's Porter Hall). See here for updated information from the HUB (search for 16721). |
Similar Courses
This course has been inspired by those offered by several of my colleagues. Here is a partial list:
- Selected Topics in Vision & Learning (Serge Belongie, UCSD)
- Learning and Inference in Vision (Bill Freeman, MIT)
- Object Recognition (Kristen Grauman, Texas-Austin)
- High-level Recognition in Computer Vision (Fei-Fei Li, Princeton)
- Recognizing People, Objects, and Scenes (Jitendra Malik, Berkeley)
- Recognition Problems in Computer Vision (Greg Mori, SFU)
- Scene Understanding Seminar (Aude Oliva, MIT)
- Visual Recognition (Pietro Perona, CalTech)
- Vision and Learning (Jianbo Shi, UPenn)
- CMU VASC Seminar (Spring 2007)
- Sicily Workshop on Category-level Object Recognition (2006)
- IMA Visual Learning and Recognition Workshop (2006)
- MSRI Visual Recognition Workshop (2006)
- Scene Understanding Symposium SUnS'06 (2006)
- Recognizing and Learning Object Categories (ICCV 2005 Tutorial)