Richard Szeliski

Microsoft Research Redmond

Weaving the World's Photos into a 3D Web

Date: Wednesday, Sep. 22 13:45 - 14:45


Session-Chair: Michael Goesele



The explosion of imagery available on the Internet has opened up a host of new applications in computer vision, image-based modeling, and image-based rendering. It is now possible to automatically reconstruct 3D models of heavily photographed scenes and objects such as tourist locations, and to recognize these from novel images such as cell phone queries. In this talk, I survey some of the work in this field, starting with the Photo Tourism image-based modeling and navigation system, and then discussing the complexity issues (and solutions) engendered by the huge scale of these datasets. I also discuss work in interactive and automated 3D modeling, with a particular emphasis on architectural reconstruction, and location recognition in urban environments.


Richard Szeliski is a Principal Researcher at Microsoft Research, where he leads the Interactive Visual Media Group. He is also an Affiliate Professor at the University of Washington, and is a Fellow of the ACM and IEEE. Dr. Szeliski pioneered the field of Bayesian methods for computer vision, as well as image-based modeling, image-based rendering, and computational photography, which lie at the intersection of computer vision and computer graphics. His most recent research on Photo Tourism and Photosynth is an exciting example of the promise of large-scale image-based rendering.

Dr. Szeliski received his Ph.D. degree in Computer Science from Carnegie Mellon University, Pittsburgh, in 1988 and joined Microsoft Research in 1995. Prior to Microsoft, he worked at Bell-Northern Research, Schlumberger Palo Alto Research, the Artificial Intelligence Center of SRI International, and the Cambridge Research Lab of Digital Equipment Corporation. He has published over 150 research papers in computer vision, computer graphics, medical imaging, neural nets, and numerical analysis, as well as the books Bayesian Modeling of Uncertainty in Low-Level Vision and Computer Vision: Algorithms and Applications. He was a Program Committee Chair for ICCV'2001 and the 1999 Vision Algorithms Workshop, served as an Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence and on the Editorial Board of the International Journal of Computer Vision, and is a Founding Editor of Foundations and Trends in Computer Graphics and Vision.


Yair Weiss

The Hebrew University of Jerusalem

Learning and Inference in Low-Level Vision

Date: Thursday, Sep. 23 09:00 - 10:00


Session-Chair: Stefan Roth



Low-level vision addresses the issues of labeling and organizing image pixels according to scene-related properties, such as motion, contrast, depth, and reflectance. I will describe our attempts to understand low-level vision in humans and machines as optimal inference given the statistics of the world. In particular, I will show how message-passing algorithms allow us to solve real-world instances of NP-hard problems and to efficiently learn energy functions despite an exponential number of constraints.


Andrew Zisserman

University of Oxford

Human Focussed Video Analysis

Date: Friday, Sep. 24 09:00 - 10:00


Session-Chair: Bernt Schiele



Determining the pose and actions of humans is one of the central problems of image and video analysis. The visual problem is challenging because humans are articulated animals, wear loose and varying clothing, self-occlude, and stand against difficult and confusing backgrounds. Nevertheless, the area has seen great progress over the last decade due to advances in modelling, learning, and the efficiency of algorithms. We describe approaches for recognizing human actions and interactions, and for determining 2D upper-body pose. Results will be shown for various TV videos and feature films, and applications demonstrated for (i) learning the gestures of sign language, and (ii) pose-based video retrieval.