Invited Speakers

Donald Geman

Johns Hopkins University, Baltimore

Image Interpretation by Entropy Pursuit

Image interpretation, which is effortless and instantaneous for people, is one of the grand challenges of artificial intelligence. The dream is to build a "description machine" which produces a rich semantic annotation of the underlying scene, including the names and poses of the objects that are present, as well as recognizing actions and context. Mathematical frameworks are advanced from time to time, but none is yet widely accepted, and none clearly points the way to closing the gap with natural vision. After reviewing the general situation, I will outline an approach inspired by the efficiency of the divide-and-conquer strategy in games like "twenty questions" and by selective attention in natural vision. This leads to an information-theoretic, model-based framework for determining what evidence to acquire from multiple scales, locations and semantic resolutions, and for coherently integrating the evidence by updating likelihoods.
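The abstract does not spell out the model. The sketch below only illustrates the generic "twenty questions" idea of choosing, at each step, the question whose answer is expected to reduce uncertainty the most, and then updating the likelihoods. It assumes a finite set of hypotheses, yes/no questions represented as subsets of hypotheses, and an answer that is flipped with a fixed probability `eps`. All names (`entropy_pursuit`, `info_gain`, `update`, `oracle`) are hypothetical; this is a plain greedy information-gain search, not the speaker's framework.

```python
import math


def entropy(probs):
    """Shannon entropy (bits) of a discrete distribution; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def info_gain(posterior, question, eps):
    """Mutual information I(H; A) between hypothesis H and the noisy yes/no answer A.

    `question` is the set of hypotheses whose true answer is "yes";
    the observed answer is flipped with probability `eps` (assumed noise model).
    """
    p_in = sum(p for h, p in posterior.items() if h in question)
    p_yes = p_in * (1 - eps) + (1 - p_in) * eps
    # H(A) - H(A | H); with symmetric noise, H(A | H) is the binary entropy of eps.
    return entropy([p_yes, 1 - p_yes]) - entropy([eps, 1 - eps])


def update(posterior, question, answer, eps):
    """Bayes rule: weight each hypothesis by the likelihood of the observed answer, renormalize.

    Assumes (when eps == 0) that the answer is consistent with some hypothesis.
    """
    new = {}
    for h, p in posterior.items():
        agrees = (h in question) == answer
        new[h] = p * ((1 - eps) if agrees else eps)
    z = sum(new.values())
    return {h: p / z for h, p in new.items()}


def entropy_pursuit(prior, questions, oracle, eps=0.0, max_steps=20):
    """Greedily ask the most informative question, update the posterior, repeat."""
    posterior = dict(prior)
    asked = []
    for _ in range(max_steps):
        if entropy(posterior.values()) < 1e-9:
            break  # the hypothesis is essentially determined
        best = max(questions, key=lambda q: info_gain(posterior, q, eps))
        if info_gain(posterior, best, eps) <= 1e-12:
            break  # no remaining question is informative
        answer = oracle(best)
        posterior = update(posterior, best, answer, eps)
        asked.append(best)
    return posterior, asked
```

For example, with eight equally likely hypotheses 0 to 7 and questions of the form "is bit k of the hypothesis set?", a noiseless run (`eps=0`) identifies the target in three questions, the log2(8) = 3 bits a binary search would also need. Greedy selection is a simplification; it is not guaranteed to be optimal in general.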

Yann LeCun

New York University, New York

Learning visual feature hierarchies

Intelligent perceptual tasks such as vision and audition require the construction of good internal representations. Theoretical and empirical evidence suggest that the perceptual world is best represented by a multi-stage hierarchy in which features in successive stages are increasingly global, invariant, and abstract. An important challenge for Machine Learning and Pattern Recognition is to devise "deep learning" methods for multi-stage architectures that can automatically learn good feature hierarchies from labeled and unlabeled data.

We will demonstrate the use of deep learning methods, based on unsupervised sparse coding, to train convolutional networks (ConvNets). ConvNets are biologically inspired architectures consisting of multiple stages of filter banks, interspersed with non-linear operations and spatial pooling operations.
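The three ingredients of a ConvNet stage named in the abstract can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions: inputs are shaped (channels, height, width), tanh is used as the non-linearity and non-overlapping max pooling as the pooling operation (other choices are common), and the filters are random placeholders. The unsupervised sparse-coding training described in the talk is not shown. The function names are hypothetical, and the loops favor clarity over speed.

```python
import numpy as np


def filter_bank(x, kernels):
    """Valid 2-D cross-correlation of a multi-channel input with a bank of filters.

    x: (C, H, W) input maps; kernels: (K, C, kh, kw) -> output (K, H-kh+1, W-kw+1).
    """
    _, _, kh, kw = kernels.shape
    _, H, W = x.shape
    out = np.zeros((kernels.shape[0], H - kh + 1, W - kw + 1))
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]  # (C, kh, kw)
            # Dot product of every filter with the same patch: one value per output map.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def max_pool(maps, size=2):
    """Non-overlapping max pooling over size x size windows (leftover rows/cols dropped)."""
    C, H, W = maps.shape
    h2, w2 = H // size, W // size
    trimmed = maps[:, :h2 * size, :w2 * size]
    return trimmed.reshape(C, h2, size, w2, size).max(axis=(2, 4))


def convnet_stage(x, kernels, pool=2):
    """One stage: filter bank, then point-wise non-linearity (tanh, assumed), then pooling."""
    return max_pool(np.tanh(filter_bank(x, kernels)), pool)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((1, 32, 32))        # one grey-level 32x32 image
    k1 = 0.1 * rng.standard_normal((8, 1, 5, 5))    # random placeholder filters, stage 1
    k2 = 0.1 * rng.standard_normal((16, 8, 5, 5))   # random placeholder filters, stage 2
    h1 = convnet_stage(image, k1)   # 32 -> 28 after 5x5 filters -> 14 after pooling
    h2 = convnet_stage(h1, k2)      # 14 -> 10 -> 5
    print(h1.shape, h2.shape)       # (8, 14, 14) (16, 5, 5)
```

Stacking `convnet_stage` calls is what makes the hierarchy multi-stage: each stage sees a larger region of the original image through the previous stage's pooled maps.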

A number of applications will be shown through videos and live demos, including a category-level object recognition system that can be trained on the fly, a pedestrian detector, a system that recognizes human activities in videos, and a trainable vision system for off-road mobile robot navigation. Specialized hardware architectures that implement these algorithms will also be described.

Further keynote presentations have been arranged by the GfKl conference running in parallel.