Invited Speakers

Prof. Reinhard Klette, University of Auckland, New Zealand
Stereo and motion analysis of long stereo image sequences for vision-based driver assistance
Prof. Josef Kittler, University of Surrey, UK
Information fusion in content-based retrieval from multimedia databases
Prof. Kyros Kutulakos, University of Toronto, Canada
High-performance photography with a conventional camera

Prof. Reinhard Klette: Stereo and motion analysis of long stereo image sequences for vision-based driver assistance

The .enpeda.. (Environment perception and Driver Assistance) project at the University of Auckland provides test data and evaluation methodology for comparing the performance of low-level vision processes (stereo, image features, motion, etc.). Generating ground truth is an interesting aspect of this evaluation; options include, for example, "accurate post-processing" of recorded stereo sequences, or pre-modeling larger areas in which sequences are then recorded with a test vehicle. Current evaluations also address more advanced subjects such as scene flow, object tracking, and free-space calculation. The .enpeda.. project collaborates closely with Daimler AG, Germany (group of Dr. Uwe Franke).

Prof. Josef Kittler: Information fusion in content-based retrieval from multimedia databases

The retrieval of information from multimedia databases is a challenging problem because of the number of different concepts that may be of interest to the user and the multifaceted characteristics of each concept. The concept properties may span different sensing modalities and, within each modality, call for the use of a diverse set of features. Commonly, the retrieval problem is formulated as a detection problem (a two-class pattern recognition problem), whereby the content of interest is looked for in the multimedia material and discriminated from the anti-concept class. The detectors are designed to capture the different manifestations of each concept class (colour, texture, shape, sounds). The design process is often hampered by small sample sets and class imbalance.

The nature of the retrieval problem raises issues in information fusion. Both feature-level and decision-level fusion provide useful mechanisms for tackling different aspects of the concept detector design process. At the feature level, the fusion is often accomplished with multi-kernel machine learning methods. The key question in this approach is how to weight the contributions of the respective kernels. The weight allocation is normally controlled by regularisation. We discuss the effect of different norms on weight assignment. The findings lead to a two-stage machine learning strategy in which the first stage serves simply as a means to eliminate non-informative kernels. In contrast, decision-level fusion is adopted for dealing with the class population imbalance problem. We show that by extreme undersampling of the negative (anti-concept) class we can create a large number of weak classifiers, the fusion of which has the capacity to improve retrieval performance.
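The decision-level fusion idea can be illustrated with a minimal sketch. The abstract does not specify the weak learner or the fusion rule, so everything below is an assumption made for illustration: a nearest-mean linear detector as the weak classifier, negative subsets the same size as the positive set, and plain score averaging as the fusion. The function names (`train_weak`, `train_ensemble`, `fused_score`) are hypothetical.

```python
import random


def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def train_weak(pos, neg):
    """Assumed weak learner: nearest-mean linear detector, score(x) = w.x - b."""
    mu_p, mu_n = mean_vector(pos), mean_vector(neg)
    w = [p - q for p, q in zip(mu_p, mu_n)]
    # Threshold placed at the midpoint between the two class means.
    b = sum(wi * (p + q) / 2.0 for wi, p, q in zip(w, mu_p, mu_n))
    return w, b


def train_ensemble(pos, neg, n_models=25, seed=0):
    """Train each weak detector on all positives and a small random
    subset of the negatives (undersampling of the anti-concept class)."""
    rng = random.Random(seed)
    k = min(len(pos), len(neg))  # assumed subset size: balanced with positives
    return [train_weak(pos, rng.sample(neg, k)) for _ in range(n_models)]


def fused_score(models, x):
    """Assumed fusion rule: average the weak detectors' scores."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) - b for w, b in models]
    return sum(scores) / len(scores)
```

A positive fused score would be read as "concept present". Each weak detector sees a different slice of the negative class, so the ensemble as a whole covers far more negatives than any single balanced training set could.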

The techniques discussed are evaluated on standard benchmark databases, including the PASCAL VOC 08 image data set and the Mediamill Challenge video database, which is based on the NIST TRECVID 2005 benchmark. The performance is measured using average precision, which combines precision and recall into one performance figure. The benefits of various fusion mechanisms are demonstrated.
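As a rough illustration of how average precision folds precision and recall into one number, the sketch below computes the non-interpolated variant over a ranked result list. This is an assumption for illustration only: the benchmarks named above define their own variants (for example, interpolated forms), and the function name `average_precision` is hypothetical. The sketch also assumes that every relevant item appears somewhere in the ranked list.

```python
def average_precision(relevance):
    """Non-interpolated average precision.

    relevance: list of 0/1 flags for the retrieved items, in ranked
    order (highest-scored item first).
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    # Average over the relevant items; defined as 0 if there are none.
    return precision_sum / hits if hits else 0.0
```

For the ranking `[1, 0, 1, 0]`, the precision is 1 at the first relevant item and 2/3 at the second, so the average precision is 5/6. Pushing relevant items lower in the ranking reduces the figure even when the same set of items is retrieved.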

Prof. Kyros Kutulakos: High-performance photography with a conventional camera

In this talk I will give the traditional camera a fresh, "computational" look. I will show that we can significantly boost the optical performance of a camera by slightly changing the way it captures photos: instead of taking a single snapshot at the press of a button, the camera should record a whole sequence of wide-aperture photos, corresponding to a special type of "focal stack". This sequence is then merged algorithmically into the final photo that the photographer sees.

By generalizing the traditional photographic concepts of "depth of field" and "exposure time" to the case of focal stacks, I will show that this type of photography has two performance advantages: (1) we can capture a given depth of field much faster than one-shot photography allows, and (2) we can significantly increase the quality (i.e., signal-to-noise ratio) of photos captured within a restricted exposure time. I will consider these advantages in detail and discuss their implications for photography.