DAGM GCPR | 2025
DAGM German Conference on Pattern Recognition, Freiburg

Keynote Talks
Venkatesh Babu Radhakrishnan
Indian Institute of Science (IISc), Bangalore

Title: Towards Fair and Controllable Diffusion Models
Time: 11:00 am to 12:00 noon, Wednesday, 24.09.2025
Abstract: Diffusion models have transformed text-to-image generation, but challenges remain in fairness, representativeness, and user control. In this talk, we present some of our efforts to address these critical gaps. We begin by examining the demographic and geographic biases in popular generative models, showing over-representation of certain regions and attributes. To mitigate these biases, we propose distribution-guided debiasing methods that align outputs with desired attribute distributions without retraining, enabling fairer and more inclusive generations. Beyond fairness, we introduce fine-grained control mechanisms that enable precise attribute editing and identity preservation, bridging realism with user-driven customization. We extend controllability to spatial reasoning with affordance-aware text-guided human placement, ensuring semantically plausible compositions, while our proposed zero-shot, depth-aware editing enables realistic scene modifications without additional supervision. We hope these contributions help make generative models equitable, transparent, and highly controllable for real-world applications.
Alex Kolesnikov
OpenAI

Title: A Journey Toward Unified Vision Models
Time: 2:30 pm to 3:30 pm, Wednesday, 24.09.2025
Abstract: Text‑only transformer models thrive on a single, simple interface: next‑token prediction. They then let scale and data do the heavy lifting. By contrast, models in the vision domain remain fragmented and are hindered by non‑trivial components such as box proposals, non‑maximum suppression, and matching losses. In this talk, I’ll share insights from my research journey toward simpler, more unified vision models.
I’ll trace a path through three projects. First, UViM shows how to express structured outputs, such as panoptic segmentation masks and depth maps, as discrete codes that a vanilla autoregressive transformer can generate. Next, I’ll dive into policy‑gradient RL fine‑tuning, addressing fundamental limitations of pure log‑likelihood training by optimizing the metrics we actually care about. Finally, I’ll introduce JetFormer, a decoder‑only autoregressive transformer capable of full end‑to‑end modeling of high‑resolution images.
Dima Damen
University of Bristol and Google DeepMind

Title: Opportunities in Egocentric Vision
Time: 10:30 am to 11:30 am, Thursday, 25.09.2025
Abstract: Forecasting the rise of wearable devices equipped with audio-visual feeds, this talk will present opportunities for research in egocentric video understanding. The talk argues for new ways to view egocentric videos as partial observations of a dynamic 3D world, where objects are out of sight but not out of mind. I'll review a new data collection and annotation effort, HD-EPIC (https://hd-epic.github.io/), which merges video understanding with 3D modelling, showcasing current failures of VLMs in understanding the perspective outside the camera's field of view, a task that is trivial for humans.
All project details are at: dimadamen.github.io/index.html
Stefanie Jegelka
MIT EECS and TU Munich

Title: Does computational structure tell us about deep learning? Some thoughts and examples
Time: 3:30 pm to 4:30 pm, Thursday, 25.09.2025
Abstract: Understanding and steering deep learning training and inference is a nontrivial endeavor. In this talk, I will look at training, learning, and inference from the perspective of computational structure, via a few diverse examples. First, computational structure may help us understand expressiveness and biases in deep learning models. For instance, it can connect graph neural networks to semidefinite programs (SDPs), indicating their capability of learning optimal approximation algorithms. It can also help explain position biases in LLMs. Second, computational structure exists not only in the architecture but also in inference procedures such as chain-of-thought. Finally, if time permits, we will connect architectural structure via neural parameter symmetries to the training and loss landscape of deep models and explore the effect of removing symmetries.
Efstratios Gavves
University of Amsterdam and Ellogon.AI

Title: Cyberphysical World Models and Agents
Time: 9:00 am to 10:00 am, Friday, 26.09.2025
Abstract: Artificial intelligence has moved from passive perception to active interaction, yet current systems remain limited in their ability to reason about the physical and causal structure of the world. We propose cyberphysical world models as a new paradigm that unites perception with the governing mechanisms of dynamics and causality. These models go beyond appearance-based representations by encoding the properties, interactions, and consequences that underpin real-world processes. Building on digital twins, mechanistic neural networks, and scalable causal learning, I will describe a vision, and recent work, towards embodied agents that can predict outcomes, plan interventions, and adapt through curiosity-driven exploration. The resulting cyberphysical agents offer a pathway toward reliable, trustworthy, and autonomous systems, bridging the gap between data-driven learning and the physical world.