Deep into 3DV: Pushing the Boundaries of 3D Vision
3D computer vision has advanced significantly over the past several decades, with modern algorithms successfully reconstructing entire cities. However, many questions remain unexplored, as geometric reasoning alone cannot fully infer the connections among images capturing different parts of a scene, or the semantic relationships between images captured at distant geographic locations.
In this talk, I will present an ongoing line of research that leverages powerful deep networks to address new and exciting problems in 3D vision. Considering a single 3D scene, we ask: Can we estimate the relative camera rotation between a pair of images in an extreme setting, where the images have little to no overlap? We address this seemingly impossible task by designing a neural network that can implicitly reason about hidden cues, such as vanishing points and the direction of shadows. Expanding beyond a single scene, we jointly analyze dozens of 3D-augmented collections and connect them to a new domain: language. We demonstrate how a jointly learned model that considers language, images, and 3D geometry can reason about the rich semantics associated with complex architectural landmarks. Finally, I will discuss several future directions.
Last Updated Date: 08/12/2020