In this talk, I will cover our latest research on 3D reconstruction and semantic scene understanding. To this end, we combine modern machine learning techniques, in particular deep learning, with traditional computer vision approaches. Specifically, I will talk about real-time 3D reconstruction using RGB-D sensors, which enable us to capture high-fidelity geometric representations of the real world. In a new line of research, we use these representations as input to 3D neural networks that infer semantic labels and object classes directly from the volumetric input. To train these data-driven methods, we introduce several datasets, such as ScanNet and Matterport3D, that are annotated directly in 3D and allow tailored volumetric CNNs to achieve high accuracy. In addition to these discriminative tasks, we put a strong emphasis on generative models. For instance, we aim to predict missing geometry in occluded regions and obtain completed 3D reconstructions, with the goal of eventual use in production applications. We believe that this research has significant potential in content creation scenarios (e.g., for virtual and augmented reality) as well as in robotics, where autonomous entities need to understand their surrounding environment.