Computer graphics focuses on rendering high-quality 2D images from 3D scenes, with much of the research devoted to modeling elements of the physical world (such as light transport or material appearance). However, the standard physically-based rendering pipeline can be computationally expensive and, more importantly, is not differentiable. Computer vision, in contrast, investigates the inference of scene properties from 2D images, and has achieved great success with the adoption of neural networks (NNs). However, NNs make few explicit assumptions about the physical world or about how images are formed from it. As a result, they still struggle with tasks that require 3D understanding, such as novel-view synthesis, re-texturing, or relighting. In this talk, I will present my recent work on combining the expressiveness of NNs with knowledge of the physical world for the tasks of neural rendering and inverse rendering.