This talk will describe a framework for constructing simulation-based kernels for ultra data-efficient Bayesian optimization (BO). We consider challenging settings that allow only 10-20 hardware trials, with access to only mid- or low-fidelity simulators. First, I will describe how to construct an informed kernel by embedding the space of simulated trajectories into a lower-dimensional space of latent paths. Our sequential variational autoencoder handles large-scale learning from ample simulated data, and its modular design enables quick adaptation to close the sim-to-real gap. The approach does not require domain-specific knowledge; hence, we are able to demonstrate on hardware that the same architecture works for different areas of robotics: locomotion and manipulation. For domains with severe sim-to-real mismatch, I will describe our variant of BO that ensures that discrepancies between simulation and reality do not hinder online adaptation. Using task-oriented grasping as an example, I will demonstrate how this approach enables quick recovery when the simulation is corrupted or degraded. My longer-term research vision is to build priors from simulation without requiring a specific simulation scenario. I will conclude by motivating this direction and describing our initial work on ‘timescale fidelity’ priors. Such priors could help transfer-aware models and simulators automatically adjust their timestep/frequency or planning horizon.
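To make the informed-kernel idea concrete, here is a minimal sketch (not the talk's actual implementation) of a GP kernel that measures similarity between controller parameters via latent embeddings of their simulated trajectories rather than via raw parameter distance. The `simulate` and `encode` functions below are hypothetical stand-ins for the mid/low-fidelity simulator and the learned sequential-VAE encoder, respectively; only NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(p, T=50):
    """Hypothetical toy simulator: trajectory generated by 2-D controller params p."""
    t = np.linspace(0.0, 1.0, T)
    return p[0] * np.sin(6.0 * t) + p[1] * np.cos(6.0 * t)

def encode(traj):
    """Stand-in for a learned sequential-VAE encoder: a fixed linear projection
    of the trajectory onto two basis functions (the low-dimensional 'latent path')."""
    t = np.linspace(0.0, 1.0, len(traj))
    basis = np.stack([np.sin(6.0 * t), np.cos(6.0 * t)])
    return basis @ traj / len(traj)

def informed_kernel(P, Q, lengthscale=0.5):
    """Squared-exponential kernel evaluated on latent embeddings of simulated
    trajectories, instead of on the raw controller parameters."""
    Zp = np.array([encode(simulate(p)) for p in P])
    Zq = np.array([encode(simulate(q)) for q in Q])
    sq = ((Zp[:, None, :] - Zq[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def gp_posterior_mean(P_train, y_train, P_test, noise=1e-6):
    """Zero-mean GP posterior mean under the informed kernel, as used inside BO."""
    K = informed_kernel(P_train, P_train) + noise * np.eye(len(P_train))
    Ks = informed_kernel(P_test, P_train)
    return Ks @ np.linalg.solve(K, y_train)

# Toy usage: fit the surrogate on a handful of "hardware trials".
P_train = rng.uniform(-1.0, 1.0, size=(5, 2))
y_train = np.array([np.abs(simulate(p)).mean() for p in P_train])  # toy cost
K = informed_kernel(P_train, P_train)
mu = gp_posterior_mean(P_train, y_train, P_train)
```

In a real system the encoder would be trained offline on ample simulated trajectories, and only the kernel evaluations change; the BO loop itself (acquisition over the posterior) is unchanged, which is what makes the kernel "informed" by simulation.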