Probabilistic Model-based Imitation Learning


Efficient skill acquisition is crucial for creating versatile robots. One intuitive way to teach a robot new tricks is to demonstrate a task and enable the robot to imitate the demonstrated behavior. This approach is known as imitation learning. Classical methods of imitation learning, such as inverse reinforcement learning or behavioral cloning, suffer substantially from the correspondence problem when the actions of the teacher (i.e., motor commands, torques, or forces) are not observed or when the teacher's body differs markedly from the robot's, e.g., in its actuation. To address these drawbacks, we propose to train a robot-specific controller that directly matches robot trajectories with observed ones. We present a novel and robust probabilistic model-based approach for solving a probabilistic trajectory matching problem via policy search. For this purpose, we propose to learn a probabilistic model of the system, which we exploit for mental rehearsal of the current controller by making predictions about future trajectories. These internal simulations allow for learning a controller without continuously interacting with the real system, which reduces the overall interaction time. Using long-term predictions from this learned model, we train robot-specific controllers that reproduce the expert’s distribution of demonstrations without the need to observe motor commands during the demonstration. We show that our method achieves a higher learning speed than both model-based imitation learning based on dynamic motor primitives and trial-and-error-based learning systems with hand-crafted reward functions. We demonstrate that our approach addresses the correspondence problem in a principled way. The strength of the resulting approach is shown by imitating human behavior using a tendon-driven compliant robotic arm, where we also demonstrate the generalization ability of our approach.

Adaptive Behavior