Improving Robotic Training with Goal-Conditioned Imitation Learning

Imitation Learning, where an agent learns how to perform a task based on a demonstration by another, is a common form of learning in robotics. Human training for robots is well-studied; it’s a mode of instruction that intuitively makes sense, even to non-experts. Humans, after all, learn by imitating other humans and replicating their behavior.

However, many current human-trained robotic machines face major challenges. For instance, they are often unable to outperform human experts. They’re also susceptible to problems if trained by humans with a lower than optimal level of expertise. Lastly, despite being good at replicating clear and demonstrated behavior, robotic systems often cannot generalize to new instances of the same task.

This last issue is a particular problem in industrial settings, where algorithms may need to adapt to new parts or business practices. Finding ways to train machines that do not rely solely on replicating human operators is a major objective in robotics.

A rising technique that does not require human demonstrations is Reinforcement Learning (RL). This branch of machine learning develops algorithms that learn to perform tasks by maximizing a given reward function. Unfortunately, designing reward functions that make RL yield specific desired behaviors is also challenging. Furthermore, RL algorithms are known to be very sample inefficient, making them less suitable for real-world applications.

Goal-GAIL

In work by Yiming Ding*, Carlos Florensa*, Pieter Abbeel, and Intel AI deep learning data scientist Mariano Phielipp, presented at NeurIPS 2019, we propose a new algorithm, goal-GAIL, that improves upon GAIL (Generative Adversarial Imitation Learning), an algorithm that learns efficiently from few demonstrations but struggles to generalize to new situations or contexts, especially in the goal-conditioned setting.
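To make the idea of a goal-conditioned GAIL reward more concrete, here is a minimal sketch of how a discriminator's output could be turned into a learning signal. It is an illustration under stated assumptions, not the implementation from our paper: the linear discriminator, the function names `expert_probability` and `imitation_reward`, and the choice of `log D` as the reward are all hypothetical, and GAIL variants differ on the exact reward form.

```python
import math
from typing import Sequence


def expert_probability(weights: Sequence[float], bias: float,
                       state: Sequence[float], action: Sequence[float],
                       goal: Sequence[float]) -> float:
    """Hypothetical linear discriminator: P(expert | state, action, goal).

    Conditioning on the goal is what makes this the goal-conditioned setting:
    the same (state, action) pair can look expert-like for one goal and not
    for another.
    """
    features = [*state, *action, *goal]
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    logit = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def imitation_reward(prob_expert: float, eps: float = 1e-8) -> float:
    """Assumed reward form: grows as the step looks more expert-like."""
    return math.log(prob_expert + eps)
```

In a real system the discriminator would be a neural network trained to tell demonstration data from the agent's own data; the linear model here only keeps the sketch self-contained.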

We show that GAIL can be successfully combined with Hindsight Experience Replay (HER) to overcome this issue. HER lets systems learn to reach goals without a hand-crafted reward by treating the outcomes they actually reached as if they had been the intended goals, which makes them more flexible, adaptable, and useful in a variety of circumstances. However, HER tends to be sample-inefficient. Our method takes advantage of GAIL’s ability to learn quickly from a few demonstrations at the beginning of training, and HER’s ability to generalize and learn new tasks. Systems taught by goal-GAIL learn initial policies quickly, much as systems taught with GAIL do. However, they also retain the adaptability of systems taught with HER, so they can rapidly learn the most common tasks while remaining flexible.
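The hindsight idea behind HER can be sketched in a few lines. The example below is a simplified illustration, not our released code: the `Transition` type, the 2-D states, the 0/1 `sparse_reward` with a distance tolerance, and the "pick a goal from a later step of the same trajectory" strategy are assumptions chosen to keep the sketch small.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

State = Tuple[float, float]  # assumed 2-D position, for illustration only


@dataclass(frozen=True)
class Transition:
    state: State
    action: State
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State, tol: float = 0.05) -> float:
    """1.0 when the achieved state is within `tol` of the goal, else 0.0."""
    dist = math_dist(achieved, goal)
    return 1.0 if dist <= tol else 0.0


def math_dist(a: State, b: State) -> float:
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def relabel_with_hindsight(trajectory: List[Transition], k: int = 4,
                           rng: Optional[random.Random] = None) -> List[Transition]:
    """Create k relabeled copies of each step, using later states as goals."""
    rng = rng or random.Random(0)
    relabeled = []
    for t, step in enumerate(trajectory):
        for _ in range(k):
            # Pick a state the agent really reached at this step or later
            # and pretend it was the goal all along.
            future = trajectory[rng.randint(t, len(trajectory) - 1)]
            new_goal = future.next_state
            relabeled.append(Transition(
                state=step.state,
                action=step.action,
                next_state=step.next_state,
                goal=new_goal,
                reward=sparse_reward(step.next_state, new_goal),
            ))
    return relabeled
```

Even a trajectory that never reached its original goal yields relabeled transitions with nonzero reward, which is why hindsight relabeling helps when rewards are sparse.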

To test our work, we conducted experiments in four different environments, each with a robotic system pushing a block in a simple environment. We defined target outcomes for each environment and compared how the system performed when trained with each learning method.

Figure 1: Four different environments used in our experiments

With goal-GAIL, a robot can potentially outperform an expert human. At the same time, it lets a robot reach many goals in a wide variety of situations. Our method is also useful in training robotic systems when only kinesthetic demonstrations (where a person physically guides the robot through the task) are available.

A visualization of our results is below, with goal-GAIL outperforming both GAIL and HER.

Figure 2: Results from four different test environments

Practical Value for goal-GAIL

This approach has potential uses in practical robotics. In settings like warehouses or factories, a worker training a robot for a task would not necessarily have to be an expert, thanks to the potential adaptive abilities our method provides. At the same time, our method retains the efficiency associated with more traditional robotic training methods. Efficient training and adaptability do not have to be at odds with each other, and there is room for both in a robotic training model.

For more on this research, you can read our paper, check out our code, look for us at the 2019 NeurIPS conference, and stay tuned to @IntelAIResearch on Twitter.