
Reinforcement Learning & Imitation Learning in Robotics

Author

  • Sonny George
  • Dec 10 2024
  • ROS version: N/A

What is reinforcement learning?

Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent receives feedback in the form of rewards or penalties, which it uses to learn a policy that maximizes the cumulative reward over time.
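The interaction described above is usually organized into episodes of observe-act-reward steps. Below is a minimal sketch of that loop using the gymnasium API with a random policy standing in for a learned one; the environment name, episode count, and use of gymnasium itself are illustrative assumptions, not part of this article.

```python
import gymnasium as gym

# Minimal agent-environment loop (environment choice is an arbitrary example).
env = gym.make("CartPole-v1")

for episode in range(5):
    observation, info = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        # A learned policy would map the observation to an action here;
        # a random action stands in for it in this sketch.
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    print(f"episode {episode}: cumulative reward = {total_reward}")

env.close()
```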

When is RL useful?

RL is useful for control tasks that are difficult to specify with explicit, programmatic instructions. Conversely, if a robot can be controlled robustly with reasonable hand-written logic, RL is very likely not the right tool.

How does RL work?

A "policy" is a mapping from environment states to actions. E.g.:

chess board state -> POLICY -> move to make

There are many different algorithms for learning policies (e.g. Q-learning, DDPG, PPO) that are suited to different types of problems.
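To make one of those algorithms concrete, here is a sketch of tabular Q-learning, where the policy is derived by acting greedily on a table of state-action value estimates. The hyperparameters, discrete action set, and epsilon-greedy exploration scheme are illustrative assumptions, not prescriptions from this article.

```python
from collections import defaultdict
import random

# Tabular Q-learning sketch; hyperparameters and the action set are
# illustrative assumptions.
alpha, gamma, epsilon = 0.1, 0.99, 0.1
actions = [0, 1, 2, 3]
Q = defaultdict(float)  # Q[(state, action)] -> estimated return

def choose_action(state):
    # Epsilon-greedy: explore occasionally, otherwise act greedily on Q.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    # Q-learning target: immediate reward plus discounted best next-state value.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```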

Of course, the agent needs to interact with the environment to receive (or fail to receive) a reward signal.

However, when learning from scratch (from a randomly initialized policy), it can be very sample-inefficient to learn good behavior, and it can require a lot of improbable "lucky guessing" before the agent ever receives a reward signal to learn from. This is especially true in robotic control, where the state space is high-dimensional and the reward signal is sparse.

What is imitation learning?


Imitation learning mitigates this problem by building upon demonstrations of desired behavior provided by a human or another expert. Instead of starting from a randomly initialized policy, imitation learning initializes the agent's policy based on these demonstrations. This approach allows the agent to learn more efficiently, especially in tasks where exploration from scratch would be too costly, time-consuming, or unsafe.

Types of imitation learning

There are several methods of imitation learning, including:

  1. Behavior Cloning (BC):
    In behavior cloning, the agent learns to mimic the actions of the expert by training a supervised model on the demonstration dataset (see the sketch after this list). This method works well when the expert's demonstrations are comprehensive and cover the task's state space adequately. However, it is susceptible to compounding errors when the agent encounters states not covered in the demonstrations.

  2. Inverse Reinforcement Learning (IRL):
    In IRL, the agent infers the reward function that the expert appears to be optimizing and uses it to train its policy via reinforcement learning. This method is powerful because it focuses on learning the underlying intent of the behavior, better allowing the agent to generalize to states not explicitly demonstrated by the expert.