Date: 2026-01-30
Author: Yuan Zhang
Estimated Reading Time: 30 min

Imitation Learning (IL) and Inverse Reinforcement Learning (IRL) address the problem of learning behavior from expert demonstrations, especially when defining an explicit reward function is difficult.

[Papers: HLFH, GAIL]


1. Problem Setup

Given expert trajectories: \[ \tau_E = \{(s_t, a_t)\} \]

Goal:

  • Learn a policy \( \pi(a \mid s) \) (IL)
  • Or infer a reward function \( r(s, a) \) (IRL)

2. Behavioral Cloning (BC)

BC treats imitation as supervised learning: \[ \min_\theta \; \mathbb{E}_{(s, a) \sim \tau_E} \left[ \left\| \pi_\theta(s) - a \right\|^2 \right] \]
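
A minimal sketch of this objective in PyTorch; the network sizes, placeholder data, and hyperparameters are illustrative assumptions, not taken from any particular paper:

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2  # illustrative dimensions (assumption)

# Simple MLP policy: deterministic action prediction pi_theta(s).
policy = nn.Sequential(
    nn.Linear(obs_dim, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

# Expert demonstrations tau_E as (state, action) tensors; random placeholders here.
expert_states = torch.randn(1024, obs_dim)
expert_actions = torch.randn(1024, act_dim)

for epoch in range(100):
    pred_actions = policy(expert_states)
    # Squared-error regression onto expert actions:
    # min_theta E_{(s,a) ~ tau_E} || pi_theta(s) - a ||^2
    loss = ((pred_actions - expert_actions) ** 2).sum(dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```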

Pros

  • Simple
  • Stable

Cons

  • Covariate shift: at test time the policy drifts into states the demonstrations never covered
  • Error accumulation: small per-step mistakes compound along the trajectory

3. Inverse Reinforcement Learning

IRL assumes:

The expert is (near-)optimal under an unknown reward.

Classic IRL:

  • Maximum entropy IRL (objective sketched below)
  • Feature matching
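
To make the first bullet concrete: maximum-entropy IRL (Ziebart et al.) models trajectories as exponentially more likely under higher learned return and fits a parametric reward \( r_\theta \) by maximum likelihood:

\[ P_\theta(\tau) \propto \exp\!\Big( \textstyle\sum_t r_\theta(s_t, a_t) \Big), \qquad \nabla_\theta \mathcal{L} = \mathbb{E}_{\tau \sim \tau_E}\Big[ \textstyle\sum_t \nabla_\theta r_\theta(s_t, a_t) \Big] - \mathbb{E}_{\tau \sim P_\theta}\Big[ \textstyle\sum_t \nabla_\theta r_\theta(s_t, a_t) \Big] \]

With a linear reward \( r_\theta(s, a) = \theta^\top \phi(s, a) \), this gradient is exactly the gap between expert and model feature expectations, which recovers the feature-matching view in the second bullet.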

Main issue:

  • Reward ambiguity: many different rewards (including potential-based shapings of one another) explain the same expert behavior

4. Adversarial Imitation Learning

4.1 GAIL

GAIL learns a discriminator \( D(s, a) \) that distinguishes expert state-action pairs from those generated by the current policy.

The policy is trained to fool the discriminator, similar to GANs.
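
A minimal sketch of one adversarial round, assuming a PyTorch discriminator that outputs the probability a state-action pair is expert data; the surrogate reward \( -\log(1 - D(s, a)) \) is one common convention, and the network sizes and helper names are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim = 8, 2  # illustrative (assumption)

# Discriminator D(s, a): probability that (s, a) comes from the expert.
disc = nn.Sequential(
    nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
    nn.Linear(64, 1),
)
disc_opt = torch.optim.Adam(disc.parameters(), lr=3e-4)

def discriminator_step(expert_sa, policy_sa):
    """One binary-classification update: expert pairs -> 1, policy pairs -> 0."""
    logits_exp = disc(expert_sa)
    logits_pol = disc(policy_sa)
    loss = (F.binary_cross_entropy_with_logits(logits_exp, torch.ones_like(logits_exp))
            + F.binary_cross_entropy_with_logits(logits_pol, torch.zeros_like(logits_pol)))
    disc_opt.zero_grad()
    loss.backward()
    disc_opt.step()

def imitation_reward(sa):
    """Surrogate reward for the policy: high when D thinks (s, a) looks expert-like."""
    with torch.no_grad():
        d = torch.sigmoid(disc(sa))
    return -torch.log(1.0 - d + 1e-8)  # handed to any RL algorithm (e.g. PPO/TRPO)
```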

Interpretation:

  • Implicit reward learning
  • Avoids explicit reward engineering

4.2 AIRL

AIRL introduces a structured reward: \[ r(s, a, s') = f_\theta(s, a) + \gamma h(s') - h(s) \] where \( h \) acts as a potential-based shaping term.
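
A sketch of how this structure plugs into AIRL's discriminator, which Fu et al. define as \( D = \exp(f) / (\exp(f) + \pi(a \mid s)) \); the two small networks and the way the policy log-probability is supplied are illustrative assumptions:

```python
import torch
import torch.nn as nn

obs_dim, act_dim, gamma = 8, 2, 0.99  # illustrative (assumption)

# f_theta(s, a): reward term; h(s): potential-style shaping term.
f_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
h_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))

def structured_reward(s, a, s_next):
    """r(s, a, s') = f_theta(s, a) + gamma * h(s') - h(s)."""
    return f_net(torch.cat([s, a], dim=-1)) + gamma * h_net(s_next) - h_net(s)

def airl_discriminator(s, a, s_next, log_pi_a):
    """D = exp(r) / (exp(r) + pi(a|s)); sigmoid(r - log pi) is the same ratio, but stable."""
    return torch.sigmoid(structured_reward(s, a, s_next) - log_pi_a)

def airl_policy_reward(s, a, s_next, log_pi_a):
    """Reward handed to the RL step: log D - log(1 - D) = r(s, a, s') - log pi(a|s)."""
    return structured_reward(s, a, s_next) - log_pi_a
```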

Benefits:

  • Reward transferability
  • Better interpretability

5. Practical Training Strategies

  • BC warm-start + GAIL fine-tuning (pipeline sketched after this list)
  • Off-policy adversarial IL
  • Hybrid IL + RL pipelines
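
A pseudocode-level sketch of the warm-start pipeline. Here `behavioral_cloning`, `collect_rollouts`, and `rl_update` are hypothetical helper names standing in for whatever implementations are used; `discriminator_step` and `imitation_reward` refer back to the Section 4.1 sketch:

```python
def train_with_warm_start(policy, expert_data, n_bc_epochs=50, n_adv_iters=500):
    """BC warm-start followed by adversarial (GAIL-style) fine-tuning."""
    # 1. Warm-start: supervised regression onto expert actions (Section 2).
    behavioral_cloning(policy, expert_data, epochs=n_bc_epochs)   # hypothetical helper

    # 2. Adversarial fine-tuning: alternate discriminator and policy updates.
    for _ in range(n_adv_iters):
        rollouts = collect_rollouts(policy)            # hypothetical helper
        discriminator_step(expert_data, rollouts)      # as in the Section 4.1 sketch
        rewards = imitation_reward(rollouts)           # surrogate reward from D
        rl_update(policy, rollouts, rewards)           # any on- or off-policy RL step
    return policy
```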

6. When to Use IRL?

Good fit:

  • Reward is ambiguous
  • Transfer across environments is required

Poor fit:

  • Dense, well-defined rewards
  • Limited expert data

7. Key References

  • Ho & Ermon, Generative Adversarial Imitation Learning (GAIL), NeurIPS 2016
  • Fu, Luo & Levine, Learning Robust Rewards with Adversarial Inverse Reinforcement Learning (AIRL), ICLR 2018
  • Ziebart et al., Maximum Entropy Inverse Reinforcement Learning, AAAI 2008

8. Open Problems

  • Sample-efficient IL
  • Multi-agent imitation
  • Foundation models for IL