This ongoing project, inspired by Levine and Koltun (2012), aims to imitate human driving patterns. The rationale is that, as long as the other agents on the road are human, autonomous vehicles should adopt “human-like” driving skills.
The underlying assumption is that drivers act according to a utility (reward) function that represents their preferences and gives rise to the observed driving behavior. The role of inverse reinforcement learning (IRL) is to infer this latent utility function from driver (expert) demonstrations, so that the learned driving policy generalizes to unobserved situations.
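Concretely, a common formulation in this line of work (following the maximum-entropy model that Levine and Koltun (2012) build on; the specific form shown here is an illustrative assumption, not the project's exact objective) treats the reward as a weighted combination of driving features and fits the weights so that the demonstrations become likely:

$$
r_\theta(s_t, a_t) = \theta^\top \phi(s_t, a_t), \qquad
p(\tau \mid \theta) \propto \exp\!\Big(\sum_t r_\theta(s_t, a_t)\Big), \qquad
\theta^\star = \arg\max_\theta \sum_{\tau \in \mathcal{D}} \log p(\tau \mid \theta).
$$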
So far, we have investigated learning longitudinal driving tasks, as sketched below. First, features related to the reward function (e.g., car-following gap and speed) were extracted. Then, the linear or non-linear parameters used to construct the reward function were estimated by IRL from human demonstration data.
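The minimal Python sketch below illustrates this two-step pipeline under simplifying assumptions: the feature definitions, function names, and toy (gap, speed) observations are all illustrative stand-ins, not the project's actual code or data.

```python
"""Sketch of linear-reward IRL for longitudinal driving (illustrative only)."""
import numpy as np

def extract_features(gap_m, speed_mps, desired_speed=30.0):
    """Map a longitudinal driving state to a small feature vector.
    Assumed features: scaled car-following gap, squared speed deviation,
    and an inverse-time-headway (tailgating) term."""
    time_headway = gap_m / max(speed_mps, 0.1)
    return np.array([
        gap_m / 100.0,
        ((speed_mps - desired_speed) / desired_speed) ** 2,
        1.0 / max(time_headway, 0.1),
    ])

def linear_reward(theta, features):
    """Reward assumed linear in the features: r(s) = theta . phi(s)."""
    return float(theta @ features)

def irl_gradient_step(theta, expert_states, sampled_states, lr=0.01):
    """One feature-matching update in the style of maximum-entropy IRL:
    move theta toward the expert feature expectations and away from the
    feature expectations of states visited under the current policy.
    In a full implementation, sampled_states would be regenerated from the
    policy induced by the current theta at every iteration; here they are
    fixed purely for illustration."""
    phi_expert = np.mean([extract_features(*s) for s in expert_states], axis=0)
    phi_policy = np.mean([extract_features(*s) for s in sampled_states], axis=0)
    return theta + lr * (phi_expert - phi_policy)

# Toy usage with made-up (gap in m, speed in m/s) observations.
expert = [(35.0, 28.0), (40.0, 29.5), (33.0, 27.0)]
sampled = [(15.0, 31.0), (20.0, 33.0), (18.0, 30.0)]
theta = np.zeros(3)
for _ in range(10):
    theta = irl_gradient_step(theta, expert, sampled)
print("reward weights:", theta)
print("reward of an example state:", linear_reward(theta, extract_features(35.0, 28.0)))
```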
Future work will focus on 1) representing the reward function with a deep neural network, and 2) comparing the performance of IRL with that of approaches based on Generative Adversarial Networks (GANs).
- Levine, S., & Koltun, V. (2012). Continuous inverse optimal control with locally optimal examples. arXiv preprint arXiv:1206.4617.