Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following

Read original: arXiv:2402.06559 - Published 7/18/2024 by Brian Yang, Huangyuan Su, Nikolaos Gkanatsios, Tsung-Wei Ke, Ayush Jain, Jeff Schneider, Katerina Fragkiadaki

Overview

Diffusion models excel at modeling complex and multimodal trajectory distributions for decision-making and control
Reward-gradient guided denoising has been proposed to generate trajectories that maximize both a differentiable reward function and the likelihood under the data distribution captured by a diffusion model
This approach is limited because it requires a differentiable reward function fitted to both clean and noised samples
The paper proposes Diffusion-ES, a method that combines gradient-free optimization with trajectory denoising to optimize black-box, non-differentiable objectives while staying on the data manifold

Plain English Explanation

Diffusion models are a type of AI system that can capture the complexity and diversity of real-world trajectory data, like the paths a self-driving car might take. This makes them useful for decision-making and control tasks.

Previous work has tried to use diffusion models to generate trajectories that not only match the patterns in the training data, but also maximize a reward function, a mathematical way of describing how "good" a trajectory is. However, this approach is limited because it requires the reward function to be differentiable: its gradient with respect to the trajectory must exist so that it can steer each denoising step. The reward must also be fitted to noised trajectories as well as clean ones, because the guidance is applied while the trajectory is still partly noise.
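To make the differentiability problem concrete, here is a minimal sketch, not taken from the paper. It assumes a hypothetical one-dimensional reward that is 1 when a car's lateral offset clears an obstacle edge at 1.0 m and 0 otherwise. The gradient of such a step function is zero almost everywhere, so gradient guidance gets no signal, while simply sampling candidates and keeping the best one still finds a solution.

```python
import numpy as np


def collision_free_reward(lateral_offset: float) -> float:
    """Hypothetical step reward: 1.0 if the offset clears the obstacle edge at 1.0 m."""
    return 1.0 if lateral_offset > 1.0 else 0.0


def finite_difference_grad(f, x: float, eps: float = 1e-4) -> float:
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def best_of_samples(f, center: float, scale: float, n: int, seed: int = 0):
    """Gradient-free alternative: sample around `center` and keep the best candidate."""
    rng = np.random.default_rng(seed)
    candidates = center + scale * rng.normal(size=n)
    scores = np.array([f(c) for c in candidates])
    best = int(np.argmax(scores))
    return float(candidates[best]), float(scores[best])


if __name__ == "__main__":
    start = 0.2  # starting offset, well inside the flat region of the reward
    print("gradient at start:", finite_difference_grad(collision_free_reward, start))
    print("best sample, score:", best_of_samples(collision_free_reward, start, 1.0, 256))
```

The sampling step here draws from a plain Gaussian, which says nothing about whether a candidate is realistic. Keeping candidates realistic is the role the diffusion model plays in the method described next.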

In this paper, the researchers propose a new method called Diffusion-ES that overcomes this limitation. Diffusion-ES combines gradient-free optimization (a way of searching for good solutions without relying on gradients or derivatives) with a process called "trajectory denoising" (removing noise from the generated trajectories to keep them realistic). This allows Diffusion-ES to optimize non-differentiable, "black-box" reward functions: functions that can only be evaluated, and that may be complex or discontinuous.

The key idea is to sample candidate trajectories from a diffusion model, score them using a black-box reward function, and then "mutate" the high-scoring trajectories using a truncated diffusion process (applying a small number of noising and denoising steps). This allows for efficient exploration of the solution space without wandering too far from the data manifold (the set of realistic trajectories).

Technical Explanation

The paper introduces Diffusion-ES, a method that combines gradient-free optimization with trajectory denoising to optimize black-box, non-differentiable objectives while staying on the data manifold.

The key steps of Diffusion-ES are:

1. Sample candidate trajectories from a pre-trained diffusion model, which captures the complex, multimodal distribution of real-world trajectories.
2. Score the sampled trajectories using a black-box reward function, which may be non-differentiable.
3. "Mutate" the high-scoring trajectories by applying a truncated diffusion process, that is, a small number of noising and denoising steps. This explores the solution space efficiently while keeping the trajectories close to the data manifold. The mutated trajectories form the next population of the evolutionary search, and scoring and mutation repeat.
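The steps above can be sketched as a short evolutionary loop. This is a toy illustration under stated assumptions, not the authors' implementation. The pre-trained diffusion model is replaced by a hypothetical smoothing "denoiser" (`toy_denoise_step`). The goal, obstacle, reward, and all hyperparameters are made up for the example. Elites are resampled uniformly from the top-k, which may differ from the selection rule used in the paper.

```python
import numpy as np

GOAL = np.array([3.0, 0.0])      # hypothetical goal position
OBSTACLE = np.array([1.5, 0.0])  # hypothetical obstacle centre


def toy_denoise_step(trajs: np.ndarray) -> np.ndarray:
    """Stand-in for one reverse-diffusion step: smooth each trajectory over time.

    trajs has shape (batch, horizon, 2). A real system would call the
    pre-trained diffusion model's denoiser here instead.
    """
    padded = np.pad(trajs, ((0, 0), (1, 1), (0, 0)), mode="edge")
    return (padded[:, :-2] + padded[:, 1:-1] + padded[:, 2:]) / 3.0


def sample_from_prior(rng, n: int, horizon: int, full_steps: int = 5) -> np.ndarray:
    """Step 1 stand-in: start from pure noise and run the full denoising chain."""
    trajs = rng.normal(size=(n, horizon, 2))
    for _ in range(full_steps):
        trajs = toy_denoise_step(trajs)
    return trajs


def black_box_reward(traj: np.ndarray) -> float:
    """Step 2: non-differentiable score with a hard collision penalty."""
    collided = bool(np.any(np.linalg.norm(traj - OBSTACLE, axis=-1) < 0.5))
    return -float(np.linalg.norm(traj[-1] - GOAL)) - (10.0 if collided else 0.0)


def mutate(rng, trajs: np.ndarray, noise_scale: float, n_steps: int) -> np.ndarray:
    """Step 3: truncated diffusion, i.e. light noising then a few denoising steps."""
    noised = trajs + noise_scale * rng.normal(size=trajs.shape)
    for _ in range(n_steps):
        noised = toy_denoise_step(noised)
    return noised


def diffusion_es(reward_fn, n_iters=20, population=64, n_elites=8, horizon=16, seed=0):
    rng = np.random.default_rng(seed)
    pop = sample_from_prior(rng, population, horizon)
    best_traj, best_score, history = None, -np.inf, []
    for _ in range(n_iters):
        scores = np.array([reward_fn(t) for t in pop])
        elite_idx = np.argsort(scores)[-n_elites:]  # ascending order, best is last
        if scores[elite_idx[-1]] > best_score:
            best_score = float(scores[elite_idx[-1]])
            best_traj = pop[elite_idx[-1]].copy()
        history.append(best_score)  # best reward seen so far
        parents = pop[rng.choice(elite_idx, size=population)]
        pop = mutate(rng, parents, noise_scale=0.3, n_steps=2)
    return best_traj, history


if __name__ == "__main__":
    best, history = diffusion_es(black_box_reward)
    print(f"best reward after {len(history)} iterations: {history[-1]:.3f}")
```

Note that `black_box_reward` is only ever called, never differentiated, so any scoring routine could be swapped in. The `noise_scale` and `n_steps` arguments of `mutate` control how far a mutated trajectory can drift from its parent.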

The researchers show that Diffusion-ES achieves state-of-the-art performance on nuPlan, an established closed-loop planning benchmark for autonomous driving, outperforming existing sampling-based planners, reactive deterministic or diffusion-based policies, and reward-gradient guidance methods.

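Additionally, the paper demonstrates that, unlike prior guidance methods, Diffusion-ES can optimize non-differentiable language-shaped reward functions generated by few-shot prompting of large language models (LLMs). When guided by a human teacher who issues instructions to follow, the method can generate novel, highly complex behaviors, such as aggressive lane weaving, that are not present in the training data. The authors report that this lets it solve the hardest nuPlan scenarios, which are beyond the capabilities of existing trajectory optimization methods and driving policies.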
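As an illustration of what a language-shaped reward might look like once it is expressed as code, here is a hypothetical sketch for the instruction "change to the left lane, then pass the lead vehicle". The `State` fields, the lane numbering, and the reward values are assumptions made for this example and are not taken from the paper or from nuPlan. The function uses branching and a search over timesteps, so it has no useful gradient, yet a gradient-free planner can still score trajectories with it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class State:
    lane: int      # hypothetical convention: 0 = current lane, 1 = lane to the left
    s: float       # ego longitudinal position in metres
    lead_s: float  # lead vehicle longitudinal position in metres


def reward_change_left_then_pass(traj: Sequence[State], target_lane: int = 1) -> float:
    """Return 0.0, 1.0 or 2.0 depending on how much of the instruction is satisfied."""
    # First timestep at which the ego vehicle is in the target lane, if any.
    changed_at = next((i for i, st in enumerate(traj) if st.lane == target_lane), None)
    if changed_at is None:
        return 0.0  # never changed lane
    # The pass only counts if it happens at or after the lane change.
    passed = any(st.s > st.lead_s for st in traj[changed_at:])
    return 2.0 if passed else 1.0


if __name__ == "__main__":
    stay = [State(0, float(t), 10.0 + t) for t in range(5)]
    change_and_pass = [State(0, 0.0, 10.0), State(1, 8.0, 11.0), State(1, 16.0, 12.0)]
    print(reward_change_left_then_pass(stay))
    print(reward_change_left_then_pass(change_and_pass))
```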

Critical Analysis

The paper presents a compelling approach to trajectory optimization using diffusion models and gradient-free optimization. The key strength of Diffusion-ES is its ability to handle non-differentiable, black-box reward functions, which greatly expands the types of objectives that can be optimized.

However, the paper does not address some potential limitations and areas for further research:

Scalability: The performance of Diffusion-ES on the nuPlan benchmark is impressive, but it's unclear how well the method would scale to more complex, high-dimensional trajectory optimization problems.
Interpretability: As with many deep learning-based methods, the inner workings of Diffusion-ES may be difficult to interpret, which can be a concern for safety-critical applications like autonomous driving.
Robustness: The paper does not explore the sensitivity of Diffusion-ES to factors like noisy or incomplete reward functions, or its ability to generalize to unseen environments or task variations.

Further research could investigate ways to address these limitations, such as exploring physics-informed diffusion models or tuning-free alignment techniques to improve scalability and interpretability.

Conclusion

The proposed Diffusion-ES method represents an important advance in trajectory optimization, enabling the use of complex, non-differentiable reward functions while maintaining realism and diversity in the generated trajectories. This flexibility opens up new possibilities for applications like autonomous driving, where desired behaviors may be difficult to capture in a simple, differentiable objective function.

The strong performance on the nuPlan benchmark and the ability to optimize language-shaped rewards suggest that Diffusion-ES could be a valuable tool for solving challenging decision-making and control problems. As the field of diffusion models continues to evolve, further research on scalability, interpretability, and robustness will be key to unlocking the full potential of this approach.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following

Brian Yang, Huangyuan Su, Nikolaos Gkanatsios, Tsung-Wei Ke, Ayush Jain, Jeff Schneider, Katerina Fragkiadaki

Diffusion models excel at modeling complex and multimodal trajectory distributions for decision-making and control. Reward-gradient guided denoising has been recently proposed to generate trajectories that maximize both a differentiable reward function and the likelihood under the data distribution captured by a diffusion model. Reward-gradient guided denoising requires a differentiable reward function fitted to both clean and noised samples, limiting its applicability as a general trajectory optimizer. In this paper, we propose DiffusionES, a method that combines gradient-free optimization with trajectory denoising to optimize black-box non-differentiable objectives while staying in the data manifold. Diffusion-ES samples trajectories during evolutionary search from a diffusion model and scores them using a black-box reward function. It mutates high-scoring trajectories using a truncated diffusion process that applies a small number of noising and denoising steps, allowing for much more efficient exploration of the solution space. We show that DiffusionES achieves state-of-the-art performance on nuPlan, an established closed-loop planning benchmark for autonomous driving. Diffusion-ES outperforms existing sampling-based planners, reactive deterministic or diffusion-based policies, and reward-gradient guidance. Additionally, we show that unlike prior guidance methods, our method can optimize non-differentiable language-shaped reward functions generated by few-shot LLM prompting. When guided by a human teacher that issues instructions to follow, our method can generate novel, highly complex behaviors, such as aggressive lane weaving, which are not present in the training data. This allows us to solve the hardest nuPlan scenarios which are beyond the capabilities of existing trajectory optimization methods and driving policies.

7/18/2024

Diffusion Models as Optimizers for Efficient Planning in Offline RL

Renming Huang, Yunqiang Pei, Guoqing Wang, Yangming Zhang, Yang Yang, Peng Wang, Hengtao Shen

Diffusion models have shown strong competitiveness in offline reinforcement learning tasks by formulating decision-making as sequential generation. However, the practicality of these methods is limited due to the lengthy inference processes they require. In this paper, we address this problem by decomposing the sampling process of diffusion models into two decoupled subprocesses: 1) generating a feasible trajectory, which is a time-consuming process, and 2) optimizing the trajectory. With this decomposition approach, we are able to partially separate efficiency and quality factors, enabling us to simultaneously gain efficiency advantages and ensure quality assurance. We propose the Trajectory Diffuser, which utilizes a faster autoregressive model to handle the generation of feasible trajectories while retaining the trajectory optimization process of diffusion models. This allows us to achieve more efficient planning without sacrificing capability. To evaluate the effectiveness and efficiency of the Trajectory Diffuser, we conduct experiments on the D4RL benchmarks. The results demonstrate that our method achieves 3-10 times faster inference speed compared to previous sequence modeling methods, while also outperforming them in terms of overall performance. https://github.com/RenMing-Huang/TrajectoryDiffuser Keywords: Reinforcement Learning, Efficient Planning, Diffusion Model

7/24/2024

Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding

Xiner Li, Yulai Zhao, Chenyu Wang, Gabriele Scalia, Gokcen Eraslan, Surag Nair, Tommaso Biancalani, Aviv Regev, Sergey Levine, Masatoshi Uehara

Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. However, rather than merely generating designs that are natural, we often aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Existing methods for achieving this goal often require "differentiable" proxy models (e.g., classifier guidance or DPS) or involve computationally expensive fine-tuning of diffusion models (e.g., classifier-free guidance, RL-based fine-tuning). In our work, we propose a new method to address these challenges. Our algorithm is an iterative sampling method that integrates soft value functions, which looks ahead to how intermediate noisy states lead to high rewards in the future, into the standard inference procedure of pre-trained diffusion models. Notably, our approach avoids fine-tuning generative models and eliminates the need to construct differentiable models. This enables us to (1) directly utilize non-differentiable features/reward feedback, commonly used in many scientific domains, and (2) apply our method to recent discrete diffusion models in a principled way. Finally, we demonstrate the effectiveness of our algorithm across several domains, including image generation, molecule generation, and DNA/RNA sequence generation. The code is available at https://github.com/masa-ue/SVDD.

9/14/2024

Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation

Yixiao Wang, Chen Tang, Lingfeng Sun, Simone Rossi, Yichen Xie, Chensheng Peng, Thomas Hannagan, Stefano Sabatini, Nicola Poerio, Masayoshi Tomizuka, Wei Zhan

Diffusion models are promising for joint trajectory prediction and controllable generation in autonomous driving, but they face challenges of inefficient inference steps and high computational demands. To tackle these challenges, we introduce Optimal Gaussian Diffusion (OGD) and Estimated Clean Manifold (ECM) Guidance. OGD optimizes the prior distribution for a small diffusion time $T$ and starts the reverse diffusion process from it. ECM directly injects guidance gradients to the estimated clean manifold, eliminating extensive gradient backpropagation throughout the network. Our methodology streamlines the generative process, enabling practical applications with reduced computational overhead. Experimental validation on the large-scale Argoverse 2 dataset demonstrates our approach's superior performance, offering a viable solution for computationally efficient, high-quality joint trajectory prediction and controllable generation for autonomous driving. Our project webpage is at https://yixiaowang7.github.io/OptTrajDiff_Page/.

8/2/2024