Volume 37, Issue 10 pp. 8046-8067
RESEARCH ARTICLE

Beyond backpropagate through time: Efficient model-based training through time-splitting

Jiaxin Gao

School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, China

Jiaxin Gao and Yang Guan contributed equally to this work.

Yang Guan

School of Vehicle and Mobility, Tsinghua University, Beijing, China

Wenyu Li

College of Artificial Intelligence, Nankai University, Tianjin, China

Corresponding Author

Shengbo Eben Li

School of Vehicle and Mobility, Tsinghua University, Beijing, China

Correspondence: Shengbo Eben Li, School of Vehicle and Mobility, Tsinghua University, 100084 Beijing, China.

Email: [email protected]

Fei Ma

School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, China

Jianfeng Zheng

Urban Transportation Division, DiDi Chuxing, Beijing, China

Junqing Wei

Urban Transportation Division, DiDi Chuxing, Beijing, China

Bo Zhang

Urban Transportation Division, DiDi Chuxing, Beijing, China

Keqiang Li

School of Vehicle and Mobility, Tsinghua University, Beijing, China

First published: 24 May 2022

Abstract

Model-based policy gradient (MBPG) is used to find approximate solutions to optimal control problems. However, temporal dependencies couple adjacent states, so the training time grows linearly with the time horizon. This paper reshapes the MBPG training process with a time-splitting technique to obtain a time-independent algorithm called Training Through Time-Splitting (T3S). First, the coupled variables are copied to obtain two independent variables, and an extra variable with an equivalence constraint is introduced to keep the problem consistent with the original. The transformed problem is then divided into subproblems with carefully derived loss functions. The subproblems have decoupled variables and share policy networks, so they can be optimized concurrently. Guided by this design, the paper further proposes an asynchronous parallel training scheme to accelerate training. Numerical simulation on a trajectory tracking task shows that T3S outperforms MBPG by 83.6% in wall-clock time.
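
To make the coupling described in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical scalar linear model, a linear policy, and a quadratic penalty standing in for the equivalence constraint; the names `A`, `B`, `K`, `RHO`, and all functions are illustrative, and the paper's derived loss functions are not reproduced here. The sketch contrasts a sequential rollout, where each step needs the previous state, with a split form in which each subproblem reads only its own copied state and a copy of the next state.

```python
import numpy as np

# All constants below are assumptions chosen only for illustration.
A, B = 0.9, 0.5   # hypothetical model: x_{t+1} = A * x_t + B * u_t
K = -0.4          # hypothetical linear policy: u_t = K * x_t
RHO = 1.0         # assumed penalty weight on the consistency gap


def step(x, k=K):
    """One step of the closed-loop model under the policy gain k."""
    return A * x + B * (k * x)


def stage_cost(x, k=K):
    """Quadratic cost on state and action at a single time step."""
    u = k * x
    return x ** 2 + u ** 2


def coupled_rollout_cost(x0, horizon):
    """Sequential rollout: step t cannot start before step t - 1 finishes."""
    x, total = x0, 0.0
    for _ in range(horizon):
        total += stage_cost(x)
        x = step(x)
    return total


def split_subproblem_loss(x_t, z_next):
    """Loss of one subproblem.

    It depends only on the copied state x_t and the copy z_next of the
    next state, so it can be evaluated without rolling out the model.
    The quadratic penalty is a stand-in for the equivalence constraint.
    """
    gap = step(x_t) - z_next
    return stage_cost(x_t) + 0.5 * RHO * gap ** 2


def split_total_loss(x_copies):
    """Sum of per-step losses; each term is independent of the others."""
    x_copies = np.asarray(x_copies, dtype=float)
    return sum(
        split_subproblem_loss(x_copies[t], x_copies[t + 1])
        for t in range(len(x_copies) - 1)
    )
```

When the copied states coincide with a true rollout, every consistency gap is zero and `split_total_loss` equals `coupled_rollout_cost` over the same horizon. When the copies disagree with the model, the penalty terms become positive. Because each term of the split loss touches only two neighbouring copies, the terms could in principle be evaluated by separate workers, which is the property the abstract relies on for concurrent optimization.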

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available from the corresponding author upon reasonable request.
