Volume 37, Issue 10, pp. 7334–7355
RESEARCH ARTICLE

An automatic learning rate decay strategy for stochastic gradient descent optimization methods in neural networks

Kang Wang, Yong Dou, Tao Sun (corresponding author), Peng Qiao, Dong Wen

National Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha, China

Correspondence: Tao Sun, National Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, 410073 Changsha, China.

Email: [email protected]
First published: 31 March 2022

Abstract

Stochastic gradient descent (SGD) optimization methods play a vital role in training neural networks and have attracted growing attention across science and engineering fields related to intelligent systems. The choice of learning rate affects the convergence rate of SGD-type methods. Current learning rate adjustment strategies mainly face two problems: (1) traditional learning rate decay methods rely on manually tuned schedules during training, and the small learning rates they produce slow the convergence of neural network training; (2) adaptive methods (e.g., Adam) tend to generalize poorly. To alleviate these issues, we propose a novel automatic learning rate decay strategy for SGD optimization methods in neural networks. Based on the observation that the upper bound of the convergence rate is minimized at each iteration with respect to the current learning rate, we first derive an expression for the current learning rate in terms of the historical learning rates; only one extra parameter needs to be initialized to generate an automatically decreasing learning rate schedule during training. We apply the proposed approach to the SGD and momentum SGD optimization algorithms and provide a theoretical proof of its convergence. Numerical experiments are conducted on the MNIST and CIFAR-10 data sets with different neural networks. The results show that our algorithm outperforms existing classical ones, achieving a faster convergence rate, better stability, and better generalization in neural network training. It also lays a foundation for large-scale parallel search of initial parameters in intelligent systems.
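For intuition, the sketch below shows a generic automatically decaying step size plugged into plain SGD on a toy stochastic objective. The inverse-time rule eta_t = eta_0 / (1 + k*t) and the names sgd_with_decay, grad_fn, and k are illustrative assumptions, not the paper's derived schedule; the abstract only states that the current learning rate is determined by the historical learning rates with a single extra parameter.

```python
import numpy as np

# Illustrative sketch only: the abstract does not give the paper's exact decay
# expression, so this uses a generic inverse-time schedule
#   eta_t = eta_0 / (1 + k * t),
# where k is the single extra hyperparameter controlling how fast the learning
# rate decays.
def sgd_with_decay(grad_fn, w0, eta0=0.1, k=0.01, n_iters=1000, rng=None):
    """Plain SGD whose learning rate decreases automatically every iteration."""
    if rng is None:
        rng = np.random.default_rng(0)
    w = np.asarray(w0, dtype=float)
    for t in range(n_iters):
        eta_t = eta0 / (1.0 + k * t)   # automatically decaying step size
        g = grad_fn(w, rng)            # stochastic gradient estimate
        w = w - eta_t * g
    return w

# Toy usage: minimize E[(w - 3)^2] from noisy gradients 2*(w - 3) + noise.
noisy_grad = lambda w, rng: 2.0 * (w - 3.0) + rng.normal(scale=0.5, size=w.shape)
print(sgd_with_decay(noisy_grad, w0=np.zeros(1)))  # converges close to 3.0
```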

CONFLICTS OF INTEREST

The authors declare no conflicts of interest.
