Volume 2021, Issue 1 5815280
Research Article
Open Access

Short-Term Traffic Prediction considering Spatial-Temporal Characteristics of Freeway Flow

Jiaqi Wang

Jiaqi Wang

Department of Transportation Engineering, South China University of Technology, 381 Wushan Road, Guangzhou 510641, China scut.edu.cn

Search for more papers by this author
Yingying Ma

Corresponding Author

Yingying Ma

Department of Transportation Engineering, South China University of Technology, 381 Wushan Road, Guangzhou 510641, China scut.edu.cn

Search for more papers by this author
Xianling Yang

Xianling Yang

Department of Transportation Engineering, South China University of Technology, 381 Wushan Road, Guangzhou 510641, China scut.edu.cn

Search for more papers by this author
Teng Li

Teng Li

Department of Transportation Engineering, South China University of Technology, 381 Wushan Road, Guangzhou 510641, China scut.edu.cn

Search for more papers by this author
Haoxi Wei

Haoxi Wei

Department of Transportation Engineering, South China University of Technology, 381 Wushan Road, Guangzhou 510641, China scut.edu.cn

Search for more papers by this author
First published: 13 October 2021
Citations: 4
Academic Editor: Feng-Jang Hwang

Abstract

This paper presents a short-term traffic prediction method, which takes the historical data of upstream points and prediction point itself and their spatial-temporal characteristics into consideration. First, the Gaussian mixture model (GMM) based on Kullback–Leibler divergence and Grey relation analysis coefficient calculated by the data in the corresponding period is proposed. It can select upstream points that have a great impact on prediction point to reduce computation and increase accuracy in the next prediction work. Second, the hybrid model constructed by long short-term memory and K-nearest neighbor (LSTM-KNN) algorithm using transformed grey wolf optimization is discussed. Parallel computing is used in this part to reduce complexity. Third, some meaningful experiments are carried out using real data with different upstream points, time steps, and prediction model structures. The results show that GMM can improve the accuracy of the multifactor models, such as the support vector machines, the KNN, and the multi-LSTM. Compared with other conventional models, the TGWO-LSTM-KNN prediction model has better accuracy and stability. Since the proposed method is able to export the prediction dataset of upstream and prediction points simultaneously, it can be applied to collaborative management and also has good potential prospects for application in freeway networks.

1. Introduction

Intelligent transportation system (ITS) has become an effective way to reduce pollution and improves the performance of freeways, while the short-term traffic flow prediction is an important part to support the smart management and control of freeways. The trend of short-term traffic flow prediction is changing from parametric statistical models to nonparametric models and mixed models. Time-series methods were widely used in parametric statistical models, including exponential smoothing [13], moving average [4, 5], and autoregressive integrated moving average (ARIMA) model [68]. Kalman filtering was also used for traffic flow prediction, such as adaptive Kalman filter [911], hybrid dual Kalman filter [12], and noise-identified Kalman filter [13]. With the rapid development of ITS and improvement of data quality, more nonparametric prediction methods are used in the prediction of traffic flow. K-Nearest Neighbor (KNN) nonparametric regression, a nonlinear prediction method, was used to calculate Euclidean distance to find the nearest neighbor for prediction [14]. The improved Bayesian combination model was proposed to increase the accuracy of prediction [15]. Support vector machines (SVM) were also used considering the weak sensitivity to outliers [16]. The combined algorithm based on wavelet packet analysis and least square support vector machines was used to resolve the uncertainty and randomness of data [17]. Particle swarm optimization (PSO) and other optimization algorithms were applied to SVM because of small model calculation and good prediction performance [18]. With the development of artificial intelligence (AI), deep learning models have been widely used in traffic prediction. Smith and Demetsky [19] used backpropagation (BP) neural network to do the prediction. Optimization algorithms such as PSO and genetic algorithm (GA) were also applied to BP, and the effect is obvious [20, 21]. Recurrent neural network (RNN) can realize long-term memory calculation and was used in prediction, but it had the problem of gradient explosion [22]. Long short-term memory (LSTM) network was proposed to solve it by using a forget gate [23, 24], which was not only used in natural language processing [25], for example, language generation [26], text classification [27], and phoneme classification [28], but also in prediction fields, such as short-term traffic flow prediction [29], housing load prediction [30], and pedestrian trajectory prediction [31]. Furthermore, improvements and combinations with other models have been proposed in many fields, from application in large-scale data problems [32] to the prediction of traffic flow, such as using GA to optimize the LSTM hyperparameters to get better performance [33]. The comparison of typical machine learning models is shown in Table 1.

Table 1. Comparison of machine learning models in short-term traffic flow prediction.
The basic model Model performance Improved models
Backpropagation neural network (BP) Advantages [34] GA-BP [21], PSO-BP [20], EEMD-IAGA-BP [35]
(1) Self-adaptive self-learning
(2) Strong nonlinear mapping ability
(3) Strong generalization ability
Disadvantages [34]
(1) Slow convergence speed down
(2) Easy to fall into the local optimal solution
  
Least-squares support vector machines (LSSVM) Advantages [36] W-LSSVM [17], PSO-SVM [18], EMD-GPSO-SVM [37]
(1) Suitable for small datasets
(2) Simple and convenient
(3) Strong nonlinear generalization ability
Disadvantages [36]
(1) Unable to handle large datasets
(2) Sensitive to missing values
  
Long short-term memory (LSTM) Advantages [38] GA-LSTM [33], SVD-PSO-LSTM [39], KNN-LSTM [40]
(1) High precision
(2) Solve the problem of RNN gradient
Disadvantages [38]
(1) Time-consuming calculation
(2) Large consumption of hardware resources
(3) Too many parameters, easy to overfit

Deep learning models are widely used in traffic flow prediction, especially in short-term prediction [41]. However, traffic flow has strong spatial-temporal characteristics on time series [42, 43]. More attention was paid to this characteristic in recent years’ research of short-term traffic flow prediction [4446]. Luo et al. [40] proposed a spatial-temporal traffic flow prediction model with KNN and LSTM to screen highly correlated upstream points and produced the prediction. Ma et al. put forward a method to select input data for daily traffic flow forecasting through contextual mining and intraday pattern recognition [47] and produced the daily traffic flow forecasting with CNN and LSTM [48]. Supervisory learning was used to mine the relationship between the factors of historical data and current traffic flow to train the predictor in advance so as to reduce the predicting time [49]. In addition, the match-then-predict method [50] and the fuzzy hybrid framework [51] with dynamic weights by mining spatial-temporal correlations were both proposed. Attention mechanisms were also combined in LSTM to increase the accuracy of prediction [52]. These methods that combine various factors using attention mechanism can reasonably allocate limited resources, increase the efficiency, and reduce computation.

In this paper, we propose the short-term traffic flow prediction model considering the spatial-temporal characteristics using LSTM and KNN under the concept of attention mechanism. First, the Gaussian mixture model (GMM) is used to select the upstream detection points to produce the prediction. Two parameters are used for the classification: one is the Kullback–Leibler Divergence (KL), also known as the relative entropy, which reflects the difference in the distribution of two datasets through approximate calculations, especially for large-sample traffic data. The other is the grey relation analysis (GRA) coefficient, which reflects the correlation between two groups of normalized data after similarity analysis. Second, the hybrid model of LSTM and KNN is proposed to produce the prediction using the selected data. LSTM is used to predict the traffic flow of upstream points as the training dataset of KNN. To solve the problem of time lag, input time of upstream data is changed in the model according to the space distance between the input point and the prediction point and the average speed of traffic flow. Moreover, transformed grey wolf optimizer (TGWO) is used to optimize key parameters, and Savitzky-Golay (SG) filter smoothing is used to reduce the noise in the model to improve the performance. The proposed TGWO-LSTM-KNN prediction model in this paper gives greater consideration to the spatial-temporal characteristic of freeway traffic flow to improve the accuracy of prediction and reduces the complexity of computation by selecting and preprocessing input data.

The rest of this paper is organized as follows. Section 2 introduces the methodology of the proposed model. Section 3 carries out the experiments and analysis of the proposed model with real-world traffic flow data. Section 4 presents the conclusions and the prospects of the research. The abbreviations used in the rest of the paper are listed in Table 2.

Table 2. Abbreviation table for algorithm required in this paper.
Name Abbreviation
Long short-term memory LSTM
K-nearest neighbor algorithm KNN
Kullback–Leibler KL
Grey relation analysis GRA
Gaussian mixture model GMM
Expectation maximization EM
Transformed grey wolf optimization TGWO
Support vector machines SVM
Backpropagation neural network BP
Root mean square error RMSE
Mean absolute error MAE
Mean absolute percentage error MAPE

2. Methodology

2.1. Framework

This paper proposes TGWO-LSTM-KNN with the GMM classification model, which includes two parts: data preparation and prediction. GMM is used to choose the input data in the data preparation part considering the spatial-temporal characteristics of freeway traffic flow, while the prediction part is composed of LSTM parallel computing module, KNN module, and TGWO module. The framework of the proposed model is shown in Figure 1.

Details are in the caption following the image
Framework of TGWO-LSTM-KNN with GMM classification.

2.1.1. Data Preparation

In the freeway network, the traffic flow of upstream correlates with the prediction point flow, which is considered as spatial correlation. Moreover, the traffic flow of the prediction point changes over time inter- and intraday, and each specific period may have different patterns, such as the morning peak hour and the off-peak hour. Therefore, time series are divided into different parts according to the flow patterns, which will help to improve the accuracy of prediction. The temporal characteristic of prediction point flow is observed through a flow chart. Then set the midpoint of two adjacent extreme points and complete the time series division task.

Related upstream sections are analyzed and selected using GMM binary classification. In this paper, two parameters are used as the classification criteria. One is the KL divergence, which is a commonly used method in information science to quantify the difference between two datasets. In a large-sample traffic dataset with complex distribution, the difference can be simply and quickly reflected. The other is the GRA coefficient, which can analyze the linear similarity between two datasets through a small amount of data. These two parameters can well reflect the correlation of upstream sections and predicted sections. The steps of the classification part are as follows:
  • Step 1: use time step and speed to determine the space range. Note the upstream points within the scope as O1 ~ Om.

  •  Step 2: divide day time into T1 ~ Tη.

  •  Step 3: construct dataset. Divide the dataset into working days and nonworking days. Change the input time of upstream data to meet the time lag of the prediction point considering the distance and travel speed.

  •  Step 4: calculate the KL divergence and the GRA coefficient in Ti of working days and nonworking days.

  •  Step 5: input the KL divergence and the GRA coefficient into GMM for binary classification. O1 ~ Om are divided into two groups in each Ti. The group of points with the KL divergence close to 0 and the GRA coefficient close to 1 is used as the strongly related section of the prediction point for the next prediction.

2.1.2. Prediction

The prediction part consists of three modules, which are LSTM module, KNN module, and TGWO module. KNN is selected at the bottom of the model considering the spatial features of freeway flow with the advantages of fast calculation speed and no lag. LSTM is used to predict the short-term traffic flow of upstream sections and then put the prediction results of upstream sections into KNN to predict the prediction point traffic flow. Because the relationship among upstream points is ignored in the model, multithread LSTM parallel computing (LSTMs) is used to reduce the time consumption of prediction. Also, to improve the performance of LSTM-KNN, TGWO is used to optimize the parameters of LSTM and KNN.
  •  Step 1: use TGWO to optimize the steps and epochs in LSTM, K value in KNN.

  •  Step 2: multithread LSTM parallel computing is used to reduce calculation by ignoring the relationship among upstream points. Each Oi is input into the corresponding LSTM module and then the output Pi set and D0 together form a new dataset.

  •  Step 3: input the dataset into the KNN module to predict the traffic flow and output .

2.2. Data Preparation

There are three steps in data preparation: determination of spatial scope, time-series division, and GMM classification.

2.2.1. Determination of Spatial Scope

Since the time step of short-term traffic flow prediction (Tstep) is usually less than one hour and the highway speed (V) is limited, the radius range of spatial scope can be calculated:
(1)

The accesses and ramps within the radius of the prediction point centered are selected as upstream points.

2.2.2. Time Series Division

There are random fluctuations in daytime traffic flow, as shown in Figure 2. SG is a method to smooth data based on local least-squares polynomial approximation proposed by Savitzky and Golay [53], which is used to get the extreme traffic value points of the prediction point (D0) (see Figure 2). Set the point in the middle of the two adjacent extreme values, and each time series between two middle points is a time part (Ti) (see Figure 3).

Details are in the caption following the image
Traffic flow of the day before SG.
Details are in the caption following the image
Time series division diagram.

2.2.3. SG Calculation Method

SG includes two parameters, which are the window length  n  and the order number k. For the window length n, with the increasing of n, the deviation between the processed data and the real data increases, and also the smoothness. For order number k, with the increasing of k, the deviation between the processed data and the real data decreases, and also the smoothness. According to the characteristics of highway traffic flow and existing research, the choice of n and k in this paper is 31 and 1, respectively.

Input upstream points data Om = (om(1), om(2), …, om(x)), where x denotes the length of data. Select the window length n and order number k. The data in a window are . The fitting polynomial is obtained using the k − 1 least square method as follows:
(2)
Then form n equations to form k element equations. If n > k, equation has a solution, then
(3)

The matrix is expressed as

A  is the least square fitting solution of different windows, and the value is as follows:
(4)

Smoothing dataset is

2.2.4. GMM Classification

Gaussian mixture model (GMM) is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters [54], which is used to judge the correlation between upstream and predicted traffic flow, while EM is used to obtain the maximum likelihood estimation of GMM [55]. In this paper, KL-divergence and GRA coefficient are two parameters to do the binary classification.

(1) KL Divergence. KL divergence [56], which is also known as relative entropy, is broadly used as the measurement of the dissimilarity between two probabilistic models [57]. Define KL divergence between Oi = (oi(1), oi(2), …, oi(n)) and D0 = (d0(1), d0(2), …, d0(n)) as
(5)

The closer KL divergence is to 0, the more similar the two distributions are.

(2) GRA Coefficient. GRA is a method to judge the similarity of different datasets. Compared with traditional Pearson correlation [58], it can use a smaller amount of data to reflect the linear similarity between traffic flows [59]. The steps of GRA are as follows.

Since there is little difference in the magnitude of traffic flow at the same point, the data are initialized by dividing the initial value of the flow oi(1) and d0(1).
(6)
where oi(k)  denotes traffic flow of upstream point O1 in the k time and do(k) denotes traffic flow of prediction point  D0 in the k time.

Define the prediction sequence (d0(1)GRA, d0(2)GRA, …, d0(n)GRA) as DGRA and the factor sequence (oi(1)GRA, oi(2)GRA, …, oi(n)GRA)  as. OiGRA

Then calculate the relation coefficient in the k time
(7)
where ξ denotes the coefficient to control the degree of differentiation, which is generally 0.5 [58].
Define the mean value as GRA coefficient of D0  and Oi:
(8)
(3) GMM Classification. Let , . GMM is classified by calculating probability. Two-dimensional Gaussian mixture model is as follows:
(9)
where μk denotes expectation, Σk denotes covariance, n denotes data dimension, ℛ(x|μk, Σk) denotes k component in a hybrid model, ϑk denotes mixture coefficient and
(10)

Then use EM algorithm to calculate ϑ1, μ1, Σ1, ϑ2, μ2, Σ2.

EM-GMM pseudocode is as follows: (Algorithm 1)

    Algorithm 1: EM-GMM pseudocode.
  • function EM − GMM(k⟵2,  ϑ1, μ1, Σ1, ϑ2, μ2, Σ2, i⟵0)

  • doii + 1:

  • calculatep(x);

  • Estep:

  •   

  •   

  •   

  • until  ln  pi+1(x|ϑ, μ, Σ) − ln  pi(x|ϑ, μ, Σ) < T;

  • returnϑ1, μ1, Σ1, ϑ2, μ2, Σ2;

  • endfunction

2.3. TGWO-LSTM-KNN Hybrid Model

TGWO-LSTM-KNN hybrid model consists of LSTMs module, KNN module, and TGWO module. The specific process is shown in Figure 4.

Details are in the caption following the image
TGWO-LSTM-KNN with GMM classification flowchart.

The TGWO-LSTM-KNN model first splits the dataset. The traffic flow of  O1 ~ Om  is trained in the LSTMs module to predict future traffic flow P1 ~ Pm  in parallel. Then use traffic flow dataset consisting of  O1 ~ Om, P1 ~ Pm, and D0 to predict in the KNN module. The step and epochs of LSTM are the parameters with great influence [25], and the coefficient K in the KNN module is also one of the key parameters. Then use the TGWO module to optimize these parameters.

2.3.1. LSTMs Module

LSTM is a special RNN [24] with a forgetting gate. The sigmoid function is used to prevent gradient explosion and disappearance. The traffic flow data of upstream points are trained in different LSTM threads in parallel, which is defined as LSTMs. The data input time is also changed to reduce the lag of the module, which makes LSTM more accurate. The memory unit of the module is shown in Figure 5.

Details are in the caption following the image
LSTM memory unit [24].
The calculation processes of the memory unit om(n) and parameters (see Table 3) are as follows:
(11)
Table 3. LSTM parameters interpretation.
Symbol Meaning
The updated state of a memory cell
in, fn, pn, cn, hn Input gate, forgetting gate, output gate, memory cell, hidden layer output value
woc, wch The weight of om(n) and the hidden layer
om(n) The input of the n.th data at the m.th  upstream point
cn−1, hn−1 The output of the memory cell and the hidden layer at n − 1
woi, whi, wci The weight of the input gate of om(n), the hidden layer, and the memory cell
wof, whf, wcf The weight of the forgetting gate of om(n), the hidden layer, and the memory cell
wop, whp, wcp The weight of the output gate of om(n), the hidden layer, and the memory cell
bc, bi, bf, bp Offset item
tanh and σ Activation functions
″∘″ Dot product

2.3.2. KNN Module

In this paper, the KNN model is used to predict by using data of  O1 ~ Om, P1 ~ Pm, and D0. This method not only has high efficiency and less complexity but also can meet the needs of multifactor. Euclidean distance is calculated as follows [14]:
(12)
where dn denotes the Euclidean distance between Pi  in the current time and the  O(x)  vector x time, Pi denotes the current traffic flow vector of different upstream points (p1, p2, …, pm) in the prediction dataset, pi denotes the current value of the ith upstream point in Pi, O(x) denotes x time traffic flow vector of different upstream points (o1(x), o2(x), …, om(x)), oi(x) denotes the value of ith upstream point in O(x), and m denotes the number of upstream points.
To sort dx in ascending order, define K smallest O(x) as O(x1), O(x2), …, O(xk) The corresponding values of its prediction point are d0(x1), d0(x2), …, d0(xk). Predict with weighted method and define prediction set of at different time as .
(13)
where wi(xn) denotes weight and denotes prediction. LSTM-KNN pseudocode is shown in Algorithm 2.
    Algorithm 2: LSTM-KNN pseudocode.
  • function LSTM_KNN(step1, …, stepn, epochs1, …, epochsn, k):

  •  transform dataset into MinMaxScaler(featurerange = (0,  1));

  • spilt dataset[′On′]into On trainand On test;

  •  LSTMs − step:

  • thread1 : look_back⟵step1; reshape O1trainand O1testinto LSTMdatasetwith look_back;

  •   model1.add(LSTM(nunit));

  •   model1.dense(1);

  •   model1.fit(O1train, epochs1, batchsize⟵number of days);

  •   O1prediction⟵model1.predict(O1test);

  • returnO1prediction;

  •  ⋮

  • threadn : look_back⟵stepn; reshape On train and On test  into LSTMdataset with look_back;

  •   modeln.add(LSTM(nunit));

  •   modeln.dense(1);

  •   modeln.fit(On train, epochsn, batchsize⟵number of days);

  •   O1prediction⟵model1.predict(O1test);

  • returnOn prediction;

  • KNNstep:

  •   knntest⟵stack O1prediction, …, On prediction in sequence horizontally;

  •   knntrain⟵dataset;

  •   knn.fit(knntrain[′On′], knntrain[′D0′]);

  •   knnprediction⟵knn.predict(knntest, k);

  • restore knnprediction;

  • return knnprediction;

  • endfunction

The number of units in the hidden layer is nunit, and the data dimension D of LSTM in each thread is 1, so the time complexity is . On account of the parallel calculation structure, the total time does not change too much with the increasing number of threads. Compared with the original complexity , which reduces a lot of computation and improves efficiency. The time complexity of KNN is O(n), which is only related to the number of data, so the calculation speed is fast.

LSTM has good robustness in traffic prediction [60] and also improves performance by setting reasonable time lag [61] or forgetting layer. KNN can avoid large deviation directly because it calculates the closest Euclidean distance to produce the prediction [62]. These two positive aspects of robustness will support the proposed method to gain more adaptability on data fluctuations and environmental change.

2.3.3. TGWO Module

Grey wolves algorithm was put forward by scholars Mir Jalili Australia in 2014, the Grey wolf groups according to social relations are divided into four grades. Each wolf represents a candidate solution, while the most optimal solution is αT, the suboptimal solution is βT, the third optimal solution is δT, and the last is ωT. In each iteration, the three optimum solutions as αT,   βT,   δT determine the prey position and direct the ωT to update the position around it [63].

By using the improved adaptive convergence factor [64], the extremum can be quickly found when the step size is large in a global search. Besides, the extremum can be prevented from missing when the step size becomes smaller in the local search. The weight step size formula [64] adds the weight decreasing strategy, which can reduce unnecessary iterative processes and improve efficiency. The calculation method is as follows.

(1) Initialize the Population. The upper bound Ub  and lower bound Lb are defined, respectively. The number of wolves is N. The dimensions are S. MN×S denotes an N × S two-dimensional matrix, which is the searching field. There are 2m + 1 key parameters to be optimized in each element of the field array, which are step, epochs in the LSTMs module, and the K value in the KNN module. Generate integers randomly at their respective upper and lower bounds to form an element of field array:
(14)
Each parameter has a different Ub and Lb. Set up two vectors to record the bounds.
(15)

(2) Calculate Fitness. Input the element corresponding to each wolf into LSTM-KNN and compare the error. Define the optimal solution as αT,   βT,   δT, respectively.

(3) Update location with aT,  AT,  CT:
(16)
where D denotes the distance between the grey wolves and their prey, t denotes current iteration times, WP(t) denotes the position of the grey wolf in t iteration times, W(t) denotes the position of the prey in t iteration times, A  and C denote the coefficient vectors, r1 and r2 denote random coefficients with scalars between 0 and 1, generally take 0.5, a denotes the convergence factor, φ  denotes the weight of inertia, φmax denotes the maximum weight of inertia, generally take 0.9, and φmin denotes the minimum weight of inertia, generally take 0.4.
(4) Complete the Iteration and Output the Result. The optimal solution αT is denoted as Mbest. The global optimal solution is as follows:
(17)

3. Experiments and Analysis

3.1. Experimental Data

Whitemud Drive is an in-city highway across Edmonton, Alberta, Canada. It is 28 kilometers long with a basic speed limit of 80 kilometers per hour. As a test road, Whitemud Drive is equipped with seven traffic video cameras and seven loop detectors (VDS1017, VDS1037, VDS1034, VDS1031, VDS1029, VDS1027, and VDS1019) from west to east on the main road and gate road to observe the vehicle flow, the vehicle speed, and the vehicle density. In this paper, data of 15 working days are used as historical data for experiments.

3.2. GMM Selecting Test

VDS1019 is set as the prediction point  D0, and the change of traffic flow within one day of the working day is plotted according to 5 mins (see Figure 6). To better carry out the time-division work, the data are smoothed by SG, and the image is reconstructed (see Figure 7).

Details are in the caption following the image
VDS1019 traffic flow before SG filter.
Details are in the caption following the image
VDS1019 traffic flow after SG filter.

Find the extreme values, set up the midpoints, and divide the time series (see Figure 8).

Details are in the caption following the image
VDS1019 time-division diagram.

The time-division results are shown in Table 4.

Table 4. Time-division results.
Time part name Number of data Time
T1 0–70 0 : 00–7 : 00
T2 71–110 7 : 00–9 : 00
T3 111–155 9 : 00–13 : 00
T4 156–249 13 : 00–20 : 00

3.2.1. Reconstructing the Dataset by Time-Division

VDS1017, VDS1037, VDS1034, VDS1031, VDS1029, and VDS1027 are recorded as  O1 ~ O6, their historical data on working days are divided into parts according to T1 ∼ T4, and the data in the same time part are put into the same column. Since the length of the road is 28 km and the speed limit is 80 km/h, we choose 60 km/h as the test speed. It takes 30 mins to go from VDS1017 to VDS1019. Considering the continuity of the road network, the vehicle passes through each point every 5 mins, so O2 ~ O6 is delayed by 5–25 mins for input. And the prediction point  D0 is delayed by 30 mins for input. In this way, the data of each day are the delayed input to form a dataset.

Calculate KL divergence and GRA coefficient (see Table 5).

Table 5. Calculation results of KL divergence and GRA coefficient.
KL/GRA result T1 T2 T3 T4
KL GRA KL GRA KL GRA KL GRA
O1 0.030 0.90 0.023 0.78 0.020 0.86 0.010 0.87
O2 0.029 0.90 0.009 0.81 0.017 0.88 0.012 0.86
O3 0.038 0.89 0.021 0.78 0.031 0.85 0.035 0.82
O4 0.035 0.90 0.009 0.81 0.018 0.88 0.011 0.85
O5 0.027 0.91 0.008 0.83 0.016 0.89 0.009 0.87
O6 0.008 0.95 0.005 0.88 0.004 0.93 0.004 0.92
D 0 1 0 1 0 1 0 1

Input the result table into the GMM module for classification.

Taking T2 as an example, the GMM classification results are shown in Figure 9.

Details are in the caption following the image
GMM for T2 dataset classification result.

The classification results are two types. The closer the GRA coefficient is to 1, the better the correlation is, and the closer the KL divergence is to 0, the more similar the distribution is. So, choose the yellow mark points as the input points (see Figure 9). The following is the final classification results of four datasets (see Table 6).

Table 6. Classification results of GMM.
GMM result T1 T2 T3 T4
D point O6 O2, O4, O5, O6 O2, O4, O5, O6 O1, O2, O4, O5, O6

The upstream points selected between T2 and T3 are the same, so T2 and T3 can be regarded as the same time part, which is 7 : 00–13 : 00, and the dataset T2+3 can be reconstructed. The next experimental data are T2+3.

3.3. Comparative Experiments at Different Upstream Points

Different upstream points are selected for model prediction and mean absolute percentage error (MAPE) comparison so as to verify the effect after classification. Results are shown in Table 7.

Table 7. MAPE comparison table of different points.
Model/points V34 V34 + V31 V34 + V31 + V29 V34 + V31 + V29 + V27 Including abandonment points
LSTM-KNN 14.45 14.96 14.31 12.19 12.46
Linear-SVM 20.47 21.14 20.03 21.17 21.24
Poly-SVM 15.85 17.30 17.15 17.65 18.04
RBF-SVM 20.51 20.29 20.87 20.17 20.48
Multi-LSTM 14.32 14.51 11.96 9.51 23.14

In this dataset, it can be seen that the abandoned upstream points have little difference from the selected points (see Table 5). So it is not obvious in the accuracy improvement, which is reasonable. If in datasets with a large difference, the meaning of GMM classification operation will be reflected more.

3.4. TGWO-LSTM-KNN Experiments

3.4.1. LSTM-KNN Structural Test

Considering the operation time and efficiency, this paper constructs the following structures for testing. Since the training data are 15 days’ traffic flow, the batch size is set to 15, and the data are divided into 15 groups, which will ignore the data relationship between each group and reduce the risk of overfitting. If the batch size is too small, it is not conducive to the training model and is easy to overfitting. The test step is 3, epochs are 100, and the comparison results are shown in Table 8.

Table 8. Error comparison table of different structures.
Structure RMSE MAE MAPE
Double-layer structure 42.49 29.81 11.64
LSTM (256 units) +
LSTM (128 units)
  
Four-layer structure 44.87 31.33 12.25
LSTM (256 units) +
Dropout (0.2) +
LSTM (128 units) +
Dropout (0.2)
  
Single-layer structure 44.42 31.55 12.32
LSTM (256 units)
  
Single-layer structure 44.43 31.29 12.19
LSTM (128 units)

The results show that increasing the number of LSTM layers can improve the prediction accuracy, but it increases a lot of computing time and overfits easily. Increasing the forgetting layer (the forgetting rate is 0.2) will reduce the accuracy. In the first-layer structure, there is only a small difference between 256 units and 128 units; therefore, the single-layer 128 units structure is selected for prediction in the following LSTM.

3.4.2. LSTM-KNN Time Steps Test

The model is tested under different time steps (5 mins, 10 mins, 15 mins, and 30 mins). The result shows that the model has good accuracy and stability (see Figure 10). The overall prediction accuracy shows a downward trend, and the absolute error shows an upward trend. It is worth noting that when the time step is 10 mins, the trend of accuracy will change and the error is lower than 15 mins. In general, even if the time step is different, the model still has good performance for the prediction accuracy.

Details are in the caption following the image
Different error (MAPE, MAE, RMSE) comparisons of different time steps.

3.4.3. TGWO Optimization Test

The dataset has four upstream points, and the key parameters to be optimized are the steps of O2, O4, O5, O6, the training epochs in the LSTM module, and the K in the KNN module. So, M[i, j] = [step2, step4, step5, step6, epochs2, epochs4, epochs5, epochs6, k]. Set the upper and lower bounds: the step size ranges from 3 to 20. Training time ranges from 100 to 200. The coefficient K ranges from 5 to 50, the number of iterations t is 50, the number of wolves is N = 5, and the dimension S is 30. The training results are shown in Figure 11.

Details are in the caption following the image
TGWO iterative convergence diagram.

When the MAPE is 10.04 (green mark in Figure 11), it will reach the optimal solution Mbest = [5,6,5,6,120,100, 120,100,25]. The steps of O2, O5  and O4, O6 are 5 and 6, the training epochs are 120 and 100, and K is 25.

3.4.4. Comparison of TGWO-LSTM-KNN and Other Models

TGWO-LSTM-KNN is compared with SVM, LSTM, and BP. The parameters of TGWO-LSTM-KNN are the optimal solution. The step of LSTM-KNN is 3, K is 16, and the epochs is 100. LSTM has 256 units in the first layer and 128 units in the second layer of the double-hidden layer structure, with the step of 3 and epochs of 100. SVM uses three modes to fit, linear and poly, RBF. BP neural network is a three-layer structure, two fully connected layers, the middle increased the forgetting layer (rate is 0.2), and the epochs is 100. Results are shown in Figure 12.

Details are in the caption following the image
Model comparison diagram.

The accuracy of LSTM-KNN can reach the level of the popular model. The accuracy of TGWO-LSTM-KNN can be improved by 15.27% compared with single-LSTM, 9.47% compared with BP, and 43.12% compared with poly-SVM (see Table 9). Besides, the advantage of the hybrid model is not accuracy, but being able to output the prediction set of upstream points for collaborative management.

Table 9. Error comparison table of different models.
Model RMSE MAE MAPE
TGWO-LSTM-KNN 29.70 23.17 10.04
LSTM-KNN 44.43 31.29 12.19
Single-LSTM 36.75 27.88 11.85
Multi-LSTM 33.32 24.52 9.51
BP 37.59 27.59 11.09
Linear-SVM 38.78 27.03 21.17
Poly-SVM 45.36 33.87 17.65
RBF-SVM 40.62 28.29 20.17

4. Conclusions

In this paper, the TGWO-LSTM-KNN prediction model with GMM classification considering spatial-temporal characteristics under the concept of attention mechanism is proposed. The time series is divided into parts by using the temporal characteristic of the prediction point. And GMM through KL and GRA is used for further classification. The upstream points with a small difference in distribution and high linear similarity are selected to increase the accuracy and reduce the complexity. Then the hybrid model TGWO-LSTM-KNN is used to train and predict. Parallel computing is used in the LSTM module to improve efficiency.

GMM as an unsupervised model can be very flexible to classify. This model can be applied to the multifactor model to reduce complexity. KNN, as the next part of LSTM, fully combines upstream points data and prediction points data for prediction. Compared with SVM, KNN has the characteristics of fast speed and greater data processing ability, which is more suitable for multiple factors and complex data of freeway. LSTMs module can ignore the relationship between upstream points to perform the parallel computation. It can reduce the operation time and make the model more practical. TGWO is less likely to fall into local optimal solution and also has fast speed and good performance. To sum up, TGWO-LSTM-KNN with GMM classification can be better used in real freeways with complex data and multifactor with high accuracy, fast calculation speed, and strong adaptability. It can be applied in the real freeway to achieve the purpose of collaborative management.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors want to thank Tianjiao Wang, Xi Chen, Xiang Wang, and Lingbin Kong for their help in the study. Thanks also should be given to the OpenITS platform (http://www.openits.cn/). The research and publication of this article were funded by the National Natural Science Foundation of China (52072129).

    Data Availability

    The data used to support the findings of this study are openly available in the OpenITS platform for noncommercial purposes only and the website is https://www.openits.cn/openData1/700.jhtml.

      The full text of this article hosted at iucr.org is unavailable due to technical difficulties.