Volume 7, Issue 2 e70031

RESEARCH ARTICLE

Open Access

Deep Aggregation seq2seq Network With Time Feature Fusion for Air Pollutant Concentration Prediction in Smart Cities

Yunzhu Liu,

Corresponding Author

Yunzhu Liu

[email protected]

orcid.org/0009-0002-1638-8057

Nanchang Institute of Technology, Nanchang, China

Correspondence: Yunzhu Liu ([email protected])

Contribution: Conceptualization, Investigation, Writing - original draft, Methodology, Validation, Writing - review & editing, Visualization, Software, Formal analysis, Project administration, Data curation, Resources, Supervision


First published: 13 February 2025

https://doi.org/10.1002/eng2.70031

ABSTRACT

Air pollution poses significant risks to environmental quality and public health, so precise forecasting of air pollutant concentrations is crucial for safeguarding the public. The emission and diffusion of air pollutants are dynamic processes that change over time and show strong seasonal characteristics. Leveraging time attributes such as month, day of the month, and hour can therefore improve the precision and dependability of forecasting models. This paper proposes a deep aggregation seq2seq network with time feature fusion for air pollutant concentration prediction. The network first integrates temporal feature encodings with historical air pollutant concentration data through a cross-attention network, and then mines hidden features through a deep aggregation seq2seq network. The encoder extracts the temporal correlation of the fused features, while the decoder generates future predictions through recursive aggregation, so that the predicted values exploit both the local features and the overall recursion of historical information, improving prediction accuracy. We conduct simulations on real PM2.5 and SO2 datasets from the Changping and Shunyi districts of Beijing. The results show that our model reduces the Mean Absolute Error by 5% to 10% compared to existing state-of-the-art models.

1 Introduction

Air pollution not only threatens environmental quality but also has profound effects on human health. According to the World Health Organization, more than 7 million people die each year from diseases attributed to air pollution [4-6]. As a result, air pollution has emerged as an urgent issue concerning both the environment and public health. Numerous countries and regions have implemented a range of policies to address this challenge. For instance, the European Union has established stringent air quality standards aimed at reducing the presence of harmful substances in the air [1]. In Asia, nations such as China and India are actively advancing clean air action plans to enhance air quality and safeguard public health [2]. The study by Danek et al. offers an in-depth analysis of air pollution policies in Krakow, which are designed to decrease pollutants by curbing industrial emissions and promoting green transportation [3].

Moreover, air pollution damages ecosystems, affects the growth of crops, and exacerbates climate change [7]. Therefore, developing effective algorithms for predicting the concentration of air pollutants is crucial for formulating scientific pollution control strategies and safeguarding public ecological and environmental rights.

Early studies simulated the transformation of pollutants in the atmosphere based on atmospheric dynamics to predict concentrations. For example, Carruthers et al. [8] used the ADMS model to predict the concentration of air pollutants in the UK; the predicted values for SO2 and NOX were consistent with the actual values, but the prediction accuracy for other pollutants was not high. Afzali et al. [9] combined the WRF weather forecast model and the AERMOD air quality model to simulate the spatial changes of different air pollutants at multiple industrial sites in Malaysia, and the evaluation showed that the results were consistent with the actual values. Holnicki et al. [10] evaluated urban-scale air quality prediction using the CALPUFF model and demonstrated that the model performs much better in long-term prediction than in short-term prediction. Shahbazi et al. [11] used the WRF-CAMx model to predict the concentration of air pollutants in Tehran, using regression to interpolate missing values, which is advantageous on datasets with a large proportion of missing values. Nabavi et al. [12] proposed a WRF-Chem model based on the WASF source function to simulate dust concentration, but uncertainty remains. Karaca et al. [13] used USEPA's IAQX to simulate concentrations related to several household cooking activities, accurately capturing the trend of PM2.5 concentration changes. These prediction models based on atmospheric dynamics simulation need to combine multiple input characteristics such as terrain and pollution sources, which makes complete data difficult to collect and limits the models to specific regions. Each new region requires reanalysis and simulation, which makes research and development difficult.

With the continuous improvement of monitoring technology, air pollutant concentration data are becoming ever easier to obtain. More and more scholars therefore use statistical models to predict air pollutant concentrations, such as the ARIMA model [14-17], SVM model [18-21], Decision Tree [22], RF model [23], and BPNN model [24-26]. However, these models have shallow structures with limited ability to represent nonlinear data, and they struggle to fully mine the temporal correlation of historical information, so their prediction accuracy still falls short of expectations [27, 28].

Deep Neural Networks (DNNs) have a strong learning ability on rich data, and a variety of deep learning-based air pollution prediction solutions have been proposed, achieving better prediction accuracy than shallow statistical methods in many cases. For example, Lu et al. [29] used 3D flattened variational methods to obtain air quality assimilation data and combined them with a Long Short-Term Memory (LSTM) network to mine features, improving the predictive performance for PM2.5. Huang et al. [30] trained a gated recurrent unit (GRU) neural network on stationary subsequences of PM2.5 concentration obtained by Empirical Mode Decomposition (EMD), which improved the prediction accuracy by about 40% compared to a single GRU model. Ma et al. [31] proposed an LSTM network with delay layers (Lag-LSTM) for multivariate prediction of air quality and used Bayesian methods to optimize the parameters of the network. Zhang et al. [32] used an encoder-decoder prediction model to predict pollutant concentrations and demonstrated the superiority of this structure. Tu et al. [33] designed an autoencoder network with an attention mechanism to mine the long-term trend of air pollution, and added a time decay factor to the attention module, which improved the adaptability of the model. Ma et al. [34] combined transfer learning theory with bidirectional LSTM (TLS-BLSTM) for air quality prediction with missing data; transferring a trained site model to a new site effectively improves air quality prediction at the new site. Samal et al. [35] designed a Multi-directional Time Convolutional Artificial Neural Network (MATCN) that performs feature learning and sequence modeling simultaneously, saving a significant amount of computation time. Smith et al. [36] proposed a heuristic-based multiscale depth-wise separable adaptive TCN for real-time air quality prediction, achieving state-of-the-art performance on multiple datasets. Similarly, Wang et al. [37] developed a hybrid model that combines feature selection with TCNs to predict air pollutant concentrations, highlighting the model's ability to capture complex temporal patterns. Wu et al. [38] analyzed the periodicity of air pollutants through autocorrelation and used LSTM training to learn their intrinsic temporal correlation, achieving high fitting performance for different pollutants. Luo et al. [39] combined ARIMA with LSTM to explore the nonlinear information of air pollutant concentration, and optimized the LSTM using the whale algorithm to obtain its optimal hyperparameters. Dalal et al. [40] introduced a hybrid model that integrates Particle Swarm Optimization (PSO) with LSTM networks to enhance air quality forecasting; this approach has been shown to improve the model's predictive accuracy and computational efficiency. Wang et al. [40] decomposed the atmospheric pollutant sequence twice using empirical mode decomposition and variational mode decomposition, and then processed the decomposed sequences using multi-layer perceptrons and gated recurrent units, verifying that the model with decomposition performs better. Ding et al. [37] combined weighted random forest and LSTM (RF-LSTM) to predict PM2.5 concentration, which generalizes better than simple LSTM models. Ma et al. [41] used a two-stage attention model to mine the historical temporal correlation of air pollutant concentrations at source domain sites, and introduced the learned information into the air pollutant concentration prediction task at new sites through transfer learning, effectively improving prediction accuracy. Liu et al. [42] designed an attention-based LSTM model with ensemble empirical mode decomposition (EEMD-LSTM) for PM2.5 concentration prediction; EEMD effectively reduces the nonlinear complexity of historical data and significantly improves the stability of the model. Muley et al. [43] integrated two bidirectional LSTM models for air quality prediction through XGBoost, which mitigated the tendency of a single model to overfit. Dalal et al. [44] integrated the Curiosity-based Motivation method with LSTM for air pollution prediction, achieving good predictive performance. Smith et al. [40] developed a heuristic-based multiscale depth-wise separable adaptive temporal convolutional network for predicting ambient air quality in real time, offering innovative insights into the field of air pollution prediction.

To clarify the characteristics and limitations of the various models for predicting air pollutant concentrations, Table 1 compares the strengths and weaknesses of different air pollution forecasting models.

TABLE 1. Advantages and disadvantages of different models.

Model	Advantages	Disadvantages
ARIMA	Suitable for linear time series analysis, easy to implement and interpret	Struggles to capture nonlinear relationships, sensitive to missing data
SVM	Performs well in small sample sizes, handles high-dimensional data effectively	Sensitive to parameter selection, high computational complexity
BPNN	Appropriate for simple nonlinear problems, easy to implement	Shallow structure, difficulty in mining deep temporal correlations
RF	Capable of handling large amounts of data, not sensitive to outliers	Poor model interpretability, complex parameter tuning
Lag-LSTM	Accounts for time lag effects, suitable for time series data	May experience gradient disappearance with long sequence data
EMD-GRU	Combines empirical mode decomposition to enhance feature extraction, suitable for nonlinear and non-stationary data	Model structure is complex, high training costs
TLS-BLSTM	Combines bidirectional LSTM to capture forward and backward dependencies, applicable for sequence prediction	High computational resource consumption, may have performance issues with very long sequences
MTCAN	Simultaneously performs feature learning and sequence modeling, reduces computation time	Limited generalization ability for nonlinear and complex time series data
RF-LSTM	Combines random forest and LSTM to improve generalization, suitable for complex data	Model training and parameter tuning are complex, high computational costs
EEMD-LSTM	Combines ensemble empirical mode decomposition to reduce nonlinear complexity, improves model stability	Adaptability to new data may be poor
MATCN	Simultaneously performs feature learning and sequence modeling, reduces computation time	Limited generalization ability for unconventional time series data
Integrated Dual LSTM	Combines two bidirectional LSTMs to enhance prediction accuracy, suitable for complex time series data	Model structure is complex, requiring extensive parameter tuning
PSO-LSTM	Combines particle swarm optimization to improve LSTM parameter optimization, suitable for dynamically changing environments	Optimization process may take a long time, may be overly complex for some problems

Although the above air pollutant concentration prediction algorithms have achieved certain results, two challenges remain. The first is how to exploit the information that time features carry about air pollutant concentration; the second is how to use the local and global semantic information of historical features simultaneously to improve prediction accuracy. To address these challenges, this study designs a deep aggregation seq2seq network that incorporates temporal features to improve the precision of air pollutant concentration forecasting. The main contributions are as follows:

First, as the time feature is integrated into the historical pollution concentration data through the form of cross-attention, the influence of time feature on the predicted value can be mined more effectively.

Second, the deep aggregation seq2seq network structure is designed to explore the temporal correlation of fusion features. Each node in the structure can effectively mine local time correlation through gated linear unit, and then obtain global time correlation through recursion and deep aggregation, which improves the predictive performance of the model.

Finally, the method is compared on the actual data sets of two areas in Beijing. The simulation results show that the prediction accuracy of this model is higher than that of the existing advanced models.

In summary, we have introduced a deep aggregated seq2seq network that integrates temporal feature fusion for the prediction of air pollutant concentrations in smart cities. Compared to existing research, our approach not only incorporates time features but also leverages local and global temporal correlations within historical data in tandem. Moreover, our method represents a significant advancement in handling nonlinearities and time-series data, an area that has not been thoroughly explored in prior studies.

The structure of this paper is organized as follows: The Introduction section provides a background on air pollution and its impact on public health and environmental quality, emphasizing the importance of accurate air pollutant concentration prediction. The Methodology section details the proposed deep aggregation seq2seq network model, including the feature fusion and deep feature extraction processes. The Experimental Analysis section presents the performance of our model on real datasets from Beijing's Changping and Shunyi districts, comparing its accuracy with state-of-the-art models. Finally, the Conclusion summarizes the key findings and discusses the implications of our model for smart city air quality management. This paper aims to provide a comprehensive guide for readers interested in the advancement of air pollutant concentration prediction models.

2 Methods

The air pollutant concentration prediction problem is defined as follows: learn a complex mapping relationship $F$ from historical data and time features, and then use this mapping to calculate the air pollutant concentration over the next period:

\left[{x}_{t+1},{x}_{t+2},\cdots, {x}_{t+{\tau}_1}\right]=F\left\{\left[\left({x}_t,{\phi}_t\right),\left({x}_{t-1},{\phi}_{t-1}\right),\cdots, \left({x}_{t-{\tau}_2+1},{\phi}_{t-{\tau}_2+1}\right)\right]\right\}

(1)

where ${x}_t$ and ${\phi}_t$ are the air pollutant concentration and the time feature at time $t$, ${\tau}_1$ is the size of the prediction horizon, and ${\tau}_2$ is the size of the historical time window.
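As a rough illustration of how Equation (1) turns a series into training samples, the sketch below slices a one-dimensional series into (history, target) pairs. The function name `make_windows` and the chronological ordering of each history window are assumptions made for illustration; the paper does not specify how samples are built.

```python
def make_windows(series, tau2, tau1):
    """Split a series into (history, target) pairs in the sense of Eq. (1).

    series: list of observations; each item may be a concentration or a
            (concentration, time_feature) tuple.
    tau2:   length of the historical time window.
    tau1:   length of the prediction horizon.
    """
    pairs = []
    # t is the index of the most recent observation in each history window.
    for t in range(tau2 - 1, len(series) - tau1):
        history = series[t - tau2 + 1 : t + 1]  # x_{t-tau2+1}, ..., x_t (oldest first)
        target = series[t + 1 : t + 1 + tau1]   # x_{t+1}, ..., x_{t+tau1}
        pairs.append((history, target))
    return pairs
```

With a 12-step window and a 1-step horizon, a series of 15 points yields three samples.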

In this paper, a deep aggregation seq2seq network with time feature fusion is designed as the mapping relationship for air pollutant concentration prediction. As shown in Figure 1, it mainly consists of two parts: feature fusion and feature extraction. Feature fusion includes multi-head cross-attention and layer normalization modules, and feature extraction includes patching and deep aggregation seq2seq network modules. The deep aggregation seq2seq network was selected based on criteria such as predictive accuracy, computational efficiency, and capacity to handle time-series data. The network excels at processing sequential data, particularly at capturing long-term dependencies. Furthermore, by incorporating temporal feature fusion and deep aggregation mechanisms, the model can forecast future pollutant concentrations more precisely.

FIGURE 1. The structure of the prediction model.

2.1 Feature Fusion

Temporal feature encoding is selected for its ability to capture the seasonal and cyclical patterns inherent in air pollutant data. Time attributes such as month, day of the month, and hour are critical in understanding the dynamics of air pollutant concentrations, which vary significantly across different timescales.

This article sets the time feature embedding of each time point as a 1 × 3 sequence $E=\left[{ME}_i,{DE}_j,{HE}_k\right]$, where ${ME}_i$ is the month embedding value of the $i$-th month of a year, ${DE}_j$ is the day embedding value of the $j$-th day of a month, and ${HE}_k$ is the hour embedding value of the $k$-th hour of a day. They are calculated as follows:

\left\{\begin{array}{l}{ME}_i=\frac{i}{11}-0.5,i\in \left[0,1,\cdots, 11\right]\\ {}{DE}_j=\frac{j}{30}-0.5,j\in \left[0,1,\cdots, 30\right]\\ {}{HE}_k=\frac{k}{23}-0.5,k\in \left[0,1,\cdots, 23\right]\end{array}\right.

(2)
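The encoding in Equation (2) can be sketched in a few lines. The function name `time_embedding` is hypothetical, and mapping calendar months and days to zero-based indices (January to 0, the first day of a month to 0) is an assumption consistent with the index ranges given in the equation.

```python
from datetime import datetime


def time_embedding(ts):
    """Return [ME, DE, HE] for a timestamp, each scaled to [-0.5, 0.5]."""
    i = ts.month - 1  # 0..11, assumed zero-based month index
    j = ts.day - 1    # 0..30, assumed zero-based day-of-month index
    k = ts.hour       # 0..23
    return [i / 11 - 0.5, j / 30 - 0.5, k / 23 - 0.5]
```

Midnight on 1 January maps to the lower bound of every component, and 23:00 on 31 December maps to the upper bound.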

After the time feature of each time point is obtained, it is integrated into the historical pollutant concentration data by means of multi-head cross-attention, as shown in Figure 2. Assume that $X\in {R}^{1\times {\tau}_2}$ is the feature of historical pollutant concentration and $E\in {R}^{3\times {\tau}_2}$ is the feature of historical time embedding. First, the query feature $Q$ of the pollutant concentration data is generated through a linear layer, and the key feature $K$ and value feature $V$ of the time feature embedding are obtained through two further linear layers:

\left\{\begin{array}{l}Q={W}_qX\\ {}K={W}_kE\\ {}V={W}_vE\end{array}\right.

(3)

where ${W}_q$, ${W}_k$, and ${W}_v$ are the linear layer weights for generating the query, key, and value features.

Then, the attention feature of time feature embedding is generated by cross-attention:

A=\mathrm{Softmax}\left(\frac{{QK}^T}{\sqrt{d_k}}\right)V

(4)

where ${d}_k$ is the first dimension of the key matrix.

Finally, the query matrix of pollutant concentration data and the attention feature of time feature embedding are added to obtain the fusion feature of each head, and the features of each head are fused in an average operation:

H=\frac{1}{N}\sum \limits_{i=1}^N\left({A}_i+{Q}_i\right)

(5)

where $H$ is the fusion feature of multi-head cross-attention and $N$ is the number of heads.
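The fusion step in Equations (3) to (5) can be illustrated with a small pure-Python sketch. It assumes the standard scaled dot-product form, with the scaling by $\sqrt{d_k}$ applied inside the softmax, and it follows the matrix shapes as literally written ($X$ is $1\times\tau_2$, $E$ is $3\times\tau_2$, so the attention map is $d\times d$). All function names are hypothetical, biases and layer normalization are omitted, and this is not the author's implementation.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Apply a numerically stable softmax to every row of a matrix."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention_head(x, e, w_q, w_k, w_v):
    """One head: Q from concentrations, K and V from time embeddings."""
    q = matmul(w_q, x)                    # Eq. (3), shape (d, tau2)
    k = matmul(w_k, e)                    # shape (d, tau2)
    v = matmul(w_v, e)                    # shape (d, tau2)
    d_k = len(k)                          # first dimension of the key matrix
    k_t = [list(col) for col in zip(*k)]  # transpose, shape (tau2, d)
    scores = matmul(q, k_t)               # shape (d, d)
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    a = matmul(softmax_rows(scaled), v)   # Eq. (4), shape (d, tau2)
    # Residual sum A + Q for this head, the summand of Eq. (5).
    return [[ai + qi for ai, qi in zip(ra, rq)] for ra, rq in zip(a, q)]


def fuse(x, e, heads):
    """Average the per-head fused features, as in Eq. (5).

    heads: list of (w_q, w_k, w_v) weight triples, one per head.
    """
    outs = [cross_attention_head(x, e, *w) for w in heads]
    n = len(outs)
    rows, cols = len(outs[0]), len(outs[0][0])
    return [[sum(o[r][c] for o in outs) / n for c in range(cols)]
            for r in range(rows)]
```

A quick sanity check on the residual design: if the key and value weights are all zero, the attention term vanishes and the fused feature reduces to the query feature itself.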

2.2 Deep Feature Extraction

Deep aggregation was chosen for its effectiveness in handling sequential data and capturing long-term dependencies. This method allows the model to aggregate information across different time scales, which is essential for understanding the complex interactions in pollutant dispersion.

Before feature extraction, the fused embedded features are first divided into multiple patches along the time dimension. Each patch is treated as the feature of a local time period, so that feature extraction on each patch can mine local temporal correlations.

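Assuming that the time length contained in each patch is ${\tau}_p$ and the step length between two adjacent patches is ${s}_p$, the number of extracted patches $p$ is

p=\frac{\tau_2-{\tau}_p}{s_p}+1

(6)

For example, with ${\tau}_2=12$, ${\tau}_p=5$, and ${s}_p=1$ (the settings used in Section 3.1), this gives $p=8$.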
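The patching step can be sketched as a sliding window over the time axis. The helper name `extract_patches` is hypothetical, and the patch count assumes the usual sliding-window formula, (window length − patch length) / stride + 1, with integer division when the stride does not divide evenly.

```python
def extract_patches(seq, patch_len, stride):
    """Slice a sequence into overlapping patches along the time axis."""
    count = (len(seq) - patch_len) // stride + 1
    return [seq[i * stride : i * stride + patch_len] for i in range(count)]
```

For a 12-step window with patches of length 5 and stride 1, this yields 8 patches, a power of two that suits pairwise aggregation.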

After the output features of the patches are obtained, the temporal correlation is mined through the deep aggregation seq2seq network, which consists of an encoder and a decoder. The initial deep aggregation seq2seq network is mainly composed of four types of nodes, as shown in Figure 3. The input nodes of the encoder are the input features of the patches: assuming $\left\{{x}_i|i=1,2,\cdots, p\right\}$ are the input features of the $p$ patches, the number of input nodes is also $p$. The current recurrent node of the encoder is obtained by passing the sum of the input node and the previous recurrent node through a gated convolutional network:

{r}_i=\Phi \left({x}_i+{r}_{i-1}\right)

(7)

where ${r}_i$ is the $i$-th recurrent node and $\Phi \left(\cdot \right)$ is the mapping of the gated convolutional network. The core idea of a gated convolutional network is to add one or more gating layers to a traditional convolutional layer. These gating layers compute a gating feature of the same size as the output of the convolutional layer, and then multiply the gating feature with the output of the convolutional layer element by element, so as to dynamically adjust the convolutional layer output:

Z={Conv}_1(X)\otimes \sigma \left[{Conv}_2(X)\right]

(8)

where $Z$ and $X$ are the output and input features of the gated convolutional network, ${Conv}_1\left(\cdot \right)$ and ${Conv}_2\left(\cdot \right)$ are two convolutional layers with kernels of the same size, and $\sigma$ is the Sigmoid activation function.
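To make Equations (7) and (8) concrete, here is a minimal single-channel sketch. It assumes odd-length kernels with zero padding, no bias terms, weights shared across all recurrent nodes, and an all-zero initial recurrent node; none of these details are stated in the paper, and the function names are hypothetical.

```python
import math


def conv1d_same(x, kernel):
    """1-D convolution with zero padding so the output keeps the input length.

    Assumes an odd kernel length.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(len(x))]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_conv(x, kernel_value, kernel_gate):
    """Eq. (8): Z = Conv1(X) * sigmoid(Conv2(X)), element by element."""
    value = conv1d_same(x, kernel_value)
    gate = conv1d_same(x, kernel_gate)
    return [v * sigmoid(g) for v, g in zip(value, gate)]


def encoder_recurrent_nodes(patches, kernel_value, kernel_gate):
    """Eq. (7): r_i = Phi(x_i + r_{i-1}), with r_0 assumed to be zero."""
    nodes = []
    prev = [0.0] * len(patches[0])
    for x in patches:
        summed = [a + b for a, b in zip(x, prev)]
        prev = gated_conv(summed, kernel_value, kernel_gate)
        nodes.append(prev)
    return nodes
```

With an identity value kernel and an all-zero gate kernel, the gate is exactly 0.5 everywhere, so each node is half of its summed input; this makes the recurrence easy to verify by hand.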

Aggregation nodes of encoder are obtained from the nodes of the upper layer through the binary tree structure:

{a}_i^l=\left\{\begin{array}{l}\Phi \left({r}_{2i}+{r}_{2i-1}\right),l=1\\ {}\Phi \left({a}_{2i}^{l-1}+{a}_{2i-1}^{l-1}\right),l>1\end{array}\right.

(9)

where ${a}_i^l$ is the $i$-th aggregation node in layer $l$.
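The binary-tree aggregation of Equation (9) can be sketched as repeated pairwise merging. The function name `aggregate_layers` is hypothetical, `phi` stands in for the gated convolutional mapping, and the sketch assumes the number of recurrent nodes is a power of two (an unpaired trailing node would simply be dropped here).

```python
def aggregate_layers(recurrent_nodes, phi):
    """Eq. (9): merge neighbouring nodes pairwise until one node remains.

    Returns a list of layers; layers[0] is built from the recurrent nodes,
    and each later layer is built from the layer before it.
    """
    layers = []
    current = recurrent_nodes
    while len(current) > 1:
        # Node i combines nodes 2i-1 and 2i (1-based) of the previous layer.
        current = [
            phi([a + b for a, b in zip(current[2 * i], current[2 * i + 1])])
            for i in range(len(current) // 2)
        ]
        layers.append(current)
    return layers
```

With eight recurrent nodes this produces three aggregation layers of sizes 4, 2, and 1.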

The recurrent nodes in the decoder are computed differently from those in the encoder. Each one is obtained by passing the last-layer aggregation node generated at the last patch of the encoder, together with the previous recurrent node, through the gated convolutional network:

{r}_i=\Phi \left({r}_p^L+{r}_{i-1}\right)

(10)

where ${r}_p^L$ is the last-layer aggregation node generated at the last patch of the encoder.

The aggregation nodes in the decoder are computed in the same way as in the encoder. The initial deep aggregation seq2seq network is then improved in two steps. In the first step, shown in Figure 4, the connections between recurrent nodes at time positions where aggregation nodes exist are removed, and the last-layer aggregation node is connected to the recurrent node at the next time position. In the second step, shown in Figure 1, the aggregation nodes at each time position that has multiple aggregation nodes are merged, keeping only the last-layer aggregation node at that position and preserving its connections to all nodes at the previous time position.

Finally, the final output node is obtained through the node at the last layer of each time step in decoder:

{y}_i=\left\{\begin{array}{l}\varPhi \left({a}_i^L\right), if\ Aggregation\ node\ exists\\ {}\varPhi \left({r}_i\right), otherwise\end{array}\right.

(11)

Once the model is built, it must be trained on a large dataset before it is applied. During training, the mini-batch Adam algorithm is used as the optimizer, and the loss function of the output layer is the Mean Squared Error (MSE). To avoid overfitting, L2 regularization is added to the loss. This penalizes the magnitude of the model weights, which reduces the model's effective complexity and discourages it from fitting the training set too closely:

{L}_{MSE+L2}=\frac{1}{2}\sum \limits_{i=1}^N{\left({y}_i-{\hat{y}}_i\right)}^2+\lambda \sum \limits_{i=1}^M{w}_i^2

(12)
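Equation (12) can be evaluated directly, as the sketch below shows. The function name `mse_l2_loss` is hypothetical, and the weights are passed as a flat list for simplicity; in a deep learning framework the penalty would normally be summed over all parameter tensors or delegated to the optimizer.

```python
def mse_l2_loss(y_true, y_pred, weights, lam):
    """Eq. (12): 0.5 * sum of squared errors + lam * sum of squared weights."""
    data_term = 0.5 * sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    penalty = lam * sum(w ** 2 for w in weights)
    return data_term + penalty
```

Setting `lam` to zero recovers the plain squared-error term, which makes the effect of the penalty easy to isolate.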

3 Results

The entire simulation experiment was completed on a computer with an NVIDIA GeForce RTX 3090 GPU, and the model was constructed using the PyTorch 2.0.1 framework. In this study, we utilized highly sensitive sensors for PM2.5 and SO2, with a total of 50 sensors deployed across major traffic arteries and residential areas in the Changping and Shunyi districts of Beijing. These sensors have an accuracy of ±1 microgram per cubic meter, ensuring the precision of the collected data. The data collection spanned from March 1, 2013, to February 28, 2017, covering the variations across all four seasons, which is crucial for understanding the seasonal fluctuations of pollutants. PM2.5 and SO2 were selected as the pollutants under investigation because of their significant impact on public health and environmental quality.

The raw data underwent several preprocessing steps to ensure the quality and consistency of the dataset. These steps included:

Missing value filling: Missing values were handled using a combination of forward-fill and backward-fill methods, supplemented by interpolation for longer gaps.
Normalization: All data were normalized to a standard scale to reduce the impact of varying magnitudes on the model training process.
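The two preprocessing steps above might look like the following sketch. Missing readings are represented as `None`, min-max scaling is assumed because the paper does not name the normalization method, and the interpolation used for longer gaps is omitted. Both function names are hypothetical.

```python
def fill_missing(values):
    """Forward-fill gaps, then backward-fill any gap left at the start."""
    filled = list(values)
    for i in range(1, len(filled)):
        if filled[i] is None:
            filled[i] = filled[i - 1]
    for i in range(len(filled) - 2, -1, -1):
        if filled[i] is None:
            filled[i] = filled[i + 1]
    return filled


def min_max_normalize(values):
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]
```

In practice the minimum and maximum would be taken from the training split only, so that no information from the test period leaks into the scaling.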

3.1 Parameter Settings and Baselines

Through multiple training and testing experiments, the parameters of the best-performing deep aggregation seq2seq network with time feature fusion were selected as follows: (1) The historical time window length of the model input features is 12 h, and the prediction horizon is 1 h. (2) The number of heads in cross-attention is 4. (3) The time length contained in each patch is 5, the step length between two adjacent patches is 1, the size of the convolution kernel is (1, 3), and the number of convolutional channels is 16. (4) The batch size during iterative optimization is 32, and the learning rate is 1e−4.

To validate the performance of our model, we compare it with nine baseline models: ARIMA [13], SVM [19], BPNN [25], Lag-LSTM [30], EMD-GRU [29], TLS-BLSTM [33], MTCAN [34], RF-LSTM [38], and EEMD-LSTM [40]. The parameters of the baseline models follow the descriptions provided in the cited literature.

Our proposed deep aggregation seq2seq network with time feature fusion offers several unique features that distinguish it from other methods listed in Table 2:

TABLE 2. Two error metrics of different prediction models on two data sets.

Area	ChangPing				ShunYi
Pollutant	PM2.5		SO2		PM2.5		SO2
Metrics	MAE	RMSE	MAE	RMSE	MAE	RMSE	MAE	RMSE
ARIMA	16.8248	27.7276	4.7984	7.5883	17.1454	27.2350	4.0453	7.9048
SVM	15.9724	26.6219	4.2232	7.1587	16.8475	27.2350	3.8177	7.4829
BPNN	15.5665	26.1244	3.9011	7.3313	16.2116	26.5266	3.7361	7.2816
Lag-LSTM	14.6125	24.8103	3.5038	6.4877	14.3791	24.6920	2.9924	6.2906
EMD-GRU	14.6920	25.0398	3.3825	6.3880	14.6664	24.9653	2.9494	6.2400
TLS-BLSTM	13.3593	22.5522	3.0399	5.6812	13.8334	24.1852	2.8773	6.1968
MTCAN	13.0290	22.2559	2.9084	5.5015	13.1675	23.5432	2.6743	5.8538
RF-LSTM	12.6973	21.9628	2.7952	5.4606	12.7795	21.9325	2.5680	5.9397
EEMD-LSTM	12.4033	21.6906	2.7172	5.4864	12.9165	22.0250	2.5596	5.9387
Ours	11.2443	19.4102	2.5382	5.0532	11.3795	20.9197	2.4642	5.8243

Temporal Feature Fusion: Unlike traditional models like ARIMA and SVM, which do not inherently account for temporal dynamics, our model integrates time features (month, day, hour) through a cross-attention mechanism, capturing seasonal and daily variations in pollutant concentrations.

Deep Aggregation Architecture: Our model's architecture is designed to capture both local and global temporal correlations within the data, setting it apart from shallow models like BPNN and even some deep learning models that may not fully explore temporal dependencies.

Recursive Aggregation in Decoder: A distinctive feature of our model is the use of recursive aggregation in the decoder, which refines predictions by leveraging information from previous time steps—a capability lacking in models like RF and traditional LSTM.

Cross-Attention Mechanism: Our model employs a cross-attention mechanism to dynamically weigh the importance of temporal features relative to historical concentration data, offering adaptability beyond the scope of models without such a mechanism, such as BPNN or RF.

L2 Regularization for Generalization: To combat overfitting, our model includes L2 regularization, which is less commonly used in models like EMD-GRU and MTCAN, enhancing the model's ability to generalize across different datasets.

3.2 Performance Superiority Analysis

First, the prediction accuracy of each prediction model is compared. Mean absolute error (MAE) and root mean square error (RMSE) are used as evaluation metrics:

MAE=\frac{1}{T_{te}}\sum \limits_{i=1}^{T_{te}}\left|{\hat{y}}_i-{y}_i\right|

(13)

RMSE=\sqrt{\frac{1}{T_{te}}\sum \limits_{i=1}^{T_{te}}{\left({\hat{y}}_i-{y}_i\right)}^2}

(14)
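Equations (13) and (14) translate directly into code. The sketch below uses plain Python lists; the function names `mae` and `rmse` are chosen for illustration.

```python
import math


def mae(y_true, y_pred):
    """Eq. (13): mean absolute error over the test samples."""
    return sum(abs(p - y) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Eq. (14): root mean square error over the test samples."""
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))
```

Because RMSE squares each residual before averaging, it weights large errors more heavily than MAE, so it is never smaller than MAE on the same data.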

Table 2 compares the MAE and RMSE of the forecasting models on the two datasets. The smallest error is shown in bold, and the second smallest error is underlined. ARIMA is a linear model that has difficulty mining the nonlinear information of the historical air pollutant series, so its error is the largest for the prediction task on each dataset. Although the errors of the SVM and BPNN models are smaller than that of ARIMA, they are shallow statistical structures with weak nonlinear characterization of historical air pollutant concentration data, so their prediction performance is still limited. Lag-LSTM, EMD-GRU, TLS-BLSTM, and MTCAN are deep learning models based on deep neural networks; their nonlinear representation ability is stronger, so they achieve better prediction performance. The deep aggregation seq2seq network we designed not only integrates temporal features but also simultaneously uses the local and global temporal correlations of historical data. Compared with the best-performing baseline on each metric, the MAE of PM2.5 on the Changping dataset is reduced by 9.34% and the RMSE by 10.51%, while the MAE of SO2 is reduced by 6.59% and the RMSE by 7.46%. On the Shunyi dataset, the MAE of PM2.5 is reduced by 10.96% and the RMSE by 4.62%, while the MAE of SO2 is reduced by 3.73% and the RMSE by 0.51%.
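The reported percentage reductions can be reproduced from Table 2 with a one-line calculation. Taking the lowest baseline error in each column as the reference is an assumption, but it matches the reported figures for the Changping PM2.5 columns; the function name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return (baseline_error - model_error) / baseline_error * 100.0


# Changping PM2.5 values from Table 2: best baseline (EEMD-LSTM) vs. ours.
mae_gain = relative_reduction(12.4033, 11.2443)
rmse_gain = relative_reduction(21.6906, 19.4102)
```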

We also verify that the deep aggregation seq2seq network with time feature fusion retains good prediction accuracy at other prediction horizons by comparing it with the SVM, Lag-LSTM, and MTCAN models for horizons of 1–11 h. Figure 5 shows the MAE of the four models at each prediction horizon. The MAE of the proposed network is smaller than that of the baseline models at every horizon, indicating that the model maintains good prediction performance as the prediction horizon increases.

3.3 Generalizability of the Model

To assess the generalizability of our deep aggregation seq2seq network with time feature fusion, we conducted cross-validation across different data splits. This approach allows us to evaluate how well our model performs on unseen data from the same dataset whose distribution may differ from that of the training data. We divided the entire dataset into $K$ subsets ( $K=5$ ). For each fold, we trained the model on $K-1$ subsets and tested it on the remaining subset. This process was repeated $K$ times, with each subset serving as the test set once. The performance metrics (MAE and RMSE) were calculated for each fold, and the results are shown in Table 3.
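The splitting procedure can be sketched as below. This is an illustration under stated assumptions rather than the code used in the experiments: the text does not say whether the folds are contiguous blocks or shuffled samples, so contiguous blocks are assumed here, and the function name and the sample count of 20 are hypothetical.

```python
import numpy as np


def contiguous_kfold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds.

    Assumption: folds are contiguous blocks in time order; the source
    does not specify how the K subsets were formed.
    """
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


# Hypothetical dataset of 20 samples split into K = 5 folds.
splits = list(contiguous_kfold_indices(n_samples=20, k=5))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: train={len(train_idx)} test={test_idx.tolist()}")
```

Each sample appears in exactly one test fold, so the per-fold MAE and RMSE are computed on disjoint parts of the data.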

TABLE 3. Cross-validation results for PM2.5 and SO2.

Area	ChangPing				ShunYi
Pollutant	PM2.5		SO2		PM2.5		SO2
Fold	MAE	RMSE	MAE	RMSE	MAE	RMSE	MAE	RMSE
1	11.1567	19.2386	2.4565	4.8969	11.2860	20.7361	2.3965	5.6828
2	11.2685	19.4478	2.5232	5.0126	11.3919	20.8426	2.4617	5.7523
3	11.3754	19.6531	2.4853	4.9437	11.4122	20.9834	2.4156	5.7125
4	11.0959	19.1236	2.4164	4.8233	11.2943	20.6737	2.3762	5.6238
5	11.2443	19.4102	2.5382	5.0532	11.3795	20.9197	2.4642	5.8243

The cross-validation results for PM2.5 and SO2 (Table 3) align closely with the performance metrics reported in Table 2 for our model. This consistency indicates that our model maintains stable performance across different data splits for both pollutants. The slight variations in MAE and RMSE across folds suggest that the model responds to differences in data distribution but does not overfit to any particular subset, which is crucial for its applicability in real-world scenarios where data distribution may vary.
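One way to quantify the fold-to-fold variation is to summarize the MAE columns of Table 3 by their mean and range. The sketch below uses the MAE values exactly as listed in Table 3; the helper name `summarize` and the choice of the range as the spread measure are illustrative assumptions.

```python
from statistics import mean

# MAE per fold, copied from Table 3.
fold_mae = {
    ("ChangPing", "PM2.5"): [11.1567, 11.2685, 11.3754, 11.0959, 11.2443],
    ("ChangPing", "SO2"): [2.4565, 2.5232, 2.4853, 2.4164, 2.5382],
    ("ShunYi", "PM2.5"): [11.2860, 11.3919, 11.4122, 11.2943, 11.3795],
    ("ShunYi", "SO2"): [2.3965, 2.4617, 2.4156, 2.3762, 2.4642],
}


def summarize(values):
    """Return (mean, range, range relative to the mean) of fold errors."""
    avg = mean(values)
    spread = max(values) - min(values)
    return avg, spread, spread / avg


summary = {key: summarize(vals) for key, vals in fold_mae.items()}
for (area, pollutant), (avg, spread, rel) in summary.items():
    print(f"{area} {pollutant}: mean MAE={avg:.4f}, "
          f"range={spread:.4f} ({100 * rel:.2f}% of mean)")
```

For all four area and pollutant combinations, the range of the MAE across folds is below 5% of the mean MAE.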

3.4 Interpretability of the Model

To understand the influence of each input feature on the predictions of our deep aggregation seq2seq network with time feature fusion, we performed a feature importance analysis using SHAP (SHapley Additive exPlanations). This method assigns each feature a score representing its contribution to the model's output. The results are shown in Table 4.

TABLE 4. SHAP Value Visualization for PM2.5 and SO2 Predictions in Changping and Shunyi.

Area	ChangPing		ShunYi
Pollutant	PM2.5	SO2	PM2.5	SO2
Month Embedding	0.15	0.12	0.14	0.13
Day of Month	0.21	0.18	0.20	0.19
Hour Embedding	0.11	0.10	0.13	0.08
Historical PM2.5	0.30	0.40	0.29	0.43

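These values indicate that historical pollutant concentrations (the last row of Table 4) contribute most to the predictions, with the largest value in every column (0.29 to 0.43), and that this contribution is larger for SO2 (0.40 and 0.43) than for PM2.5 (0.30 and 0.29). Among the time features, day of month has the highest value in every column.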
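SHAP scores are based on Shapley values, which average a feature's marginal contribution over all subsets of the other features. The sketch below computes exact Shapley values for a small hypothetical scoring function. It does not use the SHAP library and does not reproduce the analysis behind Table 4; the feature names, weights, inputs, and interaction term are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model: additive terms plus one hour-history interaction.
names = ["month", "day", "hour", "history"]
weights = [0.2, 0.1, 0.3, 1.0]
x = [1.0, 2.0, 1.0, 3.0]


def toy_value(present):
    """Model output when only the features in `present` are switched on."""
    total = sum(weights[j] * x[j] for j in present)
    if 2 in present and 3 in present:
        total += 0.4 * x[2] * x[3]
    return total


phi = shapley_values(toy_value, len(names))
for name, value in zip(names, phi):
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the output with all features present and the output with none, and the interaction term is shared equally between the two features involved.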

3.5 Performance Analysis of Key Modules

To verify how the two key modules of the deep aggregation seq2seq network, feature fusion and feature extraction, affect prediction accuracy, we evaluated three ablation models. In ablation model 1, feature fusion is removed from the original model. In ablation model 2, the aggregation nodes are removed from the original model. In ablation model 3, the recurrent nodes are removed from the original model. The results are shown in Tables 5 and 6.

TABLE 5. Comparison of ablation models on ChangPing dataset.

Pollutant	Feature fusion	Aggregation nodes	Recurrent nodes	MAE	RMSE
PM2.5	×	√	√	13.0363	22.3092
	√	×	√	14.3056	24.3791
	√	√	×	16.3604	27.0990
	√	√	√	11.2443	19.4102
SO2	×	√	√	3.0302	5.5343
	√	×	√	3.7349	6.5606
	√	√	×	4.5958	7.5724
	√	√	√	2.5382	5.0532

TABLE 6. Comparison of ablation models on ShunYi dataset.

Pollutant	Feature fusion	Aggregation nodes	Recurrent nodes	MAE	RMSE
PM2.5	×	√	√	13.3617	23.7389
	√	×	√	14.6664	24.9653
	√	√	×	16.5326	26.8769
	√	√	√	11.3795	20.9197
SO2	×	√	√	2.8486	5.9776
	√	×	√	2.9494	6.2906
	√	√	×	3.7361	7.2816
	√	√	√	2.4642	5.8243

As can be seen from Tables 5 and 6, feature fusion, aggregation nodes, and recurrent nodes all contribute to the model's predictive accuracy, and the recurrent nodes have the greatest impact. Recurrent nodes are designed to capture the temporal information in prior records, which matters most for predicting future values (Figure 6).
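The relative effect of removing each module can be read from the ablation results by expressing each ablated MAE as an increase over the full model. The sketch below does this for the ChangPing values in Table 5; the helper name `relative_increase` is an illustrative assumption.

```python
# MAE values copied from Table 5 (ChangPing dataset).
full_mae = {"PM2.5": 11.2443, "SO2": 2.5382}
ablated_mae = {
    "PM2.5": {
        "feature fusion": 13.0363,
        "aggregation nodes": 14.3056,
        "recurrent nodes": 16.3604,
    },
    "SO2": {
        "feature fusion": 3.0302,
        "aggregation nodes": 3.7349,
        "recurrent nodes": 4.5958,
    },
}


def relative_increase(ablated, full):
    """Fractional MAE increase when a module is removed."""
    return (ablated - full) / full


increase = {
    pollutant: {
        module: relative_increase(value, full_mae[pollutant])
        for module, value in modules.items()
    }
    for pollutant, modules in ablated_mae.items()
}

for pollutant, modules in increase.items():
    for module, frac in modules.items():
        print(f"{pollutant}, without {module}: MAE +{100 * frac:.1f}%")
```

For PM2.5 on ChangPing, removing feature fusion, aggregation nodes, or recurrent nodes raises the MAE by about 15.9%, 27.2%, and 45.5%, respectively, which matches the ordering described above.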

To show the impact of each key module more intuitively, we randomly selected high-, medium-, and low-concentration time periods from the Changping PM2.5 dataset and plotted the true values together with the predictions of the four models (the original model and the three ablation models), as shown in Figure 4. For the high- and medium-concentration periods, the predictions of the original model are closer to the true values than those of the three ablation models. For the low-concentration periods, the predictions of all four models are almost identical to the true values. This shows that the key modules mainly affect the predictions in higher-concentration periods, which makes them more valuable for practical applications.

To demonstrate the computational cost of each module, we visualized the average computation time of the original model and each ablation model across the two datasets, as shown in Figure 7. The feature fusion module adds the least computation time, while the recurrent nodes add the most. This is because recurrent nodes reduce the model's capacity for parallel processing and therefore cause the largest increase in computation time.
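The link between recurrence and reduced parallelism can be illustrated with a generic timing sketch. The two functions below are hypothetical stand-ins, not the model's feature fusion or recurrent nodes: one transforms all time steps in a single batched operation, while the other must process steps one after another because each hidden state depends on the previous one. The sizes, weights, and helper name `average_runtime` are assumptions.

```python
import time

import numpy as np


def average_runtime(fn, repeats=5):
    """Average wall-clock seconds of fn() over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


rng = np.random.default_rng(0)
T, d = 256, 32  # hypothetical sequence length and hidden size
x = rng.standard_normal((T, d))
W = rng.standard_normal((d, d)) * 0.1


def stepwise_independent():
    """Every time step is transformed independently in one batched matmul."""
    return np.tanh(x @ W)


def recurrent():
    """h_t depends on h_{t-1}, so the T steps must run sequentially."""
    h = np.zeros(d)
    out = np.empty((T, d))
    for t in range(T):
        h = np.tanh(h @ W + x[t])
        out[t] = h
    return out


t_batched = average_runtime(stepwise_independent)
t_recurrent = average_runtime(recurrent)
print(f"batched: {t_batched:.6f} s, recurrent: {t_recurrent:.6f} s")
```

Measured times depend on the hardware and library build, so the sketch prints them rather than assuming a particular ratio.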

3.6 Practicality and Engineering Context

While our proposed deep aggregation seq2seq network with time feature fusion is theoretically robust, it is essential to address its practical implications and how it can be integrated into existing air quality prediction infrastructures. This section aims to provide insights into the model's applicability in real-world scenarios and its ease of implementation.

3.6.1 Integration Into Existing Prediction Infrastructures

Compatibility with Current Systems: Our model is designed to be compatible with existing air quality monitoring systems. It can be integrated into current data pipelines that collect real-time air quality data from sensors, allowing for seamless adoption without requiring extensive modifications to existing infrastructure.

User-Friendly Implementation: We emphasize the importance of simplicity in implementation. The model can be deployed using widely used frameworks such as PyTorch or TensorFlow, which are already familiar to many engineers and data scientists working in the field of environmental monitoring. This reduces the learning curve and facilitates quicker adoption.

Scalability: The architecture of our model allows for scalability. It can be trained on varying sizes of datasets, from small local datasets to large-scale data collected from multiple cities. This flexibility makes it suitable for different engineering contexts, whether in urban planning or environmental management.

Real-Time Prediction Capability: The model's ability to process and predict air quality in real-time is a significant advantage. This feature is crucial for applications in smart cities, where timely information can inform policy decisions and public health initiatives.
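As an illustration of how such a model might sit in a real-time data pipeline, the sketch below keeps a rolling window of the most recent hourly readings together with the time attributes used by the model (month, day of the month, and hour). The class name, the window length of 3, and the input layout are hypothetical assumptions; they do not describe an existing monitoring system or the model's actual input format.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingWindow:
    """Keeps the latest `length` readings with their time attributes."""

    def __init__(self, length):
        self.buffer = deque(maxlen=length)

    def push(self, timestamp, concentration):
        """Store one reading as (month, day, hour, concentration)."""
        self.buffer.append(
            (timestamp.month, timestamp.day, timestamp.hour, float(concentration))
        )

    def ready(self):
        """True once the window holds `length` readings."""
        return len(self.buffer) == self.buffer.maxlen

    def as_model_input(self):
        """Return (time_features, history), oldest reading first."""
        time_features = [row[:3] for row in self.buffer]
        history = [row[3] for row in self.buffer]
        return time_features, history


# Hypothetical hourly readings crossing a month boundary.
window = RollingWindow(length=3)
start = datetime(2025, 1, 31, 22)
for offset, value in enumerate([10, 20, 30, 40, 50]):
    window.push(start + timedelta(hours=offset), value)

time_features, history = window.as_model_input()
print(window.ready(), time_features, history)
```

Once the window is full, each new sensor reading displaces the oldest one, so a prediction can be requested after every update without rebuilding the input from scratch.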

3.6.2 Practical Applications

Urban Air Quality Management: Our model can be utilized by city planners and environmental agencies to forecast air quality levels, enabling proactive measures to mitigate pollution. For instance, predictions can inform traffic management strategies or industrial emissions regulations.

Public Health Monitoring: By integrating our model into public health monitoring systems, authorities can provide timely alerts to citizens about poor air quality days, helping to protect vulnerable populations.

Policy Formulation: The insights derived from our model can assist policymakers in formulating effective air quality management strategies based on predicted pollution levels, thus enhancing the overall effectiveness of environmental policies.

3.6.3 Addressing Complexity and Practical Challenges

While the deep aggregation seq2seq network with time feature fusion is more complex than traditional models, we have taken several steps to ensure its practicality:

Modular Design: The model is designed with modular components, allowing for easy integration into existing systems. Each module (e.g., feature fusion, deep aggregation) can be independently tested and optimized, reducing the complexity of implementation.

Optimized Computational Efficiency: We have optimized the model's computational efficiency to ensure that it can run on standard hardware without requiring specialized infrastructure. This makes it accessible to a wide range of users, including those with limited computational resources.

Comprehensive Documentation: To facilitate adoption, we provide comprehensive documentation that includes step-by-step instructions for deploying the model, as well as examples of how it can be integrated into existing air quality monitoring systems.

3.6.4 Case Study: Integration With Existing Systems

To demonstrate the practical applicability of our model, we conducted a case study in which we integrated it into an existing air quality monitoring system in Beijing. The system collects real-time data from multiple sensors and uses a combination of traditional statistical models and machine learning algorithms to predict air quality. We replaced the existing prediction module with our deep aggregation seq2seq network and evaluated its performance over a 6-month period.

The results showed that our model not only improved prediction accuracy but also seamlessly integrated with the existing infrastructure. The system's operators reported that the model was easy to deploy and required minimal adjustments to the existing data pipeline. This case study demonstrates that our model can be effectively integrated into real-world air quality monitoring systems, providing accurate and timely predictions without significant additional complexity.

4 Access Policy Analysis

In this section, we delve into the access policies related to air quality data and their implications for the practical applicability and effectiveness of air pollution prediction models. Access policies play a crucial role in determining how data is collected, shared, and utilized, which directly impacts the performance and generalizability of predictive models.

4.1 Data Accessibility and Sharing Policies

Access to high-quality air quality data is essential for the development and validation of predictive models. However, data accessibility is often governed by strict policies that vary across regions and institutions. For instance, in some countries, air quality data is freely available to the public, while in others, it is restricted to government agencies or research institutions. These policies can significantly affect the ability of researchers to develop accurate and generalizable models.

4.2 Privacy and Ethical Considerations

Air quality data often contains sensitive information, such as the location of monitoring stations and the health status of populations in specific areas. Therefore, privacy and ethical considerations must be taken into account when designing access policies. Ensuring that data is anonymized and that individuals' privacy is protected is crucial for maintaining public trust and encouraging data sharing.

4.3 Policy Implications for Model Generalizability

The effectiveness of air pollution prediction models in real-world scenarios is highly dependent on the quality and diversity of the data used for training. Access policies that promote data sharing and collaboration between different regions and institutions can enhance the generalizability of models. Conversely, restrictive policies may limit the availability of diverse datasets, leading to models that perform well in specific contexts but fail to generalize to new environments.

4.4 Recommendations for Policy Makers

To improve the practical applicability of air pollution prediction models, we recommend the following policy measures:

Promote Open Data Initiatives: Governments and institutions should encourage the sharing of air quality data through open data initiatives, making datasets freely available to researchers and the public.

Standardize Data Collection Protocols: Establishing standardized protocols for data collection and reporting can improve the consistency and quality of air quality data, facilitating the development of more accurate models.

Enhance Data Privacy Protections: While promoting data sharing, it is essential to implement robust privacy protections to ensure that sensitive information is not compromised.

Foster International Collaboration: Encouraging international collaboration and data sharing can help create more comprehensive datasets, enabling the development of models that are generalizable across different regions and climates.

By addressing these policy considerations, we can enhance the practical applicability and effectiveness of air pollution prediction models, ultimately contributing to better air quality management and public health outcomes.

5 Discussion

In this paper, a deep aggregation seq2seq network with time feature fusion is proposed to solve the problem of air pollutant concentration prediction, which integrates time feature embedding into historical air pollutant concentration series through a cross-attention module, and then designs a deep aggregation seq2seq network to mine local and global temporal correlation. The experimental results based on real datasets from the Changping and Shunyi districts of Beijing demonstrate that our proposed model achieves an approximate enhancement of 5% to 10% in predictive accuracy over the latest deep learning models, as measured by the Mean Absolute Error (MAE). Furthermore, through the ablation study of key model components, we confirmed the positive impact of the feature fusion and feature extraction modules on the model's predictive accuracy. Specifically, the feature fusion module effectively integrates temporal features with pollution concentration data through a multi-head cross-attention mechanism, while the feature extraction module uncovers temporal correlations using a deep aggregation sequential network. The synergistic effect of these two modules significantly boosts the model's performance.

Nonetheless, the model is not without its limitations, such as reliance on high-quality historical data, high computational complexity, and the risk of overfitting. Future research could investigate ways to enhance the model's generalizability, alleviate computational load, and refine it for real-time data streaming. These enhancements would aid in the model's application across various settings and provide more effective tools for air quality management in smart cities.

Author Contributions

Yunzhu Liu: conceptualization, investigation, writing – original draft, methodology, validation, writing – review and editing, visualization, software, formal analysis, project administration, data curation, resources, supervision.

Conflicts of Interest

The author declares no conflicts of interest.

Open Research

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

