Deep Aggregation seq2seq Network With Time Feature Fusion for Air Pollutant Concentration Prediction in Smart Cities
ABSTRACT
Air pollution poses significant risks to environmental quality and public health, so precise forecasting of air pollutant concentrations is crucial. The emission and diffusion of air pollutants are dynamic processes that change over time and have pronounced seasonal characteristics, so time attributes such as month, day of the month, and hour can improve the precision and reliability of forecasting models. This paper therefore proposes a deep aggregation seq2seq network with time feature fusion for air pollutant concentration prediction. The network first integrates temporal feature encodings with historical air pollutant concentration data through a cross-attention network, and then extracts latent features with a deep aggregation seq2seq network. The encoder extracts the temporal correlations of the fused features, while the decoder generates future predictions through recursive aggregation, so that the predictions exploit both the local features and the overall recursive structure of the historical information. We conduct experiments on real PM2.5 and SO2 datasets from Beijing's Changping and Shunyi districts. Compared with the best-performing state-of-the-art baseline, our model reduces the Mean Absolute Error (MAE) by 3.73% to 10.96%.
1 Introduction
Air pollution threatens not only environmental quality but also human health. According to the World Health Organization, more than 7 million people die each year from diseases attributed to air pollution [4-6]. Air pollution has therefore become an urgent environmental and public health issue, and numerous countries and regions have implemented policies to address it. For instance, the European Union has established stringent air quality standards aimed at reducing harmful substances in the air [1]. In Asia, countries such as China and India are actively advancing clean air action plans to improve air quality and safeguard public health [2]. Danek et al. analyzed air pollution policies in Krakow, which aim to reduce pollutants by curbing industrial emissions and promoting green transportation [3].
Air pollution also damages ecosystems, affects crop growth, and exacerbates climate change [7]. Developing effective algorithms for predicting air pollutant concentrations is therefore crucial for formulating scientific pollution control strategies and safeguarding the public's ecological and environmental rights.
Early studies predicted pollutant concentrations by simulating the transformation of pollutants in the atmosphere on the basis of atmospheric dynamics. For example, Carruthers et al. [8] used the ADMS model to predict air pollutant concentrations in the UK; the predicted values for SO2 and NOx were consistent with the observed values, but the prediction accuracy for other pollutants was low. Afzali et al. [9] combined the WRF weather forecast model with the AERMOD air quality model to simulate the spatial variation of different air pollutants at multiple industrial sites in Malaysia and found that the results were consistent with the observed values. Holnicki et al. [10] evaluated urban-scale air quality prediction with the CALPUFF model and showed that the model performs much better in long-term prediction than in short-term prediction. Shahbazi et al. [11] used the WRF-CAMx model to predict air pollutant concentrations in Tehran, using regression to interpolate missing values, which is advantageous on datasets with a large proportion of missing values. Nabavi et al. [12] proposed a WRF-Chem model based on the WASF source function to simulate dust concentrations, although its results remain subject to uncertainty. Karaca et al. [13] used the USEPA IAQX model to simulate concentrations associated with several household cooking activities and accurately captured the trend of PM2.5 concentration changes. These atmospheric-dynamics models require multiple inputs, such as terrain and pollution sources, for which complete data are difficult to collect. They are also region-specific: each new region requires reanalysis and re-simulation, which makes research and development costly.
With the continuous improvement of monitoring technology, air pollutant concentration data have become increasingly easy to obtain, and more and more scholars use statistical models to predict air pollutant concentrations, such as ARIMA [14-17], SVM [18-21], decision trees [22], RF [23], and BPNN [24-26]. However, these models have shallow structures with limited capacity for nonlinear representation and struggle to fully mine the temporal correlations in historical information, so their prediction accuracy still falls short of expectations [27, 28].
Deep Neural Networks (DNNs) learn effectively from rich data, and a variety of deep learning-based air pollution prediction methods have been proposed that, in many cases, achieve better prediction accuracy than shallow statistical methods. For example, Lu et al. [29] used three-dimensional variational (3D-Var) methods to obtain assimilated air quality data and combined them with a Long Short-Term Memory (LSTM) network to mine features, improving PM2.5 prediction performance. Huang et al. [30] trained a gated recurrent unit (GRU) network on stationary subsequences of PM2.5 concentration obtained by Empirical Mode Decomposition (EMD), improving prediction accuracy by about 40% compared with a single GRU model. Ma et al. [31] proposed an LSTM network with delay layers (Lag-LSTM) for multivariate air quality prediction and used Bayesian methods to optimize the network's parameters. Zhang et al. [32] used an encoder-decoder prediction model to predict pollutant concentrations and demonstrated the superiority of this structure. Tu et al. [33] designed an autoencoder network with an attention mechanism to mine the long-term trend of air pollution and added a time decay factor to the attention module, which improved the adaptability of the model. Ma et al. [34] combined transfer learning with bidirectional LSTM (TLS-BLSTM) for air quality prediction with missing data; by transferring a model trained at existing sites to a new site, this method effectively improves air quality prediction at the new site. Samal et al. [35] designed a Multi-directional Time Convolutional Artificial Neural Network (MATCN) that performs feature learning and sequence modeling simultaneously, substantially reducing computation time. Smith et al. [36] proposed a heuristic-based multiscale depth-wise separable adaptive temporal convolutional network (TCN) for real-time ambient air quality prediction, achieving state-of-the-art performance on multiple datasets. Similarly, Wang et al. [37] developed a hybrid model that combines feature selection with TCNs to predict air pollutant concentrations, highlighting the model's ability to capture complex temporal patterns. Wu et al. [38] analyzed the periodicity of air pollutants through autocorrelation and trained an LSTM to learn their intrinsic temporal correlation, achieving high fitting performance for different pollutants. Luo et al. [39] combined ARIMA with LSTM to exploit the nonlinear information in air pollutant concentrations and optimized the LSTM hyperparameters with the whale optimization algorithm. Dalal et al. [40] introduced a hybrid model that integrates Particle Swarm Optimization (PSO) with LSTM networks, which was shown to improve both the predictive accuracy and the computational efficiency of air quality forecasting. Wang et al. [40] decomposed the atmospheric pollutant sequence twice, using empirical mode decomposition and variational mode decomposition, processed the decomposed sequences with multi-layer perceptrons and gated recurrent units, and verified that the decomposition-based model performs better. Ding et al. [37] combined a weighted random forest with LSTM (RF-LSTM) to predict PM2.5 concentrations, achieving better generalization than a plain LSTM model. Ma et al. [41] used a two-stage attention model to mine the historical temporal correlations of air pollutant concentrations at source-domain sites and transferred the learned information to concentration prediction at new sites, effectively improving prediction accuracy. Liu et al. [42] designed an attention-based LSTM model with ensemble empirical mode decomposition (EEMD-LSTM) for PM2.5 concentration prediction; EEMD reduces the nonlinear complexity of the historical data and significantly improves the stability of the model. Muley et al. [43] combined two bidirectional LSTM models through XGBoost for air quality prediction, mitigating the overfitting to which a single model is prone. Dalal et al. [44] integrated a curiosity-based motivation method with LSTM for air pollution prediction and achieved good predictive performance.
To clarify the characteristics and limitations of the various models for predicting air pollutant concentrations, Table 1 compares the strengths and weaknesses of different air pollution forecasting models.
Model | Advantages | Disadvantages |
---|---|---|
ARIMA | Suitable for linear time series analysis, easy to implement and interpret | Struggles to capture nonlinear relationships, sensitive to missing data |
SVM | Performs well in small sample sizes, handles high-dimensional data effectively | Sensitive to parameter selection, high computational complexity |
BPNN | Appropriate for simple nonlinear problems, easy to implement | Shallow structure, difficulty in mining deep temporal correlations |
RF | Capable of handling large amounts of data, not sensitive to outliers | Poor model interpretability, complex parameter tuning |
Lag-LSTM | Accounts for time lag effects, suitable for time series data | May suffer from vanishing gradients on long sequences |
EMD-GRU | Combines empirical mode decomposition to enhance feature extraction, suitable for nonlinear and non-stationary data | Model structure is complex, high training costs |
TLS-BLSTM | Combines bidirectional LSTM to capture forward and backward dependencies, applicable for sequence prediction | High computational resource consumption, may have performance issues with very long sequences |
MTCAN | Simultaneously performs feature learning and sequence modeling, reduces computation time | Limited generalization ability for nonlinear and complex time series data |
RF-LSTM | Combines random forest and LSTM to improve generalization, suitable for complex data | Model training and parameter tuning are complex, high computational costs |
EEMD-LSTM | Combines ensemble empirical mode decomposition to reduce nonlinear complexity, improves model stability | Adaptability to new data may be poor |
MATCN | Simultaneously performs feature learning and sequence modeling, reduces computation time | Limited generalization ability for unconventional time series data |
Integrated Dual LSTM | Combines two bidirectional LSTMs to enhance prediction accuracy, suitable for complex time series data | Model structure is complex, requiring extensive parameter tuning |
PSO-LSTM | Combines particle swarm optimization to improve LSTM parameter optimization, suitable for dynamically changing environments | Optimization process may take a long time, may be overly complex for some problems |
Although the above air pollutant concentration prediction algorithms have achieved certain results, two challenges remain. The first is how to exploit the information that time features carry about air pollutant concentrations; the second is how to use the local and global semantic information of historical features simultaneously to improve prediction accuracy. To address these challenges, this study designs a deep aggregation seq2seq network that incorporates temporal features to improve the precision of air pollutant concentration forecasting. Its novelty is as follows:
First, the time features are integrated into the historical pollutant concentration data through cross-attention, so that the influence of time features on the predicted values can be mined more effectively.
Second, a deep aggregation seq2seq network is designed to explore the temporal correlations of the fused features. Each node in the structure mines local temporal correlations through a gated linear unit, and global temporal correlations are then obtained through recursion and deep aggregation, which improves the predictive performance of the model.
Finally, the method is evaluated on real datasets from two districts of Beijing, and the experimental results show that its prediction accuracy is higher than that of existing advanced models.
In summary, we introduce a deep aggregation seq2seq network with time feature fusion for predicting air pollutant concentrations in smart cities. Compared with existing research, our approach not only incorporates time features but also exploits the local and global temporal correlations within historical data in tandem, a combination that prior studies have not explored in depth.
The remainder of this paper is organized as follows. Section 2 (Methods) details the proposed deep aggregation seq2seq network, including the feature fusion and deep feature extraction processes. Section 3 (Results) evaluates the model on real datasets from Beijing's Changping and Shunyi districts and compares its accuracy with state-of-the-art models. Finally, the Conclusion summarizes the key findings and discusses the implications of our model for smart city air quality management.
2 Methods
The air pollutant concentration prediction problem is defined as follows: learn a complex mapping from historical concentration data and time features, and then use this mapping to compute the air pollutant concentration in the next period.
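Under notation introduced here for illustration only (the symbols are assumptions, not taken from the paper), the mapping can be written as:

$$
\hat{y}_{t+\tau} = f_{\theta}\left(\mathbf{X}_{t-L+1:t},\ \mathbf{E}_{t-L+1:t}\right)
$$

where $\mathbf{X}_{t-L+1:t}$ denotes the pollutant concentrations observed over the last $L$ hours, $\mathbf{E}_{t-L+1:t}$ the corresponding time features (month, day of the month, hour), $\tau$ the prediction horizon, and $f_{\theta}$ the network with learnable parameters $\theta$. With the settings in Section 3.1, $L = 12$ and $\tau = 1$.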
As shown in Figure 1, this paper designs a deep aggregation seq2seq network with time feature fusion to serve as this mapping. The model consists of two main parts: feature fusion, which comprises multi-head cross-attention and layer normalization modules, and feature extraction, which comprises patching and deep aggregation seq2seq network modules. The deep aggregation seq2seq network was selected on the basis of predictive accuracy, computational efficiency, and its capacity to handle time-series data; it is well suited to sequential data, particularly to capturing long-term dependencies. By incorporating temporal feature fusion and deep aggregation, the model can forecast future pollutant concentrations more precisely.
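The feature-fusion stage can be sketched as multi-head cross-attention followed by a residual connection and layer normalization. This is a minimal pure-Python sketch under stated assumptions: the concentration embeddings act as queries and the time-feature embeddings as keys and values, the projection matrices are random rather than learned, and the residual-plus-LayerNorm arrangement follows the standard Transformer pattern. The paper does not specify these details, and all function names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(row: List[float], eps: float = 1e-5) -> List[float]:
    """LayerNorm without learnable scale/shift: zero mean, unit variance per row."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random stand-in for a learned projection matrix (assumption)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def cross_attention_fusion(conc: Matrix, time_emb: Matrix, n_heads: int = 4, seed: int = 0) -> Matrix:
    """Fuse concentration embeddings (queries) with time embeddings (keys/values).

    conc: [T_q][d_model], time_emb: [T_k][d_model].
    Returns LayerNorm(conc + MultiHeadAttention(conc, time_emb)).
    """
    d_model = len(conc[0])
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    d_head = d_model // n_heads
    rng = random.Random(seed)
    w_q, w_k, w_v, w_o = (random_matrix(d_model, d_model, rng) for _ in range(4))
    q, k, v = matmul(conc, w_q), matmul(time_emb, w_k), matmul(time_emb, w_v)

    concat: Matrix = [[] for _ in conc]  # per query step, head outputs concatenated
    for h in range(n_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        q_h = [row[cols] for row in q]
        k_h = [row[cols] for row in k]
        v_h = [row[cols] for row in v]
        # Scaled dot-product scores between every query step and every key step.
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_head) for kj in k_h] for qi in q_h]
        weights = [softmax(r) for r in scores]
        for i, out_row in enumerate(matmul(weights, v_h)):
            concat[i].extend(out_row)

    attended = matmul(concat, w_o)
    # Residual connection, then layer normalization.
    return [layer_norm([x + a for x, a in zip(xr, ar)]) for xr, ar in zip(conc, attended)]
```

With `n_heads=4` (the setting in Section 3.1) and, for example, 12 hourly steps of 8-dimensional embeddings, the function returns a 12 × 8 fused matrix whose rows have approximately zero mean and unit variance. In a real model the four projection matrices would be trained jointly with the rest of the network.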

2.1 Feature Fusion
Temporal feature encoding is selected for its ability to capture the seasonal and cyclical patterns inherent in air pollutant data. Time attributes such as month, day of the month, and hour are critical in understanding the dynamics of air pollutant concentrations, which vary significantly across different timescales.
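The paper encodes month, day of the month, and hour as embeddings (cf. the "Month Embedding" and "Hour Embedding" entries in Table 4), whose exact form is not given. As a swapped-in alternative that illustrates why such attributes help, the sketch below uses a fixed sine/cosine (cyclical) encoding, which places adjacent values such as 23:00 and 00:00 close together. The function names are hypothetical, and the fixed 31-day period for day of month is a simplifying assumption.

```python
import math
from datetime import datetime
from typing import List


def cyclical(value: float, period: float) -> List[float]:
    """Map a periodic attribute onto the unit circle as [sin, cos]."""
    angle = 2.0 * math.pi * value / period
    return [math.sin(angle), math.cos(angle)]


def encode_time(ts: datetime) -> List[float]:
    """6-dim fixed encoding of month, day of month and hour.

    Day of month uses a fixed period of 31, which ignores month length.
    """
    return (cyclical(ts.month - 1, 12)
            + cyclical(ts.day - 1, 31)
            + cyclical(ts.hour, 24))


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two encodings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    midnight = encode_time(datetime(2013, 3, 1, 0))
    late = encode_time(datetime(2013, 3, 1, 23))
    noon = encode_time(datetime(2013, 3, 1, 12))
    # 23:00 lies next to 00:00 on the hour circle; 12:00 is opposite.
    print(distance(midnight, late), distance(midnight, noon))
```

A raw hour integer would place 23 and 0 at opposite ends of its range; the circular encoding keeps them neighbours, which matches the daily cycle of pollutant concentrations.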

2.2 Deep Feature Extraction
Deep aggregation was chosen for its effectiveness in handling sequential data and capturing long-term dependencies. This method allows the model to aggregate information across different time scales, which is essential for understanding the complex interactions in pollutant dispersion.
Before feature extraction, the fused embedded features are divided into multiple patches along the time dimension. Each patch is treated as the features of a local time period, so feature extraction on each patch can mine local temporal correlations.
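The patching step, together with a gated (GLU-style) convolution that could serve as a node's local feature extractor, can be sketched as follows using the settings reported in Section 3.1 (patch length 5, stride 1, kernel (1, 3)). Assumptions: the fused features are laid out as [time][feature], the (1, 3) kernel slides along time only and is shared across feature rows, only one output channel is shown (the paper uses 16), and "valid" padding is used. Function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def make_patches(series: Sequence[Sequence[float]], patch_len: int = 5, stride: int = 1) -> List[Matrix]:
    """Split a [time][feature] series into overlapping patches along time."""
    if not 0 < patch_len <= len(series) or stride < 1:
        raise ValueError("invalid patch_len or stride")
    return [[list(step) for step in series[start:start + patch_len]]
            for start in range(0, len(series) - patch_len + 1, stride)]


def sigmoid(z: float) -> float:
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_conv_1x3(patch: Matrix, w_value: Sequence[float], w_gate: Sequence[float],
                   b_value: float = 0.0, b_gate: float = 0.0) -> Matrix:
    """GLU over a (1, 3) kernel: value * sigmoid(gate), convolving along time only.

    patch: [time][feature]; returns [time - 2][feature] ('valid' padding).
    """
    n_time, n_feat = len(patch), len(patch[0])
    out: Matrix = []
    for t in range(n_time - 2):
        row = []
        for f in range(n_feat):
            window = [patch[t + k][f] for k in range(3)]
            value = sum(w * x for w, x in zip(w_value, window)) + b_value
            gate = sum(w * x for w, x in zip(w_gate, window)) + b_gate
            row.append(value * sigmoid(gate))
        out.append(row)
    return out
```

With a 12-hour window, patch length 5 and stride 1, this yields (12 − 5) / 1 + 1 = 8 overlapping patches. The sigmoid gate lets each node suppress or pass the local convolution response, which is the usual motivation for gated linear units.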

The aggregation nodes in the decoder are computed in the same way as those in the encoder. The initial deep aggregation seq2seq network is then improved in two steps. In the first step (Figure 4), at each time position where aggregation nodes exist, the connections between the recurrent nodes are removed, and the last-layer aggregation node is connected to the recurrent node at the next time position. In the second step (Figure 1), at any time position with multiple aggregation nodes, these nodes are merged: only the last-layer aggregation node at that position is kept, and its connections to all nodes at the previous time position are preserved.
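To make the rewiring concrete, the toy sketch below runs recurrent nodes over a sequence of patch features and inserts an aggregation node after every `block` steps; following the first improvement step, the aggregation node's output, rather than the last recurrent state, feeds the next recurrent node. Everything here is an assumption for illustration and not the paper's exact node formulation: the recurrent cell is a simple elementwise tanh update, the aggregation rule is an elementwise mean of the block's states, only a single aggregation layer is shown, and all names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def recurrent_node(h_prev: Vector, x: Vector, w_h: float = 0.5, w_x: float = 0.5) -> Vector:
    """Toy recurrent node: elementwise tanh(w_h * h_prev + w_x * x)."""
    return [math.tanh(w_h * h + w_x * xi) for h, xi in zip(h_prev, x)]


def aggregation_node(states: List[Vector]) -> Vector:
    """Toy aggregation node: elementwise mean of all incoming states (assumed rule)."""
    return [sum(column) / len(states) for column in zip(*states)]


def encode_with_aggregation(inputs: List[Vector], block: int = 2) -> List[Vector]:
    """Run recurrent nodes over `inputs`; every `block` steps an aggregation node
    combines the block's recurrent states, and its output replaces the
    recurrent-to-recurrent link into the next time position."""
    if block < 1:
        raise ValueError("block must be >= 1")
    h = [0.0] * len(inputs[0])
    outputs: List[Vector] = []
    block_states: List[Vector] = []
    for t, x in enumerate(inputs, start=1):
        h = recurrent_node(h, x)
        block_states.append(h)
        if t % block == 0:
            h = aggregation_node(block_states)  # the next recurrent node reads this
            block_states = []
        outputs.append(h)
    return outputs
```

Because each aggregation node sees every state in its block, information from several earlier positions reaches later nodes directly instead of only through a single chain of recurrent updates.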

3 Results
All experiments were run on a computer with an NVIDIA GeForce RTX 3090, and the model was built with the PyTorch 2.0.1 framework. In this study, we used highly sensitive PM2.5 and SO2 sensors, 50 in total, deployed along major traffic arteries and in residential areas of Beijing's Changping and Shunyi districts. The sensors have an accuracy of ±1 µg/m³, ensuring the precision of the collected data. The data span March 1, 2013, to February 28, 2017, covering all four seasons, which is crucial for capturing the seasonal fluctuations of pollutants. PM2.5 and SO2 were selected because of their significant impact on public health and environmental quality. The raw data were preprocessed as follows:
- Missing value filling: Missing values were handled using a combination of forward-fill and backward-fill methods, supplemented by interpolation for longer gaps.
- Normalization: All data were normalized to a standard scale to reduce the impact of varying magnitudes on the model training process.
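The two preprocessing steps can be sketched as follows. The rule for choosing between filling and interpolation, the gap threshold (`short_gap`), and the normalization scheme are not stated in the paper; this sketch assumes that short gaps are forward-filled (backward-filled at the start of the series), longer interior gaps are linearly interpolated, and "standard scale" means z-score normalization with statistics computed on the training portion only. Function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def fill_missing(values: Sequence[Optional[float]], short_gap: int = 2) -> List[float]:
    """Fill None entries: short gaps by forward fill (backward fill at the start),
    longer interior gaps by linear interpolation between the neighbours."""
    out: List[Optional[float]] = list(values)
    n = len(out)
    if all(v is None for v in out):
        raise ValueError("series has no observed values")
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        j = i
        while j < n and out[j] is None:
            j += 1  # out[i:j] is one contiguous gap
        left = out[i - 1] if i > 0 else None
        right = out[j] if j < n else None
        gap = j - i
        for k in range(i, j):
            if left is None:
                out[k] = right  # leading gap: backward fill
            elif right is None or gap <= short_gap:
                out[k] = left  # trailing or short gap: forward fill
            else:
                frac = (k - i + 1) / (gap + 1)
                out[k] = left + frac * (right - left)  # long gap: interpolate
        i = j
    filled = [v for v in out if v is not None]
    assert len(filled) == n  # every gap has been filled
    return filled


def zscore(train: Sequence[float], series: Sequence[float]) -> Tuple[List[float], float, float]:
    """Z-score `series` using the mean/std of the training portion only (avoids leakage)."""
    mean = sum(train) / len(train)
    std = (sum((v - mean) ** 2 for v in train) / len(train)) ** 0.5 or 1.0
    return [(v - mean) / std for v in series], mean, std
```

Fitting the normalization statistics on the training split and reusing them for validation and test data keeps information from future observations out of training.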
3.1 Parameter Settings and Baselines
After repeated training and testing experiments, the best-performing parameters of the deep aggregation seq2seq network with time feature fusion were set as follows: (1) the historical input window is 12 h and the prediction horizon is 1 h; (2) the cross-attention module has 4 heads; (3) each patch spans 5 time steps, the stride between adjacent patches is 1, the convolution kernel size is (1, 3), and the number of convolutional channels is 16; (4) the batch size is 32 and the learning rate is 1e−4.
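The reported settings and the windowing they imply can be sketched as follows. The `Config` container and helper names are hypothetical; the values are those listed above. The sketch assumes that each training sample pairs the previous 12 hourly values with the single value `horizon` hours after the window, and that batches are formed from consecutive samples without shuffling.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class Config:
    """Hyperparameters as reported in Section 3.1."""
    input_len: int = 12          # hours of history fed to the model
    horizon: int = 1             # hours ahead to predict
    n_heads: int = 4             # cross-attention heads
    patch_len: int = 5           # time steps per patch
    patch_stride: int = 1        # step between adjacent patches
    kernel_size: Tuple[int, int] = (1, 3)
    conv_channels: int = 16
    batch_size: int = 32
    learning_rate: float = 1e-4


def sliding_windows(series: Sequence[float], input_len: int, horizon: int) -> List[Tuple[List[float], float]]:
    """Build (history, target) pairs; the target is the value `horizon` steps after the window."""
    pairs = []
    for start in range(len(series) - input_len - horizon + 1):
        history = list(series[start:start + input_len])
        target = series[start + input_len + horizon - 1]
        pairs.append((history, target))
    return pairs


def batches(items: List, batch_size: int) -> Iterator[List]:
    """Yield consecutive mini-batches; the last one may be smaller."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


if __name__ == "__main__":
    cfg = Config()
    pairs = sliding_windows([float(v) for v in range(100)], cfg.input_len, cfg.horizon)
    print(len(pairs), [len(b) for b in batches(pairs, cfg.batch_size)])
```

A 100-hour series thus gives 100 − 12 − 1 + 1 = 88 samples, i.e. two full batches of 32 and a final batch of 24.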
To validate the performance of our model, we compare it with nine baseline models: ARIMA [13], SVM [19], BPNN [25], Lag-LSTM [30], EMD-GRU [29], TLS-BLSTM [33], MTCAN [34], RF-LSTM [38], and EEMD-LSTM [40]. The parameters of the baseline models are configured as described in the cited literature.
Table 2 reports the MAE and RMSE of all models on both datasets. Following the table, we summarize the features that distinguish our deep aggregation seq2seq network with time feature fusion from the baseline methods.
Model | Changping PM2.5 MAE | Changping PM2.5 RMSE | Changping SO2 MAE | Changping SO2 RMSE | Shunyi PM2.5 MAE | Shunyi PM2.5 RMSE | Shunyi SO2 MAE | Shunyi SO2 RMSE |
---|---|---|---|---|---|---|---|---|
ARIMA | 16.8248 | 27.7276 | 4.7984 | 7.5883 | 17.1454 | 27.2350 | 4.0453 | 7.9048 |
SVM | 15.9724 | 26.6219 | 4.2232 | 7.1587 | 16.8475 | 27.2350 | 3.8177 | 7.4829 |
BPNN | 15.5665 | 26.1244 | 3.9011 | 7.3313 | 16.2116 | 26.5266 | 3.7361 | 7.2816 |
Lag-LSTM | 14.6125 | 24.8103 | 3.5038 | 6.4877 | 14.3791 | 24.6920 | 2.9924 | 6.2906 |
EMD-GRU | 14.6920 | 25.0398 | 3.3825 | 6.3880 | 14.6664 | 24.9653 | 2.9494 | 6.2400 |
TLS-BLSTM | 13.3593 | 22.5522 | 3.0399 | 5.6812 | 13.8334 | 24.1852 | 2.8773 | 6.1968 |
MTCAN | 13.0290 | 22.2559 | 2.9084 | 5.5015 | 13.1675 | 23.5432 | 2.6743 | 5.8538 |
RF-LSTM | 12.6973 | 21.9628 | 2.7952 | 5.4606 | 12.7795 | 21.9325 | 2.5680 | 5.9397 |
EEMD-LSTM | 12.4033 | 21.6906 | 2.7172 | 5.4864 | 12.9165 | 22.0250 | 2.5596 | 5.9387 |
Ours | 11.2443 | 19.4102 | 2.5382 | 5.0532 | 11.3795 | 20.9197 | 2.4642 | 5.8243 |
Temporal Feature Fusion: Unlike traditional models such as ARIMA and SVM, which do not explicitly model calendar attributes, our model integrates time features (month, day, hour) through a cross-attention mechanism, capturing seasonal and daily variations in pollutant concentrations.
Deep Aggregation Architecture: Our model's architecture is designed to capture both local and global temporal correlations within the data, setting it apart from shallow models like BPNN and even some deep learning models that may not fully explore temporal dependencies.
Recursive Aggregation in Decoder: A distinctive feature of our model is the use of recursive aggregation in the decoder, which refines predictions by aggregating information from multiple previous time steps, whereas RF has no recurrent mechanism and a standard LSTM passes information forward only through its hidden state.
Cross-Attention Mechanism: Our model employs a cross-attention mechanism to dynamically weigh the importance of temporal features relative to historical concentration data, offering adaptability beyond the scope of models without such a mechanism, such as BPNN or RF.
L2 Regularization for Generalization: To combat overfitting, our model includes L2 regularization, which is less commonly used in models like EMD-GRU and MTCAN, enhancing the model's ability to generalize across different datasets.
3.2 Performance Superiority Analysis
Table 2 compares the MAE and RMSE of the forecasting models on the two datasets; our model achieves the smallest error in every column. The ARIMA model is linear and struggles to capture the nonlinear information in historical air pollutant series, so it has the largest error on every prediction task. Although SVM and BPNN have smaller errors than ARIMA, they are shallow statistical structures with a weak ability to represent the nonlinear characteristics of historical concentration data, so their prediction performance remains limited. Lag-LSTM, EMD-GRU, TLS-BLSTM, and MTCAN are deep learning models with stronger nonlinear representation ability and therefore achieve better prediction performance. The deep aggregation seq2seq network we designed not only integrates temporal features but also uses the local and global temporal correlations of historical data simultaneously. Compared with the best-performing baseline in each column, on the Changping dataset our model reduces the MAE of PM2.5 by 9.34% and the RMSE by 10.51%, and the MAE of SO2 by 6.59% and the RMSE by 7.46%. On the Shunyi dataset, it reduces the MAE of PM2.5 by 10.96% and the RMSE by 4.62%, and the MAE of SO2 by 3.73% and the RMSE by 0.50%.
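The relative reductions can be recomputed from Table 2 with the sketch below, which also spells out the standard MAE and RMSE definitions (the paper does not restate them). For each column, the best baseline is the lowest-error baseline model in that column of Table 2; the dictionary name `BEST_VS_OURS` is hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage error reduction of our model relative to a baseline."""
    return 100.0 * (baseline - ours) / baseline


# (area, pollutant, metric) -> (best baseline, its error, our error), from Table 2.
BEST_VS_OURS: Dict[Tuple[str, str, str], Tuple[str, float, float]] = {
    ("Changping", "PM2.5", "MAE"): ("EEMD-LSTM", 12.4033, 11.2443),
    ("Changping", "PM2.5", "RMSE"): ("EEMD-LSTM", 21.6906, 19.4102),
    ("Changping", "SO2", "MAE"): ("EEMD-LSTM", 2.7172, 2.5382),
    ("Changping", "SO2", "RMSE"): ("RF-LSTM", 5.4606, 5.0532),
    ("Shunyi", "PM2.5", "MAE"): ("RF-LSTM", 12.7795, 11.3795),
    ("Shunyi", "PM2.5", "RMSE"): ("RF-LSTM", 21.9325, 20.9197),
    ("Shunyi", "SO2", "MAE"): ("EEMD-LSTM", 2.5596, 2.4642),
    ("Shunyi", "SO2", "RMSE"): ("MTCAN", 5.8538, 5.8243),
}

if __name__ == "__main__":
    for (area, pollutant, metric), (name, base, ours) in BEST_VS_OURS.items():
        print(f"{area} {pollutant} {metric}: {relative_reduction(base, ours):.2f}% vs {name}")
```

Comparing against the best baseline per column, rather than an average over all baselines, gives the most conservative estimate of the improvement.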
We also verified that the deep aggregation seq2seq network with time feature fusion retains good prediction accuracy at other prediction horizons by comparing it with the SVM, Lag-LSTM, and MTCAN models for prediction horizons of 1–11 h. Figure 5 shows the MAE of the four models at each horizon: the MAE of our model is smaller than that of the baseline models at every horizon, indicating that it maintains good prediction performance as the prediction horizon increases.

3.3 Generalizability of the Model
To assess the generalizability of our deep aggregation seq2seq network with time feature fusion, we conducted K-fold cross-validation, which evaluates how well the model performs on unseen data from different partitions of the same dataset. We divided the entire dataset into K = 5 subsets. In each fold, the model was trained on K − 1 subsets and tested on the remaining subset; this process was repeated K times so that each subset served as the test set once. The MAE and RMSE for each fold are shown in Table 3.
Fold | Changping PM2.5 MAE | Changping PM2.5 RMSE | Changping SO2 MAE | Changping SO2 RMSE | Shunyi PM2.5 MAE | Shunyi PM2.5 RMSE | Shunyi SO2 MAE | Shunyi SO2 RMSE |
---|---|---|---|---|---|---|---|---|
1 | 11.1567 | 19.2386 | 2.4565 | 4.8969 | 11.2860 | 20.7361 | 2.3965 | 5.6828 |
2 | 11.2685 | 19.4478 | 2.5232 | 5.0126 | 11.3919 | 20.8426 | 2.4617 | 5.7523 |
3 | 11.3754 | 19.6531 | 2.4853 | 4.9437 | 11.4122 | 20.9834 | 2.4156 | 5.7125 |
4 | 11.0959 | 19.1236 | 2.4164 | 4.8233 | 11.2943 | 20.6737 | 2.3762 | 5.6238 |
5 | 11.2443 | 19.4102 | 2.5382 | 5.0532 | 11.3795 | 20.9197 | 2.4642 | 5.8243 |
The cross-validation results for PM2.5 and SO2 (Table 3) are consistent with the performance reported in Table 2 for our model, indicating that the model performs stably across different data splits for both pollutants. The small variations in MAE and RMSE across folds suggest that the model is only mildly sensitive to how the data are split and does not overfit any particular subset, which is important for real-world applications in which the data distribution may vary.
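The fold construction described above can be sketched as follows. The paper does not say whether samples were shuffled before splitting; this sketch assumes contiguous, unshuffled folds, which avoids placing temporally adjacent (and therefore highly correlated) samples on both sides of a split. The helper name `kfold_indices` is hypothetical, and the usage lines summarize the Changping PM2.5 MAE column of Table 3.

```python
import statistics
from typing import Iterator, List, Tuple


def kfold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k contiguous, non-overlapping test folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Changping PM2.5 MAE for folds 1-5 (Table 3).
changping_pm25_mae = [11.1567, 11.2685, 11.3754, 11.0959, 11.2443]

if __name__ == "__main__":
    print(f"mean = {statistics.mean(changping_pm25_mae):.4f}, "
          f"sample std = {statistics.stdev(changping_pm25_mae):.4f}")
```

Reporting the mean and standard deviation across folds summarizes in one line how stable the model is across data splits.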
3.4 Interpretability of the Model
This section examines the interpretability of our deep aggregation seq2seq network with time feature fusion. To understand the influence of each feature, we performed a feature importance analysis using SHAP (SHapley Additive exPlanations), which assigns each feature a score representing its contribution to the model's output. The results are shown in Table 4.
Feature | Changping PM2.5 | Changping SO2 | Shunyi PM2.5 | Shunyi SO2 |
---|---|---|---|---|
Month Embedding | 0.15 | 0.12 | 0.14 | 0.13 |
Day of Month | 0.21 | 0.18 | 0.20 | 0.19 |
Hour Embedding | 0.11 | 0.10 | 0.13 | 0.08 |
Historical PM2.5 | 0.30 | 0.40 | 0.29 | 0.43 |
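These values indicate that historical pollutant concentrations have the largest impact on predictions, and that this impact is more pronounced for SO2 (0.40 in Changping and 0.43 in Shunyi) than for PM2.5 (0.30 and 0.29). Among the time features, day of the month contributes the most in every column.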
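SHAP approximates Shapley values from cooperative game theory. For a model with only a few inputs, they can be computed exactly by enumerating feature coalitions and replacing "absent" features with baseline values. The sketch below does this for a hypothetical four-input `toy_model` standing in for the month, day-of-month, hour and historical-concentration inputs of Table 4. It illustrates the attribution principle only; it is not the SHAP library's API or the paper's explainer configuration, and baseline replacement is only one of several ways to define a missing feature.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(predict: Callable[[List[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of `predict` at x, with absent features set to baseline."""
    n = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: List[float]) -> float:
    """Hypothetical model over [month, day, hour, history], with one interaction term."""
    return 0.2 * z[0] + 0.3 * z[1] + 0.1 * z[2] + 0.6 * z[3] + 0.5 * z[0] * z[3]


if __name__ == "__main__":
    print(shapley_values(toy_model, [1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]))
```

For this toy model the attributions are [0.45, 0.3, 0.1, 0.85]: the interaction term is split evenly between the month and history inputs, and the attributions sum to the difference between the prediction at x and at the baseline. Exact enumeration costs O(2^n) model calls, which is why SHAP uses approximations for larger inputs.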
3.5 Performance Analysis of Key Modules
To verify how the two key modules of the deep aggregation seq2seq network, feature fusion and feature extraction, affect prediction accuracy, we built three ablation models. Ablation model 1 removes feature fusion from the original model, ablation model 2 removes the aggregation nodes, and ablation model 3 removes the recurrent nodes. The results for the Changping and Shunyi datasets are shown in Tables 5 and 6, respectively.
Pollutant | Feature fusion | Aggregation nodes | Recurrent nodes | MAE | RMSE |
---|---|---|---|---|---|
PM2.5 | × | √ | √ | 13.0363 | 22.3092 |
PM2.5 | √ | × | √ | 14.3056 | 24.3791 |
PM2.5 | √ | √ | × | 16.3604 | 27.0990 |
PM2.5 | √ | √ | √ | 11.2443 | 19.4102 |
SO2 | × | √ | √ | 3.0302 | 5.5343 |
SO2 | √ | × | √ | 3.7349 | 6.5606 |
SO2 | √ | √ | × | 4.5958 | 7.5724 |
SO2 | √ | √ | √ | 2.5382 | 5.0532 |
Pollutant | Feature fusion | Aggregation nodes | Recurrent nodes | MAE | RMSE |
---|---|---|---|---|---|
PM2.5 | × | √ | √ | 13.3617 | 23.7389 |
PM2.5 | √ | × | √ | 14.6664 | 24.9653 |
PM2.5 | √ | √ | × | 16.5326 | 26.8769 |
PM2.5 | √ | √ | √ | 11.3795 | 20.9197 |
SO2 | × | √ | √ | 2.8486 | 5.9776 |
SO2 | √ | × | √ | 2.9494 | 6.2906 |
SO2 | √ | √ | × | 3.7361 | 7.2816 |
SO2 | √ | √ | √ | 2.4642 | 5.8243 |
As Tables 5 and 6 show, feature fusion, aggregation nodes, and recurrent nodes all improve the model's predictive precision, and the recurrent nodes have the greatest impact: removing them produces the largest errors in every case. The recurrent nodes capture the temporal information in prior records, which has the greatest influence on future values (Figure 6).

To show the impact of each key module on model performance more intuitively, we randomly selected high-, medium-, and low-concentration periods from the Changping PM2.5 dataset and visualized the true values together with the predictions of the four models (the original model and the three ablation models), as shown in Figure 4. For the high- and medium-concentration test samples, the predictions of the original model are closer to the true values than those of the three ablation models, whereas for the low-concentration test samples, the predictions of all four models are almost identical to the true values. This indicates that the key modules mainly affect predictions during higher-concentration periods, which are the most valuable in practical applications.
To quantify the computational cost of each module, Figure 7 shows the average computation time of the original model and each ablation model on the two datasets. The feature fusion module adds the least computation time, whereas the recurrent nodes add the most, because their sequential dependencies reduce the model's capacity for parallel processing.

3.6 Practicality and Engineering Context
While our proposed deep aggregation seq2seq network with time feature fusion is theoretically robust, it is essential to address its practical implications and how it can be integrated into existing air quality prediction infrastructures. This section aims to provide insights into the model's applicability in real-world scenarios and its ease of implementation.
3.6.1 Integration Into Existing Prediction Infrastructures
Compatibility with Current Systems: Our model is designed to be compatible with existing air quality monitoring systems. It can be integrated into current data pipelines that collect real-time air quality data from sensors, allowing for seamless adoption without requiring extensive modifications to existing infrastructure.
User-Friendly Implementation: We emphasize simplicity of implementation. The model can be deployed with widely used frameworks such as PyTorch or TensorFlow, which are familiar to many engineers and data scientists in environmental monitoring; this shortens the learning curve and speeds up adoption.
Scalability: The architecture of our model allows for scalability. It can be trained on varying sizes of datasets, from small local datasets to large-scale data collected from multiple cities. This flexibility makes it suitable for different engineering contexts, whether in urban planning or environmental management.
Real-Time Prediction Capability: The model's ability to process data and predict air quality in real time is a significant advantage. This capability is crucial for smart city applications, where timely information can inform policy decisions and public health initiatives.
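Real-time prediction requires assembling each model input from the most recent readings together with their time attributes (month, day of the month and hour). The sketch below shows one way to do this with a fixed-length rolling buffer. The sine/cosine encoding, the class name `RollingWindow` and the 24-hour window in the usage note are assumptions for illustration, not the paper's encoding or configuration.

```python
import math
from collections import deque
from datetime import datetime


def time_features(ts: datetime):
    """Encode month, day of month and hour as sine/cosine pairs.

    Assumed cyclical encoding; day of month uses a fixed 31-day period,
    which slightly distorts shorter months.
    """
    def cyc(value, period):
        angle = 2 * math.pi * value / period
        return [math.sin(angle), math.cos(angle)]
    return cyc(ts.month - 1, 12) + cyc(ts.day - 1, 31) + cyc(ts.hour, 24)


class RollingWindow:
    """Keeps the most recent `size` readings for online (streaming) prediction."""

    def __init__(self, size: int):
        self.size = size
        self.values = deque(maxlen=size)  # oldest readings drop out automatically
        self.times = deque(maxlen=size)

    def push(self, ts: datetime, value: float) -> None:
        self.values.append(value)
        self.times.append(ts)

    def ready(self) -> bool:
        return len(self.values) == self.size

    def model_input(self):
        """Return (concentration series, per-step time features) once full."""
        if not self.ready():
            raise ValueError("window not yet full")
        return list(self.values), [time_features(t) for t in self.times]
```

Each new sensor reading is pushed into the buffer (for example `RollingWindow(24)` for hourly data); once `ready()` returns true, `model_input()` yields the concentration series and the per-step time features that would be passed to the trained model.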
3.6.2 Practical Applications
Urban Air Quality Management: Our model can be utilized by city planners and environmental agencies to forecast air quality levels, enabling proactive measures to mitigate pollution. For instance, predictions can inform traffic management strategies or industrial emissions regulations.
Public Health Monitoring: By integrating our model into public health monitoring systems, authorities can provide timely alerts to citizens about poor air quality days, helping to protect vulnerable populations.
Policy Formulation: The insights derived from our model can assist policymakers in formulating effective air quality management strategies based on predicted pollution levels, thus enhancing the overall effectiveness of environmental policies.
3.6.3 Addressing Complexity and Practical Challenges
While the deep aggregation seq2seq network with time feature fusion is more complex than traditional models, we have taken several steps to ensure its practicality:
Modular Design: The model is designed with modular components, allowing for easy integration into existing systems. Each module (e.g., feature fusion, deep aggregation) can be independently tested and optimized, reducing the complexity of implementation.
Optimized Computational Efficiency: We have optimized the model's computational efficiency to ensure that it can run on standard hardware without requiring specialized infrastructure. This makes it accessible to a wide range of users, including those with limited computational resources.
Comprehensive Documentation: To facilitate adoption, we provide comprehensive documentation that includes step-by-step instructions for deploying the model, as well as examples of how it can be integrated into existing air quality monitoring systems.
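As an illustration of testing a module in isolation, the sketch below implements a multi-head cross-attention fusion step in NumPy, in which the embedded pollutant series attends to the embedded time features. The query and key/value roles, the residual connection, the class name `CrossAttentionFusion` and the random (untrained) weights are assumptions; the paper's own implementation is not reproduced here.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class CrossAttentionFusion:
    """Multi-head cross-attention: pollutant embeddings (queries) attend to
    time-feature embeddings (keys/values). Weights are random, not trained."""

    def __init__(self, d_model: int, n_heads: int, seed: int = 0):
        assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
        rng = np.random.default_rng(seed)
        self.h, self.d_k = n_heads, d_model // n_heads
        scale = 1.0 / np.sqrt(d_model)
        self.w_q, self.w_k, self.w_v, self.w_o = (
            rng.normal(0.0, scale, (d_model, d_model)) for _ in range(4)
        )

    def _split(self, x):
        # (T, d_model) -> (heads, T, d_k)
        return x.reshape(x.shape[0], self.h, self.d_k).transpose(1, 0, 2)

    def __call__(self, pollutant, time_feat):
        """pollutant: (T, d_model); time_feat: (S, d_model)."""
        q = self._split(pollutant @ self.w_q)
        k = self._split(time_feat @ self.w_k)
        v = self._split(time_feat @ self.w_v)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(self.d_k)  # (heads, T, S)
        attn = softmax(scores, axis=-1)
        heads = attn @ v  # (heads, T, d_k)
        merged = heads.transpose(1, 0, 2).reshape(pollutant.shape[0], -1)
        # Residual connection keeps the original concentration information.
        return pollutant + merged @ self.w_o, attn
```

Because the module depends only on its two input arrays, its output shapes and the normalization of the attention weights can be checked with small random inputs, independently of the rest of the network.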
3.6.4 Case Study: Integration With Existing Systems
To demonstrate the practical applicability of our model, we conducted a case study in which we integrated it into an existing air quality monitoring system in Beijing. The system collects real-time data from multiple sensors and uses a combination of traditional statistical models and machine learning algorithms to predict air quality. We replaced the existing prediction module with our deep aggregation seq2seq network and evaluated its performance over a 6-month period.
The results showed that our model not only improved prediction accuracy but also seamlessly integrated with the existing infrastructure. The system's operators reported that the model was easy to deploy and required minimal adjustments to the existing data pipeline. This case study demonstrates that our model can be effectively integrated into real-world air quality monitoring systems, providing accurate and timely predictions without significant additional complexity.
4 Access Policy Analysis
In this section, we delve into the access policies related to air quality data and their implications for the practical applicability and effectiveness of air pollution prediction models. Access policies play a crucial role in determining how data is collected, shared, and utilized, which directly impacts the performance and generalizability of predictive models.
4.1 Data Accessibility and Sharing Policies
Access to high-quality air quality data is essential for the development and validation of predictive models. However, data accessibility is often governed by strict policies that vary across regions and institutions. For instance, in some countries, air quality data is freely available to the public, while in others, it is restricted to government agencies or research institutions. These policies can significantly affect the ability of researchers to develop accurate and generalizable models.
4.2 Privacy and Ethical Considerations
Air quality data often contains sensitive information, such as the location of monitoring stations and the health status of populations in specific areas. Therefore, privacy and ethical considerations must be taken into account when designing access policies. Ensuring that data is anonymized and that individuals' privacy is protected is crucial for maintaining public trust and encouraging data sharing.
4.3 Policy Implications for Model Generalizability
The effectiveness of air pollution prediction models in real-world scenarios is highly dependent on the quality and diversity of the data used for training. Access policies that promote data sharing and collaboration between different regions and institutions can enhance the generalizability of models. Conversely, restrictive policies may limit the availability of diverse datasets, leading to models that perform well in specific contexts but fail to generalize to new environments.
4.4 Recommendations for Policy Makers
To improve the practical applicability of air pollution prediction models, we recommend the following policy measures:
Promote Open Data Initiatives: Governments and institutions should encourage the sharing of air quality data through open data initiatives, making datasets freely available to researchers and the public.
Standardize Data Collection Protocols: Establishing standardized protocols for data collection and reporting can improve the consistency and quality of air quality data, facilitating the development of more accurate models.
Enhance Data Privacy Protections: While promoting data sharing, it is essential to implement robust privacy protections to ensure that sensitive information is not compromised.
Foster International Collaboration: Encouraging international collaboration and data sharing can help create more comprehensive datasets, enabling the development of models that are generalizable across different regions and climates.
By addressing these policy considerations, we can enhance the practical applicability and effectiveness of air pollution prediction models, ultimately contributing to better air quality management and public health outcomes.
5 Discussion
In this paper, a deep aggregation seq2seq network with time feature fusion is proposed for air pollutant concentration prediction. The network integrates time feature embeddings into historical air pollutant concentration series through a cross-attention module, and then uses a deep aggregation seq2seq network to mine local and global temporal correlations. Experiments on real datasets from the Changping and Shunyi districts of Beijing show that the proposed model reduces the Mean Absolute Error (MAE) by approximately 5% to 10% compared with the latest deep learning models. Furthermore, the ablation study of key model components confirms the positive impact of the feature fusion and feature extraction modules on predictive accuracy. Specifically, the feature fusion module integrates temporal features with pollutant concentration data through a multi-head cross-attention mechanism, while the feature extraction module uncovers temporal correlations using the deep aggregation seq2seq network. The synergistic effect of these two modules significantly boosts the model's performance.
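The reported improvement is a relative reduction in MAE. For n test samples with observed values y_i and predictions ŷ_i, MAE = (1/n) Σ |y_i - ŷ_i|, and the relative reduction against a baseline is (MAE_baseline - MAE_model) / MAE_baseline × 100%. A minimal Python version follows; the function names are illustrative, and any numbers passed to them are placeholders rather than results from the paper.

```python
def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted concentrations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(mae_baseline, mae_model):
    """Percentage by which mae_model is lower than mae_baseline."""
    if mae_baseline <= 0:
        raise ValueError("baseline MAE must be positive")
    return 100.0 * (mae_baseline - mae_model) / mae_baseline
```

For example, a baseline MAE of 20.0 and a model MAE of 18.5 correspond to a 7.5% reduction, which would fall within the 5% to 10% range stated above.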
Nonetheless, the model has limitations, including its reliance on high-quality historical data, its high computational complexity, and the risk of overfitting. Future research could improve the model's generalizability, reduce its computational load, and adapt it to real-time data streams. These improvements would help apply the model in various settings and provide more effective tools for air quality management in smart cities.
Author Contributions
Yunzhu Liu: conceptualization, investigation, writing – original draft, methodology, validation, writing – review and editing, visualization, software, formal analysis, project administration, data curation, resources, supervision.
Conflicts of Interest
The author declares no conflicts of interest.
Open Research
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.