Time-Series Mapping of PM10 Concentration Using Multi-Gaussian Space-Time Kriging: A Case Study in the Seoul Metropolitan Area, Korea
Abstract
This paper presents space-time kriging within a multi-Gaussian framework for time-series mapping of particulate matter less than 10 μm in aerodynamic diameter (PM10) concentration. To account for the spatiotemporal autocorrelation structures of monitoring data and to model the uncertainties attached to the prediction, conventional multi-Gaussian kriging is extended to the space-time domain. Multi-Gaussian space-time kriging presented in this paper is based on decomposition of the PM10 concentrations into deterministic trend and stochastic residual components. The deterministic trend component is modelled and regionalized using the temporal elementary functions. For the residual component which is the main target for space-time kriging, spatiotemporal autocorrelation information is modeled and used for space-time mapping of the residual. The conditional cumulative distribution functions (ccdfs) are constructed by using the trend and residual components and space-time kriging variance. Then, the PM10 concentration estimate and conditional variance are empirically obtained from the ccdfs at all locations in the study area. A case study using the monthly PM10 concentrations from 2007 to 2011 in the Seoul metropolitan area, Korea, illustrates the applicability of the presented method. The presented method generated time-series PM10 concentration mapping results as well as supporting information for interpretations, and led to better prediction performance, compared to conventional spatial kriging.
1. Introduction
Outdoor air pollution has been known as one of the risk factors that affect human health directly and/or indirectly [1–4]. In Korea, it is reported that long-term exposure to ambient air pollution has a reasonable association with tuberculosis, cardiovascular diseases, and preterm delivery [5–7]. Thus, periodic monitoring and management of air pollution are required for exposure assessment for effective health management.
In Korea, several air pollutants including particulate matter less than 10 μm in aerodynamic diameter (PM10), sulfur dioxide (SO2), carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), and particulate matter less than 2.5 μm in aerodynamic diameter (PM2.5) have been periodically measured at monitoring stations. Based on this real-time monitoring of air pollution, air quality levels are provided to the public [8]. Because the monitoring stations are few, however, it is very difficult to analyze the spatial characteristics and spatiotemporal dynamics of air pollutants over a wide study area during a predefined time interval [9]. To overcome these difficulties, spatial interpolation or prediction is routinely applied to the sparse air pollutant observations to obtain exhaustive concentration values over the study area.
Among various spatial interpolation methods, geostatistical kriging has been widely applied to spatial interpolation tasks, due to its ability to account for spatial autocorrelation structures inherent to sample data and to integrate auxiliary data [10, 11]. When kriging is applied for spatial interpolation, spatial autocorrelation structures are quantified by the variogram, which describes the spatial variability between samples as a function of distance [10]. If only sparsely sampled data are available, distinct spatial autocorrelation structures may not be captured from the sample data. As a result, kriging may show no better prediction performance than deterministic interpolation methods such as inverse distance weighting. If data such as air pollutants, temperature, and precipitation are collected at a limited number of locations but continuously in time, temporal autocorrelation information may compensate for the lack of spatial autocorrelation information and improve the prediction performance for spatial interpolation tasks. For this kind of space-poor but time-rich data, conventional geostatistical kriging, which was developed to consider spatial autocorrelation information only, can be extended to space-time kriging [12]. Space-time kriging or simulation has been applied to time-series mapping of various environmental variables such as air pollutants, temperature, and precipitation [13–16]. Despite its great potential for time-series mapping, however, uncertainties attached to the interpolation have not been fully accounted for. Most approaches have focused on the generation and interpretation of spatiotemporal mapping results. To the author’s knowledge, very few studies have been conducted using stochastic simulation [14], and local uncertainty assessment based on space-time kriging, which does not require heavy computational cost, has not been fully considered.
Recently, Park [17] presented a multi-Gaussian framework for time-series mapping of environmental variables. Because the case study in [17] was carried out in a very small area, however, the applicability of the framework to larger areas should be thoroughly investigated.
The main objective of this paper is to present space-time kriging capable of providing uncertainty assessment information and time-series mapping of PM10 concentrations. Within a spatial time-series framework [14, 17], conventional spatial multi-Gaussian kriging is extended to a space-time domain and its potential is illustrated via a case study of monthly PM10 concentration mapping in the Seoul metropolitan area, Korea.
2. Study Area and Data
A case study was conducted in the Seoul metropolitan area of Korea which includes 66 provincial districts in Seoul city, Incheon city, and Gyeonggi province (Figure 1). The metropolitan area covers approximately 11.78% of the entire land area of Korea and accounts for 49.07% of the entire population of Korea, as of 2014 and 2010, respectively [7, 18]. The study area comprised various types of land-covers, including the large urban areas of Seoul city and Incheon city located in the central and western parts of the study area, and the forests and agricultural lands (78.46% of the whole study area) located in the northern and eastern parts of the study area.

A monthly PM10 concentration dataset collected at 94 monitoring stations in the study area from January 2007 to December 2011 (60 months) was downloaded from the AirKorea website [8] and used for the case study. As shown in Figure 1, each district in the Seoul metropolitan city includes one station, but there are very few monitoring stations in other districts in Gyeonggi province, which comprised nearly half (47.74%) of the study area. This location information on the monitoring stations implies that relatively large uncertainties may be attached to the sparsely sampled locations. Within the administrative boundaries, 500 m interval grid points were generated and PM10 concentrations were mapped at these points. It should be noted that the main purpose of this case study is to exemplify the analytical procedures and potential of the geostatistical approach presented in this paper, not to reveal detailed local characteristics of PM10 concentrations in the study area.
3. Method
Figure 2 illustrates the entire procedure for the multi-Gaussian spatial time-series approach presented in this paper.

3.1. Multi-Gaussian Spatial Time-Series Approach
3.2. Trend Component Modeling
The above two regression coefficients are only available at monitoring stations after linear regression. Thus, they should be interpolated at all grid points in the study area for all time intervals in order to obtain the trend component distributions. If reasonable correlations are observed between two coefficients, simple cokriging, which can account for both the autocorrelation structures of the two coefficients and the cross-correlation structure between them, can be applied for spatial interpolation. Otherwise, univariate kriging or another deterministic interpolation method is independently applied to each coefficient. After regionalization or interpolation of the two coefficients, a trend component over the study area at each month was obtained by combining the interpolated coefficients with the spatially averaged time-series set.
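The regression step that yields the two coefficients can be sketched as follows. This is a minimal illustration, not the author's implementation: each station's time series is regressed on the spatially averaged series by ordinary least squares, giving one intercept and one slope per station. The function names and the toy values are hypothetical.

```python
from statistics import fmean

def spatial_average(series_by_station):
    """Average the station series month by month (the temporal profile)."""
    n_months = len(next(iter(series_by_station.values())))
    return [fmean(s[t] for s in series_by_station.values())
            for t in range(n_months)]

def fit_trend_coefficients(station_series, profile):
    """Least-squares fit: station_series[t] ~ intercept + slope * profile[t]."""
    mx, my = fmean(profile), fmean(station_series)
    sxx = sum((x - mx) ** 2 for x in profile)
    sxy = sum((x - mx) * (y - my) for x, y in zip(profile, station_series))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Toy data (made up): three stations, six months of normal-score values.
stations = {
    "A": [0.2, 0.8, 1.1, 0.1, -0.9, -1.2],
    "B": [0.0, 0.5, 0.9, -0.1, -1.0, -1.4],
    "C": [0.4, 1.1, 1.3, 0.3, -0.8, -1.0],
}
profile = spatial_average(stations)
coeffs = {name: fit_trend_coefficients(s, profile)
          for name, s in stations.items()}

# Trend and residual at station A; the residual is what space-time kriging models.
trend_A = [coeffs["A"][0] + coeffs["A"][1] * m for m in profile]
residual_A = [z - t for z, t in zip(stations["A"], trend_A)]
```

At unmonitored grid points the same combination would be applied to the interpolated intercept and slope values instead of the fitted ones.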
3.3. Residual Component Modeling
The residual components, which are regarded as a second-order stationary random variable and are the main target of the geostatistical analysis, were modeled via space-time kriging.
3.4. ccdf Modeling
The space-time kriging estimate and variance for the residuals were used for fully characterizing the ccdf in a Gaussian space in (2). More specifically, the residual estimate at any grid point was added to the trend component at the corresponding grid point and then used as a mean value of the ccdf. Since the trend component was assumed to be deterministic, the space-time kriging variance was directly used as the variance value of the ccdf.
Unlike the kriging variance, which reflects only the proximity to the sample data, the conditional variance provides information on the spread of the conditional probability distribution function, that is, the steepness of the ccdf, and can thus be used as a quantitative measure of uncertainty. The larger the conditional variance, the greater the uncertainty attached to the prediction.
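One way to obtain the estimate and the conditional variance from a Gaussian ccdf is sketched below, under stated assumptions: the ccdf is discretized into equally probable quantiles in normal-score space, each quantile is back-transformed to concentration units through a transformation table, and the mean and variance of the back-transformed values are taken. The table, the clamped tails, and all names are hypothetical; the paper's own expressions may differ in detail.

```python
from bisect import bisect_left
from statistics import NormalDist, fmean, pvariance

def back_transform(score, table_scores, table_values):
    """Map a normal score to a concentration by linear interpolation.

    Scores outside the table are clamped to the smallest or largest value,
    a simplification of the tail extrapolation options used in practice.
    """
    if score <= table_scores[0]:
        return table_values[0]
    if score >= table_scores[-1]:
        return table_values[-1]
    j = bisect_left(table_scores, score)  # first index with table score >= score
    w = (score - table_scores[j - 1]) / (table_scores[j] - table_scores[j - 1])
    return table_values[j - 1] + w * (table_values[j] - table_values[j - 1])

def ccdf_summary(trend, residual_estimate, kriging_variance,
                 table_scores, table_values, n_quantiles=100):
    """Return (estimate, conditional variance) in concentration units."""
    # Gaussian ccdf: mean = trend + residual estimate, variance = kriging variance.
    ccdf = NormalDist(trend + residual_estimate, kriging_variance ** 0.5)
    probs = [(k + 0.5) / n_quantiles for k in range(n_quantiles)]
    quantiles = [back_transform(ccdf.inv_cdf(p), table_scores, table_values)
                 for p in probs]
    return fmean(quantiles), pvariance(quantiles)

# Hypothetical transformation table (normal score -> concentration).
table_scores = [-1.5, -0.5, 0.5, 1.5]
table_values = [30.0, 45.0, 60.0, 90.0]
estimate, cond_var = ccdf_summary(0.3, -0.1, 0.4, table_scores, table_values)
```

Because the back-transform is nonlinear, the conditional variance depends on the ccdf mean as well as on the kriging variance, which is why it carries information about the data values and not only about the sample configuration.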
In addition to the computation of PM10 concentration estimates and uncertainty measures from the ccdf, a probability of exceeding a certain critical concentration level can be easily computed. Based on this probability and the PM10 concentration estimates, misclassification risks, which are associated with the classification of the study areas into hazardous and safe classes, can be computed and then used for decision supporting information.
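The exceedance probability and the two misclassification risks can be illustrated with a short sketch. It assumes that the threshold has already been converted to a normal score and that a location is classified as hazardous when its concentration estimate exceeds the threshold; the function names and numbers are hypothetical.

```python
from statistics import NormalDist

def exceedance_probability(ccdf_mean, ccdf_variance, threshold_score):
    """P(value > threshold) under a Gaussian ccdf in normal-score space."""
    ccdf = NormalDist(ccdf_mean, ccdf_variance ** 0.5)
    return 1.0 - ccdf.cdf(threshold_score)

def misclassification_risk(estimate, threshold, p_exceed):
    """Return the risk type and its value for one location.

    alpha: location declared hazardous, probability that it is actually safe.
    beta:  location declared safe, probability that it is actually hazardous.
    """
    if estimate > threshold:
        return "alpha", 1.0 - p_exceed
    return "beta", p_exceed

# Hypothetical grid points: (estimate in concentration units,
# ccdf mean and variance in normal-score space).
grid = [(85.0, 1.2, 0.3), (62.0, 0.1, 0.5)]
threshold, threshold_score = 80.0, 0.9  # assumed pair of equivalent values
risks = [
    misclassification_risk(
        est, threshold, exceedance_probability(mean, var, threshold_score))
    for est, mean, var in grid
]
```

Each location receives exactly one of the two risks, so the risk α and risk β maps are complementary in their spatial coverage.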
3.5. Validation
Because kriging is an exact interpolator that reproduces the observed values at the sample locations, the prediction performance of multi-Gaussian space-time kriging was quantitatively evaluated by leave-one-out cross validation. After one monitoring station was temporarily eliminated, kriging using the remaining stations was conducted to predict the PM10 concentration at the eliminated monitoring station. This procedure was repeated for all monitoring stations. Then the prediction performance was quantified using two error statistics: the linear correlation coefficient between the true and predicted PM10 concentrations, and the mean absolute error (MAE).
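The cross-validation loop and the two error statistics can be sketched as follows. Inverse distance weighting is used here purely as a stand-in predictor so that the example stays self-contained; it is not the space-time kriging of this paper, and the coordinates and values are made up.

```python
from math import dist, sqrt
from statistics import fmean

def idw_predict(target_xy, sample_xy, sample_values, power=2.0):
    """Stand-in predictor: inverse distance weighting (not kriging)."""
    weights = [1.0 / dist(target_xy, xy) ** power for xy in sample_xy]
    return sum(w * v for w, v in zip(weights, sample_values)) / sum(weights)

def leave_one_out(coords, values, predict):
    """Predict each station from all the others, one station at a time."""
    predictions = []
    for i in range(len(values)):
        rest_xy = coords[:i] + coords[i + 1:]
        rest_values = values[:i] + values[i + 1:]
        predictions.append(predict(coords[i], rest_xy, rest_values))
    return predictions

def pearson_r(a, b):
    """Linear correlation coefficient between two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) *
                      sum((y - mb) ** 2 for y in b))

def mean_absolute_error(a, b):
    return fmean(abs(x - y) for x, y in zip(a, b))

# Toy stations (made up): coordinates in km, concentrations for one month.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
observed = [40.0, 45.0, 50.0, 55.0, 60.0]
predicted = leave_one_out(coords, observed, idw_predict)
r = pearson_r(observed, predicted)
mae = mean_absolute_error(observed, predicted)
```

In the space-time case the withheld station would be removed for all months at once, and the statistics would be pooled over stations and months.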
4. Results and Discussion
4.1. Trend Component Modeling Result
After preparing time-series PM10 concentration datasets, normal score transform was first applied using GSLIB [11]. Figure 3 shows a spatially averaged time-series that was computed from normal score transformed PM10 concentrations and used as the elementary temporal profile function. During the 5-year period from 2007 to 2011, a decreasing pattern was observed from April to August, but the increase in PM10 concentration commenced in fall and continued to winter. However, the winter PM10 concentration exhibited a different pattern each year. This overall pattern may be related to yellow dust in spring and meteorological factors such as wind, relative humidity, and precipitation. In winter and spring, the relatively stable atmospheric condition with high relative humidity and yellow dust contributes to the increase in PM10 concentration, respectively; meanwhile, the low PM10 concentration in summer is due to the washout effect by precipitation, as reported in previous studies [24, 25].
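The normal score transform applied at the start of this step can be illustrated with a minimal rank-based sketch. It is not the GSLIB program itself: ties are ordered by position rather than broken randomly, no declustering weights are used, and the function name and data are hypothetical.

```python
from statistics import NormalDist

def normal_score_transform(values):
    """Replace each value by the standard normal quantile of its rank.

    The value with rank k (0-based) among n values is assigned the
    cumulative probability (k + 0.5) / n.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    standard_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order):
        scores[i] = standard_normal.inv_cdf((rank + 0.5) / n)
    return scores

# Toy monthly concentrations (made up).
concentrations = [55.0, 40.0, 70.0]
scores = normal_score_transform(concentrations)
```

The transform preserves the rank order of the data, so the sorted pairs of original values and scores can later serve as the table for back-transformation.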

Regression between the spatially averaged time-series set and the time-series set at each monitoring station was conducted and two regression coefficients are presented in Figure 4. If the intercept and slope values approach zero and one, respectively, the time-series at the monitoring station is very similar to the spatially averaged time-series set. The different similarities at the monitoring stations led to differences of trend components, and hence the residual components, which are the main targets of space-time kriging, also varied across the study area.


The next step for the regionalization or estimation of trend components at unmonitored locations was to interpolate the intercept and slope values in Figure 4. The linear correlation coefficient between the two coefficients at 94 monitoring stations was very low (−0.08), so an independent univariate ordinary kriging was applied to the two coefficients. By combining the interpolated regression coefficients with the spatially averaged time-series set in Figure 3, the trend components during the considered time period were retrieved and used for ccdf modeling.
4.2. Residual Component Modeling Result
After computing trend components at each monitoring station, the residual components that could not be explained by the trend components were computed at each monitoring station. The modified Fortran routines of De Cesare et al. [21] were used to compute the experimental spatiotemporal variogram. The marginal spatial and temporal experimental variograms of the residuals with the fitted models are given in Figures 5(a) and 5(b), respectively. The marginal spatial variogram (Figure 5(a)) showed large relative nugget effects, but a reasonable temporal autocorrelation structure with an effective range of about 7 months was observed in the marginal temporal variogram (Figure 5(b)). This result implies that accounting for temporal autocorrelation information during the interpolation could improve prediction performance, compared to interpolation with only spatial autocorrelation information. Figure 5(c) presents the experimental spatiotemporal variogram surface of the residuals. From this figure, the spatiotemporal variogram model, which satisfies the constraints in (8), was finally estimated and then used as an input variogram model for space-time kriging. Space-time kriging was applied to obtain the residuals at all grid points in the study area by using the spatiotemporal variogram model of the residuals. The Edinburgh space-time geostatistics Fortran program [26] was used to implement space-time kriging of the residuals.
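As an illustration of how a marginal temporal experimental variogram can be computed, the sketch below pools squared differences of residuals separated by a given number of months across all stations, that is, at zero spatial lag. It is a simplified stand-in, not the Fortran routines cited above, and the names and toy residuals are hypothetical.

```python
def temporal_variogram(residuals_by_station, max_lag):
    """Experimental semivariance for temporal lags 1..max_lag (in months).

    gamma(lag) is half the mean squared difference between residuals that
    are `lag` months apart at the same station, pooled over all stations.
    """
    gamma = {}
    for lag in range(1, max_lag + 1):
        squared_diffs = [
            (series[t + lag] - series[t]) ** 2
            for series in residuals_by_station
            for t in range(len(series) - lag)
        ]
        gamma[lag] = 0.5 * sum(squared_diffs) / len(squared_diffs)
    return gamma

# Toy residual series (made up) at two stations, six months each.
residuals = [
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
]
gamma = temporal_variogram(residuals, max_lag=2)
```

For real residuals the semivariance would typically rise with the lag and level off near the temporal range, which is the behavior a fitted model summarizes.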



4.3. PM10 Concentration Mapping and Uncertainty Analysis Results
The simple space-time kriging estimate of the residuals was added to the interpolated trend components and then used as the mean value of the Gaussian ccdf at all locations. The simple space-time kriging variance was used as the variance of the Gaussian ccdf, as in (2). After constructing ccdfs at all locations, the PM10 concentration estimate and conditional variance were computed using (9) and (10), respectively. All postprocessing was implemented in Fortran, and ArcGIS was used for visualization.
Due to space limitations, only the PM10 concentration mapping results for two months in 2011 are given in Figure 6. The PM10 concentration in April was much higher than that in August because of lower precipitation and the yellow dust transported to Korea by prevailing westerly winds in April. In April, relatively high PM10 concentrations were observed in northern Incheon, Dongducheon, Pyeongtaek, and Gwangju due to the large concentrations either at the monitoring stations in those cities or at the nearby monitoring stations. The PM10 concentration in August was relatively high in northern Incheon, Gimpo, Dongducheon, Hwaseong, Seongnam, Gwangju, and northern Icheon. In both months, northern Incheon, Dongducheon, and Gwangju showed relatively high concentrations, but low concentrations were observed in Seoul city.


The spatial distribution of the conditional variance, which measures the uncertainty of the prediction, is given in Figure 7. A large conditional variance was observed in some high-concentration areas (e.g., northern Incheon and Pyeongtaek in April and Gimpo in August) where the PM10 concentration values at monitoring stations fluctuated greatly both temporally and spatially. Some areas with very few or even no monitoring stations also showed relatively large conditional variance, a behavior similar to that of the conventional kriging variance. This uncertainty statistic revealed that the conditional variance, which provides information on both the sample variations and the sample configuration, can be used as supporting information to interpret the PM10 concentration mapping result.


To generate misclassification risk maps, the probability of exceeding a certain threshold value was first mapped. The atmospheric environmental standard in Korea is defined only for an annual average (25 μg/m3) or a 24-hour average (100 μg/m3) [8]. Thus, it is not feasible to directly use the atmospheric environmental standard value as the threshold, since the monthly PM10 concentration was considered in this study. Since the ccdfs were established at all locations in the study area, a variety of probability maps could easily be generated by applying different threshold values. For illustration purposes, a PM10 concentration of 80 μg/m3 was used as the threshold. By combining the classification result with the exceedance probability for this threshold, the risk α and risk β maps were generated, as shown in Figure 8. By definition, risk α is only mapped where the PM10 concentration estimate exceeds the predefined threshold. Conversely, risk β is defined where risk α is not mapped. In the risk α map in Figure 8(a), the false positive probability is relatively low (i.e., less than 0.3), but not negligible. The risk β map in Figure 8(b) shows very large variations of the false negative probability, which is greater than 0.7 in the northern part of the study area including Pocheon and Yeoncheon. A large misclassification risk β was also found around the areas that are classified as hazardous (i.e., exceeding the PM10 concentration of 80 μg/m3). Although choosing proper probability thresholds is difficult and subjective, these misclassification risk maps, which cannot be provided by deterministic interpolation methods or kriging algorithms without ccdf modeling, can be useful information for further decision-making or interpretations. For example, the areas showing high misclassification risk values can be considered as candidates for further monitoring or in-depth investigations.


4.4. Validation Results
To quantitatively evaluate the prediction performance of space-time kriging, leave-one-out cross validation was carried out and error statistics such as the linear correlation coefficient with the true values and MAE were computed. Spatial ordinary kriging, which considers only spatial autocorrelation information, was also applied for comparison purpose.
Figure 9 presents the scatter-plots with error statistics computed from leave-one-out cross validation. Although the underestimation of high values and overestimation of low values were observed in both results, this mismatch arising from the smoothing effects of kriging was relatively weakened in the validation result of space-time kriging. The linear correlation coefficients for space-time kriging and spatial ordinary kriging were 0.92 and 0.87, respectively. Space-time kriging also showed an improvement of 13.23% in MAE, compared to that of spatial ordinary kriging. Similar to the previous case study result in Park [17], these quantitative evaluation results confirmed that the incorporation of temporal autocorrelation information via space-time kriging improved the prediction performance and generated reliable mapping results for space-poor and time-rich data such as PM10 concentrations.


5. Conclusions
A geostatistical approach based on spatiotemporal multi-Gaussian kriging was presented for time-series mapping of PM10 concentrations. Unlike conventional space-time kriging and spatial kriging, which provide the estimate and kriging variance only, the presented approach generated rich interpretable by-products as well as the PM10 estimates. From a case study in the Seoul metropolitan area of Korea, multi-Gaussian space-time kriging accounted for temporal autocorrelation information as well as spatial autocorrelation information and generated reliable mapping results that outperformed those of conventional spatial kriging. In addition, the presented approach produced uncertainty measures and misclassification risks from the ccdf modeling that are useful for interpretation or decision-making.
To strengthen the major findings of this study, several outstanding issues should be addressed in future work. First, several auxiliary variables such as the proximity to major roads and weather data will be integrated within the framework of the present study in order to generate much more reliable PM10 concentration mapping results. In relation to uncertainty modeling, the multi-Gaussian approach adopted herein may not be appropriate for datasets with a strong positively skewed distribution which may be often observed in air pollutant concentrations. Thus, the extension of the conventional spatial indicator approach [10, 11] to the space-time domain and the comparison with the multi-Gaussian approach presented herein will also be included in future work.
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2015R1A1A1A05000966). This work was also supported by Inha University Research Grant. The author thanks Dr. L. Spadavecchia for providing the Edinburgh space-time geostatistics program.