Funding: The research of Ilsuk Kang was supported by the Bio & Medical Technology Development Program of the National Research Foundation (NRF) funded by the Korean government (MSIT) (No. RS-2024-00440787). Hosik Choi was supported by Korea Environmental Industry & Technology Institute (KEITI) through the Technology Development Project for Safety Management of Household Chemical Products, funded by the Korea Ministry of Environment (MOE) (RS-2023-00215309). Cheolwoo Park's work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korean government (NRF-2021R1A2C1092925, RS-2022-NR068758).

ABSTRACT

This paper proposes the Deep Symbolic Learning (DSL) model, a deep learning-based framework for robust regression designed for settings in which both the response and the predictors are histogram-valued variables. DSL feeds the cumulative distribution functions (CDFs) of the covariate histograms into a one-dimensional convolutional neural network (1D-CNN) and recasts conditional density estimation as a multi-class classification task, optimized with the joint binary cross-entropy (JBCE) loss function. Extensive simulations and real-world applications, including air quality, traffic volume, and climate data, demonstrate that the DSL model outperforms existing methods on three evaluation metrics: CDF distance, empirical coverage of the 90% prediction interval, and average quantile loss. This work contributes to the fields of symbolic data analysis and conditional density estimation.
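The abstract does not spell out implementation details, so the following is only a minimal sketch of the two ingredients it names: turning a histogram into CDF values, and scoring a predicted CDF with a joint binary cross-entropy. It assumes mass is uniform within each histogram bin, that the classifier outputs one logit per response bin, and that the loss averages the binary cross-entropy over the interior cut points. The function names (histogram_cdf, class_logits_to_cdf, jbce_loss) are hypothetical, and the 1D-CNN itself is not shown.

```python
import numpy as np

def histogram_cdf(edges, probs, grid):
    """Evaluate the piecewise-linear CDF of a histogram at the points in `grid`.

    edges: increasing bin boundaries, length B + 1.
    probs: bin probabilities, length B, summing to 1.
    Assumption: mass is spread uniformly within each bin.
    """
    edges = np.asarray(edges, dtype=float)
    cum = np.concatenate(([0.0], np.cumsum(probs)))
    # np.interp returns 0 left of the first edge and cum[-1] right of the last.
    return np.interp(grid, edges, cum)

def class_logits_to_cdf(logits):
    """Turn K class logits (one per response bin) into CDF values at the
    K - 1 interior cut points via a softmax followed by a cumulative sum."""
    z = np.asarray(logits, dtype=float)
    p = np.exp(z - z.max())
    p /= p.sum()
    return np.cumsum(p)[:-1]  # the final cumulative value is always 1

def jbce_loss(pred_cdf, target_cdf, eps=1e-7):
    """Binary cross-entropy between predicted and target CDF values,
    averaged over cut points (an assumed form of the JBCE loss)."""
    p = np.clip(np.asarray(pred_cdf, dtype=float), eps, 1.0 - eps)
    t = np.asarray(target_cdf, dtype=float)
    return float(-np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))

# Toy response histogram on [0, 3] with three bins, evaluated at two cut points.
target = histogram_cdf([0, 1, 2, 3], [0.2, 0.5, 0.3], [1.0, 2.0])
pred = class_logits_to_cdf(np.zeros(3))  # uniform class probabilities
loss = jbce_loss(pred, target)
```

In this toy case the target CDF values at the cut points 1.0 and 2.0 are 0.2 and 0.7, and the loss is smallest when the predicted CDF matches them exactly. Covariate histograms could be passed through histogram_cdf on a common grid to build fixed-length inputs, though how the paper arranges those inputs for the 1D-CNN is not stated here.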

Conflicts of Interest

The authors declare no conflicts of interest.

Open Research

Data Availability Statement

The data that support the findings of this study are openly available in the UC Irvine Machine Learning Repository at https://archive.ics.uci.edu/dataset/360/air+quality.


Volume 18, Issue 4

August 2025

e70033

Deep Symbolic Learning for Histogram-Valued Regression Data
