Deep Symbolic Learning for Histogram-Valued Regression Data
Funding: The research of Ilsuk Kang was supported by the Bio & Medical Technology Development Program of the National Research Foundation (NRF) funded by the Korean government (MSIT) (No. RS-2024-00440787). Hosik Choi was supported by the Korea Environmental Industry & Technology Institute (KEITI) through the Technology Development Project for Safety Management of Household Chemical Products, funded by the Korea Ministry of Environment (MOE) (RS-2023-00215309). Cheolwoo Park's work was supported in part by a National Research Foundation of Korea (NRF) grant funded by the Korean government (NRF-2021R1A2C1092925, RS-2022-NR068758).
ABSTRACT
This paper proposes the Deep Symbolic Learning (DSL) model, a deep learning framework for robust regression designed for settings in which both the response and the predictors are histogram-valued variables. DSL uses the cumulative distribution functions (CDFs) of the covariate histograms within a one-dimensional convolutional neural network (1D-CNN) to recast the conditional density estimation problem as a multi-class classification task, and the model is optimized with the joint binary cross-entropy (JBCE) loss function. Extensive simulations and real-world applications, including air quality, traffic volume, and climate data, demonstrate that the DSL model outperforms existing methods on three key evaluation metrics: CDF distance, empirical coverage of the 90% prediction interval, and average quantile loss. This work contributes to the fields of symbolic data analysis and conditional density estimation.
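As a rough illustration of the CDF representation mentioned above, the following Python sketch converts a single histogram into CDF values on a fixed grid and reads off the endpoints of a central 90% interval. It is not the authors' implementation: the function names `histogram_cdf` and `histogram_quantile`, the example bin edges and weights, and the assumptions that mass is uniform within each bin and that the interval is central (5% and 95% quantiles) are all introduced here for illustration only.

```python
import numpy as np


def histogram_cdf(edges, weights, grid):
    """Evaluate the CDF of a histogram on `grid`.

    Assumes mass is spread uniformly within each bin, so the CDF is
    piecewise linear between bin edges (an assumption of this sketch).
    """
    edges = np.asarray(edges, dtype=float)
    weights = np.asarray(weights, dtype=float)
    probs = weights / weights.sum()                   # normalize to relative frequencies
    cum = np.concatenate(([0.0], np.cumsum(probs)))   # CDF value at each bin edge
    # np.interp clamps to 0 below the first edge and to 1 above the last edge.
    return np.interp(grid, edges, cum)


def histogram_quantile(edges, weights, tau):
    """Invert the piecewise-linear CDF at level(s) `tau`.

    Assumes every bin has positive weight so the CDF is strictly increasing.
    """
    edges = np.asarray(edges, dtype=float)
    weights = np.asarray(weights, dtype=float)
    cum = np.concatenate(([0.0], np.cumsum(weights / weights.sum())))
    return np.interp(tau, cum, edges)


# Hypothetical histogram: three bins on [0, 1), [1, 2), [2, 4].
edges = [0.0, 1.0, 2.0, 4.0]
weights = [1.0, 2.0, 1.0]

grid = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0])
cdf_values = histogram_cdf(edges, weights, grid)

# Endpoints of a central 90% interval for this single histogram.
lower, upper = histogram_quantile(edges, weights, [0.05, 0.95])
```

Evaluating every covariate histogram on a shared grid yields fixed-length vectors, which is one plausible way to obtain inputs of the kind a 1D-CNN expects; the grid actually used in the paper is not specified in this section.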
Conflicts of Interest
The authors declare no conflicts of interest.
Open Research
Data Availability Statement
The data that support the findings of this study are openly available in the UC Irvine Machine Learning Repository at https://archive.ics.uci.edu/dataset/360/air+quality.