Volume 18, Issue 4 e70033
RESEARCH ARTICLE

Deep Symbolic Learning for Histogram-Valued Regression Data

Ilsuk Kang

Department of Information Statistics, Chungbuk National University, Cheongju, South Korea

Donghwa Kim

The Kim Jaechul Graduate School of AI, KAIST, Seoul, South Korea

Hosik Choi

Department of Urban Big Data Convergence, University of Seoul, Seoul, South Korea

Young Joo Yoon

Department of Mathematics Education, Korea National University of Education, Cheongju, South Korea

Cheolwoo Park

Corresponding Author

Department of Mathematical Sciences, KAIST, Daejeon, South Korea

Correspondence: Cheolwoo Park ([email protected])

First published: 20 July 2025

Funding: The research of Ilsuk Kang was supported by the Bio & Medical Technology Development Program of the National Research Foundation (NRF) funded by the Korean government (MSIT) (No. RS-2024-00440787). Hosik Choi was supported by the Korea Environmental Industry & Technology Institute (KEITI) through the Technology Development Project for Safety Management of Household Chemical Products, funded by the Korea Ministry of Environment (MOE) (RS-2023-00215309). Cheolwoo Park's work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korean government (NRF-2021R1A2C1092925, RS-2022-NR068758).

ABSTRACT

This paper proposes the Deep Symbolic Learning (DSL) model, a deep learning framework for robust regression designed for settings in which both the response and the predictors are histogram-valued variables. DSL uses the cumulative distribution functions (CDFs) of the covariate histograms within a one-dimensional convolutional neural network (1D-CNN) to recast conditional density estimation as a multi-class classification task, which is optimized with the joint binary cross-entropy (JBCE) loss function. Extensive simulations and real-world applications to air quality, traffic volume, and climate data demonstrate that the DSL model outperforms existing methods on three evaluation metrics: CDF distance, empirical coverage of the 90% prediction interval, and average quantile loss. This work contributes to symbolic data analysis and conditional density estimation.
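
As an illustration of two ingredients named in the abstract, the following Python sketch evaluates the CDF of a histogram on a fixed grid (the kind of vector a 1D-CNN could take as input), maps a scalar response to a class index, and computes the quantile (pinball) loss. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `histogram_cdf`, `response_to_class`, and `pinball_loss`, the uniform-within-bin assumption, and the toy numbers are all hypothetical, and the JBCE loss is not reproduced because the abstract does not define it.

```python
import numpy as np

def histogram_cdf(edges, probs, grid):
    """Evaluate a histogram's CDF on `grid`.

    Assumption (not from the paper): mass is spread uniformly within each
    bin, so the CDF is piecewise linear between bin edges.
    """
    edges = np.asarray(edges, dtype=float)
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()                      # normalize bin weights
    cum = np.concatenate(([0.0], np.cumsum(probs)))  # CDF value at each edge
    return np.interp(grid, edges, cum)               # clamps to 0 and 1 outside

def response_to_class(y, class_edges):
    """Map scalar responses to bin indices 0..K-1 (hypothetical discretization)."""
    inner = np.asarray(class_edges, dtype=float)[1:-1]
    return np.digitize(y, inner)

def pinball_loss(y, q_pred, tau):
    """Average quantile (pinball) loss at level `tau`."""
    diff = np.asarray(y, dtype=float) - np.asarray(q_pred, dtype=float)
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

# Toy covariate histogram: three bins on [0, 4] with weights 0.2, 0.5, 0.3.
edges = [0.0, 1.0, 2.0, 4.0]
probs = [0.2, 0.5, 0.3]
grid = np.linspace(0.0, 4.0, 5)            # evaluation points 0, 1, 2, 3, 4

cdf_features = histogram_cdf(edges, probs, grid)
labels = response_to_class(np.array([0.5, 2.5]), edges)
loss = pinball_loss([1.0, 3.0], [2.0, 2.0], tau=0.9)

print(cdf_features)
print(labels)
print(loss)
```

Evaluating every covariate histogram on a common grid gives fixed-length inputs regardless of how many bins each histogram has, which is one plausible reason to work with CDFs rather than raw bin counts.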

Conflicts of Interest

The authors declare no conflicts of interest.

Data Availability Statement

The data that support the findings of this study are openly available in the UC Irvine Machine Learning Repository at https://archive.ics.uci.edu/dataset/360/air+quality.
