Volume 35, Issue 4 e12311
SPECIAL ISSUE PAPER

Resampling with neighbourhood bias on imbalanced domains

Paula Branco

Corresponding Author

INESC TEC/DCC—Faculty of Sciences, University of Porto, Porto, Portugal

Correspondence

Paula Branco, INESC TEC/DCC—Faculty of Sciences, University of Porto, Porto, Portugal.

Luis Torgo

INESC TEC/DCC—Faculty of Sciences, University of Porto, Porto, Portugal

Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada

Rita P. Ribeiro

INESC TEC/DCC—Faculty of Sciences, University of Porto, Porto, Portugal

First published: 11 July 2018
Citations: 4

Abstract

Imbalanced domains are an important problem in predictive tasks: they cause a loss of performance on the cases that are most relevant to the user. This problem has been studied extensively for classification, where the target variable is nominal. More recently, it has been recognized that imbalanced domains also occur in other contexts and tasks, such as regression, where the target variable is continuous. This paper focuses on imbalanced domains in both classification and regression tasks. Resampling strategies are among the most successful approaches for addressing imbalanced domains. In this work, we propose variants of existing resampling strategies that take into account information about the neighbourhood of the examples. Instead of sampling uniformly, our proposals bias the strategies to reinforce some regions of the data sets. Through an extensive set of experiments, we provide evidence of the advantage of introducing a neighbourhood bias into resampling strategies for both classification and regression tasks with imbalanced data sets.
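
The general idea of replacing uniform sampling with a neighbourhood-biased one can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, whose details are not given in the abstract. It assumes a classification setting, random oversampling of a single minority class, and one possible bias: a minority example is selected with probability proportional to the fraction of its k nearest neighbours that belong to another class. The function names `neighbourhood_weights` and `biased_oversample`, the parameter `k`, and the smoothing constant are all hypothetical choices made for illustration.

```python
import numpy as np

def neighbourhood_weights(X, y, minority_label, k=3):
    """Return minority indices and sampling probabilities.

    Assumed bias (illustrative only): a minority example gets more weight
    when a larger fraction of its k nearest neighbours is from another class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == minority_label)
    # Euclidean distances from each minority example to every example.
    diffs = X[minority_idx, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Exclude each example from its own neighbourhood.
    dists[np.arange(len(minority_idx)), minority_idx] = np.inf
    nearest = np.argsort(dists, axis=1)[:, :k]
    frac_other = (y[nearest] != minority_label).mean(axis=1)
    # A small constant keeps every minority example selectable.
    weights = frac_other + 1e-3
    return minority_idx, weights / weights.sum()

def biased_oversample(X, y, minority_label, n_new, k=3, seed=0):
    """Append n_new replicas of minority examples, drawn non-uniformly."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    idx, p = neighbourhood_weights(X, y, minority_label, k)
    chosen = rng.choice(idx, size=n_new, replace=True, p=p)
    return np.vstack([X, X[chosen]]), np.concatenate([y, y[chosen]])
```

Setting all weights equal recovers plain uniform random oversampling, so the only difference introduced here is the sampling distribution. Other biases (for example, favouring examples surrounded by their own class) would change only the line that computes `weights`.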
