Volume 37, Issue 5 e12473
SPECIAL ISSUE PAPER

A method for outlier detection based on cluster analysis and visual expert criteria

Juan A. Lara

Corresponding Author

Department of Computer Science, Madrid Open University, UDIMA, Engineering School, Madrid, Spain

Correspondence

Juan A. Lara, Madrid Open University, UDIMA, Engineering School, Carretera A6 km 38,500 – Vía de Servicio, 15-28400, Collado Villalba, Madrid, Spain.

Email: [email protected]

David Lizcano

Department of Computer Science, Madrid Open University, UDIMA, Engineering School, Madrid, Spain

Víctor Rampérez

ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Campus de Montegancedo, Madrid, Spain

Javier Soriano

ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Campus de Montegancedo, Madrid, Spain

First published: 03 November 2019
Citations: 7

Abstract

Outlier detection is an important problem that arises in a wide range of areas. Outliers may be the outcome of fraudulent behaviour, mechanical faults, human error, or simply natural deviations. Many data mining applications perform outlier detection, often as a preliminary step to filter out outliers and build more representative models. In this paper, we propose an outlier detection method based on a clustering process. The aim of the proposal is to overcome the specificity of many existing outlier detection techniques that fail to take into account the inherent dispersion of domain objects. The method is based on four criteria designed to represent how human beings (experts in each domain) visually identify outliers within a set of objects after analysing the clusters. This is an advantage over other clustering-based outlier detection techniques that are founded on a purely numerical analysis of clusters. Our proposal has been evaluated, with satisfactory results, on data (particularly time series) from two different domains: stabilometry, a branch of medicine that studies balance-related functions in human beings, and electroencephalography (EEG), a neurological exploration used to diagnose nervous system disorders. To validate the proposed method, we studied its outlier detection performance and its efficiency in terms of runtime. The results of regression analyses confirm that our proposal is useful for detecting outlier data in different domains, with a false positive rate of less than 2% and a reliability greater than 99%.
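
The abstract does not spell out the four visual criteria, so the sketch below does not reproduce the authors' method. It only illustrates the general idea of clustering-based outlier detection using two generic, assumed heuristics: objects in very small clusters are flagged, and so are objects unusually far from their cluster centroid relative to that cluster's own dispersion. The function names (`kmeans`, `flag_outliers`), the parameters (`min_size`, `z`), the choice of k-means, and the 2D toy points (standing in for features extracted from time series) are all hypothetical.

```python
import math
from statistics import mean, pstdev


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(mean(axis) for axis in zip(*members))
    return labels, centroids


def flag_outliers(points, k, min_size=2, z=3.0):
    """Return sorted indices of points flagged by the two assumed heuristics."""
    labels, centroids = kmeans(points, k)
    flagged = set()
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if len(idx) < min_size:
            # Heuristic 1 (assumed): a tiny cluster is treated as outlying.
            flagged.update(idx)
            continue
        d = [dist(points[i], centroids[j]) for i in idx]
        # Heuristic 2 (assumed): the limit scales with the cluster's own
        # dispersion, so loose clusters tolerate larger distances.
        limit = mean(d) + z * pstdev(d)
        flagged.update(i for i, di in zip(idx, d) if di > limit)
    return sorted(flagged)


# Toy data: two tight groups plus one isolated point (index 10).
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5),
    (30.0, -20.0),
]
outlier_indices = flag_outliers(points, k=3)
```

Making the distance limit depend on each cluster's dispersion is one simple way to respect the "inherent dispersion of domain objects" that the abstract mentions. A purely global threshold would treat tight and loose clusters alike.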
