Volume 41, Issue 6 e13122
ORIGINAL ARTICLE

ERIM: An ensemble of rare itemset mining and its application in the automotive industry

Devrim Naz Akdas

Graduate School of Natural and Applied Sciences, Dokuz Eylül University, Izmir, Turkey

Derya Birant

Department of Computer Engineering, Dokuz Eylül University, Izmir, Turkey

Pelin Yildirim Taser

Corresponding Author

Department of Computer Engineering, Izmir Bakırçay University, Izmir, Turkey

Correspondence

Pelin Yildirim Taser, Department of Computer Engineering, Izmir Bakırçay University, Izmir, Turkey.

Email: [email protected]

First published: 03 August 2022
Citations: 1

Abstract

Discovering previously unknown anomalies that are rare and differ dramatically from the majority of the data is a critical need in the automotive industry. Rare itemset mining (RIM), a pattern-based method, has been used for anomaly detection because it yields successful analysis results. However, several aspects still need to be explored, such as improving the mining process so that it identifies more targeted, valuable and reliable rare itemsets. Motivated by this gap, this study proposes a novel approach, named ensemble of rare itemset mining (ERIM), which identifies weak rare itemsets (WRIs) using different algorithms and aggregates them to obtain strong rare itemsets (SRIs). This study is also the first to combine four different RIM algorithms (Apriori Rare, Apriori Inverse, CORI and RP-Growth) as base learners. Although ERIM is a general methodology that can be applied to any field, this study applies it to the automotive industry as a case study. In the experiments, ERIM was applied to a real-world gear manufacturing dataset to discover anomalies in machine downtimes. The experimental results were evaluated in terms of the number and length of the discovered itemsets, and sample itemsets are also presented. The results showed that ERIM yields more reliable common knowledge by jointly considering the relations among the WRIs discovered by the base learners. The findings indicated that ERIM successfully detected anomalies whose support values are below 7.12. Furthermore, ERIM discovered the highest number of SRIs, 1403, each of which is a 3-itemset. Finally, our method performed 43.37% better on average than state-of-the-art methods on the same dataset.
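The abstract does not state the exact rule ERIM uses to turn weak rare itemsets into strong ones. As a minimal sketch, assuming a simple voting rule (an itemset counts as "strong" when at least `min_votes` base learners report it), the ensemble idea could look like the following. The function `rare_itemsets` is a hypothetical brute-force stand-in for a base learner, not an implementation of Apriori Rare, Apriori Inverse, CORI or RP-Growth; `aggregate_votes`, the downtime log and its event codes are likewise invented for illustration.

```python
from collections import Counter
from itertools import combinations


def rare_itemsets(transactions, max_support, max_len):
    """Hypothetical brute-force stand-in for one base learner: every itemset
    of up to `max_len` items that occurs at least once but whose support
    (fraction of transactions containing it) is below `max_support`."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    found = set()
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            count = sum(1 for t in transactions if s <= t)
            if count > 0 and count / n < max_support:
                found.add(s)
    return found


def aggregate_votes(weak_outputs, min_votes):
    """Assumed voting rule: combine the weak rare itemsets (WRIs) reported
    by several base learners and keep those reported by at least
    `min_votes` learners as strong rare itemsets (SRIs)."""
    votes = Counter(s for output in weak_outputs for s in set(output))
    return {s for s, v in votes.items() if v >= min_votes}


# Invented downtime records: each transaction is a set of event codes.
log = [
    {"spindle_stop", "night_shift"},
    {"spindle_stop", "night_shift"},
    {"coolant_low", "night_shift"},
    {"spindle_stop", "day_shift"},
    {"coolant_low", "day_shift", "tool_wear"},
]

# Three hypothetical "base learners" with differing outputs.
wri_a = rare_itemsets(log, max_support=0.3, max_len=2)
wri_b = rare_itemsets(log, max_support=0.3, max_len=1)
wri_c = {frozenset({"coolant_low", "tool_wear"}),
         frozenset({"spindle_stop", "night_shift"})}

sri = aggregate_votes([wri_a, wri_b, wri_c], min_votes=2)
print(sorted(sorted(s) for s in sri))
# → [['coolant_low', 'tool_wear'], ['tool_wear']]
```

Raising `min_votes` trades recall for agreement: only itemsets that several independent miners flag survive, which mirrors, under the stated assumption, the abstract's point that jointly considering the base learners' WRIs yields more reliable knowledge.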

CONFLICT OF INTEREST

The authors declare no conflicts of interest.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
