Knowledge Discovery in Databases

Guozhu Dong

Wright State University

First published: 15 January 2002

Abstract

Knowledge discovery in databases (KDD), commonly known as data mining, is concerned with the identification of knowledge from data. (In the KDD community, it is common to consider data mining as a step of KDD.) The discovered knowledge is represented as patterns, defined in a broad sense, and is required to be novel, potentially useful, and ultimately understandable. Many pattern types have been studied, including classifiers, association rules, and clustering. Many new ones will be introduced in the future to capture the diverse range of human knowledge and to meet the diverse needs of different applications.

Identifying novel knowledge patterns from data is a complex process that requires different techniques and multiple steps, including data cleaning and preparation, efficient search for patterns, and evaluation of the usefulness and novelty of the patterns found. This process may be iterated many times, with later iterations using insights from earlier ones. Proper use of background knowledge can greatly improve the quality of the resulting knowledge patterns. A good data mining system architecture is needed to support coupling with databases and data warehouses, iterative data selection, and interaction between different data mining algorithms and between patterns of different types.
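To make the three steps concrete, the following is a minimal sketch in Python of one pass through cleaning, pattern search, and evaluation, using association rules as the pattern type. It is an illustration only and is not taken from the article: the function names (`clean`, `frequent_itemsets`, `rules`), the toy transactions, and the support and confidence thresholds are all assumptions chosen for the example, and the brute-force itemset counting stands in for the efficient search algorithms the article surveys.

```python
from collections import Counter
from itertools import combinations


def clean(transactions):
    """Data cleaning: normalize item names, drop blanks and empty transactions."""
    cleaned = []
    for t in transactions:
        items = {item.strip().lower() for item in t if item and item.strip()}
        if items:
            cleaned.append(frozenset(items))
    return cleaned


def frequent_itemsets(transactions, min_support, max_size=2):
    """Pattern search: count every itemset up to max_size and keep the
    ones whose relative support reaches min_support (brute force)."""
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def rules(supports, min_confidence):
    """Evaluation: turn frequent pairs into rules a -> b and keep those
    whose confidence, support(a, b) / support(a), is high enough."""
    found = []
    for itemset, supp in supports.items():
        if len(itemset) != 2:
            continue
        for a in itemset:
            (b,) = itemset - {a}
            conf = supp / supports[frozenset([a])]
            if conf >= min_confidence:
                found.append((a, b, supp, conf))
    return sorted(found)


# Hypothetical raw data with inconsistent casing, stray spaces, and blanks.
raw = [
    ["Bread", "milk "],
    ["bread", "butter"],
    ["milk", "bread", "butter"],
    ["", "milk"],
    [],
]

cleaned = clean(raw)
supports = frequent_itemsets(cleaned, min_support=0.5)
found_rules = rules(supports, min_confidence=0.7)

for a, b, supp, conf in found_rules:
    print(f"{a} -> {b} (support={supp:.2f}, confidence={conf:.2f})")
```

In an iterative setting, the thresholds or the data selection would be adjusted after inspecting the rules found, and the three functions would be run again on the revised input.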

The subsequent sections of this article will give an overview of some of the major concepts and techniques of KDD, including a brief introduction and some pointers to representative literature. There are six sections in addition to the introduction, namely, data types and preprocessing, pattern and pattern search space, popular pattern types and search algorithms, understandability and interestingness, concluding remarks, and references.
