Classifier Ensemble Methods

Multiclassifier systems, the focus of this article, provide scientists and data professionals with powerful techniques for tackling complex datasets. The basic idea behind the multiclassifier approach is to average the decisions or hypotheses of a diverse group of classifiers to produce a more reliable decision or hypothesis than a single classifier would typically provide.
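As a rough illustration of what "averaging the decisions" can mean in practice, the sketch below shows two common combination rules. It assumes each classifier outputs either a crisp class label (combined by majority vote) or a vector of per-class scores such as estimated posterior probabilities (combined by the mean rule). The function names `majority_vote`, `mean_rule`, and `argmax` are hypothetical helpers chosen for this example, not taken from the article; the member classifiers themselves are omitted and only the combination step is shown.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(labels: Sequence[str]) -> str:
    """Combine crisp decisions: return the label predicted by the most classifiers.

    Ties go to the tied label encountered first, because Counter.most_common
    keeps equal counts in first-encountered order.
    """
    counts = Counter(labels)
    return counts.most_common(1)[0][0]


def mean_rule(score_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Combine soft outputs: average each class's score over all classifiers.

    Assumes every classifier returns one score per class, in the same class order.
    """
    n_classifiers = len(score_vectors)
    n_classes = len(score_vectors[0])
    return [
        sum(scores[c] for scores in score_vectors) / n_classifiers
        for c in range(n_classes)
    ]


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (the winning class index)."""
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    # Hypothetical outputs of three classifiers for a two-class problem (A, B).
    scores = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
    crisp = ["A" if s[0] >= s[1] else "B" for s in scores]  # ["A", "B", "B"]
    print("majority vote:", majority_vote(crisp))
    print("mean rule:", "AB"[argmax(mean_rule(scores))])
```

With these made-up scores the two rules disagree: two of three classifiers lean toward B, so the vote returns B, while the averaged scores (about 0.58 for A versus 0.42 for B) favor A because the first classifier is very confident. This is one reason the choice of combination rule is treated as a design decision in its own right.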

As an introduction to our subject, we begin with a detailed examination of the canonical single-classifier system, which provides the mathematical foundation needed for our presentation of multiclassifier systems. We then describe some important methods for constructing multiclassifier systems at four levels: the classifier level, the combination level, the data level, and the feature level.

We end our overview of multiclassifier systems with a section that provides guidance for experimentally constructing general-purpose (GP) multiclassifier systems.

Wiley Encyclopedia of Electrical and Electronics Engineering