A Review on Modal Clustering
Corresponding Author
Giovanna Menardi
Dipartimento di Scienze Statistiche, Università di Padova, via C. Battisti, 241, 35121 Padova, Italy
Search for more papers by this authorCorresponding Author
Giovanna Menardi
Dipartimento di Scienze Statistiche, Università di Padova, via C. Battisti, 241, 35121 Padova, Italy
Search for more papers by this authorSummary
In spite of the current availability of numerous methods of cluster analysis, evaluating a clustering configuration is questionable without the definition of a true population structure, representing the ideal partition that clustering methods should try to approximate. A precise statistical notion of cluster, unshared by most of the mainstream methods, is provided by the density-based approach, which assumes that clusters are associated to some specific features of the probability distribution underlying the data. The non-parametric formulation of this approach, known as modal clustering, draws a correspondence between the groups and the modes of the density function. An appealing implication is that the ill-specified task of cluster detection can be regarded to as a more circumscribed problem of estimation, and the number of clusters is also conceptually well defined. In this work, modal clustering is critically reviewed from both conceptual and operational standpoints. The main directions of current research are outlined as well as some challenges and directions of further research.
References
- Ankerst M., Breunig M. M., Kriegel H. & Sander J. (1999). Optics: Ordering points to identify the clustering structure. In ACM Sigmod Record, Vol. 28: ACM; 49–60.
10.1145/304182.304187 Google Scholar
- Azzalini A. & Menardi G. (2014). Clustering via nonparametric density estimation: The R package pdfCluster. J. Stat. Softw., 57(11), 1–26.
- Azzalini A. & Torelli N. (2007). Clustering via nonparametric density estimation. Stat. Comput., 17, 71–80.
- Baíllo A. (2003). Total error in a plug-in estimator of level sets. Statist. Probab. Lett., 65(4), 411–417.
- Baíllo A., Cuevas A. & Justel A. (2000). Set estimation and nonparametric detection. Canad. J. Statist., 28(4), 765–782.
- Baíllo A., Cuesta-Albertos J. A. & Cuevas A. (2001). Convergence rates in nonparametric estimation of level sets. Statist. Probab. Lett., 53(1), 27–35.
- Ben-Hur A., Horn D., Siegelmann H. T. & Vapnik V. (2001). Support vector clustering. J. Mach. Learn. Res., 2, 125–137.
- Bowman A. W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71(2), 353–360.
- Burman P. & Polonik W. (2009). Multivariate mode hunting: Data analytic tools with measures of significance. J. Multivar. Anal., 100(6), 1198–1218.
- Carmichael J. W., George J. A. & Julius R. S. (1968). Finding natural clusters. Syst. Zool., 17(2), 144–150.
- Carreira-Perpinan M. A. (2007). Gaussian mean-shift is an EM algorithm. IEEE Trans. Pattern Anal. Mach. Intell., 29(5), 767–776.
- Carreira-Perpinan M. A. (2008). Generalised blurring mean-shift algorithms for nonparametric clustering. In CVPR: 2013 IEEE Conference on Computer Vision and Pattern Recognition; 1–8.
- Chacón J. & Duong T. (2013). Data-driven density derivative estimation, with applications to nonparametric clustering and bump hunting. Electron. J. Stat., 7, 499–532.
- Chacón J. E. (2014). A population background for nonparametric density-based clustering. ArXiv e-prints: 1408.1381.
- Chacón J. E. & Duong T. (2010). Multivariate plug-in bandwidth selection with unconstrained pilot bandwidth matrices. Test, 19(2), 375–398.
- Chacón J. E & Monfort P. (2013). A comparison of bandwidth selectors for mean shift clustering. arXiv preprint:1310.7855.
- Chakravarthy S. V. & Ghosh J. (1996). Scale-based clustering using the radial basis function network.. IEEE Trans. Neural Netw., 7(5), 1250–1261.
- Chaudhuri K. & Dasgupta S. (2010). Rates of convergence for the cluster tree. In Advances in Neural Information Processing Systems, pp. 343–351. USA: Curran Associates.
- Chaudhuri K., Dasgupta S., Kpotufe S. & von Luxburg U. (2014). Consistent procedures for cluster tree estimation and pruning. arXiv preprint math.PR/0000000.
- Cheng Y. (1995). Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell., 17(8), 790–799.
- Cheng Y. & Ray S. (2014a). Multivariate modality inference using Gaussian kernel. Open J. Statist., 4(05), 419–434.
- Cheng Y. & Ray S. (2014b). Parallel and hierarchical mode association clustering with an r package modalclust. Open J. Statist., 4(10), 826–836.
10.4236/ojs.2014.410078 Google Scholar
- Comaniciu D. (2003). An algorithm for data-driven bandwidth selection. IEEE Trans. Pattern Anal. Mach. Intell., 25(2), 281–288.
- Comaniciu D. & Meer P. (2002). Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell., 24(5), 603–619.
- Contreras-Reyes J. E. & Azzalini A. (2012). On the spatial correlation between areas of high coseismic slip and aftershock clusters of the Maule earthquake Mw=8.8. ArXiv e-prints: 1208.1517.
- Cormen T. H., Leiserson C. E., Rivest R. L. & Stein C. (2001). Introduction to Algorithms, Vol. 2 Cambridge: MIT Press.
- Cuevas A., Febrero M. & Fraiman R. (2000). Estimating the number of clusters. Canad. J. Statist., 28, 367–382.
- Cuevas A., Febrero M. & Fraiman R. (2001). Cluster analysis: a further approach based on density estimation. Comput. Stat. Data Anal., 36, 441–459.
- De Bin R. & Risso D. (2011). A novel approach to the clustering of microarray data via nonparametric density estimation. BMC Bioinformatics, 12(49), 1–8.
- Duong T. & Hazelton M. (2003). Plug-in bandwidth matrices for bivariate kernel density estimation. J. Nonparametr. Stat., 15(1), 17–30.
- Duong T. & Hazelton M. L. (2005). Cross-validation bandwidth matrices for multivariate kernel density estimation. Scand. J. Stat., 32(3), 485–506.
- Einbeck J. (2011). Bandwidth selection for mean-shift based unsupervised learning techniques: a unified approach via self-coverage. J. Pattern. Recogn. Res., 6(2), 175–192.
10.13176/11.288 Google Scholar
- Einbeck J. & Evers L. (2013). Lpcm: Local principal curve methods. Available at http://CRAN.R-project.org/package=LPCM. R package version 0.44-8.
- Ertoz L., Steinbach M. & Kumar V. (2002). A new shared nearest neighbor clustering algorithm and its applications. In Workshop on Clustering High Dimensional Data and its Applications at 2nd Siam International Conference on Data Mining, pp. 105–115. Arlington, USA.
- Ester M., Kriegel H. P., Sander J. & Xu X. (1996). A density based algorithm for discovering clusters in large spatial databases withe noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), pp. 226–231. Portland: AAAI Press.
- Fraley C. & Raftery A. (1998). How many cluster? Which clustering method? Answers via model-based cluster analysis. Comput. J., 41, 578–588.
- Fraley C. & Raftery A. E. (2002). Model-based clustering, discriminant analysis and density estimation. J. Am. Stat. Assoc., 97, 611–631.
- Fukunaga K. & Hostetler L. D. (1975). The estimation of the gradient of a density function, with application in pattern recognition. IEEE Trans. Inform. Theory, 21(1), 32–40.
- Genovese C., Perone-Pacifico M., Verdinelli I. & Wasserman L. (2013). Nonparametric inference for density modes. ArXiv e-prints.
- Giordan M. & Diana G. (2011). A clustering method for categorical ordinal data. Commun. Stat.-Theor. Methods, 40(7), 1315–1334.
- Guha S., Rastogi R. & Shim K. (1998). Cure: an efficient clustering algorithm for large databases. In ACM SIGMOD Record, Vol. 27, pp. 73–84: Seattle: ACM.
- Hall P., Minnotte M. C. & Zhang C. (2004). Bump hunting with non-Gaussian kernels. Ann. Stat., 32(5), 2124–2141.
- Hartigan J. A. (1975). Clustering Algorithms. New York: J. Wiley & Sons.
- Hartigan J. A. (1981). Consistency of single linkage for high-density clusters. J. Am. Stat. Assoc., 76(374), 388–394.
- Hennig C. (2010). Methods for merging Gaussian mixture components. Adv. Data Anal. Classif., 4(1), 3–34.
10.1007/s11634-010-0058-3 Google Scholar
- Hennig C. (2014). fpc: Flexible procedures for clustering. Available at http://CRAN.R-project.org/package=fpc. R package version 2.1-7.
- Hinneburg A. & Keim D. A. (1998). An efficient approach to clustering in large multimedia databases with noise. In Proceedings of the 4th International Conference of Knowledge Discovery and Data Mining (KDD',98), Vol. 98,. pp. 58–65. New York.
- Hubert L. & Arabie P. (1985). Comparing partitions. J. Classif., 2, 193–218.
- Hyndman R. J. (2013). hdrcde. Highest Density Regions and Conditional Density Estimation. R Package. Available at http://cran.r-project.org/package=pdfCluster.
- Jang W. (2006). Nonparametric density estimation and clustering in astronomical sky surveys. Comput. Stat. Data Anal., 50(3), 760–774.
- Klemelä J. (2004). Visualization of multivariate density estimates with level set trees. J. Comput. Graph. Stat., 13(3), 599–620.
- Klemelä J. (2009). Smoothing of Multivariate Data: Density Estimation and Visualization. Hoboken: Wiley.
- Koontz W. & Fukunaga K. (1972). A nonparametric valley-seeking technique for cluster analysis. IEEE Trans. Comput., 100(2), 171–178.
10.1109/TC.1972.5008922 Google Scholar
- Kriegel H., Kröger P., Sander J. & Zimek A. (2011). Density-based clustering. Wiley Interdiscip. Rev. Data Min. Knowl. Discov., 1(3), 231–240.
- Lebourgeois F., Drira F., Gaceb D. & Duong J. (2013). Fast integral meanshift: Application to color seg- mentation of document images. In 12th International Conference on Document Analysis and Recognition (ICDAR), pp. 52–56. Washinton, DC: IEEE.
- Lee H. & Li J. (2012). Variable selection for clustering by separability based on ridgelines. J. Comput. Graph. Stat., 21(2), 315–337.
- Leung Y., Zhang J. S. & Xu Z. (2000). Clustering by scale-space filtering. IEEE Trans. Pattern Anal. Mach. Intell., 22(12), 1396–1410.
- Li J., Ray S. & Lindsay B. G. (2007). A nonparametric statistical approach to clustering via mode identification. J. Mach. Learn. Res., 8, 1687–1723.
- Maier M., Heinb M. & von Luxburg U. (2009). Optimal construction of -nearest-neighbor graphs for identifying noisy clusters. Theor. Comput. Sci., 410(19), 1749–1764.
- Mason D. M. & Polonik W. (2009). Asymptotic normality of plug-in level set estimates. Ann. Appl. Probab., 19(3), 1108–1142.
- Matsumoto Y. (2002). An Introduction to Morse Theory. Providence: American Mathematical Society.
- McLachlan G. J. & Basford K. E. (1988). Mixture Models, Vol. 74. New York: Marcel Dekker.
- Menardi G. & Azzalini A. (2014). An advancement in clustering via nonparametric density estimation. Stat. Comput., 24(5), 753–767.
- Menardi G. & Torelli N. (2013). Reducing data dimension for cluster detection. J. Stat. Comput. Sim., 83(11), 2047–2063.
- Nugent R. & Stuetzle W. (2010). Clustering with confidence: A low-dimensional binning approach. In Classification as a Tool for Research, Eds. H. Jocarek-Junge & C. Weihs, pp. 117–125. Berlin: Springer.
10.1007/978-3-642-10745-0_12 Google Scholar
- Ooi H. ,. Density visualization and mode hunting using trees. J. Comput. Graph. Stat., 11(2), 328–347.
- Paris S. & Durand F. (2007). A topological approach to hierarchical segmentation using meanshift. In Proceedings of the IEEE Conference Computer Vision and Pattern Recognition CVPR '07, pp. 1–8. Minneapolis.
- Ray S. & Lindsay B. G. (2005). The topography of multivariate normal mixtures. Ann. Statist., 33, 2042–2065.
- Rigollet P. & Vert R. (2009). Optimal rates for plug-in estimators of density level sets. Bernoulli, 15(4), 1154–1178.
- Rinaldo A. & Wasserman L. (2010). Generalized density clustering. Ann. Statist., 38(5), 2678–2722.
- Sain S. R., Baggerly K. A. & Scott D. W. (1994). Cross-validation of multivariate densities. J. Am. Stat. Assoc., 89(427), 807–817.
- Samworth R. J. & Wand M. P. (2010). Asymptotics and optimal bandwidth selection for highest density region estimation. Ann. Statist., 38(3), 1767–1792.
- Scott D. & Sain S. (2005). Multidimensional density estimation. Handbook Statist., 24, 229–261.
10.1016/S0169-7161(04)24009-3 Google Scholar
- Scott D. W. (1992). Multivariate Density Estimation: Theory, Practice and Visualization New York: Wiley.
10.1002/9780470316849 Google Scholar
- Stuetzle W. (2003). Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. J. Classif., 20, 25–47.
- Stuetzle W. & Nugent R. (2010). A generalized single linkage method for estimating the cluster tree of a density. J. Comput. Graph. Stat., 19(2), 397–418.
- Tao W., Jin H. & Zhang Y. (2007). Color image segmentation based on mean shift and normalized cuts. IEEE Trans Syst Man Cybern, Part B: Cybern., 37(5), 1382–1389.
- Tsybakov A. B. (1997). On nonparametric estimation of density level sets. Ann. Statist., 25(3), 948–969.
- Von Luxburg U. (2007). A tutorial on spectral clustering. Stat. Comp., 17(4), 395–416.
- Walther G. (1997). Granulometric smoothing. Ann. Statist., 25(6), 2273–2299, 12.
- Wand M. P. & Jones M. C. (1994). Multivariate plug-in bandwidth selection. Comput. Stat., 9(2), 97–116.
- Wishart D. (1969). Mode analysis: A generalization of nearest neighbor which reduces chaining effects. In Numerical taxonomy, Ed. A. J. Cole, pp. 282–308. London: Academic Press.
- Wolfe J. H. (1970). Pattern clustering by multivariate mixture analysis. Multivar. Behav. Res., 5(3), 329–350.
- Wong A. M. & Lane T. (1983). The k-th nearest neighbour clustering procedure. J. R. Stat. Soc. Series B, 45, 362–368.
- Yang C., Duraiswami R., DeMenthon D. & Davis L. (2003a). Mean-shift analysis using quasinewton methods. In Proceedings of the International Conference on Image Processing, (ICIP), Vol. 2. pp. II-447. Nice: IEEE.
- Yang C., Duraiswami R., Gumerov N. A. & Davis L. (2003b). Improved fast gauss transform and efficient kernel density estimation. In Proceedings of the ninth IEEE International Conference on Computer Vision: IEEE, pp. 664–671.
- Yao W. & Lindsay B. G. (2009). Bayesian mixture labeling by highest posterior density. J. Am. Statist. Assoc., 104(486), 758–767.
- Yao W., Lindsay B. G. & Li R. (2012). Local modal regression. J. Nonparametr. Stat., 24(3), 647–663.
- Yuan X., Hu B. & He R. (2012). Agglomerative mean-shift clustering. IEEE Trans. Knowl. Data Engin., 24(2), 209–219.