A topological data analysis approach is taken to the challenging problem of finding and validating the statistical significance of local modes in a data set. As with the SIgnificant ZERo crossings of derivatives (SiZer) approach to this problem, statistical inference is performed in a multi-scale way, that is, across bandwidths. The key contribution is a two-parameter approach to the persistent homology representation, indexed by kernel bandwidth and by filtration level. For each kernel bandwidth, a sub-level set filtration of the resulting kernel density estimate is computed. Inference based on the resulting persistence diagram indicates the statistical significance of modes. A simulated example and an analysis of the well-known Hidalgo stamps data show that the new method has more statistical power than SiZer for finding bumps. Copyright © 2017 John Wiley & Sons, Ltd.
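The abstract describes the procedure only at a high level. The Python sketch below illustrates how the pieces could fit together under several assumptions not stated in the abstract: a Gaussian kernel, densities evaluated on a fixed 1-D grid, and modes read off as 0-dimensional features of the sub-level set filtration of -f̂_h (equivalently, the super-level set filtration of f̂_h). For the inference step it swaps in the bootstrap sup-norm band of Fasy et al. (2014): if c_α is the (1 - α) bootstrap quantile of sup_x |f̂*_h(x) - f̂_h(x)|, a persistence point (birth, death) lies more than c_α from the diagonal exactly when birth - death > 2c_α, so only those modes are kept. This rule is a stand-in taken from the reference list, not necessarily the authors' two-parameter procedure, and the function names (`gaussian_kde`, `superlevel_persistence`, `bootstrap_threshold`, `significant_modes`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def gaussian_kde(data: Sequence[float], grid: Sequence[float], h: float) -> List[float]:
    """Gaussian kernel density estimate with bandwidth h, evaluated at each grid point."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) for x in grid]


def superlevel_persistence(f: Sequence[float]) -> List[Tuple[float, float]]:
    """0-dimensional (birth, death) pairs for the super-level set filtration of f
    on a 1-D grid, i.e. the sub-level set filtration of -f.

    Points enter from highest to lowest value; a union-find joins neighbours.
    When two components meet at height f[i], the one whose peak is lower dies
    there (elder rule). The global maximum never dies: death = -inf.
    """
    order = sorted(range(len(f)), key=lambda i: f[i], reverse=True)
    parent: Dict[int, int] = {}
    peak: Dict[int, int] = {}  # component root -> index of its highest point

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs: List[Tuple[float, float]] = []
    for i in order:
        parent[i], peak[i] = i, i
        for j in (i - 1, i + 1):
            if j not in parent:  # off the grid, or not yet above the current level
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            young, old = (ri, rj) if f[peak[ri]] < f[peak[rj]] else (rj, ri)
            parent[young] = old
            if f[peak[young]] > f[i]:  # drop zero-persistence pairs
                pairs.append((f[peak[young]], f[i]))
    pairs.append((max(f), -math.inf))
    return pairs


def bootstrap_threshold(data: Sequence[float], grid: Sequence[float], h: float,
                        f_hat: Sequence[float], n_boot: int = 200,
                        alpha: float = 0.05, seed: int = 0) -> float:
    """(1 - alpha) bootstrap quantile c_alpha of sup_x |f*_h(x) - f_h(x)| on the grid."""
    rng = random.Random(seed)
    sups = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]  # nonparametric bootstrap sample
        f_star = gaussian_kde(resample, grid, h)
        sups.append(max(abs(a - b) for a, b in zip(f_star, f_hat)))
    sups.sort()
    return sups[min(n_boot, math.ceil((1 - alpha) * n_boot)) - 1]


def significant_modes(data: Sequence[float], grid: Sequence[float],
                      bandwidths: Sequence[float], n_boot: int = 200,
                      alpha: float = 0.05, seed: int = 0) -> Dict[float, int]:
    """For each bandwidth h, count modes whose persistence exceeds 2 * c_alpha(h)."""
    counts: Dict[float, int] = {}
    for h in bandwidths:
        f_hat = gaussian_kde(data, grid, h)
        c_alpha = bootstrap_threshold(data, grid, h, f_hat, n_boot, alpha, seed)
        counts[h] = sum(1 for birth, death in superlevel_persistence(f_hat)
                        if birth - death > 2 * c_alpha)
    return counts


if __name__ == "__main__":
    # Two well-separated clusters of equally spaced points (illustrative data only).
    data = [-3.5 + 0.01 * k for k in range(100)] + [2.5 + 0.01 * k for k in range(100)]
    grid = [-6.0 + 0.1 * i for i in range(121)]
    print(significant_modes(data, grid, bandwidths=[0.5, 5.0], n_boot=40))
```

In the illustrative run, the coarse bandwidth `5.0` should merge the two clusters into a single mode, while `0.5` separates them. Comparing such counts across bandwidths is the multi-scale, SiZer-style view the abstract builds on; the global mode is always counted because its class never dies.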

References

Basford, KE, McLachlan, GJ & York, MG (1997), ‘Modelling the distribution of stamp paper thickness via finite normal mixtures: the 1872 Hidalgo stamp issue of Mexico revisited’, Journal of Applied Statistics, 24(2), 169–180. doi:10.1080/02664769723783
Bubenik, P & Kim, PT (2007), ‘A statistical approach to persistent homology’, Homology, Homotopy and Applications, 9(2), 337–362. doi:10.4310/HHA.2007.v9.n2.a12
Carlsson, G (2009), ‘Topology and data’, Bulletin of the American Mathematical Society, 46(2), 255–308. doi:10.1090/S0273-0979-09-01249-X
Carlsson, G & Zomorodian, A (2009), ‘The theory of multidimensional persistence’, Discrete & Computational Geometry, 42(1), 71–93. doi:10.1007/s00454-009-9176-0
Chaudhuri, P & Marron, JS (1999), ‘SiZer for exploration of structures in curves’, Journal of the American Statistical Association, 94(447), 807–823. doi:10.1080/01621459.1999.10474186
Chaudhuri, P & Marron, JS (2000), ‘Scale space view of curve estimation’, The Annals of Statistics, 28(2), 408–428. doi:10.1214/aos/1016218224
Chazal, F, Cohen-Steiner, D & Mérigot, Q (2011), ‘Geometric inference for probability measures’, Foundations of Computational Mathematics, 11(6), 733–751. doi:10.1007/s10208-011-9098-0
Cohen-Steiner, D, Edelsbrunner, H & Harer, J (2007), ‘Stability of persistence diagrams’, Discrete & Computational Geometry, 37(1), 103–120. doi:10.1007/s00454-006-1276-5
Devroye, L & Györfi, L (1985), Nonparametric Density Estimation: The L1 View, Vol. 119, John Wiley & Sons Incorporated.
Edelsbrunner, H & Harer, J (2008), ‘Persistent homology—a survey’, Contemporary Mathematics, 453, 257–282. doi:10.1090/conm/453/08802
Efron, B & Tibshirani, RJ (1994), An Introduction to the Bootstrap, CRC Press. doi:10.1201/9780429246593
Erästö, P & Holmström, L (2007), ‘Bayesian analysis of features in a scatter plot with dependent observations and errors in predictors’, Journal of Statistical Computation and Simulation, 77(5), 421–431. doi:10.1080/10629360600711988
Erästö, P & Holmström, L (2012), ‘Bayesian multiscale smoothing for making inferences about features in scatterplots’, Journal of Computational and Graphical Statistics, 14(3), 569–589. doi:10.1198/106186005X59315
Fasy, BT, Lecci, F, Rinaldo, A, Wasserman, L, Balakrishnan, S & Singh, A (2014), ‘Confidence sets for persistence diagrams’, The Annals of Statistics, 42(6), 2301–2339. doi:10.1214/14-AOS1252
Fisher, N & Marron, JS (2001), ‘Mode testing via the excess mass estimate’, Biometrika, 88(2), 499–517. doi:10.1093/biomet/88.2.499
Ghrist, R (2008), ‘Barcodes: the persistent topology of data’, Bulletin of the American Mathematical Society, 45(1), 61–75. doi:10.1090/S0273-0979-07-01191-3
Godtliebsen, F, Marron, JS & Chaudhuri, P (2002), ‘Significance in scale space for bivariate density estimation’, Journal of Computational and Graphical Statistics, 11(1), 1–21. doi:10.1198/106186002317375596
Good, I & Gaskins, R (1980), ‘Density estimation and bump-hunting by the penalized likelihood method exemplified by scattering and meteorite data’, Journal of the American Statistical Association, 75(369), 42–56. doi:10.1080/01621459.1980.10477419
Hannig, J & Marron, JS (2006), ‘Advanced distribution theory for SiZer’, Journal of the American Statistical Association, 101(474), 484–499. doi:10.1198/016214505000001294
Holmström, L & Erästö, P (2002), ‘Making inferences about past environmental change using smoothing in multiple time scales’, Computational Statistics & Data Analysis, 41(2), 289–309. doi:10.1016/S0167-9473(02)00079-8
Izenman, AJ & Sommer, CJ (1988), ‘Philatelic mixtures and multimodal densities’, Journal of the American Statistical Association, 83(404), 941–953.
Jones, MC, Marron, JS & Sheather, SJ (1996a), ‘Progress in data-based bandwidth selection for kernel density estimation’, Computational Statistics, 11(3), 337–381.
Jones, MC, Marron, JS & Sheather, SJ (1996b), ‘A brief survey of bandwidth selection for density estimation’, Journal of the American Statistical Association, 91(433), 401–407. doi:10.1080/01621459.1996.10476701
Marron, JS & Wand, MP (1992), ‘Exact mean integrated squared error’, The Annals of Statistics, 20, 712–736. doi:10.1214/aos/1176348653
Minnotte, MC (2010), ‘Mode testing via higher-order density estimation’, Computational Statistics, 25(3), 391–407. doi:10.1007/s00180-010-0183-7
Minnotte, MC & Scott, DW (1993), ‘The mode tree: a tool for visualization of nonparametric density features’, Journal of Computational and Graphical Statistics, 2(1), 51–68. doi:10.1080/10618600.1993.10474599
Scott, DW (2015), Multivariate Density Estimation: Theory, Practice, and Visualization, John Wiley & Sons. doi:10.1002/9781118575574
Silverman, BW (1986), Density Estimation for Statistics and Data Analysis, Vol. 26, CRC Press. doi:10.1007/978-1-4899-3324-9
Simonoff, JS (2012), Smoothing Methods in Statistics, Springer Science & Business Media.
Walther, G (2002), ‘Detecting the presence of mixing with multiscale maximum likelihood’, Journal of the American Statistical Association, 97(458), 508–513. doi:10.1198/016214502760047032
Wand, MP & Jones, MC (1994), Kernel Smoothing, CRC Press. doi:10.1201/b14876
Xia, K, Zhao, Z & Wei, GW (2015a), ‘Multiresolution persistent homology for excessively large biomolecular datasets’, The Journal of Chemical Physics, 143(13), 134103. doi:10.1063/1.4931733
Xia, K, Zhao, Z & Wei, GW (2015b), ‘Multiresolution topological simplification’, Journal of Computational Biology: A Journal of Computational Molecular Cell Biology, 22(9), 887–891. doi:10.1089/cmb.2015.0104

Volume 6, Issue 1 (2017), Pages 462–471

Bump hunting by topological data analysis
