Using adaptive runtime filtering to support an event-based performance analysis
Corresponding Author
Jonas Stolle
Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, 01062 Germany
Correspondence to: Jonas Stolle, Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden 01062, Germany.
E-mail: [email protected]
Michael Wagner
Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, 01062 Germany
Barcelona Supercomputing Center (BSC), Barcelona, 08034 Spain
Jens Doleschal
Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, 01062 Germany
Felix Schmitt
Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, 01062 Germany
NVIDIA, Santa Clara, 95050 CA, USA
Holger Brunst
Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, 01062 Germany
Summary
Event-based performance monitoring and analysis are effective means of tuning parallel applications for optimal resource usage. In this article, we address the data volume challenge that arises when applying the tracing methodology to large-scale parallel applications and long execution times. Existing approaches use static, predefined event filters to reduce the performance data to a manageable size. In contrast, we propose self-guided filters that automatically adapt to an application's runtime behaviour and therefore do not require any prior knowledge or previous application runs. Our contribution consists of four adaptive runtime filters, each of which targets a specific type of data redundancy. The filters focus on detecting identical events in loop iterations, constant events that do not vary over time, and very short, highly frequent events that typically carry little meaning but have a severe impact on the total data volume. We evaluate our prototype implementation with five real-world applications and achieve a data reduction of two orders of magnitude while increasing execution time by less than 1%. We also show that the qualitative impact of our filters on performance analysis in state-of-the-art analysis tools can be reduced by adding feedback methods and statistical information to the filtered traces. Copyright © 2017 John Wiley & Sons, Ltd.
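To make the idea of a self-guided filter for very short, highly frequent events more concrete, the following is a minimal Python sketch. It is not the authors' implementation: the class name `ShortRegionFilter`, the thresholds `min_calls` and `max_mean_duration`, and the tuple-based trace layout are all assumptions chosen for illustration. The sketch records events normally until a region has proven, at runtime, to be both frequent and short on average; from then on its events are dropped, while call counts and accumulated time are still kept so that statistical information about the filtered region remains available.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RegionStats:
    calls: int = 0            # number of observed invocations
    total_time: float = 0.0   # accumulated duration in seconds
    filtered: bool = False    # True once the region's events are dropped


class ShortRegionFilter:
    """Hypothetical adaptive filter for short, highly frequent regions."""

    def __init__(self, min_calls=100, max_mean_duration=1e-6):
        # Both thresholds are illustrative assumptions, not values from the article.
        self.min_calls = min_calls
        self.max_mean_duration = max_mean_duration
        self.stats = defaultdict(RegionStats)
        self.trace = []  # recorded (region, enter, leave) tuples

    def record(self, region, enter, leave):
        s = self.stats[region]
        # Statistics are always updated, even for regions that are filtered.
        s.calls += 1
        s.total_time += leave - enter
        if s.filtered:
            return
        self.trace.append((region, enter, leave))
        # Adapt at runtime: start filtering once the region is frequent and short.
        mean_duration = s.total_time / s.calls
        if s.calls >= self.min_calls and mean_duration <= self.max_mean_duration:
            s.filtered = True


# Example: a tiny helper called many times, and one long-running solver call.
flt = ShortRegionFilter(min_calls=3, max_mean_duration=1.0)
for i in range(10):
    flt.record("tiny", float(i), i + 0.5)
flt.record("solver", 100.0, 150.0)
```

In this sketch the decision needs no prior run or user-supplied filter list; the thresholds alone decide when a region stops contributing events, and the retained per-region statistics could be written alongside the reduced trace.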