Concurrency and Computation: Practice and Experience

Special Issue Paper

Using adaptive runtime filtering to support an event-based performance analysis

Jonas Stolle (Corresponding Author)

ORCID: orcid.org/0000-0002-3042-4387

Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, 01062 Germany

Correspondence to: Jonas Stolle, Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden 01062, Germany.

Michael Wagner

Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, 01062 Germany

Barcelona Supercomputing Center (BSC), Barcelona, 08034 Spain

Jens Doleschal

Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, 01062 Germany

Felix Schmitt

Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, 01062 Germany

NVIDIA, Santa Clara, 95050 CA, USA

Holger Brunst

Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, 01062 Germany

First published: 24 February 2017

https://doi.org/10.1002/cpe.4094

Summary

Event-based performance monitoring and analysis are effective means of tuning parallel applications for optimal resource usage. In this article, we address the data capacity challenge that arises when the tracing methodology is applied to large-scale parallel applications with long execution times. Existing approaches use static, predefined event filters to reduce the performance data to a manageable size. In contrast, we propose self-guided filters that automatically adapt to an application's runtime behaviour and therefore require neither prior knowledge nor previous application runs. Our contribution consists of four adaptive runtime filters, each targeting a specific type of data redundancy. The filters focus on detecting identical events in loop iterations, constant events with no variation in time, and very short, highly frequent events that typically carry little meaning but have a severe impact on the total data volume. We evaluate our prototype implementation with five real-world applications and achieve a data reduction of two orders of magnitude while increasing execution time by less than 1%. Furthermore, we show that the qualitative impact of our filters on performance analysis with state-of-the-art analysis tools can be reduced by adding feedback methods and statistical information to the filtered traces. Copyright © 2017 John Wiley & Sons, Ltd.
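
The summary only names the filter classes and does not describe their implementation. To illustrate the general idea behind the class of very short, highly frequent events, the following Python sketch shows one possible self-guided filter. It is not the authors' implementation. The event model (hypothetical `(name, enter, leave)` tuples in microseconds), the class names `FunctionStats` and `ShortFrequentFilter`, and the thresholds `warmup_calls` and `max_mean_us` are all illustrative assumptions, not values or APIs from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical event model (not the paper's trace format):
# (function name, enter timestamp, leave timestamp) in microseconds.
Event = Tuple[str, float, float]


@dataclass
class FunctionStats:
    """Per-function summary that is kept even after events are filtered."""
    calls: int = 0
    total_us: float = 0.0

    @property
    def mean_us(self) -> float:
        return self.total_us / self.calls if self.calls else 0.0


@dataclass
class ShortFrequentFilter:
    """Sketch of a self-guided filter for very short, highly frequent calls.

    A function is observed for `warmup_calls` calls; if its mean duration
    is then below `max_mean_us`, later events of that function are no
    longer written to the trace. Both thresholds are illustrative.
    """
    warmup_calls: int = 100
    max_mean_us: float = 1.0
    stats: Dict[str, FunctionStats] = field(default_factory=dict)
    filtered: Set[str] = field(default_factory=set)
    trace: List[Event] = field(default_factory=list)

    def record(self, name: str, enter: float, leave: float) -> None:
        s = self.stats.setdefault(name, FunctionStats())
        s.calls += 1
        s.total_us += leave - enter
        if name in self.filtered:
            return  # statistics are updated, the event itself is dropped
        self.trace.append((name, enter, leave))
        # The decision uses only data observed so far in this run,
        # so no prior knowledge or previous execution is required.
        if s.calls >= self.warmup_calls and s.mean_us < self.max_mean_us:
            self.filtered.add(name)


# Usage: a 0.5 us helper called as often as a 50 us solver step.
f = ShortFrequentFilter(warmup_calls=3, max_mean_us=1.0)
t = 0.0
for _ in range(10):
    f.record("tiny_helper", t, t + 0.5)
    t += 1.0
    f.record("solver_step", t, t + 50.0)
    t += 51.0

print(len(f.trace), sorted(f.filtered))  # 13 ['tiny_helper']
```

In this sketch the short helper is recorded only during its warm-up phase, while its call count and total time remain available as statistics. That loosely mirrors the summary's point about enriching filtered traces with statistical information. Filters for identical loop iterations or constant events would need different detection logic.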

Volume 29, Issue 7, 10 April 2017, e4094

Combined Special Issues on Security and privacy in social networks (NSS2015) and 18th IEEE International Conference on Computational Science and Engineering (CSE2015)
