Simulation of MPI applications with time-independent traces
Henri Casanova
Dept. of Information and Computer Sciences, University of Hawai‘i at Manoa, Manoa, HI, USA
Frédéric Suter
INRIA, LIP, ENS Lyon, Lyon, France
IN2P3 Computing Center, CNRS, IN2P3, Lyon-Villeurbanne, France
Correspondence to: Frédéric Suter, Centre de Calcul de l'IN2P3, 43 bld du 11 novembre 1918, 69622 Villeurbanne Cedex, France.
E-mail: [email protected]
Summary
Analyzing and understanding the performance behavior of parallel applications on parallel computing platforms is a long-standing concern in the High Performance Computing community. When the targeted platforms are not available, simulation is a reasonable approach to obtain objective performance indicators and to explore various hypothetical scenarios. In the context of applications implemented with the Message Passing Interface (MPI), two simulation methods have been proposed, on-line simulation and off-line simulation, each with its own advantages and drawbacks. In this work, we present an off-line simulation framework, that is, one that simulates the execution of an application based on event traces obtained from an actual execution. The main novelty of this work, when compared to previously proposed off-line simulators, is that the traces that drive the simulation can be acquired on large, distributed, heterogeneous, and non-dedicated platforms. As a result, the scalability of trace acquisition is increased, which is achieved by enforcing that the traces contain no time-related information. Moreover, our framework is based on a state-of-the-art scalable, fast, and validated simulation kernel. We introduce the notion of performing off-line simulation from time-independent traces, propose and evaluate several trace acquisition strategies, describe our simulation framework, and assess its quality in terms of trace acquisition scalability, simulation accuracy, and simulation time. Copyright © 2014 John Wiley & Sons, Ltd.
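To illustrate the core idea in concrete terms: a time-independent trace records volumes of work (amounts of computation in flops, message sizes in bytes) rather than timestamps or durations, so the same trace remains valid regardless of the speed of the machine it was captured on. The sketch below, in Python, converts a hypothetical timestamped MPI event log into such a trace; the field names and the tuple format are illustrative assumptions, not the actual trace format defined by the framework described in the paper.

```python
# Illustrative sketch only: the event-log fields ("t", "rank", "action", ...)
# and the output tuple format are assumptions for illustration, not the
# framework's actual time-independent trace format.

def to_time_independent(events):
    """Drop all time-related fields; keep rank, action, and volume."""
    trace = []
    for ev in events:
        if ev["action"] == "compute":
            # Record the amount of computation (flops), not its duration.
            trace.append((ev["rank"], "compute", ev["flops"]))
        elif ev["action"] == "send":
            # Record the message size (bytes), not the transfer time.
            trace.append((ev["rank"], "send", ev["dst"], ev["bytes"]))
        elif ev["action"] == "recv":
            trace.append((ev["rank"], "recv", ev["src"]))
    return trace

# A hypothetical timed log, as a profiler on the acquisition machine might
# produce it; the timestamps "t" are discarded by the conversion.
timed_log = [
    {"t": 0.00, "rank": 0, "action": "compute", "flops": 1_000_000},
    {"t": 0.12, "rank": 0, "action": "send", "dst": 1, "bytes": 65_536},
    {"t": 0.12, "rank": 1, "action": "recv", "src": 0},
]

print(to_time_independent(timed_log))
```

Because the output carries no durations, a simulator can replay it against a model of any target platform (its flop rates, link bandwidths, and latencies) to predict execution time there, which is what makes acquisition on slow, heterogeneous, or non-dedicated machines viable.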