Simulation of MPI applications with time-independent traces
Henri Casanova
Dept. of Information and Computer Sciences, University of Hawai‘i at Manoa, Manoa, HI, USA
Frédéric Suter
INRIA, LIP, ENS Lyon, Lyon, France
IN2P3 Computing Center, CNRS, IN2P3, Lyon-Villeurbanne, France
Correspondence to: Frédéric Suter, Centre de Calcul de l'IN2P3, 43 bld du 11 novembre 1918, 69622 Villeurbanne Cedex, France.
E-mail: [email protected]
Summary
Analyzing and understanding the performance behavior of parallel applications on parallel computing platforms is a long-standing concern in the High Performance Computing community. When the targeted platforms are not available, simulation is a reasonable approach to obtain objective performance indicators and to explore various hypothetical scenarios. In the context of applications implemented with the Message Passing Interface (MPI), two simulation methods have been proposed, on-line simulation and off-line simulation, each with its own advantages and drawbacks. In this work, we present an off-line simulation framework, that is, one that simulates the execution of an application based on event traces obtained from an actual execution. The main novelty of this work, when compared to previously proposed off-line simulators, is that the traces that drive the simulation can be acquired on large, distributed, heterogeneous, and non-dedicated platforms. As a result, the scalability of trace acquisition is increased, which is achieved by enforcing that the traces contain no time-related information. Moreover, our framework is based on a state-of-the-art scalable, fast, and validated simulation kernel. We introduce the notion of performing off-line simulation from time-independent traces, propose and evaluate several trace acquisition strategies, describe our simulation framework, and assess its quality in terms of trace acquisition scalability, simulation accuracy, and simulation time. Copyright © 2014 John Wiley & Sons, Ltd.
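To illustrate the core idea in concrete terms: a time-independent trace records volumes of work (amounts of computation in flops, message sizes in bytes) rather than timestamps or durations, so the same trace remains valid regardless of the speed of the machine it was captured on. The sketch below, in Python, converts a hypothetical timestamped MPI event log into such a trace; the field names and the tuple format are illustrative assumptions, not the actual trace format defined by the framework described in the paper.

```python
# Illustrative sketch only: the event-log fields ("t", "rank", "action", ...)
# and the output tuple format are assumptions for illustration, not the
# framework's actual time-independent trace format.

def to_time_independent(events):
    """Drop all time-related fields; keep rank, action, and volume."""
    trace = []
    for ev in events:
        if ev["action"] == "compute":
            # Record the amount of computation (flops), not its duration.
            trace.append((ev["rank"], "compute", ev["flops"]))
        elif ev["action"] == "send":
            # Record the message size (bytes), not the transfer time.
            trace.append((ev["rank"], "send", ev["dst"], ev["bytes"]))
        elif ev["action"] == "recv":
            trace.append((ev["rank"], "recv", ev["src"]))
    return trace

# A hypothetical timed log, as a profiler on the acquisition machine might
# produce it; the timestamps "t" are discarded by the conversion.
timed_log = [
    {"t": 0.00, "rank": 0, "action": "compute", "flops": 1_000_000},
    {"t": 0.12, "rank": 0, "action": "send", "dst": 1, "bytes": 65_536},
    {"t": 0.12, "rank": 1, "action": "recv", "src": 0},
]

print(to_time_independent(timed_log))
```

Because the output carries no durations, a simulator can replay it against a model of any target platform (its flop rates, link bandwidths, and latencies) to predict execution time there, which is what makes acquisition on slow, heterogeneous, or non-dedicated machines viable.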