Volume 32, Issue 3 e5159
SPECIAL ISSUE PAPER

On the memory attribution problem: A solution and case study using MPI

Samuel K. Gutiérrez (Corresponding Author)

Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico

Scalable Systems Laboratory, Department of Computer Science, University of New Mexico, Albuquerque, New Mexico

Correspondence: Samuel K. Gutiérrez, Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM 87545; or Scalable Systems Laboratory, Department of Computer Science, University of New Mexico, Albuquerque, NM 87131. Email: [email protected]

Dorian C. Arnold

Department of Math and Computer Science, Emory University, Atlanta, Georgia

Kei Davis

Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico

Patrick McCormick

Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico
First published: 04 February 2019
Summary

As parallel applications running on large-scale computing systems become increasingly memory constrained, the ability to attribute memory usage to the various components of an application grows correspondingly important. We present the design and implementation of memnesia, a novel memory usage profiler for parallel and distributed message-passing applications. Our approach captures both application- and message-passing-library-specific memory usage statistics from unmodified binaries dynamically linked to a message-passing communication library. Using microbenchmarks and proxy applications, we evaluated our profiler across three Message Passing Interface (MPI) implementations and two hardware platforms. The results show that our approach and the corresponding implementation can accurately quantify memory resource usage as a function of time, scale, communication workload, and software or hardware system architecture, clearly distinguishing between application and MPI library memory usage at the per-process level. With this new capability, we show that job size, communication workload, and hardware/software architecture all influence peak runtime memory usage. In practice, this tool provides a potentially valuable source of information for application developers seeking to measure and optimize memory usage.
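The abstract does not detail how memnesia observes unmodified, dynamically linked binaries; as a loose, hypothetical sketch (not the authors' implementation), the C shim below illustrates the general idea using the standard PMPI profiling interface: a wrapper library intercepts MPI_Send, samples process memory via Linux's /proc/self/statm with a hypothetical record_memory_usage helper, and forwards the call to the underlying MPI library.

```c
/*
 * Hedged sketch, not the memnesia implementation: PMPI-style interposition
 * on MPI_Send for an unmodified, dynamically linked MPI application.
 */
#include <mpi.h>
#include <stdio.h>

/* Hypothetical helper: sample this process's memory from /proc/self/statm
 * (first two fields are total program size and resident set, in pages). */
static void record_memory_usage(const char *where)
{
    FILE *fp = fopen("/proc/self/statm", "r");
    if (fp) {
        long size_pages = 0, resident_pages = 0;
        if (fscanf(fp, "%ld %ld", &size_pages, &resident_pages) == 2) {
            fprintf(stderr, "[memprof] %s: size=%ld pages, resident=%ld pages\n",
                    where, size_pages, resident_pages);
        }
        fclose(fp);
    }
}

/* Interposed MPI_Send: sample memory around the call, then forward it to
 * the real MPI library through the PMPI entry point. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    record_memory_usage("MPI_Send(enter)");
    int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
    record_memory_usage("MPI_Send(exit)");
    return rc;
}
```

Built as a shared object and injected with LD_PRELOAD (or linked ahead of the MPI library), such a shim observes library activity without recompiling the application; the measurement and attribution machinery memnesia actually uses is described in the body of the paper.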
