Volume 32, Issue 21 e5928
SPECIAL ISSUE PAPER

A novel data partitioning algorithm for dynamic energy optimization on heterogeneous high-performance computing platforms

Hamidreza Khaleghzadeh

Corresponding Author

School of Computer Science, University College Dublin, Belfield, Ireland

Correspondence

Hamidreza Khaleghzadeh, School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.

Email: [email protected]

Muhammad Fahad

School of Computer Science, University College Dublin, Belfield, Ireland

Ravi Reddy Manumachu

School of Computer Science, University College Dublin, Belfield, Ireland

Alexey Lastovetsky

School of Computer Science, University College Dublin, Belfield, Ireland

First published: 22 July 2020
Citations: 5

Funding information: Science Foundation Ireland, 14/IA/2474

Summary

Energy is one of the most important optimization objectives on modern heterogeneous high-performance computing (HPC) platforms. The tight integration of multicore CPUs with accelerators such as graphics processing units (GPUs) and Xeon Phi coprocessors in these platforms presents several challenges to optimizing multithreaded data-parallel applications for energy. In this work, we formulate the problem of optimizing data-parallel applications on heterogeneous HPC platforms for dynamic energy through workload distribution, and we propose a workload partitioning algorithm to solve it. The algorithm employs a load-imbalancing technique to determine the workload distribution that minimizes the dynamic energy consumption of the parallel execution of an application. Its inputs are discrete dynamic energy profiles of the individual computing devices. The profiles are constructed in practice using an approach that accurately models the energy consumed when a hybrid scientific data-parallel application executes on a heterogeneous platform containing different computing devices such as a CPU, a GPU, and a Xeon Phi. The proposed algorithm is analyzed experimentally using two multithreaded data-parallel applications, matrix multiplication and 2D fast Fourier transform. The load-imbalanced solutions provided by the algorithm achieve significant dynamic energy reductions for the two applications (on average 130% and 44%, respectively) compared with the load-balanced solutions.
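
The optimization problem stated in the summary can be illustrated with a small sketch. The Python code below is not the authors' algorithm; it is a plain dynamic-programming search over discrete per-device energy profiles, shown only to make the problem concrete. The function name `partition_min_energy`, the profile values, and the assumption that a device given no work contributes zero dynamic energy are all hypothetical.

```python
from typing import Dict, List, Tuple


def partition_min_energy(
    profiles: List[Dict[int, float]], n: int
) -> Tuple[float, List[int]]:
    """Split a workload of size n across devices to minimize total dynamic energy.

    profiles[i] maps a workload size to the dynamic energy (in joules) that
    device i consumes for it. Only sizes present in a profile may be assigned.
    Assumption (hypothetical): a device given no work costs 0 J.
    """
    # best[w] = (lowest energy, allocation) for total workload w
    # using only the devices processed so far.
    best: Dict[int, Tuple[float, List[int]]] = {0: (0.0, [])}

    for profile in profiles:
        options = dict(profile)
        options.setdefault(0, 0.0)  # idle device: no dynamic energy (assumed)
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for w, (energy, alloc) in best.items():
            for x, e_x in options.items():
                total = w + x
                if total > n:
                    continue
                candidate = energy + e_x
                if total not in nxt or candidate < nxt[total][0]:
                    nxt[total] = (candidate, alloc + [x])
        best = nxt

    if n not in best:
        raise ValueError("no combination of profile points sums to n")
    return best[n]


# Made-up, non-monotonic profiles for two devices (workload size -> joules).
profiles = [
    {1: 3.0, 2: 8.0, 3: 6.0, 4: 14.0},
    {1: 2.0, 2: 7.0, 3: 12.0, 4: 20.0},
]
print(partition_min_energy(profiles, 4))  # (8.0, [3, 1])
```

With these invented profiles, the even split (2, 2) costs 15 J while the uneven split (3, 1) costs 8 J, which is the kind of gain a load-imbalanced distribution can offer when energy profiles are not smooth.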
