A novel data partitioning algorithm for dynamic energy optimization on heterogeneous high-performance computing platforms
Corresponding Author
Hamidreza Khaleghzadeh
School of Computer Science, University College Dublin, Belfield, Ireland
Correspondence
Hamidreza Khaleghzadeh, School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.
Muhammad Fahad
School of Computer Science, University College Dublin, Belfield, Ireland
Ravi Reddy Manumachu
School of Computer Science, University College Dublin, Belfield, Ireland
Alexey Lastovetsky
School of Computer Science, University College Dublin, Belfield, Ireland
Funding information: Science Foundation Ireland, 14/IA/2474
Summary
Energy is one of the most important objectives for optimization on modern heterogeneous high-performance computing (HPC) platforms. The tight integration of multicore CPUs with accelerators such as graphics processing units (GPUs) and Xeon Phi coprocessors in these platforms presents several challenges to the optimization of multithreaded data-parallel applications for energy. In this work, we formulate the problem of optimizing data-parallel applications on heterogeneous HPC platforms for dynamic energy through workload distribution, and we propose a workload partitioning algorithm to solve it. The algorithm employs a load-imbalancing technique to determine the workload distribution that minimizes the dynamic energy consumption of the parallel execution of an application. Its inputs are discrete dynamic energy profiles of the individual computing devices. In practice, the profiles are constructed using an approach that accurately models the energy consumed by the execution of a hybrid scientific data-parallel application on a heterogeneous platform containing different computing devices such as a CPU, a GPU, and a Xeon Phi. The proposed algorithm is experimentally analyzed using two multithreaded data-parallel applications, matrix multiplication and 2D fast Fourier transform. The load-imbalanced solutions provided by the algorithm achieve significant dynamic energy reductions for the two applications (on average by 130% and 44%, respectively) compared with the load-balanced solutions.
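As a rough illustration of the optimization problem described above, suppose each device i has a discrete dynamic energy profile e_i(x) giving the energy it consumes when executing x workload units (a uniform unit granularity is an assumption made here). The goal is to choose x_1, ..., x_p with x_1 + ... + x_p = n that minimize e_1(x_1) + ... + e_p(x_p). The Python sketch below solves small instances of this problem exactly with a dynamic program over devices. It is a generic technique chosen for illustration, not the partitioning algorithm proposed in the paper, and the function name `min_energy_partition` and all profile values are hypothetical.

```python
from typing import List, Sequence, Tuple

INF = float("inf")


def min_energy_partition(
    profiles: Sequence[Sequence[float]], n: int
) -> Tuple[float, List[int]]:
    """Split n workload units across devices to minimize total dynamic energy.

    Hypothetical helper: profiles[i][k] is the dynamic energy, in joules,
    of device i executing k workload units; profiles[i][0] should be 0.
    Returns (minimal total energy, units assigned to each device).
    """
    # best[k]: minimal energy to place k units on the devices seen so far.
    best = [0.0] + [INF] * n
    choices: List[List[int]] = []  # choices[i][k]: units given to device i
    for prof in profiles:
        new_best = [INF] * (n + 1)
        choice = [0] * (n + 1)
        for k in range(n + 1):
            # Try every share x of the k units that this device could take.
            for x in range(min(k, len(prof) - 1) + 1):
                cand = best[k - x] + prof[x]
                if cand < new_best[k]:
                    new_best[k] = cand
                    choice[k] = x
        best = new_best
        choices.append(choice)
    if best[n] == INF:
        raise ValueError("profiles do not cover a workload of size n")
    # Walk the recorded choices backwards to recover the distribution.
    dist = [0] * len(profiles)
    k = n
    for i in range(len(profiles) - 1, -1, -1):
        dist[i] = choices[i][k]
        k -= dist[i]
    return best[n], dist


if __name__ == "__main__":
    # Toy profiles (joules) for 0..4 units; values are invented for illustration.
    cpu = [0, 5, 9, 14, 20]
    gpu = [0, 3, 7, 12, 18]
    phi = [0, 6, 10, 12, 17]
    energy, dist = min_energy_partition([cpu, gpu, phi], 4)
    print(energy, dist)  # 15.0 [0, 1, 3]
```

With these made-up numbers, the minimum-energy distribution leaves the CPU idle and gives three of the four units to the Xeon Phi, whose profile is cheap at larger sizes; this is the kind of deliberately unbalanced distribution the summary refers to. Because the dynamic program is exhaustive, it also handles non-convex profiles, at a cost of O(p·n²) operations for p devices, which is fine for small illustrative inputs.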