Volume 29, Issue 18 e4206
RESEARCH ARTICLE

Thermal-aware task assignments in high performance computing clusters

Shubbhi Taneja

Corresponding Author

Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA

Correspondence

Shubbhi Taneja, Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA.

Email: [email protected]

Sanjay Kulkarni

Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA

Yi Zhou

Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA

Xiao Qin

Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA

First published: 02 August 2017

Summary

Cluster-level thermal management has gained much attention over the past decade because of the rising cooling costs of data centers. In this research, we propose and implement a static scheduler called SSched and a dynamic one named DSched. These two algorithms schedule jobs based on the CPU and disk temperatures of a Hadoop cluster's nodes. Our schedulers rely on a monitoring mechanism to keep track of CPU and disk utilization, and they keep CPU and disk temperatures below a threshold through thermal-aware scheduling decisions. To facilitate the design of SSched and DSched, we classify jobs into CPU-intensive and disk-intensive categories. When a job arrives, SSched retrieves its utilization statistics from a profiled log, estimates its thermal behavior, and places the job on a NodeManager so as to minimize thermal impact. Unlike SSched, DSched improves the thermal efficiency of Hadoop clusters through dynamic load balancing. DSched keeps track of the coolest and hottest nodes in the cluster; if a hot spot is detected, tasks are migrated from hot nodes to cool ones. To evaluate the effectiveness of our schedulers, we track the average CPU and disk temperatures of each node while maintaining an optimal outlet temperature across the cluster. We demonstrate that, compared with the traditional Hadoop scheduler, SSched and DSched achieve approximately 15% savings in cooling cost with little performance overhead.
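
The two scheduling ideas summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` class, the linear temperature model, the `THRESHOLD_C` and `HEAT_PER_UTIL_C` constants, and the function names `place_static` and `rebalance_dynamic` are all hypothetical, chosen only to show how a profile-driven static placement and a hottest-to-coolest migration step might look.

```python
from dataclasses import dataclass, field

# Hypothetical constants; none of these values come from the article.
THRESHOLD_C = 70.0        # assumed temperature limit for CPU and disk
HEAT_PER_UTIL_C = 20.0    # assumed temperature rise at 100% utilization


@dataclass
class Node:
    name: str
    cpu_temp: float
    disk_temp: float
    tasks: list = field(default_factory=list)

    def peak_temp(self):
        # The hotter of the two monitored components.
        return max(self.cpu_temp, self.disk_temp)


def predict(node, cpu_util, disk_util):
    """Estimate (cpu_temp, disk_temp) after adding a job (assumed linear model)."""
    return (node.cpu_temp + HEAT_PER_UTIL_C * cpu_util,
            node.disk_temp + HEAT_PER_UTIL_C * disk_util)


def place_static(nodes, job, profile):
    """Static step: read profiled utilization, pick the node with the lowest predicted peak."""
    cpu_util, disk_util = profile[job]
    best = min(nodes, key=lambda n: max(predict(n, cpu_util, disk_util)))
    best.cpu_temp, best.disk_temp = predict(best, cpu_util, disk_util)
    best.tasks.append(job)
    return best


def rebalance_dynamic(nodes, profile):
    """Dynamic step: if the hottest node exceeds the threshold, move one task to the coolest node."""
    hottest = max(nodes, key=Node.peak_temp)
    coolest = min(nodes, key=Node.peak_temp)
    if hottest.peak_temp() <= THRESHOLD_C or hottest is coolest or not hottest.tasks:
        return None  # no hot spot, or nothing that can be migrated
    job = hottest.tasks.pop()
    cpu_util, disk_util = profile[job]
    # Remove the job's assumed heat contribution from the source node.
    hottest.cpu_temp -= HEAT_PER_UTIL_C * cpu_util
    hottest.disk_temp -= HEAT_PER_UTIL_C * disk_util
    coolest.cpu_temp, coolest.disk_temp = predict(coolest, cpu_util, disk_util)
    coolest.tasks.append(job)
    return job, hottest.name, coolest.name


# Profiled (cpu_util, disk_util) per job; the job names and values are illustrative only.
profile = {"wordcount": (0.5, 0.1), "sort": (0.1, 0.5)}
```

In this sketch, a CPU-intensive job such as the illustrative `wordcount` entry is steered away from nodes whose CPU is already warm, while `rebalance_dynamic` would be called periodically by a monitoring loop. A real scheduler would replace the linear model with measured thermal behavior.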
