Scheduling independent tasks on multi-cores with GPU accelerators
Raphael Bleuse
University Grenoble Alpes, LIG, 655 Avenue de l'Europe, 38334 St. Ismier Cedex, France
Search for more papers by this authorSafia Kedad-Sidhoum
Sorbonne Universités, UPMC Univ. Paris 06, UMR 7606, LIP6, F-75005, Paris, France
Search for more papers by this authorCorresponding Author
Florence Monna
Sorbonne Universités, UPMC Univ. Paris 06, UMR 7606, LIP6, F-75005, Paris, France
University Grenoble Alpes, LIG, 655 Avenue de l'Europe, 38334 St. Ismier Cedex, France
Correspondence to: Florence Monna, Laboratoire d'Informatique de Paris 6, France.
E-mail: [email protected]
Search for more papers by this authorGrégory Mounié
University Grenoble Alpes, LIG, 655 Avenue de l'Europe, 38334 St. Ismier Cedex, France
Search for more papers by this authorDenis Trystram
University Grenoble Alpes, LIG, 655 Avenue de l'Europe, 38334 St. Ismier Cedex, France
Institut Universitaire de France, France
Search for more papers by this authorRaphael Bleuse
University Grenoble Alpes, LIG, 655 Avenue de l'Europe, 38334 St. Ismier Cedex, France
Search for more papers by this authorSafia Kedad-Sidhoum
Sorbonne Universités, UPMC Univ. Paris 06, UMR 7606, LIP6, F-75005, Paris, France
Search for more papers by this authorCorresponding Author
Florence Monna
Sorbonne Universités, UPMC Univ. Paris 06, UMR 7606, LIP6, F-75005, Paris, France
University Grenoble Alpes, LIG, 655 Avenue de l'Europe, 38334 St. Ismier Cedex, France
Correspondence to: Florence Monna, Laboratoire d'Informatique de Paris 6, France.
E-mail: [email protected]
Search for more papers by this authorGrégory Mounié
University Grenoble Alpes, LIG, 655 Avenue de l'Europe, 38334 St. Ismier Cedex, France
Search for more papers by this authorDenis Trystram
University Grenoble Alpes, LIG, 655 Avenue de l'Europe, 38334 St. Ismier Cedex, France
Institut Universitaire de France, France
Search for more papers by this authorSummary
More and more computers use hybrid architectures combining multi-core processors and hardware accelerators such as graphics processing units (GPUs). We present in this paper a new method for scheduling efficiently parallel applications with m CPUs and k GPUs, where each task of the application can be processed either on a core (CPU) or on a GPU. The objective is to minimize the maximum completion time (makespan). The corresponding scheduling problem is Non-deterministic Polynomial (NP)-time hard, Copyright © 2014 John Wiley & Sons, Ltd.
References
- 1
Lee VW,
Kim C,
Chhugani J,
Deisher M,
Kim D,
Nguyen AD,
Satish N,
Smelyanskiy M,
S Chennupaty,
Hammarlund P,
Singhal R,
Dubey P. Debunking the 100x gpu vs. cpu myth: an evaluation of throughput computing on CPU and GPU. In ISCA, A Seznec, UC Weiser, R Ronen (eds). ACM: New York, 2010; 451–460.
10.1145/1815961.1816021 Google Scholar
- 2 Hochbaum DS, Shmoys DB. A polynomial approximation scheme for scheduling on uniform processors: using the dual approximation approach. SIAM Journal on Computing 1988; 17(3): 539–551.
- 3 Bonifaci V, Wiese A. Scheduling unrelated machines of few different types. CoRR 2012; abs/1205.0974: 16.
- 4 Pinel F, Dorronsoro B, Bouvry P. Solving very large instances of the scheduling of independent tasks problem on the GPU. Journal of Parallel and Distributed Computing 2012; 73: 101–110.
- 5 Agullo E, Augonnet C, Dongarra J, Faverge M, Ltaief H, Thibault S, Tomov S. Qrfactorization on a multicore node enhanced with multiple GPU accelerators. IEEE International Parallel & Distributed Processing Symposium (IPDPS), Anchorage, AK, 2011; 932–943.
- 6 Song F, Tomov S, Dongarra J. Enabling and scaling matrix computations on heterogeneous multi-core and multi-gpu systems. 26th ACM International Conference on Supercomputing (ICS 2012): ACM, Venice, Italy, 2012; 365–376.
- 7 Boukerche A, Correa JM, Melo A, Jacobi RP. A hardware accelerator for the fast retrieval of dialign biological sequence alignments in linear space. IEEE Transactions on Computers 2010; 59: 808–821.
- 8 Phillips JC, Stone JE, Schulten K. Adapting a message-driven parallel application to GPU-accelerated clusters. SC, Austin, TX, 2008; 1–9.
- 9 Bueno J, Planas J, Duran A, Badia RM, Martorell X, Ayguadé E, Labarta J. Productive programming of GPU clusters with OMPSS. IPDPS. IEEE Computer Society, Shanghai, 2012; 557–568.
- 10 Augonnet C, Thibault S, Namyst R, Wacrenier P. -A. Starpu: a unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience 2011; 23: 187–198.
- 11 Gautier T, Ferreira L, Joao V, Maillard N, Raffin B. Xkaapi: a runtime system for data-flow task programming on heterogeneous architectures. Proceedings of IEEE International Parallel and Distributed Processing Symposium (IPDPS), Boston, USA, 2013.
- 12 Chen L, Ye D, Zhang G. Online scheduling on a CPU-GPU cluster. TAMC 2013; 7876: 1–9.
- 13 Hochbaum DS, Shmoys DB. Using dual approximation algorithms for scheduling problems theoretical and practical results. Journal of ACM 1987; 34(1): 144–162.
- 14 Topcuoglu H, Hariri S, Wu M-Y. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE TPDS 2002; 13(3): 260–274.
- 15 Kedad-Sidhoum S, Monna F, Mounié G, Trystram D. Scheduling independent tasks on multi-cores with GPU accelerators. Proceedings of HeteroPar 2013, Aachen, 2013; 228–237.
- 16
Garey MR,
Grahams RL. Bounds for multiprocessor scheduling with resource constraints. SIAM Journal on Computing 1975; 4: 187–200.
10.1137/0204015 Google Scholar
- 17 Blazewicz J, Ecker K, Pesch E, Schmidt G, Weglarz J. Handbook on Scheduling, From theory to applications, International handbooks on information systems. Springer: Heidelberg, Germany, 2007.
- 18 Lenstra JK, Shmoys DB, Tardos E. Approximation algorithms for scheduling unrelated parallel machines. Mathematical Programming 1988; 46: 259–271.
- 19 Shmoys DB, Tardos E. An approximation algorithm for the generalized assignment problem. Mathematical Programming 1993; 62: 461–474.
- 20 Shchepin EV, Vakhania N. An optimal rounding gives a better approximation for scheduling unrelated machines. Operations Research Letters 2004; 33: 127–133.
- 21 Friesen DK. Tighter bounds for LPT scheduling on uniform processors. SIAM Journal on Computing 1987; 16(3): 554–560.
- 22 Nélis V, Raravi G. A PTAS for assigning sporadic tasks on two-type heterogeneous multiprocessors. RTSS 2012; 117–126.
- 23 Imreh C. Scheduling problems on two sets of identical machines. Computing 2003; 70: 277–294.
- 24 Seifu S. Scheduling on heterogeneous cluster environments. Master's Thesis, Grenoble University, Grenoble, France, 2012.
- 25 Agullo E, Demmel J, Dongarra J, Hadri B, Kurzak J, Langou J, Ltaief H, Luszczek P, Tomov S. Numerical linear algebra on emerging architectures: the plasma and magma projects. Journal of Physics: Conference Series 2009; 180.
- 26 Bolze R, Cappello F, Caron E, Daydé MJ, Desprez F, Jeannot E, Jégou Y, Lanteri S, Leduc J, Melab N, Mornet G, Namyst R, Primet P, Quétier B, Richard O, Talbi E. -G., Touche I. Grid'5000: a large scale and highly reconfigurable experimental grid testbed. IJHPCA 2006; 20(4): 481–494.
- 27 Bleuse R, Gautier T, Lima JF, Mounié G, Trystram D. Scheduling data flow program in xkaapi: A new affinity-based algorithm for heterogeneous architectures. 20th International European Conference on Parallel Processing, ARCoSS/LNCS, Springer, Porto, Portugal, 2014; 560–571.