Volume 29, Issue 15 e4120
SPECIAL ISSUE PAPER

Computing on many cores

Bernard Goossens (Corresponding Author)

DALI, UPVD, 52 avenue Paul Alduy, 66860 Perpignan Cedex 9, France

LIRMM, CNRS: UMR 5506 - UM2, 161 rue Ada, 34095 Montpellier Cedex 5, France

Correspondence

Bernard Goossens, DALI, UPVD, 52 avenue Paul Alduy, 66860 Perpignan Cedex 9, France. LIRMM, CNRS: UMR 5506 - UM2, 161 rue Ada, 34095 Montpellier Cedex 5, France.

Email: [email protected]
David Parello

DALI, UPVD, 52 avenue Paul Alduy, 66860 Perpignan Cedex 9, France

LIRMM, CNRS: UMR 5506 - UM2, 161 rue Ada, 34095 Montpellier Cedex 5, France
Katarzyna Porada

DALI, UPVD, 52 avenue Paul Alduy, 66860 Perpignan Cedex 9, France

LIRMM, CNRS: UMR 5506 - UM2, 161 rue Ada, 34095 Montpellier Cedex 5, France
Djallal Rahmoune

DALI, UPVD, 52 avenue Paul Alduy, 66860 Perpignan Cedex 9, France

LIRMM, CNRS: UMR 5506 - UM2, 161 rue Ada, 34095 Montpellier Cedex 5, France
First published: 28 March 2017

Summary

This paper presents an alternative method to parallelize programs, better suited to many-core processors than current operating system- and API-based approaches such as OpenMP and MPI. The method relies on parallelizing hardware and an adapted programming style, which together expose and capture the instruction-level parallelism (ILP). A many-core design is presented in which cores are multithreaded and able to fork new threads. The programming style is based on functions: the hardware creates a concurrent thread at each function call. The programming style and the hardware expose the ILP by eliminating the architectural dependences between a call and its continuation after the return. We illustrate the method on a sum reduction, a matrix multiplication, and a sort. We measure the ILP of the parallel runs and show that it increases with data size and is high enough to feed thousands of cores. We compare our method with pthread parallelization and show that (1) our parallel execution is deterministic, (2) our thread management is cheap, (3) our parallelism is implicit, and (4) our method parallelizes both functions and loops. Implicit parallelism makes parallel code easy to write and read, and deterministic parallel execution makes it easy to debug.
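To make the function-based style concrete, here is a minimal sketch of a sum reduction written only with function calls, in plain C. It is not taken from the paper: the splitting strategy, the function names, and the comments about forking are our illustrative assumptions. On a conventional processor the code runs sequentially; on the hardware described above, each call would be forked as a concurrent thread, and the caller's continuation would only synchronize on the value it actually adds.

/*
 * Hypothetical sketch (not the paper's code): a sum reduction written in
 * the function-only style described in the summary.  On a conventional
 * CPU this runs sequentially; on a call-forking machine, each function
 * call would become a concurrent thread.
 */
#include <stdio.h>

/* Sum t[lo..hi-1] by splitting the range into two recursive calls.
   The two calls are independent, so a call-forking machine can run
   them in parallel; the final addition is the only synchronization. */
static long sum(const long *t, size_t lo, size_t hi)
{
    if (hi - lo == 1)
        return t[lo];
    size_t mid = lo + (hi - lo) / 2;
    long left  = sum(t, lo, mid);   /* would be forked as a thread */
    long right = sum(t, mid, hi);   /* would be forked as a thread */
    return left + right;            /* continuation: joins both    */
}

int main(void)
{
    enum { N = 1024 };
    long t[N];
    for (size_t i = 0; i < N; i++)
        t[i] = (long)i;
    printf("sum = %ld\n", sum(t, 0, N));  /* 0+1+...+1023 = 523776 */
    return 0;
}

Because the two recursive calls of each level are independent, the number of calls that can run concurrently grows with the data size, which is the intuition behind the claim that the exposed ILP scales to thousands of cores.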
