Volume 32, Issue 3 e5189
SPECIAL ISSUE PAPER

Twister2: Design of a big data toolkit

Supun Kamburugamuve

Corresponding Author

School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana

Supun Kamburugamuve, School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47408.

Email: [email protected]

Kannan Govindarajan

School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana

Pulasthi Wickramasinghe

School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana

Vibhatha Abeykoon

School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana

Geoffrey Fox

School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana

First published: 06 March 2019
Citations: 15
Keywords: big data, serverless computing, event-driven.

Summary

Data-driven applications are essential to handle the ever-increasing volume, velocity, and veracity of data generated by sources such as the Web and Internet of Things (IoT) devices. Simultaneously, an event-driven computational paradigm is emerging as the core of modern systems designed for database queries, data analytics, and on-demand applications. Modern big data processing runtimes and asynchronous many-task (AMT) systems from the high-performance computing (HPC) community have adopted a dataflow event-driven model. Services, too, are increasingly moving to an event-driven model, with Function as a Service (FaaS) used to compose them. An event-driven runtime designed for data processing consists of well-understood components such as communication, scheduling, and fault tolerance. The design choices made in these components determine which types of applications a system can support efficiently. We find that modern systems are limited to specific sets of applications because they have been designed with fixed choices that cannot be changed easily. In this paper, we present a loosely coupled, component-based design of a big data toolkit in which each component can have different implementations to support various applications. Such a polymorphic design would allow services and data analytics to be integrated seamlessly and to extend from edge to cloud to HPC environments.
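
To make the idea of a loosely coupled, component-based design concrete, the following is a minimal sketch in Python. It is not Twister2's API: every name here (`Scheduler`, `Communication`, `Runtime`, and the concrete classes) is hypothetical and chosen only for illustration. The sketch assumes two of the components named in the summary, scheduling and communication, each sit behind a small interface so that implementations can be swapped without touching the rest of the runtime.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class Scheduler(ABC):
    """Hypothetical component interface: maps task names to worker ids."""

    @abstractmethod
    def assign(self, tasks: List[str], workers: int) -> Dict[int, List[str]]:
        ...


class RoundRobinScheduler(Scheduler):
    """Deals tasks out to workers one at a time, in rotation."""

    def assign(self, tasks, workers):
        plan = {w: [] for w in range(workers)}
        for i, task in enumerate(tasks):
            plan[i % workers].append(task)
        return plan


class BlockScheduler(Scheduler):
    """Gives each worker one contiguous block of tasks."""

    def assign(self, tasks, workers):
        plan = {w: [] for w in range(workers)}
        size = -(-len(tasks) // workers)  # ceiling division
        for i, task in enumerate(tasks):
            plan[i // size].append(task)
        return plan


class Communication(ABC):
    """Hypothetical component interface: combines per-worker partial results."""

    @abstractmethod
    def reduce(self, values: List[int], op: Callable[[int, int], int]) -> int:
        ...


class SequentialReduce(Communication):
    """Folds the values left to right."""

    def reduce(self, values, op):
        result = values[0]
        for value in values[1:]:
            result = op(result, value)
        return result


class TreeReduce(Communication):
    """Combines values pairwise, level by level, as a reduction tree would."""

    def reduce(self, values, op):
        level = list(values)
        while len(level) > 1:
            nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
            if len(level) % 2 == 1:
                nxt.append(level[-1])  # odd element carries over to the next level
            level = nxt
        return level[0]


class Runtime:
    """Depends only on the two interfaces, never on a concrete implementation."""

    def __init__(self, scheduler: Scheduler, communication: Communication):
        self.scheduler = scheduler
        self.communication = communication

    def run(self, tasks: Dict[str, Callable[[], int]], workers: int) -> int:
        plan = self.scheduler.assign(list(tasks), workers)
        # Each worker runs its tasks locally and produces one partial sum.
        partials = [sum(tasks[name]() for name in names) for names in plan.values()]
        return self.communication.reduce(partials, lambda a, b: a + b)


# Illustrative workload: five tasks that return 0..4.
tasks = {f"t{i}": (lambda i=i: i) for i in range(5)}
streaming_like = Runtime(RoundRobinScheduler(), SequentialReduce())
hpc_like = Runtime(BlockScheduler(), TreeReduce())
print(streaming_like.run(tasks, workers=2), hpc_like.run(tasks, workers=2))
```

Both configurations compute the same sum. They differ only in how work is placed and how partial results are combined, which is the kind of per-component choice the summary argues should remain open rather than fixed by the system.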
