Streaming Data Processing for IoT
John Davies
BT Research and Venturing, Head of Next Generation Web Research, United Kingdom
Carolina Fortuna
Jožef Stefan Institute, Department of Communication Systems, Ljubljana, Slovenia
Summary
Early Internet of Things stream processing platforms were designed mainly to collect and display raw sensor measurements in real time. In practice, data often has to pass through several processing phases before it can support actionable decisions, whether automatic or human. This chapter discusses five main operations performed on streaming data: compression, dimensionality reduction, summarization, learning, and visualization. It distinguishes two main types of stream data processing systems. The first type, also called a data stream management system, is based on relational database principles and introduced the concept of continuous queries. The second type does not enforce a relational view and enables the creation of custom operators. Two data processing architectures emerged from the batch and stream processing paradigms: the Lambda architecture, which supports both batch and stream processing, and the simpler Kappa architecture, which supports stream processing only.
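As an illustration of the continuous query idea mentioned above, the following minimal Python sketch re-evaluates a sliding-window mean every time a new sensor reading arrives, instead of running once over stored data. The function name `continuous_window_mean`, the count-based window, and the sample readings are assumptions made for this example; they are not taken from the chapter or from any particular data stream management system.

```python
from collections import deque
from typing import Iterable, Iterator


def continuous_window_mean(readings: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the last `window` readings each time a new one arrives.

    Hypothetical helper: a count-based sliding window standing in for a
    continuous query such as "average of the most recent N measurements".
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buf: deque = deque(maxlen=window)
    total = 0.0
    for value in readings:
        if len(buf) == window:
            total -= buf[0]  # the oldest reading is about to be evicted
        buf.append(value)    # deque(maxlen=...) drops the oldest item itself
        total += value
        yield total / len(buf)


if __name__ == "__main__":
    # Made-up temperature readings, used only to exercise the sketch.
    stream = [20.0, 22.0, 24.0, 26.0]
    print(list(continuous_window_mean(stream, window=2)))
    # prints [20.0, 21.0, 23.0, 25.0]
```

Because the function is a generator, it produces one result per incoming reading and keeps only `window` values in memory, which is the property that separates a continuous query from a one-off query over a stored table.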