A service-agnostic method for predicting service metrics in real time
Corresponding Author
Rerngvit Yanggratoke
ACCESS Linnaeus Center, KTH Royal Institute of Technology, Stockholm, Sweden
Correspondence
Rerngvit Yanggratoke, ACCESS Linnaeus Center, KTH Royal Institute of Technology, Stockholm, Sweden.
Email: [email protected]
John Ardelius
Swedish Institute of Computer Science (SICS), Stockholm, Sweden
Daniel Gillblad
Swedish Institute of Computer Science (SICS), Stockholm, Sweden
Rolf Stadler
ACCESS Linnaeus Center, KTH Royal Institute of Technology, Stockholm, Sweden
Swedish Institute of Computer Science (SICS), Stockholm, Sweden
Summary
We predict performance metrics of cloud services using statistical learning, whereby the behaviour of a system is learned from observations. Specifically, we collect device and network statistics from a cloud testbed and apply regression methods to predict, in real time, client-side service metrics for video streaming and key-value store services. Results from intensive evaluation on our testbed indicate that our method accurately predicts service metrics in real time (mean absolute error below 16% for video frame rate and read latency, for instance). Further, our method is service-agnostic in the sense that it takes as input operating-system and network statistics instead of service-specific metrics. We show that feature set reduction significantly improves the prediction accuracy in our case, while simultaneously reducing model computation time. We find that the prediction accuracy decreases when, instead of a single service, both services run on the same testbed simultaneously or when the network quality on the path between the server cluster and the client deteriorates. Finally, we discuss the design and implementation of a real-time analytics engine, which processes streams of device statistics and service metrics from testbed sensors and produces model predictions through online learning.
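The idea of predicting a client-side service metric from device statistics through online learning can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data stream is synthetic, the feature names (`cpu`, `mem`, `net`) and the linear ground truth are hypothetical, and plain stochastic gradient descent on a linear model stands in for whatever regression methods the paper evaluates. Normalizing the mean absolute error by the mean of the observed metric is likewise an assumption about how a percentage error could be computed, not a statement of the authors' definition.

```python
import random


def make_stream(n, seed=0):
    """Yield (features, metric) pairs from a synthetic stream (illustrative only)."""
    rng = random.Random(seed)
    for _ in range(n):
        cpu = rng.random()  # hypothetical CPU utilization in [0, 1]
        mem = rng.random()  # hypothetical memory utilization in [0, 1]
        net = rng.random()  # hypothetical network load in [0, 1]
        # Assumed ground truth: the frame rate drops as load rises, plus noise.
        frame_rate = 24.0 - 8.0 * cpu - 4.0 * net + rng.gauss(0.0, 0.5)
        yield [cpu, mem, net], frame_rate


class OnlineLinearRegressor:
    """Linear model updated one sample at a time with stochastic gradient descent."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        # Gradient step on the squared error of a single sample.
        err = self.predict(x) - y
        self.b -= self.lr * err
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]


def prequential_nmae(stream, model, warmup=0):
    """Predict each sample before learning from it; return MAE divided by the mean metric."""
    abs_err = 0.0
    total_y = 0.0
    n = 0
    for i, (x, y) in enumerate(stream):
        if i >= warmup:  # skip scoring while the model is still cold
            abs_err += abs(model.predict(x) - y)
            total_y += y
            n += 1
        model.update(x, y)
    return (abs_err / n) / (total_y / n)


if __name__ == "__main__":
    model = OnlineLinearRegressor(n_features=3)
    score = prequential_nmae(make_stream(5000, seed=1), model, warmup=1000)
    print(f"normalized MAE on the synthetic stream: {score:.3f}")
```

The predict-then-update loop means every prediction is scored on a sample the model has not yet seen, which is one common way to evaluate a model that learns from a stream. The model never looks at service-specific inputs, only at the generic load features, which mirrors the service-agnostic framing of the summary.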