We predict performance metrics of cloud services using statistical learning, whereby the behaviour of a system is learned from observations. Specifically, we collect device and network statistics from a cloud testbed and apply regression methods to predict, in real time, client-side service metrics for video streaming and key-value store services. Results from intensive evaluation on our testbed indicate that our method accurately predicts service metrics in real time (for instance, the mean absolute error is below 16% for video frame rate and read latency). Further, our method is service agnostic in the sense that it takes as input operating-system and network statistics instead of service-specific metrics. We show that feature set reduction significantly improves the prediction accuracy in our case, while simultaneously reducing model computation time. We find that the prediction accuracy decreases when both services run on the same testbed simultaneously instead of a single service, or when the network quality on the path between the server cluster and the client deteriorates. Finally, we discuss the design and implementation of a real-time analytics engine, which processes streams of device statistics and service metrics from testbed sensors and produces model predictions through online learning.
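The workflow summarized above (stream device statistics, predict a client-side metric, then update the model online) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a plain linear model trained by stochastic gradient descent, a synthetic data stream with three hypothetical features (`cpu`, `mem`, `net`), and an error measure taken to be the mean absolute error normalized by the mean of the observed metric. All names and constants are illustrative assumptions.

```python
import random


def nmae(y_true, y_pred):
    """Mean absolute error divided by the mean of the true values."""
    mean_y = sum(y_true) / len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return mae / mean_y


class OnlineLinearRegressor:
    """Linear model updated one sample at a time by SGD on squared error."""

    def __init__(self, n_features, learning_rate=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        err = self.predict(x) - y  # gradient of 0.5 * err**2 w.r.t. prediction
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def synthetic_stream(n, seed=0):
    """Yield (features, metric) pairs; a stand-in for testbed sensor data."""
    rng = random.Random(seed)
    for _ in range(n):
        cpu = rng.random()  # hypothetical normalized CPU utilization
        mem = rng.random()  # hypothetical normalized memory usage
        net = rng.random()  # hypothetical normalized network load
        # Assumed relationship: frame rate drops as CPU and network load rise.
        frame_rate = 24.0 - 10.0 * cpu - 4.0 * net + rng.gauss(0.0, 0.5)
        yield [cpu, mem, net], frame_rate


def run(n=5000, tail=1000):
    """Predict each sample before learning from it; score the last `tail`."""
    model = OnlineLinearRegressor(n_features=3)
    truths, preds = [], []
    for x, y in synthetic_stream(n):
        preds.append(model.predict(x))  # predict first ...
        truths.append(y)
        model.update(x, y)              # ... then learn from the observation
    return nmae(truths[-tail:], preds[-tail:])


if __name__ == "__main__":
    print(f"NMAE over the last 1000 samples: {run():.3f}")
```

Each sample is predicted before the model sees its true value, so the reported error reflects performance on unseen data, which is the usual way to evaluate a model that learns from a stream.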


Volume 28, Issue 2

March/April 2018

e1991

A service-agnostic method for predicting service metrics in real time
