Predicting process performance: A white-box approach based on process models
Corresponding Author
Ilya Verenich
School of Information Systems, Queensland University of Technology, Brisbane, Australia
Institute of Computer Science, University of Tartu, Tartu, Estonia
School of Computing and Information Systems, University of Melbourne, Victoria, Australia
Correspondence
Ilya Verenich, School of Computing and Information Systems, University of Melbourne, Victoria, Australia.
Email: [email protected]
Marlon Dumas
Institute of Computer Science, University of Tartu, Tartu, Estonia
Marcello La Rosa
School of Computing and Information Systems, University of Melbourne, Victoria, Australia
Hoang Nguyen
School of Information Systems, Queensland University of Technology, Brisbane, Australia
Abstract
Predictive business process monitoring methods exploit historical process execution logs to make predictions about running instances of a process. These predictions enable process workers and managers to preempt performance issues or compliance violations. A number of approaches have been proposed to predict quantitative process performance indicators for running instances of a process, including remaining cycle time, cost, or probability of deadline violation. However, these approaches are black-box, insofar as they predict a single scalar value without decomposing this prediction into more elementary components. In this paper, we propose a white-box approach to predict performance indicators of running process instances. The key idea is to first predict the performance indicator at the level of activities and then to aggregate these predictions at the level of a process instance by means of flow analysis techniques. The paper develops this idea in the context of predicting the remaining cycle time of ongoing process instances. The proposed approach has been evaluated on real-life event logs and compared against several baselines.
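The aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes a block-structured process model and applies the textbook flow analysis rules for cycle time: a sequence sums its parts, an exclusive choice takes the probability-weighted average of its branches, a parallel block takes the maximum of its branches, and a rework loop with repetition probability r divides the body time by (1 - r). The class names, the `cycle_time` function, and the numeric values are hypothetical and chosen for illustration only; the per-activity times stand in for whatever activity-level predictor is used, and this is not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Task:
    name: str
    predicted_time: float  # hypothetical activity-level prediction


@dataclass
class Seq:
    parts: List["Block"]


@dataclass
class Xor:
    branches: List[Tuple[float, "Block"]]  # (branching probability, block)


@dataclass
class And:
    branches: List["Block"]


@dataclass
class Loop:
    body: "Block"
    repeat_prob: float  # probability of repeating the body


Block = Union[Task, Seq, Xor, And, Loop]


def cycle_time(block: Block) -> float:
    """Aggregate activity-level times into a block-level cycle time."""
    if isinstance(block, Task):
        return block.predicted_time
    if isinstance(block, Seq):
        # Sequence: times add up.
        return sum(cycle_time(p) for p in block.parts)
    if isinstance(block, Xor):
        # Exclusive choice: probability-weighted average of the branches.
        return sum(prob * cycle_time(b) for prob, b in block.branches)
    if isinstance(block, And):
        # Parallel block: bounded by the slowest branch.
        return max(cycle_time(b) for b in block.branches)
    if isinstance(block, Loop):
        # Rework loop: expected number of executions is 1 / (1 - r).
        if not 0.0 <= block.repeat_prob < 1.0:
            raise ValueError("repeat_prob must be in [0, 1)")
        return cycle_time(block.body) / (1.0 - block.repeat_prob)
    raise TypeError(f"unknown block type: {type(block).__name__}")


# Illustrative model: A, then B or C, then D and E in parallel, then F with rework.
example = Seq([
    Task("A", 2.0),
    Xor([(0.5, Task("B", 4.0)), (0.5, Task("C", 2.0))]),
    And([Task("D", 3.0), Task("E", 5.0)]),
    Loop(Task("F", 1.0), repeat_prob=0.5),
])

print(cycle_time(example))  # 2.0 + 3.0 + 5.0 + 2.0 = 12.0
```

Because the total is built from per-block terms, each intermediate value (for instance the 5.0 contributed by the parallel block) remains inspectable, which is the sense in which such a decomposition is white-box.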