Response surface methodology to tune artificial neural network hyper-parameters
Sinem Bozkurt Keser
Department of Computer Engineering, Eskisehir Osmangazi University, Eskisehir, Turkey
Correspondence
Sinem Bozkurt Keser, Department of Computer Engineering, Eskisehir Osmangazi University, Eskisehir, Turkey.
Email: [email protected]
Yeliz Buruk Sahin
Department of Industrial Engineering, Eskisehir Osmangazi University, Eskisehir, Turkey
Abstract
Artificial neural networks (ANNs) are machine learning algorithms that have been widely used in many application areas. Their performance depends on the type and size of the problem and on the architecture of the network. One of the most important factors affecting an ANN's performance is the selection of hyper-parameters, yet there is no specific rule for determining them. Although no single well-established method exists for hyper-parameter tuning, the issue has been discussed in many studies. In this study, a central composite design (CCD), a successful response surface methodology (RSM) technique that accounts for factor interactions, is used for hyper-parameter optimization. A categorical central composite design with 39 experimental runs was used to predict accuracy and F-score. The effect of ANN hyper-parameters on the selected performance indicators is investigated using two customer churn prediction data sets of different sizes that are widely used in the literature. Using desirability functions, the multiple objectives are combined and the best hyper-parameter levels are selected. In verification tests, the accuracy values are 85.79% and 83.29%, and the F-score values are 86.15% and 82.33%, for the first and second data sets, respectively. The results demonstrate the effectiveness of the adopted RSM technique.
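The abstract's final step, combining accuracy and F-score with desirability functions to pick the best hyper-parameter levels, can be sketched as follows. This is not the authors' code: the desirability bounds (0.70 and 0.90), the candidate hyper-parameter settings, and the response values are all illustrative assumptions; only the larger-the-better desirability form and the geometric-mean combination follow the Derringer and Suich approach cited in the references.

```python
# Hedged sketch of desirability-based selection of ANN hyper-parameter
# levels from (setting, accuracy, F-score) experiment results.
# Bounds and run data below are illustrative assumptions, not the paper's.
from math import sqrt

def desirability(y, low, high):
    """Larger-the-better desirability: 0 at or below `low`, 1 at or
    above `high`, linear in between (weight r = 1)."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)

def overall(accuracy, f_score):
    # Geometric mean of the individual desirabilities; the bounds
    # 0.70/0.90 are assumed targets, not values from the study.
    d_acc = desirability(accuracy, low=0.70, high=0.90)
    d_f1 = desirability(f_score, low=0.70, high=0.90)
    return sqrt(d_acc * d_f1)

# Hypothetical (setting, accuracy, F-score) triples, standing in for
# the responses predicted from the fitted CCD models.
runs = [
    ({"hidden": 8,  "lr": 0.01}, 0.81, 0.79),
    ({"hidden": 16, "lr": 0.01}, 0.86, 0.85),
    ({"hidden": 16, "lr": 0.10}, 0.78, 0.74),
]
best = max(runs, key=lambda r: overall(r[1], r[2]))
print(best[0])  # setting with the highest overall desirability
```

In practice the responses would come from second-order models fitted to the categorical CCD runs rather than from raw observations, but the ranking-by-overall-desirability step is the same.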
CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in Kaggle at https://www.kaggle.com/blastchar/telco-customer-churn.
The data that support the findings of this study are openly available in CrowdAnalytix at https://www.crowdanalytix.com/contests/why-customer-churn.
Supporting Information
Filename | Description
---|---
exsy12792-sup-0001-Supinfo.zip (application/x-zip-compressed, 307 KB) | Appendix S1. Supporting Information.
Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.