Response surface methodology to tune artificial neural network hyper-parameters
Sinem Bozkurt Keser
Department of Computer Engineering, Eskisehir Osmangazi University, Eskisehir, Turkey
Correspondence
Sinem Bozkurt Keser, Department of Computer Engineering, Eskisehir Osmangazi University, Eskisehir, Turkey.
Email: [email protected]
Yeliz Buruk Sahin
Department of Industrial Engineering, Eskisehir Osmangazi University, Eskisehir, Turkey
Abstract
Artificial neural networks (ANNs) are machine learning algorithms that have been widely used in many application areas. Their performance depends on the type and size of the problem and on the architecture of the network. One of the most important factors affecting an ANN's performance is the selection of hyper-parameters, yet there is no specific rule for determining them. Although no single well-established method exists for hyper-parameter tuning, the issue has been discussed in many studies. In this study, a central composite design (CCD), a successful response surface methodology (RSM) technique that accounts for factor interactions, is used for hyper-parameter optimization. A categorical central composite design with 39 experimental runs was used to predict accuracy and F-score. The effect of ANN hyper-parameters on the selected performance indicators is investigated using two customer churn prediction data sets of different sizes that are widely used in the literature. Using desirability functions, the multiple objectives are combined and the best hyper-parameter levels are selected. In verification tests, the accuracy values are 85.79% and 83.29%, and the F-score values are 86.15% and 82.33%, for the first and second data sets, respectively. The results demonstrate the effectiveness of the adopted RSM technique.
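The abstract's final step, combining accuracy and F-score with desirability functions to pick the best hyper-parameter levels, can be sketched as follows. This is not the authors' code: the desirability bounds (0.70 and 0.90), the candidate hyper-parameter settings, and the response values are all illustrative assumptions; only the larger-the-better desirability form and the geometric-mean combination follow the Derringer and Suich approach cited in the references.

```python
# Hedged sketch of desirability-based selection of ANN hyper-parameter
# levels from (setting, accuracy, F-score) experiment results.
# Bounds and run data below are illustrative assumptions, not the paper's.
from math import sqrt

def desirability(y, low, high):
    """Larger-the-better desirability: 0 at or below `low`, 1 at or
    above `high`, linear in between (weight r = 1)."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return (y - low) / (high - low)

def overall(accuracy, f_score):
    # Geometric mean of the individual desirabilities; the bounds
    # 0.70/0.90 are assumed targets, not values from the study.
    d_acc = desirability(accuracy, low=0.70, high=0.90)
    d_f1 = desirability(f_score, low=0.70, high=0.90)
    return sqrt(d_acc * d_f1)

# Hypothetical (setting, accuracy, F-score) triples, standing in for
# the responses predicted from the fitted CCD models.
runs = [
    ({"hidden": 8,  "lr": 0.01}, 0.81, 0.79),
    ({"hidden": 16, "lr": 0.01}, 0.86, 0.85),
    ({"hidden": 16, "lr": 0.10}, 0.78, 0.74),
]
best = max(runs, key=lambda r: overall(r[1], r[2]))
print(best[0])  # setting with the highest overall desirability
```

In practice the responses would come from second-order models fitted to the categorical CCD runs rather than from raw observations, but the ranking-by-overall-desirability step is the same.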
CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in Kaggle at https://www.kaggle.com/blastchar/telco-customer-churn.
The data that support the findings of this study are openly available in CrowdAnalytix at https://www.crowdanalytix.com/contests/why-customer-churn.
Supporting Information
Filename | Description
---|---
exsy12792-sup-0001-Supinfo.zip (application/x-zip-compressed, 307 KB) | Appendix S1. Supporting Information.
Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.