Volume 2022, Issue 1 3601914
Research Article
Open Access

Hybridizing Grid Search and Support Vector Regression to Predict the Compressive Strength of Fly Ash Concrete

Fei Tang
School of Architectural Engineering, Xinyang Vocational and Technical College, Xinyang 464000, China

Yanqi Wu (Corresponding Author)
School of Civil Engineering, Southeast University, Nanjing 211189, China

Yisong Zhou
School of Civil Engineering, Xinyang College, Xinyang 464000, China

First published: 23 February 2022
Citations: 25
Academic Editor: Ravindran Gobinath

Abstract

Support vector regression (SVR) has been applied to the prediction of the mechanical properties of concrete, but the selection of its hyperparameters is a key factor affecting prediction accuracy. To this end, a hybrid machine learning model that combines SVR with grid search (GS), namely, the GS-SVR model, is proposed in this work to predict the compressive strength of concrete and to perform a sensitivity analysis. The hybrid model was trained and tested on a total of 98 data sets retrieved from the literature, and its performance was compared with that of the original SVR model on the same data. The results obtained (R of 0.981, MSE of 3.44, RMSE of 1.85, MAE of 1.17, and MAPE of 0.05) demonstrate that the proposed GS-SVR model can be a candidate method for compressive strength prediction in subsequent related studies. Additionally, a graphical user interface (GUI) was developed to conveniently provide initial estimates of the outcomes before extensive laboratory work or fieldwork is performed. Finally, the effect of each variable on the compressive strength in a random environment was analyzed.

1. Introduction

As the most consumed material in the construction industry, cement has brought great convenience, but it also puts enormous pressure on the environment. Since the production and utilization of cement are accompanied by a large amount of greenhouse gas emissions, many scholars have started to focus on mineral admixtures that can be used to replace cement [1–5]. Among them, the use of fly ash as a supplementary cementitious material in high-volume fly ash concrete is one of the important ways to reduce environmental pollution and to put fly ash to productive use as a resource. Moreover, it is also an effective means for concrete producers to improve the performance of concrete, reduce the use of cement, and lower the cost of concrete [6, 7]. The incorporation of fly ash not only ensures the quality of concrete and reduces its manufacturing cost but also improves the compatibility, durability, and strength, which has made fly ash the most widely used alternative and has attracted great attention [8].

The importance of concrete materials for the construction industry needs no further introduction [9–11]. Concrete is used as a construction material worldwide because of properties such as strength, durability, stiffness, and fire resistance. Among these properties, compressive strength is considered the most important because it strongly affects the safety and durability of concrete members. Understanding the early behavior of concrete allows appropriate measures to be taken to avoid problems such as cracking and deformation of concrete members, formwork failure, and rework. In addition, early strength prediction and monitoring are important for assessing construction safety, determining structural maturity, and predicting later strength development. The main reason for the variation in compressive strength is that concrete is a nonhomogeneous material consisting of binders, aggregates, water, and admixtures. In such a complex mix, it is difficult to predict the compressive strength accurately. The compressive strength of concrete can be assessed in the laboratory by crushing a standard-size cylinder or cube. However, such laboratory tests are regarded as inefficient and uneconomical, as well as labor-intensive and time-consuming. Traditional methods of concrete strength prediction are mainly based on statistical analysis with linear or nonlinear regression equations, but obtaining accurate equations is difficult and requires a great deal of skill and experience. Because of the large number of concrete ingredients, it is difficult to establish an explicit equation that relates the compressive strength to each ingredient.

To address these issues, machine learning techniques are used to predict the compressive strength of concrete. With the development of artificial intelligence, various machine learning algorithms such as artificial neural networks (ANN), support vector machines (SVM), random forest, and extreme learning machines (ELM) have been applied to predict the mechanical properties of concrete [12–23]. Ly et al. [24] employed an optimal deep neural network model on a database of 223 experimental data to predict the 28-day compressive strength of rubber concrete and achieved a high prediction accuracy of R = 0.9874. Han et al. [25] utilized an optimized random forest approach on 1030 data samples collected from published literature to predict the compressive strength of high-performance concrete. Furqan et al. [26] used ANN, SVM, and gene expression programming (GEP) on 300 datasets to predict the compressive strength of self-compacting concrete. Zhang et al. [27] employed nine machine learning models to predict the compressive strength of concrete at the age of 7 days and found that the nonlinear models had better predictive performance than the linear models. Khoa et al. [28] employed deep neural networks (DNNs) and ResNet models for compressive strength prediction of green fly ash-based geopolymer concrete. The results showed that ResNet is superior and indicated that both methods can be useful for the mix design of geopolymer concrete. The published studies show that these machine learning algorithms outperform traditional empirical formulations because they can capture and map multidimensional nonlinear relationships, which makes it possible to extract unknown relationships between input and output variables [29].

However, these models also have limitations, and many of them require parameter tuning to perform well. For support vector regression (SVR), the selection of hyperparameters has a great impact on the accuracy of the prediction results. In other words, the prediction accuracy of an untuned SVR model is limited. To better understand and apply the SVR method, further exploration with different datasets and optimization algorithms is still needed. For this reason, this study proposes a hybrid machine learning model that combines SVR and grid search (GS), namely, the GS-SVR model, to achieve an accurate prediction of the compressive strength of fly ash concrete. Based on the model, the effect of random variation of individual variables on compressive strength is investigated as a reference and guide for concrete mix design and strength prediction.

2. Methodology

2.1. Support Vector Regression

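The objective of SVR is to find a regression function that fits all sample points and minimizes the total deviation of the sample points from the regression hyperplane. Given a training set E = {(x_i, y_i) | i = 1, 2, …, n}, with x_i ∈ R^n and y_i ∈ R, a function f is sought on R^n such that y_i = f(x_i), so that there is always a corresponding y-value for any input x. Such a function is called a regression function and, in the standard ε-insensitive formulation, can be described as follows.
\[ f(x_i) = \omega \cdot \phi(x_i) + b \tag{1} \]
where ω ∈ R^n is the weight vector, ϕ(x_i) is a nonlinear mapping that maps the data from the space R^n to a higher-dimensional feature space, and b is the bias. Equation (1) is required to fit all sample points with precision ε.
\[ \left| y_i - \omega \cdot \phi(x_i) - b \right| \le \varepsilon, \quad i = 1, 2, \ldots, n \tag{2} \]
Since there is a certain fitting error, the slack variables (ξ_i, ξ_i^*) and the penalty parameter C are introduced. The regression fitting problem then becomes an optimization problem.
\[ \begin{aligned} \min_{\omega, b, \xi, \xi^*} \quad & \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right) \\ \text{s.t.} \quad & y_i - \omega \cdot \phi(x_i) - b \le \varepsilon + \xi_i, \\ & \omega \cdot \phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\ & \xi_i \ge 0, \; \xi_i^* \ge 0, \quad i = 1, 2, \ldots, n \end{aligned} \tag{3} \]
Applying Lagrange multipliers α_i and α_i^* to equation (3), the Lagrange function is introduced to obtain the dual form.
\[ \begin{aligned} \max_{\alpha, \alpha^*} \quad & -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \, \phi(x_i) \cdot \phi(x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*) \\ \text{s.t.} \quad & \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le C \end{aligned} \tag{4} \]
The core of the optimization at this point is first to determine the feature space and find the flattest function in that space that satisfies the conditions and then to use that function to solve the nonlinear problem. For this reason, the kernel function K(x_i, x_j) = ϕ(x_i) · ϕ(x_j) is introduced. The regression fitting function then becomes
\[ f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x) + b \tag{5} \]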
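To make the role of ε and C in equation (3) concrete, the sketch below evaluates the primal objective for a fixed linear model. It assumes the identity feature map ϕ(x) = x and uses the fact that, at the optimum, the two slack variables for a sample sum to max(0, |y_i − f(x_i)| − ε). The function names and the numbers are illustrative, not taken from the paper.

```python
def eps_insensitive_loss(y, f, eps):
    """Loss is zero inside the eps-tube and grows linearly outside it."""
    return max(0.0, abs(y - f) - eps)


def svr_primal_objective(w, b, X, y, C, eps):
    """0.5 * ||w||^2 + C * sum of slacks, assuming phi(x) = x (illustrative)."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    total_slack = 0.0
    for xi, yi in zip(X, y):
        fi = sum(wj * xj for wj, xj in zip(w, xi)) + b
        total_slack += eps_insensitive_loss(yi, fi, eps)
    return regularizer + C * total_slack


# Made-up one-dimensional data: only the third point lies outside the tube.
X_demo = [[1.0], [2.0], [3.0]]
y_demo = [1.0, 2.0, 4.0]
objective = svr_primal_objective([1.0], 0.0, X_demo, y_demo, C=10.0, eps=0.5)
```

A larger C penalizes points outside the tube more heavily, while a larger ε widens the tube so that fewer points incur any penalty.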

There are many choices of kernel functions, and the commonly used RBF function (Figure 1) is chosen in this study [30].
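As a minimal sketch of how the RBF kernel enters the regression function in equation (5), the code below assumes the LIBSVM-style form K(x, z) = exp(−g‖x − z‖²), with g the kernel parameter tuned later by grid search. The support vectors, coefficients (α_i − α_i^*), and bias in the example are placeholders, not fitted values from the paper.

```python
import math


def rbf_kernel(x, z, g):
    """RBF kernel exp(-g * ||x - z||^2); the exact form is an assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-g * squared_distance)


def svr_predict(x, support_vectors, dual_coefs, b, g):
    """Kernel expansion: sum_i (alpha_i - alpha_i^*) K(x_i, x) + b."""
    return sum(
        coef * rbf_kernel(sv, x, g)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + b


# Placeholder model with two support vectors in one dimension.
prediction = svr_predict([0.0], [[0.0], [1.0]], [2.0, -1.0], b=0.5, g=1.0)
```

A larger g makes the kernel decay faster with distance, so each support vector influences a smaller neighborhood.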


2.2. Grid Search Method

It is well known that the prediction results depend heavily on the selection of the hyperparameters of the SVR model. Because the relationship between these parameters and model performance is highly nonlinear, a large number of trials are often required to determine a suitable combination of parameters, such as the penalty parameter C and the kernel parameter g. Although Lin et al. [31, 32] have done extensive research to simplify the application of SVM, such as proposing LIBSVM, the selection of the parameters C and g still depends on experience. Therefore, automatic tuning is needed so that the optimal values are obtained once the candidate parameter ranges are entered. The grid search (GS) method addresses this need: it evaluates the effect of each parameter combination on model performance by looping through all candidate parameter choices and returns the best combination of hyperparameters [33]. The flowchart of parameter selection is shown in Figure 2.
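The exhaustive loop at the heart of grid search can be sketched as follows. The function `toy_cv_error` is a hypothetical stand-in for the cross-validated error of an SVR trained with a given (C, g); it is a smooth bowl with a known minimum so that the search result can be checked by hand, and it is not the error surface from the paper.

```python
import itertools
import math


def grid_search(c_values, g_values, cv_error):
    """Evaluate every (C, g) pair and return (best_error, best_C, best_g)."""
    best = None
    for C, g in itertools.product(c_values, g_values):
        err = cv_error(C, g)
        if best is None or err < best[0]:
            best = (err, C, g)
    return best


def toy_cv_error(C, g):
    """Hypothetical error surface with its minimum at C = 2^3, g = 2^-2."""
    return (math.log2(C) - 3.0) ** 2 + (math.log2(g) + 2.0) ** 2


# Coarse illustrative grid: integer exponents from -8 to 8.
candidates = [2.0 ** k for k in range(-8, 9)]
best_error, best_C, best_g = grid_search(candidates, candidates, toy_cv_error)
```

In practice `cv_error` would train the model on the training folds and return the validation error, which is the expensive step; the loop itself stays the same.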


3. Dataset Description

3.1. Input and Output Variables

In this study, 98 sets of data were retrieved from the literature. Each set consisted of six constituents (cement, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate) and age as input variables and the compressive strength as the output variable. The distribution of the input and output variables is shown in Figure 3, and the statistical characteristics of these variables are given in Table 1. It can be seen that the compressive strength varies greatly for different combinations of input variables. Additionally, Pearson correlation coefficients between the input and output variables were plotted, as shown in Figure 4. Within the current dataset, the linear correlation between each input variable and the output variable is weak, which indicates a complex nonlinear relationship between compressive strength and these input parameters. Therefore, the relationship between compressive strength and these parameters is difficult to capture with an explicit equation.
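A weak Pearson coefficient only rules out a strong linear relationship, which is why it is read here as a sign of nonlinearity rather than of independence. The short sketch below illustrates this with made-up numbers (not the paper's data): a perfectly quadratic dependence on a symmetric input yields a coefficient of zero.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative values only.
x_demo = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear_response = [2.0 * v + 1.0 for v in x_demo]
quadratic_response = [v * v for v in x_demo]

r_linear = pearson_r(x_demo, linear_response)
r_quadratic = pearson_r(x_demo, quadratic_response)
```

Here `r_linear` is 1 while `r_quadratic` is 0, even though the quadratic response is fully determined by the input.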

Table 1. Statistical characteristics of variables.

Set       Statistic           Cement    Fly ash   Water     Superplasticizer  Coarse aggregate  Fine aggregate  Age    Compressive strength
          Unit                kg/m3     kg/m3     kg/m3     kg/m3             kg/m3             kg/m3           d      MPa
Training  Count               78        78        78        78                78                78              78     78
          Maximum             376       163.3     216.7     18                1118              905.4           90     72.11
          Minimum             136.1     92.1      141.1     0                 801               700             3      9.49
          Standard deviation  54.22     9.68      18.12     4.65              67.60             44.70           29.72  13.37
          Mean                243.95    123.54    177.77    6.46              1010.62           800.34          34.06  31.85
          Skewness            0.25      −0.05     −0.18     −0.07             −0.97             0.63            0.97   0.63
          Kurtosis            −0.75     9.44      −0.81     −0.88             0.79              0.23            −0.40  0.02
Test      Count               20        20        20        20                20                20              20     20
          Maximum             349       168.3     220.5     16.1              1111.6            856.5           90     41.64
          Minimum             144       95.7      158.2     0                 801.1             687             3      9.55
          Standard deviation  59.46     11.81     17.50     5.79              75.95             43.85           25.71  9.53
          Mean                231.28    125.19    182.27    5.77              969.55            771.99          32.70  25.49
          Skewness            0.46      1.78      0.51      0.34              −0.29             −0.17           1.52   0.21
          Kurtosis            −0.80     10.57     −0.11     −1.47             −0.05             −0.73           1.65   −1.20

3.2. Performance Criteria

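To describe and compare the performance of the models, five evaluation metrics were introduced: the linear correlation coefficient (R), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) [34]. In their standard form, these metrics are defined as follows.
\[ \begin{aligned} R &= \frac{\sum_{i=1}^{n} (y_{e,i} - \bar{y}_e)(y_{p,i} - \bar{y}_p)}{\sqrt{\sum_{i=1}^{n} (y_{e,i} - \bar{y}_e)^2 \sum_{i=1}^{n} (y_{p,i} - \bar{y}_p)^2}}, \\ \mathrm{MSE} &= \frac{1}{n} \sum_{i=1}^{n} (y_{e,i} - y_{p,i})^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \\ \mathrm{MAE} &= \frac{1}{n} \sum_{i=1}^{n} \left| y_{e,i} - y_{p,i} \right|, \qquad \mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_{e,i} - y_{p,i}}{y_{e,i}} \right| \end{aligned} \tag{6} \]
where n is the number of samples, y_e is the experimental value of compressive strength, y_p is the predicted value, and an overbar denotes the sample mean. The closer R is to 1 and the closer the other four error indicators are to 0, the better the prediction performance of the model.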
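The five criteria can be computed in a few lines. The sketch below assumes the standard definitions, with MAPE expressed as a fraction rather than a percentage, which matches the scale of the reported value of 0.05; the sample values are illustrative and are not from the paper's dataset.

```python
import math


def regression_metrics(y_exp, y_pred):
    """Return R, MSE, RMSE, MAE, and MAPE (as a fraction) in a dict."""
    n = len(y_exp)
    mean_e = sum(y_exp) / n
    mean_p = sum(y_pred) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(y_exp, y_pred))
    var_e = sum((e - mean_e) ** 2 for e in y_exp)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    mse = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred)) / n
    return {
        "R": cov / math.sqrt(var_e * var_p),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e - p) for e, p in zip(y_exp, y_pred)) / n,
        "MAPE": sum(abs((e - p) / e) for e, p in zip(y_exp, y_pred)) / n,
    }


# Illustrative strengths in MPa.
metrics = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
```

Because RMSE is the square root of MSE, the two values reported in the abstract can be cross-checked: the square root of 3.44 is about 1.85.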

4. Model Performance

Initially, GS is used for the selection of hyperparameters in the SVR. For the GS-SVR model, C and g are searched in an exponential grid of [2^−8, 2^8] with a step of 2^0.1. The evolution of the parameters is shown in Figure 5. The model is trained using 10-fold cross-validation.
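To give a sense of the size of this search, the sketch below builds the exponential grid and a 10-fold partition. It assumes that the grid end points are inclusive and that the folds are drawn from the 78 training samples listed in Table 1; the interleaved fold assignment is just one simple choice and is not necessarily the one used by the authors.

```python
def exponent_grid(lo=-8.0, hi=8.0, step=0.1):
    """Values 2^lo, 2^(lo + step), ..., 2^hi with inclusive end points."""
    n_points = int(round((hi - lo) / step)) + 1
    return [2.0 ** (lo + i * step) for i in range(n_points)]


def kfold_indices(n_samples, k=10):
    """Split sample indices into k interleaved validation folds."""
    return [list(range(i, n_samples, k)) for i in range(k)]


grid = exponent_grid()
n_pairs = len(grid) ** 2          # number of (C, g) combinations
folds = kfold_indices(78, k=10)   # 78 = assumed training-set size
n_model_fits = n_pairs * len(folds)
```

Under these assumptions each parameter takes 161 values, giving 25,921 (C, g) pairs and roughly 259,000 model fits, which is affordable mainly because the dataset is small.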


To highlight the advantage of the proposed method, the original SVR model and the GS-SVR model were run on the same training and test sets. The model results are shown in Figure 6. Compared with the SVR model, the data points of the GS-SVR model are distributed more closely along the diagonal, indicating that the predicted values match the experimental values better, and the correlation coefficient R exceeds 0.98 for both the training and test sets.


For comparison and evaluation purposes, Figure 7 shows the predicted and experimental values for the training and test sets in more detail. At each sample point, the predicted values agree well with the experimental values, demonstrating the accuracy and effectiveness of the GS-SVR model in capturing the complex nonlinear relationships between the seven input variables and the compressive strength. The error metrics for model training and test are shown in Figure 8. It can be clearly observed from both sets that the four error indicators of the GS-SVR model are smaller, further validating the accuracy and generalization capability of the proposed GS-SVR model.


5. Sensitivity Analysis

The model results in Section 4 show that the compressive strength depends on the input vector consisting of six ingredients and age. However, these variables rarely take deterministic values in a stochastic environment. This uncertainty may cause the predicted values of the concrete compressive strength to deviate from the actual values. Therefore, this section focuses on the effect of the degree of random variation of each input variable on the compressive strength. According to the statistical characteristics of the dataset given in Table 1, a set of input vectors with deterministic configurations is given in Table 2. The input variables are assumed to take three different degrees of stochasticity, S0 = (0.05, 0.1, 0.15). In each stochastic setting, 10^4 samples were generated using MATLAB. To quantify and compare the degree of influence on the output variable, the sensitivity of the compressive strength to the random input is defined as follows [35, 36].
\[ S = \frac{\sigma}{\mu} \tag{7} \]
where σ and μ are the standard deviation and mean values of the compressive strength, respectively. The distribution of compressive strength for the three randomness settings is shown in Figure 9.
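The sampling procedure can be sketched as a small Monte Carlo loop. Several points here are assumptions rather than details given in the text: the sensitivity in equation (7) is taken to be the ratio σ/μ, one input is perturbed at a time, each perturbed input is drawn from a normal distribution with mean equal to its nominal value and standard deviation S0 times that value, and `toy_strength` is a hypothetical linear stand-in for the trained GS-SVR predictor.

```python
import random
import statistics

# Nominal configuration, following the deterministic values in Table 2.
NOMINAL = {
    "cement": 230.0, "fly_ash": 125.0, "water": 180.0,
    "superplasticizer": 6.0, "coarse_aggregate": 1000.0,
    "fine_aggregate": 800.0, "age": 28.0,
}


def toy_strength(x):
    """Hypothetical predictor in MPa; NOT the trained GS-SVR model."""
    return (0.10 * x["cement"] + 0.05 * x["fly_ash"]
            - 0.05 * x["water"] + 0.10 * x["age"])


def sensitivity(predict, nominal, name, s0, n_samples=10_000, seed=0):
    """Perturb one input with relative spread s0 and return sigma / mu."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        x = dict(nominal)
        x[name] = rng.gauss(nominal[name], s0 * nominal[name])
        outputs.append(predict(x))
    return statistics.pstdev(outputs) / statistics.mean(outputs)


cement_sensitivity = [
    sensitivity(toy_strength, NOMINAL, "cement", s0)
    for s0 in (0.05, 0.10, 0.15)
]
```

Replacing `toy_strength` with the fitted model and repeating the loop for each of the seven inputs would give one sensitivity value per variable and per S0, which is the kind of comparison the section describes.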
Table 2. Deterministic values of configuration parameters.

Variable          Unit   Value
Cement            kg/m3  230
Fly ash           kg/m3  125
Water             kg/m3  180
Superplasticizer  kg/m3  6
Coarse aggregate  kg/m3  1000
Fine aggregate    kg/m3  800
Age               d      28

As the randomness S0 increases, the sensitivity of the compressive strength increases in all three cases. Additionally, among the seven variables, fly ash and coarse aggregate lead to a more dispersed distribution of predicted compressive strength values, with fly ash having the greatest influence on the uncertainty of the predicted compressive strength. This can also be observed in Figure 9(d). Designers should therefore pay more attention to mix design and compressive strength prediction for concrete with fly ash admixture in stochastic environments.

6. GS-SVR Model-Based Interactive Graphical User Interface

Structural designers and engineers generally prefer robust, user-friendly software with wide applicability. To make the model developed in this study practical and easy to use, a graphical user interface (GUI) was built using MATLAB, as shown in Figure 10. The interface is divided into two main parts: the optimization of hyperparameters and the prediction of output results from known input parameters. Instructions for operating the GUI can be obtained by clicking on the Help menu, and the whole process consists of four main steps.


Step 1. Click the Initialize setting button to get the default values of the parameters; these values can also be modified manually.

Step 2. Click the Optimization button to obtain the optimal values of parameters C and g.

Step 3. Manually enter each input parameter.

Step 4. Click the Predict button to get the final output value.

Finally, the compressive strength of fly ash concrete is displayed directly by clicking the Predict button. This GUI was developed mainly for the dataset used in this study, and future optimization of the user interface such as adding new datasets and other influencing parameters can be considered to make the model predictions more accurate.

7. Conclusions

In this work, a hybrid machine learning model, GS-SVR, was employed to predict the compressive strength of concrete with fly ash admixture and to quantify the sensitivity of the compressive strength in a stochastic environment. The main findings are as follows.
  (1) The proposed GS-SVR model can accurately capture the complex nonlinear relationship between the seven input parameters and the compressive strength of concrete, with a correlation coefficient R above 0.98 in both the training and test phases.

  (2) According to the performance criteria, the prediction performance of the proposed model is better than that of the original SVR model, which makes it a promising candidate for evaluating the compressive strength of fly ash concrete.

  (3) In the stochastic environment, for the dataset used in this study, the compressive strength of concrete with fly ash admixture is most sensitive to fly ash, followed by the coarse aggregate, and the sensitivity to the other five input variables is weak.

  (4) As the randomness of the variables increases, the distribution range of compressive strength becomes wider and the dispersion becomes larger. Designers and engineers should therefore pay more attention to the effect of random variation of fly ash and coarse aggregate on strength uncertainty.

  (5) This study provides a new GUI that can be easily used to predict the compressive strength of fly ash concrete. The tool performed well, giving reliable predictions, and is useful for obtaining initial estimates of the outcomes before any extensive laboratory work or fieldwork is performed.

This work also has several limitations that need to be investigated in the future. First, the dataset used in this study is not large enough, and the effects of aggregate size and water-reducing agent type on model prediction accuracy were not studied because of a lack of data. Second, other machine learning algorithms or hybrid models could be developed if they yield higher prediction accuracy. Finally, the current GUI is relatively simple and rough. As new datasets are added, the model needs to be retrained, and the GUI needs to be further updated and improved.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are grateful for the financial support from the Key Scientific and Technological Research Projects of Henan Province (222102210306).

Data Availability

The dataset used to support the findings of this study and the GUI are available from the corresponding author upon request.
