Volume 2025, Issue 1 7386022
Research Article
Open Access

Automatic Water Seepage Depth Detection in Concrete Structures Using Percussion Method Combined With Deep Learning Network

Wenjie Huang, School of Urban Construction, Yangtze University, Jingzhou 434023, China

Kai Zhou (Corresponding Author), School of Urban Construction, Yangtze University, Jingzhou 434023, China

Jicheng Zhang, School of Urban Construction, Yangtze University, Jingzhou 434023, China

Longguang Peng, College of Civil Engineering, Fuzhou University, Fuzhou 350108, China

Guofeng Du, School of Urban Construction, Yangtze University, Jingzhou 434023, China

Zezhong Zheng, School of Urban Construction, Yangtze University, Jingzhou 434023, China
First published: 20 January 2025
Academic Editor: Jun Li

Abstract

Water seepage can significantly degrade the durability of hydraulic concrete structures. This paper therefore introduces a method that combines the percussion method with deep learning to detect the depth of water seepage in concrete structures. First, percussion sound signals were collected at different water seepage depths. Then, the proposed one-dimensional convolutional bidirectional gated recurrent unit (BiGRU) network with a wide first-layer kernel (1D-WCBGRU) classifies these signals by seepage depth. The wide first convolutional kernel extracts features directly from the raw percussion signals, without manual feature extraction. The BiGRU then captures long- and short-term dependencies in the extracted features, which enhances feature separability and improves the classification accuracy and robustness of the model. Experiments confirm that the 1D-WCBGRU outperforms traditional learning algorithms in the seepage depth detection task.

1. Introduction

The use of concrete has become increasingly widespread with the growth of the construction industry. Many concrete structures, such as dams, offshore buildings, and marine platforms, are partially or completely exposed to environmental conditions involving contact with water [1, 2]. The water permeability of a concrete structure is crucial to its durability and is closely related to the moisture content of the concrete [3]. The internal moisture condition of a concrete component therefore plays a key role in its serviceability. In addition, moisture in concrete pores drives various physicochemical deterioration processes, such as chloride diffusion [4], corrosion of steel reinforcement [5, 6], and alkali–aggregate reactions. It is thus important to estimate how moisture varies within concrete structures. For large hydraulic concrete structures (e.g., dams, bridge abutments, and retaining walls), the water seepage depth is often used as a key indicator of the water content within the concrete [7].

Methods for detecting the depth of water seepage in concrete structures mainly include the capacitance method [8, 9], the piezoelectric stress wave method [10, 11], hyperspectral imaging [12], and the microwave method [13]. Although these methods show significant potential for detecting water seepage depth, each has limitations. In the capacitance method, which relies on permittivity measurements, salt conductivity can distort the results [14]. The piezoelectric stress wave method requires sensors to be installed in the structure beforehand, which may increase cost and reduce the load-bearing capacity of the structure. Hyperspectral imaging may damage the specimen and requires specialized personnel. The accuracy of the microwave method can be compromised by internal scattering and diffraction arising from the heterogeneity of concrete. Hence, a straightforward and cost-effective method for detecting the water seepage depth in concrete structures is needed.

The percussion method gathers information about potential damage by striking the surface of a test object. It is gaining prominence in nondestructive testing (NDT) because it is easy to operate, cost-effective, and does not require sensors in direct contact with the object [15–18]. Early research on the percussion method focused primarily on the structural vibration response induced by percussion rather than on the sound generated by the impact [19]. However, with the rapid advancement of machine learning (ML) techniques, researchers have shown growing interest in percussion-induced sound. Chen et al. [20] extracted power spectral density (PSD) energy from percussion sound signals as features and employed a support vector machine (SVM) to classify subsurface voids in concrete-filled steel tube (CFST) structures. Zheng et al. [21] converted percussion sound signals from concrete with different water contents into Mel-frequency cepstral coefficients (MFCCs) and classified them with an SVM. Cheng et al. [16] demonstrated the effectiveness of the percussion method for detecting pipeline deposits by using an SVM to classify MFCCs extracted from percussion sound signals. He et al. [22] used the K-nearest neighbors (KNN) algorithm to classify PSD features extracted from percussion sound signals in order to detect looseness in underwater bolted connections. These percussion methods thus rely heavily on ML techniques to link structural damage (e.g., loose bolts) to percussion-induced sound signals. However, ML techniques require features to be manually crafted before they are fed to the classifier. This approach may overlook critical features in the percussion-induced sound signals, resulting in suboptimal classification accuracy [23]. Furthermore, the noise robustness and adaptability of existing ML techniques remain unverified, which limits their practical effectiveness in real-world scenarios [24].

Recently, deep learning (DL) techniques have advanced rapidly and emerged as a promising solution to these challenges. Unlike traditional ML techniques, DL techniques extract features from data autonomously, without manual feature extraction. As a typical DL algorithm, convolutional neural networks (CNNs) have attracted considerable attention because of their superior performance. Chuang, Tsai, and Wang [25] used CNNs to classify Mel-frequency spectrograms obtained by transforming one-dimensional signals for water pipe leak detection. Yuan et al. [26] employed CNNs to detect bolt loosening by classifying Mel spectrograms derived from one-dimensional signals. However, when traditional CNNs process one-dimensional data such as audio, the data usually must first be converted into two-dimensional images, which can introduce redundant computation. To address this issue, researchers introduced one-dimensional CNNs (1D-CNNs). Abdeljaber et al. [27] achieved vibration-based damage detection and real-time damage localization by fusing the feature extraction and classification modules into a single compact learning body using a 1D-CNN. Eren [28] employed a 1D-CNN for rapid identification of bearing faults. 1D-CNNs have also been widely studied for pipeline leakage and water deposition detection [29, 30], bolt loosening detection [31, 32], and intelligent diagnosis of rotating machinery [33]. However, to the best of the authors' knowledge, no existing research has combined DL techniques with the percussion method to detect the depth of water seepage in concrete structures.

This paper presents a framework for detecting water seepage depth using a percussion-based DL technique. The proposed DL framework, termed the one-dimensional convolutional bidirectional gated recurrent unit network with a wide first kernel (1D-WCBGRU), is a hybrid of a 1D-CNN and a bidirectional gated recurrent unit (BiGRU). Unlike traditional ML/DL techniques, this framework combines the strong feature extraction capability of the 1D-CNN with the ability of the BiGRU to capture both long- and short-term dependencies in the features. This combination improves the accuracy and robustness of predicting the water seepage depth of concrete structures.

The main contributions of this paper are as follows: (1) A finite element model is established to simulate water seepage depth in concrete structures, verifying the feasibility of detecting water seepage depth with the percussion method. In addition, a DL model is proposed to classify percussion sound signals and thereby detect water seepage depth. (2) To enhance the feature extraction capability of the 1D-WCBGRU model, a wide kernel convolutional block is designed to extract more representative features from the raw signal. The model also integrates a BiGRU block to improve its accuracy in classifying percussion sound signals. (3) Experimental results demonstrate that the 1D-WCBGRU model has strong noise immunity and adapts well to various application conditions, indicating its potential and effectiveness in practical applications.

The rest of the paper is organized as follows. Section 2 provides an explanation of the model’s theoretical background. A detailed description of the 1D-WCBGRU model is provided in Section 3. In Section 4, the experimental setup is described in detail. In Section 5, we compare the performance of 1D-WCBGRU with other methods. Section 6 summarizes the study.

2. Theoretical Background

2.1. 1D-CNNs

In recent years, CNNs, among the most classical DL networks, have attracted considerable attention because of their excellent feature extraction and powerful fitting ability [34]. Although CNNs have shown remarkable proficiency in computer vision tasks such as image classification [35], they are not designed to classify 1D data directly. Several studies [36, 37] converted 1D data into 2D images using different techniques so that CNNs could classify them; however, this approach often results in redundant computation. To address this drawback, researchers [38] proposed performing feature extraction and classification directly on 1D data with a 1D-CNN. A traditional 1D-CNN has five main components: a convolutional layer, an activation layer, a pooling layer, a fully connected (FC) layer, and a Softmax output layer.

In the convolutional layer, learnable convolutional kernels are convolved with the input data to obtain feature maps. Owing to weight sharing and local connectivity, convolutional layers efficiently capture local dependencies within the data. The convolution operation is defined as follows:
$$ y^{L} = \sum_{c=1}^{C^{L-1}} x_{c}^{L-1} \otimes w_{c}^{L} + b^{L} \tag{1} $$
where $y^{L}$ is the output of layer $L$, $x_{c}^{L-1}$ is the output of the $c$-th channel of layer $L-1$, $C^{L-1}$ is the number of channels in layer $L-1$, $w_{c}^{L}$ is the weight matrix of the layer-$L$ convolutional kernel applied to channel $c$, $b^{L}$ is the bias, and $\otimes$ denotes the convolution operation. Next, an activation function nonlinearly transforms the output features of the convolutional layer. The Rectified Linear Unit (ReLU) is a common choice and is defined as follows:
$$ a^{L}(i,j) = f\left(x^{L}(i,j)\right) = \max\left\{0,\ x^{L}(i,j)\right\} \tag{2} $$
where $x^{L}(i,j)$ is the $j$-th feature value in the $i$-th feature map of layer $L$, $a^{L}(i,j)$ is the corresponding activation value, and $f(\cdot)$ is the ReLU activation function. The pooling layer reduces the dimensionality of the feature maps, which reduces the number of parameters and the computational cost of subsequent layers while preserving important features. Because max pooling has been reported to handle one-dimensional sequence tasks more effectively than average pooling [39], this paper employs max pooling, defined as follows:
$$ y^{L}(i,j) = \max_{(j-1)S+1 \,\le\, t \,\le\, jS} x^{L}(i,t) \tag{3} $$
where $x^{L}(i,t)$ is the value of the $t$-th neuron in the $i$-th channel of layer $L$, $S$ is the pooling kernel size, and $y^{L}(i,j)$ is the output value of the $j$-th neuron in the $i$-th channel of layer $L$. The FC layer is an important part of CNNs: it flattens the deep features extracted by the convolutional and pooling layers into a one-dimensional vector, and the Softmax layer then classifies the result. The FC layer and the Softmax function are defined as follows:
$$ z_{j}^{L+1} = \sum_{i} w_{ij}^{L}\, x^{L}(i) + b_{j}^{L}, \qquad p(x)_{i} = \frac{e^{z_{i}}}{\sum_{k=1}^{C} e^{z_{k}}} \tag{4} $$
where $x^{L}(i)$ is the output value of the $i$-th neuron of layer $L$; $w_{ij}^{L}$ is the weight between the $i$-th neuron in layer $L$ and the $j$-th neuron in layer $L+1$; $b_{j}^{L}$ is the bias from all neurons of layer $L$ to the $j$-th neuron of layer $L+1$; $C$ is the number of categories; $z_{i}$ is the pre-activation value of the $i$-th neuron in the output layer; and $p(x)_{i}$ is the probability output of the $i$-th neuron in the output layer.
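The layer operations of Section 2.1 can be illustrated with a minimal pure-Python forward pass: multichannel 1D convolution, ReLU, non-overlapping max pooling, an FC layer, and Softmax. This is an illustrative sketch, not the authors' implementation; the function names are hypothetical, and, like most DL frameworks, `conv1d` computes cross-correlation (the kernel is not flipped) with stride 1 and no padding.

```python
import math

def conv1d(x, w, b):
    """Multichannel 1D convolution (stride 1, no padding).

    x: C_in channels, each a list of samples.
    w: C_out filters, each holding C_in kernels of length K.
    b: C_out biases. Each output channel sums the per-channel results plus its bias.
    """
    k = len(w[0][0])
    out_len = len(x[0]) - k + 1
    out = []
    for filt, bias in zip(w, b):
        row = []
        for t in range(out_len):
            s = bias
            for xc, wc in zip(x, filt):
                s += sum(xc[t + m] * wc[m] for m in range(k))
            row.append(s)
        out.append(row)
    return out

def relu(x):
    return [[max(0.0, v) for v in ch] for ch in x]

def maxpool1d(x, s):
    # Non-overlapping windows of size s; leftover samples that do not fill a window are dropped.
    return [[max(ch[j * s:(j + 1) * s]) for j in range(len(ch) // s)] for ch in x]

def dense(v, w, b):
    # w[j][i] is the weight from input i to output neuron j.
    return [sum(wi * vi for wi, vi in zip(row, v)) + bj for row, bj in zip(w, b)]

def softmax(z):
    m = max(z)  # subtracting the max avoids overflow without changing the result
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

# Toy forward pass: 1 input channel of 6 samples, 2 filters of length 2, 3 classes.
x = [[0.0, 1.0, 2.0, 1.0, 0.0, -1.0]]
w = [[[1.0, -1.0]], [[0.5, 0.5]]]
conv = conv1d(x, w, [0.0, 0.0])
pooled = maxpool1d(relu(conv), 2)
flat = [v for ch in pooled for v in ch]
fc_weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
probs = softmax(dense(flat, fc_weights, [0.0, 0.0, 0.0]))
```

The first filter acts as a difference operator and the second as a two-sample average, so the pooled feature maps already differ in where they respond; the FC weights here are arbitrary and only show how the flattened features become class probabilities.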

2.2. Gated Recurrent Unit (GRU)

Feedforward neural networks provide only a static mapping from inputs to outputs, which limits them to static classification tasks. To handle temporal prediction tasks, such as speech recognition and natural language processing, many studies have focused on recurrent neural networks (RNNs). However, problems such as vanishing and exploding gradients undermine the performance of RNNs on time series tasks. To tackle these problems, long short-term memory (LSTM) was introduced, which replaces the hidden layer of the RNN with a memory block capable of retaining past information. Subsequently, Chung et al. [40] proposed the GRU based on LSTM. The GRU replaces the input, output, and forget gates of LSTM with a reset gate and an update gate. Consequently, the GRU achieves performance comparable to that of LSTM while requiring fewer trainable parameters and training faster [41]. Figure 1 illustrates the architecture of the GRU, which is expressed mathematically as follows:
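$$
\begin{aligned}
R_t &= \sigma\left(W_r X_t + U_r H_{t-1}\right)\\
Z_t &= \sigma\left(W_z X_t + U_z H_{t-1}\right)\\
\tilde{H}_t &= \tanh\left(W_h X_t + U_h\left(R_t \odot H_{t-1}\right)\right)\\
H_t &= \left(1 - Z_t\right) \odot H_{t-1} + Z_t \odot \tilde{H}_t
\end{aligned} \tag{5}
$$
where $R_t$ and $Z_t$ are the reset gate and the update gate; $X_t$ is the input vector at time $t$; $\tilde{H}_t$ is the candidate hidden state; $W_r$, $W_z$, and $W_h$ are the weights applied to the input vector; $U_r$, $U_z$, and $U_h$ are the weights applied to the previous hidden state; $H_{t-1}$ and $H_t$ are the outputs at times $t-1$ and $t$; $\sigma$ is the sigmoid function; and $\odot$ is the elementwise (Hadamard) product. Bias terms are omitted for brevity.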
Figure 1. Gated recurrent unit (GRU) architecture.
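A single GRU time step can be sketched in plain Python as follows. This is a minimal illustration, assuming the common formulation with candidate state $\tilde{H}_t = \tanh(W_h X_t + U_h (R_t \odot H_{t-1}))$ and update $H_t = (1 - Z_t) \odot H_{t-1} + Z_t \odot \tilde{H}_t$, with biases omitted; the function and weight names are hypothetical. Some libraries, such as PyTorch's `nn.GRU`, swap the roles of $Z_t$ and $1 - Z_t$ in the final interpolation, so trained weights are not interchangeable between conventions.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def gru_step(x_t, h_prev, p):
    """One GRU time step (biases omitted).

    p maps the hypothetical names Wr, Ur, Wz, Uz, Wh, Uh to weight matrices (lists of rows);
    W* multiply the input x_t and U* multiply the previous hidden state.
    """
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x_t), matvec(p["Ur"], h_prev))]  # reset gate
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x_t), matvec(p["Uz"], h_prev))]  # update gate
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x_t), matvec(p["Uh"], reset_h))]
    # The update gate interpolates elementwise between the old state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_cand)]

# Toy check with input size 1 and hidden size 2: all-zero weights give r = z = 0.5 and a zero
# candidate state, so the new state is exactly half of the previous one.
hidden, inp = 2, 1
zeros = {name: [[0.0] * (inp if name.startswith("W") else hidden) for _ in range(hidden)]
         for name in ("Wr", "Ur", "Wz", "Uz", "Wh", "Uh")}
h = gru_step([1.0], [0.4, -0.2], zeros)
```

Only two gates and one candidate state are computed per step, compared with three gates and a separate cell state in LSTM, which is where the parameter savings mentioned above come from.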

2.3. Numerical Simulation

Zhou et al. [17] investigated the correlation between percussion-induced vibration and percussion sound using wavelet packet decomposition and sound reconstruction techniques. Their findings indicate that the sound produced by percussing concrete varies with the vibration induced by the impact.

Furthermore, because concrete is an inhomogeneous multiphase material, its elastic modulus is influenced by numerous factors (e.g., density and the properties of interfacial transition zones) [42]. Liu et al. [43] investigated the relationship between the elastic modulus of concrete and its water content and found that the modulus of elasticity was positively correlated with water content. The relationship between the two is expressed as follows:
()
where E denotes the modulus of elasticity (MPa) and M represents the moisture content (%). In addition, Kolluru, Popovics, and Shah [44] experimentally demonstrated a good correlation between resonance frequency and elastic modulus.

Luo and Yang [45] experimentally demonstrated that changes in the natural frequency of a structure alter its vibrational characteristics, resulting in distinct percussion sounds. Therefore, in this study, numerical simulations were conducted with the commercial finite element software Abaqus for six concrete specimens with different water seepage depths. Figure 2 shows an example finite element model of a concrete specimen measuring 100 mm × 100 mm × 400 mm. The effect of water seepage depth on the resonance frequency was simulated by adjusting the elastic modulus of the concrete over different heights (0 mm, 80 mm, 160 mm, 240 mm, 320 mm, and 400 mm). All other parameters were held constant.

Figure 2. Concrete element model.

2.3.1. Concrete Constitutive Model

The concrete damage plasticity (CDP) constitutive model [46] was used to simulate the concrete in this study. To characterize the yield function and plastic flow of concrete, the following parameters were set: a dilation angle of 30°, an eccentricity of 0.1, fb0/fc0 of 1.16, K of 0.6667, and a viscosity parameter of 0.0005. The elastic modulus was calculated based on equation (6). Following the Chinese standard GB50010-2010 [47], equations (7)–(11) were used to represent the uniaxial compressive constitutive model of concrete.
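$$ \sigma = \left(1 - d_c\right) E_c \varepsilon \tag{7} $$
$$ d_c = \begin{cases} 1 - \dfrac{\rho_c n}{n - 1 + x^{n}}, & x \le 1 \\[2ex] 1 - \dfrac{\rho_c}{\alpha_c (x - 1)^{2} + x}, & x > 1 \end{cases} \tag{8} $$
$$ \rho_c = \frac{f_c}{E_c \varepsilon_0} \tag{9} $$
$$ n = \frac{E_c \varepsilon_0}{E_c \varepsilon_0 - f_c} \tag{10} $$
$$ x = \frac{\varepsilon}{\varepsilon_0} \tag{11} $$
where $\sigma$ is the compressive stress; $d_c$ is the damage parameter under uniaxial compressive stress; $E_c$ is the elastic modulus of concrete; $\varepsilon$ and $\varepsilon_0$ are the compressive strain and the compressive strain at the peak point, respectively; $f_c$ is the axial compressive strength; $\rho_c$ is the ratio of the compressive strength to the product of $E_c$ and $\varepsilon_0$; $n$ and $\alpha_c$ are the parameters of the ascending and descending branches of the uniaxial compressive stress–strain curve, respectively; and $x$ is the ratio of $\varepsilon$ to $\varepsilon_0$.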
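To see how equations (7)–(11) shape the stress–strain curve, the sketch below evaluates them, assuming they take the standard GB50010-2010 damage form $\sigma = (1 - d_c) E_c \varepsilon$ with the piecewise $d_c$. The inputs ($f_c$ = 20.1 MPa, $E_c$ = 30,000 MPa, $\varepsilon_0$ = 0.00147, $\alpha_c$ = 0.75) are hypothetical placeholders chosen for illustration, not the parameters of the paper's Abaqus model, and the function name is likewise hypothetical.

```python
def gb50010_compressive_stress(eps, fc, Ec, eps0, alpha_c):
    """Uniaxial compressive stress (MPa) at strain eps, assuming the GB50010-2010 damage form
    of equations (7)-(11): sigma = (1 - d_c) * Ec * eps."""
    if Ec * eps0 <= fc:
        raise ValueError("Ec * eps0 must exceed fc so that n > 1")
    rho_c = fc / (Ec * eps0)
    n = Ec * eps0 / (Ec * eps0 - fc)
    x = eps / eps0
    if x <= 1.0:   # ascending branch
        d_c = 1.0 - rho_c * n / (n - 1.0 + x ** n)
    else:          # descending branch
        d_c = 1.0 - rho_c / (alpha_c * (x - 1.0) ** 2 + x)
    return (1.0 - d_c) * Ec * eps

# Hypothetical C30-like inputs (illustrative only).
fc, Ec, eps0, alpha_c = 20.1, 3.0e4, 0.00147, 0.75
curve = [(e, gb50010_compressive_stress(e, fc, Ec, eps0, alpha_c))
         for e in (0.0005, 0.001, eps0, 0.002, 0.003)]
for strain, stress in curve:
    print(f"strain {strain:.5f}: stress {stress:.2f} MPa")
```

Because $\rho_c n = n - 1$, the damage variable is zero at zero strain, so the curve starts with slope $E_c$; at $x = 1$ both branches give $d_c = 1 - \rho_c$, so the curve passes through the peak $(\varepsilon_0, f_c)$. In this model, a higher $E_c$ for wetter concrete (via equation (6)) stiffens the ascending branch without changing the peak stress.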

2.3.2. Boundary Conditions and Mesh Sizes

The model simulates a portion of a mass hydraulic concrete structure. The displacements (Ux, Uy, and Uz) at both the top and the bottom of the column were constrained to 0 in all three directions, as were the rotational degrees of freedom (URx, URy, and URz). The concrete mesh size was set to 5 mm.

The concrete model was subjected to modal analysis to validate the effect of seepage depth on the natural frequency of concrete. Table 1 illustrates the first three orders of natural frequency for concrete specimens at different water seepage depths. The results indicate that the increase in the depth of water seepage leads to a gradual increase in the first three orders of the natural frequency of the concrete specimens. Thus, with other conditions held constant, the sound generated from percussion is influenced by the water seepage depth, confirming the feasibility of using percussion sound to identify the water seepage depth in concrete structures.

Table 1. Natural frequency of concrete specimens.
Water seepage depth (mm) First mode frequency (Hz) Second mode frequency (Hz) Third mode frequency (Hz)
0 1627.4 2492.1 3597.1
80 1646.1 2523.8 3633.8
160 1652.6 2538.2 3656.2
240 1662.6 2539.6 3671.8
320 1669.6 2552.5 3694.0
400 1688.8 2586.1 3732.8
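The trend in Table 1 can be quantified with a short calculation that uses only the tabulated values. It assumes the standard linear-elastic result that, for fixed geometry, density, and boundary conditions, scaling the elastic modulus uniformly by a factor $k$ scales every natural frequency by $\sqrt{k}$; presumably the 400 mm case corresponds to the whole specimen carrying the adjusted modulus, so all three modes should then rise by the same ratio. The sketch checks monotonicity and compares the 400 mm and 0 mm frequencies mode by mode.

```python
# Natural frequencies (Hz) from Table 1: seepage depth (mm) -> (mode 1, mode 2, mode 3)
table1 = {
    0: (1627.4, 2492.1, 3597.1),
    80: (1646.1, 2523.8, 3633.8),
    160: (1652.6, 2538.2, 3656.2),
    240: (1662.6, 2539.6, 3671.8),
    320: (1669.6, 2552.5, 3694.0),
    400: (1688.8, 2586.1, 3732.8),
}
depths = sorted(table1)

# Every mode should rise strictly with seepage depth.
monotonic = all(
    table1[shallow][mode] < table1[deep][mode]
    for mode in range(3)
    for shallow, deep in zip(depths, depths[1:])
)

# Frequency ratio between the 400 mm and 0 mm cases for each mode.
ratios = [table1[400][mode] / table1[0][mode] for mode in range(3)]
for mode, ratio in enumerate(ratios, start=1):
    # If frequency scales with sqrt(E), ratio**2 estimates the implied uniform stiffness ratio.
    print(f"mode {mode}: +{(ratio - 1) * 100:.2f}% frequency, implied E ratio {ratio ** 2:.4f}")
```

All three ratios round to about 1.0377 (a rise of roughly 3.8%), which is consistent with a uniform stiffness increase of roughly 7.7% under the $\sqrt{E}$ assumption; intermediate depths change only part of the specimen, so their shifts need not follow this simple scaling.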

3. Proposed Method

This paper introduces a novel deep neural network to classify the percussion sounds corresponding to different water seepage depths. Figure 3 illustrates the proposed 1D-WCBGRU architecture, which mainly consists of an input block, a wide kernel convolutional block, a BiGRU block, and an output block; its detailed parameters are listed in Table 2. Percussion sound signals from various water seepage depths are fed into the 1D-WCBGRU. First, a wide convolutional kernel (kernel size = 256) extracts local features from the audio signals, and max pooling is applied to the convolutional output to reduce the computational complexity and the dimensionality of the feature maps. A smaller convolutional kernel (kernel size = 2) then performs a second convolution, again followed by max pooling, to further extract features from the audio signals. Layer normalization (LN) is introduced to improve training stability and generalization, making the model more adaptable to different input distributions. Next, a BiGRU captures contextual information within the audio sequences more effectively. A global average pooling (GAP) layer then maps the resulting sequences, whose length depends on the input length, to a fixed-length vector. This allows the model to handle audio inputs of different lengths, reduces the number of model parameters, and mitigates the risk of overfitting. Finally, an FC layer maps the audio features to the output layer, where the Softmax function provides probabilistic outputs over the six categories, i.e., the six water seepage depths. In this way, the model captures multilevel information from the audio signals, which enhances its classification performance and robustness. The theoretical background of each block is described in the following subsections.

Figure 3. Flowchart of the 1D-WCBGRU.
Table 2. The detailed parameters for 1D-WCBGRU.
Layer Parameter settings
Conv1d_1 Filters = 32, kernel_size = 256, stride = 1, and activation = ReLU
Maxpool1d_1 Kernel_size = 2
Conv1d_2 Filters = 64, kernel_size = 2, stride = 1, and activation = ReLU
Maxpool1d_2 Kernel_size = 2
LayerNorm Normalized_shape = 64
GRU Input_size = 64, hidden_size = 32, and bidirectional = True
Global average pooling 1D Out_size = 1
FC In_features = 64 and out_features = 6
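The layer settings in Table 2 determine the tensor shapes and the parameter count of the network. The sketch below traces them for one 4800-sample input under assumptions the paper does not state: a single input channel, no padding, a pooling stride equal to the pooling kernel size (the PyTorch default), a single-layer BiGRU with PyTorch-style bias vectors (two per gate group), and a LayerNorm with learnable scale and shift. The function names are hypothetical.

```python
def out_len(length, kernel, stride=1, padding=0):
    """Output length of a 1D convolution or pooling window (floor division, as in PyTorch)."""
    return (length + 2 * padding - kernel) // stride + 1

def trace_1d_wcbgru(n_samples=4800, n_classes=6, hidden=32):
    shapes, params = [], {}
    length = out_len(n_samples, 256)                 # Conv1d_1: 1 -> 32 channels
    shapes.append(("Conv1d_1", 32, length))
    params["Conv1d_1"] = 32 * 1 * 256 + 32
    length = out_len(length, 2, stride=2)            # Maxpool1d_1
    shapes.append(("Maxpool1d_1", 32, length))
    length = out_len(length, 2)                      # Conv1d_2: 32 -> 64 channels
    shapes.append(("Conv1d_2", 64, length))
    params["Conv1d_2"] = 64 * 32 * 2 + 64
    length = out_len(length, 2, stride=2)            # Maxpool1d_2
    shapes.append(("Maxpool1d_2", 64, length))
    params["LayerNorm"] = 2 * 64                     # scale and shift per feature
    # Per direction: input and recurrent weights for 3 gate groups, plus 2 bias vectors each.
    per_direction = 3 * hidden * (64 + hidden) + 2 * 3 * hidden
    params["BiGRU"] = 2 * per_direction
    shapes.append(("BiGRU", 2 * hidden, length))     # forward and backward states concatenated
    shapes.append(("GAP", 2 * hidden, 1))
    params["FC"] = 2 * hidden * n_classes + n_classes
    shapes.append(("FC", n_classes, 1))
    return shapes, params

shapes, params = trace_1d_wcbgru()
for name, channels, length in shapes:
    print(f"{name:12s} channels={channels:3d} length={length}")
print("total parameters:", sum(params.values()))
```

Under these assumptions the sequence shrinks from 4800 samples to 4545, 2272, 2271, and 1135 time steps before the BiGRU, the GAP output has 64 features (matching In_features = 64 of the FC layer in Table 2), and the network has 31,718 trainable parameters, more than half of them in the BiGRU.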

3.1. Wide Kernel Convolutional Block

Researchers [48] found that, for the same number of input samples, the size of the convolutional kernel significantly affects the extracted features. With small kernels only, a network processing one-dimensional audio signals must be very deep to obtain a receptive field wide enough to capture low-frequency information, and such a network is difficult to train and therefore impractical. In addition, a small kernel in the first layer is prone to interference from high-frequency noise. Therefore, the first convolutional layer uses a wide kernel to capture the low-frequency characteristics of the signal during feature extraction. Subsequent small kernels then help obtain a good representation of the input signal, improving network performance. A ReLU activation function is applied after each convolution to introduce nonlinearity.
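The effect of the wide first kernel can be quantified through the receptive field, the span of input samples that influences one output value. At the 48 kHz sampling rate used in Section 4, a 256-sample kernel covers about 5.3 ms, enough for roughly 8.7 cycles of the first simulated natural mode in Table 1 (1627.4 Hz). The sketch below applies the standard recursion $r \leftarrow r + (k - 1)\,j$, $j \leftarrow j \cdot s$ to the four convolution and pooling layers of Table 2, assuming a pooling stride of 2; the function name is hypothetical.

```python
def receptive_field(layers):
    """Receptive field and cumulative stride ("jump") in input samples.

    layers: sequence of (kernel_size, stride) pairs, applied in order.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the view by (k - 1) input-level steps
        jump *= stride              # spacing between adjacent outputs, in input samples
    return rf, jump

# Conv1d_1, Maxpool1d_1, Conv1d_2, Maxpool1d_2 (pooling stride of 2 assumed)
rf, jump = receptive_field([(256, 1), (2, 2), (2, 1), (2, 2)])
fs = 48_000  # sampling rate in Hz
print(f"receptive field: {rf} samples = {rf / fs * 1000:.2f} ms, output spacing: {jump} samples")
```

The receptive field is 261 samples (about 5.4 ms), almost all of it contributed by the first layer. Reaching the same span with stride-1 kernels of size 3 alone would take about 130 layers, which illustrates why a deep small-kernel network is harder to train.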

3.2. Bidirectional GRU

Although GRU performs well on sequence problems with long-term dependencies, many sequence modeling tasks benefit from access to both past and future context. Researchers [49] added a reverse-direction layer to the GRU, resulting in the BiGRU shown in Figure 4. In BiGRU, the outputs of the forward and backward networks are merged at each time step, so the structure provides complete contextual information. The BiGRU is computed as follows:
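$$
\begin{aligned}
\overrightarrow{H}_t &= \mathrm{GRU}_{fw}\left(X_t, \overrightarrow{H}_{t-1}\right)\\
\overleftarrow{H}_t &= \mathrm{GRU}_{bw}\left(X_t, \overleftarrow{H}_{t+1}\right)\\
H_t &= \left[\overrightarrow{H}_t;\ \overleftarrow{H}_t\right]
\end{aligned} \tag{12}
$$
where $\mathrm{GRU}_{fw}$ and $\mathrm{GRU}_{bw}$ represent the mappings of the forward and backward GRUs, respectively, $\overrightarrow{H}_t$ and $\overleftarrow{H}_t$ are their hidden states at time $t$, and $[\cdot;\cdot]$ denotes concatenation, which yields the $2 \times 32 = 64$ features expected by the FC layer in Table 2. In this study, the BiGRU block follows the wide kernel convolutional block to capture contextual information that a unidirectional GRU may overlook.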
Figure 4. The architecture of BiGRU.
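The bidirectional scheme can be sketched independently of the cell details: one recurrent step function scans the sequence forward, a second, separately parameterized one scans it backward, and the two states are concatenated at each time step. Concatenation is assumed here because Table 2 pairs a hidden size of 32 with FC In_features = 64. The function name is hypothetical, and the step function in the usage example is a toy stand-in (a running sum), not a GRU.

```python
def bidirectional(seq, step_fw, step_bw, h0_fw, h0_bw):
    """Run separate forward and backward recurrences and concatenate their states per time step.

    step_fw / step_bw: functions (x_t, h) -> new state (a list), e.g., two independent GRU cells.
    """
    forward, h = [], h0_fw
    for x_t in seq:                              # t = 1 .. T
        h = step_fw(x_t, h)
        forward.append(h)
    backward, h = [None] * len(seq), h0_bw
    for t in range(len(seq) - 1, -1, -1):        # t = T .. 1
        h = step_bw(seq[t], h)
        backward[t] = h
    return [f + b for f, b in zip(forward, backward)]   # list concatenation [fw; bw]

def running_sum(x_t, h):
    # Toy recurrence with a one-element state, standing in for a GRU cell.
    return [h[0] + x_t]

outputs = bidirectional([1, 2, 3], running_sum, running_sum, [0], [0])
```

Each output row pairs the sum of everything up to step $t$ (forward state) with the sum of everything from step $t$ onward (backward state), which is the sense in which a bidirectional layer sees both past and future context at every position.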

3.3. Output Block

The output block comprises a GAP layer and an FC layer with Softmax activation. The FC layer learns relationships among the features. In the FC layer, each node is connected to all nodes in the preceding layer and receives their outputs, which maximizes information retention. However, the large number of parameters in the FC layer can consume excessive computational resources and increase the risk of overfitting. To address this issue, dropout regularization is typically introduced into the FC layer. Dropout randomly deactivates nodes with a specified probability, reducing interneuron dependency and enhancing the network's generalization ability. Nevertheless, the optimal dropout parameters are difficult to determine. For this reason, Zhang et al. [50] introduced GAP before the FC layer to reduce the feature dimensions. In the 1D-WCBGRU, the GAP layer follows the BiGRU block and computes the average of all features in each feature map, thereby reducing the parameter space and mitigating the risk of overfitting. Incorporating the GAP layer before the FC layer further contributes to maintaining the original architecture of the training model. GAP is defined as follows:
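$$ y_c = \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_{c,i,j} \tag{13} $$
where $y_c$ is the output of channel $c$, $H$ and $W$ are the height and width of the feature map, respectively, and $x_{c,i,j}$ is the element of the feature map at channel $c$, height $i$, and width $j$. For the one-dimensional feature sequences in the 1D-WCBGRU, $H = 1$ and $W$ is the sequence length.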
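For the BiGRU output, a sequence of time steps that each carry 64 features, GAP reduces to averaging each feature over time. A minimal sketch, assuming a (time step, feature) layout; the function name is hypothetical:

```python
def global_average_pool_1d(features):
    """Average each feature (channel) over all time steps.

    features: list of time steps, each a list with one value per channel.
    Returns one value per channel, regardless of how many time steps there are.
    """
    n_steps = len(features)
    n_channels = len(features[0])
    return [sum(step[c] for step in features) / n_steps for c in range(n_channels)]

short_seq = global_average_pool_1d([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
long_seq = global_average_pool_1d([[0.0, 1.0]] * 10)
```

Inputs with 3 and 10 time steps both produce a 2-element vector, so the GAP output length depends only on the number of channels (64 in the 1D-WCBGRU), and GAP itself adds no trainable parameters.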

4. Experimental Setup

Three prismatic concrete specimens, each measuring 100 mm × 100 mm × 400 mm, were fabricated and tested in the laboratory to validate the proposed method. The concrete was designed for a compressive strength of 30 MPa, and the mix proportions used to achieve this strength are presented in Table 3.

Table 3. Concrete mix proportions.
Component Water Cement Coarse aggregate Fine aggregate Superplasticizer
Quantity (kg/m3) 183 450 1192 600 5
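Ratios commonly reported with a mix design follow directly from Table 3. The sketch below computes the water-to-cement ratio, the sand ratio (fine aggregate over total aggregate, by mass), the superplasticizer dosage relative to cement mass, and a nominal fresh density obtained by summing the component masses per cubic meter; ignoring entrapped air in that sum is an assumption.

```python
# Mix proportions from Table 3, in kg per cubic meter of concrete.
mix = {
    "water": 183,
    "cement": 450,
    "coarse aggregate": 1192,
    "fine aggregate": 600,
    "superplasticizer": 5,
}

w_c = mix["water"] / mix["cement"]                      # water-to-cement ratio
sand_ratio = mix["fine aggregate"] / (mix["fine aggregate"] + mix["coarse aggregate"])
sp_dosage = mix["superplasticizer"] / mix["cement"]     # fraction of cement mass
nominal_density = sum(mix.values())                     # kg/m^3, air content ignored

print(f"w/c = {w_c:.3f}, sand ratio = {sand_ratio:.1%}, "
      f"SP dosage = {sp_dosage:.2%}, nominal density = {nominal_density} kg/m^3")
```

This gives a water-to-cement ratio of about 0.41, a sand ratio of about 33.5%, a superplasticizer dosage of about 1.1% of cement mass, and a nominal density of 2430 kg/m³.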

After 28 days of curing under standard conditions, the concrete specimens were dried in an oven at 105°C for 48 h to ensure that their mass reached a constant value. Once the specimens had cooled completely, they were immersed in pure water to the specified depth for about 4 h. The specimens were then removed from the water and wiped with a cloth until no water droplets remained on the surface, after which the percussion test was conducted. Because the percussion experiments were brief, the water seepage depth inside the concrete was assumed to remain constant during testing. Figure 5(a) shows the experimental samples and equipment.

Figure 5. Experimental apparatus.

Six damage conditions were set up for the experiment, and each was assigned a label from 0 to 5, as shown in Table 4. The percussion apparatus comprised an impact hammer and a smartphone. Throughout the experiment, the specimen was secured at both ends with a fixture to prevent disturbances caused by the specimen shaking during percussion. As depicted in Figure 5(b), the fixture was in contact with the concrete specimen but applied no force to it. The concrete was struck at specified locations 100 times with uniform force to produce audio signals, which were simultaneously recorded at a sampling rate of 48 kHz by a smartphone placed 100 mm from the percussion point, as depicted in Figure 5(b). The experiment was performed in a quiet laboratory to minimize noise disturbance. The dual-channel raw audio was then converted to a single channel, and the duration of each percussion sample was set to 0.1 s (i.e., 4800 samples at 48 kHz). Peak points, defined as samples whose amplitude exceeds a predefined threshold of 15,000, serve as the starting points of the individual percussion signals. To separate each strike accurately, a minimum gap was imposed between consecutive peak points so that only one peak point could be selected within this gap. Finally, the individual raw percussion signals were normalized and used as model inputs. Figure 6 shows a portion of the percussion sound waveforms captured at various water seepage depths.
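The preprocessing steps described above can be sketched as follows. Several details are assumptions rather than reported settings: the two channels are averaged to form the mono signal, "amplitude" is taken as the absolute sample value of 16-bit audio, and the minimum gap between peak points (`min_gap`) is a hypothetical value because the paper does not report it. Normalization uses min–max scaling to [0, 1], matching the range stated in Section 5.1. All function names are hypothetical.

```python
def to_mono(left, right):
    """Average two channels into one (assumed merging rule)."""
    return [(a + b) / 2 for a, b in zip(left, right)]

def segment_strikes(signal, threshold=15_000, window=4_800, min_gap=4_800):
    """Cut fixed-length windows that start at threshold-crossing peak points.

    A sample whose absolute value exceeds `threshold` becomes a peak point only if it lies at least
    `min_gap` samples after the previous peak point, so each strike yields a single window.
    min_gap=4_800 is a hypothetical default, not a value from the paper.
    """
    segments, last_peak = [], None
    for i, v in enumerate(signal):
        if abs(v) > threshold and (last_peak is None or i - last_peak >= min_gap):
            last_peak = i
            if i + window <= len(signal):    # skip strikes truncated by the end of the recording
                segments.append(signal[i:i + window])
    return segments

def min_max_normalize(segment):
    """Scale a segment to [0, 1]; a constant segment maps to zeros."""
    lo, hi = min(segment), max(segment)
    span = hi - lo
    return [(v - lo) / span for v in segment] if span else [0.0] * len(segment)

# Synthetic recording: strikes at samples 1000 and 10000; the echo at 1005 falls inside the gap.
recording = [0] * 20_000
recording[1000], recording[1005], recording[10_000] = 20_000, 18_000, -19_000
strikes = [min_max_normalize(s) for s in segment_strikes(recording)]
```

The echo at sample 1005 exceeds the threshold but is absorbed into the first window, which is the role the minimum gap plays; in practice the gap must be shorter than the interval between consecutive strikes and longer than the ringing of a single strike.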

Table 4. Details of different damage conditions.
Label Water seepage depth (mm) Number of samples
0 0 100
1 80 100
2 160 100
3 240 100
4 320 100
5 400 100
Figure 6. Waveforms under different conditions.

5. Results and Discussion

5.1. The Model Training

The 1800 percussion audio samples collected under the six conditions were split into training and test sets at a ratio of 7:3, and the 1D-WCBGRU was then trained and tested. Before training, the audio signals were normalized to the range [0, 1]. Cross-entropy was chosen as the loss function of the model and is defined as follows:
$$ \mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{i,j} \log \hat{y}_{i,j} \tag{14} $$
where $N$ is the number of samples, $C$ is the number of categories, and $y_{i,j}$ and $\hat{y}_{i,j}$ are the true label and the predicted probability of the $j$-th category for the $i$-th sample, respectively. The model is trained for 500 epochs using the Adam optimizer with a learning rate of 0.0005 and a batch size of 32. It was implemented in PyTorch (version 2.0.1) with Python (version 3.11.4) on a computer equipped with an Intel Core i5-13500HX CPU and an NVIDIA GeForce RTX 4050 GPU.
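The loss can be checked by hand with a small sketch of mean categorical cross-entropy over one-hot labels and predicted probabilities. The clipping constant `eps` is an added assumption that avoids log(0), and the function name is hypothetical. Note that PyTorch's `nn.CrossEntropyLoss`, if used, expects raw logits and applies log-softmax internally, so in that setup the Softmax is folded into the loss rather than applied before it.

```python
import math

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean categorical cross-entropy: -(1/N) * sum_i sum_j y_ij * log(p_ij).

    y_true: one-hot label rows; y_prob: predicted probability rows (e.g., Softmax outputs).
    Probabilities are clipped at eps to avoid log(0).
    """
    total = 0.0
    for labels, probs in zip(y_true, y_prob):
        total += sum(t * math.log(max(p, eps)) for t, p in zip(labels, probs))
    return -total / len(y_true)

# Uniform predictions over 2 classes give a loss of ln 2 regardless of the true labels.
uniform_loss = cross_entropy([[1, 0], [0, 1]], [[0.5, 0.5], [0.5, 0.5]])
confident_loss = cross_entropy([[1, 0], [0, 1]], [[0.99, 0.01], [0.01, 0.99]])
```

Only the probability assigned to the true class contributes to each term, so the loss falls toward zero as the model becomes confidently correct; for six equally likely classes the untrained baseline is ln 6 ≈ 1.79.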

As shown in Figure 7, the accuracy and loss curves stabilize after small fluctuations over the 500 training epochs. Throughout training, the training and test curves remain closely aligned, indicating that the model generalizes well. After 500 epochs, the model achieves 100% accuracy on both the training and test sets. This confirms the sensitivity of the 1D-WCBGRU to water seepage depth in concrete structures and demonstrates its effective feature extraction and learning capabilities, as well as its strong performance on unseen data.

Figure 7. Accuracy and loss curves.

5.2. Methodology Comparison

To demonstrate the performance of the 1D-WCBGRU model, it was compared with two common ML-based methods, (1) PSD + DT [51] and (2) MFCC + SVM [16], and three DL-based methods, (3) MFCC + CNN [52], (4) 1D-WDCNN [48], and (5) 1D-ResNet [29]. For the ML-based methods, (1) PSD values were extracted from 10 frequency bands spanning 0 Hz to 10 kHz of the raw data to train the decision tree (DT), and (2) spectrograms containing 12 MFCCs were extracted and used as input to the SVM. For the DL-based methods, (3) spectrograms of 16 MFCCs were extracted from the raw audio and processed by CNNs with convolutional kernel sizes of 9 × 9, 7 × 7, and 5 × 5. In method (4), a 1D-CNN with five convolutional layers was constructed; the first convolutional layer has a kernel size of 64 × 1, and the subsequent layers have a kernel size of 3 × 1. Method (5) extracts features with convolutional layers and then uses three residual blocks (each containing two convolutional layers with a kernel size of 3 and a skip connection) to learn residual information. All six models were trained and tested in the same computing environment on the same computer. To compare the methods comprehensively, accuracy, precision, recall, and F1-score were calculated. These metrics are defined as follows:
$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}\\
\text{Precision} &= \frac{TP}{TP + FP}\\
\text{Recall} &= \frac{TP}{TP + FN}\\
\text{F1-score} &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned} \tag{15}
$$
where TP, FP, FN, and TN are the numbers of true-positive, false-positive, false-negative, and true-negative outcomes, respectively. Table 5 presents the classification performance of each model on the water seepage depth task, and Figure 8 shows the classification results of the different methods. The 1D-WCBGRU achieves an accuracy, precision, recall, and F1-score of 100%, indicating that it outperforms the other methods in detecting water seepage depth in concrete structures. Furthermore, the DL-based methods (MFCC + CNN, 1D-WDCNN, 1D-ResNet, and 1D-WCBGRU) outperform the ML-based methods. The results also suggest that manually extracted features may miss information relevant to seepage depth, which limits classification accuracy.
Table 5. Performance comparison of different methods.
| Metric | PSD + DT | MFCC + SVM | MFCC + CNN | 1D-WDCNN | 1D-ResNet | 1D-WCBGRU |
|---|---|---|---|---|---|---|
| Accuracy | 0.7926 | 0.9722 | 0.9815 | 0.9907 | 0.9926 | 1.0000 |
| Precision | 0.7967 | 0.9730 | 0.9815 | 0.9909 | 0.9928 | 1.0000 |
| Recall | 0.7926 | 0.9722 | 0.9815 | 0.9907 | 0.9926 | 1.0000 |
| F1-score | 0.7932 | 0.9730 | 0.9815 | 0.9906 | 0.9926 | 1.0000 |
Figure 8. Confusion matrices of the six methods.

The t-distributed stochastic neighbor embedding (t-SNE) technique was used to produce 2D visualizations of the feature mappings, which help to interpret the features learned by the model and to analyze how the model represents the data. Figure 9 shows the t-SNE results for the input layer, convolutional layer, BiGRU layer, and fully connected (FC) layer of the 1D-WCBGRU. Each point represents a data sample, and the six colors correspond to the six water seepage depth categories. At the input layer, data points from different categories overlap heavily, so the categories cannot be distinguished from the raw data. After the convolutional layer, no clear cluster forms for each category, but the distribution becomes progressively more structured. After the BiGRU layer, the features are considerably more distinguishable, with only a few misclassified samples, which suggests that the BiGRU layer enhances feature separability. Finally, the features are perfectly separated after the FC layer. Overall, the blocks work together to substantially improve the feature extraction capability of the 1D-WCBGRU.

Figure 9. The visualization results of t-SNE.

5.3. Adaptability Test

Although the 1D-WCBGRU performs well in the preceding experiments, practical applications pose additional challenges. Two experiments were therefore designed in this section to test the adaptability of the 1D-WCBGRU.

5.3.1. Effect of Percussion Position

Liu et al. [53] investigated the correlation between percussion location and detection effectiveness and found that detection accuracy decreased as the percussion position moved farther from the bolt. This may be because a greater distance between the percussion position and the defect introduces more physical separation and structural differences, so that more irrelevant information is captured and detection accuracy drops. In practice, the approximate water seepage depth is usually unknown in advance, which makes it difficult to choose an appropriate percussion position. The experiment was therefore conducted at different percussion positions (120 mm and 280 mm) to assess the adaptability of the 1D-WCBGRU. Table 6 lists the accuracy of each model in classifying the water seepage depth at these positions. The 1D-WCBGRU achieves the highest accuracy at both positions (99.81% at 120 mm and 100% at 280 mm).

Table 6. The accuracy of different methods at different percussion positions.
| Percussion position (mm) | PSD + DT | MFCC + SVM | MFCC + CNN | 1D-WDCNN | 1D-ResNet | 1D-WCBGRU |
|---|---|---|---|---|---|---|
| 120 | 0.6944 | 0.9593 | 0.9889 | 0.9851 | 0.9833 | 0.9981 |
| 280 | 0.7360 | 0.9644 | 0.9850 | 0.9925 | 0.9831 | 1.0000 |

5.3.2. Cross-Dataset Performance Evaluation

Because concrete is an inhomogeneous multiphase composite material, the audio features captured by the percussion method may vary even between specimens with identical mix ratios, owing to differences in the manufacturing process. The adaptability of the proposed model was therefore investigated further. For this purpose, the datasets from two specimens were merged into a training set, and the dataset from the remaining specimen was used as the test set. The training and test sets were thus derived from separate concrete specimens, with no overlap between them. Table 7 lists the accuracy of each method, and Figure 10 shows the classification results of the 1D-WCBGRU at each percussion position in this cross-dataset setting. The accuracy of all methods decreases in the cross-dataset setting; however, the 1D-WCBGRU consistently outperforms the other methods, with accuracies between 94.50% and 97.67%, whereas no other method exceeds 67.83%.
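The cross-dataset protocol amounts to a leave-one-specimen-out split. The sketch below assumes that each specimen's recordings are stored as a list of (signal, label) pairs keyed by a specimen name; the function `leave_one_specimen_out` and the specimen keys are hypothetical. It yields every possible split, whereas the text only states that one specimen was held out, so taking a single split reproduces the described setup.

```python
from typing import Dict, Iterator, List, Tuple

# (audio samples, seepage-depth label); the layout is an assumption for illustration
Sample = Tuple[List[float], int]


def leave_one_specimen_out(
    datasets: Dict[str, List[Sample]],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held-out specimen, training set, test set) for each specimen.

    The training set merges all other specimens, so no recording from the
    held-out specimen is seen during training.
    """
    for held_out in sorted(datasets):
        train = [s for name, data in datasets.items() if name != held_out for s in data]
        test = list(datasets[held_out])
        yield held_out, train, test


# Toy data: three specimens with made-up two-sample "signals"
specimens = {
    "specimen_1": [([0.0, 0.1], 0), ([0.2, 0.1], 1)],
    "specimen_2": [([0.3, 0.0], 0)],
    "specimen_3": [([0.1, 0.4], 1)],
}
for held_out, train, test in leave_one_specimen_out(specimens):
    # prints: specimen_1 2 2, then specimen_2 3 1, then specimen_3 3 1
    print(held_out, len(train), len(test))
```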

Table 7. The accuracy of different methods in the cross-dataset setting at different percussion positions.
| Percussion position (mm) | PSD + DT | MFCC + SVM | MFCC + CNN | 1D-WDCNN | 1D-ResNet | 1D-WCBGRU |
|---|---|---|---|---|---|---|
| 120 | 0.2883 | 0.3967 | 0.3967 | 0.6717 | 0.4800 | 0.9450 |
| 200 | 0.2450 | 0.4117 | 0.3817 | 0.6417 | 0.5450 | 0.9767 |
| 280 | 0.3650 | 0.4617 | 0.2217 | 0.5600 | 0.6783 | 0.9617 |
Figure 10. Confusion matrices of the 1D-WCBGRU in the cross-dataset setting: (a) 120 mm, (b) 200 mm, and (c) 280 mm.

In real-world scenarios, the water seepage depth in concrete structures changes continuously and may differ from the preset depths used in this experiment, which can make detection more challenging. To address this concern, the confusion matrices in Figure 10 were simplified in Figure 11 to consider only the relative position of the water seepage depth and the percussion position. In Figure 11, label "0" denotes instances in which the water level is below the percussion position, and label "1" denotes instances in which it is above. The accuracy of the 1D-WCBGRU at percussion positions of 120 mm, 200 mm, and 280 mm is 99.50%, 99.17%, and 99.67%, respectively. This shows that the 1D-WCBGRU achieves high accuracy when only the relative position of the water level and the percussion point needs to be determined.
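The simplification from a multiclass confusion matrix to the binary below/above matrix can be expressed as merging rows and columns. In this sketch, `below` is a hypothetical parameter listing the depth classes whose water level lies below the percussion point; the actual class-to-depth mapping is not assumed here, and the function names are illustrative. A prediction counts as correct in the binary sense whenever the true and predicted classes fall on the same side of the percussion point.

```python
from typing import List, Set


def collapse_to_relative_position(cm: List[List[int]], below: Set[int]) -> List[List[int]]:
    """Merge a multiclass confusion matrix into a 2 x 2 matrix.

    Index 0 = water level below the percussion point, 1 = above.
    `below` lists the original classes whose water level lies below the point.
    """
    binary = [[0, 0], [0, 0]]
    for true_cls, row in enumerate(cm):
        t = 0 if true_cls in below else 1
        for pred_cls, count in enumerate(row):
            p = 0 if pred_cls in below else 1
            binary[t][p] += count
    return binary


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


# Illustrative 4-class matrix (made-up counts, not data from this study)
cm4 = [[8, 2, 0, 0],
       [1, 7, 2, 0],
       [0, 1, 9, 0],
       [0, 0, 3, 7]]
binary = collapse_to_relative_position(cm4, below={0, 1})
print(binary)                           # [[18, 2], [1, 19]]
print(accuracy(cm4), accuracy(binary))  # 0.775 0.925
```

Confusions between neighboring depths on the same side of the percussion point disappear in the binary view, which is why the binary accuracy can be much higher than the multiclass one.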

Figure 11. The confusion matrix for the relative position of the water level and the percussion point: (a) 120 mm, (b) 200 mm, and (c) 280 mm.

To further evaluate the performance of the 1D-WCBGRU on water seepage depths absent from the training data, additional experiments were conducted based on the water seepage depth and the relative position of the percussion point. Samples labeled "0," "2," "3," and "5" in the training set were used to build a new training set, and samples labeled "1" and "4" in the test set were used to build a new test set. In the new training set, labels "0" and "2" were merged into label "0," and labels "3" and "5" were merged into label "1." In the new test set, labels "0" and "1" correspond to labels "1" and "4" of the original test set, respectively. Only data from the 200-mm percussion position were used, in order to keep the data balanced. The classification accuracy is shown in Table 8. The 1D-WCBGRU maintains a high cross-dataset accuracy of 95.50% on signals from seepage depths that were not collected for training, compared with at most 80.00% for the other methods.
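The relabeling used for the unseen-depth experiment can be written as two lookup tables taken directly from the description above; samples whose original label is absent from a table are discarded. The helper `remap` and the (signal, label) sample layout are illustrative assumptions.

```python
from typing import Any, Dict, List, Tuple

Sample = Tuple[Any, int]  # (signal, original depth label); layout assumed for illustration

# New training set: depth labels 0 and 2 -> 0, depth labels 3 and 5 -> 1
TRAIN_MAP: Dict[int, int] = {0: 0, 2: 0, 3: 1, 5: 1}
# New test set: depth label 1 -> 0, depth label 4 -> 1 (depths never seen in training)
TEST_MAP: Dict[int, int] = {1: 0, 4: 1}


def remap(samples: List[Sample], mapping: Dict[int, int]) -> List[Sample]:
    """Keep only samples whose label appears in `mapping` and relabel them."""
    return [(x, mapping[y]) for x, y in samples if y in mapping]


# Toy samples with placeholder signals "s0".."s5" carrying depth labels 0..5
toy = [(f"s{k}", k) for k in range(6)]
print(remap(toy, TRAIN_MAP))  # [('s0', 0), ('s2', 0), ('s3', 1), ('s5', 1)]
print(remap(toy, TEST_MAP))   # [('s1', 0), ('s4', 1)]
```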

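Table 8. The accuracy comparison of different methods on unseen water seepage depths (200-mm percussion position).
| Method | PSD + DT | MFCC + SVM | MFCC + CNN | 1D-WDCNN | 1D-ResNet | 1D-WCBGRU |
|---|---|---|---|---|---|---|
| Accuracy | 0.8000 | 0.5700 | 0.5300 | 0.6900 | 0.6350 | 0.9550 |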

5.4. Model Performance in Noisy Environments

All the raw audio data in this study were collected in a quiet experimental environment, whereas noise in real environments makes it difficult to obtain a clean audio signal with the percussion method. The model's noise immunity must therefore be verified. To this end, the robustness of the 1D-WCBGRU was tested by adding Gaussian white noise at various signal-to-noise ratios (SNRs) to the cross-dataset data. The SNR is defined as follows:
$$\mathrm{SNR} = 10 \log_{10}\left(\frac{P_{\text{signal}}}{P_{\text{noise}}}\right)$$
where $P_{\text{signal}}$ and $P_{\text{noise}}$ denote the power of the signal and the noise (in the same units), respectively, and the SNR is expressed in dB. Six SNR levels are considered, from 0 dB to 10 dB in 2-dB increments. Figure 12 shows the audio waveforms at the various SNR levels. The added Gaussian white noise largely masks the characteristics of the original waveform, so the original waveform is difficult to recognize visually within the noisy one.
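One way to corrupt a recording at a prescribed SNR is to draw white Gaussian noise and scale it so that $P_{\text{noise}} = P_{\text{signal}} / 10^{\mathrm{SNR}/10}$, which follows from rearranging the SNR definition. This sketch rescales the drawn noise so that its empirical power hits the target exactly; the paper does not say whether it did this or instead set the noise variance analytically, so treat that choice as an assumption. The function names are hypothetical, and only the Python standard library is used.

```python
import math
import random
from typing import List, Optional


def power(x: List[float]) -> float:
    """Mean power of a discrete signal (mean of squared samples)."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal: List[float], noise: List[float]) -> float:
    """SNR in dB: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def add_white_gaussian_noise(signal: List[float], target_snr_db: float,
                             seed: Optional[int] = None) -> List[float]:
    """Return signal + white Gaussian noise at exactly `target_snr_db`."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rearranged SNR definition: P_noise = P_signal / 10^(SNR / 10)
    target_noise_power = power(signal) / (10.0 ** (target_snr_db / 10.0))
    # Rescale the drawn noise so its empirical power matches the target
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(signal, noise)]


# Hypothetical test tone: 0.1 s of a 440-Hz sine sampled at 48 kHz
tone = [math.sin(2 * math.pi * 440 * t / 48000) for t in range(4800)]
for level in range(0, 11, 2):  # the six SNR levels: 0, 2, ..., 10 dB
    noisy = add_white_gaussian_noise(tone, level, seed=0)
    residual = [a - b for a, b in zip(noisy, tone)]
    print(level, round(snr_db(tone, residual), 6))
```

Rescaling to the exact empirical power removes the sampling scatter in the realized SNR, which makes comparisons across SNR levels slightly cleaner for short clips.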
Figure 12. The audio waveforms at different noise levels.

Figure 13 shows the cross-dataset classification accuracy in noisy environments at each percussion position. The accuracy of all methods decreases in the presence of noise, underscoring the adverse effect of noise on model performance. The three methods that rely on handcrafted features (PSD + DT, MFCC + SVM, and MFCC + CNN) perform poorly under noise, whereas the three methods that take the raw 1D audio signals directly as input (1D-WDCNN, 1D-ResNet, and 1D-WCBGRU) perform better. This suggests that using the raw 1D signal as input helps the model capture and retain key features of the audio data. Notably, the 1D-WCBGRU shows better noise immunity than the other methods.

Figure 13. Test results at different percussion positions under different SNRs: (a) 120 mm, (b) 200 mm, and (c) 280 mm.

Figure 13 also shows that the accuracy of some models increases after Gaussian white noise is added. For instance, the classification accuracy of 1D-ResNet on data collected at the 120-mm percussion position increased by 3.83% after Gaussian white noise with an SNR of 8 dB was introduced, compared with the noise-free data. This may occur because the model overfits specific features of the training set that are absent from the test set; when noise is introduced, these overfitted features are masked, forcing the model to rely on more general and robust features. The 1D-WCBGRU does not exhibit this behavior, which suggests that it avoids overfitting effectively even on noise-free data.

5.5. Ablation Experiment

The 1D-WCBGRU uses a 1D-CNN as its backbone, with a wide-kernel convolutional block as the first layer and a BiGRU block after the convolutional layers. To evaluate the effectiveness of these enhancements for audio recognition, a series of ablation experiments was conducted. The improved components were removed one at a time, starting with the BiGRU block, until the model reverted to the base 1D-CNN architecture. The BiGRU block was then added back to the base model on its own, and finally the complete 1D-WCBGRU was restored. After each modification, the model was retrained and retested on both the standard dataset and the cross-dataset data collected at the 200-mm percussion position to assess the contribution of each module; the results are presented in Table 9.

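Table 9. Results of ablation experiments (accuracy; ✓ = component included, ✗ = component removed).
| Wide kernel | BiGRU | Standard dataset | Cross dataset |
|---|---|---|---|
| ✗ | ✗ | 0.9963 | 0.8150 |
| ✗ | ✓ | 0.9981 | 0.8633 |
| ✓ | ✗ | 1.0000 | 0.8833 |
| ✓ | ✓ | 1.0000 | 0.9767 |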

The results show that both the wide convolutional kernel and the BiGRU block contribute substantially to the accuracy of the model. On the standard dataset alone, the two modules appear to add little, raising the accuracy only from 99.63% to 100%, because the base model already performs close to the ceiling there. Their contribution becomes evident in the cross-dataset scenario: since the training and test data come from different specimens, the model needs stronger generalization to perform reliably across datasets. In this case, the wide convolutional kernel and the BiGRU block improve the accuracy by 6.83 and 4.83 percentage points, respectively, and their synergistic combination improves it by 16.17 percentage points. The two modules thus strengthen the model's feature extraction and considerably enhance its ability to generalize to data distributions it was not trained on.
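The gains quoted above are absolute differences in cross-dataset accuracy relative to the plain 1D-CNN, expressed in percentage points. The sketch below reproduces them from the Table 9 values; the assignment of rows to component combinations (wide kernel only gives 0.8833, BiGRU only gives 0.8633) is inferred from the stated gains rather than read from the original table markings.

```python
# Cross-dataset accuracy at the 200-mm percussion position (Table 9),
# keyed by (wide_kernel_used, bigru_used); row assignment inferred from the text.
cross_acc = {
    (False, False): 0.8150,  # base 1D-CNN
    (False, True): 0.8633,   # base + BiGRU block only
    (True, False): 0.8833,   # base + wide first-layer kernel only
    (True, True): 0.9767,    # full 1D-WCBGRU
}

base = cross_acc[(False, False)]
# Absolute gain over the base model, in percentage points
gain_pp = {cfg: round((acc - base) * 100, 2) for cfg, acc in cross_acc.items()}
print(gain_pp[(True, False)], gain_pp[(False, True)], gain_pp[(True, True)])  # 6.83 4.83 16.17
```

The combined gain (16.17 percentage points) exceeds the sum of the individual gains (6.83 + 4.83 = 11.66), which is what the text describes as a synergistic effect.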

6. Conclusion

To address the limitations of existing techniques for detecting water seepage depth in concrete structures, this paper introduces a novel DL model, the 1D-WCBGRU, which combines the percussion method with DL techniques. The model uses a first convolutional layer with a wide kernel to extract features directly from the 1D audio signal, thereby reducing the training cost, and incorporates a BiGRU block to better capture intrinsic feature relationships, thereby improving feature separability. To evaluate the model, three concrete specimens with identical mix ratios were manufactured, and percussion sound signals were recorded at various water seepage depths to create the dataset. After training, the 1D-WCBGRU was compared with commonly used classification methods and evaluated under various conditions. The main conclusions are as follows:
1. The experiments confirm the feasibility of using the 1D-WCBGRU for water seepage depth detection; the model achieves a classification accuracy of 100% on the standard dataset.

2. Experiments across datasets and at different percussion positions demonstrate the superiority of the 1D-WCBGRU in water seepage depth detection. In addition, the 1D-WCBGRU accurately predicts the relative position of the water level and the percussion point when only this relative position, rather than the exact seepage depth, is of interest.

3. The 1D-WCBGRU shows robust noise immunity on audio data with added noise and maintains more stable performance in noisy environments than the other methods.

4. The t-SNE visualization and the ablation experiments confirm that the wide convolutional kernel and the BiGRU block give the 1D-WCBGRU good feature separability and generalization ability.

Overall, given its performance in these experiments, the 1D-WCBGRU shows promise as a reliable method for detecting water seepage depth in concrete structures.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Wenjie Huang: conceptualization, methodology, writing–original draft preparation. Kai Zhou: formal analysis, writing–original draft preparation, funding acquisition. Jicheng Zhang: investigation, resources, writing–original draft preparation. Longguang Peng: investigation, resources, software, data curation. Guofeng Du: methodology, writing–review and editing, funding acquisition. Zezhong Zheng: conceptualization, methodology, writing–review and editing.

Funding

This research was financially supported by the National Natural Science Foundation of China (Grant Nos. 52078052 and 12302164).

Acknowledgments

The authors thank the National Natural Science Foundation of China (No. 52078052 and No. 12302164) for financial support.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author on request.
