Unbiased Normalized Ensemble Methodology for Zero-Shot Structural Damage Detection Using Manifold Learning and Reconstruction Error From Variational Autoencoder
Abstract
Zero-shot learning approaches have emerged as promising techniques for structural health monitoring (SHM) due to their ability to learn representations without labeled data. With the practical design of such models, the shift from traditional structure-dependent techniques to potentially large-scale implementations becomes feasible, effectively addressing the challenge of gathering labeled data. Autoencoders (AEs), a class of deep neural networks, align well with zero-shot SHM settings due to their architecture, loss function, and optimization process. In AEs, the reconstruction error is expected to increase for novel data patterns (i.e., potential damage data), while the encoded manifold in their bottleneck layers enables the discrimination of complex patterns. However, for practical SHM applications, rigorous evaluation of (variational) AEs and the robustness of reconstruction loss- or manifold-based designs in handling real-world scenarios remains necessary. Accordingly, this article employs two SHM benchmarks to evaluate the effectiveness of manifold learning compared to the reconstruction errors of (variational) AEs in a zero-shot setting. The comparison encompasses metrics such as reconstruction fidelity, preservation of structural characteristics, and the ability to generalize to unseen structural conditions. Furthermore, an unbiased normalization-based ensemble methodology is proposed, combining both approaches with the goal of enhancing damage detection performance and delivering more reliable results in zero-shot learning contexts. The proposed ensemble strategy, integrating both reconstruction error and manifold representations, adds robustness to the damage detection process, a crucial feature in the uncertain domain of zero-shot structural damage detection. 
The findings suggest that neither reconstruction loss nor manifold data consistently outperforms the other; structural differences may render one approach more effective than the other in specific contexts. Based on these observations, a zero-shot damage severity index is suggested and tested on the benchmark data. The proposed ensemble method demonstrates superior performance over the individual models in estimating damage severity in an unsupervised setting. These results highlight the efficacy of variational AEs for zero-shot SHM, offering insights into their strengths and limitations and aiding users in selecting appropriate zero-shot damage detection strategies in the absence of labeled data.
1. Introduction
Structural health monitoring (SHM) is an umbrella term that encompasses various techniques and procedures, including structural damage detection (SDD), which provides a framework for ensuring the safety and longevity of critical infrastructure. Timely damage detection and prognosis are essential for tracking undesired changes in structural characteristics over time, assessing the overall condition of a structure, and determining whether it can continue to perform its intended functions. In this context, SHM is employed to prevent further deterioration by conducting routine structural assessments across civil, mechanical, and aerospace engineering [1–3].
SHM frameworks encompass both model-based and data-driven approaches. The model-based strategy relies on a “model” constructed from known structural design details, such as specifications for beams and columns (e.g., [4]), which can be coupled with techniques such as inverse problem methods to further calibrate (“tune”) these numerical models (e.g., finite element models). Through the parallel numerical model, various SHM objectives, including SDD, can be performed [5–7]. The data-driven approach involves monitoring the system’s behavior over time by analyzing the dynamic or vibration responses of the structure, particularly the acceleration data collected from a group of channels (sensors) [1].
The data-driven method is characterized by the identification and interpretation of underlying data patterns that can be linked to various structural states or conditions of interest. Still, the two methods may overlap, particularly with the advent of surrogate modeling and data-driven model updating, both of which can be achieved using deep learning (DL) techniques (e.g., [8]), or statistical methods (e.g., [9]). Statistical pattern recognition and machine learning (ML) remain classical and widely adopted means for implementing data-driven SHM frameworks [3, 10, 11].
Within the data-driven approach, the damage detection process generally involves two main components: feature extraction and feature classification [12, 13]. The inherent nonlinearity and noise present in SHM data, coupled with the limitations of traditional models in discerning patterns within such data, make the feature selection step particularly challenging. As a result, this step is often guided by expert judgment and trial-and-error procedures. This challenge is well documented in the literature (e.g., [11, 14–17]), where feature extraction is frequently highlighted as a critical phase. Wavelet (packet) transforms and the Hilbert–Huang transform [18–22], known for their ability to capture damage-sensitive characteristics, have become established methods for feature extraction. Similarly, time-series modeling [12] has been used to extract damage-sensitive features (DSFs) and examine changes in structural responses, even under environmental and operational variations (EOVs) [14]. These features are derived, directly or indirectly, from the coefficients and residuals of the time-series model [14, 23–26]. In the realm of feature classification, artificial neural networks (ANNs) [27] and support vector machines (SVMs) [28] are commonly acknowledged as capable classifiers.
Still, feature selection can become a bottleneck in developing generalizable SHM solutions. SHM performance relies heavily on the quality of the extracted features, and selecting them is a case-dependent process that must be carried out by domain experts. Moreover, as feature extraction and damage classification are distinct procedures, the features obtained may not be ideal for classification purposes. DL has emerged as a comprehensive modeling approach capable of addressing these limitations [29]. DL models operate through representation learning, in which both feature extraction and the target task (e.g., classification) are jointly optimized.
In SHM, supervised learning refers to cases where data on both healthy and damaged conditions are available [29, 30], while unsupervised learning is applied when labels are not used during training [31, 32]. Although supervised learning has demonstrated remarkable performance, it presents significant challenges in real-world applications, particularly due to the difficulty in collecting labeled data from actual structures [33]. In contrast to supervised learning, unsupervised learning algorithms do not depend on data labels but instead learn implicit patterns directly from the data [34]. These methods, such as representation learning, density estimation, and cluster analysis, enable the discovery of the underlying patterns and relationships in unlabeled data to support informed decisions [35, 36]. However, the ability of unsupervised approaches to distinguish between different damage classes remains underdeveloped [36]. Additional concerns include the curse of dimensionality, which refers to the exponential increase in data complexity with rising feature dimensions [37], and the presence of noise and outliers in the data, which can adversely affect the accuracy of unsupervised learning. Ultimately, even though unsupervised methods do not require labels, both damage and no-damage data are needed unless the zero-shot approach is pursued.
These challenges underscore the need for robust and efficient zero-shot unsupervised learning algorithms capable of handling high-dimensional data, accounting for noise and outliers, and capturing complex patterns and relationships, even in the absence of historical structural data. In the context of damage detection, unsupervised learning methods have been applied to tasks such as damage identification and classification [38]. Deep unsupervised learning has further enhanced the performance of traditional unsupervised ML models [39–47], while also enabling the reconstruction of missing data in SHM systems [48]. Nevertheless, the application of DL models to zero-shot damage diagnosis remains relatively limited and is still an emerging area of research [44].
Zero-shot SDD presents a significant challenge in SHM, as it involves identifying structural damage without prior knowledge of how such damage affects structural behavior [49–51]. In the absence of labeled data, zero-shot learning relies on innovative loss functions and heuristics to produce reliable outcomes for previously unseen data classes, in this case, damaged structural states. When considering a simple binary classification between “damaged” and “undamaged” conditions, where the model is never exposed to damaged data during training, autoencoders (AEs) emerge as a promising solution. Owing to their architecture, AEs are well suited for anomaly detection through two principal strategies: utilizing reconstruction error (RE) and analyzing manifold representations (i.e., encoder outputs). Most existing studies emphasize the first strategy, RE, to detect damage [42, 52–55], or employ reconstruction-based data normalization [56]. However, the manifold data can also be leveraged in various semisupervised and unsupervised learning approaches, commonly referred to as manifold learning [57–59]. Manifold learning refers to a family of algorithms for reducing the dimensionality of data from high-dimensional to low-dimensional spaces. Just as a higher RE is anticipated for future damage scenarios, the different projections of the input data into the low-dimensional space can capture the differences between incoming unseen damage data and the structure’s baseline condition.
To summarize, various ML and DL models have been applied to SDD, covering a range of supervision levels in the literature. These studies highlight the strengths and limitations of different strategies and architectures. However, limited research has specifically focused on zero-shot SDD, which represents the least case-dependent approach and holds promise for enabling large-scale SHM applications. Furthermore, even in unsupervised methods, where damage data may be available but unlabeled, advancing from basic damage detection to quantifying damage severity remains a significant challenge.
To address these gaps, this study aims to:
- Customize and apply DL-based strategies, specifically the reconstruction loss of the variational AE (VAE) and manifold learning, in the zero-shot SDD context.
- Propose an unbiased normalization-based ensemble approach to address the limitations of relying solely on RE or manifold data, thereby improving the reliability and accuracy of zero-shot learning in SHM for real-world scenarios with unlabeled data.
- Evaluate the effectiveness and sensitivity of both methods in distinguishing between damage classes, and propose a zero-shot damage severity score, which is successfully applied to benchmark data with incremental damage cases.
The remainder of this article is structured as follows. Section 2 describes the proposed ensemble methodology and its components. Section 3 outlines the datasets and preprocessing steps. Section 4 presents experimental results, highlighting performance metrics and comparisons. Finally, Section 5 offers concluding remarks, summarizing key findings and future directions.
2. Methodology
2.1. VAE
A VAE builds upon the traditional AE by introducing a probabilistic framework to the latent space. Instead of mapping inputs to fixed points, the encoder generates a distribution (typically Gaussian). This allows the VAE to capture uncertainty and improve its ability to create diverse and more realistic outputs compared to the deterministic nature of a standard AE [52]. Figure 1 shows the general overview of the VAE network.

The objective is to learn a latent representation z that captures essential features of the input data, effectively compressing it and enabling the reconstruction of the original input with minimal loss.
The VAE’s objective is to reconstruct the input data and ensure that the learned latent space follows a predefined distribution, enabling structured and continuous representations in the latent space.
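The two-part objective described above can be sketched numerically. The snippet below is a minimal illustration, assuming a standard normal prior, a squared-error reconstruction term, and a log-variance parameterization of the encoder output; the function names and the weight `beta` are hypothetical and are not taken from the article.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction term plus KL(q(z|x) || N(0, I)), averaged over the batch."""
    # Assumed reconstruction term: squared error summed over features.
    recon = np.sum((x - x_hat) ** 2, axis=1)
    # Closed-form KL divergence between a diagonal Gaussian and N(0, I).
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return float(np.mean(recon + beta * kl))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))      # synthetic batch: 4 samples, 8 features
mu = np.zeros((4, 2))                # encoder means for a 2-D latent space
log_var = np.zeros((4, 2))           # encoder log-variances (sigma = 1)
z = reparameterize(mu, log_var, rng)
perfect = vae_loss(x, x, mu, log_var)        # no reconstruction error, no KL
shifted = vae_loss(x, x, mu + 1.0, log_var)  # KL penalty only
```

With a perfect reconstruction and a posterior equal to the prior, both terms vanish; shifting the latent mean away from zero is penalized by the KL term alone.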
2.2. Network Architecture
Figure 2 shows the VAE architecture used for the comparison in this article. The encoder maps the input data into a low-dimensional feature space, from which the input data are reconstructed. Accordingly, a neural network layer is needed that captures the input vibration data well and generalizes to unseen data variations. Previous research [44] has shown the potential of recurrent neural networks (RNNs), and more specifically long short-term memory (LSTM) units, to extract generalizable features from such input data. The VAE encoder is therefore designed with a channel-based LSTM architecture, and the decoder consists of fully connected layers that expand back to the input size.

2.3. Reconstruction and Manifold Learning Approaches in VAE
Based on the VAE outputs, two approaches exist for damage detection. In the first approach, the reconstruction data of each network are used, and in the second approach, the manifold data (i.e., the output of the encoder) can be used through manifold learning.
2.3.1. RE
2.3.2. Manifold Learning
SHM often faces significant challenges due to the complex, high-dimensional nature of the data, which complicates data modeling and the monitoring of internal structural changes. To address these complexities, dimensionality reduction offers an effective method for preprocessing high-dimensional datasets. Manifold learning re-represents high-dimensional data within a low-dimensional space, similar to traditional dimensionality reduction techniques. In ML, this technique operates on the assumption that high-dimensional data typically lie near specific low-dimensional manifolds [57].
One novel aspect of this research is the application of VAE in manifold learning [54]. Manifold learning seeks to capture data’s intrinsic patterns and relationships by representing them in a lower-dimensional space, providing a more compact yet insightful understanding of complex structural behavior. This study explores how VAEs capture the underlying manifold of structural data within their latent spaces.
The VAE attempts to compress and map the key features necessary for data reconstruction into a low-dimensional space within its probabilistic framework. This capability becomes particularly crucial in SHM, where accurately detecting structural damage depends on understanding subtle variations within the underlying manifold of the data. The probabilistic nature of the VAE allows it to better account for uncertainties, which is particularly beneficial for anomaly detection in real-world applications where damage scenarios may be complex and variable.
This lower-dimensional representation z, produced by the encoder, serves as the key feature set for downstream anomaly detection tasks. The VAE encoder’s output is utilized as a dimensionality reduction module, effectively compressing the input into a latent space that captures the essential structural characteristics.
To perform anomaly detection in a zero-shot learning context, where only the no-damage data are available for training, the one-class SVM (OCSVM) algorithm is employed. SVMs are known for two core capabilities that work in tandem: first, data are mapped into a high-dimensional space using a kernel function; second, an optimal hyperplane is determined to maximize the margin between data classes. The support vectors, the data points closest to the decision boundary, define this margin, ensuring robust separation [55]. OCSVM, a variant designed for anomaly detection, learns a decision boundary around the majority class (typically labeled as “normal,” e.g., Label 0) while implicitly treating a small fraction of data as potential outliers (Label 1). The inherent ability of SVMs to create clear class boundaries and maintain margin separation makes them well suited for SHM and SDD, where distinguishing anomalies is essential. The successful application of OCSVM in such contexts has been repeatedly demonstrated in the literature [60]. The OCSVM is trained on the manifold features extracted from the no-damage data (the current condition class) in the bottleneck layer of the VAE. The trained OCSVM then serves as an anomaly detector, flagging any deviations from the learned normal data as potential instances of structural damage [61]. The negative of the OCSVM score, denoted S, is used as the raw anomaly score for the manifold learning-based SDD approach, so that higher values correspond to greater anomaly for each sample under the fitted model. This method effectively utilizes the OCSVM to detect unseen damage scenarios, relying entirely on the manifold representations learned from the undamaged data.
By combining manifold learning with OCSVM for anomaly detection, the proposed approach capitalizes on the strengths of the VAE’s ability to capture and represent the underlying manifold of structural data. This methodology enables robust damage detection in zero-shot learning, providing a promising framework for real-world SHM applications, where acquiring labeled damage data is often impractical or impossible.
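The manifold-based detector can be illustrated with scikit-learn's `OneClassSVM`. This is a sketch under stated assumptions: the latent features are synthetic stand-ins for encoder outputs, and the kernel, `gamma`, and `nu` settings are illustrative choices rather than values reported in the article.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical latent features: 200 no-damage samples and 50 shifted "unseen" samples.
z_train = rng.normal(0.0, 1.0, size=(200, 8))
z_unseen = rng.normal(6.0, 1.0, size=(50, 8))

# Fit on no-damage latent features only (zero-shot setting).
# The RBF kernel and nu=0.05 are illustrative assumptions.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(z_train)

def anomaly_score(model, z):
    """Return S = -score_samples(z), so a larger S means a more anomalous sample."""
    return -model.score_samples(z)

s_train = anomaly_score(ocsvm, z_train)
s_unseen = anomaly_score(ocsvm, z_unseen)
```

Because `score_samples` is larger for samples that resemble the training data, negating it yields a score that grows with the degree of anomaly, which matches the sign convention used for S in the text.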
2.4. Proposed Unbiased Normalized Ensemble Methodology
This research addresses the challenge of determining whether RE or manifold data from VAE networks are more effective for SDD in zero-shot learning scenarios. Both RE and manifold learning represent critical yet distinct features, but it is not always clear which one yields superior results under varying damage conditions. An unbiased normalized ensemble methodology is proposed to address this uncertainty. The proposed method is designed to allow both the RE and the manifold learning output to contribute to the final decision. This is achieved by applying z-score normalization to both, standardizing them based on the mean and standard deviation of the no-damage training data. By normalizing the outputs, the method ensures that the contributions of each feature are unbiased, preventing either component from disproportionately influencing the ensemble simply due to differences in their original magnitude or scale.
The final output is a binary damage indicator based on the combined evidence from the RE and manifold learning. This approach is designed to incorporate the contributions of both methods into the decision-making process, aiming to improve the overall robustness and accuracy of the system in detecting structural anomalies.
Figure 3 depicts the flowchart of the proposed method. The flowchart comprises feature extraction, training, the two damage detection methodologies, and the proposed unbiased normalized ensemble methodology. Each component is designed to approach the problem of structural damage detection systematically by integrating DL techniques, with a particular focus on zero-shot learning and ensemble methods.

In the first stage, the raw data are processed to extract meaningful features from the vibration signals. These features are the normalized fast Fourier transform (FFT) magnitudes, obtained directly from the time-series acceleration response of the structure. The training phase includes the VAE network, which is responsible for learning manifold representations of the data. This phase uses an input dataset XZS from a source structure in a zero-shot learning context. The input data are passed through an encoder that generates the latent parameters μ and σ, encapsulating the underlying distribution of the source structure’s features. The decoder then reconstructs the input from latent variables sampled from this distribution, allowing the VAE to learn the manifold data. In addition, an OCSVM model is trained on the manifold features of the structure to enable anomaly detection in the subsequent phases.
The third component focuses on the damage detection procedure, which can be divided into two methodologies. The first, Methodology I, utilizes the VAE RE as the main feature. The input test data Xtest from the upcoming data stream with an unknown condition are processed through the trained VAE. The RE, the difference between the original input and the reconstructed output, is calculated. This RE is then used as a metric for detecting damage, with significant deviations from normal RE patterns indicating potential damage. The second approach, Methodology II, utilizes the latent manifold data generated by the encoder without directly focusing on the RE. Instead, the manifold data are fed into the OCSVM model fitted during the training phase, which computes an anomaly score. This score is used to identify damage in the target structure, with higher scores suggesting more significant deviations from the normal state.
In the final section of the flowchart, the proposed unbiased normalized ensemble methodology is outlined, which integrates the outputs of both methodologies in a statistically unbiased manner. This approach aims to provide a more reliable damage detection framework by combining the strengths of RE and manifold learning. The ensemble methodology normalizes the RE and the manifold-based anomaly score by subtracting the mean and dividing by the standard deviation, so that both values contribute to the damage detection process on a common scale. The normalized values are then summed to produce a final score highlighting the most likely damaged regions, providing a more robust and balanced detection method. By combining these techniques, the framework seeks to enhance the accuracy and reliability of damage detection in zero-shot SHM.
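The normalization and summation steps can be sketched in a few lines. The example below assumes that the z-score statistics are estimated on no-damage training scores only, as described in Section 2.4; the function names and the numerical values are hypothetical and chosen only to show two indicators on very different scales.

```python
import numpy as np

def fit_zscore(train_values):
    """Mean and standard deviation estimated on no-damage training scores only."""
    return float(np.mean(train_values)), float(np.std(train_values))

def ensemble_score(re, s, re_stats, s_stats):
    """Sum of the z-scored reconstruction error and z-scored OCSVM anomaly score."""
    re_z = (np.asarray(re) - re_stats[0]) / re_stats[1]
    s_z = (np.asarray(s) - s_stats[0]) / s_stats[1]
    return re_z + s_z

# Hypothetical training scores on very different scales.
re_train = np.array([0.010, 0.012, 0.011, 0.009, 0.013])
s_train = np.array([-120.0, -100.0, -110.0, -130.0, -90.0])
re_stats, s_stats = fit_zscore(re_train), fit_zscore(s_train)

train_ens = ensemble_score(re_train, s_train, re_stats, s_stats)
# A test sample that is elevated on both indicators.
test_ens = ensemble_score([0.020], [-60.0], re_stats, s_stats)
```

After normalization, a deviation of one training standard deviation counts equally for either indicator, so neither dominates the sum merely because of its original magnitude.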
3. Dataset
In this study, two benchmark datasets with various damaged and undamaged cases were used: (i) the Qatar University Grandstand Simulator (QUGS) [62], a scaled stadium seating platform structure excited in a laboratory setting, and (ii) the Yellow Frame [63], a four-story steel frame under ambient excitation. Independent research groups made the datasets publicly available. In both datasets, damaged cases are labeled as DCXX, where XX represents different damaged conditions, each identified by a unique label number.
3.1. The Yellow Frame
A four-story steel frame structure, built at one-third scale and incorporating modular components such as masses, braces, and columns, serves as a versatile benchmark for SHM and control studies (see Figure 4). This study focuses on the recent “brace removal” scenario involving 21 distinct data cases. Mendler et al. [63] elaborate on the sensor setup and labeling details, while a comprehensive description of the 21 data cases is provided by Bernagozzi et al. [65] and summarized in Table 1. Data acquisition was carried out at a sampling rate of 1000 Hz.

Damage case | Removed brace ID |
---|---|
No-damage | None |
DC1 | 2, 4 (II) |
DC2 | DC1 + (18, 20) (II) |
DC3 | DC2 + (1, 3, 17, 19) (II) |
DC4 | DC1 + (17, 19) (II) |
DC5 | DC1 + (18, 20) (I) |
DC6 | 2 (II) |
DC7 | (2, 4) (I) |
DC8 | (25, 27) (I) |
DC9 | (29, 31, 8, 6) (I) |
DC10 | (21, 23, 29, 31) (I) |
DC11 | DC10 + (17, 19, 25, 27) (I) |
DC12 | DC7 + (1, 3, 17, 18) (I) |
DC13 | (10, 12) (II) |
DC14 | DC13 + 21 (II), 23 (I) |
DC15 | (21, 23) (II) |
DC16 | (7-8, 21, 22) (I) |
DC17 | (5, 6, 7, 8, 21, 24) (I) |
DC18 | DC17 + (7, 8, 21, 22) (I) |
DC19 | DC18 + (5, 6, 23, 24) (I) |
DC20 | (6, 8) (II), (21, 22, 23, 24) (I) |
3.2. The QUGS
The QUGS structure is an inclined steel frame (Figure 5(a)) designed to represent stadium spectator seating. Avci et al. [66] elaborate on the structural specifications and instrumentation of this experiment. With 30 girder connections [67, 68], QUGS damage cases are defined by bolt loosening (30 damage data cases from DC1 to DC31). Thirty accelerometers collected data at 30 joint locations with a sampling rate of 1000 Hz (Figure 5(b)).


3.3. Experimental Setup
This study analyzes the data from multiple benchmarks, where each dataset consists of time-series signals collected from various sensors. The first step involves windowing the continuous time-series data by dividing it into smaller segments of length W. Each window serves as an independent data instance for further analysis. To extract relevant features, the FFT is applied to each windowed segment channelwise, converting the data from the time domain to the frequency domain. This transformation offers a refined feature space suited to damage detection [46] without losing data dimensions.
Following the application of the FFT on data windows and the computation of the magnitude for each window, the resulting values are normalized by dividing each by its mean. This normalization is performed for all data channels across the different benchmarks. As a result, each channel’s FFT feature has dimensions of W/2, and the FFT features of all channels are concatenated synchronously into a matrix with dimensions N × W/2, where N represents the number of channels. For example, choosing W = 1000, the QUGS dataset results in 262 data instances. There are 668 instances for the no-damage class for the Yellow Frame dataset, while the number of data instances in the damage classes ranges between 559 and 1173. Therefore, the input data size (half-spectrum FFT features) is $\mathbb{R}^{7500}$ (i.e., N = 15) for the Yellow Frame dataset, while for QUGS, it is $\mathbb{R}^{15000}$ (i.e., N = 30).
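The windowing and FFT feature extraction can be sketched with NumPy. The snippet assumes non-overlapping windows and keeps the first W/2 frequency bins of the one-sided spectrum; both choices, along with the function name and the synthetic record, are assumptions made for illustration.

```python
import numpy as np

def fft_features(signal, window):
    """signal: (n_samples, n_channels) acceleration record.
    Returns an array of shape (n_windows, n_channels, window // 2)."""
    n_windows = signal.shape[0] // window
    # Non-overlapping windows (an assumption); drop the incomplete tail.
    segs = signal[: n_windows * window].reshape(n_windows, window, -1)
    # Channel-wise FFT magnitude, keeping the first window // 2 bins.
    mag = np.abs(np.fft.rfft(segs, axis=1))[:, : window // 2, :]
    # Normalize each window and channel by its own mean magnitude.
    mag = mag / mag.mean(axis=1, keepdims=True)
    return mag.transpose(0, 2, 1)

rng = np.random.default_rng(0)
record = rng.standard_normal((5300, 15))   # synthetic stand-in with 15 channels
features = fft_features(record, window=1000)
# One flattened vector of 15 * 500 = 7500 values per window.
flat = features.reshape(features.shape[0], -1)
```

With W = 1000 and N = 15, each window yields a 15 × 500 matrix, which flattens to the 7500-dimensional input size quoted for the Yellow Frame dataset.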
In this experiment, the training procedure for the neural network models is as follows. In accordance with the zero-shot setting, and to simulate online data acquisition from benchmark data, the first 50% of the no-damage class from each benchmark is treated as incoming online data. This subset is further divided into training and validation sets in a 4:1 ratio, using the feature extraction scheme described earlier. The neural networks are trained on the training set using the ADAM optimizer [69], with a learning rate of 1e-4 and decay rates of 0.9 and 0.999 for the first and second moments, respectively. The model with the lowest validation error is selected as the final model. The RE of this model is used as one damage indicator, while the manifold representations obtained from the trained model are used to train an OCSVM on the same training and validation data. The negative of the OCSVM anomaly score (S) for each sample serves as the output for SDD. Subsequently, REs and manifold representations are computed for each sample in the remaining 50% of the no-damage data, as well as for all damage data.
For a given threshold τ, the detection outcomes are defined in terms of S, the negative of the OCSVM score, as follows:
1. True positive (TP): damage data where S exceeds the threshold τ.
2. False positive (FP): no-damage data where S exceeds τ.
3. True negative (TN): no-damage data where S is below τ.
4. False negative (FN): damage data where S is below τ.
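The four outcomes above, and the precision, recall, and F1-score built on them, can be computed as in the following sketch. The standard textbook definitions of these metrics are assumed here, and the scores, labels, and threshold are hypothetical.

```python
import numpy as np

def confusion_counts(scores, is_damage, tau):
    """Count TP, FP, TN, FN for anomaly scores S and threshold tau."""
    scores = np.asarray(scores, dtype=float)
    is_damage = np.asarray(is_damage, dtype=bool)
    flagged = scores > tau  # a sample is flagged when S exceeds tau
    tp = int(np.sum(flagged & is_damage))
    fp = int(np.sum(flagged & ~is_damage))
    tn = int(np.sum(~flagged & ~is_damage))
    fn = int(np.sum(~flagged & is_damage))
    return tp, fp, tn, fn

def precision_recall_f1(tp, fp, fn):
    """Standard definitions, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical scores: three no-damage samples followed by three damage samples.
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
labels = [0, 0, 0, 1, 1, 1]
tp, fp, tn, fn = confusion_counts(scores, labels, tau=0.3)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

In this toy case, two no-damage samples exceed the threshold and one damage sample falls below it, which lowers both precision and recall.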
These metrics are used to evaluate the performance of each network architecture and approach. Given the zero-shot learning nature of the proposed methodology, each dataset is divided into binary damage/no-damage classes, and accuracy measures are reported for each damage class individually. For example, in the QUGS dataset, we evaluate the 50% unobserved no-damage class data combined with each of the 30 damage cases, simulating a zero-shot damage detection scenario. For the manifold learning approach, the same 50% of no-damage data is used to obtain encoder outputs, which are then used to train an OCSVM with the extracted features.
In addition, the receiver operating characteristic (ROC) curve is utilized to assess the overall performance of the models. The ROC curve plots the TP rate (TPR) against the FP rate (FPR) across different threshold values τ. The area under the ROC curve (AUC) is calculated to quantify the model’s ability to distinguish between damaged and undamaged data. A higher AUC indicates better discrimination capability: an AUC close to 1 suggests near-perfect separation between the damage and no-damage classes, whereas an AUC of 0.5 corresponds to random guessing.
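The threshold sweep behind the ROC curve can be sketched as follows. This is a minimal NumPy illustration that sweeps τ over the observed scores and integrates with the trapezoidal rule; the function names and data are hypothetical, and library routines such as scikit-learn's `roc_auc_score` could be used instead.

```python
import numpy as np

def roc_curve_points(scores, is_damage):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold tau."""
    scores = np.asarray(scores, dtype=float)
    is_damage = np.asarray(is_damage, dtype=bool)
    # Thresholds from strictest (largest score) to most permissive (-inf).
    taus = np.concatenate((np.sort(np.unique(scores))[::-1], [-np.inf]))
    tpr = np.array([np.mean(scores[is_damage] > t) for t in taus])
    fpr = np.array([np.mean(scores[~is_damage] > t) for t in taus])
    return fpr, tpr

def auc(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical scores: three no-damage samples followed by three damage samples.
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
labels = [0, 0, 0, 1, 1, 1]
fpr, tpr = roc_curve_points(scores, labels)
area = auc(fpr, tpr)
```

The resulting area equals the fraction of damage and no-damage pairs in which the damage sample receives the higher score, which is 7 of 9 pairs for this toy data.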
4. Experimental Results
This section applies the proposed methodology, which uses RE and manifold data for SDD, to the two benchmark datasets and compares the performance of each approach individually. The performance metrics of both approaches are presented and discussed to assess the potential of the RE and the manifold data of the VAE networks for SDD problems. Subsequently, the performance metrics of the proposed unbiased ensemble approach are presented and discussed, highlighting how the integration of both RE and manifold data from the VAE networks enhances the accuracy of SDD. Finally, a computational complexity analysis examines whether the method is suitable for real-world applications.
4.1. QUGS
As described in Section 3.2, the QUGS dataset contains one no-damage condition and 30 different damage conditions. Half of the available no-damage data (131 instances) is used for the training phase of the VAE network. The network is trained in a zero-shot learning manner without previous information about the structure’s damage conditions. Figures 6(a) and 6(b) show the VAE REs and the negative OCSVM anomaly scores of the manifold data for damage detection, and Figure 6(c) shows the corresponding result of the proposed ensemble method.



In Figure 6, the VAE results for both the RE and manifold learning-based approaches are shown relative to the damage detection threshold. The damage detection scores indicate that both methods leverage the VAE network’s discriminative capability to identify damage, with Conditions 2, 29, and 30 proving more challenging to detect.
The manifold learning results obtained through OCSVM confirm that zero-shot manifold learning with VAEs can be adopted for SDD. Manifold learning produces more false alarms than the RE; however, it still confirms the capability of the intermediate network data to detect damage. This observation suggests that relying on the outputs of intermediate network layers may not always be the most effective approach. Therefore, a reliable method to ensemble these two responses is essential for improving the performance of the VAE network in zero-shot damage detection. As shown in Figure 6(c), the proposed ensemble algorithm produces significantly fewer false alarms than either individual damage detection technique. It outperforms both individual algorithms, yields a lower damage index output, and better discriminates the damage in Conditions 2, 29, and 30.
Figure 7 presents the F1-score, precision, and recall obtained according to (14)–(16) for a clearer comparison. These metrics are computed by treating each case as a binary classification problem, and they summarize the successes and limitations of the VAE-based approaches and the proposed ensemble method discussed above.

The results in Figure 7 reaffirm the observations made in Figure 6, while the RE of the VAE network demonstrates more robust performance compared to the manifold data, and the proposed ensemble method achieves the best performance. The F1-score indicates that the ensemble approach effectively outperforms the individual methods, owing to its unique and unbiased strategy. By leveraging both the RE and manifold data, the ensemble method enhances robustness through the integration of multiple perspectives, resulting in a more reliable assessment of damage detection.
Figure 8 presents the class-by-class ROC curves of the QUGS structure for each approach, providing a visual representation of the detection capability across different damage classes. The mean AUC of the ensemble method is higher than that of both the RE and the manifold data. This demonstrates that the ensemble method offers a more comprehensive and accurate detection strategy, as AUC is a key metric for assessing the performance of classification models. The higher AUC values indicate a better overall trade-off between sensitivity and specificity for the proposed ensemble method.
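For readers who wish to reproduce this kind of analysis, the sketch below computes a per-class AUC as the probability that a damaged instance scores higher than a no-damage instance, with ties counted as one half, which is equivalent to the area under the ROC curve. It assumes that "class-by-class" means each damage class is compared against the no-damage data; the class names and scores are hypothetical.

```python
def auc(neg_scores, pos_scores):
    """AUC as the fraction of (damage, no-damage) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))

# Toy detection scores: one no-damage set and two hypothetical damage classes.
healthy = [0.2, 0.4, 0.5, 0.7]
damage_classes = {
    "class_A": [0.6, 0.9, 1.2],
    "class_B": [0.3, 0.8, 1.0],
}
per_class_auc = {name: auc(healthy, s) for name, s in damage_classes.items()}
mean_auc = sum(per_class_auc.values()) / len(per_class_auc)
print(per_class_auc, mean_auc)
```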



Together, these performance metrics, F1-score and AUC, underscore the effectiveness of the proposed ensemble strategy. By combining the strengths of RE and manifold data, the ensemble approach offers a balanced, reliable, and improved damage detection framework, further validating its applicability in this context.
4.2. Yellow Frame Results
This section evaluates the proposed approach, together with the VAE-based reconstruction error and manifold learning approaches, on a real-scale structure under environmental and operational effects. To this end, the network is trained using 40% of the no-damage data, or 267 data instances. This experiment is well suited to evaluating the approaches under imbalanced data conditions, since the number of data samples per damage case varies from 559 to 1173.
Figure 9 depicts the results of the VAE network for damage detection. Here, the manifold learning approach demonstrates superior performance over the reconstruction-based method, with both showing a clear separation between most damage classes and the no-damage data. This contrasts with the results from the QUGS model, where the reconstruction approach outperformed manifold learning. Such a discrepancy reinforces the research objective of highlighting performance variations and the necessity of developing more robust designs for real-world applications of VAE models in zero-shot SHM scenarios. The difference becomes particularly evident in the case of DC8, where the reconstruction loss fails to distinguish its instances, whereas the manifold learning approach partially captures the data, potentially indicating actual damage in practical applications.



Furthermore, in this experiment, certain damage conditions involve incremental damage classes, where damages are added sequentially on top of each other. In such cases, the model can also demonstrate its capability to measure damage severity. For example, the DC2–DC5 damage conditions are derived from DC1, with DC3 having the highest damage severity in this group in terms of the number of removed braces. In addition, the damage severity of DC12 is greater than that of DC7, and that of DC11 is greater than that of DC10. Furthermore, the damage severity increases across DC17, DC18, and DC19.
Figure 10 depicts the damage severity diagnosis for the Yellow Frame structure. To obtain it, the Euclidean distance of each data instance from the whole no-damage data was calculated. Then, for each damage condition, the average of these distances was taken as the final damage severity index. The results in Figure 10 confirm all the above statements about the relative severity of the damage conditions, following the progressive damage cases in Table 1. Note that the damage severity results are normalized between 0 and 1.
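The severity index described above can be sketched as follows. This is a minimal illustration under stated assumptions: the distance "from the whole no-damage data" is taken here as the distance to the centroid of the no-damage feature vectors, and the 0 to 1 normalization is taken as min-max scaling across damage conditions. Neither choice is confirmed by the text, and the feature vectors and condition names are toy values.

```python
from math import dist

def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def severity_index(healthy, conditions):
    """Mean Euclidean distance to the no-damage centroid, min-max scaled."""
    ref = centroid(healthy)  # assumption: centroid represents no-damage data
    raw = {name: sum(dist(x, ref) for x in pts) / len(pts)
           for name, pts in conditions.items()}
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    return {name: (v - lo) / span if span else 0.0 for name, v in raw.items()}

# Toy 2-D feature vectors; condition names are hypothetical.
healthy = [[0.0, 0.0], [0.2, -0.2], [-0.2, 0.2]]
conditions = {
    "DC_a": [[1.0, 0.0], [0.0, 1.0]],
    "DC_b": [[2.0, 0.0], [0.0, 2.0]],
    "DC_c": [[3.0, 0.0], [0.0, 3.0]],
}
severity = severity_index(healthy, conditions)
print(severity)
```

With min-max scaling, the least severe condition maps to 0 and the most severe to 1, so the index expresses relative rather than absolute severity.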

The F1-score, precision, and recall results for the different approaches are presented in Figure 11. In general, the VAE yields better results for both SDD approaches. However, the manifold data of the VAE network offer better performance than the other approaches in some cases.

In addition to detection performance, the computational efficiency of the proposed framework plays a critical role in real-world SHM applications. To evaluate its practicality, we measured the execution time of key components on a mid-range NVIDIA RTX 3050 Ti GPU. As summarized in Table 2, training the VAE using 267 samples (40% of the no-damage data) with 7500 features and a batch size of 200 required approximately 2.06 min. FFT preprocessing of time-series data into the frequency domain took 0.11 min for 100 samples. Training the OCSVM on the compact latent manifold features extracted from the VAE completed in under one second. Notably, the entire inference pipeline, which includes FFT transformation, loading pretrained VAE weights, encoding test data, computing OCSVM scores, and ensemble calculation, was executed in under 0.225 min for a batch of 256 test samples. These results confirm the method’s suitability for periodic or near-real-time SHM, with minimal computational overhead and no reliance on labeled damage data, thus supporting efficient and scalable deployment in real-world environments.
Component | Time (RTX 3050 Ti) | Notes |
---|---|---|
FFT preprocessing | 0.11 min | Per batch of 100 samples (time-series to frequency domain) |
VAE training | 2.06 min | 200 batch size; 7500 features; LSTM encoder |
OCSVM training | < 1 s | On low-dimensional latent (manifold) features |
Inference | < 0.225 min | FFT + load pretrained VAE + OCSVM + ensemble (batch of 256 test samples) |
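The timing of a preprocessing step such as the FFT can be measured with a simple wall-clock harness. The sketch below is illustrative only: it assumes that the frequency-domain features are one-sided FFT magnitudes, uses a hypothetical record length and random data, and does not attempt to reproduce the figures in Table 2.

```python
import time

import numpy as np

def fft_features(signals):
    """One-sided FFT magnitude spectrum for each row (one row per record)."""
    return np.abs(np.fft.rfft(signals, axis=1))

# Toy batch: 100 records of 4096 points each (record length is hypothetical).
rng = np.random.default_rng(0)
signals = rng.standard_normal((100, 4096))

start = time.perf_counter()
features = fft_features(signals)
elapsed_min = (time.perf_counter() - start) / 60.0

print(f"FFT preprocessing: {elapsed_min:.5f} min for {signals.shape[0]} samples")
print("feature matrix shape:", features.shape)
```

The same pattern, a `time.perf_counter()` call before and after each stage, can be wrapped around model loading, encoding, scoring, and the ensemble step to obtain a per-component breakdown.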
4.3. Comparative Study
In the comparative study, an AE network architecture was used as a baseline to evaluate the performance of the various approaches, with the F1-score as the key metric across the QUGS and the Yellow Frame benchmark experiments.
As shown in Table 3, the proposed ensemble method delivered the highest performance across both datasets, achieving an F1-score of 0.983 for the QUGS and 0.955 for the Yellow Frame. While effective, the VAE-based approaches did not outperform the ensemble method. Specifically, the VAE RE achieved F1-scores of 0.972 for the QUGS and 0.880 for the Yellow Frame, while VAE manifold learning scored 0.935 for the QUGS and 0.934 for the Yellow Frame. The AE RE approach had the lowest performance, scoring 0.803 for the QUGS and 0.844 for the Yellow Frame.
A close assessment of the outcomes sheds light on the reason behind this observation. Examining both the QUGS (see Figure 6) and Yellow Frame (see Figure 9) results shows that the reconstruction loss offers a better capacity to discern between different damage classes, that is, their damage detection scores become more separable, while the manifold learning–based strategy is less effective in this regard. However, this improved ability to distinguish between different types of damage comes with a trade-off: an increased risk of FNs and sensitivity to threshold tuning. The more sensitive the model becomes to changes, the more likely it is for no-damage data to show sudden increases in detection scores, thereby raising the damage detection threshold and potentially leaving many real damages undetected. The proposed ensemble strategy seeks to leverage the high sensitivity of the reconstruction-based approach while benefiting from the more robust damage estimation offered by the manifold-based strategy. The results in Table 3 demonstrate the significant impact of the proposed ensemble method across both datasets, confirming the earlier observation that, while the VAE network provides robust performance with both reconstruction and manifold data and outperforms the AE network, neither of its two features is consistently the better one.
In particular, VAE RE performed better for QUGS, while VAE manifold learning successfully detected damage condition DC8 in the Yellow Frame structure. Although the VAE network performs well for these benchmarks, the proposed ensemble method, by combining both strategies, enhances the overall robustness and reliability of the system. Ensemble techniques do not guarantee superior performance compared to each of their input models; yet here, the unbiased normalized ensemble methodology yielded superior performance on both datasets and on all damage cases except one, demonstrating its effectiveness in the studied zero-shot SDD benchmark problems.
Approach | F1-score (QUGS) | F1-score (Yellow Frame) |
---|---|---|
VAE reconstruction error | 0.972 | 0.880 |
VAE manifold learning | 0.935 | 0.934 |
AE reconstruction error | 0.803 | 0.844 |
**Proposed ensemble method** | **0.983** | **0.955** |
Note: The bold formatting in the results table is intended to highlight the performance of the proposed method.
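The comparison in Table 3 can be checked with a few lines of arithmetic. The sketch below uses only the F1-scores reported in the table. It identifies the best individual approach per dataset and the margin by which the ensemble exceeds it; the variable names are illustrative.

```python
# F1-scores as reported in Table 3.
f1_scores = {
    "VAE reconstruction error": {"QUGS": 0.972, "Yellow Frame": 0.880},
    "VAE manifold learning": {"QUGS": 0.935, "Yellow Frame": 0.934},
    "AE reconstruction error": {"QUGS": 0.803, "Yellow Frame": 0.844},
    "Proposed ensemble method": {"QUGS": 0.983, "Yellow Frame": 0.955},
}
ENSEMBLE = "Proposed ensemble method"
datasets = ("QUGS", "Yellow Frame")
individual = [name for name in f1_scores if name != ENSEMBLE]

# Best single approach on each dataset, and the ensemble's margin over it.
best_individual = {
    ds: max(individual, key=lambda name: f1_scores[name][ds]) for ds in datasets
}
gain = {
    ds: round(f1_scores[ENSEMBLE][ds] - f1_scores[best_individual[ds]][ds], 3)
    for ds in datasets
}
print(best_individual)
print(gain)
```

The best individual approach differs between the two datasets, which is the inconsistency the ensemble is intended to address.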
5. Conclusions
This study proposes the unbiased normalized ensemble methodology, a robust zero-shot learning framework for structural damage detection that integrates reconstruction error and latent manifold features extracted from VAEs. In addition, this research conducted a comprehensive comparison between RE and manifold learning by employing AE and VAE models in the context of zero-shot SDD. The results revealed that the features extracted from the VAE, namely, RE and manifold representation, exhibited distinct capabilities in capturing and representing structural anomalies, even without labeled training data, outperforming traditional AE models. While the AE’s RE directly reflects the model’s ability to reproduce the structural characteristics of both undamaged and damaged data, the VAE’s probabilistic framework is shown to generate more diverse and structured reconstructions, highlighting its potential for enhanced anomaly detection in previously unseen scenarios.
The exploration into manifold learning highlighted its role in revealing the intrinsic representations of structural data within the latent space. By focusing on capturing the underlying data distribution, the VAE demonstrated a superior ability to preserve the manifold, which allowed for more meaningful interpolations between latent variables than the AE. This suggests that manifold learning offers a detailed understanding of structural variations, which is crucial for accurate damage detection in real-world applications. Moreover, by investigating both the manifold features and reconstruction loss for distinguishing damage classes in a zero-shot setting, a zero-shot damage severity measure is also proposed and demonstrated to be effective in the studied benchmarks.
Finally, the study introduced an unbiased normalized ensemble methodology that combined the outputs of both RE and manifold learning. This approach enabled more robust decision-making in zero-shot learning scenarios, where uncertainty is heightened due to the absence of labeled damaged data. By leveraging the strengths of both features, the ensemble method enhanced the reliability and accuracy of SDD.
In conclusion, this research offers valuable insights into selecting unsupervised learning models for zero-shot SDD. The observed trade-offs between RE and manifold representation underscore the importance of aligning model capabilities with the specific requirements of SHM applications. Future research could further optimize these models, incorporate domain-specific knowledge to improve detection performance, and extend the evaluation to a broader range of structural datasets, ensuring the models’ generalization and robustness across diverse SHM scenarios.
Conflicts of Interest
The authors declare no conflicts of interest.
Funding
No funding was received for this study.
Acknowledgments
The authors appreciate the Yellow Frame data provided by Dr. Carlos Ventura and Dr. Alexander Mendler.
Open Research
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.