Fusion of Deep Features of Wavelet Transform for Wildfire Detection
Abstract
Forests deliver vital resources, most notably producing oxygen and absorbing carbon dioxide. Wildfire is the leading cause of deforestation: massive forest areas are lost every year because forest fires are not identified and predicted in time. Accordingly, early detection of wildfires is crucial to inform operational and firefighting teams and prevent fires from advancing. This study analyzes images taken by unmanned aerial vehicles for wildfire detection. For this purpose, the two-dimensional discrete wavelet transform was first applied to the images. Next, owing to its superior feature-learning ability, a convolutional neural network was utilized to extract deep features from the wavelet transform sub-bands. Then, the features obtained from each sub-band were merged to create the final feature vector. Afterward, multidimensional scaling was employed to discard redundant features and reduce the dimensionality of this vector. Ultimately, the presence or absence of wildfire in the images was detected using suitable classifiers. The proposed method reaches an accuracy and F1 score of 0.9684 and 0.9672, respectively, on the FLAME dataset, indicating its efficiency in detecting wildfire. Thus, this method can contribute significantly to timely firefighting operations and prevent extensive damage to forests.
1. Introduction
1.1. Motivations
Forests play unique roles in human life and deliver various vital resources. They purify the air by absorbing CO2 and releasing O2, which is why forests are called the "lungs of the planet". Likewise, they provide habitat for many animals and clean drinking water by filtering out many pollutants [1, 2]. Regrettably, massive forest areas have been burned and destroyed in recent years, with human activities regarded as the foremost cause of wildfires. Accordingly, elevated awareness can prevent much of this damage and forest loss. Furthermore, early wildfire detection is crucial for diminishing the associated risks and casualties and helps firefighters smother fires in their early phases. The time between wildfire detection and alerting the relevant authorities is critical and can minimize risks and consequences. Thus, early detection of wildfires is vital for rapidly controlling the fire [3].
Several techniques have recently been proposed to detect wildfires early and allocate appropriate resources to extinguish them. These techniques are often based on aerial and ground-based technologies, such as watchtowers with multiple sensors and satellites. However, these technologies have constraints that lessen overall wildfire detection performance. For example, watchtowers have a limited field of view and impose high construction costs. Likewise, although satellites deliver a pervasive field of view, they come at a huge cost and suffer from flexibility issues and low spatial/temporal image resolution, all of which prevent detecting a fire spot in time [4, 5].
1.2. Related Works
Computer vision allows machines (e.g., unmanned aerial vehicles [UAVs]) to visually perceive their surrounding environment and respond according to their predefined mission. Computer vision technologies intended for wildfire detection fall into two classes. The first class covers conventional machine learning (ML)-based methods that rely on handcrafted feature extraction techniques, such as identifying color changes [6, 7]. Choosing these features is time-consuming and requires qualified experts to pick features suitable for developing efficient algorithms. These techniques are inefficient for complicated problems, such as wildfire detection from images with cluttered backgrounds in dense forests. The techniques in the second class employ deep learning (DL) algorithms to extract relevant and robust features automatically. DL algorithms allow machines to perform difficult and multifaceted tasks such as time-series analysis [8, 9], vehicle [10] and face recognition [11], self-driving cars [12], and the diagnosis of plant diseases [13].
Recently, UAVs have been broadly utilized in several forest-related applications, such as forest exploration, search and rescue operations, forest resource surveys, and wildfire suppression. Furthermore, recent technological breakthroughs have enabled drones to process visual data automatically. Fire and smoke detection in forests and remote, hard-to-access areas using DL-based techniques has been a subject of interest in recent years. Flame and smoke serve as visual cues for early and precise wildfire detection. Some studies have explored fire detection using flames [14, 15]. Likewise, some studies have investigated early fire detection using smoke as a proper signal of wildfire [16, 17]. However, flames can be concealed in the early phases of a wildfire, particularly in forests [3]. To overcome these limitations, recent research has targeted concurrent flame and smoke detection.
Image classification-based methods classify whole input images (i.e., whether or not an image contains fire patterns). Handcrafted pixel-based features were extracted in [18] and classified by a support vector machine (SVM). Recently, several studies have used convolutional neural networks (CNNs) to classify drone-captured wildfire images. For example, five CNN architectures (AlexNet, a basic CNN architecture, GoogleNet, VGG, and modified versions of these networks) were considered in [19] to classify UAV images into fire and non-fire events. The authors in [20] proposed a CNN-based method for early wildfire detection using a hexacopter with an optical camera, applying preprocessing techniques such as histogram equalization and nonlinear filters to enhance data quality and reduce noise. Similarly, [16] proposes a method to classify and determine the precise location of wildfires. Other CNN-based methods were proposed in [21, 22]. Transfer learning with the InceptionV3, DenseNet121, ResNet50V2, VGG19, and NASNetMobile CNNs was considered in [23] for forest wildfire detection. Reduce-VGGnet was employed in [24] for image classification aimed at wildfire detection.
In contrast, object detection algorithms draw a bounding box around the object's extent [25]; here, the objects are flames or smoke. Several object detection algorithms have yielded acceptable performance in recent years. In region-based algorithms, region candidates (which may contain fire events) are first generated using a selective search method. Next, these region candidates are classified according to the occurrence or non-occurrence of the desired object. The region-based CNN (R-CNN) family is a well-established and efficient family of region-based algorithms. Conversely, single-step detectors (STDs) skip the region candidate generation step by processing the input image in a single pass, thus offering faster detection while retaining high accuracy. YOLO and RetinaNet are among the most efficient STDs [26]. A method based on YOLOv5 was proposed in [27] for domain-free fire detection. In [28], YOLOv5 and YOLOv8 were used to identify forest fires. A combination of physical and DL schemes was utilized in [29].
DL-based computer vision algorithms are not used merely for image classification and object detection; they can also be employed for semantic segmentation, which is among the most effective DL techniques for wildfire detection. Semantic segmentation classifies each pixel in the image according to the object class it belongs to (i.e., flame, smoke, forest, etc.). Nonetheless, semantic segmentation algorithms are highly complex, entail higher computing demands, and require more time-consuming annotation of training images. In recent years, several semantic segmentation techniques have been proposed to detect wildfires from digital images and videos captured by high-accuracy UAV platforms, including DeepLab [30], U-Net [31], SegNet [32], and CTNet [33].
1.3. Contributions
Considering the above and the necessity of early wildfire detection to launch firefighting operations and avoid massive casualties, this study proposes an effective method to detect wildfires from images taken of forests by UAVs. The proposed method is based on time–frequency analysis of the images and extraction of deep features from the resulting representation. First, images are subjected to a one-level two-dimensional discrete wavelet transform (2D DWT). Wildfires exhibit distinct patterns at different scales; previous works did not consider these patterns, whereas 2D DWT can extract them. The 2D DWT also highlights the specific regions of the image where changes occur. Next, the obtained sub-bands pass through CNNs to extract deep features. Since CNNs use local receptive fields to capture spatial patterns, they are well suited to locally analyzing the 2D DWT sub-bands and identifying relevant features. Then, the deep features of the different sub-bands are merged and subjected to feature reduction. Ultimately, the decision is made using suitable classifiers. The main contributions of this study are as follows:
- Proposing a new method based on deep time–frequency features from images captured by UAV for wildfire detection in forests.
- Extracting time–frequency features by 2D DWT and deep features by CNN.
- Reducing the number of deep features by multidimensional scaling (MDS).
- Performing extensive simulations to indicate the performance of the proposed method.
The rest of this paper is organized as follows. Section 2 describes the dataset used in this paper. The proposed method is explained in detail in Section 3. Section 4 contains the simulation results, and Section 5 concludes the paper.
2. Dataset and Preliminaries
2.1. Dataset
As mentioned, this paper aims to present a wildfire detection algorithm for forests, given their unique role in human life. The use of UAVs for wildfire detection in forests has attracted attention due to the challenges and issues of traditional methods. The FLAME (Fire Luminosity Airborne-based Machine learning Evaluation) dataset [34] is an open-access dataset of wildfire images. The videos were recorded by DJI Phantom 3 Professional and DJI Matrice 200 drones with Zenmuse X4S, FLIR Vue Pro thermal, and DJI Phantom 3 cameras. The first and second videos are raw footage (16 min long) recorded by a Zenmuse X4S camera at 29 frames per second (FPS) and a resolution of 1280 × 720; the second video shows the behavior of one pile from the start of burning. The third, fourth, and fifth videos are WhiteHot, GreenHot, and Heatmap recordings of 89 s, 5 min, and 25 min, respectively, all captured by a FLIR camera at 30 FPS and a resolution of 640 × 512. The sixth video is a 17-min HD RGB video recorded by a DJI Phantom 3 camera at 30 FPS and a resolution of 3840 × 2160. The seventh and eighth repositories contain 39,375 and 8617 frames, respectively, with a resolution of 254 × 254, for the image classification problem. The ninth and tenth repositories contain more than 2000 high-resolution fire frames and their ground truth masks for the fire segmentation problem. The primary purposes of this dataset are to classify images into "wildfire" or "non-wildfire" and to perform fire image segmentation from RGB and thermal drone imagery. Table 1 summarizes the parts of the dataset used in this study. Likewise, Figures 1 and 2 provide examples of images of "wildfire" and "non-wildfire" events.


Class | Training set | Validation set | Testing set | Total |
---|---|---|---|---|
Wildfire images | 540 | 180 | 180 | 900 |
Non-wildfire images | 600 | 200 | 200 | 1000 |
Total | 1140 | 380 | 380 | 1900 |
2.2. Preliminaries
2.2.1. 2D DWT
The DWT performs time–frequency analysis of a signal. It is a multi-resolution technique that can analyze individual frequencies at different resolutions. The 2D DWT provides multi-scale analysis by decomposing an image into different frequency components at various scales (low-frequency content and high-frequency details). Wildfires exhibit distinct patterns at different scales (e.g., large flames vs. smoke plumes); by analyzing these scales, the DWT enhances the ability to detect fire-related features. The 2D DWT also provides spatial localization of features: when a wildfire occurs, it highlights the specific regions of the image where changes occur (localized energy), which aids in accurately pinpointing fire-affected areas. The 2D DWT coefficients represent different frequency bands and can serve as input to ML models such as CNNs.
The wavelet representation of a discrete signal x[n] consisting of N samples is computed by passing the signal through a pair of lowpass and highpass filters and downsampling the outputs by a factor of two, so that each frequency band contains N/2 samples. With a proper selection of filters, this operation is reversible. The process decomposes the original signal into two sub-bands [35]. The transform can be extended to multiple dimensions using separable filters; that is, depending on the input dimensions, the DWT can be implemented in one-dimensional (1D), two-dimensional (2D), and three-dimensional (3D) forms.
The 2D DWT applies highpass and lowpass filters to the image pixels. The highpass filter extracts detail information from the image, while the lowpass filter generates an approximation of the input image at each level. The filter outputs are downsampled by a factor of two. Because the 2D DWT is separable, it can be obtained from two 1D DWTs: a 1D DWT is first performed on each row (i.e., horizontal filtering) and then on each column (i.e., vertical filtering).
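As an illustration, a single-level 2D DWT can be computed with the PyWavelets library; this is a minimal sketch, assuming a grayscale input frame (the file name is a placeholder), not the paper's MATLAB implementation:

```python
# Minimal sketch of a single-level 2D DWT with PyWavelets; "frame.png"
# is a placeholder for a UAV frame, and db4 matches the paper's filters.
import numpy as np
import pywt
from PIL import Image

img = np.asarray(Image.open("frame.png").convert("L"), dtype=np.float64)

# Rows are filtered and downsampled first, then columns:
# cA = approximation (LL), cH = horizontal detail (LH),
# cV = vertical detail (HL), cD = diagonal detail (HH).
cA, (cH, cV, cD) = pywt.dwt2(img, "db4")

print(img.shape, cA.shape)  # each sub-band is roughly half the size per axis
```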
2.2.2. CNN
The 2D DWT decomposes an image into frequency sub-bands (approximation and detail coefficients) at various scales. CNNs are designed to learn hierarchical features and can therefore effectively learn and extract relevant features from these sub-bands. CNNs use local receptive fields (small windows) to capture spatial patterns, while the 2D DWT sub-bands represent different frequency components; this localized analysis helps identify relevant textures, edges, and structural features. CNNs also exhibit translation invariance, meaning they can recognize features regardless of their position in the image. Combining CNNs with the DWT creates deep feature hierarchies, so the combination of DWT sub-bands and CNNs provides a robust framework for extracting relevant image features. It leverages both spatial and frequency information, improving wildfire detection performance.
The CNN structure consists of multiple layer types: convolution, pooling, and fully connected. A typical architecture stacks repeated convolution and pooling layers followed by one or more fully connected layers. The convolution layer, the central component of the CNN architecture, performs feature extraction: at each location of the input tensor, an elementwise product between the kernel and the overlapping input elements is computed and summed to obtain the value at the corresponding position of the output tensor, called the feature map. The pooling layer performs downsampling to introduce translation invariance to small shifts and distortions and to reduce the number of subsequent learnable parameters. Notably, pooling layers have no learnable parameters; filter size, stride, and padding are hyperparameters of the pooling operation, similar to convolution operations. The deep features extracted by the convolution and pooling layers are mapped by the fully connected layers to the final outputs, such as per-class probabilities in a classification task. The final fully connected layer typically contains as many output nodes as there are classes [36, 37].
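To make these layer roles concrete, the following is a minimal PyTorch sketch of the convolution, pooling, flatten, and fully connected pipeline; the layer sizes are illustrative and are not the architectures evaluated in this paper:

```python
# Minimal PyTorch sketch of the convolution -> pooling -> fully connected
# pipeline described above; layer sizes are illustrative only.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # kernels produce feature maps
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsampling; no learnable parameters
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                # flattened deep features
            nn.Linear(32 * 56 * 56, num_classes),        # one output node per class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

logits = TinyCNN()(torch.randn(1, 3, 224, 224))  # logits of shape (1, 2)
```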
2.2.3. Multidimensional Scaling (MDS)
Multidimensional scaling (MDS) is a feature dimensionality reduction (FDR) technique employed to visualize dissimilarities and is broadly used in multidimensional data analysis [38]. MDS aims to find a projection of multidimensional data into a low-dimensional space that preserves the similarity (or dissimilarity) structure of the data: the proximity indices between objects should map optimally to the distances between their low-dimensional points. To achieve an intuitive spatial layout, MDS compresses a large amount of multivariate data into a smaller-dimensional space. It translates the distances between each pair of objects into a configuration of points in an abstract Cartesian space, simplifying the representation while retaining essential information. Since MDS is a nonlinear dimensionality reduction method, it can capture complex relationships and nonlinear structures in the data, which is impossible for linear methods such as principal component analysis (PCA).
The steps of the classical MDS algorithm are given below:

Step 1: Construct the squared distance matrix D^(2) = [d_ij^2] from the distance matrix D = [d_ij] ∈ R^(n×n).

Step 2: Calculate the inner product matrix B = -(1/2) J D^(2) J, where J = I - (1/n) 1 1^T is the centering matrix.

Step 3: Compute the eigendecomposition of B and sort the eigenvalues in descending order.

Step 4: With the target dimension k determined, the embedding is X = E_k L_k^(1/2), where E_k is the matrix of the first k eigenvectors retained from B and L_k is the diagonal matrix of the corresponding k eigenvalues. As can be seen, classical MDS and PCA are essentially similar; the difference is that the former is based on the samples (distances), while the latter is based on the variables.
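A minimal NumPy sketch of these classical MDS steps is given below, assuming a Euclidean distance matrix D and a target dimension k:

```python
# Minimal NumPy implementation of the classical MDS steps above.
import numpy as np

def classical_mds(D: np.ndarray, k: int) -> np.ndarray:
    n = D.shape[0]
    D2 = D ** 2                               # Step 1: squared distance matrix
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix J = I - (1/n) 1 1^T
    B = -0.5 * J @ D2 @ J                     # Step 2: inner product matrix
    vals, vecs = np.linalg.eigh(B)            # Step 3: eigendecomposition of B
    idx = np.argsort(vals)[::-1][:k]          # keep the k largest eigenvalues
    E_k = vecs[:, idx]
    L_k = np.diag(np.sqrt(np.maximum(vals[idx], 0.0)))
    return E_k @ L_k                          # Step 4: X = E_k L_k^(1/2)
```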
3. Proposed Wildfire Detection
Given the need for early wildfire detection and timely alerting of the relevant authorities, this section describes the proposed method for wildfire detection in forests. As shown in Figure 3, the method comprises preprocessing, feature extraction, and classification, which are explained in the following subsections.

3.1. Preprocessing
3.2. Feature Extraction
Figure 4 demonstrates the proposed feature extraction method for wildfire detection. Images containing fire events carry considerable information in their edges, which corresponds to the high-frequency content of the image; this information has not been well exploited in previous studies. Hence, the first step is to extract the edges of the images, i.e., the information in the high frequencies. In this study, the 2D DWT is employed for this purpose.

In this study, a single-level 2D DWT with Daubechies 4 filters was applied. A diagram of the single-level 2D DWT is shown in Figure 5. As a result, four sub-bands are obtained: the approximation band (LL), the horizontal detail band (LH), the vertical detail band (HL), and the diagonal detail band (HH), represented by xa, xh, xv, and xd, respectively.

To extract deep features, each of the sub-bands obtained in the previous step is separately passed through the considered CNN, and the activations of the flattened layer are taken as the deep features of that sub-band. Notably, before being fed to the CNN, each sub-band matrix is resized to match the dimensions of the CNN's input layer. This results in feature vectors ya, yh, yv, and yd, corresponding to the sub-bands xa, xh, xv, and xd, respectively, each of size nd × 1. These feature vectors are concatenated into yf = [ya, yh, yv, yd] to obtain the final feature vector of dimension 4nd × 1.
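A hedged sketch of this sub-band feature extraction and fusion is shown below, using torchvision's InceptionV3 as the backbone (the paper's best-performing CNN); the resizing and channel-replication details are assumptions, not the paper's exact preprocessing:

```python
# Hedged sketch of per-sub-band deep feature extraction and fusion with
# torchvision's InceptionV3; resizing and channel replication are assumptions.
import numpy as np
import pywt
import torch
import torch.nn.functional as F
from torchvision import models

backbone = models.inception_v3(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()  # expose the flattened deep features
backbone.eval()

def fused_features(img: np.ndarray) -> torch.Tensor:
    xa, (xh, xv, xd) = pywt.dwt2(img, "db4")             # four sub-bands
    feats = []
    for band in (xa, xh, xv, xd):
        t = torch.from_numpy(band).float()[None, None]   # 1 x 1 x H x W
        t = F.interpolate(t, size=(299, 299))            # match the CNN input size
        t = t.repeat(1, 3, 1, 1)                         # grayscale -> 3 channels
        with torch.no_grad():
            feats.append(backbone(t).squeeze(0))         # y_a, y_h, y_v, y_d
    return torch.cat(feats)                              # y_f of length 4 * n_d
```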
The final feature vector contains many characteristics, often including redundant information such as correlated or duplicated components. Thus, feature dimensionality reduction (FDR) mitigates the effects of redundant or irrelevant information by mapping the useful information in the original features to a smaller number of features.
3.3. Classification
In the last step, the proposed method classifies the reduced features obtained from MDS to determine if the image contains fire events. Accordingly, this study uses several classification techniques and compares their performances.
Support vector machine (SVM): The SVM is a robust classifier broadly used in binary scenarios because of its low computational complexity and easy processing. The optimal hyperplane in an SVM maximizes the margin between classes. The linear SVM is utilized in this paper.
k-Nearest neighbor (kNN): kNN is a well-established ML classification algorithm and a simple classifier used here for wildfire detection. The class of a test sample is determined by the classes of its k nearest samples in the training data; thus, the value of k plays a key role in the performance of kNN.
Decision tree: This supervised ML classifier repeatedly divides a dataset into subsets based on specific feature values. It uses a tree-like structure consisting of a root node, internal decision nodes, and leaf nodes. The root node represents the whole dataset, which is split into branches at the internal decision nodes; each leaf node represents a predicted class.
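As a brief illustration, the three classifiers can be compared with scikit-learn; this sketch uses synthetic stand-in data in place of the reduced deep features:

```python
# Minimal scikit-learn comparison of the three classifiers; the
# synthetic data stands in for the MDS-reduced deep features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)  # stand-in features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, clf in {
    "SVM (linear)": SVC(kernel="linear"),
    "kNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(),
}.items():
    clf.fit(X_tr, y_tr)
    print(f"{name}: accuracy = {clf.score(X_te, y_te):.4f}")
```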
4. Results
This section evaluates the efficacy of the proposed method in detecting wildfires. First, we explain the simulation setup; then, we present the results.
4.1. Simulation Setup
The simulations were carried out using MATLAB R2023a on a computer with the following hardware specifications: CPU: Core i7-13700H; RAM: 32 GB; GPU: GeForce RTX 3050; HDD: 1 TB. We used 10-fold cross-validation to partition the dataset into training and test data. This scheme randomly divides the dataset into 10 equally sized parts, and the training and testing procedure is repeated 10 times: in each run, one part serves as test data and the remaining nine parts train the proposed method. The procedure is repeated until every part has served as test data, and the results are averaged; a minimal sketch of this scheme is given after Table 2, which lists the CNN training hyperparameters.
Parameter | Value |
---|---|
Optimizer | Stochastic gradient descent with momentum (SGDM) |
Loss function | Cross-entropy |
Number of epochs | 50 |
Batch size | 16 |
Learning rate | 0.0001 |
Momentum | 0.85 |
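The 10-fold procedure described above can be sketched as follows (scikit-learn); the synthetic data and linear SVM are stand-ins for the actual features and classifier:

```python
# Minimal 10-fold cross-validation sketch; the synthetic data and
# linear SVM stand in for the reduced features and chosen classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=380, n_features=20, random_state=0)  # stand-in data
model = SVC(kernel="linear")

accuracies = []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    model.fit(X[train_idx], y[train_idx])                      # nine parts train the method
    accuracies.append(model.score(X[test_idx], y[test_idx]))   # one part tests it
print(np.mean(accuracies))                                     # results averaged over folds
```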
To enlarge the training set and improve generalization, the following data augmentation techniques were applied:

1. Adding zero-mean Gaussian noise with variances of 0.005 and 0.01 to the images.
2. Applying gamma correction with gamma values ranging from 0.7 to 1.3.
3. Rotating images from −45° to 45° with a step size of 5°.
4. Creating mosaics by combining four randomly cropped images into a new image.
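A hedged sketch of these four augmentations with NumPy and SciPy is shown below; the parameter values follow the list above, while the scaling of images to [0, 1] is an assumption:

```python
# Hedged sketch of the four augmentations listed above; images are
# assumed to be float arrays scaled to [0, 1].
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(0)

def add_gaussian_noise(img, var=0.005):
    # Zero-mean Gaussian noise with variance 0.005 or 0.01.
    return np.clip(img + rng.normal(0.0, np.sqrt(var), img.shape), 0, 1)

def gamma_correct(img, gamma=0.7):
    # Gamma values ranging from 0.7 to 1.3.
    return img ** gamma

def rotate_image(img, angle=5):
    # Angles from -45 to 45 degrees in 5-degree steps.
    return rotate(img, angle, reshape=False)

def mosaic(imgs, size=254):
    # Combine four randomly cropped images into one new image.
    half = size // 2
    out = np.zeros((size, size) + imgs[0].shape[2:], dtype=imgs[0].dtype)
    for k, im in enumerate(imgs):
        y0 = rng.integers(0, im.shape[0] - half + 1)
        x0 = rng.integers(0, im.shape[1] - half + 1)
        r, c = divmod(k, 2)
        out[r*half:(r+1)*half, c*half:(c+1)*half] = im[y0:y0+half, x0:x0+half]
    return out
```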
4.2. Classification Accuracy
Table 3 shows the classification accuracy for various combinations of mother wavelet, CNN, and classifier. The mother wavelets are Daubechies 4 (db4), Symlet 8 (sym8), biorthogonal 1.5 (bior1.5), and reverse biorthogonal 3.5 (rbior3.5). AlexNet, MobileNet, and InceptionV3 CNNs were used to extract deep features, and SVM, kNN, and decision tree classifiers were utilized to classify the reduced deep features. From the results, we observe that: (1) the db4 wavelet consistently yields better classification outcomes than sym8, bior1.5, and rbior3.5, likely because it balances edge characterization and noise suppression effectively; (2) InceptionV3 outperforms AlexNet and MobileNet in nearly every wavelet–classifier pairing, indicating that its deeper multi-scale filters benefit wildfire vs. non-wildfire discrimination; and (3) SVM tends to produce higher accuracy than kNN and the decision tree on the same feature sets, likely thanks to its margin-maximizing properties. Overall, the db4 + InceptionV3 + SVM combination achieves the highest accuracy, about 96.8%.
| Mother wavelet | CNN | SVM | kNN | Decision tree |
|---|---|---|---|---|
| db4 | AlexNet | 0.925 | 0.887 | 0.872 |
| db4 | MobileNet | 0.940 | 0.900 | 0.880 |
| db4 | InceptionV3 | **0.968** | 0.932 | 0.916 |
| sym8 | AlexNet | 0.914 | 0.871 | 0.860 |
| sym8 | MobileNet | 0.928 | 0.885 | 0.871 |
| sym8 | InceptionV3 | 0.955 | 0.919 | 0.904 |
| bior1.5 | AlexNet | 0.890 | 0.852 | 0.836 |
| bior1.5 | MobileNet | 0.905 | 0.868 | 0.852 |
| bior1.5 | InceptionV3 | 0.915 | 0.890 | 0.877 |
| rbior3.5 | AlexNet | 0.905 | 0.864 | 0.850 |
| rbior3.5 | MobileNet | 0.924 | 0.882 | 0.870 |
| rbior3.5 | InceptionV3 | 0.939 | 0.905 | 0.891 |
Note: The bold value is the maximum classification accuracy.
4.3. Confusion Matrix
Table 4 gives the confusion matrix of the proposed method. A confusion matrix is a square matrix of dimensions nc × nc, where nc denotes the number of classes (two in this study). It summarizes the predictions in matrix form, indicating how many predictions are correct and incorrect per class and revealing which classes the model confuses with others. As shown, of the 180 images with wildfire events, only three are wrongly detected as non-wildfire; likewise, of the 200 images without wildfire events, nine are wrongly recognized as wildfire. The confusion matrix also yields the sensitivity of each class, i.e., the correct classification rate per class. The sensitivity and precision are 0.9833 and 0.9516, respectively. Given the importance of early wildfire detection and fast action by firefighters, these results confirm the efficiency of the proposed wildfire detection method. Also, the F1 score of 0.9672 indicates the model's balanced ability to detect positive cases while maintaining precision.
| Actual class | Predicted positive (wildfire) | Predicted negative (non-wildfire) | Sens. | Prec. | F1 score |
|---|---|---|---|---|---|
| Positive (wildfire) | 177 | 3 | 0.9833 | 0.9516 | 0.9672 |
| Negative (non-wildfire) | 9 | 191 | | | |
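The metrics in Table 4 can be verified directly from the raw confusion matrix counts; a minimal check:

```python
# Minimal check of the Table 4 metrics from the confusion matrix
# counts (TP = 177, FN = 3, FP = 9, TN = 191).
tp, fn, fp, tn = 177, 3, 9, 191

sensitivity = tp / (tp + fn)                                  # 177/180 = 0.9833
precision = tp / (tp + fp)                                    # 177/186 = 0.9516
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # 0.9672
accuracy = (tp + tn) / (tp + fn + fp + tn)                    # 368/380 = 0.9684

print(f"{sensitivity:.4f} {precision:.4f} {f1:.4f} {accuracy:.4f}")
```

The resulting overall accuracy, 368/380 ≈ 0.9684, matches the value reported in Section 4.2.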
4.4. Ablation Study
Table 5 presents the effect of the feature reduction algorithm on the proposed method's accuracy. As discussed earlier, the CNNs extract many deep features from the 2D DWT sub-bands, which are highly correlated and thus prolong classification, increase complexity, and reduce efficiency; using a feature reduction method is therefore crucial. Table 5 compares the accuracy obtained with MDS, PCA, and linear discriminant analysis (LDA), as well as with no feature reduction. As shown, classification accuracy improves with any of the feature reduction methods. Furthermore, supervised LDA outperforms unsupervised PCA, while MDS achieves the highest classification accuracy.
Method | MDS | PCA | LDA | Without feature reduction |
---|---|---|---|---|
Accuracy | 0.9684 | 0.9211 | 0.9368 | 0.8895 |
Table 6 compares the efficacy of various optimization algorithms for training the CNNs. According to the results, stochastic gradient descent with momentum (SGDM) achieves higher accuracy than adaptive moment estimation (ADAM) and root mean squared propagation (RMSProp).
Optimizer | ADAM | RMSProp | SGDM |
---|---|---|---|
Accuracy | 0.8868 | 0.8737 | 0.9684 |
4.5. Convergence Speed
Figure 6 presents the convergence behavior of the proposed method for wildfire detection. As observed, the proposed method converges at an appreciable speed. Also, the small gap between the training and test accuracies indicates that the model has converged without overfitting or underfitting.

4.6. Computational Time Analysis
In this section, we present a temporal-cost table that quantifies the average processing time of each major step of our approach, namely (1) wavelet decomposition, (2) CNN-based deep feature extraction, (3) dimensionality reduction (MDS), and (4) final classification. Table 7 summarizes the proposed method's overall computational cost. These results confirm that the CNN's forward pass is the primary driver of runtime. However, with modern GPUs or on-board accelerators (such as the NVIDIA Jetson family), the entire pipeline remains practical for near real-time applications, especially if further optimizations (such as model pruning or quantization) are applied.
Stage | Operation | Average Time |
---|---|---|
1. Wavelet transform | One-level 2D DWT (Daubechies-4) | 3–5 ms |
2. CNN feature extraction | Forward pass (InceptionV3, batch size = 16) | 110–130 ms |
3. Dimensionality reduction | MDS (classical) on concatenated features | 8–12 ms |
4. Classification | SVM inference | 1–2 ms |
Total per Image | All steps combined | 122–149 ms |
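As a sketch, per-stage averages like those in Table 7 can be collected with a small timing harness; the stage callables below are placeholders for the actual pipeline steps:

```python
# Minimal timing harness sketch; the stage callables are placeholders
# for the wavelet, CNN, MDS, and classifier steps of the pipeline.
import time

def avg_ms(fn, *args, repeats=100):
    t0 = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - t0) / repeats * 1e3  # milliseconds per call

stages = {"DWT": lambda: None, "CNN": lambda: None}  # replace with real stage functions
for name, fn in stages.items():
    print(f"{name}: {avg_ms(fn):.2f} ms")
```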
5. Conclusion
Given the unique role of forests and their extensive destruction by frequent wildfires, early detection of wildfires is crucial to protect these natural resources. Hence, this study proposed a new method for detecting forest wildfires in UAV images. The proposed method extracts deep features from the four sub-bands of a single-level 2D DWT and merges them to create the final feature vector. Then, the MDS method removes redundant features to reduce the computational complexity. Ultimately, a classifier detects whether the images contain wildfire events. This study employed the FLAME dataset to train and evaluate the proposed method. In addition, the efficiency of several CNNs (AlexNet, MobileNet, and InceptionV3) and classifiers (SVM, kNN, and decision tree) was evaluated. It was found that the InceptionV3–SVM combination achieves the highest accuracy of 0.9684. The correct detection rates for wildfire and non-wildfire events were 0.9833 and 0.955, respectively, implying the high efficiency of the proposed method. Also, it was shown that MDS outperforms conventional methods such as PCA and LDA for wildfire detection.
This research classified images into wildfire and non-wildfire categories; the fire-affected regions were not determined. In the future, semantic segmentation models could be used to precisely segment wildfire-affected regions, aiding firefighting efforts. Hybrid DL models that combine CNNs with attention mechanisms could also be considered to capture both local and global features.
Conflicts of Interest
The authors certify that they have no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.
Funding
The authors received no specific funding for this work.
Open Research
Data Availability Statement
Data is available online at: https://ieee-dataport.org/open-access/flame-dataset-aerial-imagery-pile-burn-detection-using-drones-uavs.