Infrared and Visible Image Fusion Based on Iterative Control of Anisotropic Diffusion and Regional Gradient Structure
Abstract
To improve the fusion performance of infrared and visible images and effectively retain the edge structure information of the image, a fusion algorithm based on iterative control of anisotropic diffusion and regional gradient structure is proposed. First, an iterative control operator is introduced into the anisotropic diffusion model to effectively control the number of iterations. The image is then decomposed into a structure layer containing detail information and a base layer containing the residual energy information. According to the characteristics of the two layers, different fusion schemes are applied: the structure layer is fused by combining the regional structure operator with the structure tensor matrix, and the base layer is fused through the Visual Saliency Map. Finally, the fused image is obtained by reconstructing the structure and base layers. Experimental results show that the proposed algorithm not only handles the fusion of infrared and visible images effectively but is also computationally efficient.
1. Introduction
In recent years, UAVs have played an increasingly important role in many fields due to their high flexibility, low cost, and ease of operation; in the military they are often used for battlefield reconnaissance, battle situation assessment, target recognition, and tracking. Image sensors on UAVs can now acquire multiple types of images, such as multispectral, visible, and infrared images [1]. However, owing to environmental conditions such as lighting, imaging with only one sensor is affected by certain factors and cannot meet the requirements of practical applications. Combining multiple imaging sensors can overcome the shortcomings of a single sensor and yield more reliable and comprehensive information. The imaging sensors most commonly used on UAVs are infrared and visible sensors. Infrared sensors exploit thermal radiation to obtain images in which infrared targets are prominent, but the targets lack clarity and their edges are blurred [2]. Visible sensors exploit light reflection to obtain images with clear details, but under low-visibility conditions these images have limitations. Research has shown that effectively combining infrared and visible images yields a more comprehensive and accurate description of the scene or target, which provides strong support for subsequent task processing [3].
The methods most widely used in the field of infrared and visible image fusion can be roughly classified into multiscale transform (MST)-based methods [4], sparse representation-based methods [5], spatial domain-based methods [6], and deep learning-based methods [7]. At present, the most researched and applied are the MST-based methods, including the wavelet transform [8], the Laplacian pyramid transform [9], the nonsubsampled shearlet transform [10], and the nonsubsampled contourlet transform [11]. These methods decompose the source images at multiple scales, fuse the components separately according to certain fusion rules, and finally obtain the fusion result through the inverse transform; they can extract the salient information in the images and achieve good performance. For example, Huang et al. [11] utilize the nonsubsampled contourlet transform to decompose the source images and obtain a precise decomposition. However, because traditional MST methods lack spatial consistency, structural or brightness distortion may appear in the result.
In addition, image fusion methods based on edge-preserving filtering [12] are also receiving attention. Edge-preserving filtering can effectively reduce halo artifacts around edges in the fusion results while retaining the edge information of image contours, giving good visual performance. Popular methods include mean filtering [13], bilateral filtering [14], joint bilateral filtering [15], and guided filtering [16]. These methods perform decomposition according to the spatial structure of the images, achieving spatial consistency and thereby smoothing texture while preserving edge detail. For example, Zhu et al. [16] proposed a fast single-image dehazing algorithm that uses guided filtering to decompose the images and obtained good performance. Edge-preserving fusion algorithms maintain spatial consistency and effectively reduce distortion and artifacts in the fused image, but they have certain limitations: (1) they can introduce detail "halos" at the edges; (2) when the input images and the guide images are inconsistent, the filtering becomes insensitive or even fails; and (3) it is difficult to meet the requirements of fusion performance, time efficiency, and noise robustness simultaneously.
Inspired by previous research, this article focuses on reducing "halos" at the edges so as to retain edge structure information, and on obtaining better decomposition performance for both noise-free and noise-perturbed images. In this paper, a new infrared and visible image fusion method based on iterative control of anisotropic diffusion and a regional gradient structure operator is proposed. Anisotropic diffusion is utilized to decompose the source image into a structure layer and a base layer. The structure layer is then processed using the gradient-based structure tensor matrix and the regional structure operator. Because the base layer has weak detail but high energy, the Visual Saliency Map (VSM) is utilized to fuse it. The final fusion image is obtained by reconstructing the two prefused components. The main contributions are as follows:
- (1) A novel infrared and visible image fusion method is proposed. An anisotropic diffusion model with an iteration-control operator adaptively controls the number of iterations, so the image is adaptively decomposed into a structure layer with rich edge and detail information and a base layer with pure energy information; in particular, computational efficiency is greatly improved.
- (2) A regional structure operator is introduced into the structure tensor matrix, which effectively extracts image details, contrast, and structure. It also greatly improves the detection of weak structures and yields structure images with good prefusion performance.
- (3) Since anisotropic diffusion handles noise effectively, the proposed method also performs well on noisy image fusion. In addition, the algorithm is broadly applicable and is also suitable for other types of image fusion.
The remainder of this paper is organized as follows. Section 2 briefly reviews anisotropic diffusion and structure tensor theory and introduces the new operators. Section 3 describes the proposed infrared and visible image fusion algorithm in detail. Section 4 presents the experiments and compares the proposed algorithm with several state-of-the-art methods. Finally, Section 5 concludes the paper.
2. Related Theories
2.1. Anisotropic Diffusion Based on Iterative Control
The scale spaces weighted by these two functions are different: the first function targets abrupt areas with large gradients, namely the edge and detail areas, while the second function targets flat areas with small gradients. Both functions contain a free parameter k.
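For reference, the classical Perona-Malik diffusion coefficients match this description exactly (the first favours abrupt, high-gradient areas, the second flat areas), and the sketches below assume this standard form:

```latex
g_1(\lVert \nabla I \rVert) = \exp\!\left[-\left(\frac{\lVert \nabla I \rVert}{k}\right)^{2}\right],
\qquad
g_2(\lVert \nabla I \rVert) = \frac{1}{1 + \left(\lVert \nabla I \rVert / k\right)^{2}}
```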
The anisotropic diffusion of the image I is denoted aniso(I). After the image is diffused anisotropically, since the iterative control operator precisely controls the number of iterations, almost all of the oscillatory and repetitive texture content is effectively preserved in the structure layer, while the energy information and weak edges are preserved in the base layer. Figure 1 shows the base layer and structure layer images obtained after anisotropic diffusion decomposition; they are clearly consistent with the theoretical analysis.
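A minimal sketch of such a diffusion with iteration control is given below, assuming the Perona-Malik update above and a simple relative-change stopping rule in place of the paper's exact iterative control operator; the function name `aniso`, the parameters `k`, `lam`, `tol`, and `max_iter`, and the stopping rule are illustrative assumptions.

```python
import numpy as np

def aniso(img, k=0.06, lam=0.15, tol=1e-3, max_iter=50):
    """Perona-Malik diffusion with a simple iteration-control rule.

    img is a greyscale array scaled to [0, 1]. The relative-change stopping
    criterion stands in for the paper's iterative control operator.
    """
    I = img.astype(np.float64)
    for _ in range(max_iter):
        # Differences to the four neighbours (periodic borders via np.roll,
        # kept simple for the sketch).
        dN = np.roll(I, 1, axis=0) - I
        dS = np.roll(I, -1, axis=0) - I
        dE = np.roll(I, -1, axis=1) - I
        dW = np.roll(I, 1, axis=1) - I
        # Edge-favouring conduction coefficient g1 from Section 2.1.
        cN, cS = np.exp(-(dN / k) ** 2), np.exp(-(dS / k) ** 2)
        cE, cW = np.exp(-(dE / k) ** 2), np.exp(-(dW / k) ** 2)
        I_next = I + lam * (cN * dN + cS * dS + cE * dE + cW * dW)
        # Iteration control: stop once the update becomes negligible.
        if np.linalg.norm(I_next - I) / (np.linalg.norm(I) + 1e-12) < tol:
            return I_next
        I = I_next
    return I
```

Keeping lam at or below 0.25 keeps this explicit four-neighbour update numerically stable.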

2.2. Gradient-Based Structure Tensor Matrix
Gradient is the rate of change of intensity, reflected by the difference between a central pixel and its surrounding pixels. It can accurately capture the texture details, contour features, and structural components in an image. The structure tensor is an effective tool for analysing gradients, and it has been applied to a variety of image processing tasks.
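For reference, the standard windowed gradient structure tensor, which we assume underlies the matrix used here, is

```latex
J_\rho = K_\rho * \left( \nabla I \, \nabla I^{\mathsf{T}} \right)
       = K_\rho * \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix},
```

where K_ρ is a local smoothing kernel and I_x, I_y are the image derivatives. Its eigenvalues λ1 ≥ λ2 ≥ 0 characterise the local pattern: both small in flat areas, λ1 ≫ λ2 ≈ 0 along edges, and both large at corners, which is what makes the tensor effective for extracting structure.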
3. Fusion Framework
Based on the above theories, a new image fusion framework is constructed, as shown in Figure 2. Unlike traditional decomposition schemes, and in order to make better use of the information in the source images, the iteratively controlled anisotropic diffusion is first utilized to decompose each source image into base and structure components. Most of the gradients and edges are thereby preserved in the structure layer, while the base layer contains the remaining energy information. Then, according to the characteristics of each layer, different fusion rules are introduced to obtain the prefusion of each layer: the structure layers are prefused through the regional gradient structure, and the base layers through the VSM. Finally, the fusion result is obtained by reconstructing the two prefused layers.
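A compact sketch of this framework is given below, assuming the common two-scale formulation (structure layer = source − base layer; reconstruction by summation); `aniso` is the diffusion sketch from Section 2.1, while `fuse_structure` and `fuse_base` are sketched in Sections 3.2 and 3.3.

```python
import numpy as np

def fuse(ir, vis):
    """Overall pipeline; ir and vis are co-registered greyscale arrays in [0, 1]."""
    # Step 1: iteratively controlled anisotropic decomposition (Section 3.1).
    base_ir, base_vis = aniso(ir), aniso(vis)
    struct_ir, struct_vis = ir - base_ir, vis - base_vis
    # Step 2: layer-specific prefusion (Sections 3.2 and 3.3).
    struct_f = fuse_structure(struct_ir, struct_vis)  # regional gradient structure
    base_f = fuse_base(base_ir, base_vis)             # VSM weighted average
    # Step 3: reconstruction of the two prefused layers.
    return np.clip(struct_f + base_f, 0.0, 1.0)
```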

3.1. Anisotropic Decomposition
After anisotropic decomposition, a structure layer with rich outline and texture details and a base layer with intensity information can be obtained.
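Written out under the same difference-of-layers assumption as the sketch above, the decomposition is

```latex
B_n = \operatorname{aniso}(I_n), \qquad S_n = I_n - B_n, \qquad n \in \{\mathrm{ir}, \mathrm{vis}\},
```

where B_n denotes the base layer and S_n the structure layer of source image I_n.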
3.2. Fusion of Structure Layers
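As a stand-in for the paper's regional structure operator, the sketch below uses the windowed structure-tensor trace (the regional gradient energy) as the saliency measure together with a choose-max rule; this follows the spirit of Section 2.2 but is an assumption, not the authors' exact operator.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def structure_saliency(S, win=7):
    """Regional structure measure: trace of the window-averaged structure tensor."""
    gy, gx = np.gradient(S)
    # trace(J) = Ix^2 + Iy^2, averaged over a win x win region.
    return uniform_filter(gx * gx + gy * gy, size=win)

def fuse_structure(S1, S2, win=7):
    """Keep, at each pixel, the structure layer with the stronger regional structure."""
    mask = structure_saliency(S1, win) >= structure_saliency(S2, win)
    return np.where(mask, S1, S2)
```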
3.3. Fusion of Base Layers
Since the base layers contain few details, the weighted-average technique based on the VSM [20] is used to fuse them.
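A sketch of this rule is shown below, using the histogram-contrast saliency and the shifted weighted average commonly associated with the VSM of [20]; computing the saliency on the base layers themselves (rather than on the source images) is an assumption made to keep the sketch self-contained.

```python
import numpy as np

def vsm(img_u8):
    """Visual saliency map via histogram contrast: V(p) = sum_j h(j) * |I(p) - j|."""
    hist = np.bincount(img_u8.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    # Saliency of each grey level, then a per-pixel lookup.
    sal_of_level = np.abs(levels[:, None] - levels[None, :]) @ hist
    V = sal_of_level[img_u8]
    return (V - V.min()) / (V.max() - V.min() + 1e-12)  # normalise to [0, 1]

def fuse_base(B1, B2):
    """VSM-shifted weighted average of two base layers scaled to [0, 1]."""
    to_u8 = lambda x: np.clip(x * 255.0, 0, 255).astype(np.uint8)
    W = 0.5 + (vsm(to_u8(B1)) - vsm(to_u8(B2))) / 2.0
    return W * B1 + (1.0 - W) * B2
```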
4. Experimental Analysis and Results
To verify the effectiveness and reliability of the proposed algorithm, multiple pairs of images are used for experimental verification, and the results are analysed through subjective visual inspection and objective quantitative evaluation. After the algorithm parameters are set, the experimental results are presented and discussed.
4.1. Experimental Setting
As shown in Figure 3, six pairs of source images are employed in the experiment, obtained from the public website http://imagefusion.org/. All experiments are implemented in MATLAB 2018a on a notebook PC, and five recent methods are compared in the same experimental environment for verification: image fusion with ResNet and zero-phase component analysis (ResNet) proposed by Li et al. [21], image fusion with a convolutional neural network (CNN) proposed by Liu et al. [22], gradient transfer and total variation minimization-based image fusion (GTF) proposed by Ma et al. [23], image fusion through infrared feature extraction and visual information preservation (IFEVIP) proposed by Zhang et al. [24], and multisensor image fusion based on fourth-order partial differential equations (FPDE) proposed by Bavirisetti et al. [25]. In addition, the fusion performance is quantitatively evaluated by six indicators: entropy (EN) [26], edge information retention (QAB/F) [27], Chen-Blum's index (QCB) [28], mutual information (MI) [29], structural similarity (SSIM) [30], and peak signal-to-noise ratio (PSNR) [31].
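For reference, two of these indicators have simple standard forms (given here for convenience; for all six indicators, larger values indicate better fusion quality):

```latex
\mathrm{EN}(F) = -\sum_{l=0}^{255} p_l \log_2 p_l, \qquad
\mathrm{MI} = \mathrm{MI}_{A,F} + \mathrm{MI}_{B,F},
```

where p_l is the probability of grey level l in the fused image F and MI_{X,F} is the mutual information between source X and F.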

4.2. Image Fusion and Evaluation
Figures 4 and 5 show six pairs of infrared and visible image fusion examples. Figures 4(a1), 4(b1), and 4(c1) and Figures 5(a1), 5(b1), and 5(c1) are infrared images; Figures 4(a2), 4(b2), and 4(c2) and Figures 5(a2), 5(b2), and 5(c2) are visible images. Figures 4(a3)–4(a8), 4(b3)–4(b8), and 4(c3)–4(c8) and Figures 5(a3)–5(a8), 5(b3)–5(b8), and 5(c3)–5(c8) are the fusion results obtained by the different methods. The content in the red boxes highlights the regions to be emphasized.


4.2.1. Subjective Evaluation
It can be seen from Figures 4 and 5 that the fusion images obtained by the ResNet and GTF methods have lower contrast than those of the proposed method: although the structure is well preserved, the details are relatively weakened and partially lost. The IFEVIP method maintains good contrast, but the visual effect is over-enhanced, especially in the partially enlarged areas, resulting in obvious errors in the result. The FPDE method suffers from blurred internal features. The CNN method obtains a relatively good fusion result, but its image is somewhat unnatural, and the colour of the result in Figure 5(c4) contains errors. In contrast, the proposed method effectively separates the component information of the different images, preserves the useful information of the source images in the fusion images, and obtains the best visual performance in terms of edge and detail preservation.
4.2.2. Objective Evaluation
In addition to the subjective evaluation, the fusion results are quantitatively evaluated, as shown in Table 1, in which the best results are labelled in bold. The data show that the objective scores of the proposed method are significantly higher than those of the other methods. Across all quantitative evaluations, only a few entries are not optimal (e.g., the PSNR of the House example), and these do not affect the overall advantage of the proposed method. In addition, Figure 6 shows a bar chart comparison of the EN, QAB/F, QCB, MI, SSIM, and PSNR values of the various fusion methods for the Car example.
Table 1: Objective evaluation of the fusion results for the six image pairs (best results in bold).

| Source images | Index | ResNet | CNN | GTF | IFEVIP | FPDE | Proposed |
|---|---|---|---|---|---|---|---|
| Car | EN | 6.798 | 6.519 | 7.119 | 7.090 | 6.569 | **7.564** |
|  | QAB/F | 0.313 | 0.349 | 0.246 | 0.398 | 0.299 | **0.401** |
|  | QCB | 0.401 | 0.395 | 0.337 | 0.367 | 0.395 | **0.439** |
|  | MI | 1.735 | 1.082 | 1.243 | 1.638 | 1.432 | **2.866** |
|  | SSIM | 1.543 | 1.372 | 1.310 | 1.330 | 1.435 | **1.988** |
|  | PSNR | 58.253 | 58.430 | 58.127 | 57.280 | 58.349 | **59.124** |
| House | EN | 6.592 | 6.618 | 6.952 | 6.960 | 6.870 | **6.998** |
|  | QAB/F | 0.375 | 0.345 | 0.253 | 0.402 | 0.387 | **0.418** |
|  | QCB | 0.463 | 0.471 | 0.338 | 0.468 | 0.477 | **0.490** |
|  | MI | 1.301 | 1.251 | 1.080 | 1.417 | 1.222 | **1.456** |
|  | SSIM | 1.534 | 1.436 | 1.374 | 1.418 | 1.305 | **1.991** |
|  | PSNR | 58.143 | **59.462** | 59.066 | 58.225 | 58.356 | 59.184 |
| Shop | EN | 6.275 | 6.062 | 6.627 | 6.266 | 6.403 | **6.703** |
|  | QAB/F | 0.461 | 0.205 | 0.481 | 0.589 | 0.328 | **0.598** |
|  | QCB | 0.478 | 0.451 | 0.434 | 0.481 | 0.445 | **0.497** |
|  | MI | 1.723 | 0.812 | 1.746 | 1.682 | 1.308 | **1.896** |
|  | SSIM | 1.334 | 1.031 | 1.221 | 1.325 | 1.299 | **1.993** |
|  | PSNR | 59.012 | 59.885 | 59.466 | 59.308 | 59.345 | **59.936** |
| Snow | EN | 7.012 | 6.818 | 5.919 | 6.911 | 6.918 | **7.723** |
|  | QAB/F | 0.543 | 0.551 | 0.498 | 0.577 | 0.565 | **0.579** |
|  | QCB | 0.554 | 0.483 | 0.478 | 0.517 | 0.612 | **0.661** |
|  | MI | 2.045 | 1.914 | 1.598 | 2.368 | 1.999 | **2.657** |
|  | SSIM | 1.367 | 1.238 | 1.164 | 1.222 | 1.029 | **1.989** |
|  | PSNR | 55.034 | 56.630 | 56.122 | 55.237 | 56.856 | **57.083** |
| Tree | EN | 6.744 | 6.696 | 6.696 | 6.229 | 6.801 | **6.962** |
|  | QAB/F | 0.301 | 0.311 | 0.348 | 0.341 | 0.333 | **0.379** |
|  | QCB | 0.423 | 0.424 | 0.417 | 0.430 | 0.415 | **0.469** |
|  | MI | 1.354 | 1.109 | 1.452 | 1.090 | 1.027 | **1.635** |
|  | SSIM | 1.578 | 1.512 | 1.469 | 1.483 | 1.496 | **1.995** |
|  | PSNR | 58.013 | 57.641 | 57.242 | 57.325 | 57.934 | **58.823** |
| Walking Night | EN | 6.348 | 5.734 | 6.059 | 6.281 | 6.786 | **7.011** |
|  | QAB/F | 0.405 | 0.346 | 0.309 | 0.471 | 0.418 | **0.474** |
|  | QCB | 0.313 | 0.353 | 0.291 | 0.344 | 0.350 | **0.386** |
|  | MI | 2.019 | 2.130 | 2.189 | 2.139 | 2.005 | **2.335** |
|  | SSIM | 1.346 | 1.153 | 1.232 | 1.208 | 1.206 | **1.989** |
|  | PSNR | 55.999 | 56.047 | 55.843 | 54.825 | 55.470 | **56.943** |

In summary, for infrared and visible image fusion, the proposed method performs well both subjectively and objectively.
4.3. Extended Experiment
Experimental verification shows that the proposed fusion algorithm is equally effective for remote sensing images. To illustrate this, two sets of panchromatic and multispectral satellite remote sensing images are shown in Figure 7.

Figures 7(a1) and 7(b1) are multispectral images with high spectral resolution and low spatial resolution; Figures 7(a2) and 7(b2) are panchromatic images with high spatial resolution and low spectral resolution. The corresponding fusion results are shown in Figures 7(a3) and 7(b3). As can be seen from the content in the red boxes in Figure 7, the fusion results have both high spatial resolution and high spectral resolution, and the fused images express structure and detail strongly. The objective evaluation results are shown in Figure 8. The visual and objective results show that the algorithm can effectively retain high spatial-resolution and spectral information and can improve the accuracy of subsequent remote sensing image processing.

4.4. Computational Efficiency
All methods tested in this paper are run in the same experimental environment. Table 2 compares the average running time over the six image pairs. The computational efficiency of the proposed algorithm has a considerable advantage over the comparison algorithms.
Table 2: Average running time over the six image pairs.

| Method | ResNet | CNN | GTF | IFEVIP | FPDE | Proposed |
|---|---|---|---|---|---|---|
| Time (s) | 23.07 | 23.03 | 2.91 | 1.34 | 5.78 | **0.78** |
5. Conclusions
In this paper, an infrared and visible image fusion algorithm based on iterative control of anisotropic diffusion and regional gradient structure is proposed. The algorithm makes full use of the advantages of anisotropic diffusion and improves decomposition efficiency and quality through the iterative control operator. The regional gradient structure operator is introduced to fully extract the detailed information in the structure layer and obtain better fusion performance. Extensive experimental results show that the algorithm is significantly better than existing methods in terms of both subjective and objective evaluation. In addition, it achieves higher computational efficiency and stronger noise robustness, and it can be effectively applied to other types of image fusion.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Acknowledgments
This research was funded by the National Natural Science Foundation of China, grant number 61801507, and the Natural Science Foundation of Hebei Province, grant number F2021506004.
Data Availability
The data used to support the findings of this paper are available from http://imagefusion.org/.