Single-Object-Based Region Growth: Key Area Localization Model for Remote Sensing Image Scene Classification
Abstract
Remote sensing image scene classification is a challenging task because of large intraclass differences and the large number of similar scenes across different classes. To tackle this problem, this paper proposes a single-object-based region growth algorithm that effectively localizes the most important area in the whole image, so as to generate more discriminative, fine-grained local features for the image scene. In addition, a global-local two-branch network is designed to exploit image features from multiple perspectives. Specifically, the global branch extracts global features (such as contour and texture) from the whole image, while the local branch extracts finer local features from the local key area. Finally, the global and local classification scores are integrated to make the final decision. Experiments are performed on three publicly available data sets, and the results show that the proposed method achieves higher accuracy than most existing state-of-the-art methods.
1. Introduction
With the continuous progress of remote sensing technology and the upgrading of imaging equipment, acquiring high-resolution remote sensing images is easier than ever before. High-resolution remote sensing images contain rich scene semantic information, which is beneficial to the interpretation of remote sensing images. As an important means of remote sensing image interpretation, remote sensing image scene classification has received increasing attention in recent years. However, the complex backgrounds and the large amount of irrelevant scene information in remote sensing images pose great challenges to classification.
Feature extraction, as the core problem of image classification, has always been a research hotspot. However, the large intraclass differences and subtle interclass differences in remote sensing images make it difficult to extract discriminative features. A key to solving this problem is to find local subtle differences, and most existing methods first locate local regions and then extract local features for classification. To accurately locate local key areas, image patches containing objects need to be generated first. Selective search [1] combines the advantages of exhaustive search and segmentation and can search and capture all possible object regions in a variety of ways. Zhang et al. [2] used selective search to generate part proposals and achieved an average part recall of 95% on a bird data set. However, this method requires additional part annotations, which is time-consuming and labor-intensive. To address this problem, researchers have proposed weakly supervised learning methods that need no such labeled information. Zhang et al. [3] used a CNN to generate multiscale part proposals, clustered all proposals, calculated an importance score for each part cluster, and selected the parts with high scores as the useful areas. Despite the computational savings, the large number of proposals leads to overlap among the selected parts.
To tackle the above issues, we present a single-object-based region growth algorithm to locate the most important areas. Meanwhile, a global-local two-branch model, as shown in Figure 1, is designed to extract discriminative features from the whole image and the local key area, respectively. Finally, the classification scores of the two branches are fused to make the final decision. Experimental results on the RSSCN7, AID, and NWPU-RESISC45 data sets show that the proposed method has excellent performance in terms of accuracy.

The main contributions of this paper are summarized as follows:

(1) A single-object-based region growth algorithm is proposed, which can effectively discover and localize the most important areas. More importantly, the method requires no additional annotation information during training or testing.

(2) Unlike the traditional region growth algorithm, this method treats the entire image as one region and requires only a single seed point. Furthermore, the saliency values of the pixels around the seed are used as the criterion for incorporating new regions.

(3) Most existing methods ignore the connection between global and local features. This paper designs a two-branch model that combines the global and local scores so that the two branches promote each other.
The remainder of this paper is organized as follows. Section 2 briefly reviews related work on remote sensing image scene classification and salient object detection. Section 3 describes the proposed method in detail. Section 4 presents the data sets, experimental results, and analysis. Section 5 concludes the paper.
2. Related Work
2.1. Remote Sensing Image Scene Classification
Traditional classification methods rely on manually designed low-level feature descriptors, such as texture descriptors [4], histograms [5], and the scale-invariant feature transform (SIFT) [6]. However, there is a semantic gap between low-level features and high-level semantics, which makes the classification results unsatisfactory. To solve this problem, the bag-of-visual-words (BoVW) model [7] was proposed to extract more discriminative mid-level features. BoVW can integrate local features of an image into a global representation by clustering, encoding, and similar operations. On this basis, Chen and Tian [8] proposed a pyramid of spatial relations (PSR) model for land cover classification. The PSR model adopts a new concept of spatial relations to merge both absolute and relative spatial information into BoVW, which effectively deals with translation and rotation in remote sensing images. Although these methods have achieved good results, handcrafted features cannot effectively handle the various challenges in remote sensing image classification.
In recent years, the convolutional neural network (CNN) has been widely used in computer vision tasks, such as image classification [9], target detection [10], and object tracking [11]. Unlike manually designed features, a CNN model can learn more discriminative deep features from images. Consequently, CNN-based methods have gradually become the mainstream of remote sensing image scene classification. Zhao et al. [12] proposed an object-based deep learning method in which deep features are computed from a fixed receptive window using a five-layer CNN and features are extracted at three different segmentation scales. To extract more hidden information from the features of different layers, Li et al. [13] proposed a multiscale feature fusion strategy for remote sensing image scene classification. Xue et al. [14] used three popular CNNs to extract features and performed classification after fusing these features.
2.2. Salient Object Detection
As an important preprocessing step in computer vision tasks, salient object detection is widely used in video object segmentation [15], scene classification [16], and object detection [17].
Early methods mainly detected salient objects using handcrafted features. For example, Itti et al. [18] extracted color, orientation, and brightness features of the image at different scales to compute the saliency map. Yan et al. [19] treated the product of global color contrast and a central prior as the saliency at a single scale. With the development of deep learning, combining salient object detection with convolutional neural networks has also achieved great success. Li and Yu [20] used multiscale features extracted by a convolutional neural network to compute the saliency map. Zhang et al. [21] proposed a multilevel feature aggregation network that integrates multilevel features at multiple resolutions; the feature maps are then combined at each resolution, and the saliency map is predicted from the combined features. Moreover, unlike multiscale feature fusion approaches, Wei et al. [22] proposed selective convolutional descriptor aggregation (SCDA) for salient object detection. First, the output feature map of the last convolutional layer is aggregated in the depth direction. Then, candidate object regions are found by threshold segmentation, and finally the largest connected region is retained to localize the object.
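For reference, a minimal Python sketch of this SCDA-style localization follows, using the mean of the aggregate map as the segmentation threshold as described in [22] and SciPy for connected-component labeling; the function name and the 2-D input convention are illustrative.

```python
import numpy as np
from scipy import ndimage

def scda_localize(E: np.ndarray):
    """SCDA-style localization: threshold the aggregate map, then keep
    the largest connected region as the object bounding box."""
    mask = E > E.mean()                    # threshold segmentation at the mean
    labels, num = ndimage.label(mask)      # connected regions in the mask
    if num == 0:
        return None
    # keep only the largest connected region
    sizes = ndimage.sum(mask, labels, range(1, num + 1))
    largest = labels == (np.argmax(sizes) + 1)
    xs, ys = np.where(largest)
    return xs.min(), xs.max(), ys.min(), ys.max()
```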
For remote sensing images, it is crucial to find the distinctive region in a complex scene. Motivated by the convolutional-descriptor aggregation idea in SCDA, we propose single-object-based region growth to find the boundary of the key area, which is then used to sample the local image.
3. Proposed Method
In this section, we first introduce the important components of the baseline network. Then, the extraction process of the local key area is described in detail. Finally, the global-local two-branch network shown in Figure 1 is designed to extract the global and local features separately.
Deep convolutional neural networks have powerful learning capabilities, but their performance degrades substantially with increasing depth. The residual network [23] solves this problem to some extent and achieves marvelous performance in image recognition. The experiments in this paper mainly use the 18-layer residual network (ResNet18) as the baseline network.

3.1. Baseline

In a residual network, each residual block learns a residual mapping F and outputs y = F(x) + x, where x is the input of the block passed through a shortcut connection. As the network deepens, when the residual F approaches zero, the residual block reduces to a simple identity mapping, so stacking additional blocks does not degrade network performance.
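As a concrete reference, a minimal PyTorch sketch of a basic residual block of the kind used in ResNet18 follows; channel sizes and stride handling are simplified, so it is illustrative rather than the exact baseline implementation.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """A basic residual block: y = F(x) + x, followed by ReLU."""

    def __init__(self, channels: int):
        super().__init__()
        # F(x): two 3x3 convolutions with batch normalization
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        # If F approaches zero, the block reduces to the identity mapping x
        return self.relu(residual + x)
```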
3.2. Extract Local Key Areas
3.2.1. Aggregate Mapping
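Consistent with the SCDA-style aggregation recalled in Section 2.2, the output feature map of the last convolutional layer can be summed along the channel (depth) direction to obtain a single-channel aggregate map E of size H × W; the notation E[x1 : x2, y1 : y2] used below then denotes the sum of E over that region, and E[H, W] the sum over the whole map. A minimal PyTorch sketch under this reading:

```python
import torch

def aggregate_map(features: torch.Tensor) -> torch.Tensor:
    """Collapse a convolutional feature map (C, H, W) into a 2-D map E.

    Assumes SCDA-style depth-direction aggregation: the saliency of each
    spatial position is the sum of its activations over all C channels.
    """
    assert features.dim() == 3, "expected a (C, H, W) feature map"
    return features.sum(dim=0)  # E has shape (H, W)
```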
3.2.2. Single-Object-Based Region Growth Algorithm

If E[x1 : x2, y1 : y2] > T × E[H, W], the region [x1 : x2, y1 : y2] is considered to be the most important area for image recognition, where T is a hyperparameter in the range (0, 1]. To find this region quickly and accurately, the single-object-based region growth algorithm is proposed, which mainly consists of the following steps:
Step 1. Initialization. First, find the coordinates of the maximum value in the aggregate map E and take them as the starting position [xs, ys]. Then, the initial boundary of the salient region is marked as [xs, xd, ys, yd], where xd = xs + 1 and yd = ys + 1.

Step 2. Single-object-based region growth. The initial boundary is expanded continuously until the termination condition is reached. Implementation details are given in Algorithm 1.

Step 3. Scale the boundary. Scale the boundary values to the range [0, 1].
Algorithm 1: Single-object-based region growth.
Input: aggregate map E of size H × W; threshold T.
Output: boundary [xs, xd, ys, yd] of the key area.
1: [xs, ys] ← coordinates of the maximum value in E
2: xd = xs + 1, yd = ys + 1
3: Ts = E[xs : xd, ys : yd] / E[H, W]
4: while Ts < T do
5:   if E[xs : xd + 1, ys : yd] > E[xs : xd, ys : yd + 1] then
6:     xd = xd + 1
7:   else
8:     yd = yd + 1
9:   Ts = E[xs : xd, ys : yd] / E[H, W]
10: return [xs, xd, ys, yd]
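For clarity, a runnable NumPy sketch of Algorithm 1 is given below. It follows the listing above, in which the box grows only at the lower-right corner [xd, yd]; the visualization in Section 3.3 suggests that in practice the boundary can also expand toward the upper left, and those symmetric checks would follow the same pattern. Boundary checks that the listing leaves implicit are added, and all names are illustrative.

```python
import numpy as np

def region_growth(E: np.ndarray, T: float = 0.5):
    """Single-object-based region growth on an aggregate map E (H x W).

    Grows a box from the maximum-saliency seed until the box holds at
    least a fraction T of the total saliency E[H, W]; the returned
    boundary [xs, xd, ys, yd] uses inclusive indices into E.
    """
    H, W = E.shape
    total = E.sum()  # E[H, W], the saliency of the whole map

    def box_sum(xs, xd, ys, yd):  # E[xs : xd, ys : yd], inclusive ends
        return E[xs:xd + 1, ys:yd + 1].sum()

    xs, ys = np.unravel_index(np.argmax(E), E.shape)  # seed at the maximum
    xd, yd = min(xs + 1, H - 1), min(ys + 1, W - 1)
    Ts = box_sum(xs, xd, ys, yd) / total
    while Ts < T and (xd < H - 1 or yd < W - 1):
        # grow one step in the direction that gains more saliency
        gain_x = box_sum(xs, xd + 1, ys, yd) if xd < H - 1 else -np.inf
        gain_y = box_sum(xs, xd, ys, yd + 1) if yd < W - 1 else -np.inf
        if gain_x > gain_y:
            xd += 1
        else:
            yd += 1
        Ts = box_sum(xs, xd, ys, yd) / total
    return xs, xd, ys, yd
```

Because only one border moves per iteration and each move is chosen greedily, the loop runs at most H + W times, which is consistent with the low overhead reported in Section 4.3.1.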
3.2.3. Local Area Sampling
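Grounded in Step 3 above and the input sizes in Section 4.2, the sampling can be read as follows: the boundary from Algorithm 1 is normalized to [0, 1] relative to the aggregate-map size, projected onto the input image, and the resulting crop is resized to the local-branch input size (448 × 448). A minimal sketch under these assumptions, taking x as the row direction and y as the column direction:

```python
from PIL import Image

def sample_local_area(image: Image.Image, box, out_size: int = 448) -> Image.Image:
    """Crop the local key area given a normalized boundary.

    `box` is the boundary [xs, xd, ys, yd] from Algorithm 1, already
    scaled to [0, 1] (Step 3); x is assumed to index rows and y columns.
    """
    xs, xd, ys, yd = box
    w, h = image.size  # PIL reports (width, height)
    left, right = int(ys * w), int(yd * w)
    top, bottom = int(xs * h), int(xd * h)
    local = image.crop((left, top, right, bottom))
    return local.resize((out_size, out_size), Image.BILINEAR)
```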
3.3. Visualization of Single-Object-Based Region Growth
The whole process of single-object-based region growth is shown in Figure 3, which helps in understanding Algorithm 1. For convenience, the image size is set to 10 × 10 and the hyperparameter T is set to 0.5.
As shown in Figure 3, the input image belongs to the industrial region class. The initialized bounding box is shown in Figure 3(b), with boundary [6, 4, 7, 5]. Then, to rapidly increase the total saliency value within the region, the bounding box expands one step in a specific direction at each iteration. After the region stops growing, the bounding box, as shown in Figure 3(j), is [3, 2, 9, 7]. From the final result, it can be seen that a large amount of background noise in the global image is eliminated and the local image contains almost the entire key object.
Furthermore, to evaluate the effect of local region localization more intuitively, our method is compared with SCDA, and the results are shown in Figure 4. It is obvious from the results that single-object-based region growth locates key areas more precisely, and the obtained local regions contain less background noise.

3.4. Classification
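As stated in Section 1, the classification scores of the two branches are fused to make the final decision. The exact fusion rule is not restated here, so the sketch below assumes the simplest choice, an equal-weight sum of the two softmax score vectors; a weighted sum would follow the same pattern.

```python
import torch
import torch.nn.functional as F

def fused_prediction(global_logits: torch.Tensor,
                     local_logits: torch.Tensor) -> torch.Tensor:
    """Fuse global- and local-branch scores into the final decision.

    Assumes equal-weight fusion of softmax scores; both inputs are
    (batch, num_classes) logits from the two branches.
    """
    scores = F.softmax(global_logits, dim=1) + F.softmax(local_logits, dim=1)
    return scores.argmax(dim=1)  # predicted class index per image
```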
4. Experiments
4.1. Data Sets and Evaluation Metric
In order to verify the effectiveness of the proposed method, experiments are carried out on three public remote sensing image data sets. The basic information of each data set is listed in Table 1.
| Data sets | Image size | Spatial resolution (m) | Total images | Classes |
| --- | --- | --- | --- | --- |
| RSSCN7 | 400 × 400 | — | 2800 | 7 |
| AID | 600 × 600 | 0.5–8 | 10000 | 30 |
| NWPU-RESISC45 | 256 × 256 | 0.2–30 | 31500 | 45 |
The RSSCN7 data set has 7 categories: grass land, forest, farm land, parking lot, residential region, industrial region, and river and lake. The images are taken in different seasons and under different weather conditions and are sampled at different scales.
The aerial image data set (AID) is split into 30 categories: airport, bare land, baseball field, beach, bridge, center, church, commercial, dense residential, desert, farmland, forest, industrial, meadow, medium residential, mountain, park, parking, playground, pond, port, railway station, resort, river, school, sparse residential, square, stadium, storage tanks, and viaduct. Many of these categories share highly similar features.
The NWPU-RESISC45 data set is grouped into 45 classes, including airplane, airport, baseball diamond, basketball court, beach, bridge, chaparral, church, circular farmland, cloud, commercial area, dense residential, desert, forest, freeway, golf course, ground track field, harbor, industrial area, intersection, island, lake, meadow, medium residential, mobile home park, mountain, overpass, palace, parking lot, railway, railway station, rectangular farmland, river, roundabout, runway, sea ice, ship, snowberg, sparse residential, stadium, storage tank, tennis court, terrace, thermal power station, and wetland. NWPU-RESISC45 is currently the largest of these data sets, with small interclass differences and large intraclass differences.
4.2. Experiment Setup
To facilitate comparison with other methods, two different training ratios are used for each data set. For the RSSCN7 and AID data sets, the training ratios are fixed at 20% and 50%; for the NWPU-RESISC45 data set, at 10% and 20%. During training, the input images of the two branches are processed differently: for the global branch, the images are resized to 224 × 224 and randomly flipped horizontally; for the local branch, the images are resized to 448 × 448. The models are trained for 50 epochs in total using the Adam optimizer, with the initial learning rate set to 1e−4 and decayed by a factor of 0.1 every 20 epochs.
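In PyTorch, this training configuration corresponds to the sketch below; the data pipeline and the two-branch model are omitted, and ResNet18 stands in for the full network, so the snippet is a configuration sketch rather than the complete training script.

```python
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=45)  # e.g., NWPU-RESISC45; set per data set
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# decay the learning rate by a factor of 0.1 every 20 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(50):  # 50 epochs in total
    # ... one training pass over the training split goes here ...
    scheduler.step()
```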
For reliable experimental results, five runs are performed on each of the RSSCN7, AID, and NWPU-RESISC45 data sets, and the mean value and standard deviation of the results are reported. All experiments are conducted with the open-source machine learning library PyTorch [24], and a GTX 1060Ti GPU is used for acceleration.
4.3. Experimental Results and Analysis
4.3.1. RSSCN7 Data Set
To verify the performance of the proposed method and to find a satisfactory value of the hyperparameter T, extensive experiments are performed on the RSSCN7 data set. The results are shown in Table 2.
| Methods | 20% training, T = 0.4 | 20% training, T = 0.5 | 20% training, T = 0.6 | 50% training, T = 0.4 | 50% training, T = 0.5 | 50% training, T = 0.6 |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet18 | 92.30 | 92.30 | 92.30 | 94.90 | 94.90 | 94.90 |
| Ours | 93.79 | 94.13 | 93.90 | 96.43 | 96.63 | 96.50 |
According to the results, the accuracy of the proposed method is significantly higher than that of the ResNet18 baseline. Meanwhile, the experiments show that the size of the local region directly affects the classification results. As can be seen from Table 2, when the threshold T is set to 0.5, the results are the best regardless of the training ratio. Therefore, T is set to 0.5 by default in all experiments below.
Further, to analyze the spatial and temporal complexity of the proposed method, the model size and test time are measured. As shown in Table 3, although the model size is twice that of ResNet18, the test time is less than twice that of ResNet18. Thus, the computational overhead of the single-object-based region growth algorithm is relatively low.
| Methods | Model size | Test time (s) |
| --- | --- | --- |
| ResNet18 | 89 MB | 0.0097 |
| Ours | 178 MB (2 × 89 MB) | 0.0151 |
In addition, the overall accuracy is compared with that of ten other methods. As Table 4 shows, the overall classification accuracy of the proposed method is significantly improved regardless of the training ratio. Compared with EfficientNetB3-attn [25], the accuracy of our method is improved by about 0.83% at the 20% training ratio and 0.33% at the 50% training ratio.
| Methods | 20% training ratio | 50% training ratio |
| --- | --- | --- |
| CaffeNet [26] | 85.57 ± 0.95 | 88.25 ± 0.62 |
| VGG-VD-16 [26] | 83.98 ± 0.87 | 87.18 ± 0.94 |
| ResNet50-TEX-Net-LF [27] | 92.45 ± 0.45 | 94.00 ± 0.57 |
| VGG-M-TEX-Net-EF-6 [27] | 86.77 ± 0.76 | 89.61 ± 0.54 |
| VGG-M-TEX-Net-EF-6 [27] | 85.65 ± 0.79 | 88.70 ± 0.78 |
| Fine-tune MobileNet V2 [28] | 89.04 ± 0.17 | 92.46 ± 0.66 |
| SE-MDPMNet [28] | 92.65 ± 0.13 | 94.71 ± 0.15 |
| Contourlet CNN [29] | — | 95.54 ± 0.71 |
| Dual Attention-Aware Net [30] | 91.07 ± 0.65 | 93.25 ± 0.28 |
| EfficientNetB3-attn [25] | 93.30 ± 0.19 | 96.17 ± 0.23 |
| Ours | 94.13 ± 0.06 | 96.50 ± 0.11 |
Figure 5 shows the confusion matrix of the global-local two-branch model on the RSSCN7 data set with a training ratio of 50%, where blank entries denote 0. From the figure, the residential region, industrial region, and parking lot classes are the most likely to be misclassified. In addition, using the RSSCN7 data set as an example, the training loss and test accuracy curves are displayed in Figure 6.


4.3.2. AID Data Set
Our method is compared with other methods on the AID data set. Table 5 reports the classification accuracy of the different methods. The experimental results show that the proposed method obtains the highest classification accuracy, 94.67% and 97.10% at the 20% and 50% training ratios, respectively. In particular, its accuracy is 0.31% higher than that of Dual Attention-Aware Net [30] at the 20% training ratio and 0.49% higher than that of ResNet101+SENet [31] at the 50% training ratio.
| Methods | 20% training ratio | 50% training ratio |
| --- | --- | --- |
| VGG-VD-16 [26] | 86.59 ± 0.29 | 89.64 ± 0.36 |
| VGG-TEX-Net-LF [27] | 90.87 ± 0.11 | 92.96 ± 0.18 |
| ResNet50-TEX-Net-LF [27] | 93.81 ± 0.12 | 95.73 ± 0.16 |
| Fine-tune MobileNet V2 [28] | 94.13 ± 0.28 | 95.96 ± 0.27 |
| Dual Attention-Aware Net [30] | 94.36 ± 0.54 | 95.53 ± 0.30 |
| EfficientNetB3 [25] | 93.43 ± 0.33 | 95.37 ± 0.41 |
| ResNet101+SENet [31] | 93.69 ± 0.35 | 96.61 ± 0.21 |
| CNN-CapsNet [32] | 93.79 ± 0.13 | 96.32 ± 0.12 |
| RADC-Net [33] | 88.12 ± 0.43 | 92.35 ± 0.19 |
| MG-CAP(Sqrt-E) [34] | 93.34 ± 0.18 | 96.12 ± 0.12 |
| Ours | 94.67 ± 0.07 | 97.10 ± 0.09 |
When the training ratio is 50%, the confusion matrix of the experimental results is displayed in Figure 7. As can be seen from the figure, the accuracies of resort and school are below 90% because these scenes are easily misclassified as others. In addition, center, school, park, and square are all prone to misclassification. Finding discriminative features between these classes is a key way to further improve classification performance.

4.3.3. NWPU-RESISC45 Data Set
From Table 6, it can be seen that the classification accuracy of our method is the highest among the compared methods, which demonstrates its validity. When the training ratio is 10%, the accuracy of our method is 90.71%, which is 0.54% higher than the second best, MF2Net [35]. When the training ratio is 20%, the accuracy is 93.25%, which is 0.52% higher than that of MF2Net.
| Methods | 10% training ratio | 20% training ratio |
| --- | --- | --- |
| ResNet101 [31] | 89.41 ± 0.16 | 92.52 ± 0.17 |
| VGG-16-CapsNet [32] | 85.08 ± 0.13 | 89.18 ± 0.14 |
| Inception-v3-CapsNet [32] | 89.03 ± 0.21 | 92.60 ± 0.11 |
| RADC-Net [33] | 85.72 ± 0.25 | 87.63 ± 0.28 |
| MG-CAP(bilinear) [34] | 89.42 ± 0.19 | 91.72 ± 0.16 |
| Fine-tune VGG16 [36] | 87.15 ± 0.45 | 90.36 ± 0.18 |
| Fine-tune GoogLeNet [36] | 82.57 ± 0.12 | 86.02 ± 0.18 |
| GANet [37] | 87.96 ± 0.23 | 91.36 ± 0.18 |
| MF2Net [35] | 90.17 ± 0.25 | 92.73 ± 0.21 |
| ResNet34 + SFFM [38] | 86.28 ± 0.34 | 91.11 ± 0.13 |
| DS-CapsNet [39] | 89.27 ± 0.22 | 91.62 ± 0.18 |
| Ours | 90.71 ± 0.13 | 93.25 ± 0.09 |
When the training ratio is 20%, the confusion matrix of the NWPU-RESISC45 data set is shown in Figure 8. The NWPU-RESISC45 data set contains a large number of remote sensing images with complex backgrounds, which makes it difficult to substantially improve classification accuracy. The results in the figure show that the scenes with low classification accuracy include church, commercial area, dense residential, freeway, industrial area, medium residential, palace, river, wetland, and railway station. To further improve the classification accuracy, new solutions still need to be found.

5. Conclusion
In this paper, a single-object-based region growth algorithm is proposed to locate the most important region in remote sensing images. Furthermore, a global-local two-branch network is designed for remote sensing scene image classification. The global branch extracts texture and contour information from the whole image, and the local branch extracts more discriminative fine-grained features. The two branches promote each other and help alleviate the problem of large scale variation. Experimental results on three widely used remote sensing data sets show the effectiveness of the proposed approach compared with other state-of-the-art methods. In future work, the model should be further optimized; how to make the model lighter while maintaining high accuracy remains a problem for further research.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the Major Project of University Natural Science Research of Anhui Province under Grant KJ2018ZD038 and the Domestic Visit and Training Program for the Youth Elite in Universities of Anhui Province under Grant gxgnfx2021175.
Open Research
Data Availability
The data used to support the findings of this research are available from the corresponding author upon request.