Volume 2022, Issue 1 3029866
Research Article
Open Access

A Semantic Image Retrieval Method Based on Interest Selection

Wenting Hu

Business College, Jiangsu Open University, Nanjing 210036, China jsou.cn

Business College, Nanjing University, Nanjing 210093, China nju.edu.cn

Yin Sheng

College of Internet of Things Engineering, Hohai University, Changzhou 213022, China hhu.edu.cn

Xianjun Zhu

Corresponding Author

School of Software Engineering, Jinling Institute of Technology, Nanjing 211169, China jit.edu.cn

First published: 27 February 2022
Citations: 1
Academic Editor: Narasimhan Venkateswaran

Abstract

There is a semantic gap between people’s understanding of images and the underlying visual features of images, which makes it difficult for image retrieval results to meet the needs of individual interests. To overcome the semantic gap in image retrieval, this paper proposes a semantic image retrieval method based on interest selection. This method analyses the interest points selected by individuals and assigns interest selection weights to different regions of an image. By extracting the underlying visual features of the different regions, this paper constructs two feature vectors weighted by users’ interest points, called interest weighted summation and interest weighting. Finally, this paper compares the accuracy of the different image classification methods using a support vector machine classification algorithm. The experimental results show that the target classification accuracy of the classification algorithm based on interest weighted summation is higher than that of the traditional and interest weighting methods, and that it achieves the best overall performance on target object classification in the four experimental scenarios. Therefore, the interest point selection method can effectively improve the overall satisfaction of image recommendation and can be used as a novel solution to overcome the semantic gap.

1. Introduction

With the continuous evolution of artificial intelligence technologies, such as computer vision, speech semantics, and machine learning, society is entering a new era of intelligence. This will cause the transformation of image retrieval modes and reshape the process experience of information retrieval to promote the intelligent upgrading and functional reconstruction of traditional information retrieval.

Images are the most important and effective means for human beings to obtain information because images are intuitive, comprehensible, and informative. In fact, an individual’s understanding of an image is based not only on visual similarity but also on the semantic similarity of the image. Image processing algorithms are often used to extract underlying visual features, which cannot fully describe the semantic information of an image [1]. People understand images not only through their accumulated experience, knowledge, and personal preferences from daily life but also through a cognitive mode of thinking from a semantic perspective. This easily leads to a semantic gap between the image semantics and the underlying visual features. Relevant studies try to mimic the human visual attention mechanism to fundamentally solve the semantic gap problem [2–4]. The purpose of filtering redundant information from a large amount of visual information is to find useful information and obtain the high-level semantics of the image. Most of these methods were designed to model visual attention and have been evaluated by their congruence with fixation data obtained from experiments with eye gaze trackers. On the one hand, progress has been made in the construction of visual computing models to simulate the human visual attention mechanism, but these models are still at the stage of simulating how humans view scenes [5–7]. On the other hand, researchers use eye-tracking technology to obtain eye movement behaviour to depict the human visual attention mechanism [8–11]. However, it is difficult to collect individual eye movement information in large quantities because eye trackers are expensive and not widely available.

In addition to the traditional visual attention calculation model and novel eye-tracking technology, scholars have tried to use other means to reflect and measure human visual attention phenomena, such as motion trajectory models and the click behaviour of a mouse cursor. Related research shows that users’ attention behaviour is related to the mouse cursor trajectory. Users’ interest selections are highly correlated with visual attention, and interest selection can be used to predict the location of the fixation point more accurately than the visual calculation model [12, 13]. Therefore, this paper takes users’ interest points as feedback information to study the problem of interest point weighting in image semantics. Finally, the algorithm is used to study the accuracy of the image retrieval results fused with interest selection.

2. Classification Method Based on Interest Selection

2.1. Feature Vector Based on Interest Point Weighting

2.1.1. Weight Matrix

This paper collects interest data with a click experiment to obtain the interest weight of each object region produced by grid sampling. We suppose that $x_i \in \mathbb{R}^n$ $(i = 1, 2, \ldots, n)$ is a set of interest data collected from the click experiment. $\Lambda = \{C_1, C_2, \ldots, C_T\}$ represents the object areas obtained after the experimental scene was sampled by the uniform grid, with $t = 1, 2, \ldots, T$, and $T = |\Lambda|$ represents the number of grid sampling objects in the experimental scene. The users’ interest in grid object $C_t$ of the experimental scene is expressed as the sum of the weights of the points of interest:
$$P_t = \sum_{i=1}^{m} k_i\, x_i^{t} \tag{1}$$
where $m$ represents the number of interest points that fall into object $C_t$ in the experimental scene, $x_i^{t}$ represents the value of interest point $x_i$ in object $C_t$, which is 1, and $k_i$ represents the weight value given to $x_i^{t}$ according to the different interest point weights, with $0 \le k_i \le 1$.
The one-dimensional matrix of the user’s interest in the experimental scene grid object is obtained and expressed as $P = [P_1, P_2, \ldots, P_T]$. In this paper, the weight value given by the experimental scene grid object is described by the interest degree of the experimental scene grid object, and the weight value $\omega_t$ corresponding to object $C_t$ in the experimental scene can be expressed as
(2)

Therefore, the one-dimensional weight matrix $[\omega_1, \omega_2, \ldots, \omega_T]$ is obtained to provide a weight for the eigenvector of each grid object $C_t$.
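
The construction of the interest values and region weights can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the normalisation of each region's interest by the total interest, the row-major ordering of grid cells, the function names, and the sample clicks are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# (x, y, k): pixel position of a click and its weight k, with 0 <= k <= 1.
Click = Tuple[float, float, float]


def interest_per_cell(clicks: Sequence[Click], width: int, height: int,
                      rows: int, cols: int) -> List[float]:
    """Sum the click weights k that fall into each grid cell (row-major order)."""
    interest = [0.0] * (rows * cols)
    for x, y, k in clicks:
        # Clamp so that clicks on the right or bottom border land in the last cell.
        col = min(int(x * cols / width), cols - 1)
        row = min(int(y * rows / height), rows - 1)
        interest[row * cols + col] += k
    return interest


def cell_weights(interest: Sequence[float]) -> List[float]:
    """ASSUMED normalisation: weight = cell interest / total interest."""
    total = sum(interest)
    if total == 0:
        return [0.0] * len(interest)
    return [p / total for p in interest]


# Hypothetical clicks on a 500 x 375 picture split into a 3 x 3 grid.
clicks = [(250, 180, 0.4), (260, 190, 0.3), (40, 30, 0.1),
          (480, 360, 0.1), (250, 100, 0.1)]
P = interest_per_cell(clicks, width=500, height=375, rows=3, cols=3)
w = cell_weights(P)
```

With these sample clicks, the centre cell (index 4) collects the two highest-weighted clicks and therefore receives the largest weight.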

2.1.2. Weighted Eigenvector

In this paper, the set of image eigenvectors extracted from grid objects in the experimental scene is expressed as $Y = \{y_t \mid t = 1, 2, \ldots, T\}$, where $y_t$ is the underlying visual feature of the grid object $C_t$. On this basis, HSV (hue, saturation, and value) and LBP (local binary pattern) are combined to describe the underlying visual features of the grid objects, and a 1 × 131 dimensional visual eigenvector is obtained. This paper uses two methods to express the weighted eigenvector as follows.

The formula of the IWS (interest weighted summation) method is expressed as
(3)
The formula for the IW (interest weighting) method is expressed as
(4)

These two formulas express the two weighted underlying visual eigenvectors, which are then combined with the classification algorithm to explore the impact of interest decisions on the accuracy of image classification.
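
To make the difference between the two weighted eigenvectors concrete, the following Python sketch shows one plausible reading of the method names. It assumes that IWS sums the weighted region features into a single vector of the same length as one region feature, and that IW keeps every weighted region feature and concatenates them. Both forms, the function names `iws` and `iw`, and the toy 3-dimensional features (standing in for the 131-dimensional HSV and LBP features) are assumptions for illustration.

```python
from typing import List, Sequence


def iws(features: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Interest weighted summation (ASSUMED form): sum over regions of weight * feature."""
    dim = len(features[0])
    out = [0.0] * dim
    for y_t, w_t in zip(features, weights):
        for d in range(dim):
            out[d] += w_t * y_t[d]
    return out


def iw(features: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Interest weighting (ASSUMED form): concatenate weight * feature for every region."""
    return [w_t * v for y_t, w_t in zip(features, weights) for v in y_t]


# Toy data: 4 regions (a 2 x 2 grid), 3-dimensional features per region.
features = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [2.0, 2.0, 2.0], [4.0, 0.0, 0.0]]
weights = [0.5, 0.25, 0.25, 0.0]

u = iws(features, weights)  # length 3: one feature vector for the whole picture
v = iw(features, weights)   # length 12: one weighted block per region
```

Under these assumptions, the IWS vector keeps a fixed length regardless of the grid size, whereas the IW vector grows with the number of regions.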

2.2. Classification Algorithm

The SVM (support vector machine) is a supervised learning method that is widely used in statistical classification and regression analysis [14–16]. The SVM originates from a linear classifier. Suppose there is a two-class classification problem in which the data points are $n$-dimensional vectors $x$ with categories $y$, where the value of $y$ is +1 or −1. If a hyperplane $f(x) = \omega^{T}x + b = 0$ exists such that the points with $y = +1$ and the points with $y = -1$ lie on opposite sides of it, then it can be assumed that the $y$ value corresponding to $x$ with $f(x) < 0$ is −1, while the $y$ value corresponding to $x$ with $f(x) > 0$ is +1. In most cases, the data are not linearly separable, so we need to consider how to solve this problem. The SVM method maps the vectors to a higher dimensional space, in which a maximum-margin hyperplane is established. On both sides of the hyperplane that separates the data, there are two parallel hyperplanes. The optimization goal of the separating hyperplane is to maximize the distance between the two parallel hyperplanes.

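To express this distance, let $x_0$ be the perpendicular projection of a point $x$ onto the hyperplane, and let $\gamma$ be the signed distance from $x$ to the hyperplane. Then, $x$ can be written as
$$x = x_0 + \gamma \frac{\omega}{\lVert \omega \rVert} \tag{5}$$
Since $x_0$ is a point on the hyperplane that satisfies $f(x_0) = 0$, substituting $x_0 = x - \gamma\,\omega / \lVert \omega \rVert$ into the hyperplane equation $f(x) = \omega^{T}x + b$ gives
$$\gamma = \frac{\omega^{T}x + b}{\lVert \omega \rVert} = \frac{f(x)}{\lVert \omega \rVert} \tag{6}$$
Since $\gamma$ is signed at this time, its absolute value is required, which is obtained by multiplying $\gamma$ by the category $y$:
$$\tilde{\gamma} = y\gamma = \frac{y\,(\omega^{T}x + b)}{\lVert \omega \rVert} \tag{7}$$
Therefore, the objective function required by SVM is $\max \tilde{\gamma}$, subject to the constraint $y_i(\omega^{T}x_i + b) \ge \hat{\gamma}$ for every training sample $(x_i, y_i)$, $i = 1, \ldots, N$, where $\hat{\gamma} = \tilde{\gamma}\,\lVert \omega \rVert$ and $N$ is the number of training samples. Because rescaling $\omega$ and $b$ does not change the hyperplane, $\hat{\gamma}$ can be fixed to 1 to find the maximum value. The objective function becomes
$$\max_{\omega, b} \frac{1}{\lVert \omega \rVert} \quad \text{s.t.} \quad y_i(\omega^{T}x_i + b) \ge 1,\; i = 1, \ldots, N \tag{8}$$
Maximizing $1 / \lVert \omega \rVert$ is equivalent to minimizing $\frac{1}{2}\lVert \omega \rVert^{2}$, so after a slight adjustment the problem becomes
$$\min_{\omega, b} \frac{1}{2}\lVert \omega \rVert^{2} \quad \text{s.t.} \quad y_i(\omega^{T}x_i + b) \ge 1,\; i = 1, \ldots, N \tag{9}$$
For the above problem, Lagrange multipliers $\lambda_i \ge 0$ are used to calculate the extremum. The Lagrange function is
$$L(\omega, b, \lambda) = \frac{1}{2}\lVert \omega \rVert^{2} - \sum_{i=1}^{N} \lambda_i \left[ y_i(\omega^{T}x_i + b) - 1 \right] \tag{10}$$
Then, setting the partial derivatives of $L$ with respect to $\omega$ and $b$ to zero, we obtain
$$\omega = \sum_{i=1}^{N} \lambda_i y_i x_i, \qquad \sum_{i=1}^{N} \lambda_i y_i = 0 \tag{11}$$
Next, these results are substituted into $L$; then, the problem becomes
$$\max_{\lambda} \sum_{i=1}^{N} \lambda_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \lambda_i \lambda_j y_i y_j x_i^{T}x_j \quad \text{s.t.} \quad \lambda_i \ge 0,\; \sum_{i=1}^{N} \lambda_i y_i = 0 \tag{12}$$
Finally, $\omega = \sum_{i=1}^{N} \lambda_i y_i x_i$ is substituted into $f(x)$; then, we can obtain
$$f(x) = \sum_{i=1}^{N} \lambda_i y_i x_i^{T}x + b \tag{13}$$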

The above formula is the application of SVM in image classification by integrating the choice of interest.
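
As a small worked check of the decision function, the Python sketch below evaluates the standard linear SVM in both its dual form (a sum over training points weighted by Lagrange multipliers) and its primal form (a weight vector and bias), and confirms that the two agree. The two training points, their multipliers, and the bias are a hand-solved toy problem chosen for illustration; they are assumptions and are not data or results from the paper.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two vectors of equal length."""
    return sum(p * q for p, q in zip(a, b))


def weight_vector(X, y, lam) -> List[float]:
    """omega = sum_i lambda_i * y_i * x_i."""
    dim = len(X[0])
    return [sum(l * t * x[d] for x, t, l in zip(X, y, lam)) for d in range(dim)]


def decision_dual(x, X, y, lam, b: float) -> float:
    """f(x) = sum_i lambda_i * y_i * <x_i, x> + b."""
    return sum(l * t * dot(xi, x) for xi, t, l in zip(X, y, lam)) + b


def decision_primal(x, omega, b: float) -> float:
    """f(x) = <omega, x> + b."""
    return dot(omega, x) + b


# Toy problem (hypothetical): one point per class, both are support vectors.
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [1, -1]
lam = [0.25, 0.25]   # satisfies sum_i lambda_i * y_i = 0
b = 0.0

omega = weight_vector(X, y, lam)
margins = [t * decision_primal(x, omega, b) for x, t in zip(X, y)]
label = 1 if decision_dual((-3.0, 1.0), X, y, lam, b) > 0 else -1
```

Both training points sit exactly on the margin (their values in `margins` equal 1), which is what the constraint of the optimization problem requires of support vectors.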

3. Point of Interest Experiment

3.1. Purpose

According to the experimental requirements, the subjects needed to click on five points in the experimental picture that they were interested in, and the clicked positions represented the points of interest selected by the subjects. Based on this, the weighted eigenvector is brought into the classification algorithm to study whether the eigenvector based on points of interest has an impact on the accuracy of image classification.

3.2. Subjects

A total of 42 subjects (30 males and 12 females), aged 21 to 25 with an average age of 23, were invited. They had never participated in similar experiments before. They were right-handed, had a strong understanding of the experiment, and completed it well according to the requirements of the experimenter.

3.3. Experimental Equipment

The experiment used a Dell OptiPlex 790 desktop computer with a CPU frequency of 3.1 GHz and a 500 GB hard disk, running the Windows XP operating system.

3.4. Experimental Materials

In the experiment, four kinds of pictures, including kitten, puppy, motorcycle, and car pictures, in the PASCAL VOC2007 database were selected. Fifteen pictures of each kind were randomly selected to form a material library of 60 pictures, and the pictures were numbered from 01 to 60 consecutively. According to the specific requirements and specifications of the experiment, the experimental pictures were uniformly adjusted to 500 × 375 pixels. Figure 1 shows examples of the pictures used in the experiments.

Figure 1. Examples of experimental pictures. (a) Picture of a motorcycle. (b) Picture of a car. (c) Picture of a puppy. (d) Picture of a kitten.

3.5. Experimental Results and Analysis

3.5.1. Data Screening and Description

Each subject completed an experimental task with 30 pictures by selecting 5 points of interest from each picture; with 42 subjects and 60 pictures, this yielded an average of 105 interest points per picture. According to the inquiry and investigation after the experiment, interest points that were incorrectly chosen by subjects were eliminated to ensure the objectivity and accuracy of the experiment. After the abnormal data were removed, the interest point coordinates of each experimental image were derived from the screen coordinates and transformed into 500 × 375 pixel coordinates by a mathematical transformation. This ensured that the image coordinates were within the pixel range, in which the value of the X axis ranged from 0 to 500 and that of the Y axis ranged from 0 to 375.
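
The bookkeeping in this step can be illustrated with a short Python sketch. It first reproduces the average of 105 points per picture from the figures given in the text, and then maps a screen click to image coordinates. The paper does not specify the transformation, so the linear mapping, the on-screen rectangle (`left`, `top`, `shown_w`, `shown_h`), and the function name are assumptions for illustration only.

```python
from typing import Optional, Tuple

# 42 subjects x 30 pictures each x 5 clicks, spread over 60 pictures.
points_per_picture = 42 * 30 * 5 / 60


def screen_to_image(sx: float, sy: float,
                    left: float, top: float,
                    shown_w: float, shown_h: float,
                    img_w: int = 500, img_h: int = 375) -> Optional[Tuple[float, float]]:
    """Map a screen click to image pixel coordinates.

    ASSUMES the picture is shown as an axis-aligned rectangle whose top-left
    corner is at (left, top) and whose on-screen size is shown_w x shown_h.
    Returns None for clicks outside the picture so they can be discarded.
    """
    inside = left <= sx <= left + shown_w and top <= sy <= top + shown_h
    if not inside:
        return None
    x = (sx - left) * img_w / shown_w
    y = (sy - top) * img_h / shown_h
    return x, y


# Hypothetical layout: picture drawn at (140, 120), enlarged to 1000 x 750.
centre = screen_to_image(640, 495, left=140, top=120, shown_w=1000, shown_h=750)
outside = screen_to_image(10, 10, left=140, top=120, shown_w=1000, shown_h=750)
```

Returning `None` for clicks outside the picture is one simple way to keep every retained coordinate within the 0 to 500 and 0 to 375 ranges.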

Figure 2 shows the distribution of the valid interest points of all subjects on the example pictures of motorcycles, cars, puppies, and kittens, in which the first points of interest selected by the subjects are marked with red asterisks, and the second to fifth points of interest are marked with blue dots. Figure 2 shows that the interest points are mainly distributed on the target object or foreground object rather than the picture background. The distributions of the interest points on the motorcycle and car target objects are relatively uniform, and the interest points on the puppy and kitten target objects are relatively concentrated. In particular, the first interest points on the puppy and kitten pictures are mainly distributed on the faces.

Figure 2. Distributions of points of interest on experimental pictures. (a) Interest points on a motorcycle. (b) Interest points on a car. (c) Interest points on a puppy. (d) Interest points on a kitten.

3.5.2. Data Results and Analysis

First, object classification is carried out according to the given standards for the four kinds of pictures of kittens, puppies, motorcycles, and cars. When studying the classification of a specific target, this paper takes one kind of picture as the target object and the other three kinds of pictures as interference objects. On this basis, binary classification of the sample objects is realized by combining the method without points of interest, the IW method, and the IWS method with the SVM algorithm, and these methods are named SVM, IW-SVM, and IWS-SVM, respectively.
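
The target-versus-interference setup amounts to relabelling the four picture categories as +1 and −1 before training a binary classifier. A minimal Python sketch follows; the function name and the list layout are assumptions, while the 15 pictures per category come from the description of the experimental materials.

```python
from typing import List, Sequence


def one_vs_rest_labels(categories: Sequence[str], target: str) -> List[int]:
    """Label pictures of the target category +1 and all interference pictures -1."""
    return [1 if c == target else -1 for c in categories]


# 15 pictures for each of the four categories, as in the material library.
categories = (["kitten"] * 15 + ["puppy"] * 15 +
              ["motorcycle"] * 15 + ["car"] * 15)

labels_by_target = {
    target: one_vs_rest_labels(categories, target)
    for target in ("kitten", "puppy", "motorcycle", "car")
}
```

Each of the four label vectors contains 15 positive and 45 negative examples, so the classes are imbalanced at a ratio of 1 to 3 in every experimental scene.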

Second, for different numbers of uniform grid segmentations, the IW-SVM and IWS-SVM methods are used to study the average accuracy of target object classification. As the number of regions obtained by uniform grid segmentation increases from 2 × 2 to 8 × 8, the average classification accuracies of the target objects of the two methods show a decreasing trend, indicating that increasing the number of segmented regions does not improve the classification accuracy of the target objects. The study found that the average accuracy was highest, reaching 0.8, in the case of a 3 × 3 segmentation grid.

Finally, based on the classification results for the four kinds of target objects, this paper selects the 3 × 3 segmentation grid to explore the accuracy of target object classification in the four types of experimental scenes. We then randomly generate groups of weights for the subjects’ interest points; for example, the weights of the first to fifth interest points may be [0.2, 0.2, 0.2, 0.2, 0.2]. This study finds that the average accuracy of the target object classification result is highest when the weights of the first to fifth interest points are [0.4, 0.3, 0.1, 0.1, 0.1]. This shows that the first and second interest points of the subjects better reflect the individuals’ intentions and needs and therefore have high research significance and value for target object classification. Table 1 lists the classification accuracies obtained by the SVM, IW-SVM, and IWS-SVM methods, whose averages are 0.78, 0.75, and 0.85, respectively. Among them, the IWS-SVM method has the highest average accuracy. This shows that the addition of the IWS method can provide high-level semantics to the target object and improve the accuracy of target classification. As Table 1 shows, the interest point selection method introduced in this study can improve the overall satisfaction.

Table 1. Accuracy of target object classification.
Target object    SVM     IW-SVM    IWS-SVM
Kitten           0.80    0.74      0.85
Puppy            0.85    0.78      0.90
Motorcycle       0.70    0.72      0.84
Car              0.75    0.76      0.81
Average          0.78    0.75      0.85
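
The averages in Table 1 can be checked directly from the four per-category accuracies. The Python sketch below does this with exact decimal arithmetic. Rounding half up to two decimal places is an assumption about how the reported averages were rounded; the variable and function names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, List

# Per-category accuracies from Table 1, in the order kitten, puppy, motorcycle, car.
table: Dict[str, List[str]] = {
    "SVM": ["0.80", "0.85", "0.70", "0.75"],
    "IW-SVM": ["0.74", "0.78", "0.72", "0.76"],
    "IWS-SVM": ["0.85", "0.90", "0.84", "0.81"],
}


def mean_2dp(values: List[str]) -> Decimal:
    """Exact mean of decimal strings, rounded half up to two decimal places (assumed rule)."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


averages = {method: mean_2dp(values) for method, values in table.items()}
best_method = max(averages, key=averages.get)
```

The exact mean for the plain SVM column is 0.775, so the reported 0.78 depends on rounding the half upwards; the other two columns average to exact two-decimal values.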

4. Conclusion

To overcome the disadvantages of visual attention computational models and eye-tracking technology in optimizing image recommendations, this paper proposes an image classification method based on interest selection. We focus on explaining the eigenvector of weighted interest points and complete relevant click experiments to realize the classification of experimental scene objects. The experimental results show that (1) the subjects’ first and second choices of interest have a great impact on the target classification in the experimental scenes; (2) the IWS-SVM method has the best overall effect on the target object classification in the four kinds of experimental scenes; (3) the accuracy of target classification with the IWS-based classification algorithm is higher than that of the traditional method and the IW method; and (4) the interest point method can effectively improve image information retrieval. Our results show that interest point selection can be used as a novel solution to overcome the semantic gap. Future work could use other information (e.g., eye movements and electroencephalogram) to further improve the overall satisfaction of image recommendation.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (no. 61903346), Postdoctoral Research Project in Jiangsu Province (nos. 2020Z034 and 2019K086), Natural Science Research Project of Colleges and Universities in Jiangsu Province (no. 18KJB520007), China Postdoctoral Science Foundation (no. 2020T130129ZX), and Research Project of Philosophy and Social Sciences at Jiangsu Universities (no. 2020SJA0767).

Data Availability

The images used to support the findings of this study are available from the corresponding author upon request.
