International Journal of Intelligent Systems

Volume 2025, Issue 1 8819489

Research Article

Open Access

A Protecting-Privacy Path Query Supporting Semantic-Based Multikeyword Search Over Ciphertext Graph Data in Cloud Computing

Bin Wu,

Bin Wu

orcid.org/0000-0003-4848-7316

School of Computer and Big Data Science , Jiujiang University , Jiujiang , Jiangxi, 332005 , China , jju.edu.cn

Jiujiang Key Laboratory of Network and Information Security , Jiujiang , Jiangxi, 332005 , China

Search for more papers by this author

Zhuolin Mei,

Zhuolin Mei

orcid.org/0000-0002-8941-0844

School of Computer and Big Data Science , Jiujiang University , Jiujiang , Jiangxi, 332005 , China , jju.edu.cn

Jiujiang Key Laboratory of Network and Information Security , Jiujiang , Jiangxi, 332005 , China

Search for more papers by this author

Jiaoli Shi,

Jiaoli Shi

orcid.org/0000-0002-8908-1921

School of Computer and Big Data Science , Jiujiang University , Jiujiang , Jiangxi, 332005 , China , jju.edu.cn

Search for more papers by this author

Zongmin Cui,

Zongmin Cui

orcid.org/0000-0001-5592-425X

School of Computer and Big Data Science , Jiujiang University , Jiujiang , Jiangxi, 332005 , China , jju.edu.cn

Search for more papers by this author

Zhiqiang Zhao,

Zhiqiang Zhao

orcid.org/0000-0001-6627-9807

School of Mathematics and Computer Science , Ningxia Normal University , Guyuan , Ningxia, 756099 , China , nxtu.cn

Search for more papers by this author

Jinzhou Huang,

Corresponding Author

Jinzhou Huang

[email protected]

orcid.org/0009-0001-4217-6894

School of Computer Engineering , Hubei University of Arts and Science , Xiangyang , Hubei, 441053 , China , hbuas.edu.cn

Search for more papers by this author

Bin Wu,

Bin Wu

orcid.org/0000-0003-4848-7316

School of Computer and Big Data Science , Jiujiang University , Jiujiang , Jiangxi, 332005 , China , jju.edu.cn

Jiujiang Key Laboratory of Network and Information Security , Jiujiang , Jiangxi, 332005 , China

Search for more papers by this author

Zhuolin Mei,

Zhuolin Mei

orcid.org/0000-0002-8941-0844

School of Computer and Big Data Science , Jiujiang University , Jiujiang , Jiangxi, 332005 , China , jju.edu.cn

Jiujiang Key Laboratory of Network and Information Security , Jiujiang , Jiangxi, 332005 , China

Search for more papers by this author

Jiaoli Shi,

Jiaoli Shi

orcid.org/0000-0002-8908-1921

School of Computer and Big Data Science , Jiujiang University , Jiujiang , Jiangxi, 332005 , China , jju.edu.cn

Search for more papers by this author

Zongmin Cui,

Zongmin Cui

orcid.org/0000-0001-5592-425X

School of Computer and Big Data Science , Jiujiang University , Jiujiang , Jiangxi, 332005 , China , jju.edu.cn

Search for more papers by this author

Zhiqiang Zhao,

Zhiqiang Zhao

orcid.org/0000-0001-6627-9807

School of Mathematics and Computer Science , Ningxia Normal University , Guyuan , Ningxia, 756099 , China , nxtu.cn

Search for more papers by this author

Jinzhou Huang,

Corresponding Author

Jinzhou Huang

[email protected]

orcid.org/0009-0001-4217-6894

School of Computer Engineering , Hubei University of Arts and Science , Xiangyang , Hubei, 441053 , China , hbuas.edu.cn

Search for more papers by this author

First published: 15 May 2025

https://doi.org/10.1155/int/8819489

Academic Editor: Songpon Teerakanok

Share a link

Email
Wechat
Bluesky

Abstract

With the rapid development of information technology and intelligence, there are more and more usage scenarios of graph data. Path queries have always been a hot topic of research for scholars. There are already many mature methods and ideas in the study of the path query on the plaintext graph. For the path query on graph data in the case of cloud outsourcing, it is necessary to consider both the construction of query algorithms and the protection of privacy information. The processing of graph privacy information through encryption, and then outsourcing to the cloud platform, is a common measure. The semantic-based path query supporting multikeyword search is an extended path query method, which can improve the query function. For users, it is very troublesome to access and process the encrypted graph data. In this article, we propose a protecting-privacy semantic-based multikeyword path query scheme on the ciphertext graph (PSMP). Firstly, based on the principle of searchable encryption and vector operation, a secure index is constructed, and then the cloud server uses the secure index to implement path queries. This article demonstrates its security through formal analysis and verifies its effectiveness through experimental comparison and analysis. The work of this article has a certain promoting effect on the query processing and analysis of ciphertext graph data.

1. Introduction

The emergence of cloud computing and massive graph data has brought a lot of convenience and more choices to people’s lives and work. Cloud computing platforms can bring outsourcing services and powerful computing and processing capabilities [1–3]. The owner of graph data can use cloud computing platforms to store and process this large amount of data, which is not only a cost saving but also a better choice for the convenience of using this data [4, 5]. Graph data are very commonly used in practical applications. Graph vertices can represent entity information, and the edges between vertices contain the interrelationships between entities. The graph data contain a lot of sensitive information. For example, there are user relationships and personal information in social networks, gene and patient information in the biomedical graph, and transaction data information in the financial transaction graph. If these data are released to the cloud platform without security processing, sensitive information will be leaked to the cloud server, which will bring the risk of privacy disclosure [6, 7]. Therefore, for security reasons, these graph data need to undergo certain security considerations and operations before being published on cloud platforms. Encrypting sensitive data is one of the important and commonly used methods [8, 9]. Another issue that comes with it is that accessing and using these encrypted graph data on cloud platforms is very inconvenient. Therefore, it is very urgent and important to directly query and process ciphertext graph data on cloud platforms.

In the use of graph data, the path query is a very common and important operation, which is an important basic primitive of graph data processing [10, 11]. The path query provides support for research on the shortest path or path optimization problem in graph data usage [11, 12]. The path query supporting semantic-based multikeyword search can bring users a more flexible query range and more choices. For example, in urban network graph data, a path query research is used to determine locations with connectivity relationships. The vertex represents two locations in the graph, and the edge between the vertices represents the direct connection path between the two locations. The semantic-based path query that supports multikeyword search operations is to complete the query request with semantic multikeyword joint expressions and return the query results that meet the conditions to the user [13, 14]. To achieve the path query, we use the stem extraction algorithm, the vector processing principle, and the encryption mechanism to construct query indexes and query symbols [15, 16]. At the same time, considering the cost and overhead factors, it is very necessary and valuable to study the semantic-based path query supporting multikeyword search on the ciphertext graph in cloud computing.

For query processing operations in cloud outsourcing scenarios, searchable encryption is a practical and effective cryptographic primitive [17–21]. In cloud outsourcing data query and processing, the outsourced data are securely processed, and a secure index is created. The cloud server can perform queries and other operations without knowing the relevant privacy information of the outsourced data. The searchable encryption has been developing and improving continuously since its initial appearance, which provides an important solution for remote outsourcing data query and processing. In the subsequent research, many researchers have extended and expanded the searchable encryption scheme [22–25]. But these existing searchable encryption schemes cannot operate directly on path queries on encrypted graph data. Meanwhile, many researchers have proposed some query schemes related to ciphertext graph data [26–30]. Cao et al. first proposed and solved the subgraph query problem on the outsourced graph data and ensured its security [26]. The literature [27] first proposed structured encryption, which can perform query operations on any structured data such as graph data. The problem of the shortest path query on the encryption graph was studied in the literature [28], and the security information was protected at the same time. The literature [29] and the literature [30] study the privacy-preserving graph similarity query problem and aggregate the query over public property graphs, respectively. However, the existing solutions cannot be used to handle the path query problem supporting semantic-based multikeyword search on ciphertext graph in cloud computing.

To address the problem in the outsourcing graph, we propose a solution to implement a protecting-privacy semantic-based multikeyword path query on ciphertext graph in cloud computing (PSMP). In the PSMP scheme, the server on the remote cloud platform operates on the semantic-based multikeyword path query on the ciphertext graph through a secure index and a query token and does not have access to the relevant privacy information. We first perform stem extraction and orthonormal vector transformation on graph vertices. Next, the secure query index and the encrypted query expression are constructed based on the newly generated vertex vector. The cloud server finally realizes the privacy-protecting path query with the help of the index and secure expression. After the cloud server completes the search, the obtained search results are sent to the query user, and the cloud server cannot obtain the privacy contents related to the query results. This article demonstrates the security and effectiveness of the proposed PSMP scheme through formal analysis and experimental comparison. The contributions of this article are summarized as follows:

1.
In the cloud outsourcing environment, a novel solution-dubbed PSMP is presented to protect the privacy information.
2.
The effective use of stem extraction and vector operations in graph vertex processing improves the efficiency of path queries in ciphertext.
3.
The construction of the path query method in ciphertext and the effective combination of searchable encryption mechanisms have improved the efficiency of the query and ensured their security.

The rest of the article is as follows: Section 2 introduces the work related to ciphertext graph queries. Section 3 proposes and designs the PSMP solution. Section 4 provides a security analysis of the PSMP solution. Afterward, we evaluated the effectiveness of our solution through experimental comparison and analysis in Section 5. Finally, Section 6 summarizes the article.

2. Related Work

The wide existence and deep use of cloud services in various industries have brought great convenience to people’s lives and production. It provides the possibility and opportunity for the outsourcing of large amounts of data [31–33]. Meanwhile, the security of these outsourced data is an important issue to consider; otherwise, sensitive data information will be leaked [34, 35]. The hidden danger of outsourcing private data brought by cloud servers and external attackers makes people attach great importance to the security of outsourcing data. This has also attracted many researchers to conduct in-depth research on this security issue, and a lot of research results have appeared [36–38]. Searchable encryption among them is one of the outstanding research achievements. There are two modes of symmetric searchable encryption (SSE) and asymmetric searchable encryption (ASE) based on the different ways of encryption construction [17, 18, 20, 21]. The asymmetric encryption scheme has slow encryption speeds and high computational costs, while the symmetric encryption scheme is efficient and fast, which is more suitable for encrypting large amounts of data. Considering the efficiency of operation, this article adopts the SSE mechanism.

The searchable encryption is an effective and important cryptographic primitive in remote outsourced data retrieval services. A lot of related research results continue to appear, which provides valuable ideas and guidelines for the ciphertext query under cloud outsourcing [17–21]. Song et al. first proposed the concept of searchable encryption to solve the query problem on outsourced data [17]. The idea of index construction was proposed in the query of remote data by using Bloom filters in the literature [18]. Subsequently, based on the idea of constructing indexes, a more efficient SSE scheme was proposed in the literature [20], and the privacy guarantee of nonadaptive SSE and adaptive SSE was analyzed. Curtmola et al. firstly proposed a searchable encryption scheme based on public key cryptography, which promoted the development of searchable encryption primitives [21]. With the increase of demand and the development of technology, many searchable encryption schemes with more functions have emerged [22–25]. Baek et al. improved and expanded the design method and index construction based on the literature [20] and proposed a dynamic SSE scheme [22]. Xu et al. presented a dynamic multikeyword ranked search scheme over encrypted cloud data by designing the tree-based index structure and the greedy breadth first algorithm [23]. Zhong et al. proposed a multikeyword fuzzy query scheme supporting dynamic update by using the Bloom filter and unary grammar [24]. Tong et al. proposed a privacy-preserving boolean range query with temporal access control in mobile computing and presented an enhanced scheme with faster-than-linear search complexity by constructing a Quadtree index structure [25]. These solutions are designed to implement queries for a specific need in outsourced file data, and these solutions cannot be used to solve the semantic-based multikeyword path query on the ciphertext graph.

With the rapid development of graph data use in various scenarios, the outsourcing and sharing of graph data are more and more frequent, and the security of graph outsourcing has attracted people’s attention. The related research is also constantly emerging and accumulating [26–30]. The secure query problem of the subgraph in remote outsourcing graph data was firstly proposed in the literature [26], and the security query solution was given. The concept and ideas of structural encryption were proposed in the literature [27] and were used to solve the query problem of outsourced graph data. The shortest distance query problem with approximation constraints was solved in the literature [28], while ensuring privacy and security issues. Zheng et al. proposed the graph similarity query scheme in the filter and verification framework and designed a suite of private algorithms based on a symmetric homomorphic encryption scheme [29]. Guan et al. introduced a property graph aggregate query scheme to design the vertex matching and matching update techniques, which achieved aggregate queries over public property graphs under the premise of protecting privacy [30]. The query schemes on the abovementioned graph data solve the problems of the subgraph query, the adjacency query, the shortest distance query, and so on. They have their own characteristics and solve their own problems, and these graph query implementation techniques were not able to handle the path query of the outsourcing graph supporting the semantic-based multikeyword search in the cloud.

To solve the problem, we propose a new solution to achieve the protecting-privacy path query by means of stem extraction, the orthonormal vector conversion algorithm, and inner product operation in this article and solve the problem of semantic-based multikeyword search on the ciphertext graph in cloud outsourcing.

3. Proposed Solution

3.1. Preliminary Knowledge

In the security protection of massive information, cryptography is a very important means of realization and plays an important role in the realization of privacy protection. The idea of semantic security was proposed in the literature [39]. If an attacker can perform computations on the plaintext given the ciphertext, then he can also perform computations without the ciphertext. In this case, the system is semantically secure [39]. Considering the computing overhead and efficiency under remote outsourcing, we use the semantically secure symmetric encryption mechanism in our construction scheme. It consists of three algorithms Gen, Enc, and Dec, where Gen is the secret key generation algorithm, Enc is the encryption algorithm, and Dec is the decryption algorithm.

To implement the semantic-based multikeyword search, the Porter stem extraction algorithm is used to convert graph vertices to semantically similar words. The set of graph vertices is V = {v₁, …, v_n}, and the set transformed by stem extraction is

, where m ≤ n. It should be noted that this article is not only limited to a particular stem extraction algorithm. Then, the set of converted words is converted into orthonormal vectors by using the Gram–Schmidt algorithm. In the query processing, the query results are obtained by performing the inner product calculation of the vector. Let {μ₁, μ₂, …, μ_n} be a linearly independent vector set of a Euclidean space, π(μ_i, μ_j) represents the scalar product of μ_i and μ_j, and the orthogonalization process of the Gram–Schmidt algorithm is shown in the following formulas:

()

After the conversion, the standard orthogonalization vector set of the set {μ₁, μ₂, …, μ_n} is {ρ₁, ρ₂, …, ρ_n}. The main notations in this article and their corresponding descriptions are shown in Table 1.

Table 1. Notations and the descriptions.

Notations	Description
G	The outsourcing graph data
I	The secure index
n	The number of graph vertices
V	The vertex set of the graph G, V = {v₁, …, v_n}
l	The security parameter
p	The maximum number of paths for the graph vertex
Q_i	The encrypted query token, where 1 ≤ i ≤ n
R_i	The query result set of the query token Q_i
Enc_key(·)	The symmetric encryption mechanism
Dec_key(·)	The symmetric decryption mechanism

3.2. Overview of the Solution

In the outsourcing service system of cloud computing, the system model of the path query is shown in Figure 1. There are three roles in this system model, which are the cloud server, the data owner, and the user. The data owner outsources the graph data in ciphertext to the cloud server, which is responsible for processing and manipulating the encrypted graph data. This effectively ensures the security of privacy information for outsourced graph data. In this work, we adopt the ideas in the previous ciphertext query literature for the query authorization and key information of the query user [20, 22].

Details are in the caption following the image — **Figure 1**
Open in figure viewer PowerPoint

The system model of the path query over the encrypted graph.

This work resolve the issue of the path query supporting semantic-based multikeyword search on the encryption graph and can realize the following goals:

1.
Query users can realize path query supporting semantic-based multi-keyword search by the cloud server in cloud computing.
2.
The security analysis can be provided, and the cloud server does not know the privacy information of the graph data during the query process.
3.
The proposed solution can meet the effectiveness of the path query by experimental analysis.

In the article, our proposed query solution is established with the help of the stem extraction algorithm, vector operation, and searchable encryption. The vertex set of the graph is transformed into a new set of words through stem extraction and orthogonal vector transformation. Based on the newly generated set of vectors, we next build the secure index. The cloud server finally performs path queries on the remote platform through the secure index. The proposed solution in this article is consistent with the existing schemes [20, 22], and we assume that servers in the cloud environment adopt an adaptive attack model. Query users have query permissions only when they are authorized by the data owner [20].

3.3. Implementation of Our Solution

The main purpose of this solution is to realize the path query on the remote cloud server, and the main work to achieve this goal is the construction of the secure index. The execution process of the solution in this article is shown in Figure 2.

The several algorithms used in the scheme construction of this article are described as follows:

•
keyGen (l): This algorithm takes the security parameter l as input and outputs a symmetric secret key.
•
ConsChaintable( G, k): The vertex set V of the graph is transformed into a word set V^′ through the stem extraction algorithm. It then takes the set V^′ as input and outputs a normalized orthogonal set W. Next, a chain table of path information between vertices in the graph is constructed. It takes the graph G and a symmetric secret key k as input and outputs a chain table C.
•
ConsIndex( C, k): Constructing an index for outsourcing path queries. It takes the chain table C and a symmetric secret key set k as input and outputs a query index I.
•
ConsToken( V^′, k): Constructing a query token consisting of semantically similar words for several vertices. It takes the set V^′ and a symmetric secret key set k as input and outputs a query token Q.
•
ExecutingQuery( I, Q): It takes the query index I and the query token Q as inputs and returns the query path contents.

In this article, we use l as a security parameter and use Enc and Dec to represent symmetric encryption and decryption functions, respectively. The implementation details of our proposed solution are as follows:

3.3.1. Constructing the Path Chain Table

We construct the path chain table in this section, and the construction process is shown in Figure 3. To realize the requirement of the semantic query, this work transforms the vertex set into a new set by means of the Porter stem extraction algorithm. Then, we proceed to normalized orthogonal transformation and path chain table construction. The implementation details of this part are given in Algorithm 1. The vertex set of the outsourced graph is represented as V = {v₁, …, v_n}, and the converted set is represented as , where m ≤ n. To protect the security of element information in the set, it is necessary to encrypt the elements of the set. Each element is encrypted by an encryption function Enc_k(·), where 1 ≤ i ≤ m and k is a key. The element is encrypted and represented as , that is t_i ∈ {0, 1}^∗. All elements are encrypted and represented as the set T = {t₁, …, t_m}. We next convert the set T to a normalized orthogonal set by formulas (1)–(3). All values are converted and represented as a keyword set W = {w₁, …, w_m}. For each element w_i in the set W, we construct its path information chain table C_i. The number of paths for this element is expressed as |Ci|. The maximum number of paths in all m elements is represented by p. Each node in the chain table stores the information about two vertices connected by paths. After the path chain tables corresponding to all elements are constructed, it is represented as the set C = {C₁, …, C_m}.

Algorithm 1:ConsChaintable.

Input: The vertex set V of outsourced graph G
Output: The path chain table set C
1.
The vertex set of outsourced graph is V = {v₁, …, v_n}. The vertex set is transformed using Porter stem extraction algorithm, and the resulting set is represented as ./^∗m ≤ n^∗/
2.
for alli ∈ [1, m]do
3.
The element of the set V^′ is encrypted by the encryption function Enc_k(·), and it is represented as ./^∗k is a key, and t_i ∈ {0, 1}^∗^∗/
4.
end for All elements in set V^′ are encrypted and represented as set T = {t₁, …, t_m}.
5.
Construct the normalized orthogonal set of the set T using formulas (1)–(3), as follows. a₁ = t₁;/^∗a₁, …, a_m are temporary variables.^∗/
6.
for allk ∈ [2, m]do
7.
a_k = .
8.
end for
9.
for allk ∈ [1, m]do
10.
.
11.
end for
12.
The normalized orthogonal set obtained after calculation is W = {w₁, …, w_m}.
13.
for alli ∈ [1, m]do
14.
for allj ∈ [1, m]do
15.
x_i,j = Enc_k(value_path);/^∗value_path represents the content of path information, and if the path exists, use the keywords connection to represent it, otherwise use 0 to represent it.^∗/
16.
C_i[j] = x_i,j.
17.
end for
18.
end for
19.
C = {C₁, C₂, … C_m};
20.
returnC.

Through analysis, we can see that the time complexity of stem conversion and normalized orthogonal vector construction for graph vertices is O(n). The time complexity of constructing path chain tables for C_i is O(p) (p is the maximum number of paths for vertices in the outsourcing graph). Thus, the time complexity of constructing all path chain tables is O(m·p), and it is approximately expressed as O(m³).

3.3.2. Constructing the Query Index

To ensure the implementation of the secure path query on the cloud server, we will construct a secure index to meet the query requirements of remote outsourcing services, and the construction process is shown in Figure 4. The cloud server implements the user’s path query request through the secure index. The specific description of this index construction is given in Algorithm 2. For each element w_i in the set W, its corresponding set of path information is C_i. We next build its path information label set and represent it as , where |C_i| represents the number of paths. The path information corresponding to each element in the label set will be stored in the index table I. And the storage location of the index table is also encrypted. Each element in the label set also matches only one entry in the index table I. The number of entries in index I is the sum of the number of paths for all vertices, which we express in terms of max. Therefore, when the path information query of a query request is performed, it is also translated into the index table I query for the element corresponding to the label. After processing all the path information, the constructed index table I is submitted to the cloud server.

Algorithm 2:ConsIndex.

Input:W, C
Output:I
1.
The normalized orthogonal keyword set is W = {w₁, w₂, …, w_m}, and the path chain tables set is C = {C₁, C₂, …, C_m};
2.
for alli ∈ [1, m]do
3.
for allj ∈ [1, |C_i|]do
4.
;/^∗building the path information label set^∗/
5.
end for
6.
end for
7.
for alli ∈ [1, m]do
8.
do
9.
e = Enc_k(w_i‖j);/^∗e is a storage location in the index table^∗/
10.
I[e] = C_i[j];
11.
end for
12.
end for
13.
returnI./^∗The secure index^∗/

In the implementation of index construction, the time complexity of building all path information labels is O(m·|C_i|) = O(m·p) (p is the maximum number of paths), and the time complexity of building the index table is O(m·p). Therefore, the time complexity of building the secure index tree is O(m·p).

3.3.3. Performing Query

The constructed secure index and encryption graph are outsourced to the cloud server. We then build a token for the cloud query based on the query request. Our next step is to implement secure path queries in the cloud. The concrete implementation is given in Algorithm 3. When s query words are requested for the cloud query, they are first converted into new words with similar semantics through the stem extraction algorithm and then converted into normalized orthogonal keywords. They have already undergone security processing during construction, and the entire construction process is similar to the construction method in Algorithm 1. The inner product calculation of any two vector keywords a and b is represented as π(a, b). We perform disjunctive computation on s query normalized orthogonal keywords and get a query expression Q = q₁∨q₂ … ∨q_s. When the query is executed, the query expression Q is sent to the cloud server, and the cloud server calculates the inner product of the query expression Q and each item of the index I. If the result of the inner product calculation has a value of 1, it indicates that a path exists, and the information content of the existing path is returned to the query user, and if the result of the calculation is 0, the query expression does not have a path.

Algorithm 3:ExecutingQuery.

Input:I, Q
Output:R
1.
For s query terms t₁, t₂, …, t_s, through stem extraction transformation and normalized orthogonal keywords construction, we have obtained s encrypted keywords: q₁, q₂, …, q_s. Next a query token Q is built, which is a disjunctive expression, namely Q = q₁∨q₂ … ∨q_s.
2.
for alli ∈ [1, max]do
3.
x = π(Q, I[i]);/^∗inner product calculation^∗//^∗ max represents the number of entries in index I^∗/
4.
ifx = 1then
5.
the contents of I[i] is stored in the set R;
6.
end if
7.
end for
8.
returnR.

In the query operation, the time complexity of the query token construction is O(s), and the time complexity of the query execution process is O(max). Therefore, the time complexity of the query algorithm ExecutingQuery is O(max).

4. Security Analysis

In this section, we conduct a security analysis on the proposed scheme we have constructed. Several of the concepts [20] we use in our security analysis are described below.

History: This is an interaction between a user and the cloud server when a user requests a query operation. It contains an outsourced graph dataset and several query requests, which can be represented as H_q = (G, req₁, …, req_q). The partial history is expressed as , where t ≤ q.
View: In the query interaction, what the cloud server sees is the encrypted information, and the view is defined as V_k(H_q) = (Enc_k(G), I, Q₁, …, Q_q). The partial view is , where t ≤ q.
Access pattern: Given the history H_q, it is a tuple and is represented as R(H_q) = (R₁, …, R_q), where R_i(1 ≤ i ≤ q) is the query results of the query token Q_i.
Search pattern: Given the history H_q, it is denoted as a binary symmetric matrix (Tex translation failed), such that ∏_q[i, j] = 1 if Q_i = Q_j, and ∏_q[i, j] = 0 otherwise, for 1 ≤ i, j ≤ q.
Trace: Given the history H_q, it is defined as T_r(H_q) = (|Enc_k(G)|, R(H_q), (Tex translation failed)), where |Enc_k(G)| represents the size of the encrypted graph information; R(H_q) and (Tex translation failed), respectively, represent the access pattern and the search pattern about the history H_q. The trace of partial history is represented as , where t ≤ q.

The simulation-based approach is used in the security proof of our scheme to prove that our query scheme satisfies adaptive semantic security [19, 20]. The cloud server can select query requests based on previous search information and search results. For the security evaluation, we follow the proof principle in the previous searchable symmetric encryption scheme [20, 22]. When two histories with the same trace are given, the cloud server cannot distinguish the views of these two histories. During the execution of the query, the cloud server obtains no additional contents other than the trace information [20]. The security theorem and the proof of the query scheme in this article are given as follows:

Theorem 1. The PSPM solution meets the adaptive semantic security.

Proof 1. To prove the security of our path query scheme, we first define a polynomial scale simulator S. For all q ∈ N, given only trace of a partial history and randomly selected key k, S can simulate a adversary. Given the trace for all 0 ≤ t ≤ q, the simulator S can generate a view , such that is indistinguishable from the adversary^′ s view .

For t = 0, for a given q, the simulator S can observe encrypted outsourced graph data processed by semantically secure symmetric encryption algorithms. The simulator can simulate outsourced graph data by generating arbitrary strings of the same size. The simulated graph data are indistinguishable from the outsourced graph data due to the indiscriminability of the semantically secure symmetric encryption. The simulator also builds a simulated index I^∗ based on a partial historical trace by way of arbitrary string generation, and the simulated index I^∗ is the same size as the real index I. Clearly, I^∗ and I are indistinguishable. Otherwise, one can distinguish between the outputs of semantically secure symmetric encryption and any strings of the same size. Therefore, and are indistinguishable.

For 1 ≤ t ≤ q, the search pattern (Tex translation failed) for t query tokens is included in . The simulator S will build the query tokens included in the view . During building the tokens process, the simulator can reuse the tokens in or rebuild the query tokens based on . To build , the simulator confirms whether contains by checking if ∏_t[t, j] = 1(1 ≤ j ≤ t − 1). If does not contain , the simulator builds based on the contents about query results in by generating a cipher random string. The ciphertext string generated by the simulator is used as one of the terms in I^∗, and it is determined that any two generated strings are not the same. is the same size as Q_t. If contains , the simulator identifies the query contents related to and assigns them to . It ensures that if contains duplicate query terms, the corresponding query tokens contained in are identical.

It is clear that the simulated query tokens in is indistinguishable from the query tokens (Q₁, …, Q_t) in . Otherwise, the outputs of semantically secure symmetric encryption and the random strings of the same size can be distinguishable. Therefore, for 0 ≤ t ≤ q, no probabilistic polynomial scale adversary can distinguish from . Thus, we have proven the security of our proposed theorem.

5. Experimental Evaluations

We now evaluate the performance of the path query scheme on the outsourcing encryption graph through experimental analysis. The data set used in the experiment is the dataset of the Enron email network graph commonly used in the graph query [40, 41], and we randomly selected five datasets from them for experimental analysis and comparison. In the experimental deployment, the experimental environment we use is a local working machine and a cloud server, and the implementation of the program uses C language. The local working machine is equipped with the Windows 10 system, which uses a 4-core processor and 16 GB of memory. The server is equipped with an 8-core processor and 32 GB of memory. The construction and processing of secure indexes and query tokens are implemented locally, and the performance evaluation experiment of the path query is completed on the cloud server.

The PSMP solution proposed in this article implements the protecting-privacy path query problem supporting semantic-based multikeyword search. To verify the feasibility of this scheme, the proposed scheme is compared with the plaintext path query method. The path query method in plaintext is denoted as MSMP, and its building idea is similar to the PSMP scheme, but the data and contents are not encrypted. The comparison of the two schemes aims to evaluate the time and storage overhead of the proposed solution on the encrypted graph query and verify its effectiveness. For outsourced graph data, the difference in the number of edges with a certain number of vertices can affect the number of paths and query performance. For the five randomly selected graph data sets, we consider two experimental scenarios for analysis and comparison. One is the experimental graph data with fewer edges, which are represented by PSMP-1 and MSMP-1, respectively, in experimental analysis. The other is the graph dataset with a large number of edges, represented as PSMP-2 and MSMP-2, respectively, which is approximately twice the former, and the number of edges in the graph dataset is, respectively, 9352, 19,603, 39,528, 80,536, and 139,624. The experimental comparison and analysis under the two scenarios aim to evaluate the effectiveness of the secure semantic-based multikeyword path query proposed in this article.

5.1. Building the Secure Index

To realize the secure path query on the cloud server and ensure the security of the outsourced graph data, it is accomplished by building a secure index. We next conduct an experimental analysis of index construction. We first convert the vertex set of the outsourced graph into a normalized orthogonal set and then build the path information chain tables. We next set the storage location in the index and encrypt the stored contents. The experimental results of index-building after analysis are plotted in Figures 5(a) and 5(b). The vertical coordinate-axis represents the execution time of index-building, the horizontal coordinate-axis in Figure 5(a) represents the number of vertices on the experimental graph data, and the horizontal coordinate-axis in Figure 5(b) represents the number of edges in the case of more edges on the experimental graph data. Here, we consider the comparison and analysis in the case of more edges. The result analysis is similar for the case with the fewer edges. The two experimental figures represent the influence and relationship between index-building time and the number of vertices and edges on the outsourced graph data.

From the experimental Figure 5, it can be seen that the index-building time is almost linearly related to the number of vertices and edges on the outsourced graph data in the four scenarios. The increase in the number of edges usually affects the increase in the number of paths, so in Figure 5(a), the index-building time of PSMP-2 is more than that of PSMP-1, and the related consumption time of MSMP-2 is more than that of MSMP-1. In the scheme design of this article, it is necessary to perform ciphertext processing on the outsourced graph data, and data encryption processing is required in the construction of the secure index. Therefore, the index-building time of the scheme PSMP in this article is slightly longer than that in the case MSMP. Considering the security query purpose and the acceptable additional time consumed, the index-building time of the proposed solution in this article is relatively effective.

In terms of the experimental evaluation of the index-building scale, the analysis results are given in Figure 6. The vertical coordinate-axis in the experimental figures represents the index-building scale, and the horizontal coordinate-axis represents the number of vertices or edges in the query graph data. It can be seen from experimental Figures 6(a) and 6(b) that the index-building scale increases approximately linearly with the increase of the number of vertices or edges in the graph data. Similar to the abovementioned analysis, based on the experimental evaluation of the index-building scale, the proposed scheme PSMP in this article implements the path query under the ciphertext graph data, and the efficiency cost is within the controllable range.

5.2. Query Evaluation

In the path query of outsourced graph data in the cloud environment, the cloud server completes the query operation through the secure index and encrypted query tokens. We evaluate the time cost and efficiency of path queries on cloud outsourced graph data through the analysis in this section. The experimental analysis results are shown in Figure 7, where the vertical coordinate-axis represents the path query time and the horizontal coordinate-axis represents the number of vertices or edges of the outsourced graph data.

As can be seen from Figure 7, the time consumed by the query operation presents an approximate linear increase with the increase of the number of vertices or edges on the outsourced graph data. The outsourced graph data with more edges can have relatively more paths, so the query time consumption in cases PSMP-2 and MSMP-2 is, respectively, higher than that in cases PSMP-1 and MSMP-1 in Figure 7(a). That is, as shown in Figure 7(a), with the same vertices, the more the edges in the graph data, the more time it takes to execute the query. The query scheme PSMP in the case of encrypted graph data consumes more query time than the scheme MSMP from the comparison of the results in Figure 7, and from the perspective of satisfying security, sacrificing certain query efficiency is in line with the overall interests.

In summary, the index built in the proposed scheme plays a primary role in achieving secure path queries, which can be created on the local machine. The cloud server performs query operations and returns query results based on the secure index and query tokens. The scheme in this article realizes the semantic-based multikeyword path query on the outsourced encrypted graph data, and its effectiveness is verified by the experiments.

6. Conclusion

In this article, we present a solution to the problem of the path query supporting semantic-based multikeyword search on encrypted graph data in cloud computing. In the process of the building query scheme, we use stem extraction algorithms to transform and manipulate graph vertices to meet the requirements of semantic queries. We build standard orthogonalization vectors based on the Gram–Schmidt algorithm and then use vector operations and processing to build a secure index and query tokens, thereby achieving the semantic-based multikeyword path query on cloud outsourced graph data. We give the security analysis and proof of the proposed solution and evaluate the effectiveness of the solution through experimental comparison and analysis.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding

This research was supported in part by the National Natural Science Foundation of China (nos. 62262033, 62362042, and 62062045), the Science and Technology Research Project of Jiangxi Education Department (no. GJJ2401819), the Jiangxi Provincial Natural Science Foundation of China (no. 20224BAB202012), the Nature Science Foundation of Ningxia, P.R.China (no. 2024AAC03310), the Higher Education Institution Scientific Research Project of Ningxia, P.R.China (no. NYG2024164), the Hubei Natural Science Foundation Innovation and Development Joint Fund Project (nos. 2022CFD101 and 2022CFD103), and the Xiangyang High-tech Key Science and Technology Plan Project (no. 2022ABH006848).

Acknowledgments

The authors gratefully acknowledge the editor and the reviewers’ comments and helpful suggestions.

Open Research

Data Availability Statement

The datasets used in this study belong to all authors and can be reasonably requested through the corresponding author.

References

1 Li C., Feng X., Shen Q., and Wu Z., On the Security of Secure Keyword Search and Data Sharing Mechanism for Cloud Computing, IEEE Transactions on Dependable and Secure Computing. (2024) 21, no. 4, 4306–4308, https://doi.org/10.1109/tdsc.2023.3346795.
10.1109/TDSC.2023.3346795
Google Scholar
2 Sampé J., Sánchez-Artigas M., Vernik G., Yehekzel I., and García-López P., Outsourcing Data Processing Jobs With Lithops, IEEE Transactions on Cloud Computing. (2023) 11, no. 1, 1026–1037, https://doi.org/10.1109/tcc.2021.3129000.
10.1109/TCC.2021.3129000
Google Scholar
3 Jiang X., Xiong N. N., Wang X., Ying C., Wu F., and Luo Y., DIVINE: A Pricing Mechanism for Outsourcing Data Classification Service in Data Market, Information Sciences. (2023) 636, https://doi.org/10.1016/j.ins.2023.04.001.
10.1016/j.ins.2023.04.001
Google Scholar
4 Ren Y., Song Z., Sun S., Liu J. K., and Feng G., Outsourcing Lda-Based Face Recognition to an Untrusted Cloud, IEEE Transactions on Dependable and Secure Computing. (2023) 23, 2058–2070.
Google Scholar
5 Koti N., Kukkala V. B., Patra A., and Raj Gopal B., Graphiti: Secure Graph Computation Made More Scalable, Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, CCS 2024, 2024, 4017–4031, https://doi.org/10.1145/3658644.3670393.
10.1145/3658644.3670393
Google Scholar
6 Jiang J., Wang D., and Zhang G., Qpause: Quantum-Resistant Password-Protected Data Outsourcing for Cloud Storage, IEEE Transactions on Services Computing. (2024) 17, no. 3, 1140–1153, https://doi.org/10.1109/tsc.2023.3331000.
10.1109/TSC.2023.3331000
Google Scholar
7 Cui Z., Lu Z., Yang L. T. et al., Privacy and Accuracy for Cloud-Fog-Edge Collaborative Driver-Vehicle-Road Relation Graphs, IEEE Transactions on Intelligent Transportation Systems. (2023) 24, no. 8, 8749–8761, https://doi.org/10.1109/tits.2023.3254370.
10.1109/TITS.2023.3254370
Google Scholar
8 Chenam V. B. and Ali S. T., A Pairing-free Dynamic Multi-Receiver Certificateless Authenticated Searchable Encryption for Cloud Storage, Journal of Information Security and Applications. (2024) 83, https://doi.org/10.1016/j.jisa.2024.103801.
10.1016/j.jisa.2024.103801
Google Scholar
9 Miao Y., Li F., Jia X. et al., REKS: Role-Based Encrypted Keyword Search with Enhanced Access Control for Outsourced Cloud Data, IEEE Transactions on Dependable and Secure Computing. (2024) 21, no. 4, 3247–3261, https://doi.org/10.1109/tdsc.2023.3324640.
10.1109/TDSC.2023.3324640
Google Scholar
10 Yuan L., Hao K., Lin X., and Zhang W., Batch Hop-Constrained S-T Simple Path Query Processing in Large Graphs, 40th IEEE International Conference on Data Engineering, ICDE 2024, Utrecht, The Netherlands, May 13-16, 2024, 2024, IEEE, 2557–2569.
Google Scholar
11 Ouyang D., Wang Z., Zhang F., Yang S., Yang J., and Lin X., Multi-Source Shortest Path Query With Assembly Points on Large Graphs, IEEE Transactions on Knowledge and Data Engineering. (2024) 36, no. 12, 8859–8875, https://doi.org/10.1109/tkde.2024.3424947.
10.1109/TKDE.2024.3424947
Google Scholar
12 Gautam K. and Ahn C. W., Quantum Path Integral Approach for Vehicle Routing Optimization With Limited Qubit, IEEE Transactions on Intelligent Transportation Systems. (2024) 25, no. 5, 3244–3258, https://doi.org/10.1109/tits.2023.3327157.
10.1109/TITS.2023.3327157
Google Scholar
13 Pépin I., Alaca F., and Zulkernine F. H., Privacy-Preserving Multi-Party Keyword-Based Classification of Unstructured Text Data, 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things, DCOSS-IoT 2024, 2024, IEEE, 203–210.
Google Scholar
14 Chen L., Xue Y., Mu Y., Zeng L., Rezaeibagha F., and Deng R. H., CASE-SSE: Context-Aware Semantically Extensible Searchable Symmetric Encryption for Encrypted Cloud Data, IEEE Transactions on Services Computing. (2023) 16, no. 2, 1011–1022, https://doi.org/10.1109/tsc.2022.3162266.
10.1109/TSC.2022.3162266
Google Scholar
15 Huang C., Chang C., Cheng H., and Wu A., BORE: Energy-Efficient Banded Vector Similarity Search with Optimized Range Encoding for Memory-Augmented Neural Network, Design, Automation & Test in Europe Conference & Exhibition, DATE 2024, Valencia, Spain, March 25-27, 2024, 2024, IEEE, 1–6.
Google Scholar
16 Bag A., Patranabis S., and Mukhopadhyay D., Tokenised Multi-Client Provisioning for Dynamic Searchable Encryption with Forward and Backward Privacy, Proceedings of the 19th ACM Asia Conference on Computer and Communications Security, ASIA CCS 2024, Singapore, July 1-5, 2024, 2024, ACM.
Google Scholar
17 Song D. X., Wagner D., and Perrig A., Practical Techniques for Searches on Encrypted Data, 2000 IEEE Symposium on Security and Privacy, 44–55.
Google Scholar
18 Goh E., Secure Indexes, IACR Cryptology ePrint Archive 2003. (2003) .
Google Scholar
19 Chang Y. and Mitzenmacher M., Privacy Preserving Keyword Searches on Remote Encrypted Data, Third International Conference on Applied Cryptography and Network Security (ACNS 2015), 2005, 442–455, https://doi.org/10.1007/11496137_30.
10.1007/11496137_30
Google Scholar
20 Curtmola R., Garay J. A., Kamara S., and Ostrovsky R., Searchable Symmetric Encryption: Improved Definitions and Efficient Constructions, Proceedings of the 13th ACM Conference on Computer and Communications Security, 2006, 79–88.
Google Scholar
21 Baek J., Safavi-Naini R., and Susilo W., Public Key Encryption With Keyword Search Revisited, Lecture Notes in Computer Science, 1249–1259, https://doi.org/10.1007/978-3-540-69839-5_96, 2-s2.0-54249101916.
10.1007/978-3-540-69839-5_96
Google Scholar
22 Kamara S., Papamanthou C., and Roeder T., Dynamic Searchable Symmetric Encryption, The ACM Conference on Computer and Communications Security, CCS’12, 2012, Raleigh, NC, 965–976.
Google Scholar
23 Xu D., Peng C., Wang W., Liu H., Shaikh S. A., and Tian Y., Privacy-preserving Dynamic Multi-Keyword Ranked Search Scheme in Multi-User Settings, IEEE Transactions on Consumer Electronics. (2023) 69, no. 4, 890–901, https://doi.org/10.1109/tce.2023.3269045.
10.1109/TCE.2023.3269045
Google Scholar
24 Zhong H., Li Z., Cui J., Sun Y., and Liu L., Efficient Dynamic Multi-Keyword Fuzzy Search Over Encrypted Cloud Data, Journal of Network and Computer Applications. (2020) 149, https://doi.org/10.1016/j.jnca.2019.102469.
10.1016/j.jnca.2019.102469
Google Scholar
25 Tong Q., Li X., Miao Y., Liu X., Weng J., and Deng R. H., Privacy-Preserving Boolean Range Query With Temporal Access Control in Mobile Computing, IEEE Transactions on Knowledge and Data Engineering. (2022) 35, 1–5172, https://doi.org/10.1109/tkde.2022.3152168.
10.1109/tkde.2022.3152168
Google Scholar
26 Cao N., Yang Z., Wang C., Ren K., and Lou W., Privacy-Preserving Query over Encrypted Graph-Structured Data in Cloud Computing, 2011 International Conference on Distributed Computing Systems (ICDCS), 2011, 393–402.
Google Scholar
27 Chase M. and Kamara S., Structured Encryption and Controlled Disclosure, Advances in Cryptology-ASIACRYPT 2010-16th International Conference on the Theory and Application of Cryptology and Information Security, 2010, Singapore, 577–594, https://doi.org/10.1007/978-3-642-17373-8_33, 2-s2.0-78650834524.
10.1007/978-3-642-17373-8_33
Google Scholar
28 Shen M., Ma B., Zhu L., Mijumbi R., Du X., and Hu J., Cloud-Based Approximate Constrained Shortest Distance Queries Over Encrypted Graphs with Privacy Protection, IEEE Transactions on Information Forensics and Security. (2018) 13, no. 4, 940–953, https://doi.org/10.1109/tifs.2017.2774451, 2-s2.0-85035104153.
10.1109/TIFS.2017.2774451
Web of Science® Google Scholar
29 Zheng Y., Zhu H., Lu R. et al., Pgsim: Efficient and Privacy-Preserving Graph Similarity Query Over Encrypted Data in Cloud, IEEE Transactions on Information Forensics and Security. (2023) 18, 2030–2045, https://doi.org/10.1109/tifs.2023.3262147.
10.1109/TIFS.2023.3262147
Google Scholar
30 Guan Y., Lu R., Zhang S., Zheng Y., Shao J., and Wei G., Efficient and Privacy-Preserving Aggregate Query Over Public Property Graphs, IEEE Transactions on Big Data. (2024) 10, no. 2, 146–157, https://doi.org/10.1109/tbdata.2023.3342623.
10.1109/TBDATA.2023.3342623
Google Scholar
31 Xia Z., Yuan C., Lv R., Sun X., Xiong N. N., and Shi Y., A Novel Weber Local Binary Descriptor for Fingerprint Liveness Detection, IEEE Transactions on Systems, Man, and Cybernetics: Systems. (2020) 50, no. 4, 1526–1536, https://doi.org/10.1109/tsmc.2018.2874281, 2-s2.0-85055720344.
10.1109/TSMC.2018.2874281
Google Scholar
32 Li J., Ye H., Li T. et al., Efficient and Secure Outsourcing of Differentially Private Data Publishing With Multiple Evaluators, IEEE Transactions on Dependable and Secure Computing. (2022) 19, no. 1, 67–76, https://doi.org/10.1109/tdsc.2020.3015886.
10.1109/TDSC.2020.3015886
CAS Web of Science® Google Scholar
33 Wu Z., Xie J., Shen S., Lin C., Xu G., and Chen E., A Confusion Method for the Protection of User Topic Privacy in Chinese Keyword-Based Book Retrieval, ACM Transactions on Asian and Low-Resource Language Information Processing. (2023) 22, no. 5, 146:1–19, https://doi.org/10.1145/3571731.
10.1145/3571731
Google Scholar
34 Jebali A., Sassi S., Jemai A., and Chbeir R., Secure Data Outsourcing in Presence of the Inference Problem: A Graph-Based Approach, Journal of Parallel and Distributed Computing. (2022) 160, 1–15, https://doi.org/10.1016/j.jpdc.2021.09.006.
10.1016/j.jpdc.2021.09.006
Google Scholar
35 Zhang X., Zhao J., Xu C., Wang H., and Zhang Y., DOPIV: Post-Quantum Secure Identity-Based Data Outsourcing with Public Integrity Verification in Cloud Storage, IEEE Transactions on Services Computing. (2022) 15, no. 1, 334–345, https://doi.org/10.1109/tsc.2019.2942297, 2-s2.0-85072723449.
10.1109/TSC.2019.2942297
Google Scholar
36 Fan K., Chen Q., Su R. et al., MSIAP: A Dynamic Searchable Encryption for Privacy-Protection on Smart Grid with Cloud-Edge-End, IEEE Transactions on Cloud Computing. (2023) 11, no. 2, 1170–1181, https://doi.org/10.1109/tcc.2021.3134015.
10.1109/TCC.2021.3134015
Google Scholar
37 Liu J., Li Y., Sun R. et al., EMK-ABSE: Efficient Multikeyword Attribute-Based Searchable Encryption Scheme Through Cloud-Edge Coordination, IEEE Internet of Things Journal. (2022) 9, no. 19, 18650–18662, https://doi.org/10.1109/jiot.2022.3163340.
10.1109/JIOT.2022.3163340
Google Scholar
38 Wang Q., Lai C., Lu R., and Zheng D., Searchable Encryption With Autonomous Path Delegation Function and Its Application in Healthcare Cloud, IEEE Transactions on Cloud Computing. (2023) 11, no. 1, 879–896, https://doi.org/10.1109/tcc.2021.3120110.
10.1109/TCC.2021.3120110
CAS Google Scholar
39 Goldwasser S. and Micali S., Probabilistic Encryption, Journal of Computer and System Sciences. (1984) 28, no. 2, 270–299, https://doi.org/10.1016/0022-0000(84)90070-9, 2-s2.0-0021409284.
10.1016/0022-0000(84)90070-9
Web of Science® Google Scholar
40 Klimt B. and Yang Y., Introducing the Enron Corpus, CEAS 2004-First Conference on Email and Anti-Spam.
Google Scholar
41 Leskovec J., Lang K. J., Dasgupta A., and Mahoney M. W., Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters, Internet Mathematics. (2009) 6, no. 1, 29–123, https://doi.org/10.1080/15427951.2009.10129177, 2-s2.0-84924178998.
10.1080/15427951.2009.10129177
Google Scholar

All articles

A Protecting-Privacy Path Query Supporting Semantic-Based Multikeyword Search Over Ciphertext Graph Data in Cloud Computing

Abstract

1. Introduction

2. Related Work

3. Proposed Solution

3.1. Preliminary Knowledge

3.2. Overview of the Solution

3.3. Implementation of Our Solution

3.3.1. Constructing the Path Chain Table

3.3.2. Constructing the Query Index

3.3.3. Performing Query

4. Security Analysis

5. Experimental Evaluations

5.1. Building the Secure Index

5.2. Query Evaluation

6. Conclusion

Conflicts of Interest

Funding

Acknowledgments

Open Research

Data Availability Statement

References

Figures

References

Information

About Wiley Online Library

Help & Support

Opportunities

Connect with Wiley