A Protecting-Privacy Path Query Supporting Semantic-Based Multikeyword Search Over Ciphertext Graph Data in Cloud Computing
Abstract
With the rapid development of information technology and intelligence, there are more and more usage scenarios of graph data. Path queries have always been a hot topic of research for scholars. There are already many mature methods and ideas in the study of the path query on the plaintext graph. For the path query on graph data in the case of cloud outsourcing, it is necessary to consider both the construction of query algorithms and the protection of privacy information. The processing of graph privacy information through encryption, and then outsourcing to the cloud platform, is a common measure. The semantic-based path query supporting multikeyword search is an extended path query method, which can improve the query function. For users, it is very troublesome to access and process the encrypted graph data. In this article, we propose a protecting-privacy semantic-based multikeyword path query scheme on the ciphertext graph (PSMP). Firstly, based on the principle of searchable encryption and vector operation, a secure index is constructed, and then the cloud server uses the secure index to implement path queries. This article demonstrates its security through formal analysis and verifies its effectiveness through experimental comparison and analysis. The work of this article has a certain promoting effect on the query processing and analysis of ciphertext graph data.
1. Introduction
The emergence of cloud computing and massive graph data has brought a lot of convenience and more choices to people’s lives and work. Cloud computing platforms can bring outsourcing services and powerful computing and processing capabilities [1–3]. The owner of graph data can use cloud computing platforms to store and process this large amount of data, which is not only a cost saving but also a better choice for the convenience of using this data [4, 5]. Graph data are very commonly used in practical applications. Graph vertices can represent entity information, and the edges between vertices contain the interrelationships between entities. The graph data contain a lot of sensitive information. For example, there are user relationships and personal information in social networks, gene and patient information in the biomedical graph, and transaction data information in the financial transaction graph. If these data are released to the cloud platform without security processing, sensitive information will be leaked to the cloud server, which will bring the risk of privacy disclosure [6, 7]. Therefore, for security reasons, these graph data need to undergo certain security considerations and operations before being published on cloud platforms. Encrypting sensitive data is one of the important and commonly used methods [8, 9]. Another issue that comes with it is that accessing and using these encrypted graph data on cloud platforms is very inconvenient. Therefore, it is very urgent and important to directly query and process ciphertext graph data on cloud platforms.
In the use of graph data, the path query is a very common and important operation, which is an important basic primitive of graph data processing [10, 11]. The path query provides support for research on the shortest path or path optimization problem in graph data usage [11, 12]. The path query supporting semantic-based multikeyword search can bring users a more flexible query range and more choices. For example, in urban network graph data, a path query research is used to determine locations with connectivity relationships. The vertex represents two locations in the graph, and the edge between the vertices represents the direct connection path between the two locations. The semantic-based path query that supports multikeyword search operations is to complete the query request with semantic multikeyword joint expressions and return the query results that meet the conditions to the user [13, 14]. To achieve the path query, we use the stem extraction algorithm, the vector processing principle, and the encryption mechanism to construct query indexes and query symbols [15, 16]. At the same time, considering the cost and overhead factors, it is very necessary and valuable to study the semantic-based path query supporting multikeyword search on the ciphertext graph in cloud computing.
For query processing operations in cloud outsourcing scenarios, searchable encryption is a practical and effective cryptographic primitive [17–21]. In cloud outsourcing data query and processing, the outsourced data are securely processed, and a secure index is created. The cloud server can perform queries and other operations without knowing the relevant privacy information of the outsourced data. The searchable encryption has been developing and improving continuously since its initial appearance, which provides an important solution for remote outsourcing data query and processing. In the subsequent research, many researchers have extended and expanded the searchable encryption scheme [22–25]. But these existing searchable encryption schemes cannot operate directly on path queries on encrypted graph data. Meanwhile, many researchers have proposed some query schemes related to ciphertext graph data [26–30]. Cao et al. first proposed and solved the subgraph query problem on the outsourced graph data and ensured its security [26]. The literature [27] first proposed structured encryption, which can perform query operations on any structured data such as graph data. The problem of the shortest path query on the encryption graph was studied in the literature [28], and the security information was protected at the same time. The literature [29] and the literature [30] study the privacy-preserving graph similarity query problem and aggregate the query over public property graphs, respectively. However, the existing solutions cannot be used to handle the path query problem supporting semantic-based multikeyword search on ciphertext graph in cloud computing.
- 1.
In the cloud outsourcing environment, a novel solution-dubbed PSMP is presented to protect the privacy information.
- 2.
The effective use of stem extraction and vector operations in graph vertex processing improves the efficiency of path queries in ciphertext.
- 3.
The construction of the path query method in ciphertext and the effective combination of searchable encryption mechanisms have improved the efficiency of the query and ensured their security.
The rest of the article is as follows: Section 2 introduces the work related to ciphertext graph queries. Section 3 proposes and designs the PSMP solution. Section 4 provides a security analysis of the PSMP solution. Afterward, we evaluated the effectiveness of our solution through experimental comparison and analysis in Section 5. Finally, Section 6 summarizes the article.
2. Related Work
The wide existence and deep use of cloud services in various industries have brought great convenience to people’s lives and production. It provides the possibility and opportunity for the outsourcing of large amounts of data [31–33]. Meanwhile, the security of these outsourced data is an important issue to consider; otherwise, sensitive data information will be leaked [34, 35]. The hidden danger of outsourcing private data brought by cloud servers and external attackers makes people attach great importance to the security of outsourcing data. This has also attracted many researchers to conduct in-depth research on this security issue, and a lot of research results have appeared [36–38]. Searchable encryption among them is one of the outstanding research achievements. There are two modes of symmetric searchable encryption (SSE) and asymmetric searchable encryption (ASE) based on the different ways of encryption construction [17, 18, 20, 21]. The asymmetric encryption scheme has slow encryption speeds and high computational costs, while the symmetric encryption scheme is efficient and fast, which is more suitable for encrypting large amounts of data. Considering the efficiency of operation, this article adopts the SSE mechanism.
The searchable encryption is an effective and important cryptographic primitive in remote outsourced data retrieval services. A lot of related research results continue to appear, which provides valuable ideas and guidelines for the ciphertext query under cloud outsourcing [17–21]. Song et al. first proposed the concept of searchable encryption to solve the query problem on outsourced data [17]. The idea of index construction was proposed in the query of remote data by using Bloom filters in the literature [18]. Subsequently, based on the idea of constructing indexes, a more efficient SSE scheme was proposed in the literature [20], and the privacy guarantee of nonadaptive SSE and adaptive SSE was analyzed. Curtmola et al. firstly proposed a searchable encryption scheme based on public key cryptography, which promoted the development of searchable encryption primitives [21]. With the increase of demand and the development of technology, many searchable encryption schemes with more functions have emerged [22–25]. Baek et al. improved and expanded the design method and index construction based on the literature [20] and proposed a dynamic SSE scheme [22]. Xu et al. presented a dynamic multikeyword ranked search scheme over encrypted cloud data by designing the tree-based index structure and the greedy breadth first algorithm [23]. Zhong et al. proposed a multikeyword fuzzy query scheme supporting dynamic update by using the Bloom filter and unary grammar [24]. Tong et al. proposed a privacy-preserving boolean range query with temporal access control in mobile computing and presented an enhanced scheme with faster-than-linear search complexity by constructing a Quadtree index structure [25]. These solutions are designed to implement queries for a specific need in outsourced file data, and these solutions cannot be used to solve the semantic-based multikeyword path query on the ciphertext graph.
With the rapid development of graph data use in various scenarios, the outsourcing and sharing of graph data are more and more frequent, and the security of graph outsourcing has attracted people’s attention. The related research is also constantly emerging and accumulating [26–30]. The secure query problem of the subgraph in remote outsourcing graph data was firstly proposed in the literature [26], and the security query solution was given. The concept and ideas of structural encryption were proposed in the literature [27] and were used to solve the query problem of outsourced graph data. The shortest distance query problem with approximation constraints was solved in the literature [28], while ensuring privacy and security issues. Zheng et al. proposed the graph similarity query scheme in the filter and verification framework and designed a suite of private algorithms based on a symmetric homomorphic encryption scheme [29]. Guan et al. introduced a property graph aggregate query scheme to design the vertex matching and matching update techniques, which achieved aggregate queries over public property graphs under the premise of protecting privacy [30]. The query schemes on the abovementioned graph data solve the problems of the subgraph query, the adjacency query, the shortest distance query, and so on. They have their own characteristics and solve their own problems, and these graph query implementation techniques were not able to handle the path query of the outsourcing graph supporting the semantic-based multikeyword search in the cloud.
To solve the problem, we propose a new solution to achieve the protecting-privacy path query by means of stem extraction, the orthonormal vector conversion algorithm, and inner product operation in this article and solve the problem of semantic-based multikeyword search on the ciphertext graph in cloud outsourcing.
3. Proposed Solution
3.1. Preliminary Knowledge
In the security protection of massive information, cryptography is a very important means of realization and plays an important role in the realization of privacy protection. The idea of semantic security was proposed in the literature [39]. If an attacker can perform computations on the plaintext given the ciphertext, then he can also perform computations without the ciphertext. In this case, the system is semantically secure [39]. Considering the computing overhead and efficiency under remote outsourcing, we use the semantically secure symmetric encryption mechanism in our construction scheme. It consists of three algorithms Gen, Enc, and Dec, where Gen is the secret key generation algorithm, Enc is the encryption algorithm, and Dec is the decryption algorithm.
After the conversion, the standard orthogonalization vector set of the set {μ1, μ2, …, μn} is {ρ1, ρ2, …, ρn}. The main notations in this article and their corresponding descriptions are shown in Table 1.
Notations | Description |
---|---|
G | The outsourcing graph data |
I | The secure index |
n | The number of graph vertices |
V | The vertex set of the graph G, V = {v1, …, vn} |
l | The security parameter |
p | The maximum number of paths for the graph vertex |
Qi | The encrypted query token, where 1 ≤ i ≤ n |
Ri | The query result set of the query token Qi |
Enckey(·) | The symmetric encryption mechanism |
Deckey(·) | The symmetric decryption mechanism |
3.2. Overview of the Solution
In the outsourcing service system of cloud computing, the system model of the path query is shown in Figure 1. There are three roles in this system model, which are the cloud server, the data owner, and the user. The data owner outsources the graph data in ciphertext to the cloud server, which is responsible for processing and manipulating the encrypted graph data. This effectively ensures the security of privacy information for outsourced graph data. In this work, we adopt the ideas in the previous ciphertext query literature for the query authorization and key information of the query user [20, 22].

- 1.
Query users can realize path query supporting semantic-based multi-keyword search by the cloud server in cloud computing.
- 2.
The security analysis can be provided, and the cloud server does not know the privacy information of the graph data during the query process.
- 3.
The proposed solution can meet the effectiveness of the path query by experimental analysis.
In the article, our proposed query solution is established with the help of the stem extraction algorithm, vector operation, and searchable encryption. The vertex set of the graph is transformed into a new set of words through stem extraction and orthogonal vector transformation. Based on the newly generated set of vectors, we next build the secure index. The cloud server finally performs path queries on the remote platform through the secure index. The proposed solution in this article is consistent with the existing schemes [20, 22], and we assume that servers in the cloud environment adopt an adaptive attack model. Query users have query permissions only when they are authorized by the data owner [20].
3.3. Implementation of Our Solution
The main purpose of this solution is to realize the path query on the remote cloud server, and the main work to achieve this goal is the construction of the secure index. The execution process of the solution in this article is shown in Figure 2.

- •
keyGen (l): This algorithm takes the security parameter l as input and outputs a symmetric secret key.
- •
ConsChaintable( G, k): The vertex set V of the graph is transformed into a word set V′ through the stem extraction algorithm. It then takes the set V′ as input and outputs a normalized orthogonal set W. Next, a chain table of path information between vertices in the graph is constructed. It takes the graph G and a symmetric secret key k as input and outputs a chain table C.
- •
ConsIndex( C, k): Constructing an index for outsourcing path queries. It takes the chain table C and a symmetric secret key set k as input and outputs a query index I.
- •
ConsToken( V′, k): Constructing a query token consisting of semantically similar words for several vertices. It takes the set V′ and a symmetric secret key set k as input and outputs a query token Q.
- •
ExecutingQuery( I, Q): It takes the query index I and the query token Q as inputs and returns the query path contents.
In this article, we use l as a security parameter and use Enc and Dec to represent symmetric encryption and decryption functions, respectively. The implementation details of our proposed solution are as follows:
3.3.1. Constructing the Path Chain Table
We construct the path chain table in this section, and the construction process is shown in Figure 3. To realize the requirement of the semantic query, this work transforms the vertex set into a new set by means of the Porter stem extraction algorithm. Then, we proceed to normalized orthogonal transformation and path chain table construction. The implementation details of this part are given in Algorithm 1. The vertex set of the outsourced graph is represented as V = {v1, …, vn}, and the converted set is represented as , where m ≤ n. To protect the security of element information in the set, it is necessary to encrypt the elements of the set. Each element is encrypted by an encryption function Enck(·), where 1 ≤ i ≤ m and k is a key. The element is encrypted and represented as , that is ti ∈ {0, 1}∗. All elements are encrypted and represented as the set T = {t1, …, tm}. We next convert the set T to a normalized orthogonal set by formulas (1)–(3). All values are converted and represented as a keyword set W = {w1, …, wm}. For each element wi in the set W, we construct its path information chain table Ci. The number of paths for this element is expressed as |Ci|. The maximum number of paths in all m elements is represented by p. Each node in the chain table stores the information about two vertices connected by paths. After the path chain tables corresponding to all elements are constructed, it is represented as the set C = {C1, …, Cm}.

-
Algorithm 1:ConsChaintable.
-
Input: The vertex set V of outsourced graph G
-
Output: The path chain table set C
- 1.
The vertex set of outsourced graph is V = {v1, …, vn}. The vertex set is transformed using Porter stem extraction algorithm, and the resulting set is represented as ./∗m ≤ n∗/
- 2.
for alli ∈ [1, m]do
- 3.
The element of the set V′ is encrypted by the encryption function Enck(·), and it is represented as ./∗k is a key, and ti ∈ {0, 1}∗∗/
- 4.
end for All elements in set V′ are encrypted and represented as set T = {t1, …, tm}.
- 5.
Construct the normalized orthogonal set of the set T using formulas (1)–(3), as follows. a1 = t1;/∗a1, …, am are temporary variables.∗/
- 6.
for allk ∈ [2, m]do
- 7.
ak = .
- 8.
end for
- 9.
for allk ∈ [1, m]do
- 10.
.
- 11.
end for
- 12.
The normalized orthogonal set obtained after calculation is W = {w1, …, wm}.
- 13.
for alli ∈ [1, m]do
- 14.
for allj ∈ [1, m]do
- 15.
xi,j = Enck(value_path);/∗value_path represents the content of path information, and if the path exists, use the keywords connection to represent it, otherwise use 0 to represent it.∗/
- 16.
Ci[j] = xi,j.
- 17.
end for
- 18.
end for
- 19.
C = {C1, C2, … Cm};
- 20.
returnC.
Through analysis, we can see that the time complexity of stem conversion and normalized orthogonal vector construction for graph vertices is O(n). The time complexity of constructing path chain tables for Ci is O(p) (p is the maximum number of paths for vertices in the outsourcing graph). Thus, the time complexity of constructing all path chain tables is O(m·p), and it is approximately expressed as O(m3).
3.3.2. Constructing the Query Index
To ensure the implementation of the secure path query on the cloud server, we will construct a secure index to meet the query requirements of remote outsourcing services, and the construction process is shown in Figure 4. The cloud server implements the user’s path query request through the secure index. The specific description of this index construction is given in Algorithm 2. For each element wi in the set W, its corresponding set of path information is Ci. We next build its path information label set and represent it as , where |Ci| represents the number of paths. The path information corresponding to each element in the label set will be stored in the index table I. And the storage location of the index table is also encrypted. Each element in the label set also matches only one entry in the index table I. The number of entries in index I is the sum of the number of paths for all vertices, which we express in terms of max. Therefore, when the path information query of a query request is performed, it is also translated into the index table I query for the element corresponding to the label. After processing all the path information, the constructed index table I is submitted to the cloud server.

-
Algorithm 2:ConsIndex.
-
Input:W, C
-
Output:I
- 1.
The normalized orthogonal keyword set is W = {w1, w2, …, wm}, and the path chain tables set is C = {C1, C2, …, Cm};
- 2.
for alli ∈ [1, m]do
- 3.
for allj ∈ [1, |Ci|]do
- 4.
;/∗building the path information label set∗/
- 5.
end for
- 6.
end for
- 7.
for alli ∈ [1, m]do
- 8.
do
- 9.
e = Enck(wi‖j);/∗e is a storage location in the index table∗/
- 10.
I[e] = Ci[j];
- 11.
end for
- 12.
end for
- 13.
returnI./∗The secure index∗/
In the implementation of index construction, the time complexity of building all path information labels is O(m·|Ci|) = O(m·p) (p is the maximum number of paths), and the time complexity of building the index table is O(m·p). Therefore, the time complexity of building the secure index tree is O(m·p).
3.3.3. Performing Query
The constructed secure index and encryption graph are outsourced to the cloud server. We then build a token for the cloud query based on the query request. Our next step is to implement secure path queries in the cloud. The concrete implementation is given in Algorithm 3. When s query words are requested for the cloud query, they are first converted into new words with similar semantics through the stem extraction algorithm and then converted into normalized orthogonal keywords. They have already undergone security processing during construction, and the entire construction process is similar to the construction method in Algorithm 1. The inner product calculation of any two vector keywords a and b is represented as π(a, b). We perform disjunctive computation on s query normalized orthogonal keywords and get a query expression Q = q1∨q2 … ∨qs. When the query is executed, the query expression Q is sent to the cloud server, and the cloud server calculates the inner product of the query expression Q and each item of the index I. If the result of the inner product calculation has a value of 1, it indicates that a path exists, and the information content of the existing path is returned to the query user, and if the result of the calculation is 0, the query expression does not have a path.
-
Algorithm 3:ExecutingQuery.
-
Input:I, Q
-
Output:R
- 1.
For s query terms t1, t2, …, ts, through stem extraction transformation and normalized orthogonal keywords construction, we have obtained s encrypted keywords: q1, q2, …, qs. Next a query token Q is built, which is a disjunctive expression, namely Q = q1∨q2 … ∨qs.
- 2.
for alli ∈ [1, max]do
- 3.
x = π(Q, I[i]);/∗inner product calculation∗//∗ max represents the number of entries in index I∗/
- 4.
ifx = 1then
- 5.
the contents of I[i] is stored in the set R;
- 6.
end if
- 7.
end for
- 8.
returnR.
In the query operation, the time complexity of the query token construction is O(s), and the time complexity of the query execution process is O(max). Therefore, the time complexity of the query algorithm ExecutingQuery is O(max).
4. Security Analysis
-
History: This is an interaction between a user and the cloud server when a user requests a query operation. It contains an outsourced graph dataset and several query requests, which can be represented as Hq = (G, req1, …, reqq). The partial history is expressed as , where t ≤ q.
-
View: In the query interaction, what the cloud server sees is the encrypted information, and the view is defined as Vk(Hq) = (Enck(G), I, Q1, …, Qq). The partial view is , where t ≤ q.
-
Access pattern: Given the history Hq, it is a tuple and is represented as R(Hq) = (R1, …, Rq), where Ri(1 ≤ i ≤ q) is the query results of the query token Qi.
-
Search pattern: Given the history Hq, it is denoted as a binary symmetric matrix (Tex translation failed), such that ∏q[i, j] = 1 if Qi = Qj, and ∏q[i, j] = 0 otherwise, for 1 ≤ i, j ≤ q.
-
Trace: Given the history Hq, it is defined as Tr(Hq) = (|Enck(G)|, R(Hq), (Tex translation failed)), where |Enck(G)| represents the size of the encrypted graph information; R(Hq) and (Tex translation failed), respectively, represent the access pattern and the search pattern about the history Hq. The trace of partial history is represented as , where t ≤ q.
The simulation-based approach is used in the security proof of our scheme to prove that our query scheme satisfies adaptive semantic security [19, 20]. The cloud server can select query requests based on previous search information and search results. For the security evaluation, we follow the proof principle in the previous searchable symmetric encryption scheme [20, 22]. When two histories with the same trace are given, the cloud server cannot distinguish the views of these two histories. During the execution of the query, the cloud server obtains no additional contents other than the trace information [20]. The security theorem and the proof of the query scheme in this article are given as follows:
Theorem 1. The PSPM solution meets the adaptive semantic security.
Proof 1. To prove the security of our path query scheme, we first define a polynomial scale simulator S. For all q ∈ N, given only trace of a partial history and randomly selected key k, S can simulate a adversary. Given the trace for all 0 ≤ t ≤ q, the simulator S can generate a view , such that is indistinguishable from the adversary′ s view .
For t = 0, for a given q, the simulator S can observe encrypted outsourced graph data processed by semantically secure symmetric encryption algorithms. The simulator can simulate outsourced graph data by generating arbitrary strings of the same size. The simulated graph data are indistinguishable from the outsourced graph data due to the indiscriminability of the semantically secure symmetric encryption. The simulator also builds a simulated index I∗ based on a partial historical trace by way of arbitrary string generation, and the simulated index I∗ is the same size as the real index I. Clearly, I∗ and I are indistinguishable. Otherwise, one can distinguish between the outputs of semantically secure symmetric encryption and any strings of the same size. Therefore, and are indistinguishable.
For 1 ≤ t ≤ q, the search pattern (Tex translation failed) for t query tokens is included in . The simulator S will build the query tokens included in the view . During building the tokens process, the simulator can reuse the tokens in or rebuild the query tokens based on . To build , the simulator confirms whether contains by checking if ∏t[t, j] = 1(1 ≤ j ≤ t − 1). If does not contain , the simulator builds based on the contents about query results in by generating a cipher random string. The ciphertext string generated by the simulator is used as one of the terms in I∗, and it is determined that any two generated strings are not the same. is the same size as Qt. If contains , the simulator identifies the query contents related to and assigns them to . It ensures that if contains duplicate query terms, the corresponding query tokens contained in are identical.
It is clear that the simulated query tokens in is indistinguishable from the query tokens (Q1, …, Qt) in . Otherwise, the outputs of semantically secure symmetric encryption and the random strings of the same size can be distinguishable. Therefore, for 0 ≤ t ≤ q, no probabilistic polynomial scale adversary can distinguish from . Thus, we have proven the security of our proposed theorem.
5. Experimental Evaluations
We now evaluate the performance of the path query scheme on the outsourcing encryption graph through experimental analysis. The data set used in the experiment is the dataset of the Enron email network graph commonly used in the graph query [40, 41], and we randomly selected five datasets from them for experimental analysis and comparison. In the experimental deployment, the experimental environment we use is a local working machine and a cloud server, and the implementation of the program uses C language. The local working machine is equipped with the Windows 10 system, which uses a 4-core processor and 16 GB of memory. The server is equipped with an 8-core processor and 32 GB of memory. The construction and processing of secure indexes and query tokens are implemented locally, and the performance evaluation experiment of the path query is completed on the cloud server.
The PSMP solution proposed in this article implements the protecting-privacy path query problem supporting semantic-based multikeyword search. To verify the feasibility of this scheme, the proposed scheme is compared with the plaintext path query method. The path query method in plaintext is denoted as MSMP, and its building idea is similar to the PSMP scheme, but the data and contents are not encrypted. The comparison of the two schemes aims to evaluate the time and storage overhead of the proposed solution on the encrypted graph query and verify its effectiveness. For outsourced graph data, the difference in the number of edges with a certain number of vertices can affect the number of paths and query performance. For the five randomly selected graph data sets, we consider two experimental scenarios for analysis and comparison. One is the experimental graph data with fewer edges, which are represented by PSMP-1 and MSMP-1, respectively, in experimental analysis. The other is the graph dataset with a large number of edges, represented as PSMP-2 and MSMP-2, respectively, which is approximately twice the former, and the number of edges in the graph dataset is, respectively, 9352, 19,603, 39,528, 80,536, and 139,624. The experimental comparison and analysis under the two scenarios aim to evaluate the effectiveness of the secure semantic-based multikeyword path query proposed in this article.
5.1. Building the Secure Index
To realize the secure path query on the cloud server and ensure the security of the outsourced graph data, it is accomplished by building a secure index. We next conduct an experimental analysis of index construction. We first convert the vertex set of the outsourced graph into a normalized orthogonal set and then build the path information chain tables. We next set the storage location in the index and encrypt the stored contents. The experimental results of index-building after analysis are plotted in Figures 5(a) and 5(b). The vertical coordinate-axis represents the execution time of index-building, the horizontal coordinate-axis in Figure 5(a) represents the number of vertices on the experimental graph data, and the horizontal coordinate-axis in Figure 5(b) represents the number of edges in the case of more edges on the experimental graph data. Here, we consider the comparison and analysis in the case of more edges. The result analysis is similar for the case with the fewer edges. The two experimental figures represent the influence and relationship between index-building time and the number of vertices and edges on the outsourced graph data.


From the experimental Figure 5, it can be seen that the index-building time is almost linearly related to the number of vertices and edges on the outsourced graph data in the four scenarios. The increase in the number of edges usually affects the increase in the number of paths, so in Figure 5(a), the index-building time of PSMP-2 is more than that of PSMP-1, and the related consumption time of MSMP-2 is more than that of MSMP-1. In the scheme design of this article, it is necessary to perform ciphertext processing on the outsourced graph data, and data encryption processing is required in the construction of the secure index. Therefore, the index-building time of the scheme PSMP in this article is slightly longer than that in the case MSMP. Considering the security query purpose and the acceptable additional time consumed, the index-building time of the proposed solution in this article is relatively effective.
In terms of the experimental evaluation of the index-building scale, the analysis results are given in Figure 6. The vertical coordinate-axis in the experimental figures represents the index-building scale, and the horizontal coordinate-axis represents the number of vertices or edges in the query graph data. It can be seen from experimental Figures 6(a) and 6(b) that the index-building scale increases approximately linearly with the increase of the number of vertices or edges in the graph data. Similar to the abovementioned analysis, based on the experimental evaluation of the index-building scale, the proposed scheme PSMP in this article implements the path query under the ciphertext graph data, and the efficiency cost is within the controllable range.


5.2. Query Evaluation
In the path query of outsourced graph data in the cloud environment, the cloud server completes the query operation through the secure index and encrypted query tokens. We evaluate the time cost and efficiency of path queries on cloud outsourced graph data through the analysis in this section. The experimental analysis results are shown in Figure 7, where the vertical coordinate-axis represents the path query time and the horizontal coordinate-axis represents the number of vertices or edges of the outsourced graph data.


As can be seen from Figure 7, the time consumed by the query operation presents an approximate linear increase with the increase of the number of vertices or edges on the outsourced graph data. The outsourced graph data with more edges can have relatively more paths, so the query time consumption in cases PSMP-2 and MSMP-2 is, respectively, higher than that in cases PSMP-1 and MSMP-1 in Figure 7(a). That is, as shown in Figure 7(a), with the same vertices, the more the edges in the graph data, the more time it takes to execute the query. The query scheme PSMP in the case of encrypted graph data consumes more query time than the scheme MSMP from the comparison of the results in Figure 7, and from the perspective of satisfying security, sacrificing certain query efficiency is in line with the overall interests.
In summary, the index built in the proposed scheme plays a primary role in achieving secure path queries, which can be created on the local machine. The cloud server performs query operations and returns query results based on the secure index and query tokens. The scheme in this article realizes the semantic-based multikeyword path query on the outsourced encrypted graph data, and its effectiveness is verified by the experiments.
6. Conclusion
In this article, we present a solution to the problem of the path query supporting semantic-based multikeyword search on encrypted graph data in cloud computing. In the process of the building query scheme, we use stem extraction algorithms to transform and manipulate graph vertices to meet the requirements of semantic queries. We build standard orthogonalization vectors based on the Gram–Schmidt algorithm and then use vector operations and processing to build a secure index and query tokens, thereby achieving the semantic-based multikeyword path query on cloud outsourced graph data. We give the security analysis and proof of the proposed solution and evaluate the effectiveness of the solution through experimental comparison and analysis.
Conflicts of Interest
The authors declare no conflicts of interest.
Funding
This research was supported in part by the National Natural Science Foundation of China (nos. 62262033, 62362042, and 62062045), the Science and Technology Research Project of Jiangxi Education Department (no. GJJ2401819), the Jiangxi Provincial Natural Science Foundation of China (no. 20224BAB202012), the Nature Science Foundation of Ningxia, P.R.China (no. 2024AAC03310), the Higher Education Institution Scientific Research Project of Ningxia, P.R.China (no. NYG2024164), the Hubei Natural Science Foundation Innovation and Development Joint Fund Project (nos. 2022CFD101 and 2022CFD103), and the Xiangyang High-tech Key Science and Technology Plan Project (no. 2022ABH006848).
Acknowledgments
The authors gratefully acknowledge the editor and the reviewers’ comments and helpful suggestions.
Open Research
Data Availability Statement
The datasets used in this study belong to all authors and can be reasonably requested through the corresponding author.