Efficient Boolean Keywords Search over Encrypted Cloud Data in Public Key Setting
Abstract
Searchable public key encryption- (SPE-) supporting keyword search plays an important role in cloud computing for data confidentiality. The current SPE scheme mainly supports conjunctive or disjunctive keywords search which belongs to very basic query operations. In this paper, we propose an efficient and secure SPE scheme that supports Boolean keywords search, which is more advanced than the conjunctive and disjunctive keywords search. We first develop a keyword conversion method, which can change the index and Boolean keywords query into a group of vectors. Then, through applying a technique so-called dual pairing vector space to encrypt the obtained vectors, we propose a concrete scheme proven to be secure under chosen keyword attack. Finally, we put forward a detailed theoretical and experimental analysis to demonstrate the efficiency of our scheme.
1. Introduction
Currently, thousands of information retrieval systems, such as e-mail systems, database management systems, and document management systems, are operating successfully in both the government and private sectors. As the data stored in these systems increase rapidly, more and more people want to migrate these data to cloud. To keep data privacy, users often encrypt these data before uploading them to the cloud. Since the encrypted data are difficult to retrieve, how to execute keyword search over encrypted data has attracted tremendous research attention over the past few years. Among these research studies, the searchable encryption (SE) is one of the most important techniques to address the issue of searching over encrypted data [1, 2].
The SE enables data users to retrieve the encrypted data of interest from a cloud server without decrypting the data. Commonly, SE is divided into two categories: one is searchable symmetric key encryption (SSE); the other is searchable public key encryption (SPE). During recent years, many SSE schemes have been proposed to support keyword search over encrypted data [3–6]. The key of SSE for encrypting data is the same as the key for generating search trapdoor. By contrast, the key of SPE for encrypting data is open to public, while the key for generating search trapdoor is only given to the authorized data receivers. Compared with SSE, SPE is more suitable for the situation in which there are many data senders and only a few data receivers, e.g., e-mail system [7], personal health record [8], and wireless sensor network [9]. As illustrated in Figure 1, in the scenario of e-mail system, the security requirements can be summarized as follows: (1) any data senders can generate encrypted e-mail data; (2) only data receiver can query and decrypt the encrypted e-mail data; (3) except the data receiver, none of the other entities, including the cloud server, can know the content of the encrypted e-mail data. Since security characteristics of SPE satisfy all these requirements in the above scenario, it is argued that SPE is very suitable for this application. Therefore, how to construct an efficient and secure SPE scheme supporting keyword search is always a hotspot in the field of SE.

1.1. Motivation
The very first SPE scheme supporting keyword search was introduced by Boneh et al., and it is so-called public key with keyword search (PEKS) [7]. However, their work only supports a single keyword search. In order to support more expressive query, many SPE schemes [10–12, 16] were proposed to realize advanced search, for example, conjunctive and disjunctive keywords search. In practice, most of the applications need more advanced keywords search function than the conjunctive and disjunctive keywords search. More precisely, many applications require Boolean keywords search. For example, in an e-mail system, users want to make a query like (A∧B)∨(C∧D), where A, B, C, and D are keywords. A naive thought is that a Boolean query can be obtained by remoulding a PECK or PEDK scheme, i.e., by combining the query results of conjunctive or disjunctive keywords search. However, we argue this simple method has many drawbacks. To better illustrate our motivation, based on a PEDK or PECK scheme, we construct a naive scheme supporting the Boolean keywords search like q1∨q2∧q3, where q1, q2, and q3 are three keywords. We then briefly review the simple solution and explain why it is unsatisfactory.
The approach is that we first execute the query q1 and the query q2∧q3 by making use of the PECK scheme, respectively, and obtain the union of the results of q1 query and q2∧q3 query. However, this method will leak the trapdoors of q1 and q2∧q3. By utilizing the trapdoors, the search results of q1 and q2∧q3 are also leaked. Over time, the adversary may combine this information to derive the contents of user’s documents. In addition, we also can execute the query of q1∨q2 and q3 by making use of the PEDK scheme, respectively, and then obtain the intersection of the results of the query q1∨q2 and the query q3. However, this method carries the same drawback.
1.2. Contribution
- (1)
Inspired by the keyword conversion method introduced in [17], we create a novel keyword conversion method which can transform the index keyword set and Boolean query into an attribute and a predicate vector, respectively. These vectors can efficiently realize Boolean keywords search by an inner product operation. Moreover, the vector dimension is much less than that generated by adopting the previous method.
- (2)
Through elaborately applying the existing technique called dual pairing vector space (DPVS) to encrypt the attribute and predicate vectors, we propose a secure and efficient SPE scheme supporting Boolean keywords search (SPE-BKS), which can accomplish Boolean keywords search over encrypted data with a better search efficiency than the previous schemes.
Moreover, for security concern, we introduce a formal security definition for SPE-BKS and give a detailed proof to demonstrate that our scheme is secure against chosen keyword attack. To verify the efficiency of the proposed scheme, we conduct an experiment for comparing our scheme with some recent schemes over a real-world dataset (Enron Email Dataset).
1.3. Related Work
The first SPE scheme supporting keyword search was introduced by Boneh et al. [7]. They called it as public key encryption with keyword search (PEKS), which only supports a single keyword search. To support multikeyword search, Park et al. proposed an SPE scheme supporting conjunctive keyword search, which is called public key encryption with conjunctive keywords search (PECK) [10]. In their scheme, each keyword is associated with a keyword field. The mechanism of the keyword field is based on two assumptions: one is that the keywords in a keyword field must be arranged in a preset order; the other is that the same keyword never appears in two different keyword fields of the same document. However, in many applications, the keyword field will make the multikeyword search unpractical. For instance, in an e-mail system, the keyword fields usually contain “From,” “To,” and “Title.” Many e-mails may have the same keyword in different keyword fields, e.g., “From: LeBron James” and “To: James Harden.” Moreover, the keywords in the keyword field “Title” may be organized in an alphabet order. To address this issue, the subsequence work is to create a PECK scheme without keyword field. In [11], Boneh and Waters proposed a public key encryption scheme called hidden vector encryption, which can efficiently support conjunctive keywords search without keyword field. After this, some efficient PECK schemes with better performance were proposed in [12–15]. To support disjunctive keyword search over encrypted data without keyword field, Katz et al. introduced a novel encryption scheme called predicate encryption supporting inner product, which is also named as inner product encryption (IPE) [16]. Through changing the index and query into an attribute and a predicate vector, respectively, a public key encryption with disjunctive keywords search (PEDK) scheme can be built based on the IPE scheme. Considering that the previous SPE schemes cannot use one trapdoor to realize conjunctive and disjunctive keywords search simultaneously, Zhang et al. proposed two public key encryption with conjunctive and disjunctive keyword search (PECDK) schemes [17, 18], which can efficiently support conjunctive and disjunctive keyword search at the same time. In order to support expressive query over encrypted data, based on the Paillier cryptosystem with threshold decryption (PCTD) [19], Yang et al. proposed an SPE scheme supporting versatile search query patterns, such as the range, conjunctive, disjunctive, and Boolean keywords search [20]. Miao et al. presented a hybrid keyword-field search scheme that supports both keyword search and range search simultaneously [21]. In addition, their scheme also provides an efficient key management mechanism to reduce the storage cost of keys. For the issue of fuzzy keyword search, Yang et al. designed a method to segment keyword according to the position of wildcards and proposed an SPE scheme supporting wildcard keyword search by combining the segmentation method and PCTD [22]. To support keyword search over arbitrary languages, Yang et al. realized a general method which can convert a variety of languages into a uniform big integer. By utilizing this conversion method and PCTD, they can carry out an SPE scheme supporting multikeyword rank search in arbitrary language [23]. To add the access control mechanism to SE, Li et al. created an attribute-based encryption (ABE) scheme which supports not only keyword search but also update operations for users ciphertext and secret key [24]. Then, they presented an outsourced ABE scheme supporting keyword search, which can transfer operations of decryption and key issuing to the cloud server partially [25]. He et al. proposed an SPE scheme which can control user’s search permission according to an access control policy [26]. Miao et al. proposed an attribute-based keyword search scheme under a shared multiowner setting [27]. Zhang et al. proposed an SPE scheme achieving both Boolean keywords search and fine-grained search permission [28]. For the problem of tensor decomposition over encrypted data, by elaborately combining homomorphic encryption and block chain techniques, Feng et al. designed several schemes to implement different types of tensor decomposition, such as high-order Bi-Lanczos and Tucker decomposition [29–31]. To improve the efficiency of SPE, Hwang et al. created a more efficient SPE scheme, by replacing the operation of bilinear pairing with ElGamal encryption system [32]. Lu et al. proposed a certificate-less encryption supporting keyword search under a multirecipient setting [33]. In order to obtain a better efficiency, their scheme avoids using a costly operation called bilinear pairing. Considering the scenario in which devices have limited resources, two secure and efficient energy-saving platforms were proposed to protect user’s sensitive data [34, 35]. To resist the DoS attack, Li et al. gave an efficient remote user authentication and privacy-preserving scheme by adopting the technique called extended chaotic maps [36]. In order to improve search accuracy, Zhang et al. proposed an SPE scheme supporting semantic keywords search by adopting a method called “Word2vec” [37].
1.4. Organization
This paper is organized as follows. In Section 2, we give the framework of SPE-BKS and its security definition. Some basic tools are also provided in the section. In Section 3, the construction of SPE-BKS is given, and its security proof is also presented. The experimental and theoretical analysis is provided in Section 4. We conclude this paper in Section 5.
2. Preliminaries
In this section, we will give a formal definition of the framework and security model of SPE-BKS. In addition, we also briefly introduce some basic ingredients used in our scheme, including dual pairing vector space (DPVS), two important lemmas, and complexity assumption.
2.1. Framework of SPE-BKS
- (1)
Data receiver generates the public key (pk) and secret key (sk) and sends the pk to the public. Data receiver also generates the trapdoor for any query of his/her interest and sends the trapdoor to the cloud server.
- (2)
For a message M with a keyword set W, data sender encrypts W to create the encrypted index IW by using pk. Moreover, data sender will produce the encrypted message C for M. After this, data sender sends IW and C to the cloud server.
- (3)
When the cloud server receives the trapdoor generated by the data receiver, the server tests the trapdoor against each encrypted index and returns the matched messages to the receiver.
According to the responsibilities of these three roles, we give a formal definition of the framework of SPE-BKS.
Definition 1. SPE-BKS consists of four polynomial-time algorithms (KeyGen, IndexBuild, Trapdoor, and Test) as follows:
- (1)
KeyGen (λ): this algorithm is run by the data receiver. It takes a security parameter λ as input and outputs pk and sk.
- (2)
IndexBuild (pk, W): this algorithm is executed by the data sender to encrypt the keyword set W = {w1, w2, …, wn}. It produces a searchable encrypted index IW by using pk and W.
- (3)
Trapdoor (pk, sk, and Q): the algorithm is executed by the receiver to construct a trapdoor of Q. It takes pk, sk, and Q as input and outputs a trapdoor TQ.
- (4)
Test (pk, IW, and TQ): for the query Q = (CNF1)∨(CNF2)∨⋯∨(CNFm) and the index keyword set W, we define the function f(W, Q) as follows: if there exists some i ∈ [1, m] such that the keyword set in CNFi is a subset of W, then f(W, Q) = 1. Otherwise, f(W, Q) = 0. This algorithm is run by the cloud server. It takes a trapdoor TQ, a secure index IW, and pk as input and outputs 1 if f(W, Q) = 1, or 0 otherwise.
2.1.1. Correctness
- (1)
If f(W, Q) = 1, Test (pk, IW, TQ) outputs 1
- (2)
If f(W, Q) = 0, Test (pk, IW, TQ) outputs 1 with negligible probability
In practice, data senders will send a message M with a keyword set W. The above algorithms aim to construct a secure and searchable index for W. For the message M, we can apply the symmetric encryption scheme, e.g., AES and triple DES, to protect the security of M. Like the previous SPE schemes, we only concentrate on searchable encryption part.
2.2. Security Definition of the SPE-BKS
In this section, we present a formal definition for SPE-BKS, which defines a group of adversaries who can adaptively query the trapdoors of chosen keyword sets, and issue two challenge ciphertexts. The essential of the security of SPE-BKS is that the adversaries fail to distinguish these two ciphertexts based on the given trapdoors. Depending on the above description, inspired by the security definition of the previous SPE schemes, the security definition of SPE-BKS is given as follows.
Definition 2. An SPE-BKS scheme is adaptively index-hiding against chosen keyword attack if for all probabilistic polynomial-time (PPT) adversaries , the advantage of in the following game is negligible for the security parameter λ:
- (1)
Setup: the challenger runs the KeyGen (λ) algorithm to generate pk and sk and gives pk to the attacker .
- (2)
Phase 1: the attacker can adaptively ask the challenger for the trapdoor TQ for any query Q of his choice.
- (3)
Challenge: first selects two keyword sets W(0) and W(1) and sends them to . Suppose that Q(1), Q(2), …, Q(t), Q(2), …, Q(t) are the keyword queries which are queried to construct trapdoors in Phase 1; the only restriction is that these queries cannot distinguish these two challenge keyword sets. Then, randomly chooses a bit β ∈ {0, 1} and generates Iβ = IndexBuild(pk, W(β)). Finally, {Iβ, W(0), W(1)} are sent to .
- (4)
Phase 2: continues to ask for trapdoor TQ for any query Q of his/her choice under the restriction mentioned in the Challenge phase.
- (5)
Response: the attacker outputs β′ ∈ {0, 1} and wins the game if β′ = β.
Based on the above game, the advantage of is defined as follows:
2.3. Prime Order Bilinear Group
- (1)
Bilinear: , where a, b ∈ G and
- (2)
Nondegenerate: if g ∈ G, then
- (3)
Computable: for any a, b ∈ G, can be efficiently computable
An efficient bilinear map can be obtained by applying the Weil pairing or the Tate pairing [38].
2.4. Dual Pairing Vector Space
Suppose that and g ∈ G; we have . We can perform the scalar multiplication and vector addition in the exponent. For any a ∈ Zp and , we have and . We can also have and . Here, the dot product is taken as modulo p.
We will employ the concept of DPVS which is introduced in [39]. The notation used to describe DPVS is introduced in [40]. Suppose that and are two random bases of , where l is a fixed dimension; if whenever i ≠ j and for all i ∈ n, where λ is a random elements in Zp, then we call B and B∗ dual orthonormal bases. Obviously, for a generator g ∈ G, whenever i ≠ j, where 1 can be seen as the identity element of GT.
2.5. Two Important Lemmas
We will introduce two important lemmas used in the security proof of our scheme. The first lemma is presented in [40]. To describe the lemma formally, first of all, we give some notations and definitions which are also introduced in [40]. Let t, l be two fixed positive integers where t < l, be an invertible matrix and St⊆{1,2, …, l} be a subset of size t. Suppose that B and B∗ are random dual orthonormal bases; a new pair of dual orthonormal bases BA and was defined as follows.
Let Bt be a l × t matrix over Zp whose columns are the vectors such that i ∈ St. We can easily find that BtA is also a l × t matrix. By keeping all of the vectors for i ∉ St and exchanging for i ∈ St with the columns of BtA, BA is then constructed. Because is also a l × t matrix, also can be constructed by using the same method.
For a fixed dimension l and prime p, we denote randomly choosing a pair of dual orthonormal bases B and B∗ by . can be viewed as a dual orthonormal bases set.
The first lemma is described as follows.
Lemma 1. For any fixed positive integers t < l, any fixed invertible and set St⊆{1,2, …, l} of size t, if , is also distributed as a random sample from . In particular, the distribution of is independent of A.
The second lemma introduced in [39] (Lemma 23) is described as follows.
Lemma 2. Let , where V is l-dimensional vector space, and and V∗ are its dual. For all , ,
2.6. Complexity Assumption
In order to prove our scheme’s security, subspace complexity assumption introduced in [40] is needed. This validity of this assumption is also given in [40].
For a fixed dimension n′ ≥ 3 and a prime p, the dual orthonormal bases B and B∗ which are randomly chosen are denoted by . can be seen as a dual orthonormal bases set. For a positive integer k ≤ (n′/3), the definition of this assumption is described as follows.
Definition 3 (subspace complexity). Given a group generator , we define the following distribution:
We assume that, for any PPT algorithm A with output in {0, 1}, the advantage of defined by is negligible in the security parameter λ.
3. The Proposed SPE-BKS Scheme
In this section, we first introduce a keyword conversion method which converts the index and query keywords into a group of vectors. Then, through taking advantage of DPVS to encrypt these vectors, the construction of SPE-BKS is given. Finally, the security proof of our scheme is presented.
3.1. Keyword Conversion Method
Before describing the method, some notations will be introduced. Suppose that any keyword w can be expressed as a string in {0, 1}∗, we define a function . Since p is a large prime and is larger than the number of all words, H1 can be collision-resistant. This means that if i ≠ j, then H1(wi) ≠ H1(wj), where wi and wj are two distinct keywords.
According to the coefficient of the f(x), the vector for W is obtained.
Note that if it exists some i such that Qi⊆W, where i ∈ [1, m], according to (4) and (5), it is not difficult to verify that .
As a result, we can test each Qi in Q against W to make a Boolean keywords search. If f(W, Q) = 1, there is at least an i ∈ [1, m] such that . Based on this property, a concrete SPE-BKS scheme will be proposed in the next section.
3.2. Construction
- (i)
KeyGen: choosing a bilinear group G of a prime order p and setting n′ = 3n + 3, the algorithm randomly selects a pair of dual orthonormal bases (B, B∗) from the dual orthonormal bases set , where , and (mod p), where i ∈ [1, 3n + 3]. The algorithm outputs pk and sk as follows:
- (i)
IndexBuild: given a keyword set W = {w1, w2, …, wn}, the algorithm constructs an n-degree polynomialf(x) = (x − H1(w1))(x − H1(w2)), …, (x − H1(wn)) = anxn + an−1xn−1 + ⋯+a0x0 , where H1(w1), H1(w2), …, H1(wn) are n roots of the equation f (x) = 0. Choosing two random elements s1, s2 ∈ Zp, for the vector , this algorithm creates the index IW as follows:
() - (i)
Trapdoor: given a query Q, this algorithm first generates a group of vectors , by using the keyword conversion method introduced in Section 3.1. Then, it randomly chooses r1, r2, …, rm, t1, t2, …, tm ∈ Zp and an invertible matrix α. Suppose that and in which and , where j ∈ [1, m], for each j, the trapdoor generation algorithm computes
() - (i)
The trapdoor of Q is TQ = {K1, K2, …, Km, α−1}.
- (ii)
Test: the test algorithm first computes for each j ∈ [1, m]. Suppose that ; it outputs
() -
where and j ∈ [1, m]. Based on , the test algorithm works as follows:
- (1)
Choose a counter v, and set v = 1.
- (2)
If v > m, then go to step (3); otherwise, the algorithm computes. If Dj = 1, the algorithm outputs 1 and ends. Otherwise, it sets v = v + 1 and goes to step (2).
- (3)
The algorithm outputs 0 and ends.
3.2.1. Correctness
If there exists some j ∈ [1, m] such that , it has xj=0, which makes , and, thus, the test algorithm outputs 1.
3.2.2. Application
- (1)
Data Receiver. Data receiver runs the “KeyGen” function to generate pk and sk, and pk is open to the public. When data receiver wants to perform Boolean keywords search, the “Trapdoor” function is called to generate a trapdoor by using sk and a Boolean query condition. After this, the trapdoor is sent to the cloud server.
- (2)
Data Sender. For a document set, the data sender builds the secure index by calling the “IndexBuild” function and sends the index to the cloud server.
- (3)
Cloud Server. Upon receiving a trapdoor generated by the data receiver, the cloud server launches the “Test” function and returns documents associated with the query to the data receiver.
In the real world, any practical application that needs ciphertext retrieval can integrate our scheme to realize the function of searching on encrypted data.
3.3. Security
- (i)
Semifunctional Index. Let , where i ∈[0, n] and is introduced in “KeyGen” algorithm. A normal index is constructed by the “IndexBuild” algorithm. Choosing random values y0, y1, …, yn ∈ Zp, the semifunctional index is created as follows:
() - (i)
Semifunctional Trapdoor. Let , where i ∈[0, n]. A normal trapdoor is constructed by the “Trapdoor” algorithm. Choosing random values zj0, zj1, …, zjn ∈ Zp where j ∈ [1, m], the semifunctional trapdoor is created as follows:
()
When using the semifunctional trapdoor to test the semifunctional index, the additional factors will be generated, where j ∈ [1, m].
- (1)
GameReal: this game is the real security game.
- (2)
Gamek: for each k ∈ [0, q], Gamek is similar to GameReal except that the index given to is semifunctional and the first k trapdoors are semifunctional. The remaining trapdoors are normal. In Game0, all the trapdoors given to are normal and the index is semifunctional. In Gameq, the index and all trapdoors are semifunctional.
- (3)
: suppose that a keyword set W = {w1, w2, …, wn} is the challenge keyword set; we construct an n-degree polynomial f(x) = (x − H1(w1))(x − H1(w2)), …, (x − H1(wn)) = anxn + an−1xn−1 + ⋯+a0x0 by using the function H1, where H1(w1), H1(w2), …, H1(wn) are n roots of the equation f (x) = 0. Then, we define this game. For each k ∈ [−1, n], is similar to Gameq except that index is a semifunctional encryption of a vector in which the first k + 1 elements are random and the remaining elements are {ak+1, ak+2, …, an}.
-
is a game such that the index is a semifunctional encryption of a real challenge keyword set, which is identical to Gameq. is a game such that the index is a semifunctional encryption of a random keyword set. We will show that these games are indistinguishable in the following lemmas.
Lemma 3. Suppose that there exists a PPT algorithm such that is nonnegligible. Then, we can build a PPT algorithm with nonnegligible advantage in breaking subspace complexity assumption, with n′ = 3n + 3, k = n + 1.
Lemma 4. Suppose that there exists a PPT algorithm such that is nonnegligible. Then, we can build a PPT algorithm with nonnegligible advantage in breaking subspace complexity assumption, with n′ = 3n + 3, k = n + 1.
Lemma 5. Suppose that there exists a PPT algorithm such that is nonnegligible. Then, we can build a PPT algorithm with nonnegligible advantage in breaking subspace complexity assumption, with n′ = 6, k = 2.
Considering the length of the article and the coherence of the article structure, the proofs of Lemmas A–C are given in Appendix.
Theorem 1. If subspace complexity assumption holds, then our SPE-BKS scheme is secure.
Proof. If subspace complexity assumption holds, the real security game is indistinguishable from based on the previous lemmas. In , the value of β is information-theoretically hidden from the attackers. Hence, we can state that the attackers can attain no advantage in breaking our SPE-BKS scheme.
4. Performance Evaluation
In this section, we present a detailed experiment to demonstrate that our scheme can efficiently perform Boolean keywords search over the encrypted data. We implement our scheme in JAVA with Java Pairing-Based Cryptography (JPBC) Library [43]. In our implementation, the bilinear map is instantiated as Type A pairing (base field size is 128 bits), which offers a level of security equivalent to 1024-bit DLOG [43]. Our experiment is run on Intel® Core™ i7 CPU at 2.90 GHz processor and 16 GB memory size and is over a real-world e-mail dataset called Enron Email Dataset [44]. In our experiment, we randomly choose 1000 e-mails from the Enron Email Dataset and denote the number of documents by d (d = 1000). To show the efficiency of our scheme, we compare our scheme to three previous SPE schemes in terms of key generation, index building, trapdoor generation, and search. For simplicity, we denote these three schemes introduced in [17, 18, 20] by PECDK-1, PECDK-2, and YY18. These three SPE schemes can perform conjunctive, disjunctive, and Boolean keywords search over encrypted data.
4.1. Key Generation
From Figure 2(a), the time costs of key generation in PECDK-1 and our scheme are both linear with, while that in PECDK-2 is linear with O (n). The reason for this phenomenon is the case that both our scheme and PECDK-1 adopt DPVS to generate group elements in G. Because the dimension of DPVS in our scheme is 3n while that in PECDK-1 is 4n, the time cost of key generation in our scheme is less than that in PECDK-1. In addition, since the key generation algorithm in YY18 is independent of n, the time cost of key generation is not related to n. Although the time cost of key generation in our scheme is higher than that in PECDK-2 and YY18, it has little impact on our practical application since this algorithm only runs when system initialization and key pair replacement are carried out.




As shown in Figures 3(a) and 3(b), because both pk and sk contain group elements in G, the space cost for key pair in our scheme and PECDK-1 are both linear with the square of n. By contrast, the space cost for key pair in PECDK-2 is linear with O (n). Besides, for YY18, since both pk and sk contain constant big integers, the space cost for key pair is not related to n. Though the storage cost of keys in our scheme is more than that in the other three schemes, our scheme still does not need much space to store the keys as these keys are stored only a few copies.




4.2. Index Building
From Figure 2(b), the time costs of index building in PECDK-1, PECDK-2, and our scheme are all linear with, while that in YY18 is linear with O (n). For PECDK-2, the index building algorithm needs to convert the keywords into a matrix and then needs exponentiation computation of G to encrypt the keywords. For the proposed scheme and PECDK-1, they also require exponentiation computation of G owing to DPVS. More precisely, compared to PECDK-1, our scheme needs less time cost in index building since the dimension of DPVS in our scheme is less than that in PECDK-1. Besides, the time cost of index building in our scheme is slightly higher than that in PECDK-2 since our scheme needs exponentiation computations while PECDK-2 requires exponentiation computations. The reason for this phenomenon is that, compared to PECDK-2, our scheme needs more group elements to support more complex search function. Compared with YY18, our scheme needs more index building time since our scheme needs exponentiation computations while YY18 only runs the encryption algorithm of PCTD n times.
For the storage cost of indices, the group elements on G in the index for our scheme are linear with n. For YY18, since each document’s index contains n ciphertexts generated by PCTD, the space cost of index building is linear with O (n). By contrast, the group elements in the index for PECDK-1 and PECDK-2 are both linear with the square of n since the index structures for PECDK-1 and PECDK-2 are both a matrix. As shown in Figure 3(c), the storage costs of indices in our scheme and YY18 are linear with O (n) while those in PECDK-1 and PECDK-2 are both linear with.
4.3. Trapdoor Generation
As shown in Figure 2(c), the time costs of trapdoor generation in PECDK-1, PECDK-2, YY18, and the proposed scheme are linear with m, m, and respectively. More precisely, for PECDK-1, the keywords in the query are first converted to be a vector, whose dimension is n. Then, this vector will be encrypted by using DPVS. Since the encryption operation needs exponentiation computations of G, the time cost of trapdoor generation in PECDK-1 is linear with. For PECDK-2, suppose that the number of keywords in the query is m, the query is converted to be a vector whose dimension is m, and each dimension needs one exponentiation computation on G. Thus, the time cost of trapdoor generation in PECDK-2 is linear with m. For YY18, if the query contains m keywords, the trapdoor algorithm will perform encryption algorithm of PCTD n times, so the time cost of trapdoor generation is linear with O (m). For the proposed scheme, the query is converted to be m vectors in which each vector’s dimension is n. After this, each vector is encrypted by making use of DPVS, and thus, the time consumption of trapdoor generation in our scheme is linear with.
From Figure 3(d), the space costs for PECDK-1, PECDK-2, YY18, and our scheme are linear with n, n, n, and mn, respectively. The reason for this phenomenon is that the trapdoors in PECDK-1, PECDK-2, and our scheme contain n, m, and mn group elements on G, respectively, and the trapdoor in YY18 involves m ciphertexts of PCTD.
4.4. Search
As shown in Figure 2(d), the time cost of search in PECDK-1 is linear with, while that in PECDK-2, YY18, and our scheme is linear with mn. More precisely, for PECDK-1, the index of W contains n ciphertexts, and each ciphertext needs n pairing operations. For PECDK-2, the index is a matrix, and the trapdoor is a vector whose dimension is m. The test algorithm in PECKD-2 performs mn pairing operations between the first m rows of the matrix and the vector. For YY18, since the index and trapdoor hold n and m ciphertext of PCTD, respectively, the test algorithm will run secure less or equal (SLE) protocol and secure multiplication protocol across domains (SMD) mn times. For the proposed scheme, the trapdoor has m items, and the test algorithm in our scheme performs n pairing operations between each item and the index. Thus, total pairing operations in our scheme are mn. Since PECDK-1, PECDK-2, and our scheme need nearly 2 mn and 3 mn pairing operations, respectively, the time consumption in our scheme is slightly more than that in PECDK-2 and is less than that in PECDK-1. Moreover, since the time cost of a pairing operation is less than that of SLE and SMD, our scheme is more efficient than YY18 in test phase.
4.5. More Comments
As shown in the experimental results, when n = 5, d = 1000, and m = 5, the time cost of index building in our scheme is 331 s, the generation time of a single trapdoor is 1.7 s, and the search time is 142 s. According to the statistical data given in [17, 45], the number of keywords in a document (n) is usually less than 20, e.g., only 3∼5 keywords in the scientific paper, and the number of keywords in a query (m) is often less than 10. We can argue that our scheme is suitable for the applications with fewer keywords, such as the keywords in the scientific literature, e-mail title and summaries, medical data summaries, and so on.
Although Figure 2 shows that the time complexity of our scheme is as good as that of PECDK-2, our scheme can support Boolean keywords search, which is much advanced than the conjunctive and disjunctive keywords search. Compared with YY18 that supports Boolean keywords search, our scheme needs less search time, despite the fact that it increases index building time. In practice, the index building in real-world application is usually a one-time activity, while queries are frequently performed. Thus, we reckon that it is worth sacrificing index building time to reduce retrieval time. For the space complexity, from Figure 3, our scheme needs less space for index storage, though requiring more storage space for the trapdoor and keys. Considering the fact that trapdoor and keys often require much less storage space than the index, we argue that our scheme is practicable in the real world.
5. Conclusions
In this paper, by applying DPVS and the bilinear pairing, we proposed a searchable public key encryption scheme supporting Boolean keyword search, which is proven to be secure under chosen keyword search attack. Compared to previous SPE schemes supporting conjunctive and disjunctive keywords search, the proposed scheme can support more advanced search function. Moreover, through a detailed experiment over a real-world dataset, we can argue that the efficiency of our scheme is suitable for practical applications with fewer keywords. Considering that the efficiency in our scheme still needed to be improved, we will construct a more efficient scheme in the forthcoming work.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors gratefully acknowledge the support of the National Natural Science Foundation of China under Grant nos. 61402393 and 61601396 and Nanhu Scholars Program for Young Scholars of XYNU.
Appendix
A. Proof of Lemma 3
Proof. Given , C needs to decide whether T1, T2, …, Tn and Tn+1 are , distributed as , or , .
By using T1, T2, …, Tn and Tn+1, C can simulate Game0 or GameReal with . To create pk, firstly, randomly selects an invertible matrix . Then, we define a dual orthonormal bases F and F∗ by , , , , , and , , , , , .
C implicitly sets and where the matrix A is applied as a change of basis matrix to and is applied as a change of basis matrix to , as described in Section 2.5. Note that the first 2n + 2 basis vectors are unchanged. According to Lemma 2, E and E∗ are properly distributed.
Choosing a function H1, C computes and sends it to A. Each time A asks C to provide a key for a keyword query Q, C creates a normal trapdoor of Q. Choosing and an invertible matrix and , C computes
At some point, A sends C two challenge keyword sets, and . By randomly choosing β ∈ [0,1] and computing an n-degree polynomial , C sets
Then, C gives the index IW to A . If T1, T2, …, Tn and Tn+1 are equal to , , then this is a properly distributed normal index. In this case, C has properly simulated GameReal.If T1, T2, …, Tn and Tn+1 are equal to , , then there is an additional term of in the exponent part of the index. The coefficients in the basis are the vector τ3(a0, a1, …, an). In order to acquire the coefficients in the basis , we multiply the matrix A−1 by the transpose of these vectors and obtain τ3A−1(a0, a1, …, an). Since A is random, these coefficients are uniformly random. Therefore, in this case, C has properly simulated Game0. So, if A can distinguish GameReal from Game0 with nonnegligible advantage, then C can use the output of A to break subspace assumption with nonnegligible advantage.
B. Proof of Lemma 4
Proof. Given , needs to decide whether T1, T2, …, Tn and Tn+1 are distributed as follows: , or , .
By using T1, T2, …, Tn and Tn+1, can simulate Gamek−1 or Gamek with . To create pk, firstly, randomly selects an invertible matrix and implicitly sets , and , where A is applied as a change of basis matrix to and is applied as a change of basis matrix to , as described in Section 2.5. Then, , , , for i ∈ [0, n]. According to Lemma 2, E and E∗ are properly distributed.
Choosing a function H1, computes and sends it to .
When requests the lth trapdoor query, generates the normal trapdoor or the semifunctional trapdoor as follows:
For l < k, choosing and implicitly setting , , where j ∈ [1, m] and i ∈ [0, n], can produce semifunctional trapdoor by using .
For l > k, runs the normal trapdoor generation algorithm to produce the normal trapdoor.
To create the kth requested trapdoor, firstly chooses and an invertible matrix α. Let and . Then, for each j ∈ [1, m], computes
The above equation implicitly sets and , where j ∈ [1, m]. If T1, T2, …, Tn and Tn+1 are equal to , ; then this is a properly distributed normal trapdoor. If T1, T2, …, Tn and Tn+1 are equal to , , then this is a properly distributed semifunctional trapdoor. For each j ∈ [1, m], Kj’s exponent vector contains the item .
At some point, sends two challenge keyword sets, and . By randomly choosing β ∈ [0, 1] and computing an n-degree polynomial , where , are n roots of the equation f (x) = 0, sets
After that, sends the semifunctional index IW to . Obviously, IW contains the exponent vector . The authors observe that if attempts to test whether the kth trapdoor of Q is semifunctional by creating a semifunctional index of keyword set W which satisfies f(W, Q) = 1, then can find that test algorithm can still work whether the kth key is semifunctional or not, since Vj and V will be eliminated when f(W, Q) = 1. Therefore, we can say that the kth key is a nominally semifunctional key.
In view of this, for each j ∈ [1, m], Vj and V are distributed as random vectors in the spans of and . In Vj, the coefficients in the basis are the vector , where and i ∈ [0, n], j ∈ [1, m]. In order to acquire the coefficients in the basis , we multiply the matrix At by the transpose of these vectors and obtain ). Since is random and if i ≠ j, we can say that E1, E2, …, Em are uniformly random. In V, the coefficients in the basis are the vector μ3(a0, a1, …, an). In order to acquire the coefficients in the basis , we multiply the matrix A−1 by the transpose of these vectors and obtain Ej = μ3A−1(a0, a1, …, an). Since is random and , the coefficients Ej and mentioned above are uniformly random according to Lemma 2, where j ∈ [1, m] and , .
According to the above analysis, we conclude that if T1, T2, …, Tn and Tn+1are distributed as , , has properly simulated Gamek−1. If T1, T2, …, Tn and Tn+1 are equal to , , has properly simulated Gamek. Thus, we argue that if can distinguish Gamek−1 from Gamek with nonnegligible advantage, then can use the output of to break subspace complexity assumption with nonnegligible advantage.
C. Proof of Lemma 5
Proof. Given , T1 and T2, C needs to decide whether T1 and T2 are distributed as and or as and , respectively.
By using T1 and T2, for k ∈ [0, n], C can simulate or with A. To construct pk, C implicitly sets and , where i ∈ [0, k]. Apparently, E and E∗ are properly distributed dual orthonormal bases.
Because C can obtain , pk can be easily created. Each time A asks C to provide a key for a keyword query Q, C creates a semifunctional trapdoor of Q. Choosing r1, r2, …, rm, t1, t2, …, tm, zji ∈ Zp and an invertible matrix and , C computes
At some point, A sends C two challenge keyword sets, and . By randomly choosing β ∈ [0,1], s1, s2 ∈ Zp, two random vectors , , and computing an n-degree polynomial by using the function H1, where are n roots of the equation f (x) = 0, then C sets
Then, C gives the indexIw to A. If T1 and T2 are equal to and , then this is a properly distributed semifunctional index of the vector {x0, x1, …, xk−1, ak, ak+1, …, an}. In this case, C has properly simulated . If T1 and T2 are equal to and , respectively, then this is a properly distributed semifunctional index of the vector {x0, x1, …, xk−1, bk, ak+1, …, an}, where bk = ak + τ3. In this case, C has properly simulated . So, if A can distinguish from with nonnegligible advantage, then C can use the output of A to break subspace assumption with nonnegligible advantage.
Open Research
Data Availability
The data used to support the findings of this study are available from the website http://www.cs.cmu.edu/∼./enron/.