Machine Learning Meets Encrypted Search: The Impact and Efficiency of OMKSA in Data Security
Abstract
The convergence of machine learning and searchable encryption enhances the ability to protect the privacy and security of data and enhances the processing power of confidential data. To enable users to efficiently perform machine learning tasks on encrypted data domains, we delve into oblivious keyword search with authorization (OKSA). The OKSA scheme effectively maintains the privacy of the user’s query keywords and prevents the cloud server from inferring ciphertext information through the searching process. However, limitations arise because the traditional OKSA approach does not support multi-keyword searches. If a data file is associated with multiple keywords, each keyword and corresponding data must be encrypted one by one, resulting in inefficiency. We introduce an innovative approach aimed at enhancing the efficiency of search processes while addressing the limitation of current encryption and search systems that handle only a single keyword. This method, known as the oblivious multiple keyword search with authorization (OMKSA), is designed for more effective keyword retrieval. One of our important innovations is that it uses the arithmetic techniques of bilinear pairs to generate new tokens and new search methods to optimize communication efficiency. Moreover, we present a detailed and rigorous demonstration of the security for our proposed protocol, aligned with the predefined security model. We conducted a comparative experiment to determine which of the two schemes, OKSA and OMKSA, is more efficient when querying multiple keywords. Based on our experimental results, our OMKSA is very efficient for data searchers. As the number of query keywords increases, the computational overhead of connected keyword searches remains stable. Finally, as we move into the 5G era, the potential applications of OMKSA are huge, with clear implications for areas such as machine learning and artificial intelligence. Our findings pave the way for further exploration and deployment of these frontier areas.
1. Introduction
In contemporary times, the convergence of machine learning (ML) [1, 2] and searchable encryption (SE) [3–5] is an important innovation. This combination will promote the enhancement of data privacy protection and the utility of big data. Known for their ability to self-learn and adapt, ML algorithms play a key role in retrieving huge datasets, recognizing patterns, and predicting outcomes. However, since these algorithms require access to large amounts of data for performance optimization, the task of maintaining data privacy and security is becoming increasingly important. To solve this problem, we can use SE technology [6], which can perform a secure search on encrypted data, ensuring data confidentiality while preserving the availability of encrypted information. The convergence of ML and SE enhances the ability to perform efficient secure data analysis on encrypted data to protect the privacy and security of the data.
However, there is an obvious problem in public key SE. The cloud server can infer the content and search keywords of outsourced data based on the frequency of retrieval, resulting in data privacy disclosure. However, most of the existing public-key SE schemes cannot solve the inside keyword guessing attack (IKGA) problem [7–10]. Specifically, an inside adversary is an honest but curious cloud server (and cloud server operator) that honestly executes search protocols but tries to infer user keywords and private information. Once the enemy obtains the keywords, it will threaten the security of the ciphertext data file, and it also means that the search preference, personal identity, and other information of the data receiver may be leaked. Therefore, solving the problem of internal keyword guessing attack is a very meaningful and important research topic. Several encryption schemes [11–15] have been designed to solve this problem. However, their search efficiency is relatively low. This highlights the need for a cryptographic search method that is resistant to internal keyword guessing attacks.
Ogata and Kurosawa proposed the oblivious keyword search (OKS) concept [16]. The OKS scheme protects the privacy of both the database and the client. On the one hand, it is to hide the user’s query database target, and on the other hand, it is to ensure that the client gets any other information except the corresponding value of the query target. At the same time, the privacy of keywords is protected, and the cloud server cannot carry out keyword guessing attacks by matching trap information and keyword information. However, in this scheme, each data file can only be associated with one keyword, and in the encryption process, each ciphertext supports only one keyword encryption search. If a file has multiple keywords, the file must be encrypted multiple times. Obviously, most files have more than one keyword.
In addition, the existing OKS protocol has proven to be inefficient when applied to large databases. We need to consider a situation where a company intends to sell a portion of its encrypted data stored on cloud servers, and potential buyers face these restrictions because they want to keep the details of the specific data content of the intended purchase private. In this case, the sender may approve the set of keywords to be retrieved for the receiver and then restrict the receiver from querying the keywords in that set. Specifically, if X represents a set of authorization keywords, the recipient can only select the keyword w ∈ X. At the same time, while the sender allows the encrypted data associated with the keyword X to be retrieved from set X, they still do not know the specific keyword used, thus protecting the privacy of the receiver. In order to solve this problem, the oblivious keyword search with authorization (OKSA) [17] solution came into being. While OKSA protects the privacy of both recipients and senders, there is an obvious problem. This model binds a single keyword to a single data file, and if a file contains multiple keywords, then each keyword and file need to be encrypted individually, which is obviously inefficient. We introduced the oblivious multiple keyword search with authorization (OMKSA) scheme, as shown in Figure 1.

1.1. Our Contributions
- 1.
We came up with a new concept called OMKSA. It is based on the OKSA scheme. In the OKSA scheme, each encrypted data file is bound to a separate keyword. If the file contains a large number of keywords, we must encrypt each keyword and file one by one; otherwise, the retrieval will fail due to inaccurate keyword extraction. Furthermore, in the real world, where a single file often contains multiple keywords, OKSA cannot meet the need for efficiency. In the OMKSA scheme, we improved the encryption algorithm to allow the data provider to assign several keywords to each encrypted file. And the scheme can resist the internal keyword guessing attack problem.
- 2.
We designed OMKSA protocol. Compared with OKSA protocol, our algorithm has higher efficiency when searching multiple keywords. We have carefully designed our approach to disjunctive and conjunctive keyword search, which will meet the flexible needs of users and cloud storage providers. And it adds no additional communication or storage overhead compared to the OKSA scheme searching for the same number of keywords.
- 3.
We have compiled OMKSA scheme and analyzed the efficiency of OMKSA scheme compared with OKSA scheme through comprehensive simulation experiment. The results show that this scheme can effectively improve the efficiency of multi-keyword search. In addition, in-depth security assessments show that OMKSA meets the needs of artificial intelligence (AI) applications in the 5G era.
1.2. Related Works
Here are some pertinent studies.
1.2.1. Public Key Encryption With Keyword Search (PEKS)
As far as we know, PEKS [4] has undergone significant development and improvement since they were proposed by Boneh et al. The initial design was intended to address the limitations of symmetric searchable encryption (SSE) [18–21] by developing a unique mechanism. This mechanism allows the generation of searchable ciphertext using the user’s public key and specific keywords. This innovation makes it not only feasible to search on encrypted data, but it can be very efficient. This innovation began the history of SE, which later researchers introduced many improvements and tweaks aimed at enhancing security and enhancing functionality. The advent of secure channel-free PEKS (SCF-PEKS) scheme [22–25] is a big development in preventing keyword guessing attacks on cloud servers. The untrustworthiness of cloud servers can compromise search confidentiality. Functionality enhancements were not left behind. The advancement in multi-keyword search capabilities is significantly enhanced by the integration of public-key encryption with conjunctive keyword search (PECKS) [26–30]. The introduction of public-key cryptography with temporary keyword searches (PETKS) [31] by Abdalla et al. is another important research result, as it not only significantly improves the usability of database systems but also enables the flexibility of keyword searches. The transformation protocol connects identity-based encryption (IBE) [32] to encrypted data to enable search under ciphertext, further enriching the field of cryptography. The attribute-based encryption (ABE) scheme [33–35] provides subtle control over data search and enhanced privacy protection in the cloud environment [36, 37].
1.2.2. Oblivious Transfer
The origins of inadvertent diversion (OT) [38] can be traced back to Rabin’s seminal work. In the framework of the oblivious transfer protocol, the procedure is marked by an interaction between a sender, denoted as , and a receiver, indicated as . This interaction facilitates a transaction where the sender is unaware of which specific bit the receiver chooses, and the receiver is equally uninformed about the unselected bit. This approach to ensuring the privacy of both parties during transmission has led to the emergence of many OT variants [39–41], all of which contribute to enhanced security and functional diversity.
1.2.3. OKS
The design of OKS [16] is focused on ensuring rigorous privacy safeguards. It is meticulously designed to mask all informative traces, including access patterns, ensuring a user’s query and retrieval activities remain concealed. The pioneering OKS protocol, elucidated by Ogata and Kurosawa, is anchored in the oblivious polynomial evaluation and the oblivious transfer methodologies. The protocol facilitates a user’s engagement with a database in a manner that allows data acquisition without revealing any details about the search keywords or content to the data provider.
Ogata and Kurosawa developed two separate protocols. The first relies on the intricacies of the one-more RSA inversion problem, while the second depends on the difficulties inherent in polynomial reconstruction. Nonetheless, a constraint is manifested in these initial models, and the search is confined to a singular keyword per query. Rhee et al. advanced the narrative by unveiling an oblivious conjunctive keyword search (OCKS) protocol [42]. Rooted in the RSA known target inversion complexity, this enhancement expanded the search capacity. Simultaneously, Freedman and colleagues tackled the issue of privacy-preserving database access by proposing an efficient framework [43]. This structure links keyword search activities with the oblivious evaluation of pseudorandom functions, creating a cohesive framework. Zhu and Bao further enriched this dialog, leveraging enhanced OPEs [44] to craft an OKS protocol, and validated its security credentials. Notwithstanding these advancements, an inefficiency permeates previous OKS protocols; they are confined to a one-keyword-to-one-document architecture. This limitation amplifies both storage requisites and search durations, particularly pronounced when a document is tagged with multiple keywords, necessitating the replication of the document for each associated keyword. Jiang and Liu confronted this challenge, proposing an adept OKS protocol characterized by sublinear time search efficiency and compact ciphertexts [45]. However, it is circumscribed to disjunctive keyword search and is not configured for conjunctive keyword search applications. In this context, OKS continues to evolve, enhancing privacy while facing inherent operational constraints. Since Ogata and Kurosawa first introduced the OKS concept, new protocols have been developed to emphasize the need for coordination between strict privacy protection and operational efficiency. The exploration and resolution of these challenges are not only intrinsic to OKS’ progress but also a necessity in an era when data are ubiquitous and privacy is often under attack. Future research is expected to further address this complex problem, seeking innovative solutions for data privacy, search efficiency, and functional versatility in the OKS field. Zhang et al. introduced an advanced oblivious multiple keyword search (OMKS) service [46], designed to provide users and cloud storage providers with strong privacy protection while enabling efficient multi-keyword search. Jiang et al. introduced an innovative cryptographic primitive known as OKSA [17]. Specifically, OKSA streamlines the verification of search keywords, confirming their inclusion in the user’s authorized set prior to activating the OKS protocol. This feature enables user authorization and extends the level of privacy and security of traditional OKS.
1.2.4. The Integration of ML and SE
The combination of ML and SE is an emerging field that combines the efficient data processing of ML with the privacy protection principles of SE. In the era of big data, concerns about privacy are escalating, prompting people to seek solutions that can efficiently process data while protecting privacy. ML models can extract and process valuable information from massive datasets. However, when applied to sensitive data, a privacy-security conundrum emerges. SE has emerged as a viable solution that allows searching of encrypted data, thus ensuring data privacy [47]. One of the seminal works in this domain is CryptoNets [48], which explores the use of neural networks in encrypting data to make predictions about it while protecting data privacy. This method uses homomorphic encryption technology to convert data into encrypted form and realizes the calculation in ciphertext state. However, homomorphic encryption, while revolutionary, is computationally expensive. This leads to some solutions, such as personal data collection (PATE) [49]. It uses different private learning methods, aggregating the results of multiple models to protect privacy.
Kamil Khudhair et al. optimized image steganography [50–52] to support multiple scenes, high efficiency, and high security. This inspired us to improve the OKS scheme, combining it with deep learning to further achieve high efficiency and enable multi-scenario application. Raghunandan et al. used chaotic behavior [53] to encrypt data for greater security. How to easily access the encrypted data needs to be fully considered, and our scheme can achieve searchability in the encrypted state. Asmitha et al. perfected the problems of computer vision and deep learning in human authentication [54]. This further reinforces the need for ML and SE. ML needs to invoke a lot of private data, and our solution ensures the safe use of private data. Ramesh et al. introduced false modulus in the public key encryption process to improve the security of Rabin cryptosystem [55]. This is a very interesting study and encourages us to continue to investigate how to implement the security of SE on ML.
More recent advancements have broached efficient protocols like SecureML [56], which enhances scalability and efficiency, which improves the computational challenges inherent in privacy-protecting ML. Furthermore, innovations like oblivious neural network predictions (MiniONN) [57] have facilitated privacy-preserving by transforming existing neural network models into oblivious counterparts. With the convergence of ML and SE, the focus is shifting to ensuring data privacy without compromising the quality of ML predictions. The future trajectory may be to explore enhanced computing efficiency, robust security protocols, and common applications that span different data domains.
1.3. Organization
Section 2 presents essential foundational concepts, details the OMKSA algorithm, describes the security model, and discusses the underlying complexity assumptions. Section 3 introduces an efficient OMKSA protocol, while Section 4 offers a security proof. Section 5 focuses on conducting a thorough analysis and assessing the performance of the system. Additionally, Section 6 delves into applications in the field of ML. The paper concludes in Section 7.
2. Preliminaries
2.1. Notations
The symbols and terms utilized in this paper are efficiently compiled in Table 1.
Notation | Description |
---|---|
A data sender | |
The cloud server | |
The receiver | |
X | Authorization keyword set |
x∗ | Search keyword |
α | The master secret key |
K(x∗) | Token of ∗ |
D(x∗) | Trapdoor cryptosystem of ∗ |
L | A collection of search results |
The actual number of keywords retrieved | |
D | The maximum number of keywords that can be retrieved simultaneously |
2.2. Bilinear Pairing
- 1.
Bilinearity: For the generator g of , the other generator h of , and the positive integers a, b, we can get the equation: e(ga, hb) = e(g, h)ab.
- 2.
Computability: For any element g, , we can efficiently compute e, which is a probabilistic polynomial algorithm.
- 3.
Non-degeneracy: There are two elements in , g and h, that satisfy the equation: e(g, h) ≠ 1. 1 is the identity element of . 1 is the identity element of .
e is a symmetric bilinear pair mapping, and the corresponding symmetric double planet pair system is denoted by .
2.3. Algorithm Definition
Our OMKSA solution interacts with three parties: the data sender, the cloud server, and the data receiver.
2.3.1. Setup
Commencing the procedure, the entity provides a security parameter denoted by 1λ and an integer value n, subsequently generating the master key pair (MPK, MSK) for cryptographic system configuration. Furthermore, in collaboration with each user, establishes a distinct keyword set X, ensuring that the size of X does not exceed n2.
2.3.2. Commit
processes a message mi, a designated keyword xi, and the MPK to produce the corresponding encrypted data Ci. Notably, each message mi is coupled with a distinct keyword array . Subsequently, securely transmits the collection of ciphertexts {Ci} to the cloud storage .
2.3.3. Transfer
-
: In this step, inputs the designated keyword set X, including specific keywords within X and MPK. This leads to the creation of the token of keywords, along with the user’s secret key sk and Γ which is proof of accountability. Then, the token and proof, represented as , are sent to . The token is derived from sk, , and MPK, with Γ aiding in verifying the token’s accountability, specifically ensuring it generates a trapdoor for a valid keyword in the set X.
-
: inputs the token and the MSK to affirm accountability, verifying that . It then generates a trapdoor D for delivery to .
-
: On the condition that the sequence aligns with , upon receiving inputs Ci, D, sk, will yield mi; otherwise, it results in ⊥.
2.3.4. Correctness
For an OMKSA to be deemed effective, it is essential that the recipient accurately acquires the specified message, provided all involved parties comply with the outlined protocol. Furthermore, the authenticity is validated when accountability checks confirm that the trapdoor created from the acquired token corresponds exclusively to specific keywords. It is also crucial that these keywords fall within the scope of the authorized keyword collection.
2.4. Security Notions
-
Receiver privacy
-
For a receiver, discerning whether x is x0 or x1 becomes challenging when presented with (K(x), Γ) and the pair of keywords x0, x1.
-
Indistinguishability
-
When faced with a ciphertext C for (m, x) and two distinct message-keyword pairs (m0, x0) and (m1, x1), the receiver finds it difficult to determine if (m, x) matches (m0, x0) or (m1, x1).
-
Accountability
-
For scenarios where (K(X), X, sk) meets the condition |X| > 1, crafting (K(X), Γ) that successfully undergoes verification is a complex task.
In light of the specified requirements, our formulation of the security models is through a series of interactive games involving a challenger, denoted as , and an opponent, referred to as .
2.4.1. User Privacy
-
Setup.
-
executes the Setup procedure, thereby generating the system parameter MPK which is then transmitted to the adversary .
-
Challenge.
-
The adversary proposes two distinct keywords x0, x1 to the challenger . In response, selects a random θ ∈ 0, 1, sets x to xθ, and produces (K(X), Γ).
-
Guess.
-
attempts to guess by announcing θ′ and succeeds if θ′ matches θ.
The advantage of is quantified as Adv = |Pr[θ′ = θ] − 0.5|.
Definition 1. The OMKSA framework is deemed to adhere to user privacy standards when it is infeasible for any adversary operating in probabilistic polynomial-time to secure a substantial edge in the stipulated user privacy contest.
2.4.2. Indistinguishability
-
Setup.
-
The challenger executes the Setup protocol to produce the system parameter MPK and forwards it to adversary .
-
First Stage.
-
In this stage, requests a trapdoor for keyword X, and complies by providing trapdoor D.
-
Challenge.
-
presents two equally long message-keyword pairs (m0, x0) and (m1, x1) to , ensuring that x0, x1 were not previously queried for trapdoors in the First Stage. then issues a challenge ciphertext C∗, selecting θ randomly from 0,1.
-
Second Stage.
-
continues to make trapdoor requests under the same constraints as the Challenge, with providing responses similarly to the First Stage.
-
Guess.
-
proposes a guess of θ′ and is victorious if it matches θ.
The advantage of is defined as Adv = |Pr[θ′ = θ] − 0.5|.
Definition 2. OMKSA achieves indistinguishability against chosen keyword attacks if no adversary, employing probabilistic polynomial time approaches, can significantly succeed in the described game.
2.4.3. Accountability
-
Setup.
-
Executing the Setup phase, the challenger creates the system’s parameter MPK and subsequently dispatches this parameter to the adversary .
-
Challenge.
-
Here, presents (K(X′), X, X′, sk) alongside a numeral 1 for the challenge, deriving K(X′) from X, X′, sk, MPK with |X′| exceeding 1.
-
Win.
-
The adversary submits (K(X′), Γ) and is deemed victorious if this submission successfully clears the verification process.
The edge of in this scenario is determined as Adv in the formulation of (K(X′), Γ).
Definition 3. Accountability within an OMKSA framework is established when no adversary, operating within polynomial time bounds, can achieve success in the previously outlined game with a notable margin.
2.5. Assumptions
The foundational security constructs of OKSA are built upon two complex challenges: the (f, n)-DHE Problem and the (f, q)-MSE-DDH Problem. While the former has been introduced in earlier works [59], here we provide only a brief overview, directing readers to these references for a thorough exploration of its complexities.
Definition 4. The challenge in the (f, n)-DHE Problem within a group of prime order p, involving elements and , is to compute (f(x), hf(a)) from the given sequence . In this context, f(x) represents a polynomial within whose degree is greater than n.
The (f, q)-MSE-DDH Problem is a refined variant of the MSE-DDH Problem, preserving the original problem’s complexity. This problem stands as a particular case within the broader set of Diffie–Hellman exponent assumptions, as discussed in the previous article [60]. Detailed intractability analysis of this problem will be presented later. For more details, please refer to the paper [59].
3. OMKSA Protocol
In this segment, we introduce a protocol designed for OMKSA. This protocol seamlessly accommodates both disjunctive and conjunctive keyword searches, thanks to the integration of innovative token architectures and the implementation of sophisticated search algorithms as delineated in [59]. Moreover, our protocol endows the receiver with the capability to unobtrusively secure an authorized trapdoor by adaptively presenting a keyword token. A hallmark of this system is its efficiency, characterized by the constant size of communication overhead incurred between and . is enabled to construct a trapdoor nested within X without the capacity to ascertain the specific keyword in question. The OMKSA protocol engages three entities: a data sender , a cloud server , and a receiver , unfolding over three distinct phases: Setup, Commit, and Transfer.
3.1. Disjunctive Keyword Search
The scheme facilitates the execution of disjunctive keyword searches by innovatively creating tokens and applying sophisticated search techniques. Imagine a scenario where the data receiver, denoted as , aims to locate documents that include either keyword x1 OR x2. In such cases, the OMKSA scheme is implemented as follows.
3.1.1. Setup
Entity makes the MPK available to everyone and maintains the confidentiality of K.
3.1.2. Commit
Subsequently, securely transmits the collection of ciphertexts {Ci} to the receiver .
3.1.3. Transfer
It is assumed that establishes a distinct keyword set X for each receiver, with and the size of X is denoted as |X| = k not exceeding n2.
Then sends and to .
Then returns the trapdoor D1, D2 to .
adds to a list L(1) and adds to a list L(2), and has L = L(1) ∪ L(2) as final calculation.
3.1.4. Extension to D Keywords
The outlined OMKSA protocol, initially configured for disjunctive keyword search, is readily expandable to encompass D keywords. The data receiver computes for each where , subsequently recovering and compiling the list for every keyword. The final step entails aggregating the lists corresponding to all D keywords to yield the comprehensive disjunctive search outcome.
3.1.5. Correctness
3.2. Conjunctive Keyword Search
Imagine a scenario where the data receiver is inclined to retrieve documents that encompass both keywords x1 AND x2. An initial instinct might be to adapt the protocol aligned with the earlier elaborated disjunctive keyword search; essentially, would amass the list T(1) correlated with keyword x1 and T(2) associated with x2, culminating in the intersection T = T(1)∩T(2) to finalize the conjunctive search outcome. However, this approach is riddled with the pitfall of information leakage, as inadvertently gains access to documents extraneous to the intersection T(1)∩T(2). In simple terms, if a file c1 contains the keyword x1, but does not contain the keyword x2, then it belongs to set T1, and if a file c2 does not contain the keyword x1, but contains the keyword x2, it belongs to set T2, and neither file should be searched by the data receiver . His searching scope includes these files which must contain two keywords x1 and x2. Therefore, the resulting set of files for the data recipient is required to be the intersection of T1 and T2. Otherwise, the search scope of the data receiver will be expanded, causing the pitfall of information leakage.
To navigate around this challenge, an enhancement of the protocol is requisite, one that curtails from accessing data that are not explicitly within the search range. The refinement of the protocol unfolds as delineated in the succeeding text.
3.2.1. Setup
Entity makes the MPK publicly available while ensuring the confidentiality of MSK.
3.2.2. Commit
securely transfers the set of ciphertexts {Ci} to the recipient .
3.2.3. Transfer
It is assumed that establishes a distinct set of keywords, X, for each receiver. This set X, a subset of , is characterized by having a maximum size denoted as X, where |X| = k and does not exceed n2.
Then sends to .
Subsequently, hands over the trapdoor D to .
3.2.4. Correctness
4. Security Analysis
Our analysis focuses on the security aspects of the OMKSA protocol, in accordance with the framework outlined in Section 2.3. This security reduction draws its foundation from the complex problems specified in Section 2.4. A thorough and systematic proofing process is outlined in the subsequent discussion.
4.1. Receiver Privacy
Theorem 1. In the framework, the assurance of absolute keyword privacy for the token, as it pertains to the receiver, is achieved within the context of the privacy game.
Proof 1. Consider X as the set of authorized keywords, and the pair (K(X), Γ) derived from the condition x = x0. The resultant formulation of the keyword token and its corresponding proof is as follows:
In the case of a unique keyword x1, and with s′ chosen from , we adopt the implicit assignment s′ = s · (α + x1)/(α + x0). This results in an equivalence of keyword tokens, namely, K(x0) equals K(x1), a fact that is substantiated through the following verification:
Let us consider Γ′ to be composed of . In this scenario, the accountability proofs are found to be congruent, signifying Γ1 is equal to and Γ2 is equal to . This equivalency can be substantiated through the following validation:
The equivalence (K(x0), Γ) = (K(x1), Γ′) holds true. Given that s is selected at random from , it follows that s′ possesses uniform randomness within as well. Consequently, the distributions of (K(X), Γ) corresponding to both x0 and x1 are indistinguishable, offering no leverage to an adversary in predicting the keyword encapsulated in K(X). This argument forms the basis for the conclusion of Theorem 1.
4.2. Indistinguishability
Theorem 2. It is structured to ensure semantic security and indistinguishability, as per the criteria of the Indistinguishability game, under the assumption that solving the (f, q)-MSE-DDH Problem is an arduous task, as delineated in [60].
Proof 2. Imagine an adversary, referred to as , capable of undermining this indistinguishability. Under this hypothesis, we can devise an algorithm, named , specifically engineered to tackle the (f, q)-MSE-DDH Problem. Given a scenario with a (f, q)-MSE-DDH Problem instance and a variable Z in , the mission for is to ascertain if Z is equal to or if it is a randomly chosen element within . The strategic interplay between and is elaborated in the following explanation.
4.2.1. Setup
4.2.2. H-Query
Selections for are made randomly from the binary set . Subsequently, incorporates (ai, Xi, hi) into the list and furnishes hi back to .
4.2.3. Phase 1
In this phase, selects a set of keywords X, confined within the bounds of , and limited to a maximum size of |X| not more than n. For a trapdoor request corresponding to any keyword xi from opts for a random as sk = s, and communicates (xi, s) to .
Should xi matches xθ, the process is terminated.
In contrast, if xi differs from delivers to , with qi(α) defined as (q(α))/(α + xi).
It becomes apparent that the expression can be computed from elements , as provided in the instance.
4.2.4. Challenge
When presents the pairs (m0, x0) and (m1, x1) as a challenge to , and assuming no prior trapdoor queries have been made for x0 or x1, the protocol proceeds as follows.
If xθ is neither x0 nor x1, the process halts.
Hence, the ciphertext C∗ constitutes an appropriate challenge.
In the scenario where Z represents a random variable within , from the perspective of , C∗ emerges as a random ciphertext.
4.2.5. Phase 2
seeks further trapdoor queries for xi, adhering to the constraint xi ≠ x0, x1. In response, acts in accordance with the guidelines of Phase 1.
4.2.6. Guess
From this, the advantage of in resolving the (f, q) − MSE − DDH Problem is calculated to be at least , thus finalizing the proof of Theorem 2.
4.3. Accountability
Theorem 3. The framework’s competence in ensuring accountability is validated within the accountability game, contingent on the intractability of the (f, n) − DHE Problem [17].
Proof 3. Let us hypothesize the existence of an adversary who is capable of compromising the accountability security. In this case, an algorithm is formulated to address the (f, n) − DHE Problem. Confronted with a (f, n) − DHE challenge, engages with in the following manner.
4.3.1. Setup
4.3.2. Challenge
4.3.3. Win
Let ; then Γ = hf(a). Given that the function f(x) is classified as a polynomial with a degree, deg f(x), surpassing n, it follows that furnishes the pair (f(x), Γ) as the resolution of the (f, n)-DHE Challenge. The conclusion of this proof of Theorem 3 leads to the deduction that the magnitude of X′ is singular, |X′| = 1, and correspondingly, the token K(X′) also possesses a singular count, |K(X′)| = 1.
5. Performance Evaluation
The experiment was conducted on the Ubuntu-22.04.3-desktop-amd64 operating system, with Linux hardware specifications: an 11th Gen Intel (R) Core i5-1135G7 CPU @2.40 GHz and 40.0 G of RAM. All program algorithms were implemented using the 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC) compiler. The experimental data were derived from a dataset of n = 10, generated by random algorithms. We executed both the OKSA and OMKSA protocols on these data. OKSA is designed by Jang, which can only query keywords one by one. OMKSA is based on OMKSA made by us and can query multiple keywords at the same time. When we want to compare queries for the same keywords, our scheme is more efficient than the OKSA scheme.
To illustrate the effectiveness of actual search terms, we varied the number of query keywords D from 1 to 5. The following figure compares the two scenarios and depicts an increase in search time with the augmentation of keyword quantity. The items from left to right are setup time2a, commit time2b, transfer to time2c, transfer to time2d, transfer time2e, and total time2f. We found that OMKSA is designed to query multiple keywords simultaneously, and the OMKSA protocol runtime does not increase with the number of queried words in the setup and commit algorithms. However, the OKSA scheme needs to query keywords one by one, and the query time will double as the number of query keywords increases.
As shown in Figure 2, at t = 1, the total time consumed by both protocols is almost the same; however, for t ≥ 2, the total time consumption of OMKSA is notably less than that of OKSA. Furthermore, the increase in duration for OMKSA is minimal, indicating its aptness for retrieval under multi-keyword scenarios.






Since cloud servers are not trusted, replay attacks are possible. We can further solve the problem of cloud server replay attacks in the future to further ensure the security of cloud storage. In addition, the proposed scheme works well in the laboratory environment, but the experimental data are limited, and it has not been applied in the real scene, which may cause unknown problems. It is necessary to further improve the scheme in the process of application.
6. Application in ML
The applications of OMKSA within the realm of ML are interesting. Users are able to execute ML tasks on encrypted data domains without revealing the contents of training data or the queries. Integrating this with ML poses both challenges and significant practical value. For instance, data in sectors like healthcare and finance, even when encrypted, may still require ML operations. In addition, users might not want to expose the specifics of their queries or the parameters and structure of their ML models.
6.1. Applied on the Medical Platform
The convergence of AI and SE holds expansive prospects for application within the healthcare platform. This realm of possibility not only underscores the importance of leveraging technological advances but also signals a transformative shift in the way healthcare data are managed and accessed.
To expand on this, we must first understand the tremendous role that AI plays in healthcare. AI systems have the potential to revolutionize the medical field by offering personalized medicine, predictive analytics, and enhanced patient care. From analyzing medical images for early disease detection to predicting patient outcomes based on data trends, AI is pushing the boundaries of what is achievable in medicine. Also, AI is now poised to bridge the patient–doctor gap with automatic diagnostic tools that can operate round the clock. On the other hand, the significance of SE in protecting patient data cannot be overstated. The plethora of data generated in healthcare, from patient records to vital signs, necessitates a stringent data privacy policy. SE offers the solution by not only encrypting these data but also allowing them to be searched without being decrypted. It provides an avenue for secure querying of databases while safeguarding sensitive patient information.
- 1.
Data protection and encryption process: The company meticulously gathers user data, utilizing Public Key SE techniques to securely encrypt the information prior to its upload.
- 2.
Task transfer: The company establishes the parameters by collaborating with the third party, utilizing OMKSA for secure transformation of the encrypted data.
- 3.
Task computation: The external agency conducts analytical operations and then relays the findings back to the company.

6.2. Data Preprocessing and Feature Selection
In contemporary ML applications, data preprocessing is not just a prerequisite step, but it also profoundly influences the overall model performance and accuracy. To achieve optimal training results, it might be necessary to sift through vast encrypted data repositories to select critical data subsets for comprehensive analysis. This step often entails choosing and categorizing a myriad of features. With the introduction of the OMKSA technique, researchers can now employ multi-keyword searches to select specific data or features from encrypted datasets. Crucially, this process ensures that there is no disclosure regarding which particular keywords the researchers are keen on, thereby preserving search privacy.
6.3. Distributed ML
Distributed ML and federated learning have become essential strategies for multi-organizational collaborative ventures. In such settings, various entities or organizations aim to collaboratively develop a robust, shared ML model. However, due to concerns spanning data privacy, legal regulations, and intellectual property rights, the direct sharing of raw data becomes infeasible. The advent of OMKSA provides a novel solution to this conundrum. Using OMKSA, one organization can securely authorize another entity to search specific segments of its encrypted data. This means multiple entities can collaborate and train models without directly sharing any tangible data.
7. Conclusion
In this study, we introduced the advanced concept of OMKSA, which improves the traditional OKSA scheme to accommodate multi-keyword search. While traditional OKSA limits each encrypted data file to a single keyword association, OMKSA gives data providers the flexibility to extract multiple keywords for each encrypted dataset. Our analysis and experiments confirm that the OMKSA protocol is both practical and easy to implement for querying encrypted data in a cloud environment. A significant innovation of our approach lies in its integration of aggregation algorithms to generate the trapdoor. Consequently, the sender sent a fixed-size trapdoor to the receiver, enhancing the efficiency of communication.
Additionally, a comprehensive security analysis of the proposed protocol has been conducted, adhering to recognized security models. Notably, when benchmarked against existing OKSA protocols, our OMKSA stands out as exceptionally efficient for data seekers. Moreover, the computational overhead for conjunctive keyword searches remains stable, regardless of an increase in the volume of query keywords. Finally, for the replay attack problem, we may use the password accumulator tool in the future to further solve the problem of cloud server replay attack to ensure the security of cloud storage. Conclusively, as we transition into the 5G era, the potential applications for OMKSA are vast, with pronounced implications for domains like ML and AI. Our findings pave the way for further exploration and deployment in these cutting-edge fields.
Conflicts of Interest
The authors declare no conflicts of interest.
Funding
This work was supported by the National Natural Science Foundation of China (Nos. 62402290, U24A20244, 62302280, and 62072276), Natural Science Foundation of Shandong Province (No. ZR2023QF133), and Youth Innovation Team of Shandong Higher Education Institutions (No. 2024KJN051).
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Nos. 62402290, U24A20244, 62302280, and 62072276), Natural Science Foundation of Shandong Province (No. ZR2023QF133), and Youth Innovation Team of Shandong Higher Education Institutions (No. 2024KJN051).
Open Research
Data Availability Statement
The data used to support the findings of this study are available from the corresponding authors upon request.