Volume 2025, Issue 1 2429577
Research Article
Open Access

Machine Learning Meets Encrypted Search: The Impact and Efficiency of OMKSA in Data Security

Zhongkai Wei

Zhongkai Wei

School of Mathematics , Shandong University , Jinan , China , sdu.edu.cn

Search for more papers by this author
Ye Su

Ye Su

School of Information Science and Engineering , Shandong Normal University , Jinan , China , sdnu.edu.cn

Search for more papers by this author
Xi Zhang

Xi Zhang

College of Computer and Cyber Security , Hebei Normal University , Shijiazhuang , China , hebtu.edu.cn

Hebei Provincial Key Laboratory of Network and Information Security , Hebei Normal University , Shijiazhuang , China , hebtu.edu.cn

Search for more papers by this author
Haining Yang

Corresponding Author

Haining Yang

School of Mathematics , Shandong University , Jinan , China , sdu.edu.cn

Search for more papers by this author
Jing Qin

Corresponding Author

Jing Qin

School of Mathematics , Shandong University , Jinan , China , sdu.edu.cn

Search for more papers by this author
Jixin Ma

Jixin Ma

School of Computing and Mathematical Sciences , University of Greenwich , London , UK , gre.ac.uk

Search for more papers by this author
First published: 16 January 2025
Academic Editor: Vasudevan Rajamohan

Abstract

The convergence of machine learning and searchable encryption enhances the ability to protect the privacy and security of data and enhances the processing power of confidential data. To enable users to efficiently perform machine learning tasks on encrypted data domains, we delve into oblivious keyword search with authorization (OKSA). The OKSA scheme effectively maintains the privacy of the user’s query keywords and prevents the cloud server from inferring ciphertext information through the searching process. However, limitations arise because the traditional OKSA approach does not support multi-keyword searches. If a data file is associated with multiple keywords, each keyword and corresponding data must be encrypted one by one, resulting in inefficiency. We introduce an innovative approach aimed at enhancing the efficiency of search processes while addressing the limitation of current encryption and search systems that handle only a single keyword. This method, known as the oblivious multiple keyword search with authorization (OMKSA), is designed for more effective keyword retrieval. One of our important innovations is that it uses the arithmetic techniques of bilinear pairs to generate new tokens and new search methods to optimize communication efficiency. Moreover, we present a detailed and rigorous demonstration of the security for our proposed protocol, aligned with the predefined security model. We conducted a comparative experiment to determine which of the two schemes, OKSA and OMKSA, is more efficient when querying multiple keywords. Based on our experimental results, our OMKSA is very efficient for data searchers. As the number of query keywords increases, the computational overhead of connected keyword searches remains stable. Finally, as we move into the 5G era, the potential applications of OMKSA are huge, with clear implications for areas such as machine learning and artificial intelligence. Our findings pave the way for further exploration and deployment of these frontier areas.

1. Introduction

In contemporary times, the convergence of machine learning (ML) [1, 2] and searchable encryption (SE) [35] is an important innovation. This combination will promote the enhancement of data privacy protection and the utility of big data. Known for their ability to self-learn and adapt, ML algorithms play a key role in retrieving huge datasets, recognizing patterns, and predicting outcomes. However, since these algorithms require access to large amounts of data for performance optimization, the task of maintaining data privacy and security is becoming increasingly important. To solve this problem, we can use SE technology [6], which can perform a secure search on encrypted data, ensuring data confidentiality while preserving the availability of encrypted information. The convergence of ML and SE enhances the ability to perform efficient secure data analysis on encrypted data to protect the privacy and security of the data.

However, there is an obvious problem in public key SE. The cloud server can infer the content and search keywords of outsourced data based on the frequency of retrieval, resulting in data privacy disclosure. However, most of the existing public-key SE schemes cannot solve the inside keyword guessing attack (IKGA) problem [710]. Specifically, an inside adversary is an honest but curious cloud server (and cloud server operator) that honestly executes search protocols but tries to infer user keywords and private information. Once the enemy obtains the keywords, it will threaten the security of the ciphertext data file, and it also means that the search preference, personal identity, and other information of the data receiver may be leaked. Therefore, solving the problem of internal keyword guessing attack is a very meaningful and important research topic. Several encryption schemes [1115] have been designed to solve this problem. However, their search efficiency is relatively low. This highlights the need for a cryptographic search method that is resistant to internal keyword guessing attacks.

Ogata and Kurosawa proposed the oblivious keyword search (OKS) concept [16]. The OKS scheme protects the privacy of both the database and the client. On the one hand, it is to hide the user’s query database target, and on the other hand, it is to ensure that the client gets any other information except the corresponding value of the query target. At the same time, the privacy of keywords is protected, and the cloud server cannot carry out keyword guessing attacks by matching trap information and keyword information. However, in this scheme, each data file can only be associated with one keyword, and in the encryption process, each ciphertext supports only one keyword encryption search. If a file has multiple keywords, the file must be encrypted multiple times. Obviously, most files have more than one keyword.

In addition, the existing OKS protocol has proven to be inefficient when applied to large databases. We need to consider a situation where a company intends to sell a portion of its encrypted data stored on cloud servers, and potential buyers face these restrictions because they want to keep the details of the specific data content of the intended purchase private. In this case, the sender may approve the set of keywords to be retrieved for the receiver and then restrict the receiver from querying the keywords in that set. Specifically, if X represents a set of authorization keywords, the recipient can only select the keyword wX. At the same time, while the sender allows the encrypted data associated with the keyword X to be retrieved from set X, they still do not know the specific keyword used, thus protecting the privacy of the receiver. In order to solve this problem, the oblivious keyword search with authorization (OKSA) [17] solution came into being. While OKSA protects the privacy of both recipients and senders, there is an obvious problem. This model binds a single keyword to a single data file, and if a file contains multiple keywords, then each keyword and file need to be encrypted individually, which is obviously inefficient. We introduced the oblivious multiple keyword search with authorization (OMKSA) scheme, as shown in Figure 1.

Details are in the caption following the image
The framework of OMKSA.

1.1. Our Contributions

This paper’s primary advancements are detailed in the following ways:
  • 1.

    We came up with a new concept called OMKSA. It is based on the OKSA scheme. In the OKSA scheme, each encrypted data file is bound to a separate keyword. If the file contains a large number of keywords, we must encrypt each keyword and file one by one; otherwise, the retrieval will fail due to inaccurate keyword extraction. Furthermore, in the real world, where a single file often contains multiple keywords, OKSA cannot meet the need for efficiency. In the OMKSA scheme, we improved the encryption algorithm to allow the data provider to assign several keywords to each encrypted file. And the scheme can resist the internal keyword guessing attack problem.

  • 2.

    We designed OMKSA protocol. Compared with OKSA protocol, our algorithm has higher efficiency when searching multiple keywords. We have carefully designed our approach to disjunctive and conjunctive keyword search, which will meet the flexible needs of users and cloud storage providers. And it adds no additional communication or storage overhead compared to the OKSA scheme searching for the same number of keywords.

  • 3.

    We have compiled OMKSA scheme and analyzed the efficiency of OMKSA scheme compared with OKSA scheme through comprehensive simulation experiment. The results show that this scheme can effectively improve the efficiency of multi-keyword search. In addition, in-depth security assessments show that OMKSA meets the needs of artificial intelligence (AI) applications in the 5G era.

1.2. Related Works

Here are some pertinent studies.

1.2.1. Public Key Encryption With Keyword Search (PEKS)

As far as we know, PEKS [4] has undergone significant development and improvement since they were proposed by Boneh et al. The initial design was intended to address the limitations of symmetric searchable encryption (SSE) [1821] by developing a unique mechanism. This mechanism allows the generation of searchable ciphertext using the user’s public key and specific keywords. This innovation makes it not only feasible to search on encrypted data, but it can be very efficient. This innovation began the history of SE, which later researchers introduced many improvements and tweaks aimed at enhancing security and enhancing functionality. The advent of secure channel-free PEKS (SCF-PEKS) scheme [2225] is a big development in preventing keyword guessing attacks on cloud servers. The untrustworthiness of cloud servers can compromise search confidentiality. Functionality enhancements were not left behind. The advancement in multi-keyword search capabilities is significantly enhanced by the integration of public-key encryption with conjunctive keyword search (PECKS) [2630]. The introduction of public-key cryptography with temporary keyword searches (PETKS) [31] by Abdalla et al. is another important research result, as it not only significantly improves the usability of database systems but also enables the flexibility of keyword searches. The transformation protocol connects identity-based encryption (IBE) [32] to encrypted data to enable search under ciphertext, further enriching the field of cryptography. The attribute-based encryption (ABE) scheme [3335] provides subtle control over data search and enhanced privacy protection in the cloud environment [36, 37].

1.2.2. Oblivious Transfer

The origins of inadvertent diversion (OT) [38] can be traced back to Rabin’s seminal work. In the framework of the oblivious transfer protocol, the procedure is marked by an interaction between a sender, denoted as , and a receiver, indicated as . This interaction facilitates a transaction where the sender is unaware of which specific bit the receiver chooses, and the receiver is equally uninformed about the unselected bit. This approach to ensuring the privacy of both parties during transmission has led to the emergence of many OT variants [3941], all of which contribute to enhanced security and functional diversity.

1.2.3. OKS

The design of OKS [16] is focused on ensuring rigorous privacy safeguards. It is meticulously designed to mask all informative traces, including access patterns, ensuring a user’s query and retrieval activities remain concealed. The pioneering OKS protocol, elucidated by Ogata and Kurosawa, is anchored in the oblivious polynomial evaluation and the oblivious transfer methodologies. The protocol facilitates a user’s engagement with a database in a manner that allows data acquisition without revealing any details about the search keywords or content to the data provider.

Ogata and Kurosawa developed two separate protocols. The first relies on the intricacies of the one-more RSA inversion problem, while the second depends on the difficulties inherent in polynomial reconstruction. Nonetheless, a constraint is manifested in these initial models, and the search is confined to a singular keyword per query. Rhee et al. advanced the narrative by unveiling an oblivious conjunctive keyword search (OCKS) protocol [42]. Rooted in the RSA known target inversion complexity, this enhancement expanded the search capacity. Simultaneously, Freedman and colleagues tackled the issue of privacy-preserving database access by proposing an efficient framework [43]. This structure links keyword search activities with the oblivious evaluation of pseudorandom functions, creating a cohesive framework. Zhu and Bao further enriched this dialog, leveraging enhanced OPEs [44] to craft an OKS protocol, and validated its security credentials. Notwithstanding these advancements, an inefficiency permeates previous OKS protocols; they are confined to a one-keyword-to-one-document architecture. This limitation amplifies both storage requisites and search durations, particularly pronounced when a document is tagged with multiple keywords, necessitating the replication of the document for each associated keyword. Jiang and Liu confronted this challenge, proposing an adept OKS protocol characterized by sublinear time search efficiency and compact ciphertexts [45]. However, it is circumscribed to disjunctive keyword search and is not configured for conjunctive keyword search applications. In this context, OKS continues to evolve, enhancing privacy while facing inherent operational constraints. Since Ogata and Kurosawa first introduced the OKS concept, new protocols have been developed to emphasize the need for coordination between strict privacy protection and operational efficiency. The exploration and resolution of these challenges are not only intrinsic to OKS’ progress but also a necessity in an era when data are ubiquitous and privacy is often under attack. Future research is expected to further address this complex problem, seeking innovative solutions for data privacy, search efficiency, and functional versatility in the OKS field. Zhang et al. introduced an advanced oblivious multiple keyword search (OMKS) service [46], designed to provide users and cloud storage providers with strong privacy protection while enabling efficient multi-keyword search. Jiang et al. introduced an innovative cryptographic primitive known as OKSA [17]. Specifically, OKSA streamlines the verification of search keywords, confirming their inclusion in the user’s authorized set prior to activating the OKS protocol. This feature enables user authorization and extends the level of privacy and security of traditional OKS.

1.2.4. The Integration of ML and SE

The combination of ML and SE is an emerging field that combines the efficient data processing of ML with the privacy protection principles of SE. In the era of big data, concerns about privacy are escalating, prompting people to seek solutions that can efficiently process data while protecting privacy. ML models can extract and process valuable information from massive datasets. However, when applied to sensitive data, a privacy-security conundrum emerges. SE has emerged as a viable solution that allows searching of encrypted data, thus ensuring data privacy [47]. One of the seminal works in this domain is CryptoNets [48], which explores the use of neural networks in encrypting data to make predictions about it while protecting data privacy. This method uses homomorphic encryption technology to convert data into encrypted form and realizes the calculation in ciphertext state. However, homomorphic encryption, while revolutionary, is computationally expensive. This leads to some solutions, such as personal data collection (PATE) [49]. It uses different private learning methods, aggregating the results of multiple models to protect privacy.

Kamil Khudhair et al. optimized image steganography [5052] to support multiple scenes, high efficiency, and high security. This inspired us to improve the OKS scheme, combining it with deep learning to further achieve high efficiency and enable multi-scenario application. Raghunandan et al. used chaotic behavior [53] to encrypt data for greater security. How to easily access the encrypted data needs to be fully considered, and our scheme can achieve searchability in the encrypted state. Asmitha et al. perfected the problems of computer vision and deep learning in human authentication [54]. This further reinforces the need for ML and SE. ML needs to invoke a lot of private data, and our solution ensures the safe use of private data. Ramesh et al. introduced false modulus in the public key encryption process to improve the security of Rabin cryptosystem [55]. This is a very interesting study and encourages us to continue to investigate how to implement the security of SE on ML.

More recent advancements have broached efficient protocols like SecureML [56], which enhances scalability and efficiency, which improves the computational challenges inherent in privacy-protecting ML. Furthermore, innovations like oblivious neural network predictions (MiniONN) [57] have facilitated privacy-preserving by transforming existing neural network models into oblivious counterparts. With the convergence of ML and SE, the focus is shifting to ensuring data privacy without compromising the quality of ML predictions. The future trajectory may be to explore enhanced computing efficiency, robust security protocols, and common applications that span different data domains.

1.3. Organization

Section 2 presents essential foundational concepts, details the OMKSA algorithm, describes the security model, and discusses the underlying complexity assumptions. Section 3 introduces an efficient OMKSA protocol, while Section 4 offers a security proof. Section 5 focuses on conducting a thorough analysis and assessing the performance of the system. Additionally, Section 6 delves into applications in the field of ML. The paper concludes in Section 7.

2. Preliminaries

2.1. Notations

The symbols and terms utilized in this paper are efficiently compiled in Table 1.

Table 1. Notations.
Notation Description
A data sender
The cloud server
The receiver
X Authorization keyword set
x Search keyword
α The master secret key
K(x) Token of  
D(x) Trapdoor cryptosystem of  
L A collection of search results
The actual number of keywords retrieved
D The maximum number of keywords that can be retrieved simultaneously

2.2. Bilinear Pairing

, are two multiplicative cyclic groups of order p, any generator of is g, and bilinear mapping satisfies the following properties [58]:
  • 1.

    Bilinearity: For the generator g of , the other generator h of , and the positive integers a, b, we can get the equation: e(ga, hb) = e(g, h)ab.

  • 2.

    Computability: For any element g, , we can efficiently compute e, which is a probabilistic polynomial algorithm.

  • 3.

    Non-degeneracy: There are two elements in , g and h, that satisfy the equation: e(g, h) ≠ 1. 1 is the identity element of . 1 is the identity element of .

e is a symmetric bilinear pair mapping, and the corresponding symmetric double planet pair system is denoted by .

2.3. Algorithm Definition

Our OMKSA solution interacts with three parties: the data sender, the cloud server, and the data receiver.

2.3.1. Setup

Commencing the procedure, the entity provides a security parameter denoted by 1λ and an integer value n, subsequently generating the master key pair (MPK, MSK) for cryptographic system configuration. Furthermore, in collaboration with each user, establishes a distinct keyword set X, ensuring that the size of X does not exceed n2.

2.3.2. Commit

processes a message mi, a designated keyword xi, and the MPK to produce the corresponding encrypted data Ci. Notably, each message mi is coupled with a distinct keyword array . Subsequently, securely transmits the collection of ciphertexts {Ci} to the cloud storage .

2.3.3. Transfer

It is assumed that establishes a distinct keyword set X in negotiations with each receiver , where .
  • : In this step, inputs the designated keyword set X, including specific keywords within X and MPK. This leads to the creation of the token of keywords, along with the user’s secret key sk and Γ which is proof of accountability. Then, the token and proof, represented as , are sent to . The token is derived from sk, , and MPK, with Γ aiding in verifying the token’s accountability, specifically ensuring it generates a trapdoor for a valid keyword in the set X.

  • : inputs the token and the MSK to affirm accountability, verifying that . It then generates a trapdoor D for delivery to .

  • : On the condition that the sequence aligns with , upon receiving inputs Ci, D, sk, will yield mi; otherwise, it results in ⊥.

2.3.4. Correctness

For an OMKSA to be deemed effective, it is essential that the recipient accurately acquires the specified message, provided all involved parties comply with the outlined protocol. Furthermore, the authenticity is validated when accountability checks confirm that the trapdoor created from the acquired token corresponds exclusively to specific keywords. It is also crucial that these keywords fall within the scope of the authorized keyword collection.

2.4. Security Notions

Based on the previous article [4, 16], our framework establishes three key security criteria: confidentiality for the receiver, indistinguishability, and accountability. The first aspect, user confidentiality, ensures that during the i − th transaction phase, the sender, denoted as , is unable to infer the search keywords from the user’s token. The principle of indistinguishability acts as a protective measure, preventing a potentially adversarial receiver, labeled , from deciphering both the message and keyword embedded within the ciphertext. The accountability aspect primarily ensures that each trapdoor request by a user is distinctly linked to a specific keyword in the authorized keyword array.
  • Receiver privacy

  • For a receiver, discerning whether x is x0 or x1 becomes challenging when presented with (K(x), Γ) and the pair of keywords x0, x1.

  • Indistinguishability

  • When faced with a ciphertext C for (m, x) and two distinct message-keyword pairs (m0, x0) and (m1, x1), the receiver finds it difficult to determine if (m, x) matches (m0, x0) or (m1, x1).

  • Accountability

  • For scenarios where (K(X), X, sk) meets the condition |X| > 1, crafting (K(X), Γ) that successfully undergoes verification is a complex task.

In light of the specified requirements, our formulation of the security models is through a series of interactive games involving a challenger, denoted as , and an opponent, referred to as .

2.4.1. User Privacy

  • Setup.

  • executes the Setup procedure, thereby generating the system parameter MPK which is then transmitted to the adversary .

  • Challenge.

  • The adversary proposes two distinct keywords x0, x1 to the challenger . In response, selects a random θ ∈ 0, 1, sets x to xθ, and produces (K(X), Γ).

  • Guess.

  • attempts to guess by announcing θ and succeeds if θ matches θ.

The advantage of is quantified as Adv = |Pr[θ = θ] − 0.5|.

Definition 1. The OMKSA framework is deemed to adhere to user privacy standards when it is infeasible for any adversary operating in probabilistic polynomial-time to secure a substantial edge in the stipulated user privacy contest.

2.4.2. Indistinguishability

  • Setup.

  • The challenger executes the Setup protocol to produce the system parameter MPK and forwards it to adversary .

  • First Stage.

  • In this stage, requests a trapdoor for keyword X, and complies by providing trapdoor D.

  • Challenge.

  • presents two equally long message-keyword pairs (m0, x0) and (m1, x1) to , ensuring that x0, x1 were not previously queried for trapdoors in the First Stage. then issues a challenge ciphertext C, selecting θ randomly from 0,1.

  • Second Stage.

  • continues to make trapdoor requests under the same constraints as the Challenge, with providing responses similarly to the First Stage.

  • Guess.

  • proposes a guess of θ and is victorious if it matches θ.

The advantage of is defined as Adv = |Pr[θ = θ] − 0.5|.

Definition 2. OMKSA achieves indistinguishability against chosen keyword attacks if no adversary, employing probabilistic polynomial time approaches, can significantly succeed in the described game.

2.4.3. Accountability

In OMKSA, the essence of accountability verification lies in confirming that each trapdoor produced is exclusively linked to a single permitted keyword. This process counteracts scenarios where an adversary, denoted as , might construct a legitimate keyword token K(X), with X being a fraction of the endorsed keyword set X, maintaining that 1 < |X| < |X| ≤ n, indicating ’s awareness of X, X, and the secret key sk used in the token’s formation.
  • Setup.

  • Executing the Setup phase, the challenger creates the system’s parameter MPK and subsequently dispatches this parameter to the adversary .

  • Challenge.

  • Here, presents (K(X), X, X, sk) alongside a numeral 1 for the challenge, deriving K(X) from X, X, sk, MPK with |X| exceeding 1.

  • Win.

  • The adversary submits (K(X), Γ) and is deemed victorious if this submission successfully clears the verification process.

The edge of in this scenario is determined as Adv in the formulation of (K(X), Γ).

Definition 3. Accountability within an OMKSA framework is established when no adversary, operating within polynomial time bounds, can achieve success in the previously outlined game with a notable margin.

2.5. Assumptions

The foundational security constructs of OKSA are built upon two complex challenges: the (f, n)-DHE Problem and the (f, q)-MSE-DDH Problem. While the former has been introduced in earlier works [59], here we provide only a brief overview, directing readers to these references for a thorough exploration of its complexities.

Definition 4. The challenge in the (f, n)-DHE Problem within a group of prime order p, involving elements and , is to compute (f(x), hf(a)) from the given sequence . In this context, f(x) represents a polynomial within whose degree is greater than n.

The (f, q)-MSE-DDH Problem is a refined variant of the MSE-DDH Problem, preserving the original problem’s complexity. This problem stands as a particular case within the broader set of Diffie–Hellman exponent assumptions, as discussed in the previous article [60]. Detailed intractability analysis of this problem will be presented later. For more details, please refer to the paper [59].

3. OMKSA Protocol

In this segment, we introduce a protocol designed for OMKSA. This protocol seamlessly accommodates both disjunctive and conjunctive keyword searches, thanks to the integration of innovative token architectures and the implementation of sophisticated search algorithms as delineated in [59]. Moreover, our protocol endows the receiver with the capability to unobtrusively secure an authorized trapdoor by adaptively presenting a keyword token. A hallmark of this system is its efficiency, characterized by the constant size of communication overhead incurred between and . is enabled to construct a trapdoor nested within X without the capacity to ascertain the specific keyword in question. The OMKSA protocol engages three entities: a data sender , a cloud server , and a receiver , unfolding over three distinct phases: Setup, Commit, and Transfer.

3.1. Disjunctive Keyword Search

The scheme facilitates the execution of disjunctive keyword searches by innovatively creating tokens and applying sophisticated search techniques. Imagine a scenario where the data receiver, denoted as , aims to locate documents that include either keyword x1 OR x2. In such cases, the OMKSA scheme is implemented as follows.

3.1.1. Setup

Entity begins by accepting a security parameter denoted as 1λ, an integer value n, and the group system . Following this, selects generator elements g, h from , and chooses random numbers α, x from . It proceeds to calculate gα and for each i ranging from 1 to n + 1 for i = 1, 2, …, n + 1. A individual hash function H mapping from to a binary string of length is also selected. The ensuing steps involve the generation of the master key pair:
()

Entity makes the MPK available to everyone and maintains the confidentiality of K.

3.1.2. Commit

Within the framework, the complete keyword space is represented by , encompassing a total of n2 elements. Each message is paired with corresponding keywords. For a given message mi within the binary set and associated keywords from , the entity selects a random value ri from , It then calculates β as the product of and forms the encrypted message Ci in the following manner:
()

Subsequently, securely transmits the collection of ciphertexts {Ci} to the receiver .

3.1.3. Transfer

It is assumed that establishes a distinct keyword set X for each receiver, with and the size of X is denoted as |X| = k not exceeding n2.

: With the predefined keyword set X, and specific keywords alongside the MPK, the receiver chooses a random number s from and let sk = s. Then, , formulate the tokens, where and correspond to the chosen keywords to be searched by .
()
and the proof Γ is
()

Then sends and to .

: Given the tuple , with the master public key MPK in hand, proceeds to verify accountability using the subsequent equations,
()
and
()
When both equations are satisfied, it confirms that the tokens K1, K2 are associated with the authorized set of keywords, indicated as
()
In cases where this condition is not met, the algorithm terminates. Upon having the MSK and the keyword set then proceeds to generate the trapdoors D1, D2 as
()

Then returns the trapdoor D1, D2 to .

: Upon receiving Ci, D1, D2, sk, the entity initiates the process of searching by
()
Should the preceding equation be satisfied, the receiver then proceeds with the decryption process by
()

adds to a list L(1) and adds to a list L(2), and has L = L(1)L(2) as final calculation.

3.1.4. Extension to D Keywords

The outlined OMKSA protocol, initially configured for disjunctive keyword search, is readily expandable to encompass D keywords. The data receiver computes for each where , subsequently recovering and compiling the list for every keyword. The final step entails aggregating the lists corresponding to all D keywords to yield the comprehensive disjunctive search outcome.

3.1.5. Correctness

With the (MPK, MSK) obtained from the Setup, and at hand, the verification of accountability’s correctness hinges on the fulfillment of the following expression.
()
With the ciphertext Ci obtained through the execution of the Commit algorithm, coupled with the trapdoors D1 and D2 generated via Transfer, and the user’s sk, the process to validate the accuracy of the scheme’s function involves
()

3.2. Conjunctive Keyword Search

Imagine a scenario where the data receiver is inclined to retrieve documents that encompass both keywords x1 AND x2. An initial instinct might be to adapt the protocol aligned with the earlier elaborated disjunctive keyword search; essentially, would amass the list T(1) correlated with keyword x1 and T(2) associated with x2, culminating in the intersection T = T(1)T(2) to finalize the conjunctive search outcome. However, this approach is riddled with the pitfall of information leakage, as inadvertently gains access to documents extraneous to the intersection T(1)T(2). In simple terms, if a file c1 contains the keyword x1, but does not contain the keyword x2, then it belongs to set T1, and if a file c2 does not contain the keyword x1, but contains the keyword x2, it belongs to set T2, and neither file should be searched by the data receiver . His searching scope includes these files which must contain two keywords x1 and x2. Therefore, the resulting set of files for the data recipient is required to be the intersection of T1 and T2. Otherwise, the search scope of the data receiver will be expanded, causing the pitfall of information leakage.

To navigate around this challenge, an enhancement of the protocol is requisite, one that curtails from accessing data that are not explicitly within the search range. The refinement of the protocol unfolds as delineated in the succeeding text.

3.2.1. Setup

Entity , takes as input a security parameter 1λ, an integer n and the group tuple as Section 3.1.1. Then selects generator elements g and h from , along with random values α, x from the set . It then calculates for i = 1, 2, …, n + 1. is also employed. The construction of the master key pair follows these steps.
()

Entity makes the MPK publicly available while ensuring the confidentiality of MSK.

3.2.2. Commit

In the defined keyword space , which encompasses n elements, each message is linked to specific keywords. For any given message mi within the binary space and its associated set of keywords from , the system selects a random ri from , and calculates . Ci is
()

securely transfers the set of ciphertexts {Ci} to the recipient .

3.2.3. Transfer

It is assumed that establishes a distinct set of keywords, X, for each receiver. This set X, a subset of , is characterized by having a maximum size denoted as X, where |X| = k and does not exceed n2.

: In the context of the authorized keyword collection X, encompassing specific keywords within X, and with access to MPK, the participant selects a random s from to serve as sk = s and proceeds to calculate the corresponding token ,
()
and the proof Γ as
()

Then sends to .

: With the set and the MPK at hand, undertakes the verification of accountability through the application of specific equations,
()
When the conditions of both equations are satisfied, confirms that the provided keyword token corresponds to a singular keyword trapdoor, indicated by . In the event of a failure to meet these conditions, the operation is terminated. Upon possessing MSK and the keyword set X, entity proceeds to generate the specific trapdoor D.
()

Subsequently, hands over the trapdoor D to .

: Upon this transfer, with Ci, the trapdoor D, and the secret key sk in possession, commences the search operation
()
Upon verification, proceeds with the next steps of the decryption process
()

3.2.4. Correctness

With the (MPK, MSK) obtained through the execution of the Setup algorithm, and equipped with the token or proof combination , the validation of accountability’s accuracy is determined through a series of specific equations.
()
Upon receiving the ciphertext Ci generated through the Commit algorithm, along with the trapdoor D created via the Transfer, and the user’s sk, one can authenticate the effectiveness of the search and decryption processes
()

4. Security Analysis

Our analysis focuses on the security aspects of the OMKSA protocol, in accordance with the framework outlined in Section 2.3. This security reduction draws its foundation from the complex problems specified in Section 2.4. A thorough and systematic proofing process is outlined in the subsequent discussion.

4.1. Receiver Privacy

Theorem 1. In the framework, the assurance of absolute keyword privacy for the token, as it pertains to the receiver, is achieved within the context of the privacy game.

Proof 1. Consider X as the set of authorized keywords, and the pair (K(X), Γ) derived from the condition x = x0. The resultant formulation of the keyword token and its corresponding proof is as follows:

()

In the case of a unique keyword x1, and with s chosen from , we adopt the implicit assignment s = s · (α + x1)/(α + x0). This results in an equivalence of keyword tokens, namely, K(x0) equals K(x1), a fact that is substantiated through the following verification:

()

Let us consider Γ to be composed of . In this scenario, the accountability proofs are found to be congruent, signifying Γ1 is equal to and Γ2 is equal to . This equivalency can be substantiated through the following validation:

()

The equivalence (K(x0), Γ) = (K(x1), Γ) holds true. Given that s is selected at random from , it follows that s possesses uniform randomness within as well. Consequently, the distributions of (K(X), Γ) corresponding to both x0 and x1 are indistinguishable, offering no leverage to an adversary in predicting the keyword encapsulated in K(X). This argument forms the basis for the conclusion of Theorem 1.

4.2. Indistinguishability

Theorem 2. It is structured to ensure semantic security and indistinguishability, as per the criteria of the Indistinguishability game, under the assumption that solving the (f, q)-MSE-DDH Problem is an arduous task, as delineated in [60].

Proof 2. Imagine an adversary, referred to as , capable of undermining this indistinguishability. Under this hypothesis, we can devise an algorithm, named , specifically engineered to tackle the (f, q)-MSE-DDH Problem. Given a scenario with a (f, q)-MSE-DDH Problem instance and a variable Z in , the mission for is to ascertain if Z is equal to or if it is a randomly chosen element within . The strategic interplay between and is elaborated in the following explanation.

4.2.1. Setup

The foundational premise posits a universal keyword space, represented as . selects a specific keyword xθ from , with the associated message symbolized as mθ. The algorithm tacitly establishes the polynomials.
()
Additionally, the algorithm lets and it gets . Following this, forwards MPK to adversary , where
()

4.2.2. H-Query

The entity operates a hash list designated as L(ai, Xi, hi), starting in an empty state. When a query for H(ai, Xi) is received and if the tuple (ai, Xi) already exists within L, then provides the associated hi back to . In cases where (ai, Xi) is not present in L, determines the hash value hi using the following approach.
()

Selections for are made randomly from the binary set . Subsequently, incorporates (ai, Xi, hi) into the list and furnishes hi back to .

4.2.3. Phase 1

In this phase, selects a set of keywords X, confined within the bounds of , and limited to a maximum size of |X| not more than n. For a trapdoor request corresponding to any keyword xi from opts for a random as sk = s, and communicates (xi, s) to .

Should xi matches xθ, the process is terminated.

In contrast, if xi differs from delivers to , with qi(α) defined as (q(α))/(α + xi).

Subsequent steps involve verifying the trapdoor.
()

It becomes apparent that the expression can be computed from elements , as provided in the instance.

4.2.4. Challenge

When presents the pairs (m0, x0) and (m1, x1) as a challenge to , and assuming no prior trapdoor queries have been made for x0 or x1, the protocol proceeds as follows.

If xθ is neither x0 nor x1, the process halts.

However, if xθ matches either x0 or x1, verifies the presence of (0, Z) and (1, Z) within the list L. If found, retrieves and labels the corresponding hash values as and . In the absence of these entries, randomly generates and from the binary set and proceeds to set them accordingly.
()
Subsequently, incorporates the pairs and into the compilation L. Following this, furnishes with the designated ciphertext for the challenge.
()
Verification of the condition is feasible by assigning the value of r as rif(α) in an implicit manner.
()

Hence, the ciphertext C constitutes an appropriate challenge.

In the scenario where Z represents a random variable within , from the perspective of , C emerges as a random ciphertext.

4.2.5. Phase 2

seeks further trapdoor queries for xi, adhering to the constraint xix0, x1. In response, acts in accordance with the guidelines of Phase 1.

4.2.6. Guess

The adversary generates a prediction, denoted as θ, aiming to ascertain the value of θ. With this step, our simulation narrative reaches its completion. Next, we transition to an analysis phase, probing the proficiency of in unraveling the intricate hard problem. Let us assume qT represents the total volume of trapdoor queries and n symbolizes the magnitude of the keyword space. In light of the preceding simulation dynamics, the probability of persevering without an abortive outcome is delineated in the following sections.
()
Given that queries for x0 and x1 were not made in either Phase 1 or Phase 2, the quantity qT is bounded by n − 2. Consequently, the probability Pr[abort] is no less than 2/n. Assuming ’s success in compromising the security game is at least ϵ, the reduction in advantage can be represented as
()

From this, the advantage of in resolving the (f, q) − MSE − DDH Problem is calculated to be at least , thus finalizing the proof of Theorem 2.

4.3. Accountability

Theorem 3. The framework’s competence in ensuring accountability is validated within the accountability game, contingent on the intractability of the (f, n) − DHE Problem [17].

Proof 3. Let us hypothesize the existence of an adversary who is capable of compromising the accountability security. In this case, an algorithm is formulated to address the (f, n) − DHE Problem. Confronted with a (f, n) − DHE challenge, engages with in the following manner.

4.3.1. Setup

Algorithm assigns α = a, leading to the derivation of , as per the (f, n) − DHE scenario. Subsequently, selects H akin to the actual scheme. Then circulates the MPK for ,
()

4.3.2. Challenge

selects two unique keyword sets, X and X, ensuring |X| > 1 and |X| ≤ n. A secret key sk is then established by randomly choosing a value s from , with sk = s. Following this, proposes (K(X), X, X, sk) along with a challenge indication set to 1. Here, the token K is characterized as
()

4.3.3. Win

The adversary selects two keyword sets, X and X, conforming to the constraints |X| > 1 and |X| ≤ n. Upon choosing a random secret key s from , denoted as sk = s, then formulates and submits (K(X), X, X, sk) with a challenge indicator set to 1. The token within this setup is characterized as
()
Consequently, both the token and its associated proof are capable of successfully undergoing the matching process as follows:
()

Let ; then Γ = hf(a). Given that the function f(x) is classified as a polynomial with a degree, deg f(x), surpassing n, it follows that furnishes the pair (f(x), Γ) as the resolution of the (f, n)-DHE Challenge. The conclusion of this proof of Theorem 3 leads to the deduction that the magnitude of X is singular, |X| = 1, and correspondingly, the token K(X) also possesses a singular count, |K(X)| = 1.

5. Performance Evaluation

The experiment was conducted on the Ubuntu-22.04.3-desktop-amd64 operating system, with Linux hardware specifications: an 11th Gen Intel (R) Core i5-1135G7 CPU @2.40 GHz and 40.0 G of RAM. All program algorithms were implemented using the 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC) compiler. The experimental data were derived from a dataset of n = 10, generated by random algorithms. We executed both the OKSA and OMKSA protocols on these data. OKSA is designed by Jang, which can only query keywords one by one. OMKSA is based on OMKSA made by us and can query multiple keywords at the same time. When we want to compare queries for the same keywords, our scheme is more efficient than the OKSA scheme.

To illustrate the effectiveness of actual search terms, we varied the number of query keywords D from 1 to 5. The following figure compares the two scenarios and depicts an increase in search time with the augmentation of keyword quantity. The items from left to right are setup time2a, commit time2b, transfer to time2c, transfer to time2d, transfer time2e, and total time2f. We found that OMKSA is designed to query multiple keywords simultaneously, and the OMKSA protocol runtime does not increase with the number of queried words in the setup and commit algorithms. However, the OKSA scheme needs to query keywords one by one, and the query time will double as the number of query keywords increases.

As shown in Figure 2, at t = 1, the total time consumed by both protocols is almost the same; however, for t ≥ 2, the total time consumption of OMKSA is notably less than that of OKSA. Furthermore, the increase in duration for OMKSA is minimal, indicating its aptness for retrieval under multi-keyword scenarios.

Details are in the caption following the image
Comparison of search time between OKSA and OMKSA. (a) Setup time. (b) Commit time. (c) Transfer R to S time. (d) Transfer S to R time. (e) Transfer R time. (f) Total time.
Details are in the caption following the image
Comparison of search time between OKSA and OMKSA. (a) Setup time. (b) Commit time. (c) Transfer R to S time. (d) Transfer S to R time. (e) Transfer R time. (f) Total time.
Details are in the caption following the image
Comparison of search time between OKSA and OMKSA. (a) Setup time. (b) Commit time. (c) Transfer R to S time. (d) Transfer S to R time. (e) Transfer R time. (f) Total time.
Details are in the caption following the image
Comparison of search time between OKSA and OMKSA. (a) Setup time. (b) Commit time. (c) Transfer R to S time. (d) Transfer S to R time. (e) Transfer R time. (f) Total time.
Details are in the caption following the image
Comparison of search time between OKSA and OMKSA. (a) Setup time. (b) Commit time. (c) Transfer R to S time. (d) Transfer S to R time. (e) Transfer R time. (f) Total time.
Details are in the caption following the image
Comparison of search time between OKSA and OMKSA. (a) Setup time. (b) Commit time. (c) Transfer R to S time. (d) Transfer S to R time. (e) Transfer R time. (f) Total time.

Since cloud servers are not trusted, replay attacks are possible. We can further solve the problem of cloud server replay attacks in the future to further ensure the security of cloud storage. In addition, the proposed scheme works well in the laboratory environment, but the experimental data are limited, and it has not been applied in the real scene, which may cause unknown problems. It is necessary to further improve the scheme in the process of application.

6. Application in ML

The applications of OMKSA within the realm of ML are interesting. Users are able to execute ML tasks on encrypted data domains without revealing the contents of training data or the queries. Integrating this with ML poses both challenges and significant practical value. For instance, data in sectors like healthcare and finance, even when encrypted, may still require ML operations. In addition, users might not want to expose the specifics of their queries or the parameters and structure of their ML models.

6.1. Applied on the Medical Platform

The convergence of AI and SE holds expansive prospects for application within the healthcare platform. This realm of possibility not only underscores the importance of leveraging technological advances but also signals a transformative shift in the way healthcare data are managed and accessed.

To expand on this, we must first understand the tremendous role that AI plays in healthcare. AI systems have the potential to revolutionize the medical field by offering personalized medicine, predictive analytics, and enhanced patient care. From analyzing medical images for early disease detection to predicting patient outcomes based on data trends, AI is pushing the boundaries of what is achievable in medicine. Also, AI is now poised to bridge the patient–doctor gap with automatic diagnostic tools that can operate round the clock. On the other hand, the significance of SE in protecting patient data cannot be overstated. The plethora of data generated in healthcare, from patient records to vital signs, necessitates a stringent data privacy policy. SE offers the solution by not only encrypting these data but also allowing them to be searched without being decrypted. It provides an avenue for secure querying of databases while safeguarding sensitive patient information.

Now, let us consider the combined force of AI and SE. When these technologies are integrated, we can pave the way for a reliable, encrypted database where AI can freely and securely extract insights. The use of SE can ensure the privacy of the data while AI can efficiently analyze the data to facilitate improved healthcare. This fusion of technologies can thus pave the way for a new era of medical treatment and patient management. For instance, consider a healthcare portal where users employ distinct keywords to classify their health issues. The company operating this medical platform employs Public Key SE to securely store user information and uploads encrypted data onto a cloud server. To broaden their business scope, the firm opts to delve deeper into research involving its user base, such as identifying common illnesses like “Hypertension,” “Periarthritis of Shoulder,” “Cervical Spondylosis,” “Osteoporosis,” and “Diabetes.” To achieve this objective, the organization delegates the task to an expert external agency, which is tasked with managing these activities while rigorously safeguarding the confidentiality of user health information. Figure 3 illustrates the framework, with the main steps outlined below.
  • 1.

    Data protection and encryption process: The company meticulously gathers user data, utilizing Public Key SE techniques to securely encrypt the information prior to its upload.

  • 2.

    Task transfer: The company establishes the parameters by collaborating with the third party, utilizing OMKSA for secure transformation of the encrypted data.

  • 3.

    Task computation: The external agency conducts analytical operations and then relays the findings back to the company.

Details are in the caption following the image
An application of OMKSA in machine learning.

6.2. Data Preprocessing and Feature Selection

In contemporary ML applications, data preprocessing is not just a prerequisite step, but it also profoundly influences the overall model performance and accuracy. To achieve optimal training results, it might be necessary to sift through vast encrypted data repositories to select critical data subsets for comprehensive analysis. This step often entails choosing and categorizing a myriad of features. With the introduction of the OMKSA technique, researchers can now employ multi-keyword searches to select specific data or features from encrypted datasets. Crucially, this process ensures that there is no disclosure regarding which particular keywords the researchers are keen on, thereby preserving search privacy.

6.3. Distributed ML

Distributed ML and federated learning have become essential strategies for multi-organizational collaborative ventures. In such settings, various entities or organizations aim to collaboratively develop a robust, shared ML model. However, due to concerns spanning data privacy, legal regulations, and intellectual property rights, the direct sharing of raw data becomes infeasible. The advent of OMKSA provides a novel solution to this conundrum. Using OMKSA, one organization can securely authorize another entity to search specific segments of its encrypted data. This means multiple entities can collaborate and train models without directly sharing any tangible data.

7. Conclusion

In this study, we introduced the advanced concept of OMKSA, which improves the traditional OKSA scheme to accommodate multi-keyword search. While traditional OKSA limits each encrypted data file to a single keyword association, OMKSA gives data providers the flexibility to extract multiple keywords for each encrypted dataset. Our analysis and experiments confirm that the OMKSA protocol is both practical and easy to implement for querying encrypted data in a cloud environment. A significant innovation of our approach lies in its integration of aggregation algorithms to generate the trapdoor. Consequently, the sender sent a fixed-size trapdoor to the receiver, enhancing the efficiency of communication.

Additionally, a comprehensive security analysis of the proposed protocol has been conducted, adhering to recognized security models. Notably, when benchmarked against existing OKSA protocols, our OMKSA stands out as exceptionally efficient for data seekers. Moreover, the computational overhead for conjunctive keyword searches remains stable, regardless of an increase in the volume of query keywords. Finally, for the replay attack problem, we may use the password accumulator tool in the future to further solve the problem of cloud server replay attack to ensure the security of cloud storage. Conclusively, as we transition into the 5G era, the potential applications for OMKSA are vast, with pronounced implications for domains like ML and AI. Our findings pave the way for further exploration and deployment in these cutting-edge fields.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding

This work was supported by the National Natural Science Foundation of China (Nos. 62402290, U24A20244, 62302280, and 62072276), Natural Science Foundation of Shandong Province (No. ZR2023QF133), and Youth Innovation Team of Shandong Higher Education Institutions (No. 2024KJN051).

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 62402290, U24A20244, 62302280, and 62072276), Natural Science Foundation of Shandong Province (No. ZR2023QF133), and Youth Innovation Team of Shandong Higher Education Institutions (No. 2024KJN051).

    Data Availability Statement

    The data used to support the findings of this study are available from the corresponding authors upon request.

      The full text of this article hosted at iucr.org is unavailable due to technical difficulties.