Volume 2025, Issue 1 6618969
Research Article
Open Access

ISAC-Assisted Defense Mechanisms for PUE Attacks in Cognitive Radio Networks

Junxian Li

Department of Electronic and Communication Engineering, North China Electric Power University, Baoding, 071003, China, ncepu.edu.cn

Baogang Li

Corresponding Author

Department of Electronic and Communication Engineering, North China Electric Power University, Baoding, 071003, China, ncepu.edu.cn

Hebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding, 071003, China, ncepu.edu.cn

Guanfei You

Department of Electronic and Communication Engineering, North China Electric Power University, Baoding, 071003, China, ncepu.edu.cn

Jingxi Zhang

Department of Electronic and Communication Engineering, North China Electric Power University, Baoding, 071003, China, ncepu.edu.cn

Wei Zhao

Department of Electronic and Communication Engineering, North China Electric Power University, Baoding, 071003, China, ncepu.edu.cn

First published: 17 March 2025
Citations: 1
Academic Editor: Mohamadreza (Mohammad) Khosravi

Abstract

With the evolution of communication systems toward the sixth-generation technology (6G), intelligent cognitive communication has gained considerable attention. As an important part of intelligent cognitive communication, cognitive radio (CR) offers promising prospects for efficient spectrum utilization. However, with the introduction of cognitive capabilities, CR networks (CRNs) face not only common security threats in wireless systems, but also unique security threats, including primary user emulation (PUE) attacks, endangering communication reliability and confidentiality. In order to enhance the defense ability of CRNs against PUE attacks, this paper proposes an integrated sensing and communication (ISAC)-assisted approach. Leveraging ISAC technology, our scheme enhances location detection precision. We introduce a high-resolution perception signal parameter estimation method and a position-based identity authentication scheme. Furthermore, deep reinforcement learning is used to dynamically optimize the authentication threshold to ensure the stability of authentication in dynamic scenarios. Simulation results show that the proposed scheme is effective in resisting PUE attacks and improves the security and reliability of CRNs.

1. Introduction

With the rapid advancement of wireless communication technology, cognitive radio (CR) has attracted wide attention for its intelligent cognitive communication capabilities, particularly its efficient spectrum utilization. Cognitive communication empowers terminal devices to perceive and adapt to their surrounding communication environment, thereby enhancing spectrum efficiency and system capacity [1]. Specifically, this technology enables the unlicensed secondary user (SU) to sense the primary user (PU) spectrum and to use that spectrum when it is idle [2]. To realize spectrum sensing (SS), the SU first receives the signals from the PU and then uses CR technology to identify whether the signal has the characteristics of the PU. Nevertheless, the openness inherent in CR networks (CRNs) exposes them to various security threats, including interception and tampering of wireless signals. Moreover, cognitive capabilities bring new security vulnerabilities, such as PU emulation (PUE) attacks, SS data falsification (SSDF) attacks, and objective function attacks [3–5]. These threats pose significant risks to the reliability and security of communication systems, potentially impacting user experiences and data confidentiality.

Among the various security threats, PUE attacks stand out due to their significant impact on spectrum utilization and system reliability. In such attacks, malicious users (MUs) impersonate PUs to transmit deceptive signals in idle frequency bands, with the aim of maximizing their own spectrum resources or preventing legitimate SUs from utilizing idle frequency bands, thus achieving the goal of denial of service (DoS) [6]. These attacks disrupt the SS process, hampering the effective use of spectrum, while also reducing the available channels for legitimate SUs and introducing significant interference to them.

Detection and defense against PUE attacks have been a focus of research in CRNs. For instance, Alahmadi et al. [7] proposed a method using advanced encryption standard (AES) to defend against PUE attacks, which has been proven reliable in CRNs operating in the white space of digital television (DTV) bandwidth. However, while AES can provide robust security, it may also increase computational complexity and processing delays. Additionally, Rana and Shuvo [8] discussed methods for detecting PUE attacks in sensor networks, proposing a weighted least squares algorithm based on a localization defense model. However, these methods often rely on complex algorithms or require additional hardware support, potentially increasing system overhead and complexity. Particularly, Chen, Park, and Reed [9] proposed a transmitter verification scheme called LocDef (location-based defense). This scheme verifies whether a given signal is from the original transmitter by estimating the location of the signal transmitter and observing its signal characteristics. However, the accuracy of this method may be limited by environmental factors and hardware performance.

Among the various defense mechanisms, location-based authentication is considered an effective method. By verifying the geographical location information of users, it is possible to effectively distinguish between PUs and MUs, thereby enhancing the security of the system. Location authentication not only increases the difficulty for MUs to impersonate PUs, but also improves the accuracy of detecting PUE attacks.

The sixth-generation (6G) mobile networks have been envisioned as a key enabler for many emerging applications, such as smart cities, autonomous driving, smart homes, and extended reality. These applications require not only high transmission rates but also high-precision sensing capabilities, such as wireless localization and environmental awareness. Therefore, future networks are expected to go beyond traditional communication and provide sensing capabilities to sense and even image the surrounding environment [10, 11]. However, designing communication and sensing separately cannot deliver high-quality connectivity and high-precision sensing at the same time, which has motivated integrated sensing and communication (ISAC) technology [12]. An ISAC system can detect surrounding targets while conveying information to the user [13], which yields a significant performance gain. In addition, ISAC effectively improves spectrum utilization and solves the spectrum collision problem between radar and communication systems [14]. The introduction of ISAC technology provides a new dimension for identifying and defending against PUE attacks in CRNs. With ISAC assistance, SUs with SS ability can further improve the accuracy of location detection. This is particularly important for defending against PUE attacks, as it helps CRNs identify and verify PU location information more accurately, thus making location authentication more effective. To our knowledge, this aspect has not been discussed in the existing literature. Therefore, this study proposes a scheme for ISAC-assisted CRNs to detect PUE attacks, with the aim of improving the security of CRNs and countering PUE attacks.

Nevertheless, ISAC-assisted location authentication also faces some potential challenges, such as dynamic changes in the environment that can affect the accuracy of authentication results. Different detection algorithms may lead to different location estimation errors, which can affect the reliability of authentication. In addition, dynamic changes of the eavesdropper, such as moving, hiding, and cooperating, may lead to different location attack strategies, thus affecting the robustness and flexibility of the authentication. To address these challenges, the system can utilize deep reinforcement learning to optimize the threshold of the authentication parameter to adapt to dynamic changes in the environment [15], different detection algorithms, and different behavior of the eavesdropper. In addition, deep reinforcement learning can also effectively improve learning speed so that the system can quickly find the optimal authentication parameters in different environments.

The contributions of this work can be summarized as follows:

We propose a high-resolution perception signal parameter estimation method based on ISAC technology. Utilizing a super-resolution algorithm combined with a matched filtering algorithm, we detect the sender’s elevation angle, azimuth angle, and distance. This detection algorithm not only addresses the low-resolution angle estimation problem in the perception process but also significantly reduces computational complexity.

We propose a position-based identity authentication scheme in CRNs, comparing the detected location of the message sender with known positions of PUs to authenticate the message sender. Utilizing multidimensional location information increases the difficulty for MUs to successfully impersonate PUs, enhances the system’s resistance to PUE attacks, and strengthens the security of CRNs.

We utilize a deep reinforcement learning method known as deep Q-network (DQN) to dynamically adjust authentication parameter thresholds. By leveraging a feedback loop incorporating both exploration and exploitation, DQN refines its decision-making over time, enabling the system to adapt effectively to dynamic and unpredictable environments. This approach makes DQN particularly well-suited for complex and high-variability scenarios, where traditional methods may struggle to provide reliable results.

The remainder of the paper is organized as follows. Section 2 introduces the system model considered in this paper. In Section 3, the proposed target estimation algorithm is detailed. Section 4 proposes the location-based identity authentication scheme. The DQN-based authentication scheme is proposed in Section 5, and the numerical results are illustrated and discussed in Section 6. Finally, the paper is concluded in Section 7.

2. System Model

The considered system is depicted in Figure 1, comprising a PU, a SU, an ISAC base station (BS), and a MU. The PU serves as the BS in the CRN, with priority access to spectrum resources, and is primarily responsible for transmitting critical data and providing services. The SU functions as a regular user device, sensing the PU’s idle spectrum and engaging in communication. The ISAC BS collaborates with SU to assist in environmental sensing and target detection. The MU attempts to impersonate the PU by sending false signals.

Figure 1. System model of our scheme.

During system operation, MU may initiate PUE attacks by mimicking the signal characteristics of the PU, aiming to monopolize idle spectrum resources or disrupt SU’s access to spectrum resources. The ISAC BS can assist SU in detecting the sender’s location and verifying its legitimacy as the PU. This collaborative approach enhances the system’s resilience against malicious attacks and ensures communication security.

Since the focus of this study is on the assistance provided by the ISAC BS to SU in locating users claiming to be PUs, thereby achieving identity authentication and enhancing the CRN’s ability to resist PUE attacks, we will now elaborate on the signal model of the ISAC BS.

2.1. Transmitted Signal

The antenna array of the OFDM ISAC BS is a uniform planar array, whose coordinate model is shown in Figure 2. The number of horizontal array elements is L, and the number of vertical array elements is R. The element spacing in the horizontal and vertical directions is dL and dR, respectively. The planar array is located on the yoz plane, where θ is the elevation angle and φ is the azimuth angle. The steering vector of the array can be written as
()
with
()
()
Figure 2. Coordinate model of the antenna array.
The system consists of a transmit (Tx) antenna array with elements and a receive (Rx) antenna array with elements, which are used for communication and sensing, respectively. The power of the OFDM signal to be transmitted is split between communication and sensing. The ISAC BS transmits a waveform with M OFDM symbols and N active subcarriers, and senses K targets. The transmit vector is defined as
()
where and are the precoding vectors that map each modulated symbol sn,m,r,l to the transmit antenna, representing the communication and sensing functions, respectively. ρ ∈ [0, 1] is the parameter that controls the power allocation in the two functions.
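As an illustration of the array geometry described above, the following is a minimal sketch of a planar-array steering vector. It is not the paper's equation: it assumes that θ is measured from the z axis, φ from the x axis, that the L horizontal elements run along y and the R vertical elements along z, and that the phase sign is positive. The function name and the 4 × 4 half-wavelength array are hypothetical; only the 24 GHz carrier, the 45° elevation, and the 20° azimuth come from the simulation section.

```python
import cmath
import math

def upa_steering_vector(theta, phi, L, R, d_L, d_R, wavelength):
    """Steering vector of an L x R planar array lying in the y-z plane.

    Assumed convention (not stated in the source): theta is measured from
    the z axis, phi from the x axis, horizontal elements run along y and
    vertical elements along z.
    """
    k = 2.0 * math.pi / wavelength
    # Per-element phase increments along the horizontal (y) and vertical (z) axes.
    psi_y = k * d_L * math.sin(theta) * math.sin(phi)
    psi_z = k * d_R * math.cos(theta)
    # Kronecker-style ordering: vertical index r is outer, horizontal index l is inner.
    return [cmath.exp(1j * (l * psi_y + r * psi_z))
            for r in range(R) for l in range(L)]

wavelength = 3e8 / 24e9  # 24 GHz carrier, as in the simulation section
a = upa_steering_vector(math.radians(45), math.radians(20), L=4, R=4,
                        d_L=wavelength / 2, d_R=wavelength / 2,
                        wavelength=wavelength)
print(len(a), abs(a[5]))
```

Every entry has unit modulus, and under the assumed convention a target on the array normal (θ = 90°, φ = 0°) produces an all-ones vector.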

2.2. Received Signal

After passing through the fast Fourier transform (FFT) block in the OFDM receiver, the received vector of the nth subcarrier of the mth OFDM symbol of the (r, l)-th antenna is , which is given as
()
where is the frequency-domain channel matrix of the nth subcarrier of the mth OFDM symbol of the (r, l)-th antenna, is the additive white Gaussian noise (AWGN) matrix, which has a circularly symmetric zero mean Gaussian distribution with variance . Considering K target reflections, the channel matrix can be expressed as
()
where K is the number of targets, bk is a complex number related to the scattering characteristics of the target, T0 is the total duration of the OFDM symbol, θk, φk, fD,k, and τk are the elevation angle, the azimuth angle, the Doppler frequency shift and the round-trip delay of the kth target, and fΔ is the spacing of the subcarrier. In equation (6), the first term of the phase represents the phase shift caused by the target Doppler frequency shift, the second term represents the phase shift caused by the target distance, and the third and fourth terms represent the phase shift caused by the position of the antenna.

2.3. Sensor-Target-Sensor Path

The target-sensing process of the OFDM ISAC BS roughly includes the following four steps:
  1. The OFDM BS transmits radio waves to sense and detect K targets.
  2. The radio waves are reflected after illuminating the targets.
  3. The reflected echoes from the targets are received by the OFDM receiver.
  4. The OFDM receiver processes the echo signals and estimates the parameters of the K targets.

Under the line-of-sight (LOS) propagation condition, the signal-to-noise ratio (SNR) of a single receive antenna element related to the kth target is defined as [16]
()
where Pt is the average transmit power of the wireless signal, Gt and Gr are the gains of the transmit and receive antennas, respectively, λ is the wavelength of the system carrier, σRCS,k is the radar cross-section (RCS) of the point target k, γk is the normalized array factor at Tx considering the misalignment between the target direction and the sensing direction [17]. R is the distance between the kth target and the BS. N0 is the one-sided noise power spectral density (PSD) at each antenna element.
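The quantities listed above are those of the standard monostatic radar range equation. The sketch below evaluates that textbook form; the exact normalization used in [16] may differ. The `bandwidth` argument, which converts the noise PSD into a noise power, is an assumption, as are the transmit power, RCS, and noise PSD values. The carrier frequency, bandwidth, antenna gains, and 5 m distance are taken from the simulation section.

```python
import math

def sensing_snr_db(p_t, g_t_db, g_r_db, wavelength, rcs, gamma, dist, n0, bandwidth):
    """Monostatic radar-equation SNR in dB for one receive element.

    bandwidth (Hz) turns the noise PSD n0 (W/Hz) into a noise power; using it
    here is an assumption, since the source equation is not reproduced.
    """
    g_t = 10 ** (g_t_db / 10)
    g_r = 10 ** (g_r_db / 10)
    signal = p_t * g_t * g_r * wavelength ** 2 * rcs * gamma
    noise = (4 * math.pi) ** 3 * dist ** 4 * n0 * bandwidth
    return 10 * math.log10(signal / noise)

params = dict(
    p_t=1.0,                 # hypothetical transmit power in watts
    g_t_db=20.0,             # transmit gain from the simulation section
    g_r_db=20.0,             # receive gain from the simulation section
    wavelength=3e8 / 24e9,   # 24 GHz carrier
    rcs=1.0,                 # hypothetical radar cross-section in m^2
    gamma=1.0,               # perfect beam alignment (assumption)
    n0=4e-21,                # roughly thermal noise PSD at room temperature (assumption)
    bandwidth=100e6,         # 100 MHz bandwidth
)
snr_5m = sensing_snr_db(dist=5.0, **params)
snr_10m = sensing_snr_db(dist=10.0, **params)
print(round(snr_5m - snr_10m, 2))  # loss in dB when the range doubles
```

Because the echo travels the distance twice, the SNR falls with the fourth power of range, so doubling the distance costs about 12 dB regardless of the other parameters.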

3. Target Parameter Estimation Algorithm

This section presents the algorithm for estimating the parameters of the K targets, namely, the elevation angle, azimuth angle, delay, and Doppler shift. The algorithm consists of two phases: the first phase uses a two-dimensional Root-MUSIC algorithm to estimate the elevation and azimuth angles, and the second phase uses a matched filter to estimate the delay and Doppler shift of the targets.

3.1. Estimation of the Elevation and Azimuth Angle

Since the ISAC BS operates in a self-radiating and self-receiving mode, the direction of departure (DoD) and the direction of arrival (DoA) for sensing are identical. Suppose that there are K uncorrelated incident sources with directions (θ1, φ1), (θ2, φ2), …, (θK, φK); the received signal at time slot t can then be written as [18]
()
with
()
The covariance matrix of the received signal is calculated as
()
where is the signal covariance matrix. Performing eigenvalue decomposition in R, and let the eigenvalues be , and the corresponding eigenvectors be , then is zero at (θ1, φ1), (θ2, φ2), …, (θk, φk).
Let , then (θi, φi), i = 1,  2, ⋅⋅⋅, k satisfies the equation
()

By solving equation (11), we can obtain the azimuth and elevation angles of K incident sources.
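For orientation, the subspace relations that MUSIC-type estimators rely on can be written in their standard textbook form, shown below. This is a sketch rather than the paper's own equations: the symbol names (y, A, s, n, R_s, U_s, U_n, σ²) are assumptions, and the polynomial rooting step of the two-dimensional Root-MUSIC variant is not reproduced.

```latex
% Standard subspace (MUSIC) relations; symbol names are assumptions.
\mathbf{y}(t) = \mathbf{A}\,\mathbf{s}(t) + \mathbf{n}(t), \qquad
\mathbf{A} = \big[\mathbf{a}(\theta_1,\varphi_1),\ldots,\mathbf{a}(\theta_K,\varphi_K)\big]
\\
\mathbf{R} = \mathbb{E}\{\mathbf{y}(t)\mathbf{y}^{H}(t)\}
           = \mathbf{A}\mathbf{R}_s\mathbf{A}^{H} + \sigma^{2}\mathbf{I}
           = \mathbf{U}_s\boldsymbol{\Lambda}_s\mathbf{U}_s^{H}
             + \sigma^{2}\mathbf{U}_n\mathbf{U}_n^{H}
\\
\mathbf{a}^{H}(\theta_k,\varphi_k)\,\mathbf{U}_n\mathbf{U}_n^{H}\,
\mathbf{a}(\theta_k,\varphi_k) = 0, \qquad k = 1,\ldots,K
```

The last line states that each true steering vector is orthogonal to the noise subspace spanned by the eigenvectors of the smallest eigenvalues. That orthogonality is what makes the quadratic form vanish at the source directions.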

3.2. Estimation of Delay and Doppler Shift

For ease of explanation, the received signal can be written in matrix notation as
()
where is the Vandermonde matrix of receiving steering vectors, and the matrix S is described by
()
We can use a Vandermonde matrix composed of K estimated receiving steering vectors to filter the signals on K paths, as shown in the following equation:
()
where , if NrK and the K objects are resolvable [19], then for ik, i.e., ΑHAIK. After performing CP removal and FFT on each row of the signal matrix in (14), we obtain K matrices, where the element corresponding to the nth subcarrier of the mth OFDM symbol of the kth path is
()
Performing element-wise division [20] to remove the unwanted data symbols xn,m leads to
()
where , let .
Define a vector related to distance (delay) as follows:
()
and define a vector related to velocity (Doppler shift) as follows:
()
Under the condition of Gaussian white noise, matched filtering can achieve optimal SNR and improve target detection capability. Matched filtering is used to estimate the target parameter (τ,  fD), and the specific expression is
()

The above equation is a two-dimensional matched filtering process over (τ,  fD) that requires at least NM search evaluations. To reduce the number of searches and the complexity, the joint matched filtering can be replaced by cascaded matched filtering at the cost of some matching performance.
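The two-dimensional search over (τ, fD) can be sketched as follows for a single noise-free target. This is an illustration under stated assumptions, not the paper's implementation: the phase signs, the on-grid target, the small 16 × 8 grid, and the symbol duration without cyclic prefix are all assumptions. The subcarrier spacing is derived from the 100 MHz bandwidth and 1024 subcarriers of the simulation section by assuming the subcarriers fill the band.

```python
import cmath
import math

def matched_filter_2d(g, f_delta, t0):
    """Search the N x M (delay, Doppler) grid for the peak of the matched-filter output."""
    N, M = len(g), len(g[0])
    best_val, best_i, best_j = -1.0, 0, 0
    for i in range(N):
        tau = i / (N * f_delta)                      # candidate delay
        # Undo the delay-induced phase ramp across subcarriers for each symbol m.
        per_symbol = [sum(g[n][m] * cmath.exp(2j * math.pi * n * f_delta * tau)
                          for n in range(N)) for m in range(M)]
        for j in range(M):
            f_d = j / (M * t0)                       # candidate Doppler shift
            # Undo the Doppler-induced phase ramp across OFDM symbols.
            val = abs(sum(per_symbol[m] * cmath.exp(-2j * math.pi * m * t0 * f_d)
                          for m in range(M)))
            if val > best_val:
                best_val, best_i, best_j = val, i, j
    return best_i / (N * f_delta), best_j / (M * t0)

N, M = 16, 8                       # small hypothetical grid
f_delta = 100e6 / 1024             # subcarrier spacing, assuming the subcarriers fill the band
t0 = 1.0 / f_delta                 # symbol duration, cyclic prefix ignored (assumption)
true_tau = 3 / (N * f_delta)       # on-grid delay
true_fd = 2 / (M * t0)             # on-grid Doppler shift

# Noise-free single-target samples after data-symbol removal (assumed sign convention).
g = [[cmath.exp(-2j * math.pi * n * f_delta * true_tau)
      * cmath.exp(2j * math.pi * m * t0 * true_fd)
      for m in range(M)] for n in range(N)]

est_tau, est_fd = matched_filter_2d(g, f_delta, t0)
wavelength = 3e8 / 24e9
print("range [m]:", 3e8 * est_tau / 2, "velocity [m/s]:", wavelength * est_fd / 2)
```

The search visits all NM grid points, matching the search count stated above. The final line converts the round-trip delay and Doppler shift into range and radial velocity using the usual monostatic relations R = cτ/2 and v = λfD/2.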

4. Location-Based Authentication Process

To counter PUE attacks in CRNs, we propose an ISAC-assisted position-based physical layer authentication scheme. This scheme utilizes the sensing capability of the ISAC BS and the physical properties of sender locations. Specifically, when a SU receives SS signals claimed to be from the PU, the signals first undergo signal feature-based detection. The SU then collaborates with the ISAC BS to detect the sender’s location, and the detected location information is used for deception detection. The process can be divided into the following three steps:
  1. Assuming that PU or MU sends p SS signals to SU in a slot, SU collaborates with ISAC BS to perform a two-dimensional angle estimation based on the received messages, obtaining estimated elevation and azimuth angles of the sender.
  2. The ISAC BS sends out probing waveforms and filters the received echoes based on the angle information estimated in the previous step, obtaining echoes related to the message sender, and then estimates the distance and velocity of the message transmitter by using matched filtering.
  3. SU uses the probed location information of the message transmitter to perform a hypothesis test and determines the legitimacy of the message transmitter’s identity.

In this authentication process, SU knows the location information P = (x, y, z) of the PU BS, and the authentication cost is denoted as C. The MU monitors the activity status of the PU and, for its own benefit, launches deception attacks on idle frequency bands without interfering with the PU transmission. The MU sends deceptive messages with probability s ∈ [0, 1].

SU first obtains the location of the message transmitter in the Cartesian coordinate system based on the probed location information , as follows:
()
Then, the SU performs a hypothesis test to determine whether the message comes from the PU. The null hypothesis H0 indicates that the message comes from PU. The alternative hypothesis H1 indicates that the message does not come from the PU, that is, the message is sent by the MU. Therefore, the location-based binary hypothesis test in slot t can be expressed as
()
According to the authentication requirements, an authentication threshold is set and the legitimate location vector P is compared with the location vector of the message transmitter. SU calculates the test statistic L of the hypothesis test as
()
where ‖⋅‖ is the Frobenius norm.
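The location comparison described above can be sketched in a few lines. Two assumptions are made here. First, the spherical-to-Cartesian conversion measures θ from the z axis. Second, the statistic is taken as the squared distance between the estimated and reference locations, normalized by the squared norm of the reference location. The normalization is a guess that is consistent with thresholds confined to [0, 1]. The function names and the two example senders are hypothetical; the PU position (45°, 20°, 5 m) is the one used in the simulation section.

```python
import math

def to_cartesian(dist, theta, phi):
    """Spherical-to-Cartesian conversion; theta is measured from the z axis (assumed)."""
    return (dist * math.sin(theta) * math.cos(phi),
            dist * math.sin(theta) * math.sin(phi),
            dist * math.cos(theta))

def location_statistic(p_hat, p_ref):
    """Normalized squared distance between estimated and reference locations (assumed form)."""
    num = sum((a - b) ** 2 for a, b in zip(p_hat, p_ref))
    den = sum(b ** 2 for b in p_ref)
    return num / den

def authenticate(p_hat, p_ref, delta):
    """Decide H0 (sender is the PU) when the statistic does not exceed delta."""
    return location_statistic(p_hat, p_ref) <= delta

p_ref = to_cartesian(5.0, math.radians(45), math.radians(20))      # known PU location
pu_like = to_cartesian(5.1, math.radians(46), math.radians(21))    # small estimation error
mu_like = to_cartesian(6.0, math.radians(60), math.radians(40))    # hypothetical attacker

delta = 0.05  # hypothetical threshold
print(authenticate(pu_like, p_ref, delta), authenticate(mu_like, p_ref, delta))
```

With this threshold the slightly perturbed PU estimate is accepted and the displaced sender is rejected, which is the behavior the hypothesis test is meant to deliver.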
The authentication performance can be characterized by the false alarm rate and the miss detection rate , which are given as
()
where Pr(⋅|⋅) is the conditional probability. The false alarm rate is the probability that SU misjudges PU as MU, which may be due to the low authentication threshold. The miss detection rate is the probability that MU successfully deceives SU, passes the authentication, and is regarded as the PU, which may be the consequence of the high authentication threshold. Therefore, it is crucial for SU to choose a suitable test threshold for spoof detection.
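The threshold trade-off described above can be made concrete with a small Monte Carlo sketch. The statistic samples below are synthetic: they are drawn from a hypothetical Gaussian location-error model rather than from the paper's sensing chain. The decision rule (alarm when the statistic exceeds the threshold) is inferred from the description of the two error types.

```python
import random

def empirical_rates(stats_h0, stats_h1, delta):
    """Return (false alarm rate, miss detection rate) for threshold delta."""
    far = sum(1 for s in stats_h0 if s > delta) / len(stats_h0)    # PU rejected
    mdr = sum(1 for s in stats_h1 if s <= delta) / len(stats_h1)   # MU accepted
    return far, mdr

def sample_statistic(rng, offset, sigma):
    """Squared norm of a 3-D location error with a per-axis bias (hypothetical model)."""
    return sum((offset + rng.gauss(0.0, sigma)) ** 2 for _ in range(3))

rng = random.Random(0)
stats_h0 = [sample_statistic(rng, 0.0, 0.1) for _ in range(2000)]  # sender is the PU
stats_h1 = [sample_statistic(rng, 0.3, 0.1) for _ in range(2000)]  # sender is displaced

sweep = [(d,) + empirical_rates(stats_h0, stats_h1, d) for d in (0.05, 0.1, 0.2, 0.4)]
for d, far, mdr in sweep:
    print(f"delta={d:.2f}  false alarm={far:.3f}  miss detection={mdr:.3f}")
```

Raising the threshold can only lower the false alarm rate and raise the miss detection rate. This is why a single fixed threshold is a compromise and why the next section adapts it online.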

5. Location-Based Authentication With Deep Reinforcement Learning

The above authentication strategies face limitations in adapting to dynamic environmental changes. To address this, we integrate deep reinforcement learning, specifically the DQN algorithm, to optimize location-based authentication. DQN dynamically adjusts the authentication threshold by learning from the environment through trial and error, employing a mechanism of exploration and exploitation. This process allows the system to continuously refine its authentication strategy based on feedback, ensuring robust performance in the face of environmental fluctuations. Compared to traditional Q-learning, DQN benefits from its ability to handle high-dimensional state spaces more efficiently, and it converges faster in complex environments. Additionally, when compared to algorithms like deep deterministic policy gradient (DDPG), which are better suited for continuous action spaces, DQN’s discrete action space suits the problem of threshold optimization well, providing a balance of efficiency and simplicity.

In deception detection using DQN, SU selects the test threshold for location-based authentication based on the current state s(t) to determine the sender of the p packets received in slot t. The main elements in reinforcement learning are as follows:

States: At each time slot t, the state of the system consists of the false alarm rate and the miss detection rate of the deception detection at slot t − 1, i.e., , where S is the set of states observed by the SU. For simplicity, the false alarm rate and the miss detection rate are quantized into L + 1 levels, i.e., .

Actions: Based on the observed state, the SU chooses the authentication threshold δ(t) from A + 1 levels, i.e., δ(t) ∈ {a/A}, 0 ≤ a ≤ A.

Policy: SU adopts a near-greedy action selection policy to determine the test threshold [21]. It is as follows:
()
The policy has two modes:
  1. Exploration: the agent randomly tries different actions at each time step t to discover an effective action.
  2. Exploitation: the agent selects an action that corresponds to the maximum Q-value predicted by the Q-network for the current state at time step t.

In this policy, the probability of SU performing exploration is ε, and the probability of exploitation is 1 − ε, where ε ∈ (0, 1) is a hyperparameter that balances exploration and exploitation, and the value of ε decreases as the time step t increases, which means that the algorithm gradually shifts from exploration to exploitation.

Reward function: u(t) reflects the performance of the SU against PUE attacks after selecting the threshold δ(t). Therefore, we define the authentication utility u(t) at slot t as follows:
()
where αf (or αm) is the weight factor of the false alarm (or miss detection) in the detection of the deception, and β is the weight factor of the authentication cost.
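The state quantization, action set, and near-greedy policy described above can be sketched as follows. The uniform grid {l/L} for the quantized rates is an assumption made by analogy with the action set {a/A}. The helper names, the decay factor, and the example Q-values are hypothetical.

```python
import random

def quantize(rate, levels):
    """Map a rate in [0, 1] to the nearest point of {0, 1/levels, ..., 1}."""
    return round(rate * levels) / levels

def thresholds(A):
    """Action set {a / A : a = 0, ..., A}, i.e. A + 1 candidate thresholds."""
    return [a / A for a in range(A + 1)]

def select_threshold(q_values, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the action with the largest Q-value."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(range(len(actions)), key=lambda i: q_values[i])
    return actions[best]

def decay(epsilon, factor, eps_min):
    """Shrink epsilon geometrically but never below eps_min."""
    return max(epsilon * factor, eps_min)

rng = random.Random(1)
actions = thresholds(20)                          # hypothetical A = 20
q_values = [-abs(a - 0.3) for a in actions]       # toy Q-values peaking at 0.3
state = (quantize(0.036, 10), quantize(0.064, 10))
greedy = select_threshold(q_values, actions, epsilon=0.0, rng=rng)
print(state, greedy, decay(0.5, 0.9, 0.1))
```

With ε = 0 the policy is purely greedy and returns the threshold with the largest Q-value. As ε decays toward its floor, the agent moves from mostly random thresholds to mostly greedy ones.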
In the DQN algorithm, a neural network replaces the Q table in order to address the convergence problem of the traditional Q table. Unlike in Q-learning, the Q value is not obtained directly from the state and action values but is computed by the neural network: the input of the network is the state s(t), and the output is Q(s, δ), ∀δ(t) ∈ {a/A}, 0 ≤ a ≤ A. The weights ωt in DQN are updated on the minibatch data using the gradient descent (GD) algorithm at time t after each iteration, and the loss function can be expressed as
()
with
()

In addition, the experience replay mechanism is used to facilitate the training of DQN, and the experience e(t) = (s(t), δ(t), u(t), s(t + 1)) of each time step t is saved in the experience pool D = {e(1), e(2), ⋅⋅⋅, e(M)}. Each time the target parameter is updated, a part of the data are extracted from the experience pool D for updating, which can break the correlation between adjacent training samples and has the advantages of stability and avoiding local minimum convergence.
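The experience pool and the minibatch loss can be sketched without a deep learning library. The sketch uses the standard DQN temporal-difference target, u + γ max Q(s′, ·), which is assumed to match the missing equations. A constant function stands in for the Q-network, and all names and values are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience pool D holding tuples (s, action_index, u, s_next)."""

    def __init__(self, capacity):
        self.pool = deque(maxlen=capacity)   # oldest experiences are dropped first

    def store(self, state, action, utility, next_state):
        self.pool.append((state, action, utility, next_state))

    def sample(self, batch_size, rng):
        k = min(batch_size, len(self.pool))
        return rng.sample(list(self.pool), k)

def td_target(utility, next_q_values, gamma):
    """Standard DQN target: immediate utility plus discounted best next Q-value."""
    return utility + gamma * max(next_q_values)

def minibatch_loss(batch, q_fn, target_q_fn, gamma):
    """Mean squared TD error over a minibatch; q_fn(state) returns one Q-value per action."""
    errors = [(td_target(u, target_q_fn(s_next), gamma) - q_fn(s)[a]) ** 2
              for s, a, u, s_next in batch]
    return sum(errors) / len(errors)

def zero_q(state):
    """Stand-in for the Q-network: three actions, all valued at zero."""
    return [0.0, 0.0, 0.0]

buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.store(state=(t, t), action=t % 3, utility=-0.1 * t, next_state=(t + 1, t + 1))
batch = buffer.sample(2, random.Random(0))
loss = minibatch_loss(batch, zero_q, zero_q, gamma=0.95)
print(len(buffer.pool), loss)
```

Sampling uniformly from the pool rather than using consecutive experiences is what breaks the correlation between adjacent training samples. In a full implementation the gradient of this loss with respect to the network weights would drive the update.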

We propose a DQN-based method to resist PUE attacks, and the overall flow of the algorithm is as follows (Algorithm 1).

    Algorithm 1: Location-based PHY authentication with DQN.
     1. Initialize γ, ε, s(0), A, D = ∅.
     2. for t = 1, 2, ⋯ do
     3.   Receive packet p.
     4.   Estimate the sender’s location.
     5.
     6.   ε ≔ max(ε · λ, εmin)
     7.   Sample r ∼ Uniform(0, 1).
     8.   Choose δ(t) via (24).
     9.   Calculate the test statistic L via (22).
    10.   if L does not exceed δ(t) then
    11.     Accept packet p.
    12.   else
    13.     Send a spoofing alarm.
    14.   end if
    15.   Observe u(t) and next state s(t + 1).
    16.   Store experience e(t) = (s(t), δ(t), u(t), s(t + 1)) in D.
    17.   Sample random minibatch of experience (s(j), δ(j), u(j), s(j + 1)) from D.
    18.   Train ω by minimizing the loss function L(ω).
    19. end for

6. Simulation and Results

In this section, we simulate the proposed location-based identity authentication scheme in CRNs using DQN, implemented in Python 3.7. We set the position of the PU relative to the ISAC BS at an elevation angle of 45°, an azimuth angle of 20°, and a distance of 5 m. The MU is modeled as a single instance, with its location randomly varied near the PU during the simulations. The ISAC BS is configured with a carrier frequency of 24 GHz, a bandwidth of 100 MHz, 1024 subcarriers, and both transmit and receive antenna gains of 20 dB.

For the DQN-based reinforcement learning model, the action set (i.e., test thresholds) is quantized into 20 levels, while the false alarm rate, the miss detection rate, and the deception rate are quantized into 10 levels. The model parameters include αf = 0.5, αm = 1, β = 0.01, and the discount factor γ = 0.95.

To comprehensively evaluate the performance of our proposed scheme, we compare it against three established benchmark schemes: the fixed threshold scheme, the random threshold scheme, and the fast Q-learning-based scheme [22]. These comparisons allow us to thoroughly assess the effectiveness of our DQN-based approach in dynamic threshold optimization and adaptability to changing environmental conditions.

Figure 3 illustrates the utility of the SU under different authentication strategies. The DQN-based authentication scheme outperforms the fixed threshold, random threshold, and fast Q-learning-based schemes in terms of utility and convergence speed. Specifically, the utility under the DQN-based scheme converges to approximately −0.046 after 1700 time slots, representing improvements of about 56.6% over the fixed threshold scheme (−0.106) and 37.8% over the fast Q-learning-based scheme (−0.074). Additionally, the DQN approach achieves this near-optimal utility faster than the fast Q-learning-based scheme, while the fixed and random threshold schemes are slower and less stable. The random threshold scheme in particular shows no sign of convergence and fluctuates significantly, further underscoring the advantages of employing adaptive learning strategies like DQN for effective threshold selection in dynamic environments.

Figure 3. Comparison of the utility of SU with different authentication strategies.

As depicted in Figures 4 and 5, the DQN-based strategy demonstrates a consistent decrease in both the false alarm rate and the miss detection rate over time. Specifically, the false alarm rate decreases from 9.6% to 3.6% after 2000 time slots, representing a reduction of approximately 39% compared to the fixed threshold strategy. Similarly, the miss detection rate drops from 15% to 6.4%, indicating a reduction of about 36.1% compared to the fixed threshold strategy. While the fast Q-learning-based scheme also improves both metrics, reducing the false alarm rate to 4.25% and the miss detection rate to 7.91%, it still lags behind the DQN-based scheme in both performance and convergence speed. The DQN-based strategy stabilizes more rapidly, converging after about 1400 time slots, while the fast Q-learning-based scheme converges more slowly.

Figure 4. Comparison of the false alarm rate with different authentication strategies.
Figure 5. Comparison of the miss detection rate with different authentication strategies.

Figure 6 illustrates the changes in authentication success rates for the three schemes under different numbers of receiving antennas. From Figure 6, it can be observed that as the number of receiving antennas increases, the authentication success rates of all three schemes improve. Notably, the DQN scheme consistently exhibits the best authentication success rates across all antenna numbers. This demonstrates the significant advantage of the DQN scheme in handling dynamic environments and optimizing threshold selection.

Figure 6. Impact of the number of receiving antennas on authentication success rates for different strategies.

Table 1 presents a comprehensive performance comparison of the DQN-based scheme, the fixed threshold scheme, and the fast Q-learning-based scheme across key metrics. The DQN-based approach not only achieves the highest average reward but also maintains the lowest false alarm rate and miss detection rate, underscoring its effectiveness in accurately distinguishing between PUs and MUs. In contrast, while the fast Q-learning-based scheme offers improvements over the fixed threshold approach, it still lags behind the DQN-based method in terms of average authentication success rate. These results collectively highlight the advantages of employing deep reinforcement learning for dynamic threshold optimization in enhancing authentication performance within CRNs.

Table 1. Performance comparison of different authentication strategies.

| Metric | DQN | Fast Q-learning | Fixed |
|---|---|---|---|
| Average reward | −0.0459 | −0.0744 | −0.1058 |
| Average false alarm rate | 0.0360 | 0.0425 | 0.0591 |
| Average miss detection rate | 0.0640 | 0.0791 | 0.1002 |
| Average authentication success rate | 0.9000 | 0.8784 | 0.8407 |
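The percentages quoted in this section can be reproduced from the values in Table 1. The short check below is a reader's aid rather than part of the original evaluation. It also shows that, for these averages, the reported success rate equals one minus the false alarm rate minus the miss detection rate. The utility gains use the rounded utilities quoted in the text (−0.046, −0.074, −0.106).

```python
table = {
    "DQN":   {"far": 0.0360, "mdr": 0.0640, "success": 0.9000},
    "FastQ": {"far": 0.0425, "mdr": 0.0791, "success": 0.8784},
    "Fixed": {"far": 0.0591, "mdr": 0.1002, "success": 0.8407},
}

def relative_reduction(baseline, value):
    """Fractional reduction of value with respect to baseline."""
    return (baseline - value) / baseline

far_gain = relative_reduction(table["Fixed"]["far"], table["DQN"]["far"])
mdr_gain = relative_reduction(table["Fixed"]["mdr"], table["DQN"]["mdr"])

# Utilities are negative, so compare magnitudes of the rounded values quoted in the text.
utility_gain_fixed = relative_reduction(0.106, 0.046)
utility_gain_fastq = relative_reduction(0.074, 0.046)

# Success rate implied by the two error rates for each scheme.
implied = {name: 1.0 - row["far"] - row["mdr"] for name, row in table.items()}

print(f"{far_gain:.3f} {mdr_gain:.3f} {utility_gain_fixed:.3f} {utility_gain_fastq:.3f}")
```

The four ratios come out at roughly 39%, 36.1%, 56.6%, and 37.8%, in line with the figures reported above.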

7. Conclusions

In this paper, we propose an ISAC-assisted defense mechanism to address PUE attacks in CRNs. By integrating high-resolution perception signal parameter estimation and a location-based identity authentication scheme, the system significantly improves the accuracy of location verification, which is crucial in distinguishing between PUs and MUs. Additionally, deep reinforcement learning, specifically the DQN-based method, dynamically adjusts authentication thresholds in response to environmental changes, enhancing both adaptability and robustness. The simulation results show significant improvements over existing methods in terms of false alarm and miss detection rates, as well as average rewards, highlighting the efficiency of our approach in enhancing the security of CRNs against PUE attacks. By leveraging ISAC, our approach enhances both SS and security, making it well-suited for complex CRN environments. Future work can explore additional enhancements to further strengthen CRN defenses against evolving threats.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding

This research was supported by the National Natural Science Foundation of China (Grant Nos. 62471181 and 61971190) and the Natural Science Foundation of Hebei Province (Grant No. F2022502020).

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant Nos. 62471181 and 61971190) and the Natural Science Foundation of Hebei Province (Grant No. F2022502020).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding authors upon request.
