Volume 2025, Issue 1 5813659
Research Article
Open Access

CSI Acquisition in Internet of Vehicle Network: Federated Edge Learning With Model Pruning and Vector Quantization

Yi Wang
School of Electronics and Information, Zhengzhou University of Aeronautics, Zhengzhou 450046, Henan, China
Henan Key Laboratory of General Aviation Technology, Zhengzhou University of Aeronautics, Zhengzhou 450046, Henan, China

Junlei Zhi
School of Electronics and Information, Zhengzhou University of Aeronautics, Zhengzhou 450046, Henan, China
Henan Key Laboratory of General Aviation Technology, Zhengzhou University of Aeronautics, Zhengzhou 450046, Henan, China

Linsheng Mei
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, Anhui, China

Wei Huang (Corresponding Author)
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, Anhui, China
First published: 18 March 2025
Academic Editor: Konglin Zhu

Abstract

The conventional machine learning (ML)–based channel state information (CSI) acquisition overlooks the potential privacy disclosure and estimation overhead problems caused by transmitting pilot datasets during the estimation stage. In this paper, we propose federated edge learning for CSI acquisition to protect data privacy in the Internet of Vehicles network with a massive antenna array. To reduce the channel estimation overhead, a joint model pruning and vector quantization algorithm for the network gradient parameters is presented, which reduces the amount of information exchanged between the centralized server and the devices. This scheme allows for local fine-tuning to adapt the global model to the channel characteristics of each device. In addition, we provide theoretical guarantees in closed form for the convergence rate and the quantization error bound, respectively. Simulation results demonstrate that the proposed FL-based CSI acquisition scheme with model pruning and vector quantization can efficiently improve the channel estimation performance while reducing the communication overhead.

1. Introduction

With the rapid development of the Internet of Things (IoT) and autonomous driving technology, vehicles and unmanned aerial vehicles (UAVs) have become more intelligent, and IoT devices generate a large amount of data that requires powerful communication and computing resources to realize intelligent in-vehicle applications [1, 2]. Additionally, pushing part of the signal processing operations to the edge of networks is a powerful technique to reduce transmission latency and communication overhead [3–5]. Nevertheless, the computing ability of vehicles or UAVs is generally limited due to device size and power supply constraints [6]. In this case, jointly using the centralized server at the roadside base station (BS) and the edge servers at the vehicles can efficiently leverage computing and communication resources. In order to make full use of the centralized and edge resources, it is necessary to exchange information between the centralized server and the edge servers. Especially in massive or extremely large-scale multiple-input multiple-output communication systems, BSs are usually equipped with a large number of antennas [7–9], which leads to heavy communication overhead [10]. Therefore, accurate channel state information (CSI) acquisition becomes important. It is well known that there exist two types of CSI acquisition modes in current Internet of Vehicles (IoV) communication systems, respectively called pilot signaling–based channel estimation and beam training approaches [11, 12].

For channel estimation with pilot signaling, substantial estimation overhead arises in massive and extremely large-scale antenna arrays. To address this issue, pilot-based channel estimation algorithms have been developed over the past few years that utilize the channel sparsity in the angular or polar domain, where compressed sensing theory is exploited to reduce the pilot overhead in massive or extremely large-scale antenna systems [13, 14]. Unfortunately, channel estimation with the compressed sensing method has to design the pilot matrix to satisfy the restricted isometry property [15, 16], which forces randomized pilots and degrades the channel estimation performance. Alternatively, implicit CSI acquisition with beam training approaches has been proposed for massive frequency division multiplexing communication systems [17, 18].

Recently, machine learning (ML), such as reinforcement learning and graph neural networks, has been introduced to uncover the nonlinear relationships in data/signals with lower computational complexity, achieve better performance for parameter inference, and tolerate imperfections in the data; it has also been applied to channel estimation in wireless communication systems [19, 20]. For instance, a conditional generative adversarial network framework was developed for uplink-to-downlink mapping of both the CSI and the channel covariance matrix, which can efficiently reduce the number of required datasets. On the other hand, in scenarios with data privacy and security sensitivities, e.g., IoV, railway, and banking, conventional centralized learning algorithms are no longer applicable. Federated learning (FL), acting as a promising distributed ML methodology, was proposed to solve the information island phenomenon, as it can alleviate privacy risks and the communication overhead required for uploading large amounts of user data [21–23]. Therefore, the authors in [24] proposed an FL-based channel estimation algorithm to achieve globally optimal estimation performance through exchanges between the centralized server and distributed servers without collecting the original data. Different from conventional CSI feedback or limited feedback strategies, an interacting federated and transfer learning framework for downlink CSI prediction was presented to solve the isolated data silos and the online adaptation problem of CSI acquisition [25].

However, due to the unreliable wireless channels and limited wireless resources in IoV networks, it is essential to develop an effective communication and learning framework to improve the ultimate performance of wireless FL. In order to accelerate training and cope with the high communication overhead of ML algorithms, edge learning frameworks such as federated edge learning enable devices to compute local stochastic gradients and transmit them to an edge server for aggregation and updating of a global learning model [26]. Since typical stochastic gradients in learning networks are of high dimensionality, transmitting the gradient parameters over communication networks can result in extensive overhead and a bottleneck for fast edge learning. To tackle this challenge, numerous schemes have been developed to reduce the communication overhead, among which gradient compression is a common strategy. Specifically, the authors in [27, 28] proposed scalar quantization schemes, namely, quantized SGD (QSGD) and signSGD, which divide each dimension into several levels to compress the stochastic gradients and thus reduce the number of exchanged parameters. Further, the authors in [29, 30] initially explored vector quantization schemes to reduce the number of communication symbols, where the edge devices transmit the codeword index to the edge server rather than conveying the quantized vector itself. Moreover, adaptive period control schemes for the FL algorithm were presented to adjust the communication period so as to speed up training while ensuring that the training loss is always minimized [31, 32].

Another direction is to use model compression techniques to reduce the learning latency of local parameter calculation. To allow different devices to participate in model training with different model sizes, an ensemble distillation method for model aggregation was proposed in [33]. Based on the ensemble distillation method, network pruning has been adopted in FL to reduce the local model size [34]. Furthermore, the authors in [35] proposed an adaptive pruning scheme that prunes the local model parameters by exploiting the similarity between the local model and the global model, which can accelerate convergence and reduce the updating overhead. However, existing wireless FL studies investigated homogeneous model settings in which the devices train identical local models, so the scale of the global model is restricted by the device with the lowest capability. To overcome this problem, adaptive model pruning at the edge server was proposed so that local models match the devices' heterogeneous computation capabilities [36]. In [37], the authors developed the concept of the best sparsification levels to perform the model quantization operation, which achieves the goal of minimizing the total energy while reducing the time consumption. Since gradients are redundant, joint model pruning and scalar quantization was developed to reap the benefits of deep neural networks (DNNs) while respecting the capabilities of resource-constrained devices [38]. Furthermore, it is noted that a device-friendly and communication-efficient FL algorithm with model pruning and quantization can reduce the storage, communication, and computation requirements, accelerating the training process in IoT networks [38].

In fact, since the numbers of users and antennas are huge in current IoV networks, the accuracy of typical scalar quantization cannot satisfy the CSI requirements of the IoV network, nor does it exploit the wireless channel characteristics. Thus, vector quantization is suitable for the IoV network with a massive or extremely large-scale antenna array. However, a scheme that jointly applies model pruning and vector quantization to reduce both the model size and the number of bits representing each connection has not been studied yet. This paper is the first work on CSI acquisition via FL with a joint model pruning and vector quantization approach, and the main contributions are summarized as follows:
  • First of all, we propose an FL-based CSI acquisition framework for the IoV network with a massive antenna array, where the developed offline radio map scheme with the FL framework can reconstruct the CSI of each user by collecting the channel feature information at the edge servers. The proposed FL framework provides a decentralized learning network, which is suitable for more complex environments and large-scale scenarios compared with traditional learning-based CSI acquisition techniques.

  • Then, we propose the joint model pruning and vector quantization approach for the gradient parameters at the edge servers in the FL framework. The amount of information exchanged between the centralized server and the edge servers is reduced significantly by pruning and quantizing the gradient parameters of the edge networks.

  • Moreover, the impact of network pruning and vector quantization on the learning performance is mathematically analyzed in closed form, where we analyze the convergence rate of learning and the quantization error bound, respectively.

  • At last, we leverage the Wireless InSite software to construct the IoV network environment and illustrate the effectiveness of our proposed scheme by collecting datasets to train the networks for CSI acquisition. The experimental results demonstrate that the proposed scheme can significantly reduce the model size and the communication cost while achieving high learning performance.

The rest of this paper is organized as follows. In Section 2, we introduce the system model with wireless FL. In Section 3, we propose the CSI acquisition scheme via the FL framework with the model pruning and vector quantization algorithm. In Section 4, the convergence rate and the quantization error bound are analyzed, respectively. Then, experimental results are presented to verify the proposed algorithm in Section 5. Finally, Section 6 concludes this paper.

1.1. Notation

In this paper, upper- and lower-case bold symbols denote matrices and vectors, respectively. We use $(\cdot)^{T}$, $(\cdot)^{\ast}$, $(\cdot)^{H}$, and $|\cdot|$ to denote the transpose, conjugate, Hermitian transpose, and absolute value, respectively. $\mathbb{C}^{M\times N}$ is the space of M × N complex-valued matrices, and calligraphic symbols denote sets. Symbols ⊗ and ⊙ stand for the Kronecker and Hadamard products, respectively.

2. System Description

As shown in Figure 1, we consider a hierarchical cloud–edge–terminal communication network. This network consists of one BS and K edge servers (vehicles or UAVs), where the kth edge server serves $M_k$ users and the total number of users is M, i.e., $\sum_{k=1}^{K} M_k = M$. In this system, we assume that the BS is equipped with a uniform planar array (UPA) with $N = N_v N_h$ antennas, where $N_v$ and $N_h$ denote the numbers of antennas along the vertical and horizontal directions, respectively. The edge servers and users are assumed to be equipped with a single antenna each. Generally, CSI acquisition is needed before each user communicates with the BS via the wireless links. As the number of antennas grows large in future mobile communication systems, CSI acquisition consumes more pilot resources, which reduces the spectrum efficiency of the communication system. In this paper, we propose an offline radio map scheme with an FL framework to reconstruct the channels of each user by collecting the channel feature information at the edge servers.

Figure 1: Federated learning–based communication systems.

2.1. Cloud–Edge Communication Systems

As introduced earlier, each edge server collects a sample set provided by its $M_k$ users, which is expressed as
()
where $\mathbf{h}_{k,m}$ denotes the channel matrix corresponding to the channel feature between user m in edge server k and the BS. Generally, the wireless channel adopts the typical clustered model, such that each feature information vector is a sum of the contributions of Q paths (scattering clusters). Thus, the path gain, the angle of arrival (AoA), the AoD in the vertical direction, and the angle of departure (AoD) in the horizontal direction between user m and the BS can be, respectively, written as
()
()
()
()
where the collected data are independent and identically distributed (IID). We take the sample set as the input of edge server k and the output of edge server k is the estimated channel matrix, which is given by
()
where $\hat{\mathbf{h}}_{k,m}$ denotes the estimated channel between user m in edge server k and the BS. Different from typical communication processing, in which the edge server feeds the estimated channel back to the BS, here each edge server conveys the gradient information of its individual neural network to the BS. Then, the centralized server at the BS uses the neural network gradients of the edge servers to train a global model weight for channel estimation and broadcasts the updated global model weight to all edge servers at each FL iteration. By using the FL framework and the channel feature information, the CSI of all users can be acquired without pilots.
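To make the clustered channel model above concrete, the following Python sketch generates one user's UPA channel from Q path parameters. The half-wavelength antenna spacing, the uniform angle draws, and all variable names are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

# Illustrative sketch of the clustered UPA channel described above: the channel
# is a sum of Q path contributions, and the UPA response is the Kronecker
# product of vertical and horizontal ULA responses (cf. the "⊗" in the Notation).
Q, Nv, Nh = 5, 8, 32                     # paths, vertical/horizontal antennas
N = Nv * Nh                              # N = 256 antennas in total

def steering(n, omega):
    """ULA steering vector for spatial frequency omega (half-wavelength spacing assumed)."""
    return np.exp(1j * np.pi * omega * np.arange(n)) / np.sqrt(n)

rng = np.random.default_rng(0)
gains = (rng.standard_normal(Q) + 1j * rng.standard_normal(Q)) / np.sqrt(2 * Q)
omega_v = rng.uniform(-1, 1, Q)          # vertical AoD spatial frequencies
omega_h = rng.uniform(-1, 1, Q)          # horizontal AoD spatial frequencies

h = sum(g * np.kron(steering(Nv, wv), steering(Nh, wh))
        for g, wv, wh in zip(gains, omega_v, omega_h))

# Per-path features (the network input) and the channel vector (the label).
features = np.stack([gains.real, gains.imag, omega_v, omega_h], axis=1)
print(features.shape, h.shape)           # (5, 4) (256,)
```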

2.2. Modified FL Model

In this paper, the communication system adopts the FL model with a single iteration of model update, as detailed in [30, 39]. The centralized server deployed at the BS trains the global model weight based on the gradient information provided by all edge servers to minimize the global loss function F(w), which is written as
()
where $f(B_{k,m}, \mathbf{h}_{k,m}, \mathbf{w})$ represents the loss function corresponding to sample data $B_{k,m}$ and w is the global model weight. This loss function is generally characterized by the mean square error (MSE) between the network output and the label, where the real channel matrix $\mathbf{h}_{k,i}$, ∀i, k, serves as the label of the corresponding sample. The goal of FL is to recover the optimal global model weight vector w from the set of labeled training samples at the edge servers, which satisfies
$$\mathbf{w}^{\ast} = \arg\min_{\mathbf{w}} F(\mathbf{w}).$$
Generally, the FL uses an iterative approach between the edge servers and the centralized server to recover the optimal w. Due to the sparsity of the model weights in the convolutional neural network (CNN), we are able to prune the model weights with insignificant connections in the convolutional layers at each iteration. The results of model pruning are thus used to reduce the dimensionality of the gradient vector, so that a shorter codeword can be used to quantize it. Specifically, define the binary vector $\mathbf{m}_k^n \in \{0,1\}^D$ as the mask for edge server k at the nth iteration, which indicates whether each model weight is activated, where D denotes the model size. Then, the sparse model weight vector of the kth edge server at the nth iteration is given by
$$\mathbf{w}_k^n = \mathbf{m}_k^n \odot \mathbf{w}^n,$$
where $\mathbf{w}^n$ is the global model weight received by edge server k at the nth iteration.
Similar to the conventional channel feedback mechanism in the frequency division duplexing (FDD) system, in which users feed quantization codewords back to the BS, we also adopt a codebook to characterize the gradient parameters in this paper. Specifically, we define $\hat{\mathbf{g}}_k$ as the kth quantized edge-server gradient, which consists of codewords selected from the codebook set $\mathcal{C}$. Then, each edge server feeds the corresponding codeword index back to the centralized server. Finally, the global model weight $\mathbf{w}^{n+1}$ is updated based on the uploaded codewords from the edge servers, which is expressed as
$$\mathbf{w}^{n+1} = \mathbf{w}^{n} - \eta \frac{1}{K}\sum_{k=1}^{K} \hat{\mathbf{g}}_k^{n},$$
where η denotes the learning rate.

Different from the scalar quantization in the existing works [29], the codebook-based vector quantization can efficiently reduce the feedback overhead. Moreover, owing to the sparsity of the model weights, we can attain a low-complexity quantization scheme by pruning the model weights to alleviate the local computation overhead. The detailed model pruning and quantization scheme is presented in the following section.

3. Model Pruning and Gradient Quantization Scheme

In practical wireless communication systems, the scale of the model parameters is huge, and the aggregation of gradients would cause large communication signaling overhead and increase model training latency due to the limited spatial–time–frequency resources. Conventional quantization schemes may cause severe distortion under a limited bit budget, which degrades the FL performance.

In this paper, the combination of model pruning and vector quantization enables more efficient transmission of information under a limited bit budget within the FL framework. This motivates compressing the model before transmission. First, network pruning can be applied before local training to reduce the model size. Second, the gradient vector quantization strategy is designed before uploading the local model to further reduce the number of parameters that need to be transmitted. Therefore, our goal is to design an encoding–decoding scheme that reduces the communication signaling overhead and mitigates the effect of quantization errors on the centralized server's ability to accurately recover the updated model.

3.1. Model Pruning

To improve the communication and computation efficiency of wireless FL, this paper develops a novel learning framework that adaptively generates submodels for the edge servers to train. Moreover, to suppress the adverse effects of local model pruning on the learning performance, we use the gradients of the pruned model as the interaction information.

Pruning unimportant connections among neurons can effectively reduce the model size while guaranteeing the performance of FL. According to [40], the importance of a weight can be approximately quantified by the product of its value and its gradient, which can be expressed as
$$I_{k,d} = \left| w_{k,d}\, g_{k,d} \right|,$$
where $w_{k,d}$ denotes the dth element of vector w and $g_{k,d}$ represents the gradient value for $w_{k,d}$.
We set the unimportant weights to zero such that the weight vector w degrades into the sparse weight vector obtained in (6). Then, the pruning ratio of the weight vector can be defined as
$$\rho = 1 - \frac{\|\mathbf{m}_k\|_0}{D},$$
i.e., the fraction of weights set to zero. In order to reduce the number of uploaded bits and shorten the codeword length, we remove the gradient parameters at the positions where the binary mask is zero before gradient quantization. In particular, the centralized server holds all edge-server gradient parameters of the previous iteration, so it can compute the same mask and recover the gradient under the same pruning rate. The reduced-dimension gradient after pruning is expressed by
$$\tilde{\mathbf{g}}_k = P\!\left(\mathbf{g}(\mathbf{w}_k)\right),$$
where P(·) is the dimension-reduction function and $\mathbf{g}(\mathbf{w}_k)$ is the original edge-server gradient. The vector $\tilde{\mathbf{g}}_k$ has size D(1 − ρ) and is obtained by removing the entries of the original gradient at the zero positions of the mask $\mathbf{m}_k$. The procedure of model pruning and dimension reduction is shown in Figure 2.
Figure 2: Model pruning and dimension reduction.
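As a minimal sketch of this pruning-and-reduction step, the following snippet computes the importance scores, builds a binary mask that keeps the top D(1 − ρ) weights, and applies the dimension reduction P(·); the keep-the-largest-scores threshold rule is our assumption of how the mask is formed.

```python
import numpy as np

def prune_and_reduce(w, g, rho):
    """Importance-based pruning I_d = |w_d * g_d| (per [40]) plus the
    dimension-reduction P(.) that drops masked-out gradient entries."""
    D = w.size
    importance = np.abs(w * g)
    keep = max(1, int(round(D * (1.0 - rho))))   # D(1 - rho) surviving weights
    mask = np.zeros(D, dtype=bool)
    mask[np.argsort(importance)[-keep:]] = True  # keep the most important ones
    w_sparse = np.where(mask, w, 0.0)            # sparse local model weights
    g_reduced = g[mask]                          # P(g): length D(1 - rho)
    return w_sparse, g_reduced, mask

w = np.random.randn(1000)
g = np.random.randn(1000)
w_s, g_r, m = prune_and_reduce(w, g, rho=0.3)
print(g_r.size)   # 700
```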

3.2. Gradient Quantization

Unlike the conventional scalar quantization method, vector quantization offers better rate-distortion efficiency and greater model flexibility. Therefore, in this paper, we propose vector quantization of the gradients evaluated over the pruned model to reduce the uplink overhead.

For the high-dimensional vector $\tilde{\mathbf{g}}_k$, we can partition it into multiple low-dimensional vectors so as to reduce the quantization error. Specifically, we first partition the high-dimensional gradient vector into M segments of length D, where each segment is quantized individually. Vector $\mathbf{v}_i$ can be deemed the ith block gradient, and zero padding is applied if the division is not exact. Further, denote the normalized block gradient by $\mathbf{s}_i = \mathbf{v}_i/\|\mathbf{v}_i\|$ and the norm of the block gradient by $h_i = \|\mathbf{v}_i\|$, respectively. Intuitively, $\mathbf{s}_i$ and $h_i$ represent the direction and magnitude of the block gradient, respectively.
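The block split can be sketched as follows, assuming zero padding to a multiple of the block length D and a small floor on the norms to avoid division by zero:

```python
import numpy as np

def split_blocks(g_reduced, D):
    """Pad to a multiple of D, reshape into M blocks, and separate each
    block into direction s_i = v_i/||v_i|| and magnitude h_i = ||v_i||."""
    pad = (-g_reduced.size) % D                  # zero padding if not exact division
    v = np.concatenate([g_reduced, np.zeros(pad)]).reshape(-1, D)  # M x D blocks
    h = np.linalg.norm(v, axis=1)                # block magnitudes
    s = v / np.maximum(h[:, None], 1e-12)        # unit-norm block directions
    return s, h

s, h = split_blocks(np.random.randn(700), D=8)
print(s.shape, h.shape)                          # (88, 8) (88,)
```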

Then, we assume that the normalized block gradients s = [s1, …, sM] are uniformly distributed on the Grassmann manifold, which allows the design of a Grassmannian quantizer with uniformly distributed codewords that minimizes the distortion [30]. In particular, we focus on quantization functions that minimize the Euclidean distance between the block gradient and its quantized version. Let $\mathbf{x}$ and $\hat{\mathbf{x}}$ be two unit-norm vectors, and let $d(\mathbf{x}, \hat{\mathbf{x}}) = \sqrt{1 - |\mathbf{x}^{H}\hat{\mathbf{x}}|^{2}}$ denote the chordal distance that measures the angular deviation between $\mathbf{x}$ and $\hat{\mathbf{x}}$. The optimal construction of the codebook is regarded as Grassmannian line packing [41], which is formulated as
()
where the two arguments of the distance denote the quantized output and the label of the normalized block gradient vector, respectively. After that, we can obtain the codebook matrix
$$\mathbf{C} = [\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_W],$$
where $\mathbf{c}_w$ denotes the wth codeword; each codeword is a D-dimensional unit-norm vector, and W stands for the total number of codewords.
In this paper, we adopt the unbiased estimation method to select codewords, which chooses c to approximate s in a probabilistic manner, as shown in Figure 3. For any normalized block gradient, the codeword can be selected by the unbiased estimation method. Therefore, the probability $p_i$ of selecting codeword $\mathbf{c}_i$ is defined as
()
where the selection probabilities are normalized to sum to one and $p_i$ represents the degree of similarity between the block gradient and codeword $\mathbf{c}_i$.
Figure 3: The procedure of normalized block gradient.
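A simplified stand-in for this selection rule is sketched below: codewords are drawn with probabilities proportional to their nonnegative correlation with the block direction. The paper's exact probabilities are chosen to guarantee unbiasedness, so this snippet only illustrates the probabilistic-selection mechanism; the random unit-norm codebook likewise merely stands in for a true line-packing design.

```python
import numpy as np

def select_codeword(s, C, rng):
    """Probabilistic codeword selection: p_i proportional to the (clipped)
    correlation between the block direction s and codeword c_i."""
    sims = np.maximum(C @ s, 0.0)               # similarity to each codeword
    if sims.sum() == 0.0:                       # fall back to nearest codeword
        return int(np.argmax(C @ s))
    p = sims / sims.sum()                       # selection probabilities p_i
    return int(rng.choice(len(C), p=p))

rng = np.random.default_rng(0)
W, D = 16, 8
C = rng.standard_normal((W, D))
C /= np.linalg.norm(C, axis=1, keepdims=True)   # random stand-in codebook
s = rng.standard_normal(D)
s /= np.linalg.norm(s)
idx = select_codeword(s, C, rng)                # index fed back to the server
print(idx)
```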
Then, we collect the norms h = [h1, …, hM] and adopt a uniform quantization approach. The minimum and maximum values in h are denoted by $h_{\min}$ and $h_{\max}$, respectively. Each $h_m$ is quantized to one of the T + 1 uniformly spaced levels between $h_{\min}$ and $h_{\max}$, such that the quantization function can be written as
$$Q(h) = \begin{cases} h_{\min} + t\delta, & \text{with probability } p(h), \\ h_{\min} + (t+1)\delta, & \text{with probability } 1 - p(h), \end{cases}$$
where $p(h) = ((t+1)\delta + h_{\min} - h)/\delta$, $\delta = (h_{\max} - h_{\min})/T$, and $h \in [h_{\min} + t\delta,\, h_{\min} + (t+1)\delta]$ with t = 0, …, T − 1. Note that the proposed quantization method can be deemed a linear quantization, whose advantage is a simple structure with low computational overhead; the choice of p(h) also makes the quantizer unbiased, i.e., E[Q(h)] = h. Therefore, vector quantization takes $M\log_2(T+1) + M\log_2 W$ bits to communicate the quantized block gradients, in which $M\log_2 W$ bits are used to transmit the indices of the selected codewords and $M\log_2(T+1)$ bits are used for the quantized norms. The proposed quantization scheme is depicted in Figure 4. Overall, the algorithm with model pruning and quantization is summarized in Algorithm 1.
Figure 4: The proposed quantization scheme.
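The norm quantizer can be sketched directly from the formulas above; rounding down with probability p(h) makes the quantizer unbiased. Handling of the degenerate case h_max = h_min is omitted for brevity.

```python
import numpy as np

def quantize_norms(h, T, rng):
    """Stochastic uniform quantizer: T + 1 levels h_min + t*delta; round down
    with probability p(h) = ((t+1)*delta + h_min - h)/delta, so E[Q(h)] = h."""
    h_min, h_max = h.min(), h.max()
    delta = (h_max - h_min) / T
    t = np.clip(np.floor((h - h_min) / delta), 0, T - 1)
    p_down = ((t + 1) * delta + h_min - h) / delta
    t_hat = t + (rng.random(h.shape) >= p_down)  # round up w.p. 1 - p(h)
    return h_min + t_hat * delta

rng = np.random.default_rng(0)
h = np.abs(rng.standard_normal(88))
h_q = quantize_norms(h, T=15, rng=rng)

# Uplink cost per device: M*log2(T+1) bits for the norms plus M*log2(W)
# bits for the codeword indices (here W = 16 and T + 1 = 16).
M, W, T = h.size, 16, 15
print(M * np.log2(T + 1) + M * np.log2(W))       # 704.0 bits
```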
    Algorithm 1: FL with model pruning and vector quantization.
    1. Initialize: Collect information of users at the edge nodes and initialize the global model.
    2. The centralized server broadcasts the global model to all edge nodes.
    3. For global iteration n = 1, …, N:
    4.  For each edge node k = 1, …, K:
        a. Compute the weight importance Ik,d according to (8);
        b. Compute the gradient at the edge node based on its local dataset;
        c. Obtain the reduced-dimension gradient;
        d. Feed the quantized gradient back to the center node;
    5.  End for
    6. The center node pads zero elements into the recovered gradients;
    7. Obtain the global model via gradient aggregation at the center node;
    8. The center node broadcasts the updated model to all edge nodes;
    9. End for
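Putting the pieces together, the sketch below runs one global round of Algorithm 1, reusing the helpers sketched earlier (prune_and_reduce, split_blocks, select_codeword, quantize_norms). Random vectors stand in for the CNN's backpropagated gradients, and the 1/K aggregation weights follow the assumption in Appendix B.

```python
import numpy as np

# One global round of Algorithm 1 (requires the helper functions above).
rng = np.random.default_rng(1)
K, D_model, rho, D_blk, T, W = 10, 1000, 0.3, 8, 15, 16
C = rng.standard_normal((W, D_blk))
C /= np.linalg.norm(C, axis=1, keepdims=True)       # stand-in codebook

w_global = rng.standard_normal(D_model)
eta = 0.01                                          # learning rate (illustrative)
agg = np.zeros(D_model)
for k in range(K):
    g = rng.standard_normal(D_model)                # local gradient (placeholder)
    _, g_red, mask = prune_and_reduce(w_global, g, rho)   # steps 4a-4c
    s, h = split_blocks(g_red, D_blk)
    idx = [select_codeword(si, C, rng) for si in s]       # feedback: indices ...
    h_q = quantize_norms(h, T, rng)                       # ... and quantized norms
    g_hat = (h_q[:, None] * C[idx]).ravel()[: g_red.size] # dequantize, drop padding
    full = np.zeros(D_model)
    full[mask] = g_hat                              # step 6: zero padding
    agg += full / K                                 # equal aggregation weights 1/K
w_global -= eta * agg                               # step 7: global update
```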

4. Performance Analysis

4.1. Quantization Error Bound

Since a finite number of bits is used to represent the gradients, distortion is inherently induced; we define the recovered vector as $\hat{\mathbf{g}}$. Then, the moments of the quantization error satisfy the following lemmas.

Lemma 1. Let $\mathcal{C}$ be a codebook designed by line packing with resolution W and dimensionality D. The average distortion [30] of the normalized block gradient can be bounded as

()

Lemma 2. The distortion for quantizing the norm of block gradient vector h can be upper-bounded as

()

Theorem 1. The quantization error vector satisfies

()

Proof 1. Please refer to the proof in Appendix A.

It is observed from Theorem 1 that the quantization error decreases as the model pruning rate ρ increases. The reason is that pruning reduces the number of gradient blocks that need to be quantized. Moreover, the quantization error also decreases as the codebook resolution W and the number of quantization intervals T grow. Further, as the block length D decreases, the distortion decreases accordingly, since the pairwise distances between codewords enlarge, although the number of gradient blocks increases.

4.2. FL Convergence Analysis

Given a model pruning and quantization scheme for the stochastic gradients, we focus on the convergence of the learning algorithm, which is affected by the proposed model pruning and gradient quantization approach. Therefore, the convergence rate of the learning algorithm under the proposed scheme is theoretically investigated in this section.

Assumption 1 (L-smooth). The nonconvex loss function of the neural network F(·) is L-Lipschitz smooth, which can be written as

$$\left\| \nabla F(\mathbf{x}) - \nabla F(\mathbf{y}) \right\| \le L \left\| \mathbf{x} - \mathbf{y} \right\|, \quad \forall\, \mathbf{x}, \mathbf{y},$$
where L denotes the Lipschitz constant and ∇F(·) represents the gradient of function F(·).

Assumption 2 (bounded stochastic gradients and model). The second moments of the stochastic gradients and weights are bounded, which can be guaranteed by l2-regularization and are also assumed in other works, such as [29, 30]. It can be expressed as

()

Note that l2 regularization effectively constrains the range of the weight values by adding a penalty term to the loss function that is proportional to the sum of the squares of the weights, ensuring the stability of both the stochastic gradients and the weight bounds. This helps the model learn smoother and more stable solutions during training and yields better generalization performance on test data.
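In practice, this l2 penalty is typically realized through the optimizer's weight-decay term; a one-line PyTorch example (the value 1e-4 is illustrative):

```python
import torch

# l2 regularization via the optimizer's weight_decay term, which helps keep
# the second moments of weights and gradients bounded as assumed above.
model = torch.nn.Linear(16, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)
```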

Assumption 3 (unbiased gradient). The locally estimated stochastic gradient is unbiased, which can be written as

$$\mathbb{E}\left[\mathbf{g}(\mathbf{w}_k)\right] = \nabla F(\mathbf{w}_k).$$

Lemma 3. If Assumption 2 is valid, the model error of the kth device under the pruning ratio ρk satisfies

()
and
()
where the error term denotes the gradient error caused by pruning and dimensionality reduction. The inequality has been proved in [9].

Theorem 2 (learning convergence rate). According to the model pruning and quantization method designed in this paper, and provided that Assumptions 1–3 are satisfied, the learning convergence rate is expressed as

()
where F(w0) is the initial objective value, F(w∗) denotes the optimal value, and N stands for the number of iterations of the FL algorithm. The second and third terms on the right-hand side of (23) represent the effect of model pruning. Note that the convergence rate decreases as the average pruning ratio grows, since the model aggregation error increases with the pruning ratio. The last term on the right-hand side of (23) represents the effect of quantization, and it can be seen that the pruning ratio indirectly affects the convergence rate by influencing the quantization error.

Proof 2. Please refer to the proof in Appendix B.

5. Simulation Results

Simulation results are given in this section to demonstrate the effectiveness of the proposed model pruning and gradient quantization scheme in terms of the MSE and the normalized mean square error (NMSE), which are, respectively, defined as
()
where $T_k$ is the number of test samples of edge server k.
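A minimal sketch of the two metrics, assuming the standard forms MSE = E‖h − ĥ‖² and NMSE = E[‖h − ĥ‖² / ‖h‖²] averaged over each server's test samples:

```python
import numpy as np

# Assumed standard definitions; H and H_hat hold one test channel per row.
def mse(H, H_hat):
    return np.mean(np.abs(H - H_hat) ** 2)

def nmse(H, H_hat):
    err = np.sum(np.abs(H - H_hat) ** 2, axis=1)   # per-sample error energy
    return np.mean(err / np.sum(np.abs(H) ** 2, axis=1))
```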

We conduct experiments in a simulated environment, where the number of edge servers is set to K = 10 and one BS participates in the model training. The number of antennas at the BS is N = 256, where the numbers of antennas along the vertical and horizontal directions are set to Nv = 8 and Nh = 32, respectively. Each edge server collects 200 users’ channel characteristics as training samples and 100 users’ channel characteristics as testing samples. The users are evenly distributed in the coverage area of the BS. Each sample stores the information of the NL = 5 direct and reflected paths with the highest received signal power. In this paper, the Wireless InSite software [18] is used to collect the datasets corresponding to the channel acquisition models, where the ray tracing method is used to collect the effective path information of the users, as shown in Figure 5.

Figure 5: The ray tracing simulation environment.

Furthermore, the proposed network architecture is a CNN with 9 layers. The first layer is the input layer, and layers {2, 4, 6} are convolutional layers with NSF = 16, 32, and 64 filters, respectively. Each filter employs a 3 × 3 kernel for 2-D spatial feature extraction. Layers {3, 5, 7} are activation layers. The eighth layer is a fully connected layer, and the last layer is an output regression layer. The number of training iterations is 50, and the mini-batch size is 64. The Adam optimizer is used to calculate the gradient of the loss function and update the parameters, and the learning rate is set to 0.01.
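A sketch of this architecture in PyTorch follows. The 16 × 16 two-channel (real/imaginary) input layout, the ReLU activations, and the output dimension are assumptions; the 3 × 3 kernels, the 16/32/64 filters, the Adam optimizer, the 0.01 learning rate, and the mini-batch size of 64 come from the text.

```python
import torch
import torch.nn as nn

class ChannelNet(nn.Module):
    """9-layer CNN as described above; input size and activations assumed."""
    def __init__(self, in_ch=2, feat=16, out_dim=2 * 256):
        super().__init__()
        self.body = nn.Sequential(                 # layer 1: input (identity)
            nn.Conv2d(in_ch, 16, 3, padding=1),    # layer 2: conv, 16 filters, 3x3
            nn.ReLU(),                             # layer 3: activation (assumed ReLU)
            nn.Conv2d(16, 32, 3, padding=1),       # layer 4: conv, 32 filters, 3x3
            nn.ReLU(),                             # layer 5: activation
            nn.Conv2d(32, 64, 3, padding=1),       # layer 6: conv, 64 filters, 3x3
            nn.ReLU(),                             # layer 7: activation
            nn.Flatten(),
            nn.Linear(64 * feat * feat, out_dim),  # layers 8-9: FC + regression out
        )

    def forward(self, x):
        return self.body(x)

net = ChannelNet()
opt = torch.optim.Adam(net.parameters(), lr=0.01)  # Adam, lr = 0.01 (as in text)
y = net(torch.randn(64, 2, 16, 16))                # mini-batch size 64
print(y.shape)                                     # torch.Size([64, 512])
```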

The effectiveness of the proposed scheme is evaluated against four baseline schemes, which are described as follows:
  • 1.

    Quantized stochastic gradient descent–based model pruning (QSGDMP) scheme: This scheme uploads QSGD-quantized parameters combined with the model pruning strategy.

  • 2.

    TOPK: Each user uploads only the K weight values with the largest magnitudes.

  • 3.

    Vector quantized stochastic gradient descent (VQSGD) scheme: Each user uploads the parameters using only vector quantization, without model pruning.

  • 4.

    SGD: In this scheme, each user uploads the full stochastic gradient information.

Figure 6 shows the curves of the MSE vs. the number of iterations for the different schemes. Except for the SGD scheme, we set the other four schemes to transmit the same number of bits for comparison. It is observed that the SGD scheme outperforms the other schemes because it can transmit a sufficient number of bits, at the cost of substantial communication overhead. The performance of the proposed scheme is close to that of SGD and better than that of the other schemes; the reason is that the centralized server shares and aggregates all users’ information. Moreover, the VQSGD algorithm performs significantly worse than the proposed algorithm, which indicates that the improvement in learning performance comes from the introduction of model pruning.

Figure 6: Performance comparison under the same number of transmitted bits.

Figure 7 reports the curves of the MSE vs. the number of iterations, where the number of bits allocated to each block is fixed and the effects of the block length D and the pruning rate ρ are evaluated. It can be observed that, as the block length D increases, the FL performance degrades; the reason is that the quantization error of the stochastic gradients enlarges. Furthermore, when the quantization error is relatively small, the model pruning error plays the dominant role; in particular, increasing the model pruning rate leads to NMSE degradation. On the other hand, when the quantization error is large, it plays the dominant role, and increasing the model pruning rate reduces the quantization error, so the learning performance improves. Therefore, the model pruning rate needs to be set properly.

Figure 7: Effect of the block length D and pruning rate ρ.

Furthermore, Figure 8 depicts the NMSE vs. the number of iterations for the different schemes. Compared with the MSE results in Figure 6, the NMSE gap between the proposed scheme and the other schemes is more obvious, since the NMSE is related to the number of test samples of each server k, whereas the MSE is related to the total number of test samples. Besides, the NMSE curve of the proposed scheme is smoother, since the NMSE metric uses the number of test samples of each server, which matches the practical scenario.

Figure 8: NMSE vs. the number of iterations.

6. Conclusion

In this paper, we proposed an FL framework for CSI acquisition in IoV networks with massive antenna arrays. Considering both user privacy and wireless communication capacity limitations, we applied network pruning and vector quantization schemes to the wireless FL-based IoV system so as to reduce the model size. To evaluate the learning performance of FL with network pruning and vector quantization, the convergence rate and the quantization error bound have been mathematically analyzed, respectively. Finally, based on the Wireless InSite software, the effectiveness of the proposed algorithm has been demonstrated by the experimental results.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding

This work was supported in part by the Natural Science Foundation of Henan (No. 252300421516), in part by the Scientific Research Team Plan of Zhengzhou University of Aeronautics (23ZHTD01005), in part by Key Projects for Joint Fund of Henan Province Science and Technology Research and Development Plan (225200810033), in part by Henan Center for Outstanding Overseas Scientists (GZS2022011), in part by Henan Province Collaborative Innovation Center of Aeronautics and Astronautics Electronic Information Technology, in part by the National Natural Science Foundation of China under grant no. 62371180, in part by the Anhui Provincial Natural Science Foundation under grant no. 2008085QF281, and in part by the Fundamental Research Funds for the Central Universities of China under grant no. JZ2024HGTG0311.


    Appendix A: Proof of Theorem 1

    Note that the block gradient is written as v = hs. Then, we can obtain the block gradient distortion, which is expressed as
    ()
    When the quantization distortion is small, the first term is dominant over the cross term; thus, we can ignore the last term. Furthermore, the total gradient is concatenated from the block gradients, and the number of block gradients changes after model pruning. Therefore, the quantization error is the cumulative error of the block gradients. As a result, we have
    ()
    which has proved the theorem.

    Appendix B: Proof of Theorem 2

    We assume that each device is equally important to the FL network, i.e., $a_1 = a_2 = \cdots = a_K = 1/K$, and we can derive the quantized global gradient, which is expressed by
    ()
    where both (a) and (b) follow from the fact that high-dimensional random vectors are quasi-orthogonal:
    ()
    Therefore, we obtain the MSE of the learning model, which is given by
    ()
    where A is a constant. Under these assumptions, the relationship between the loss function at the nth and the (n + 1)th iterations can be computed as
    ()
    Then, by rearranging the above formula, the norm of the gradient vectors can be written as
    ()
    After N rounds of iteration, we sum each of these terms into (B.5) and then obtain
    ()

    Data Availability Statement

    Research data are not shared.
