Volume 2019, Issue 1 7575037

Research Article

Open Access

WiFi Offloading Algorithm Based on Q-Learning and MADM in Heterogeneous Networks

Lin Sun

orcid.org/0000-0002-0540-4480

Jiangsu Key Laboratory of Wireless Communications, Nanjing University of Posts and Telecommunications, Nanjing 210003, China njupt.edu.cn

Engineering Research Center of Health Service System Based on Ubiquitous Wireless Networks, Nanjing University of Posts and Telecommunications, Ministry of Education, Nanjing, China moe.edu.cn

Search for more papers by this author

Qi Zhu,

Corresponding Author

Qi Zhu

[email protected]

orcid.org/0000-0002-8190-8912

Jiangsu Key Laboratory of Wireless Communications, Nanjing University of Posts and Telecommunications, Nanjing 210003, China njupt.edu.cn

Engineering Research Center of Health Service System Based on Ubiquitous Wireless Networks, Nanjing University of Posts and Telecommunications, Ministry of Education, Nanjing, China moe.edu.cn

Search for more papers by this author

Lin Sun,

Lin Sun

orcid.org/0000-0002-0540-4480

Jiangsu Key Laboratory of Wireless Communications, Nanjing University of Posts and Telecommunications, Nanjing 210003, China njupt.edu.cn

Engineering Research Center of Health Service System Based on Ubiquitous Wireless Networks, Nanjing University of Posts and Telecommunications, Ministry of Education, Nanjing, China moe.edu.cn

Search for more papers by this author

Qi Zhu,

Corresponding Author

Qi Zhu

[email protected]

orcid.org/0000-0002-8190-8912

Jiangsu Key Laboratory of Wireless Communications, Nanjing University of Posts and Telecommunications, Nanjing 210003, China njupt.edu.cn

Engineering Research Center of Health Service System Based on Ubiquitous Wireless Networks, Nanjing University of Posts and Telecommunications, Ministry of Education, Nanjing, China moe.edu.cn

Search for more papers by this author

First published: 27 December 2019

https://doi.org/10.1155/2019/7575037

Citations: 2

Academic Editor: Alessandro Bazzi

Share a link

Email
Wechat
Bluesky

Abstract

This paper proposes a WiFi offloading algorithm based on Q-learning and MADM (multiattribute decision making) in heterogeneous networks for a mobile user scenario where cellular networks and WiFi networks coexist. The Markov model is used to describe the changes of the network environment. Four attributes including user throughput, terminal power consumption, user cost, and communication delay are considered to define the user satisfaction function reflecting QoS (Quality of Service), and Q-learning is used to optimize it. Through AHP (Analytic Hierarchy Process) and TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution) in MADM, the intrinsic connection between each attribute and the reward function is obtained. The user uses Q-learning to make offloading decisions based on current network conditions and their own offloading history, ultimately maximizing their satisfaction. The simulation results show that the user satisfaction of the proposed algorithm is better than the traditional WiFi offloading algorithm.

1. Introduction

With the popularity of smart devices, cellular data traffic is growing at an unprecedented rate. Cisco visual network index [1] predicts that global mobile data traffic will reach 49 exabytes per month in 2021, which is equivalent to six times that of 2016. In order to solve the problem of data traffic explosion, we can add cellular BS (base station) or upgrade the cellular network into networks such as LTE (long-term evolution), LTE-A (LTE-Advanced), and WiMAX release 2 (IEEE 802.16m), but this is usually not economical, which requires expensive CAPEX (capital expenditure) and OPEX (operating expense) [2]. In addition, the limited licensed band is another bottleneck to improve network capacity [3]. As a result, mobile data offloading technology [4] has gradually become a mainstream in 5G, and WiFi offloading is one of the most effective offloading solutions.

WiFi offloading technology transfers part of the cellular network load to WiFi network through the WiFi AP (access point), by which we can solve the congestion in licensed band, achieve load balancing, and fully utilize unlicensed spectrum resources. Due to the effectiveness of WiFi offloading, many literatures have studied it. Li et al. [5] considered the coexistence of WiFi and LTE-U on unlicensed bands and offloaded LTE-U services to WiFi networks, establishing multiple targets for maximizing LTE-U user throughput while optimizing WiFi user throughput. To solve the problem, the authors used the Pareto optimization algorithm to get the optimal value. In [6], a satisfaction function reflecting the user communication rate is defined in the scenario of overlapping WiFi network and cellular network, and a resource block allocation matrix is constructed. Based on the accurate potential game theory, the best response algorithm is used to optimize the total system satisfaction. Cai et al. [7] proposed an incentive mechanism to compensate cellular users who are willing to delay their traffic for WiFi offloading. The authors calculated the optimal compensation value according to the available attribute parameters in the scenario and modeled the problem into two stages. In the first stage of the Stackelberg game, the operator announces that it would provide users with uniform compensation to delay its cellular services. In the second phase, each user decides whether to join the delayed offloading based on the compensation, network congestion, and estimation of the waiting cost for WiFi connection. From the perspective of operators, Kang et al. [8] formulated mobile data offloading problem as a utility maximization problem. The authors established an integer programming problem and obtained a mobile data offloading scheme by considering the relaxed condition. The authors further proved that when the number of users is large, the proposed centralized data offloading scheme is near optimal. Jung et al. [9] proposed a user-centric, network-assisted WiFi offloading model. In this model, heterogeneous networks are responsible for collecting network information and users make offloading decisions based on this information to maximize their throughput. In the heterogeneous network scenario composed of LTE and WiFi, aiming at maximizing the minimum energy efficiency of users, a closed expression is proposed in [10] to calculate the number of users to be offloaded, and these users with the smallest SINR (signal to interference and noise ratio) are offloaded into WiFi network. According to the above references, the most challenging problem in WiFi offloading is how to make an offloading decision, that is, how to choose the most suitable WiFi AP for communication. Fakhfakh and Hamouda [11] aimed to minimize the residence time of the cellular network and optimized it by Q-learning. The reward function considers SINR, handover delay, and AP load. By offloading cellular services to the best WiFi AP nearby, operators can greatly increase their network capacity, and users’ QoS will also increase. However, the above references only make an immediate offloading decision based on the current network conditions, without considering the user’s previous access history. In addition, most of the references only perform an offloading decision for the optimization of one particular attribute, such as throughput or energy efficiency, without considering multiple network attributes for comprehensive decision making.

In this paper, for the mobile user scenario where the cellular base station and the WiFi AP coexist, considering the current network conditions and the access history, a Q-learning scheme is used to make the offloading decision. By considering its own access history, users will accumulate the experience of offloading, which will not only avoid offloading to the poor network that was previously accessed but also actively select the best WiFi AP according to the maximum discounted cumulative reward, which in turn increases user’s QoS. In this paper, four attributes including user throughput, terminal power consumption, user cost, and communication delay are considered and the reward function in Q-learning is defined by TOPSIS. In addition, if the service type is different, the importance of each network attribute will be different. We use AHP to define the weight of each network attribute according to the specific service type. The mobile terminal collects various attributes of the heterogeneous network, and the user continuously updates his discounted cumulative reward in combination with the instant reward and the experience reward until convergence. After the convergence, the user can make the best offloading decision in each state.

The rest of this paper is arranged as follows. Section 2 gives the system model of WiFi offloading in heterogeneous networks. Section 3 builds the Q-learning model, defines the reward function model based on AHP and TOPSIS, and gives the specific steps of the WiFi offloading algorithm. In Section 4, the simulation results are presented and analysed. Finally, Section 5 concludes the paper.

2. System Model

The system model in this paper is shown in Figure 1. A cellular base station is located in the center of the cell with a radius equal to r_cell. There are N_AP WiFi APs in the cell, which are represented as AP_k, k ∈ {1,2, …, N_AP}. The cell is covered by overlapping cellular network and WiFi network. These networks are divided into valid networks and invalid networks. When the throughput of the user accessing a certain network is greater than a threshold, we regard this network as a valid network; otherwise, it is considered as an invalid network. The mobile multimode terminal is the agent of Q-learning, and it can perform data transmission through both cellular network and WiFi network. The agent moves straightly inside the cell, marking its passing position as Posi_i, i ∈ {1,2, …, N_p}, where N_p represents the total number of positions the user has passed. Due to the movement of the agent, the network environment such as channel quality and available bandwidth is constantly changing, which will cause the network attribute of the user to change. This paper regards the four network attributes of the agent in different locations as the state in Q-learning, including throughput, power consumption, cost, and delay. In addition, we consider the offloading decision as the action choice in Q-learning and offload mobile data if agent chooses WiFi network.

Details are in the caption following the image — **Figure 1**
Open in figure viewer PowerPoint

System model of WiFi offloading. The system model consists of a cellular BS, a few WiFi APs, one moving agent, and some other users.

Figure 2 shows the algorithm structure based on Q-learning. The agent first collects the network environment information, filters out invalid networks, and calculates four attributes of user throughput (TP), terminal power consumption (PC), user cost (C), and communication delay (D) of the valid network. The AHP algorithm is used to calculate the weights of the four attributes under different services, and the instant rewards obtained by selecting each network under the current state are calculated by TOPSIS. In combination with the instant reward and the experience reward, the Q-learning iteration is performed and the Q-table is updated. As a result, the offloading decision is made based on the discounted cumulative reward in Q-table.

This paper reflects the performance of the network from four aspects of throughput, power consumption, cost, and delay. The throughput reflects the rate of wireless transmission. According to the large-scale fading model of the wireless channel in [12], combined with the small-scale fading model, when the distance between the agent and the cellular BS or WiFi AP is d, the pass loss is defined as

(1)

where d₀ is the reference distance, L₀ is the path loss when the distance between agent and BS or AP is d₀, α is the path loss exponent, and F_Rayleigh(θ, β) is the Rayleigh fading of the Gaussian distribution with a mean of θ and a variance of β. The signal power

received at BS or AP from agent d away at the i − th position is expressed as

(2)

where

is the transmit power of the terminal which is not fixed. By the Shannon capacity formula [13], we can get the throughput of the agent accessing a network at the i − th position:

(3)

where N₀ is the additive white Gaussian noise power spectral density and Wis the available bandwidth of the agent. Since the available bandwidth of the network is constantly changing and each AP or BS provides services to other users in addition to the agent at the same time, which affects the available network bandwidth of the agent, this paper uses the Markov model to describe the change of W and quantizes the continuous W into N_markov states. The available bandwidth is transferred to the two adjacent states with the probability of p_tr or remains unchanged with the probability of 1 − p_tr.

Power consumption is an important attribute to be considered for the operation of mobile terminals. According to [14], it is assumed that the minimum received power threshold of BS or AP is

. When the transmit power of the terminal is too small, BS and AP will not receive the uplink signal of the terminal. To ensure the normal transmission of data, we define the minimum transmit power

of the terminal as

(4)

The actual transmit power

of the terminal must be greater than

. In this paper, the power consumption of the agent accessing a network at the i − th location is expressed as

(5)

where P₀ is the fixed operating power consumption of the terminal and

is the transmitting power of the terminal.

The operator charges the agent whether he accesses the cellular BS or a WiFi AP. In this paper, the unit price costed per second after the agent accesses a network in i − this defined as , which is used to represent a relative price of two networks. It is usually cheaper if the user chooses to offload.

Communication delay is also an important indicator for users to evaluate the network. In this paper, the transmission delay after the agent accesses a network in i − th location is defined as . Because of CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance), the delay time is longer when the user accesses WiFi, which makes bigger than accessing the BS.

This paper considers the above four network attributes to calculate the satisfaction of the agent in the whole mobile scenario.

Firstly, we calculate the average of the four network attributes at N_p locations; that is, , , , and .

Then, we normalize the four values using the method in [15]:

(6)

where U_max is the maximum possible value of the attribute and U_min is the minimum possible value of the attribute. For user satisfaction, the greater the throughput is, the better satisfaction the agent gets, which is a positive attribute. On the other hand, the other three attributes are kept as small as possible, belonging to the negative attribute. The normalized values of the four network attributes are expressed as

, and

Combining the attribute weight data of different services obtained by using the AHP algorithm, the satisfaction of the user in the entire mobile scenario is defined as the sum of the weighted normalized attribute values:

(7)

where j is the user service type, j = 1 is the streaming media service, j = 2 is the conversation service, and

are the AHP weights of the throughput, power consumption, cost, and delay when the service type is j.

The optimization goal of this paper is to find out the best offloading decision of the user to maximize the satisfaction of the entire mobile scenario:

(8)

where

is the total action space of the user during the whole movement process in which A_i is the action set when the agent passes position i. It is the Cartesian product of the action set of the user passing N_p positions, and Π^∗ is the optimal offloading strategy of the whole moving process. In equation (8), c1 and c2 indicate that the weight of each network attribute is limited to 0 to 1 and the sum is 1; c3 indicates that the user’s transmit power is greater than the minimum transmit power at each position. However, because the action space is very large and the network environment such as available bandwidth is constantly changing, the traditional method is difficult to solve this optimization problem, so we use Q-learning to solve it.

3. WiFi Offloading Algorithm Based on Q-Learning and MADM

For the mobile user scenario where the cellular BS and the WiFi AP coexist, we propose a WiFi offloading algorithm based on Q-learning and MADM. Considering the current network conditions and the access history, the Q-learning algorithm is used to make the offloading decision, which will not only avoid offloading to the poor network that was previously accessed but also actively select the best WiFi AP according to the maximum discounted cumulative reward. MADM is an effective decision-making method when we need to consider a variety of factors. According to [16], attribute weight and network utility value are of great importance in MADM. We use two MADM algorithms in this paper, called AHP and TOPSIS. AHP is used to define the weight of each network attribute according to the specific service type. TOPSIS is used to obtain the instant reward of Q-learning based on the network utility. The agent collects various attributes of the heterogeneous network and continuously updates his discounted cumulative reward in combination with the instant reward and the experience reward. After the convergence, the user can make the best offloading decision in each state.

3.1. Q-Learning

Q-learning is one of the widely used reinforcement learning algorithms that treat learning as a process of trying, evaluation, and feedback. Q-learning consists of three elements, including state, action, and reward. The state set is denoted as S and the action set is denoted as A, and the purpose of Q-learning is to obtain the optimal action selection strategy Π^∗ to maximize the agent′s discounted cumulative reward [11]. In state s ∈ S, the agent selects an action a ∈ A from the action set to act on the environment. After the environment accepts the action, the environment changes and generates an instant reward Rw(s, a) feedback to the agent. Then, the agent will select the next action a^′ ∈ A based on the reward and his own experience, which will in turn affect the discounted cumulative reward Rc(s) and state s^′ of the next moment. It has been proved that for any given Markov decision process, Q-learning can be used to obtain an optimal action selection strategy Π^∗ for each state s, maximizing the discounted cumulative reward for each state [17].

The discounted cumulative reward Rc(s) for state s is

(9)

where Rw(s, a) is the instant reward obtained by the agent selecting action a in state s, δ ∈ (0,1) is the discount factor, and P(s^′|s, a) is the probability when agent performs action a and transmits from state s to s^′. According to Bellman′s theory [18], when the discounted cumulative reward is maximum, the optimal action selection decision under state s can be obtained:

(10)

The optimal action selection decision is

(11)

Since Rw(s, a) and P(s^′|s, a) are still unknown, the agent can learn these values during the Q-learning process of trial, evaluation, and feedback. We use Q function to represent the discounted cumulative reward when agent selects a in state s:

(12)

This paper uses Q-learning to solve the problem of WiFi offloading and proposes a WiFi offloading algorithm based on Q-learning and multiattribute decision making. In this paper, the multimode terminal moving inside the cell is regarded as the agent. The state, action, and reward of Q-learning are mapped in the following, respectively:

(1)
State set S: the location that agent passes and the network environment around the location, that is, S = {s_i = (Posi_i, Envi_i)|i ∈ {1,2, …, N_p}}, where Posi_i represents the location of the agent and Envi_i represents the network attributes of location i, including throughput, power consumption, cost, and delay
(2)
Action set A: the process of selecting an action is regarded as an offloading decision, that is, A = {a_k, k ∈ {0,1,2, …, N_AP}}, where a₀ indicates that the terminal accesses the cellular BS and a_k, k ∈ {1,2, …, N_AP} indicates that the terminal is offloaded to the WiFi AP corresponding to the subscript
(3)
Reward function Rw(s, a): the utility value of the TOPSIS algorithm is used to represent the instant reward that the user obtains after attempting to access a certain network

3.2. AHP Algorithm

This paper uses AHP to calculate the user’s subjective assessment of the importance of each network attribute under different service types. AHP is one of the MADM algorithms using qualitative and quantitative calculations, which is widely used in network evaluation and strategy selection. According to [15], AHP has five steps: (1) establishing a hierarchical model; (2) constructing a paired comparison matrix; (3) calculating attribute weights; (4) checking consistency; and (5) selecting network. However, this paper only needs to use AHP to calculate the weight of different network attributes, so steps (1) and (5) are omitted. The specific steps are as follows:

Step 1: construct the paired comparison matrix according to the user service type j and the attributes to be analysed. Since this paper considers four attributes of throughput, power consumption, cost, and delay, the paired comparison matrix B can be expressed as

(13)

where b_mn represents the ratio of the importance degree between m and n network attributes. We assume b_mn as an integer from 1 to 9 or a reciprocal of them to evaluate the relative importance between different attributes. Furthermore, we have b_mn = 1/b_nm, and the value on the diagonal is 1.
Step 2: calculate the weight of each network attribute in the service type scenario. According to [19], B is a positive reciprocal matrix which has multiple eigenvalues and eigenvector pairs (λ, V):

(14)

where λ is a certain feature value of B and V is a feature vector corresponding to λ. The feature vector corresponding to the largest eigenvalue λ^∗ is selected and normalized into , which is also the AHP weight of the four attributes.
Step 3: check the consistency of the paired comparison matrix. Normally, the most accurate AHP weight cannot be obtained at one time because the paired comparison matrix may be inconsistent if b_mn ≠ b_mk/b_kn, so the weight calculated in Step 2 is not accurate. It is necessary to check consistency of comparison matrix to ensure the subjective weight reasonable [15]. This paper uses the consistency ratio CR to measure the rationality of B:

(15)

where N is the number of network attributes and is the order of matrix B. RI is the index of average random consistency, and it is fixed if comparison matrix order is known [15], as is shown in Table 1.

According to the theory of AHP, if the consistency ratio CR > 0.1, then B is unacceptable, and it is necessary to return to Step 1 to adjust B until CR > 0.1. Finally, the accurate AHP weights of the four network attributes can be obtained (Table 1).

Table 1. Average random consistency with respect to matrix order.

Matrix order	RI
1	0.00
2	0.00
3	0.58
4	0.90
5	1.12
6	1.24
7	1.32
8	1.41
9	1.45

3.3. TOPSIS Algorithm

This paper uses TOPSIS to calculate the instant reward Rw obtained by the terminal accessing the cellular network or WiFi network. TOPSIS is also a MADM algorithm, the principle of which is to calculate and sort the proximity of candidate solutions to ideal solutions. In the Q-learning model, the action set contains all possible network choices; however, this is not a candidate network set because before the TOPSIS algorithm, this paper has filtered the invalid network whose actual throughput is less than the throughput threshold

. So, we use TOPSIS to calculate the reward corresponding to the candidate network. Assume that the filtered candidate network set is {Net₁, …, Net_l, …, Net_L}, which are the L valid actions extracted from the action set A, and the reward corresponding to the filtered invalid action is 0. The specific steps for calculating the Q-learning reward using the TOPSIS algorithm are as follows:

Step 1: establish a standardized decision matrix H. Constructing a candidate network attribute matrix X using the network attribute values calculated in Section 2:

(16)

where l represents the number of the candidate network and n represents the number of the network attribute. Normalize each column to obtain a standardized decision matrix , where h_ln is the normalization of x_ln:

(17)

Step 2: establish a weighted decision matrix Y. Each attribute is weighted by the AHP weight obtained in Section 3.2, which is represented by , and the attribute value of each column in H is multiplied by the corresponding AHP weight to obtain :

(18)

Step 3: calculate the proximity of each candidate solution and two extreme solutions. First, determine the ideal solution and the least ideal solution. Since throughput is a positive attribute and power consumption, cost, and delay are negative attributes, the ideal solution Solution⁺ is

(19)

On the contrary, the least ideal solution is:

(20)

Calculate the Euclidean distances between the l-th candidate network and Solution⁺ and Solution⁻ to get and :

(21)

Step 4: calculate the instant reward after the user selects a candidate network. In this paper, Rw_l is expressed by the relative proximity of the candidate network to the ideal solution:

(22)

The larger is, the smaller is and the closer Rw_l is to 1, indicating the candidate solution is closer to ideal solution and the reward is larger. Conversely, the smaller is, the larger is, indicating that the network accessed by the agent is poor and Rw_l is closer to 0.

In summary, the reward function of the paper is as follows:

(23)

3.4. Algorithm Steps

In order to maximize the satisfaction of mobile users in the cell, this paper considers the four attributes of throughput, power consumption, cost, and delay, uses AHP to calculate the weight of each attribute, defines the reward function by TOPSIS, and relies on Q-learning to iterate until convergence. The best offloading strategy in each state can finally be obtained. In Q-learning, the Q value will be updated with the user learning:

(24)

where μ ∈ (0,1) is the learning rate. The larger μ is, the less the Q value of the previous training is retained and the more important is the instant reward Rw_t(s, a) and the experience reward

. δ is the discount factor of the experience reward, and s^′ is the state that the agent transfers into.

In addition, this paper also introduces the ε-greedy algorithm. In each action selection of Q-learning, the agent explores with a small probability ε, that is, randomly selects a network to offload. Without ε-greedy algorithm, it is possible that the cumulative reward of a suboptimal action becomes bigger and bigger, which makes the user choose this action and increase the cumulative reward again, instead of finding a better one. In other words, the core of ε-greedy is to explore. The reason why the ε-greedy algorithm performs better is that it continuously explores the probability of finding the optimal action. Although it is possible to reduce the user satisfaction in the next period of time, hoping that in the future, we can make better action choices and ultimately get the most user satisfaction. Based on the above analysis, Algorithm 1 gives the WiFi offloading algorithm based on Q-learning and MADM.

Algorithm 1: WiFi offloading algorithm based on Q-learning and MADM in heterogeneous networks.

Input: state set S, action set A, paired comparison matrix B, candidate network attribute matrix X, and iteration limit Z
Output: trained Q-table, best action selection strategy Π^∗, and user satisfaction
(1)
Calculate attribute weights based on B
(2)
For s ∈ S, a ∈ A
(3)
Q(s, a) = 0
(4)
End For
(5)
Randomly choose s_ini ∈ S as the initialization state
(6)
While iteration < Z
(7)
For each state
(8)
If rand < ε
(9)
Randomly choose an action
(10)
Else
(11)
Select the action corresponding to the maximum Q value in this state.
(12)
End If
(13)
Perform a
(14)
Calculate Rw_t(s, a) according to equation (23)
(15)
Observe the next state s^′
(16)
Update the Q-table according to equation (24)
(17)
End For
(18)
End While
(19)
Record the action corresponding to the maximum Q value in each state into Π^∗
(20)
Calculate user satisfaction Φ_sat

4. Numerical and Simulation Results

As shown in Figure 1, the simulation scenario is established in a circular cell with a radius r_cell of 500 m. The cellular BS is located in the cell center, and N_AP WiFi AP is randomly distributed inside the cell. The additive white Gaussian noise power spectral density N₀ is −174 dBm/Hz, and reference distance d₀ is 1 m. In F_Rayleigh(θ, β), mean θ = 0 and variance β = 5 dB. Furthermore, the learning rate μ of the Q-learning is set to 0.8, the discount factor of the experience reward δ is set to 0.1, and ε in ε − greedy is set to 0.01. In AHP, when network attribute number N = 4, the consistency index RI = 0.9 [15]. The paired comparison matrices B of different services are shown in Table 2, and they are recognized results based on the general needs of each service, which are given by experts’ opinions. The remaining parameters are shown in Table 3.

Table 2. Comparison matrices corresponding to stream service and conversation service.

Network attribute	Stream				Conversation
Network attribute	TP	PC	C	D	TP	PC	C	D
TP	1	3	2	5	1	2	1	1/9
PC	1/3	1	1	2	1/2	1	1/3	1/9
C	1/2	1	1	3	1	3	1	1/9
D	1/5	1/2	1/3	1	9	9	9	1

Table 3. Simulation parameters of cellular network and WiFi network in this paper.

Simulation parameters	Cellular network	WiFi network
User cost (/s)	0.8	0.1
Communication delay (ms)	25 to 50	100 to 150
Bandwidth W (MHz)	4 to 6	10 to 12
Path loss L₀ at d₀ (dB)	5.27	8
Terminal fixed power consumption P₀ (mW)	10	10
Minimum received power (dBm)	−110	−100
User throughput threshold (kb/s)	10	12
Path loss exponent α	3.76	4

Firstly, we analyse the performance of this algorithm under stream service. According to AHP algorithm, the weight vector corresponding to throughput, power consumption, cost, and delay is obtained as . When the user conducts streaming media services like watching a video, the most important thing is throughput and the least is delay. Because a video usually has a large size such as 500 MB, 1 GB, or more, we need the throughput to be big enough to support the cache of the video. The user equipment only needs to read the data precached in it to perform the service, which is not real-time. So stream service does not need low delay.

Figure 3 shows the convergence comparison between the invalid action filtering and nonfiltering in the WiFi offloading algorithm under stream service. Advance filtering means that this paper filters the invalid network whose actual throughput is less than the throughput threshold before Q-learning. Assume N_AP = 30, and the total number of positions N_p passed by the user is equal to 10. The two cases are subjected to Q-learning in the same experimental scenario, and the convergence was observed. Since the action selection in Q-learning is discontinuous, user satisfaction will jump when changing the action selection strategy. As can be seen from Figure 3, after filtering out the invalid network whose throughput is less than the threshold in advance, the convergence speed of the Q-learning can be greatly accelerated.

Figures 4 and 5 show the comparison between this paper’s algorithm, Fakhfakh and Hamouda’s algorithm [11], and RSS (received signal strength) algorithm based on user satisfaction, throughput, power consumption, cost, and delay under stream service. We repeatedly scatter APs 1000 times to eliminate randomness. The number of user-passed positions N_p is equal to 10, and the number of WiFi AP is changed from 20 to 60. As can be seen from Figure 4, the WiFi offloading algorithm in this paper is superior to the other two algorithms in user satisfaction. The main difference between this paper and [11] is the reward function of the Q-learning. Fakhfakh and Hamouda’s algorithm [11] aims to minimize the residence time of the cellular network and optimize it by Q-learning, but its reward function only considers SINR, handover delay, and AP load, without considering the attributes directly related to user QoS, such as terminal power consumption, user cost, and communication delay. The RSS algorithm only considers the received signal strength of the terminal, and the terminal automatically accesses network with the largest RSS, so the user satisfaction is lower. The Q-learning algorithm in this paper not only considers the attributes directly related to user QoS but also uses two MADM algorithms to obtain the intrinsic relationship of these attributes. It establishes a more reasonable Q-learning reward function and obtains the best user satisfaction. As can be seen from Figure 5, the algorithm in this paper is similar to [11] in terms of user throughput. This is because Fakhfakh and Hamouda’s algorithm [11] regards SINR as the most important aspect of the reward function, which directly affects throughput. Since the simulation is based on the stream service, the weight of throughput accounts for almost half of all the attributes, so the two algorithms perform similarly in throughput. Since the other two algorithms do not consider power consumption and cost, the algorithm performs better on these two network attributes. The RSS algorithm selects the network with the highest receiving power to access. In this scenario, as long as the terminal is not too far away from the cellular BS, RSS of the cellular network will be the largest, so the number of WiFi offloading is reduced. Since the WiFi network uses the unlicensed frequency band, the bandwidth available to the user is usually larger than accessing the cellular network. As a result, the throughput of it becomes less. Because the delay of cellular network is usually lower than WiFi network, the RSS algorithm performs best on the delay attribute. However, since the weight of the delay attribute in the stream service is very low, the user does not pay attention to the delay of the precached data when watching video or listening to music. As a result, although the algorithm in this paper is not as good as the RSS algorithm in delay, user satisfaction is much higher than it.

Figure 6 shows the user satisfaction against the number of positions passed by agent after repeatedly scattering AP 1000 times to eliminate randomness. The number of WiFi AP N_AP = 30, and the terminal passes through 6, 8, 10, 12, and 14 positions, respectively. It can be seen that the more the positions, the higher the user satisfaction because as the number of positions increases, the states of Q-learning will increase, and the chances of agent actively selecting the optimal network to offload will also increase, so the satisfaction will also become higher.

Figures 7 and 8 show the comparison between this paper’s algorithm, Fakhfakh and Hamouda’s algorithm [11], and RSS algorithm based on user satisfaction, throughput, power consumption, cost, and delay under conversation service. The number of user-passed positions N_p is equal to 10, and the number of WiFi AP is changed from 20 to 60. According to AHP algorithm, the weight vector is obtained as , which indicates that when the user chooses conversation service like making a voice call, the most important attribute is communication delay while the other three attributes are less important. When we make a voice call, it will drastically reduce the QoS if the time we wait is too long. As can be seen from Figure 7, the WiFi offloading algorithm in this paper is superior to the other two algorithms in user satisfaction. Fakhfakh and Hamouda’s algorithm [11] does not consider the communication delay, so the satisfaction is the worst. As is mentioned above, RSS algorithm usually makes the terminal access the cellular BS which has a bigger transmit power and a lower delay, so the satisfaction is better than [11]. As can be seen from Figure 8, the WiFi offloading algorithm in this paper is superior to the RSS algorithm in throughput, power consumption, and cost, while the communication delay performance is near RSS algorithm. In this paper, delay is the most important attribute under conversation service, so the delay performance nears RSS algorithm. We also consider other attributes, which makes a few users offload to WiFi network, so the delay of this algorithm is slightly higher than the RSS algorithm.

5. Conclusion

In the heterogeneous network scenario where cellular network and WiFi network overlap, this paper establishes a model of mobile terminal WiFi offloading, and the Markov model is used to describe the change of available bandwidth. Four network attributes of user throughput, terminal power consumption, user cost, and communication delay are considered to define a user satisfaction function. The AHP algorithm is used to calculate the attribute weights, and the TOPSIS algorithm is used to obtain the instant rewards when the user accesses the cellular network or offloads to the WiFi network. Using the Q-learning algorithm, combined with instant rewards and experience rewards to update the discounted cumulative rewards, the user can make the optimal offloading decision and get the maximum satisfaction in each passing position. The simulation results show that the proposed algorithm can converge under limited times, and compared with the comparison algorithm, the algorithm has a great improvement in user satisfaction.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61971239 and 61631020).

Open Research

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

References

1 Cisco White Paper, Cisco Visual Networking Index—Global Mobile Data Traffic Forecast, Update, 2016–2021, 2017, Cisco Systems, Prague, Czech Republic.
Google Scholar
2 Ho D., Park G. S., and Song H., Game-theoretic scalable offloading for video streaming services over LTE and WiFi networks, IEEE Transactions on Mobile Computing. (2018) 17, no. 5, 1090–1104, https://doi.org/10.1109/tmc.2017.2748592, 2-s2.0-85029177928.
10.1109/TMC.2017.2748592
Web of Science® Google Scholar
3 Chen Q., Yu G., Shan H., Maaref A., Li G. Y., and Huang A., Cellular meets WiFi: traffic offloading or resource sharing?, IEEE Transactions on Wireless Communications. (2016) 15, no. 5, 3354–3367, https://doi.org/10.1109/twc.2016.2520478, 2-s2.0-84969758490.
10.1109/TWC.2016.2520478
Web of Science® Google Scholar
4 Aijaz A., Aghvami H., and Amani M., A survey on mobile data offloading: technical and business perspectives, IEEE Wireless Communications. (2013) 20, no. 2, 104–112, https://doi.org/10.1109/mwc.2013.6507401, 2-s2.0-84877788969.
10.1109/MWC.2013.6507401
Web of Science® Google Scholar
5 Li Z., Dong C., Li A., and Wang H., Traffic offloading from LTE-U to WiFi: a multi-objective optimization approach, Proceedings of the 2016 IEEE International Conference on Communication Systems (ICCS), December 2016, Shenzhen, China, IEEE, https://doi.org/10.1109/iccs.2016.7833622, 2-s2.0-85013955960.
10.1109/iccs.2016.7833622
Google Scholar
6 Xu J., Wu S., Xu L., Zhang N., and Zhang Q., Green-oriented user-satisfaction aware WiFi offloading in HetNets, IET Communications. (2018) 12, no. 5, 501–508, https://doi.org/10.1049/iet-com.2017.0489, 2-s2.0-85044230014.
10.1049/iet-com.2017.0489
Web of Science® Google Scholar
7 Cai S., Duan L., Wang J. et al., Incentive mechanism design for delayed WiFi offloading, Proceedings of the ICC 2015—2015 IEEE International Conference on Communications, June 2015, London, UK, IEEE, https://doi.org/10.1109/icc.2015.7248848, 2-s2.0-84953725344.
10.1109/icc.2015.7248848
Google Scholar
8 Kang X., Chia Y.-K., Sun S., and Chong H. F., Mobile data offloading through a third-party WiFi access point: an operator’s perspective, IEEE Transactions on Wireless Communications. (2014) 13, no. 10, 5340–5351, https://doi.org/10.1109/twc.2014.2353057, 2-s2.0-84908032842.
10.1109/TWC.2014.2353057
Web of Science® Google Scholar
9 Jung B. H., Song N. O., and Sung D. K., A network-assisted user-centric WiFi-offloading model for maximizing per-user throughput in a heterogeneous network, IEEE Transactions on Vehicular Technology. (2014) 63, no. 4, 1940–1945, https://doi.org/10.1109/tvt.2013.2286622, 2-s2.0-84903639480.
10.1109/TVT.2013.2286622
Web of Science® Google Scholar
10 Sethakaset U., Chia Y. K., and Sun S., Energy efficient WiFi offloading for cellular uplink transmissions, Proceedings of the 2014 IEEE 79th Vehicular Technology Conference (VTC Spring), May 2015, Seoul, Korea, IEEE, https://doi.org/10.1109/vtcspring.2014.7022909, 2-s2.0-84936880800.
10.1109/vtcspring.2014.7022909
Google Scholar
11 Fakhfakh E. and Hamouda S., Optimised Q-learning for WiFi offloading in dense cellular networks, IET Communications. (2017) 11, no. 15, 2380–2385, https://doi.org/10.1049/iet-com.2017.0213, 2-s2.0-85032750549.
10.1049/iet-com.2017.0213
Web of Science® Google Scholar
12 Kunarak S. and Suleesathira R., Predictive RSS with fuzzy logic based vertical handoff algorithm in heterogeneous wireless networks, Proceedings of the 2010 10th International Symposium on Communications & Information Technologies, June 2010, Tokyo, Japan, IEEE, https://doi.org/10.1109/iscit.2010.5665177, 2-s2.0-78651248994.
10.1109/iscit.2010.5665177
Google Scholar
13 Pelaez J. I., Martinez E. A., and Vargas L. G., Consistency in positive reciprocal matrices: an improvement in measurement methods, IEEE Access. (2018) 6, 25600–25609, https://doi.org/10.1109/access.2018.2829024, 2-s2.0-85045736556.
10.1109/ACCESS.2018.2829024
Web of Science® Google Scholar
14 Zhang L. and Zhu Q., Network selection algorithm based on multi-radio parallel transmission for heterogeneous wireless networks, Journal of Signal Processing. (2014) 30, no. 10, 1176–1184.
CAS Google Scholar
15 Zhang L. and Zhu Q., Multiple attribute network selection algorithm based on AHP and synergetic theory for heterogeneous wireless networks, Journal of Electronics (China). (2014) 31, no. 1, 29–40, https://doi.org/10.1007/s11767-013-3131-1, 2-s2.0-84894318363.
10.1007/s11767-013-3131-1
Google Scholar
16 Yu H.-W. and Zhang B., A hybrid MADM algorithm based on attribute weight and utility value for heterogeneous network selection, Journal of Network and Systems Management. (2019) 27, no. 3, 756–783, https://doi.org/10.1007/s10922-018-9483-y, 2-s2.0-85056907277.
10.1007/s10922-018-9483-y
Web of Science® Google Scholar
17 Watkins C. J. C. H. and Dayan P., Technical note: Q-Learning, Machine Learning. (1992) 8, no. 3-4, 279–292, https://doi.org/10.1007/bf00992698.
10.1007/BF00992698
Web of Science® Google Scholar
18 Shafie A. E., Khattab T., Saad H., and Mohamed A., Optimal cooperative cognitive relaying and spectrum access for an energy harvesting cognitive radio: reinforcement learning approach, Proceedings of the 2015 International Conference on Computing, Networking and Communications (ICNC), February 2015, Anaheim, CA, USA, IEEE, https://doi.org/10.1109/iccnc.2015.7069344, 2-s2.0-84928019075.
10.1109/iccnc.2015.7069344
Google Scholar
19 Bazzi A., Masini B. M., Zanella A., and Dardari D., Performance evaluation of softer vertical handovers in multiuser heterogeneous wireless networks, Wireless Networks. (2017) 23, no. 1, 159–176, https://doi.org/10.1007/s11276-015-1140-8, 2-s2.0-84949646170.
10.1007/s11276-015-1140-8
Web of Science® Google Scholar

Citing Literature

All articles

WiFi Offloading Algorithm Based on Q-Learning and MADM in Heterogeneous Networks

Abstract

1. Introduction

2. System Model