Multi-Step-Ahead Spectrum Prediction for Cognitive Radio in Fading Scenarios

Abstract: This paper analyzes multi-step-ahead spectrum prediction for Cognitive Radio (CR) systems using several future states. A slot-based scenario is used, and prediction is based on the Support Vector Machine (SVM) algorithm. The aim is to determine whether multi-step-ahead spectrum prediction yields gains in terms of reduced channel switching and increased network throughput compared with short-term prediction. The system model is simulated in software using an exponential on-off distribution for primary-user (PU) traffic. A classical energy detector is used to perform sensing. With the help of simplifications, we present new closed-form expressions for the detection probability under AWGN and Rayleigh fading channels, which allow the appropriate number of samples for these scenarios to be found. The performance of the proposed predictor is thoroughly assessed in these scenarios. The SVM algorithm had low prediction error rates, and multi-step-ahead idle-channel scheduling reduced channel switching by the secondary user (SU) by up to 51%. An increase in throughput of approximately 4% was observed for multi-step-ahead prediction with three future states. The results also show that channel-switching savings can be achieved in a CR network with the proposed approach.

gains in terms of throughput using MLP prediction. In [13], the prediction stage in the framework proposed by [8] was added prior to the sensing stage to allow the SU to perform sensing only on channels predicted as idle. Both studies, [8] and [13], considered only an additive white Gaussian noise (AWGN) channel and did not analyze a fading channel.
To the best of the authors' knowledge, joint analysis of multi-step-ahead prediction and sensing in fading environments has not been reported in the literature. In addition, multi-step-ahead prediction with SVM has not yet been validated. The main contributions of this work are:
• A joint multi-step-ahead prediction and sensing analysis aimed at efficient channel usage in a CR network. This scheme allows channel scheduling to be implemented, as the SU can choose the channel with the largest number of consecutive time slots predicted as idle, which reduces the number of channel switches. The prediction is realized with SVM, attaining a low probability of prediction error. Expressions for the probabilities of a cognitive radio network with multiple channels and multi-step-ahead prediction are theoretically derived.
• A new SU frame with multi-step-ahead prediction and sensing is defined, and it is determined whether it increases throughput in the CR network. For this research and for future studies, handoff (channel switching) is very important and must be analyzed to obtain a more realistic view of the SU frame. Handoff affects the entire SU frame, since the time the SU spends in this process reduces the time available for transmission, and accounting for it yields results more consistent with real scenarios. For this reason, this work introduces an algorithm that tries to avoid channel handoff as much as possible.
• A simplified expression for the detection probability of the energy detector in Rayleigh fading channels is proposed, which allows the number of samples required for reliable detection in fading scenarios to be accurately determined. This is motivated by the fact that a Rayleigh fading channel severely attenuates the signal and primarily impacts energy detection rather than spectrum prediction. Consequently, a low probability of detection corrupts the data fed to the predictor algorithm, resulting in a severe reduction of predictor performance.
The results show an improvement in energy-detector performance in Rayleigh fading scenarios, as well as a reduction in channel switching and an increase in CR network throughput when multi-step-ahead prediction schemes were used, for both an AWGN channel and a Rayleigh fading scenario. The rest of this work is organized as follows. Section II describes the system model. Section III presents the proposed improvements. Section IV presents the simulation results. Finally, conclusions are drawn in Section V.

II. SYSTEM MODEL
Figure 1 shows a CR network consisting of an SU transmitter and an SU receiver that coexist with native PUs. The figure also shows a primary base station that communicates with several PU devices, each link being established on a single channel. The PU devices can access licensed channels, which can also be accessed opportunistically by the SU devices. In Fig. 1, the SU device is sensing all the licensed channels. When a channel is available, the SU can access it to transmit to the corresponding cognitive receiver. The SU receiver is made aware of the channel in use by the SU transmitter by means of a Common Control Channel (CCC) with limited bandwidth.
Interference between an SU and a PU can occur if there are spectrum-detection errors, in which case the SU device transmits on the erroneous assumption that the analyzed channel is idle. The occupancy of any given channel due to PU activity is assumed to be independent of the occupancy of any other channel. One PU can occupy only one channel at a time, and the channels are assumed to be subject to Rayleigh fading. The Rayleigh fading model is one of the most appropriate to consider when designing a wireless system, as it models transmission without line of sight, as found in dense urban areas [7].
A two-state stochastic process models PU channel occupancy: busy or idle. Simulations were performed with an exponential on-off traffic model, in which the intervals when the channel is busy follow an exponential distribution with mean $\mu_1$ and the intervals when it is idle follow an exponential distribution with mean $\mu_0$. The system model is composed of multiple channels, each one occupied by one PU.
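As an illustration of the on-off traffic model, the following Python sketch generates a slotted occupancy sequence for one channel. It is not the simulation code used in this work (which was written in MATLAB). The function name, the unit slot duration, the sampling of the state at the start of each slot, and the relation $\rho = \text{mean busy}/(\text{mean busy} + \text{mean idle})$ for the traffic density are assumptions made for the example.

```python
import random


def generate_occupancy(num_slots, mean_busy, mean_idle, seed=None):
    """Return a list of 0/1 slot states (1 = busy, 0 = idle) for one channel.

    Busy and idle period lengths are exponentially distributed with the given
    means (in units of one slot). The state is sampled at the start of each slot.
    """
    rng = random.Random(seed)
    # Assumed stationary busy probability, used only to draw the initial state.
    rho = mean_busy / (mean_busy + mean_idle)
    state = 1 if rng.random() < rho else 0

    def period(current_state):
        mean = mean_busy if current_state else mean_idle
        return rng.expovariate(1.0 / mean)  # expovariate takes the rate 1/mean

    next_switch = period(state)
    slots = []
    for t in range(num_slots):
        # Advance through every state change that happened before slot t starts.
        while next_switch <= t:
            state = 1 - state
            next_switch += period(state)
        slots.append(state)
    return slots


history = generate_occupancy(30000, mean_busy=6.5, mean_idle=3.5, seed=1)
busy_fraction = sum(history) / len(history)
```

With these hypothetical means the long-run busy fraction should be close to 0.65.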
The history of PU occupancy is modeled by binary states $s_t \in \{0,1\}$, where $s_t$ is the state of the slot being analyzed. A busy slot is represented by 1, and an idle slot by 0.

III. PROPOSED IMPROVEMENTS
In the works referred to in the previous sections, the slot or frame is divided into stages. In [8] only two stages are present: sensing and transmission. In [13], in contrast, the frame is divided into three stages: prediction, sensing and transmission. To deal with multi-step-ahead prediction, we made two modifications to the SU frame structure. These can be seen in Fig. 2, which shows a series of larger frames, each made up of an initial frame and $k-1$ subsequent frames, where $k$ is the number of predicted future states. The first frame has a duration $T$ and is divided into four stages: prediction, with a duration $\tau_p$; sensing, with a duration $\tau_s$; channel-switching latency or delay, with a duration $\tau_c$; and the effective transmission of information, lasting $T - \tau_p - \tau_s - \tau_c$. The next $k-1$ frames are segmented into up to three stages in a similar manner: sensing, channel-switching latency and transmission. In these frames the actual transmission has a duration $T - \tau_s - \tau_c$. After the $k-1$ three-stage frames, the SU begins the prediction cycle again with the initial frame divided into four stages.
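The effect of the frame structure on the time left for transmission can be illustrated with a short Python sketch. The function name, argument names and numerical values are hypothetical. The sketch only adds up the stage durations described above and assumes that the switching latency is paid only in frames where a channel switch actually occurs.

```python
from math import isclose


def transmission_fraction(k, frame, t_pred, t_sense, t_switch, switched):
    """Fraction of a k-frame cycle that is left for data transmission.

    switched[i] is True when the SU changes channel in frame i, so the
    switching latency is only subtracted in those frames.
    """
    if len(switched) != k:
        raise ValueError("switched must contain one flag per frame")
    useful = 0.0
    for i, did_switch in enumerate(switched):
        overhead = t_sense + (t_switch if did_switch else 0.0)
        if i == 0:
            overhead += t_pred  # prediction runs only in the initial frame
        useful += frame - overhead
    return useful / (k * frame)


# Hypothetical durations in arbitrary time units.
single_step = transmission_fraction(1, 100.0, 5.0, 10.0, 2.0, [True])
three_step = transmission_fraction(3, 100.0, 5.0, 10.0, 2.0, [True, True, True])
three_step_no_switch = transmission_fraction(3, 100.0, 5.0, 10.0, 2.0, [False] * 3)
```

In this toy setting the three-frame cycle spends a larger share of its time transmitting than the single-step frame, because the prediction stage is executed once per cycle.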
The prediction stage, which occurs only in the initial frame, is performed with an ML algorithm. In this stage, the SU estimates the state of the channels based on historical data and selects only those predicted to be idle for sensing. In the second stage (sensing), a channel that has been predicted to be idle is chosen at random and the energy-detection technique is used to evaluate the actual state of the channel. Data transmission takes place if the channel is in fact idle. Channel switching occurs if the SU determines that the current channel will be busy [31], [32]. Channel switching results in latency because the RF (radio frequency) circuitry must be reconfigured. If the predicted channel is the same as that already occupied by the SU, this stage is not counted, which increases the effective information transmission time. This model is more realistic than previous schemes because it considers the channel-switching latency and does not repeat the prediction stage in every frame. The channel-switching latency (handoff) affects the entire SU frame: the time spent on handoff reduces the effective transmission time and therefore the throughput, so it is relevant to the analysis. Its consideration is particularly important because this work presents an algorithm that tries to avoid channel switching as much as possible.

A. SVM and Prediction Stage
Prediction in the context of CR can be performed through different techniques described in the literature, including Machine Learning (ML) algorithms. ML algorithms can be divided into two categories: classifiers and regressors. Classifiers estimate discrete values. One example is the Support Vector Machine (SVM) algorithm, which was developed in [33] and is used in many areas, such as market prediction and energy load estimation.
Like other ML techniques, the standard SVM performs data classification; it is considered a non-probabilistic binary linear classifier. It requires a training stage with a set of samples taken from the total problem data space. The algorithm assigns each input to exactly one of two classes. One way of viewing its operation is to treat each input sample as a point in a feature space; the algorithm then finds the hyperplane that separates the two classes with the maximum margin, i.e., the maximum distance to the nearest points of each class. After the algorithm is executed, the output is associated with only one of these classes.
The purpose of the algorithm is thus to maximize the separation between the two categories so as to achieve correct classification.
The main reasons for choosing the SVM technique instead of the MLP neural network as a prediction tool are the small number of free parameters and the guarantee of converging to the optimal solution [26]. Although the training phase of the two tools is computationally expensive, both are very efficient at execution time on the real data set.
Given a data set $x(n)$, where $n = 0, 1, 2, \ldots$ indexes a series of discrete samples, an estimated value in the future is defined as $\hat{x}(n+k)$. The classification aims at defining the prediction function
$$f(x) = w^{T}\phi(x) + b, \qquad (1)$$
where $w$ is the weight of the prediction function and $b$ is called the limit value. The function $\phi(\cdot)$ is the kernel mapping, which is used when the input data are not linearly separable. The kernel function is a procedure to map the data to a higher dimension and then to perform linear regression. Several kernel functions meet the necessary conditions for mapping, such as the hyperbolic tangent, Gaussian and polynomial kernels [26]. In the training phase, the objective is to find the optimal weights and limit values. One of the criteria is to find values of $w$ with similar levels, which is verified through the Euclidean norm $\lVert w \rVert$. Another criterion applied in the training phase is to reduce the error generated in the value estimation process; this residual error is given by (2), which is built on a cost function that depends on the input, the prediction and the regression function (1). Lagrange multipliers are used to solve this kind of convex optimization problem, which can be solved in a linear setting as a consequence of the kernel mapping process.
In this work, we used the linear kernel instead of the Gaussian kernel. Empirical tests showed higher performance for the linear kernel, which agrees with [21], where the linear kernel also showed superior performance.
In this case, = +1 and = +1 . For multi-step-ahead prediction, $k > 1$. The training phase consists of providing the algorithm with a large amount of previously known data, in this case separated into several observed windows and their associated future-data windows.
The algorithm in its training phase acts to minimize the error between the values of the predicted future window and the already known future-data window.
The simulation environment was built in MATLAB 2018. This work uses the multiclass Support Vector Machine model [35] provided in MATLAB's native Statistics and Machine Learning Toolbox. As mentioned before, we adopted the linear kernel, and the SVM algorithm performs classification over multiple classes. Although binary data are used to define busy or idle spectrum states, in our implementation the observed window is kept in binary form, while the binary values within each future-data window are aggregated into a single decimal value through a simple binary-to-decimal conversion. This conversion takes place before the data enter the predictor algorithm, and the scheme is first used in the training phase. For example, for $k = 3$ the algorithm defines $2^3 = 8$ possible classes, a direct result of the binary combinations of the three predicted slots. In its training phase, therefore, the algorithm associates each observed window (in binary format) with a decimal number. In the validation phase, when the predictor receives an observed window, it returns a decimal value (the class). After the SVM prediction in the validation phase, we perform the inverse decimal-to-binary conversion. As stated, the input data (observed window) are not aggregated into classes but are fed into the predictor as binary data.
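The binary-to-decimal labeling of future windows described above can be sketched in Python as follows. This is only an illustration of the data preparation, not the MATLAB implementation. The function names, the most-significant-bit-first ordering (first predicted slot as the leading bit) and the sliding-window construction of training pairs are assumptions.

```python
def window_to_class(window):
    """Binary-to-decimal conversion of a future window, first slot as MSB."""
    label = 0
    for bit in window:
        label = (label << 1) | bit
    return label


def class_to_window(label, k):
    """Inverse conversion: decimal class label back to k binary slot states."""
    return [(label >> (k - 1 - i)) & 1 for i in range(k)]


def make_training_pairs(history, past, k):
    """Slide over the history and pair each binary observed window of length
    `past` with the decimal class of the k slots that follow it."""
    pairs = []
    for start in range(len(history) - past - k + 1):
        observed = history[start:start + past]
        future = history[start + past:start + past + k]
        pairs.append((observed, window_to_class(future)))
    return pairs


pairs = make_training_pairs([0, 1, 1, 0, 0, 1], past=3, k=2)
```

The observed windows stay binary, while each future window becomes one of $2^k$ class labels that a multiclass classifier can be trained on.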
The predicted vector can also be analyzed in terms of the number of future idle slots. This analysis is important for the scheduling algorithm, which is described in more detail in Section III-E. The number of future idle slots is counted from the current slot and only takes into account sequential idle slots. The SU can use this information to choose the channel that has been predicted to be idle the longest. The number of subsequent sequential idle slots is denoted by $k_{idle}$, where $k_{idle} \le k$ and the predicted vector is $\hat{S} = \{\hat{s}_{t+1}, \hat{s}_{t+2}, \hat{s}_{t+3}, \ldots, \hat{s}_{t+k}\}$. As an example, if a multi-step-ahead future window with $k = 5$ equals $\hat{S} = \{0,0,0,1,0\}$, the vector that considers only the next sequential idle slots is given by $\hat{S}_{idle} = \{0,0,0\}$. This number can also be represented in decimal format; hence, in this example $k_{idle} = 3$, i.e., there are only three sequential future idle slots. In this work, imperfect prediction was considered because the SVM technique can make erroneous predictions, causing the predicted value for slot $t+i$, $\hat{s}_{t+i}$, to be different from its real value, $s_{t+i}$. The probability of prediction error is denoted $P_e$.
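Counting the sequential idle slots at the start of a predicted window takes only a few lines. The Python sketch below reproduces the example above; the function name is hypothetical.

```python
def leading_idle_slots(predicted_window):
    """Number of consecutive idle (0) slots counted from the first predicted slot."""
    count = 0
    for state in predicted_window:
        if state != 0:
            break
        count += 1
    return count


example = leading_idle_slots([0, 0, 0, 1, 0])
```

The idle slot after the busy one is not counted, because only the uninterrupted run starting at the next slot matters for scheduling.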

B. Sensing Stage
Channels that are predicted to be idle are selected for the second stage of the SU frame, the sensing stage. Energy detection [6] is a well-known sensing technique in which the output is proportional to the energy of the received signal. The amount of energy detected is compared with a threshold, and a state corresponding to either the presence or absence of a signal at the receiver is returned. The result can be presented as a binary hypothesis test on occupancy:
$$H_0: \; y(n) = w(n), \qquad H_1: \; y(n) = h\,s(n) + w(n),$$
where $w(n)$ represents the noise, $h$ is the channel gain, $s(n)$ is the PU signal, and $y(n)$ is the signal received at the SU. Under hypothesis $H_0$ only noise is present at the SU receiver, while under hypothesis $H_1$ the PU signal is also present at the SU receiver side. The channel-occupancy probabilities, which are shown in Section II for both hypotheses, are functions of traffic density. We also assume slow or quasi-static fading, such that the channel gain $h$ remains constant over several time slots.
Noise samples are assumed to be AWGN with zero mean and variance $\sigma_w^2$. The signal-to-noise ratio (SNR) between the PU and SU is given by $\gamma = |h|^2 \sigma_s^2 / \sigma_w^2$, where $\sigma_s^2$ is the average power of $s(n)$. The received signal $y(n)$ passes through the energy detector, and the output of the detector is used to determine the test statistic, $T(y)$. The detection probability $P_d$ and the probability of false alarm $P_f$ are set according to the test statistic as
$$P_d = \Pr\big(T(y) > \varepsilon \mid H_1\big), \qquad P_f = \Pr\big(T(y) > \varepsilon \mid H_0\big),$$
where $\varepsilon$ is the detection threshold.
Closed-form equations for $P_d$ and $P_f$ in AWGN channels have already been derived in [8], [34], as well as the approximations (5) and (6), where $N$ represents the number of samples and $Q(\cdot)$ is the Gaussian Q-function. One of the advantages of working with (5) and (6) is that not only are they simpler forms, but the inverse Q-function, $Q^{-1}(\cdot)$, can also be used. Thus, by rearranging (6), we can obtain the minimum value of $\varepsilon$ for a given design value of $P_f$, as expressed in (7). In scenarios with low SNR, i.e., low values of $\gamma$, it is more difficult to detect a signal effectively and the energy detector therefore needs more samples.
According to [8], the minimum number of samples, $N_{min}$, required for effective detection in an AWGN channel is given by (8). Knowing that $Q(x) = \frac{1}{2} - \frac{1}{2}\,\mathrm{erf}\!\left(x/\sqrt{2}\right)$ and using the approximation $\mathrm{erf}(x) \approx \frac{2}{1 + e^{-2.5x}} - 1$, a simplification of the results found in [36], Equation (9) can be rewritten as (10). Combining (5) and (10) gives a new closed-form expression, (11), for $P_d$ in the low-SNR regime. This equation is an excellent approximation for (9), as can be seen in Section IV-B, and greatly simplifies the algebraic manipulation required in the derivation that follows.
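The quality of the erf approximation quoted above, and the kind of sample-count calculation it supports, can be checked numerically. The Python sketch below is not a transcription of equations (5) to (11). The expressions for $P_d$ and $N_{min}$ are the Gaussian approximations commonly used in the sensing-throughput literature for complex-valued signals, taken here as an assumption, and all function names and numerical values are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def q_exact(x):
    """Gaussian Q-function, Q(x) = 1/2 - 1/2 erf(x / sqrt(2))."""
    return 0.5 - 0.5 * math.erf(x / math.sqrt(2.0))


def q_approx(x):
    """Q(x) with erf(z) replaced by 2 / (1 + exp(-2.5 z)) - 1, z = x / sqrt(2)."""
    return 1.0 / (1.0 + math.exp(2.5 * x / math.sqrt(2.0)))


def q_inv(p):
    """Inverse Q-function via the standard normal quantile."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def pd_awgn(snr, n, pf):
    """Assumed AWGN detection probability for n samples and target P_f."""
    return q_exact((q_inv(pf) - snr * math.sqrt(n)) / math.sqrt(2.0 * snr + 1.0))


def min_samples_awgn(snr, pf, pd):
    """Assumed minimum sample count meeting target P_f and P_d (linear SNR)."""
    n = (q_inv(pf) - q_inv(pd) * math.sqrt(2.0 * snr + 1.0)) ** 2 / snr ** 2
    return math.ceil(n)


grid = [i * 0.25 for i in range(-12, 13)]
max_gap = max(abs(q_exact(x) - q_approx(x)) for x in grid)
n_min = min_samples_awgn(snr=0.1, pf=0.1, pd=0.9)  # SNR of -10 dB
```

On this grid the logistic form stays within about 0.02 of the exact Q-function, while being easy to invert and integrate.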

C. Rayleigh Channel Energy Detector
The instantaneous detection probability depends on the SNR and the number of samples, as can be seen in (5), and may also be affected by the state of the wireless channel. Shadowing and fading effects on the channel, for example, make it necessary to evaluate the average detection probability, which can be obtained by averaging over the distribution of the channel SNR [34], $f_\gamma(\gamma)$, which gives
$$P_d^{(\mathrm{Ray})} = \int_0^{\infty} P_d(\gamma)\, f_\gamma(\gamma)\, d\gamma. \qquad (12)$$
In a Rayleigh fading scenario, the distribution of the SNR is $f_\gamma(\gamma) = \frac{1}{\bar{\gamma}} \exp\!\left(-\gamma/\bar{\gamma}\right)$, where $\bar{\gamma}$ is the average SNR. Using $f_\gamma(\gamma)$ and (11) in (12), and with some algebraic manipulation, gives the novel expression (13). In that expression, ${}_2F_1$ is a Gauss hypergeometric function of the form ${}_2F_1(a, b; c; z)$ [37]. This equation is a compact form of the expression for $P_d^{(\mathrm{Ray})}$ found in [34] and allows the specific number of samples for a given value of $P_d^{(\mathrm{Ray})}$ to be determined by iteration.
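The averaging step in (12) can be reproduced numerically without the closed form. The Python sketch below integrates an AWGN detection probability over the exponential SNR density with the trapezoidal rule. It does not implement the hypergeometric expression (13). The AWGN expression used for $P_d(\gamma)$ is the common Gaussian approximation, taken as an assumption, and the function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def q_func(x):
    """Gaussian Q-function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pd_awgn(snr, n, pf):
    """Assumed instantaneous detection probability for linear SNR `snr`."""
    q_inv_pf = _STD_NORMAL.inv_cdf(1.0 - pf)
    return q_func((q_inv_pf - snr * math.sqrt(n)) / math.sqrt(2.0 * snr + 1.0))


def pd_rayleigh(mean_snr, n, pf, steps=20000, span=30.0):
    """Average of pd_awgn over f(g) = exp(-g / mean_snr) / mean_snr.

    The integral is truncated at span * mean_snr, where the density is negligible.
    """
    upper = span * mean_snr
    step = upper / steps
    total = 0.0
    for i in range(steps + 1):
        g = i * step
        value = pd_awgn(g, n, pf) * math.exp(-g / mean_snr) / mean_snr
        total += value * (0.5 if i in (0, steps) else 1.0)
    return total * step


awgn = pd_awgn(0.1, 5000, 0.1)
rayleigh = pd_rayleigh(0.1, 5000, 0.1)
```

For the same average SNR and number of samples, the faded channel gives a visibly lower detection probability, which is why the sample count has to be designed specifically for the Rayleigh case.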

D. Transmission Stage
The last stage of the SU frame is the effective transmission of information based on the prediction and sensing stages. We first present the same calculations developed in [13] for $k = 1$, and then extend them to $k > 1$.
When prediction is made for only one future state, i.e., $k = 1$, and considering only one channel, let $P_0^{(p)}$ and $P_1^{(p)}$ denote, respectively, the probabilities of the prediction stage classifying this channel as idle or busy. According to [13], when the $M$ channels are considered, two network-level quantities follow from these: the probability that the prediction indicates an idle channel and the probability that all the channels are predicted to be busy. When this last event happens, the SU selects a random channel to sense. At no time do the prediction results affect the sensing result.
The joint probability distribution of the events prediction, sensing and true state, for only one channel, $P_{0,1}, P_{1,1}, \ldots, P_{7,1}$, and for the entire network, $P_{0,M}, P_{1,M}, \ldots, P_{7,M}$, considering prediction for only one future state ($k = 1$), was derived in [13] and is shown in Table III of [13]. The first subscript in these probabilities is the decimal representation of the binary sequence representing the $2^3 = 8$ different joint events.
Extending the analysis to multiple future states ($k > 1$), for the sake of simplicity we first consider $k = 2$, i.e., the SU algorithm analyzes only two future states, and then generalize the results to any $k$. For only one channel, the probability of predicting both states as idle is $p_0 = \big(P_0^{(p)}\big)^2$. The probability of predicting the first state to be idle and the next to be busy is $p_1 = P_0^{(p)} P_1^{(p)}$. Similarly, the probability of predicting the first state to be busy and the next to be idle is $p_2 = P_1^{(p)} P_0^{(p)}$. Finally, the probability of predicting both states to be busy is $p_3 = \big(P_1^{(p)}\big)^2$.
Note that there are $X = 2^k$ different combinations of idle-busy prediction results when considering $k$ future slots. Let $j \in \{0, 1, \ldots, X-1\}$ denote the decimal representation of a combination of idle-busy prediction results. For instance, considering three future slots, $j = 5$ means that the prediction result is (busy, idle, busy), which is represented by the binary sequence {101}. Then, for any $k$, the probability of a combination of idle-busy prediction results over the $M$ channels can be computed as
$$P_{(n_0, n_1, \ldots, n_{X-1})} = \binom{M}{n_0, n_1, \ldots, n_{X-1}} \prod_{j=0}^{X-1} p_j^{\,n_j},$$
where $\binom{M}{n_0, n_1, \ldots, n_{X-1}} = \frac{M!}{n_0!\, n_1! \cdots n_{X-1}!}$ is the multinomial coefficient. Note that this is the probability that $p_0$ occurs $n_0$ times, $p_1$ occurs $n_1$ times, ..., and $p_{X-1}$ occurs $n_{X-1}$ times.
Still using $k = 2$, we give another example to facilitate understanding of the prediction technique with multiple slots. From the perspective of multiple channels, considering $M = 3$, the weight vector has four elements $(n_0, n_1, n_2, n_3)$, and a possible value for the vector is $(2,1,0,0)$, as $2 + 1 + 0 + 0 = 3$. This means that $p_0$ occurs two times, $p_1$ occurs one time and there are no occurrences of either $p_2$ or $p_3$, so that
$$P_{(2,1,0,0)} = \binom{3}{2,1,0,0}\, p_0^{2}\, p_1.$$
The prediction probability indicating idle channel availability across the $M$-channel network is a union of probabilities like the one shown above.
Therefore, among all possible state combinations of vectors, we are only interested in those that involve idle-state sequences starting at $\hat{s}_{t+1}$. By performing the analysis for the entire CR network made up of $M$ channels, we obtain an extension of the result presented in [13]. The generalized probability of the entire network with multiple channels being idle in multiple future states is therefore given by (22). Equation (22) is original, and takes into account all cases where there is a sequence of idle channels starting with $\hat{s}_{t+1}$. The probability of the entire network being busy is given by (23), which accounts for the events in which all channels are busy in all future states or only some channels are busy in future states.
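The combination probabilities and the multinomial example above can be verified with a short Python sketch. It does not implement equations (22) and (23). The function names and the per-slot idle probability of 0.6 are hypothetical, and the per-slot prediction outcomes are taken to be independent, as the products for $k = 2$ suggest.

```python
from math import factorial, isclose


def combination_probs(p_idle, k):
    """Probability of each of the 2**k idle/busy prediction patterns.

    Pattern j is read as a k-bit number, first predicted slot as the leading
    bit, with 1 = busy and 0 = idle (e.g. j = 5, k = 3 is busy, idle, busy).
    """
    p_busy = 1.0 - p_idle
    probs = []
    for j in range(2 ** k):
        prob = 1.0
        for i in range(k):
            bit = (j >> (k - 1 - i)) & 1
            prob *= p_busy if bit else p_idle
        probs.append(prob)
    return probs


def multinomial_prob(counts, probs):
    """Probability that pattern j occurs counts[j] times over sum(counts) channels."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)  # exact: partial quotients stay integers
    value = float(coefficient)
    for c, p in zip(counts, probs):
        value *= p ** c
    return value


patterns = combination_probs(0.6, 2)           # [p0, p1, p2, p3]
example = multinomial_prob((2, 1, 0, 0), patterns)
```

For the weight vector (2,1,0,0) the coefficient is 3, so the result equals three times $p_0^2\,p_1$.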
In the case of two future states, three more columns are added to Table III of [13]. As the number of future states increases, the number of rows in the table increases exponentially. Consider the case $k = 2$ with only one channel: if the prediction is idle, the sensing is idle and the true state is also idle for both future states, the result extends that of [13] and equals the square of the entry in the first row, third column of Table III of [13]. The total number of possible combinations for $k = 2$ and just one channel is $2^{2 \times 3} = 64$ rows; in general, there are $2^{k \times 3}$ different joint probabilities of true channel state, prediction and sensing for a single channel.
The theoretical probability distribution of true channel state, prediction and sensing for $k > 1$ and the entire network (multiple channels) can be found by multiplying the probabilities in the first term of the equations in the third column of Table III of [13] by the probability equations for the whole network, (22) and (23), as expressed in (24). In (24), a binary index in $\{0, 1\}$ selects equation (22) or (23), the combination index is as described above, and the row index runs over $\{1, \ldots, 2^{3k}\}$. The product inside the brackets in (24) is associated with the number of future states analyzed; it is thus a sequence of multiplications of the eight rows in Table III of [13], each factor being indexed over $\{1, \ldots, 8\}$.
The throughput of a CR network is defined as the total amount of data transmitted divided by the total transmission time. If a channel is sensed busy, the SU will not transmit data and the throughput will be zero, i.e., $C_0 = 0$, where index 0 indicates the non-transmission state. If a channel is sensed idle and is indeed idle, and if the gain of the channel, $h$, which is subject to Rayleigh fading, is assumed to be an ergodic process [38], [39], the throughput of the channel takes its best-case value, because there is no interference from the PU in the chosen channel. The corresponding expression depends on the SNR between the transmitting and receiving SU, on the exponential integral $E_1(x) \triangleq \int_x^{\infty} \frac{e^{-t}}{t}\,dt$, and on a multiplying factor. As multi-step-ahead prediction estimates several states into the future, the prediction stage does not need to be run more than once, so there is a slight increase in network throughput. When the channel is sensed idle but is actually occupied by the PU, the throughput will be reduced, as there will be interference from the PU in the channel. This case therefore corresponds to the transmission capacity in the least favorable circumstances. The network throughput is given by (28), in which both summations refer to cases where sensing was performed but the actual channel state changes the associated throughput.
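The role of the exponential integral in the best-case throughput can be illustrated with the standard ergodic capacity of a Rayleigh fading link, $\log_2(e)\, e^{1/\bar{\gamma}} E_1(1/\bar{\gamma})$ bits/s/Hz for an average SNR $\bar{\gamma}$. The Python sketch below checks this closed form against direct numerical averaging of $\log_2(1+\gamma)$. It is an assumption that the throughput expression of this section reduces to this form up to its multiplying factor, and the function names and the 10 dB operating point are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp_integral_e1(x, terms=60):
    """E1(x) = int_x^inf exp(-t)/t dt from its power series (suitable for 0 < x <= 5)."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0
    for n in range(1, terms + 1):
        term *= -x / n            # term now equals (-x)**n / n!
        total -= term / n
    return total


def ergodic_capacity_closed_form(mean_snr):
    """Rayleigh ergodic capacity in bits/s/Hz for linear average SNR mean_snr."""
    x = 1.0 / mean_snr
    return math.log2(math.e) * math.exp(x) * exp_integral_e1(x)


def ergodic_capacity_numeric(mean_snr, steps=200000, span=40.0):
    """Trapezoidal average of log2(1 + g) over the exponential SNR density."""
    upper = span * mean_snr
    step = upper / steps
    total = 0.0
    for i in range(steps + 1):
        g = i * step
        value = math.log2(1.0 + g) * math.exp(-g / mean_snr) / mean_snr
        total += value * (0.5 if i in (0, steps) else 1.0)
    return total * step


closed = ergodic_capacity_closed_form(10.0)   # average SNR of 10 dB
numeric = ergodic_capacity_numeric(10.0)
```

The two values agree closely, which confirms that the closed form is just the fading-averaged Shannon capacity.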

E. Channel Scheduling
Channel scheduling increases the efficiency of a CR network by reducing the amount of channel switching, reducing transmission pauses and making communication more stable. Channel switching is investigated in [12] in the context of increased transmission quality in LTE networks. In multichannel communication using timeslots, the user should remain on the same channel for as long as possible as channel switching involves hand-off mechanisms that can result in computational costs and delays for the end user.
In this context, we changed how channels are chosen for sensing compared with other studies found in the literature. Because the multi-step-ahead predictor estimates multiple slots ahead, it can determine which channels will be idle longer, i.e., which have the greatest number of sequential idle slots.
DOI: http://dx.doi.org/10.1590/2179-10742020v19i41069
The proposed algorithm implemented in the SU looks for the channel with the greatest possible number of idle future states among the available channels. Thus, channels for which all future states are predicted to be idle are prioritized for sensing. Figure 3a shows busy slots in red and idle slots in blue. As the choice of channel is based on multi-step-ahead prediction, the SU selects the third channel, as illustrated. In the case of a traditional prediction algorithm, the SU could choose any of the three available channels, as the first slot in each of them is considered idle.
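The selection rule can be sketched in Python as follows. This is not a transcription of Algorithm 1. The function names are hypothetical, and two details are assumptions made for the example: the SU stays on its current channel when that channel ties for the longest idle run, and `None` is returned when no channel is predicted idle in the next slot.

```python
def leading_idle_slots(predicted_window):
    """Consecutive idle (0) slots counted from the first predicted slot."""
    count = 0
    for state in predicted_window:
        if state != 0:
            break
        count += 1
    return count


def select_channel(predictions, current=None):
    """Pick the channel index with the longest predicted idle run.

    predictions[c] is the predicted window of channel c (0 = idle, 1 = busy).
    Ties are resolved in favor of the current channel to avoid a handoff.
    """
    runs = [leading_idle_slots(window) for window in predictions]
    best = max(runs)
    if best == 0:
        return None  # every channel is predicted busy in the next slot
    if current is not None and runs[current] == best:
        return current
    return runs.index(best)


# Three channels that are all idle in the first slot, as in the Fig. 3a discussion.
choice = select_channel([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
```

With a single-step predictor all three channels would look equally good; the multi-step windows single out the third one (index 2).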
The proposed algorithm also yields an improved result for $k = 1$ and gives preference to the channel that had already been chosen in the previous state, avoiding a potentially unnecessary channel change. The pseudocode for the algorithm can be found in Algorithm 1 (depicted in Fig. 3b).

In this work we used a training set of 1000 sample slots, and the total number of validation slots per simulation round was set to 30000, the same value as that used in [12].
Simulations were performed for training and performance evaluation. As described in Section III-A, each SU was trained with different slots to avoid biased results. For the remainder of the simulation, the same design values as those in [13] were used, as listed in Table I, where the SNR of the PU sensed by the SU complies with the design described in Section III-C.
The procedures, steps, equations and Algorithm 1 presented in Sections II and III were implemented in MATLAB® 2018 from MathWorks® on a personal computer (PC); the "Statistics and Machine Learning Toolbox" and the "Neural Network Toolbox" are required. The PC has a Core i5-7300HQ CPU @ 2.5 GHz and 8 GB of DDR4 memory @ 2400 MHz. The algorithms were tested using the following MATLAB functions: "fitcknn", "fitctree", "fitcsvm", "fitcecoc", "predict", "network" and "train". Several computer simulations were performed to validate the main contributions of this work and to compare them with results from the literature.

A. Imperfect Prediction
The SVM technique was chosen after a thorough comparison of several ML techniques. Four machine learning algorithms were compared, namely k-Nearest Neighbor [22], Classification Tree [23], Neural Network [18] and SVM. Each algorithm was tested with the data set used in this work, that is, a set of spectrum-occupancy slots with possible states 1 or 0, with traffic following the Poisson-binomial distribution. Figure 4 shows the probability of prediction error, $P_e$, a measure of prediction performance as defined in Section III-A. This quantity is obtained by comparing the predicted data with the actual data.
The figure shows the probability of prediction error as a function of the number of predicted future slots. As expected, the level of uncertainty increases with the number of predicted points; this is known as the error accumulation problem [15]. The figure shows that the SVM algorithm offers the best performance of all the tested algorithms, and this holds for every tested number of future slots. In the training phase, the k-Nearest Neighbor and Classification Tree techniques were 11% and 13% faster than the SVM algorithm, respectively, while the Neural Network technique was 16% slower. In the prediction phase, the k-Nearest Neighbor and Classification Tree algorithms were 6% and 8% faster than the SVM technique, respectively, while the Neural Network technique was 4% slower. The SVM was still the best option as it provides the best accuracy.
Fig. 4. Performance of the predictors measured in terms of the probability of prediction error as a function of the number of predicted future slots. The traffic density for all simulations was set to $\rho = 0.65$.
A point that weighed heavily in favor of choosing the SVM tool is that it always converges to the optimal result [26]. This is not guaranteed by the Neural Network, the second-best ML algorithm in terms of probability of prediction error. Another point taken into consideration in our analysis was the computational cost. In our study, the neural network algorithm and the SVM algorithm took a very similar amount of time to execute in the training phase, with a slight difference in favor of the SVM algorithm. In the data validation phase (execution on the real data), the SVM technique operated about 17% faster than the neural network algorithm. Finally, to the best of our knowledge, no multi-step-ahead prediction application with SVM has been reported.
Predictor performance is measured in terms of the probability of prediction error, $P_e$, as described in Section III-A. In [13], the authors use two scenarios with constant $P_e$ for different traffic density values $\rho$. In the present work, however, it was shown that the probability of prediction error depends on traffic density. This is illustrated in Figure 5, where $P_e$ is non-linear and the least favorable cases correspond to Rayleigh fading scenarios. Increasing traffic density alone leads to a variation of approximately 160% in the probability of prediction error. The scenario used in the present study is therefore more realistic, as it takes into account the dependence of predictor performance on traffic.
Figure 5 shows the probability of prediction error (the performance benchmark of the predictor) as a function of traffic density. When the traffic density is very high or very low, the uncertainty about future states is low and the predictor performs very well, with similar, although not identical, results for the two curves. The predictor tends to be more accurate in these cases because the scenarios are more static. The fact that the probability of error of the SVM predictor is approximately 10% in both cases is associated with the residual error of equation (2).
When the traffic density is close to $\rho = 0.5$, the uncertainty of the predictor is at its highest, since the scenarios are then more dynamic.
Figure 6 shows the very high degree of similarity between the curves of $P_d$ generated by the equations in [34] and by equations (9) and (13) in the present paper. There is a minimal difference between the calculated number of samples in the two cases. In computational terms, computing $P_d$ using (13) was observed to be less complex and 4.54 times faster than using equation (4.2) of [34].

B. The appropriate number of samples for the Rayleigh fading channel
Performance gains can be achieved during the sensing stage when the correct number of samples is used in the system design. In the case of the Rayleigh channel, energy-detector performance increased when the design specific to this channel type was used, as can also be seen in Figure 7. The correct number of samples for a Rayleigh channel, 66,718, was determined from (13).
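To make the idea of designing the number of samples concrete, the following is a minimal sketch for the AWGN case only. It does not reproduce equations (9) or (13) of this paper; instead it assumes the widely used Gaussian (central-limit) approximation of the energy detector for a complex-valued signal in circularly symmetric complex Gaussian noise. The function names, the SNR of -20 dB, and the targets of 0.1 (false alarm) and 0.9 (detection) are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x)."""
    return 1.0 - _STD_NORMAL.cdf(x)


def q_inv(p: float) -> float:
    """Inverse of the Gaussian tail probability."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def required_samples_awgn(snr_linear: float, p_fa: float, p_d: float) -> int:
    """Smallest N meeting (p_fa, p_d) under the Gaussian approximation.

    Assumes p_d > p_fa so that the square root of N is positive.
    """
    root_n = (q_inv(p_fa) - q_inv(p_d) * math.sqrt(2.0 * snr_linear + 1.0)) / snr_linear
    return math.ceil(root_n ** 2)


def achieved_pd_awgn(n: int, snr_linear: float, p_fa: float) -> float:
    """Detection probability obtained with n samples at a fixed false-alarm rate."""
    # Threshold normalized by the noise power, set from the false-alarm target.
    threshold = 1.0 + q_inv(p_fa) / math.sqrt(n)
    return q_func((threshold - snr_linear - 1.0) * math.sqrt(n / (2.0 * snr_linear + 1.0)))


snr = 10 ** (-20 / 10)  # hypothetical SNR of -20 dB, in linear scale
n_samples = required_samples_awgn(snr, p_fa=0.1, p_d=0.9)
print(n_samples, achieved_pd_awgn(n_samples, snr, p_fa=0.1))
```

Substituting the computed number of samples back into the detection-probability expression is a simple consistency check: the achieved detection probability should meet the target, and fewer samples should fall short of it. A design for a Rayleigh channel would need the corresponding fading-averaged expression in place of `achieved_pd_awgn`.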

C. Channel Scheduling
In this simulation we used the design specific to a Rayleigh channel, with a past observation window of length 10, a future observation window of length 5, and traffic density values ranging from ρ = 0.1 to ρ = 0.9. Figure 8 compares the amount of channel handoff with and without the multi-step-ahead scheduling described in Section III-E. In the scheduling scheme, the predictor uses the multi-step-ahead algorithm to choose the channel with the largest number of future idle slots. For low values of ρ most channels are idle, and when the SU uses no decision criterion there is a large amount of channel handoff. When the multi-step-ahead prediction scheme is used, channel handoff is limited for low values of ρ and increases as ρ increases, but decreases at ρ = 0.9. For ease of interpretation, the amount of channel handoff was normalized by dividing the number of channel hops by the number of slots analyzed. For ρ = 0.6 there was a saving of up to 42% in the number of channel hops when multi-step-ahead scheduling was used, and for ρ = 0.5 the saving in Figure 8 was 51%. When the traffic density is close to ρ = 0.9, the number of handoffs is very low and the two curves are very close. In this case, most channels are occupied during several time intervals, and the SU does not transmit because it is most often in the stop state. The two algorithms perform similarly in this extreme case, and the network throughput is severely affected by the high traffic density.

D. Throughput of the Cognitive Radio Network
Figure 9 shows the normalized throughput as a function of traffic density for an AWGN scenario. Two simulations were performed for single-step prediction (one future state): one without channel scheduling and one with scheduling, as proposed in Section III-E. Two simulations were also performed for multi-step-ahead prediction with three future states. In all four simulations the same number of past points (15) was used.
Curves of the theoretical normalized capacity for an AWGN channel, obtained for a parameter value of 0.2 using the approach described in Section III-D, are also shown. All the simulations yielded curves very close to the theoretical curves, and network throughput increased when channel scheduling was used. When the channel most likely to be idle is chosen, there are fewer interruptions due to channel switching. In the single-step case, the scheduling algorithm gives preference to the channel that was used in the previous state, avoiding interruptions. A gain in throughput is observed when multi-step-ahead prediction is used. Figure 9 shows that the curve for three future states with scheduling was extremely close to the curve for single-step prediction without scheduling. The single-step curve is also generated in [13], which analyzes single-step prediction. For ρ = 0.5 the increase in throughput was about 4%. This increase results from multi-step-ahead prediction based on the structure proposed in Section III, which ensures an increase in the effective transmission time. Figure 9 also compares the cases with and without scheduling.
In the case of multi-step prediction without scheduling, the average probability of prediction error is slightly greater than with single-step prediction because of error accumulation. The scheduling algorithm implemented in the SU improves the throughput of the cognitive network; the factor that contributes to this increase is that the channel-scheduling algorithm always chooses the channel with the greatest number of predicted idle slots. The transmission therefore involves the fewest possible hops.
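The scheduling rule described above can be sketched as follows. This is an illustrative reading of the rule, not the algorithm of Section III-E itself: it assumes that each channel's prediction is a list of future slot states (0 for idle, 1 for busy), that ties are broken in favor of the channel currently in use, and that the SU remains in the stop state (represented by `None`) when no channel is predicted idle in the next slot. The names `idle_run_length`, `schedule_channel`, and `normalized_handoffs`, and the way stop slots are handled when counting hops, are hypothetical.

```python
from typing import Optional, Sequence

IDLE, BUSY = 0, 1  # assumed encoding of predicted slot states


def idle_run_length(predicted: Sequence[int]) -> int:
    """Number of consecutive slots predicted idle, starting at the next slot."""
    run = 0
    for state in predicted:
        if state != IDLE:
            break
        run += 1
    return run


def schedule_channel(predictions: Sequence[Sequence[int]],
                     current: Optional[int]) -> Optional[int]:
    """Choose the channel with the longest predicted idle run.

    Ties favor the channel already in use, which avoids a hop.
    Returns None (stop state) if no channel is predicted idle next.
    Assumes at least one channel is given.
    """
    runs = [idle_run_length(p) for p in predictions]
    best = max(runs)
    if best == 0:
        return None
    if current is not None and runs[current] == best:
        return current
    return runs.index(best)


def normalized_handoffs(choices: Sequence[Optional[int]]) -> float:
    """Channel hops divided by the number of slots analyzed.

    Assumption: a hop is counted when the SU resumes or continues on a
    channel different from the last one it used; stop slots are skipped.
    """
    hops = 0
    previous = None
    for channel in choices:
        if channel is None:
            continue
        if previous is not None and channel != previous:
            hops += 1
        previous = channel
    return hops / len(choices)


# Three channels with five predicted future slots each (made-up values).
predictions = [
    [IDLE, BUSY, IDLE, IDLE, IDLE],
    [IDLE, IDLE, IDLE, BUSY, IDLE],
    [BUSY, IDLE, IDLE, IDLE, IDLE],
]
print(schedule_channel(predictions, current=0))
print(normalized_handoffs([0, 0, 1, 1, None, 1, 2, 2]))
```

In the example, channel 1 is selected because it has three consecutive predicted idle slots, whereas the current channel 0 has only one. Normalizing the hop count by the number of slots follows the normalization used for Figure 8.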
In the case of multi-step-ahead prediction, error accumulation causes the SU predictor to perform slightly worse than the single-step-ahead predictor for ρ < 0.3. In this range the multi-step algorithm attempts scheduling unsuccessfully, which generates unnecessary channel hops. These changes reduce the network throughput. The single-step-ahead algorithm gives a more robust prediction and fewer channel changes in these cases. For ρ ≥ 0.3, multi-step-ahead prediction yields a performance gain. Although error accumulation remains, scheduling allows the algorithm to choose more consistently the channels with the largest number of free slots.
The theoretical curves for the Rayleigh channel are also shown in Figure 10. As in the previous simulations, a parameter value of 0.2 was used. As in Figure 9, the simulated curves are very close to the theoretical curves, and channel scheduling again produced gains in throughput.
There is an increase of approximately 3% for ρ = 0.5, both for single-step prediction and for prediction with three future states. The result with scheduling for three future states is very close to the result without scheduling for single-step prediction. The gains in Figure 10 are significantly lower than those in Figure 9 because the Rayleigh channel is subject to greater signal degradation than the AWGN channel, which adversely affects the performance of the predictor. Results for the Rayleigh channel with single-slot and multi-step prediction have not been explored in the literature, so the comparison is made only with the AWGN case.
The results presented in this Section show significant improvements over some results reported in the literature. First, an improvement of up to 4% in the normalized throughput of the cognitive radio network (the secondary-user network) was obtained with multi-step-ahead prediction compared with [13]. This result was possible because the enhanced structure also includes a channel scheduling scheme for multiple steps that had not been explored in the literature; this approach allows the SU to choose the channel with the best availability for the longest time. We also used the Support Vector Machine (SVM) machine learning technique, which had a lower probability of prediction error than other techniques found in the literature [18], [22], [23]. This ML algorithm achieved the best results not only for one point in the future but also for multiple points in the future. Finally, we presented a new way of finding the detection probability of the energy detector for a channel with Rayleigh fading that is simpler and faster to evaluate than the equation found in [34].
V. CONCLUSIONS
This work focused on the formulation of multi-step-ahead spectrum-prediction schemes for fading channels. Innovative analytical expressions for the probability of a cognitive radio network with multiple channels and multi-step-ahead prediction were theoretically derived. The energy-detector sensing technique was analyzed for the Rayleigh fading scenario, and a new detection-probability equation for low-SNR scenarios was derived. Using this equation, the exact number of samples required to meet the system design requirements was calculated. This approach also resulted in improvements in the throughput of the CR network. Using multi-step-ahead prediction, an algorithm was explored that permits the SU to prioritize channels that are most likely to remain idle longer, so that there is a saving in hand-off periods for users. The proposed approach also reduces the number of times prediction is repeated in the SU frame, generating throughput gains for the entire network.
Low prediction error rates were achieved using the SVM technique. Prediction error was found to depend on PU traffic, which was included as a variable parameter to improve the simulation.
The main suggestions for future research include developing an energy-efficient multi-step prediction algorithm for different fading scenarios and proposing a system model in which SUs employing multi-step prediction can cooperate with each other to provide enhanced network throughput. The continued study of efficient techniques for spectrum use opens the way for new practical models of cognitive radio to be implemented in new generations of wireless communication systems.