Adaptive Hybrid Precoding Strategy for Cell-Free Massive MIMO

Abstract: This work presents an adaptive hybrid signal precoding strategy for Cell-Free (CF) Massive multiple-input multiple-output (MIMO) systems. The proposed solution exploits the advantages of both distributed and centralized signal processing schemes to perform precoding. In our proposal, some access points (APs) are dynamically configured to be used in a centralized precoding scheme, whereas others are adjusted to employ a distributed method. Simulation results show that a good compromise between spectral efficiency (SE) and computational complexity (CC) can be achieved in CF Massive MIMO systems by finding a sensible balance between the APs used for centralized and distributed precoding. To the best of our knowledge, prior-art solutions have not addressed adaptive hybrid precoding itself; they have only proposed a specific solution for each architecture, i.e., distributed or centralized.


I. INTRODUCTION
Distributed Massive multiple-input multiple-output (MIMO) architectures, such as Cell-Free (CF) Massive MIMO, are considered key enablers for the worldwide dissemination of fifth generation (5G) and sixth generation (6G) wireless communication networks [1]-[7]. In a typical CF Massive MIMO system, multiple access points (APs) employ the same time-frequency resources in order to serve a smaller number of users. Each user owns a device named user equipment (UE), and fronthaul (FH) links connect the distributed APs to a central processing unit (CPU), as illustrated in Fig. 1. The main benefit of CF Massive MIMO is providing more uniform connectivity to the users as a result of the macro diversity gains obtained from the distributed antennas [2], [3].
One of the biggest challenges for CF Massive MIMO systems is the development of efficient strategies for signal processing, which can be performed locally (distributed) or in a centralized way. In the first case, each AP performs signal processing algorithms individually, while in the second one, the signal processing of all APs is executed in the CPU [1]. Both distributed and centralized signal processing methods have advantages and disadvantages. For example, although distributed signal processing can allow efficient use of the FH transport network, it suffers from the limitation imposed by both the processing and energy capacity of the APs. The CPU, on the other hand, has more power capacity and resources for signal processing. However, centralized signal processing schemes can face the limitation of the FH capacity, since a large amount of overhead information may be generated.
Fig. 1. A typical CF Massive MIMO system composed of multiple APs, a smaller number of UEs and FH links connecting the APs to a CPU. In this illustration, the blue APs are assumed to be used in a centralized precoding scheme, while the red ones are assumed to employ a distributed precoding method.
Precoding is an important signal processing strategy for CF Massive MIMO systems, and hence precoding design has gained much attention over the last years [8]-[13]. Precoding uses the channel state information (CSI) on the transmitter side for interference cancellation during downlink (DL), aiming at maximizing the signal-to-interference-plus-noise ratio (SINR) and the associated spectral efficiency (SE) at the receiver. In [8], each AP performs precoding locally (i.e., in a distributed fashion) using a low-complexity implementation based on the maximum ratio transmission (MRT) method. A key disadvantage of solutions based on MRT is the limited achievable SE. This problem can be mitigated by employing a local minimum mean-square error (L-MMSE) precoder, which is also distributed. However, L-MMSE is not scalable and increases the overall computational complexity (CC) [1]. Following another direction, the authors in [11] concluded that CF Massive MIMO systems can achieve a remarkable SE when both the signals and channel estimates are processed in the CPU (i.e., adopting a centralized approach), as well as when the robust MMSE precoder is used. Similarly, it is shown in [12] that MMSE outperforms MRT and that a centralized implementation with optimal MMSE processing maximizes the SE. In this case, the main drawback is the unavoidable generation of overhead in the FH.
Aiming at developing signal precoding strategies for CF Massive MIMO systems that 1) do not suffer a significant impact from the limited processing capacity of the APs, and 2) do not generate a large amount of overhead information, this work proposes a hybrid (semi-distributed) signal precoding approach in which part of the processing is moved to the CPU, while the rest remains in the APs. This strategy takes the instantaneous load in the FH network as the reference for the switching procedure, aiming at a good compromise between SE and CC. Note that prior-art methods offer no solution that adaptively exploits, e.g., based on the dynamic profile of the FH load, the advantages of both distributed and centralized signal processing schemes to perform precoding, and this is what the technique discussed in this work proposes to do. The remainder of this paper is organized as follows: Section II presents a summary on precoding. Section III introduces the proposed solution. Simulations with the proposed method are discussed in Section IV and, at last, Section V presents the conclusions.
Notation: Boldface lowercase and uppercase letters denote column vectors and matrices, respectively. The superscript $(\cdot)^{*}$ denotes complex conjugate, whereas $(\cdot)^{H}$ denotes conjugate transpose and $(\cdot)^{T}$ denotes transpose. The Euclidean norm, absolute value and expectation operator are expressed as $\|\cdot\|$, $|\cdot|$ and $\mathbb{E}\{\cdot\}$, respectively. The notation $\mathcal{N}_{\mathbb{C}}(\mu, \sigma^{2})$ stands for a complex Gaussian random variable in which $\mu$ is the mean and $\sigma^{2}$ is the variance. The ceiling function is given by $\lceil\cdot\rceil$.

II. PRECODING FOR CF MASSIVE MIMO AND SYSTEM MODEL
Let us assume a time-division duplex CF Massive MIMO system consisting of $L$ APs, equipped with $N$ antennas each, and $K$ single-antenna UEs. The total number of antennas in the network is $M = NL$, in which $M > K$. The channel vector $\mathbf{h}_{kl} \in \mathbb{C}^{N}$ between the $l$-th AP and $k$-th UE experiences independent spatially-correlated Rayleigh fading, being defined as $\mathbf{h}_{kl} \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}_{N}, \mathbf{R}_{kl})$, in which $\mathbf{R}_{kl} \in \mathbb{C}^{N \times N}$ is the statistical spatial correlation matrix. From this definition, it follows that $\mathbb{E}\{\|\mathbf{h}_{kl}\|^{2}\} = \mathrm{tr}(\mathbf{R}_{kl})$. Furthermore, the channel is assumed to be reciprocal and constant in each coherence time.
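Let $q_j \in \mathbb{C}$ be the independent symbol intended for the $j$-th UE, which satisfies $\mathbb{E}\{q_j q_k^{*}\} = 0$ for $j \neq k$ and $\mathbb{E}\{|q_j|^{2}\} = 1$. Thus, the DL data signal sent by the $l$-th AP to all users using distributed precoding can be written as

$$\mathbf{x}_l = \sum_{j=1}^{K} \mathbf{w}_{jl}\, q_j, \qquad (1)$$

in which $\mathbf{w}_{jl} \in \mathbb{C}^{N}$ represents the precoding vector that the $l$-th AP assigns to the $j$-th UE. After the signal propagates through the air, the $k$-th UE receives a linear combination of the signals transmitted by the APs, i.e.,

$$y_k = \sum_{l=1}^{L} \mathbf{h}_{kl}^{T} \mathbf{x}_l + n_k = \sum_{l=1}^{L} \mathbf{h}_{kl}^{T} \sum_{j=1}^{K} \mathbf{w}_{jl}\, q_j + n_k, \qquad (2)$$

in which $n_k \sim \mathcal{N}_{\mathbb{C}}(0, \sigma_{DL}^{2})$ stands for the additive Gaussian noise at the receiver. Hence, assuming that the receiver treats the average effective channel as the true channel, the SINR is given by

$$\mathrm{SINR}_k = \frac{\left|\sum_{l=1}^{L} \mathbb{E}\{\mathbf{h}_{kl}^{T}\mathbf{w}_{kl}\}\right|^{2}}{\sum_{j=1}^{K} \mathbb{E}\left\{\left|\sum_{l=1}^{L} \mathbf{h}_{kl}^{T}\mathbf{w}_{jl}\right|^{2}\right\} - \left|\sum_{l=1}^{L} \mathbb{E}\{\mathbf{h}_{kl}^{T}\mathbf{w}_{kl}\}\right|^{2} + \sigma_{DL}^{2}}. \qquad (3)$$

Using centralized precoding, the DL signal received by the $k$-th UE can be expressed as

$$y_k = \mathbf{h}_{k}^{T} \sum_{j=1}^{K} \mathbf{w}_{j}\, q_j + n_k, \qquad (4)$$

in which $\mathbf{h}_k = [\mathbf{h}_{k1}^{T} \ldots \mathbf{h}_{kL}^{T}]^{T} \in \mathbb{C}^{M}$ is the collective channel to UE $k$ from all the serving APs and $\mathbf{w}_j = [\mathbf{w}_{j1}^{T} \ldots \mathbf{w}_{jL}^{T}]^{T} \in \mathbb{C}^{M}$ represents the collective precoding vector assigned to the $j$-th UE. The collective channel is distributed as $\mathbf{h}_k \sim \mathcal{N}_{\mathbb{C}}(\mathbf{0}_{M}, \mathbf{R}_k)$, in which $\mathbf{R}_k = \mathrm{diag}(\mathbf{R}_{k1}, \ldots, \mathbf{R}_{kL})$ is block diagonal because the channels to different APs are independent. Under the same assumption on the receiver, the corresponding SINR is

$$\mathrm{SINR}_k = \frac{\left|\mathbb{E}\{\mathbf{h}_{k}^{T}\mathbf{w}_{k}\}\right|^{2}}{\sum_{j=1}^{K} \mathbb{E}\left\{\left|\mathbf{h}_{k}^{T}\mathbf{w}_{j}\right|^{2}\right\} - \left|\mathbb{E}\{\mathbf{h}_{k}^{T}\mathbf{w}_{k}\}\right|^{2} + \sigma_{DL}^{2}}. \qquad (5)$$

Note that the information available at the UE for signal detection is the same in both centralized and distributed scenarios. The key difference lies in the precoding selection [1]. In a distributed operation, the detected signal adopts the distributed notation containing summations over the APs, as shown in (2), whereas in a centralized approach it is composed of the sum of the UEs' signals, in which each term intended for UE $j$ consists of the unit-power DL data signal, as shown in (4).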
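The expectations in the SINR expression are typically estimated by averaging over channel realizations. The following Python sketch is a minimal illustration under stated assumptions: it takes the SINR as the squared mean of the desired effective gain, divided by the sum of the mean squared gains of all streams minus that same term plus the noise power. The function name and the layout of `gains` are hypothetical and are not taken from the paper.

```python
def hardening_bound_sinr(gains, k, noise_power):
    """Estimate the DL SINR of UE k from Monte Carlo samples.

    gains[r][j] is the effective gain observed by UE k for the stream
    intended for UE j in realization r (channel times precoder, already
    summed over the APs).
    ASSUMPTION: the receiver only knows the average effective channel.
    """
    n_real = len(gains)
    n_ues = len(gains[0])
    # Squared magnitude of the sample mean of the desired effective gain
    mean_desired = sum(g[k] for g in gains) / n_real
    signal = abs(mean_desired) ** 2
    # Sum over all streams of the sample mean of the squared gain magnitude
    total = sum(sum(abs(g[j]) ** 2 for g in gains) / n_real for j in range(n_ues))
    return signal / (total - signal + noise_power)


# Two realizations, two UEs, evaluated for UE 0
gains = [[2 + 0j, 1 + 0j], [2 + 0j, -1 + 0j]]
print(hardening_bound_sinr(gains, k=0, noise_power=1.0))  # 2.0
```

In this toy input the desired gain is constant, so the only impairments are the interfering stream (mean squared gain 1) and the noise (power 1), which gives 4 / 2 = 2.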

A. MRT Precoding
The MRT computes precoding vectors aiming at obtaining large inner products with respect to the vectors representing the desired transmission channels in vector space. It works well in scenarios in which the noise received by a certain UE is much stronger than the interference originating from signals transmitted to the other UEs. However, it does not perform well under severe interference conditions. Nevertheless, MRT remains attractive due to its low-complexity implementation, being proposed, e.g., in [8], for use in CF Massive MIMO systems in order to preserve the system scalability.
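The MRT precoding vector is given by [1]

$$\mathbf{w}_{kl} = \sqrt{\rho_{kl}}\, \frac{\hat{\mathbf{h}}_{kl}^{*}}{\sqrt{\mathbb{E}\{\|\hat{\mathbf{h}}_{kl}\|^{2}\}}}, \qquad (6)$$

in which $\rho_{kl} \geq 0$ stands for the transmit power that the $l$-th AP assigns to the $k$-th UE, such that $\mathbb{E}\{\|\mathbf{w}_{kl}\|^{2}\} = \rho_{kl}$, and $\hat{\mathbf{h}}_{kl}$ is the channel between the $l$-th AP and $k$-th UE estimated during the uplink (UL) training phase. The term $\sqrt{\rho_{kl}}$ in (6) regulates the power allocation in the precoding vector, whereas the ratio employing $\hat{\mathbf{h}}_{kl}^{*}$ determines the direction of the signal in the vector space.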
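As a minimal sketch of the MRT rule described above, the following Python code scales the conjugate of the channel estimate so that the average precoder power equals the allocated power. It assumes, for illustration only, perfect CSI and uncorrelated fading with $\mathbf{R}_{kl} = \beta \mathbf{I}_N$, so that the mean squared norm of the estimate is $N\beta$. All function names are hypothetical.

```python
import math
import random


def sample_channel(n_antennas, beta, rng):
    """Draw h ~ N_C(0, beta * I_N), the uncorrelated special case of R_kl."""
    s = math.sqrt(beta / 2.0)
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n_antennas)]


def mrt_precoder(h_hat, rho, mean_sq_norm):
    """Return sqrt(rho) * conj(h_hat) / sqrt(E{||h_hat||^2})."""
    scale = math.sqrt(rho / mean_sq_norm)
    return [scale * x.conjugate() for x in h_hat]


def avg_precoder_power(n_antennas, beta, rho, n_samples, seed=0):
    """Monte Carlo estimate of E{||w||^2}; it should approach rho."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # ASSUMPTION: perfect CSI, so the estimate equals the true channel
        h_hat = sample_channel(n_antennas, beta, rng)
        w = mrt_precoder(h_hat, rho, n_antennas * beta)
        total += sum(abs(x) ** 2 for x in w)
    return total / n_samples


print(avg_precoder_power(n_antennas=8, beta=1.0, rho=2.0, n_samples=2000))
```

With this normalization the power constraint holds on average rather than per realization, which is why the check is done on the sample mean.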

B. MMSE Precoding
One key drawback of MRT is its poor performance in terms of inter-user interference cancellation. Hence, under severe interference conditions, a better option is to use an MMSE precoder, which creates precoding vectors based on the desired channel directions in vector space but rotated to achieve orthogonality with respect to the channel vectors related to non-intended UEs. The MMSE precoding vector $\mathbf{w}_k$ in (7) follows [1], in which $\mathbf{I}_M$ is an identity matrix, $\sigma_{UL}^{2}$ is the noise power in the UL, $\rho_k \geq 0$ stands for the total transmit power assigned to UE $k$ from all the serving APs, such that $\mathbb{E}\{\|\mathbf{w}_k\|^{2}\} = \rho_k$, and $\hat{\mathbf{h}}_k$ is the collective channel to UE $k$ estimated during the UL training phase. Unfortunately, the canonical MMSE precoder suffers from a high CC, which comes from the matrix inversions required to compute its precoding vectors. Moreover, computing $\mathbf{w}_k$ also requires more CSI data than computing $\mathbf{w}_{kl}$. Note that $\mathbf{w}_k \in \mathbb{C}^{M}$, whereas $\mathbf{w}_{kl} \in \mathbb{C}^{N}$. This burden tends to grow as the number of UEs in the CF system increases.

III. THE PROPOSED HYBRID SOLUTION
The dynamic strategy for signal precoding presented in this work allows switching precoding schemes between the APs and the CPU, considering the instantaneous load in the FH network as the reference metric. Thus, the precoding is hybrid, i.e., distributed (if the precoding vectors are estimated in the APs using MRT) and centralized (if the precoding vectors are estimated in the CPU using MMSE). This allows the available bandwidth of the FH to be used more efficiently and provides a good compromise between SE and CC. The adaptation of the proposed solution depends on an FH traffic control mechanism. Thus, an entity responsible for traffic monitoring must be deployed in the CPU. Finally, parameters collected during traffic monitoring may be used to decide where the precoding will be performed. The working principle of the proposed strategy is illustrated by the flowchart in Fig. 2.
The method assumes that every AP in the system starts precoding the transmitted signals in a distributed fashion using local precoders such as MRT. Then, some APs are selected to switch the precoding estimation to the CPU, in which MMSE can be used in a centralized way. In order to decide how many APs have to switch the precoding calculation to the CPU, the method takes into account the load on the FH shared by different APs and selects the APs with the lowest average SE. The APs that switch the precoding to the CPU retain only basic functions such as frequency conversion and analog-to-digital and digital-to-analog conversion.
The proposed solution has two phases, described as follows.

A. Training Phase in the APs
First, the UEs send pilot tones to the APs for UL channel estimation (CE). The transmission channel is assumed to be reciprocal, so UL pilot tones may be used to estimate the DL channel. Next, all APs perform precoding for DL transmission in a distributed way using MRT. Finally, the average SE provided for all APs is calculated after precoding. This average is used later for decision making. The SE calculation is detailed in Section III-C. The steps composing the training phase are depicted in Fig. 2 using white background boxes.

B. Execution Phase in the CPU
The execution phase is represented by gray background boxes in Fig. 2 and is composed of two stages.
1) Traffic Monitoring in the FH: In this stage, the data rates generated by the UEs ($UE_{rat}$) are converted into an FH load using $FH_{loa} = \alpha \times UE_{rat}$, in which $\alpha$ is the $UE_{rat}$ to $FH_{loa}$ conversion factor and the $UE_{rat}$ values are calculated from the estimated SE. Note that a varying number of UEs may connect to and disconnect from the network throughout the day. Consequently, the FH load may vary over time and the CPU must monitor it periodically. However, since the load variations occur over intervals longer than a coherence block, the requirement of calculating the FH load at extremely short intervals is relaxed and, consequently, the impact of this procedure on the computational complexity is reduced.
2) Decision Making: In order to make a proper decision, the CPU compares the estimated $FH_{loa}$ with the maximum FH link capacity ($FH_{cap}$). If $FH_{cap} = FH_{loa}$, the precoding remains distributed, as in the training phase. If $FH_{loa} < FH_{cap}$, the available FH capacity ($FH_{avb}$) is calculated as $FH_{avb} = FH_{cap} - FH_{loa}$. From $FH_{avb}$, the CPU calculates, using (8), the number of APs ($Nr_{APs}$) that may shift the precoding method to the CPU, in which $CE_{ov}$ is the channel estimation overhead, given by $CE_{ov} = \tau_p \times N \times K$, with $\tau_p = f \times K$ being the pilot length and $f$ the pilot reuse factor. Note that instead of switching the precoders from the APs to the CPU randomly, the proposed strategy selects the APs that provide the lowest average SE, in ascending order, as calculated in the training phase.
Brazilian Microwave and Optoelectronics Society-SBMO received 18 Aug 2022; for review 18 Aug 2022; accepted 23 Feb 2023
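The two stages of the execution phase can be sketched in Python as follows. This is an illustration under explicit assumptions, not the authors' implementation: the FH load is taken as $\alpha$ times the sum of the UE rates, the number of switched APs is assumed to be the ceiling of $FH_{avb} / CE_{ov}$ capped at the number of APs, and an overloaded FH is treated like a fully loaded one. All function and parameter names are hypothetical.

```python
import math


def ce_overhead(f, n_antennas, n_ues):
    """Channel estimation overhead CE_ov = tau_p * N * K, with tau_p = f * K."""
    tau_p = f * n_ues
    return tau_p * n_antennas * n_ues


def select_centralized_aps(ue_rates, alpha, fh_capacity, ce_ov, ap_avg_se):
    """Return the indices of the APs whose precoding moves to the CPU."""
    # Stage 1: traffic monitoring (ASSUMPTION: load is alpha times the sum rate)
    fh_load = alpha * sum(ue_rates)
    fh_available = fh_capacity - fh_load
    if fh_available <= 0:
        # No spare FH capacity: every AP keeps its local precoder
        return []
    # Stage 2: decision making
    # ASSUMPTION: ceiling of the ratio, capped at the number of APs
    n_switch = min(len(ap_avg_se), math.ceil(fh_available / ce_ov))
    # APs ranked by average SE from the training phase, lowest first
    ranked = sorted(range(len(ap_avg_se)), key=lambda l: ap_avg_se[l])
    return ranked[:n_switch]


ce_ov = ce_overhead(f=1, n_antennas=2, n_ues=3)  # 18
print(select_centralized_aps([10, 20], 2, 100, ce_ov, [5.0, 1.0, 3.0, 2.0]))  # [1, 3, 2]
```

In the usage line the FH load is 60, so 40 units remain, which under the assumed rule allows three APs to switch; the three with the lowest average SE are chosen.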

C. SE Calculation
The SE provided for a given user $k$ is calculated as $SE_k = \frac{\tau_d}{\tau_c} \times \log_2(1 + \mathrm{SINR}_k)$, in which $\frac{\tau_d}{\tau_c}$ is the pre-log factor, i.e., the fraction of the $\tau_c$ samples per coherence block that is used for DL transmission, and $\tau_d$ is the number of samples used for DL. The SE associated with each AP is obtained from the SE calculated for the users by $SE_l = \frac{\tau_d}{\tau_c} \times \sum_{k=1}^{K} \log_2(1 + \mathrm{SINR}_{kl})$, in which $\mathrm{SINR}_{kl}$ stands for the SINR for a given AP $l$ and user $k$ computed in the training phase using CSI.
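The two SE expressions above translate directly into code. The short Python sketch below is only illustrative; the function names are hypothetical and the SINR values are assumed to be given in linear scale.

```python
import math


def se_user(sinr_k, tau_d, tau_c):
    """SE of one UE: pre-log factor tau_d / tau_c times log2(1 + SINR_k)."""
    return (tau_d / tau_c) * math.log2(1.0 + sinr_k)


def se_ap(sinr_kl, tau_d, tau_c):
    """SE associated with one AP: sum over the UEs of log2(1 + SINR_kl)."""
    return (tau_d / tau_c) * sum(math.log2(1.0 + s) for s in sinr_kl)


# Linear SINR of 3 with 190 of 200 samples used for DL
print(se_user(3.0, tau_d=190, tau_c=200))
# One AP serving two UEs with linear SINRs 1 and 3
print(se_ap([1.0, 3.0], tau_d=100, tau_c=200))
```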
In a fully distributed configuration, in which MRT precoding is used, the SE is denoted by $SE = SE^{MRT}_{L}$. Similarly, in a fully centralized configuration, in which MMSE precoding is employed, the SE is denoted by $SE = SE^{MMSE}_{L}$. Hence, in a hybrid precoding scenario, in which both MRT and MMSE algorithms are jointly adopted, the total SE is computed by $SE_{hy} = SE^{MMSE}_{Nr_{APs}} + SE^{MRT}_{L - Nr_{APs}}$.

D. Computational Complexity
The CC refers to the number of complex multiplications performed in each transmission scheme. In this work, we consider both the complexity due to signal reception, given by $C_{sr} = \tau_u \times M \times K$, in which $\tau_u$ is the number of samples used for UL data, and the complexity due to the computation of the precoding vectors, denoted by $C_{pr}$, such that the total CC is $CC = C_{sr} + C_{pr}$.
Since no matrix inverses are calculated in MRT, the complexity of computing its precoding vectors is neglected, i.e., $C_{pr}^{MRT} = 0$ and $CC^{MRT} = C_{sr}$. The complexity of computing the precoding vectors in MMSE is the same as that of computing the receive combining vectors. Finally, in the hybrid strategy, the computational complexity depends on $\gamma$, the percentage of APs whose precoding is shifted to the CPU.
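To make the complexity bookkeeping concrete, the following Python sketch implements $C_{sr} = \tau_u \times M \times K$ and $CC = C_{sr} + C_{pr}$ as defined above. The hybrid expression used here, $C_{sr} + \gamma\, C_{pr}^{MMSE}$, is an assumption made for illustration and is not taken from the paper; the MMSE precoding cost is passed in as a number rather than computed, and all function names are hypothetical.

```python
def cc_signal_reception(tau_u, m_antennas, n_ues):
    """Complexity of signal reception: C_sr = tau_u * M * K."""
    return tau_u * m_antennas * n_ues


def cc_total(c_sr, c_pr):
    """Total complexity CC = C_sr + C_pr (C_pr = 0 for MRT)."""
    return c_sr + c_pr


def cc_hybrid(c_sr, c_pr_mmse, gamma):
    """ASSUMPTION: only the fraction gamma of centralized APs pays the MMSE cost."""
    return c_sr + gamma * c_pr_mmse


def relative_reduction(c_sr, c_pr_mmse, gamma):
    """CC saving of the hybrid scheme relative to fully centralized MMSE."""
    return 1.0 - cc_hybrid(c_sr, c_pr_mmse, gamma) / cc_total(c_sr, c_pr_mmse)


# Arbitrary illustrative costs, not values from the paper
c_sr, c_pr = 1000.0, 500.0
r80 = relative_reduction(c_sr, c_pr, 0.8)
r60 = relative_reduction(c_sr, c_pr, 0.6)
print(r80, r60)
```

Under this assumed model the saving is proportional to $1 - \gamma$, so moving from 80% to 60% centralized APs doubles it. The reductions reported in Section IV show the same doubling (6.45% to 12.90% and 12.22% to 24.44%).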

IV. SIMULATIONS
This section presents an evaluation of the proposed strategy through computer-based simulations. The objective is to demonstrate that the method leads to a good compromise between SE and CC. Two scenarios were considered. Scenario 1 emulates a CF Massive MIMO network containing a reduced number of users, such that $L = 25$, $N = 8$ and $K = 3$. Scenario 2 assumes a CF Massive MIMO network in which the number of antennas per AP is extremely large, such that $L = 25$, $N = 128$ and $K = 16$. In both cases $M > K$. The parameters adopted in the tests are summarized in Table I. The expectations in (3) and (5) were computed using Monte Carlo methods, and the results are discussed next.
Figs. 3 and 4 compare the estimated SE using the fully centralized, fully distributed and hybrid precoding schemes in scenarios 1 and 2, respectively. For the hybrid scheme, the share of centralized APs is set to 20%, 40%, 60% and 80%, and a ceiling function is used in order to follow (8). It can be observed from the cumulative distribution functions (CDFs) that the performance of distributed precoding based on MRT decreases as the number of users in the system increases. This occurs because interference cancellation is deficient in a distributed operation, in which each AP can only regulate the interference that it creates itself. Such a result can be inferred from (6) and from the ratio in (3). On the other hand, the performance of centralized precoding based on MMSE is better. In this case, the SE increases with $M$, since the interference suppression capability depends on the total number of antennas in the system. This can be inferred from (7) and from the ratio in (5).
Considering the hybrid configuration, the SE takes values in the range between those of the distributed and centralized methods. Naturally, the SE in the hybrid operation improves as the number of APs whose precoding is switched to the CPU increases, which can be inferred from the formulation discussed in Section III-C. Note that, in both scenarios, a setup in which 80% of the APs are precoded in the CPU achieves the same SE as a fully centralized configuration, i.e., when 100% of the APs are centralized. This advantage is attested by the superposition of the blue and cyan curves in Figs. 3 and 4. For example, in scenario 1, 90% of the users obtain an SE lower than or equal to 7.5 bit/s/Hz adopting a fully centralized configuration. This is exactly the same performance achieved using an arrangement in which 80% of the APs are precoded in the CPU and the other 20% are precoded locally. The numbers obtained with a setup in which 60% of the APs are precoded in the CPU and the other 40% are precoded locally are also very close to those obtained using a fully centralized operation.
The next results show the benefits in terms of CC, aiming at demonstrating that the proposed method leads to a good compromise between SE and CC. Fig. 5 shows the estimated CC for both simulated scenarios. It can be observed that, even while achieving the same SE as a fully centralized solution, there is a reduction in CC of about 6.45% for scenario 1 and 12.22% for scenario 2 using a setup in which 80% of the APs are precoded in the CPU and the other 20% are precoded locally. The reduction in CC with respect to the fully centralized operation is even larger for a configuration in which 60% of the APs are precoded in the CPU and the other 40% are precoded locally, even though both achieve similar SE. For these two specific scenarios, the reduction is about 12.90% and 24.44%, respectively.

V. CONCLUSIONS
Over the last years, much effort has been spent on the development of precoding strategies for CF Massive MIMO systems. Although there is no unique solution for this purpose, in general, the methods are designed to meet requirements related to SE, CC, hardware complexity, energy efficiency and scalability. This work proposed an adaptive hybrid strategy for signal precoding in CF Massive MIMO systems in which part of the processing is moved to the CPU, while the rest remains in the APs. Results of computer-based simulations indicated that it is possible to obtain a good compromise between SE and CC by finding a sensible balance between APs using centralized and distributed precoding. This is a key difference between the proposed method and prior-art techniques, which typically focus on a specific solution for each architecture. In summary, based on the dynamic profile of the FH load, the method adaptively exploits the advantages of both distributed and centralized signal processing schemes to perform precoding. For example, in low traffic periods, the FH load is low and the dynamic precoding switching allows exploiting the reduction in the FH network traffic to carry the control signals needed for performing robust precoding, such as MMSE. On the other hand, when the FH load is stressed or when low-resolution FH links are adopted, the precoding vectors can be calculated locally using MRT, a simple precoding solution that leads to a reduction in the overall CC and associated overhead.

Fig. 2. Flowchart of the proposed hybrid precoding strategy. The training phase is depicted with white background boxes while the execution phase is represented by gray background boxes. Once the training phase is finished, the CPU must monitor the FH load periodically.

Fig. 3. Estimated SE for the fully distributed, fully centralized and hybrid precoding schemes in scenario 1, in which L = 25, N = 8 and K = 3. It is possible to achieve the same SE using a configuration in which 80% of the APs are precoded in the CPU as well as in a fully centralized configuration.

Fig. 4. Estimated SE for the fully distributed, fully centralized and hybrid precoding schemes in scenario 2, in which L = 25, N = 128 and K = 16. It is possible to achieve the same SE using a configuration in which 80% of the APs are precoded in the CPU as well as in a fully centralized configuration.

Fig. 5. Estimated CC for the fully distributed, fully centralized and hybrid precoding schemes in scenarios 1 and 2. In the legend, HP stands for hybrid precoding. Even achieving the same SE, there is a reduction of about 6.45% and 12.22% between the fully centralized operation and the hybrid operation in which 80% of the APs are precoded in the CPU and the other 20% are precoded locally.

TABLE I. SIMULATION PARAMETERS.