Granger causality in the frequency domain: derivation and applications

Physicists are increasingly working in areas where the analysis of noisy signals is required. In these fields, such as Economics, Neuroscience, and Physics, the notion of causality must be interpreted as a statistical measure. We introduce the lay reader to the Granger causality between two time series and illustrate ways of calculating it: a signal $X$ ``Granger-causes'' a signal $Y$ if the observation of the past of $X$ increases the predictability of the future of $Y$ when compared to the same prediction made with the past of $Y$ alone. In other words, for Granger causality between two quantities it suffices that information extracted from the past of one of them improves the forecast of the future of the other, even in the absence of any physical mechanism of interaction. We present derivations of the Granger causality measure in the time and frequency domains and give numerical examples using a non-parametric estimation method in the frequency domain. Parametric methods are addressed in the Appendix. We discuss the limitations and applications of this method, as well as other alternatives to measure causality.


Introduction
The notion of causality has concerned thinkers at least since the ancient Greeks [1]. More recently, Clive Granger [2], in his 1969 paper entitled "Investigating Causal Relations by Econometric Models and Cross-spectral Methods", elaborated a mathematical framework to describe a form of causality, henceforth called Granger causality (GC) in order to distinguish it from other definitions of causality (other notions have been defined; one worth mentioning is Pearl's causality [4], which has been revised by Pearl and colleagues in a series of published works [5,6]). Given two stochastic variables, $X(t)$ and $Y(t)$, there is a causal relationship (in the sense of Granger) between them if past observations of $Y$ help to predict the current state of $X$, or vice-versa. If so, we say that $Y$ Granger-causes $X$. Granger was inspired by the definition of causality of Norbert Wiener [3], in which $Y$ causes $X$ if knowing the past of $Y$ increases the efficacy of the prediction of the current state $X(t)$, when compared to the prediction of $X(t)$ by the past values of $X$ alone.
In the multidisciplinary science era, more and more physicists are involved in research in other areas, such as Economics and Neuroscience. These areas usually produce large data sets, and data analysis tools such as GC come in handy to extract meaningful knowledge from them.
The definition of GC involves the prediction of future values of stochastic time series (see Fig. 1). The measurement of the GC between variables may be done in both the time and the frequency domains [26,32-34].
In the present work, we will focus on the frequency domain representation of the GC [26,32] and, for pedagogical purposes, will discuss illustrative examples from previous works by other authors [26,34]. Our main goal is to provide a basic notion of the GC measure to a reader not yet introduced to this subject.
This work is organized as follows: in Section 2, we present the concept of an autoregressive process, the model of linear regression on which GC is based (it is also possible to formulate GC for nonlinear systems; however, such a formulation results in a more complex analysis, which is beyond the scope of this work [35,36]). Sections 3 and 4 develop the mathematical concepts and definitions of the GC in the time and frequency domains, respectively. In Section 5.1, we introduce the non-parametric method to estimate GC through Fourier and wavelet transforms [34]. In Section 6, we introduce examples of the conditional GC (cGC) used to determine known links between the elements of a simple network. We then close the paper by discussing applications, implications and limitations of the method.

Autoregressive process
Autoregressive processes form the basis for the parametric estimation of the GC, so in this section we introduce the reader to the basic concepts of such processes [37]. A process $X(t)$ is autoregressive of order n (i.e., AR(n)) if its state at time t is a linear function of its n past states:
$$X(t) = \sum_{i=1}^{n} a_i\,X(t-i) + \epsilon(t), \tag{1}$$
where t is the integer time step, and the real coefficients $a_i$ indicate the weighted contribution of the state i steps in the past to the current state of $X$. The term $\epsilon(t)$ is a noise source with variance $\Sigma$ that models any external additive contribution to the determination of $X(t)$. If $\Sigma$ is large, the process depends only weakly on its past states, and $X(t)$ may be regarded as little more than noise. Fig. 2 shows examples of an AR(2) (a) and an AR(4) (b). Fitting the autoregressive coefficients $a_i$ and the noise variance $\Sigma$ to a recorded signal is usually done by solving a Yule-Walker set of equations [15,38]. For a brief review on this topic see Section A of the Appendix.
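To make the definition concrete, the short sketch below simulates an AR(2) process in Python (the language of the codes accompanying this article). The coefficients 0.55 and -0.8 are our choice here; they produce a spectral peak near 40 Hz when the sampling rate is 200 Hz.

```python
import numpy as np

def simulate_ar(coeffs, n_steps, sigma=1.0, seed=0):
    """Simulate X(t) = sum_i a_i X(t-i) + eps(t), with eps ~ N(0, sigma)."""
    rng = np.random.default_rng(seed)
    n = len(coeffs)
    a = np.asarray(coeffs)
    x = np.zeros(n_steps)
    eps = rng.normal(0.0, np.sqrt(sigma), size=n_steps)
    for t in range(n, n_steps):
        # x[t-1], x[t-2], ..., x[t-n], weighted by a_1, ..., a_n
        x[t] = a @ x[t - n:t][::-1] + eps[t]
    return x

x = simulate_ar([0.55, -0.8], n_steps=5000)  # one AR(2) realization
```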

Granger causality in the time domain
In this section we develop the mathematical concepts and definitions of GC in the time domain. Consider two stochastic signals, $X_1(t)$ and $X_2(t)$. We assume that these signals may be modeled by autoregressive stochastic processes of order n, independent of each other, such that their states at time t can be estimated from their n past values:
$$X_1(t) = \sum_{i=1}^{n} a_i\,X_1(t-i) + \epsilon_1(t), \tag{2}$$
$$X_2(t) = \sum_{i=1}^{n} c_i\,X_2(t-i) + \epsilon_2(t), \tag{3}$$
where the variances of $\epsilon_1$ and $\epsilon_2$ are, respectively, $\Sigma_{11}$ and $\Sigma_{22}$, and the coefficients $a_i$ and $c_i$ are adjusted in order to minimize $\Sigma_{11}$ and $\Sigma_{22}$. However, we may also assume that the signals $X_1(t)$ and $X_2(t)$ are each modeled by a combination of one another, yielding
$$X_1(t) = \sum_{i=1}^{n} a_i\,X_1(t-i) + \sum_{i=1}^{n} b_i\,X_2(t-i) + \epsilon^*_1(t), \tag{4}$$
$$X_2(t) = \sum_{i=1}^{n} c_i\,X_1(t-i) + \sum_{i=1}^{n} d_i\,X_2(t-i) + \epsilon^*_2(t), \tag{5}$$
where the covariance matrix is given by
$$\boldsymbol{\Sigma}^* = \begin{pmatrix} \Sigma^*_{11} & \Sigma^*_{12} \\ \Sigma^*_{21} & \Sigma^*_{22} \end{pmatrix}. \tag{6}$$
Here, $\Sigma^*_{11}$ and $\Sigma^*_{22}$ are the variances of $\epsilon^*_1$ and $\epsilon^*_2$, respectively, and $\Sigma^*_{12} = \Sigma^*_{21}$ is the covariance of $\epsilon^*_1$ and $\epsilon^*_2$. Again, the coefficients $a_i$, $b_i$, $c_i$ and $d_i$ are adjusted to minimize the variances $\Sigma^*_{11}$ and $\Sigma^*_{22}$. If $\Sigma^*_{11} < \Sigma_{11}$, then the addition of $X_2(t)$ to the model of $X_1(t)$ generated a better fit to $X_1(t)$, and thus enhanced its predictability. In this sense, we may say there is a causal relation from $X_2$ to $X_1$, or simply that $X_2$ Granger-causes $X_1$. The same applies to the other signal: if $\Sigma^*_{22} < \Sigma_{22}$, then $X_1$ Granger-causes $X_2$, because adding $X_1$ to the dynamics of $X_2$ enhanced its predictability.
We may summarize this concept in the definition of the total causality index, given by
$$F_{1,2} = \ln\!\left(\frac{\Sigma_{11}\,\Sigma_{22}}{\det\boldsymbol{\Sigma}^*}\right). \tag{7}$$
If $F_{1,2} > 0$, there is some Granger-causal relation between $X_1$ and $X_2$: either $\Sigma^*_{11} < \Sigma_{11}$, or $\Sigma^*_{22} < \Sigma_{22}$, or there is an instantaneous correlation between $X_1$ and $X_2$ due to $\Sigma^*_{12} > 0$. If neither Granger causality nor correlations are present, then $F_{1,2} = 0$.
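As a sketch of how these variances are obtained in practice, the snippet below fits the restricted and unrestricted AR models by ordinary least squares (our choice of fitting step; the text uses the Yule-Walker equations, which minimize the same residual variances) and returns the directional indices defined next.

```python
import numpy as np

def residual_var(y, X):
    """Variance of the residuals of a least-squares fit y ~ X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ coef)

def lagged(x, n):
    """Design matrix with columns x(t-1), ..., x(t-n)."""
    T = len(x)
    return np.column_stack([x[n - i:T - i] for i in range(1, n + 1)])

def granger_time_domain(x1, x2, n):
    """F_{2->1} and F_{1->2} from order-n AR fits."""
    y1, y2 = x1[n:], x2[n:]
    X1, X2 = lagged(x1, n), lagged(x2, n)
    X12 = np.hstack([X1, X2])
    F21 = np.log(residual_var(y1, X1) / residual_var(y1, X12))
    F12 = np.log(residual_var(y2, X2) / residual_var(y2, X12))
    return F21, F12
```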
To know specifically whether there is Granger causality from 1 to 2 or from 2 to 1, we may use the specific indices
$$F_{2\to1} = \ln\!\left(\frac{\Sigma_{11}}{\Sigma^*_{11}}\right), \tag{8}$$
$$F_{1\to2} = \ln\!\left(\frac{\Sigma_{22}}{\Sigma^*_{22}}\right), \tag{9}$$
$$F_{1\leftrightarrow2} = \ln\!\left(\frac{\Sigma^*_{11}\,\Sigma^*_{22}}{\det\boldsymbol{\Sigma}^*}\right), \tag{10}$$
such that
$$F_{1,2} = F_{1\to2} + F_{2\to1} + F_{1\leftrightarrow2}, \tag{11}$$
where $F_{1\to2}$ defines the causality from $X_1(t)$ to $X_2(t)$, $F_{2\to1}$ is the causality from $X_2(t)$ to $X_1(t)$, and $F_{1\leftrightarrow2}$ is called the instantaneous causality, due to correlations between the noise terms $\epsilon^*_1$ and $\epsilon^*_2$.

Granger causality in the frequency domain

To derive the GC in the frequency domain, it is convenient to introduce the lag operator $L^k$, which delays $X(t)$ by k time steps, yielding
$$L^k X(t) = X(t-k). \tag{12}$$
We may then rewrite equations (4) and (5) as
$$X_1(t) = \sum_{i=1}^{n} a_i L^i X_1(t) + \sum_{i=1}^{n} b_i L^i X_2(t) + \epsilon^*_1(t), \tag{13}$$
$$X_2(t) = \sum_{i=1}^{n} c_i L^i X_1(t) + \sum_{i=1}^{n} d_i L^i X_2(t) + \epsilon^*_2(t), \tag{14}$$
and rearrange their terms to collect $X_1(t)$ and $X_2(t)$:
$$a(L)\,X_1(t) + b(L)\,X_2(t) = \epsilon^*_1(t), \tag{15}$$
$$c(L)\,X_1(t) + d(L)\,X_2(t) = \epsilon^*_2(t). \tag{16}$$
Here we define the coefficients $a(L) = 1 - \sum_{i=1}^{n} a_i L^i$, $b(L) = -\sum_{i=1}^{n} b_i L^i$, $c(L) = -\sum_{i=1}^{n} c_i L^i$ and $d(L) = 1 - \sum_{i=1}^{n} d_i L^i$, and cast equations (15) and (16) into matrix form:
$$\begin{pmatrix} a(L) & b(L) \\ c(L) & d(L) \end{pmatrix}\begin{pmatrix} X_1(t) \\ X_2(t) \end{pmatrix} = \begin{pmatrix} \epsilon^*_1(t) \\ \epsilon^*_2(t) \end{pmatrix}. \tag{17}$$
We apply the Fourier transform to equation (17) in order to switch to the frequency domain,
$$\tilde{\mathbf{A}}(\omega)\,\mathbf{X}(\omega) = \mathbf{E}(\omega), \tag{18}$$
where $\omega$ is the frequency and $\tilde{\mathbf{A}}(\omega)$ is the coefficient matrix, whose elements are given by
$$\tilde{A}_{11}(\omega) = 1 - \sum_{i=1}^{n} a_i\,e^{-j\omega i}, \qquad \tilde{A}_{12}(\omega) = -\sum_{i=1}^{n} b_i\,e^{-j\omega i},$$
$$\tilde{A}_{21}(\omega) = -\sum_{i=1}^{n} c_i\,e^{-j\omega i}, \qquad \tilde{A}_{22}(\omega) = 1 - \sum_{i=1}^{n} d_i\,e^{-j\omega i}.$$
The expressions above are obtained by representing the lag operator in the spectral domain as $L^i = \exp(-j\omega i)$.
This derives from the z-transform, where the representation of the variable z on the unit circle ($|z| = 1$) is $z^{-i} = \exp(-j\omega i)$ [40,41].
To obtain the power spectra of $X_1(\omega)$ and $X_2(\omega)$, we first isolate $\mathbf{X}(\omega)$ in equation (18):
$$\mathbf{X}(\omega) = \tilde{\mathbf{A}}^{-1}(\omega)\,\mathbf{E}(\omega) = \mathbf{H}(\omega)\,\mathbf{E}(\omega), \tag{19}$$
where $\mathbf{H}(\omega) = \tilde{\mathbf{A}}^{-1}(\omega)$ is called the transfer matrix, resulting in the following spectra:
$$\mathbf{S}(\omega) = \left\langle \mathbf{X}(\omega)\,\mathbf{X}^\dagger(\omega)\right\rangle = \mathbf{H}(\omega)\,\boldsymbol{\Sigma}^*\,\mathbf{H}^\dagger(\omega), \tag{20}$$
where $\langle\cdot\rangle$ is the ensemble average, $\dagger$ denotes the transposed conjugate of the matrix, and $\mathbf{S}(\omega)$ is the spectral matrix, defined as
$$\mathbf{S}(\omega) = \begin{pmatrix} S_{11}(\omega) & S_{12}(\omega) \\ S_{21}(\omega) & S_{22}(\omega) \end{pmatrix}. \tag{21}$$
In equation (21), $S_{11}(\omega)$ and $S_{22}(\omega)$ are called the autospectra, and the elements $S_{12}(\omega)$ and $S_{21}(\omega)$ are called the cross-spectra.
We can expand the product in equation (20) to obtain $S_{11}(\omega)$ and $S_{22}(\omega)$ (see Section B of the Appendix for details). After the transformation described there, which removes the terms proportional to the residual covariance $\Sigma^*_{12}$, each autospectrum splits into an intrinsic and a causal term:
$$S_{11}(\omega) = \underbrace{\tilde{H}_{11}(\omega)\,\Sigma^*_{11}\,\tilde{H}^*_{11}(\omega)}_{\text{intrinsic}} + \underbrace{\tilde{H}_{12}(\omega)\,\Sigma^*_{22}\,\tilde{H}^*_{12}(\omega)}_{\text{causal}}, \tag{22}$$
$$S_{22}(\omega) = \underbrace{\tilde{H}_{21}(\omega)\,\Sigma^*_{11}\,\tilde{H}^*_{21}(\omega)}_{\text{causal}} + \underbrace{\tilde{H}_{22}(\omega)\,\Sigma^*_{22}\,\tilde{H}^*_{22}(\omega)}_{\text{intrinsic}}, \tag{23}$$
where $\tilde{\mathbf{H}}(\omega)$ is the transformed transfer matrix (when $\Sigma^*_{12} = 0$, as in the examples below, $\tilde{\mathbf{H}} = \mathbf{H}$). Once we have the $S_{11}(\omega)$ and $S_{22}(\omega)$ spectra as the sum of an intrinsic and a causal term, we may define indices to quantify GC in the frequency domain, just as we did in the time domain (Section 3). For instance, to compute the causal index we divide the spectrum by its intrinsic term, in order to eliminate its influence. Thus, the causality index $I_{2\to1}(\omega)$ is defined as
$$I_{2\to1}(\omega) = \ln\frac{S_{11}(\omega)}{\tilde{H}_{11}(\omega)\,\Sigma^*_{11}\,\tilde{H}^*_{11}(\omega)}, \tag{24}$$
and analogously, $I_{1\to2}(\omega)$:
$$I_{1\to2}(\omega) = \ln\frac{S_{22}(\omega)}{\tilde{H}_{22}(\omega)\,\Sigma^*_{22}\,\tilde{H}^*_{22}(\omega)}. \tag{25}$$
The instantaneous causality index $I_{1\leftrightarrow2}(\omega)$ is defined as
$$I_{1\leftrightarrow2}(\omega) = \ln\frac{\left[\tilde{H}_{11}(\omega)\,\Sigma^*_{11}\,\tilde{H}^*_{11}(\omega)\right]\left[\tilde{H}_{22}(\omega)\,\Sigma^*_{22}\,\tilde{H}^*_{22}(\omega)\right]}{\det\mathbf{S}(\omega)}. \tag{26}$$
In equations (24) to (26), we have one index for each value $\omega$ of the frequency. In contrast, in the time domain there was a single index for the GC between the two signals $X_1$ and $X_2$. Just as discussed in Section 3, the indices $I_{2\to1}(\omega)$, $I_{1\to2}(\omega)$ and $I_{1\leftrightarrow2}(\omega)$ are greater than zero if the corresponding relation between the time series exists, and zero otherwise.
Just like in the time domain, the total GC in the frequency domain is the sum of its individual components:
$$I_{1,2}(\omega) = I_{1\to2}(\omega) + I_{2\to1}(\omega) + I_{1\leftrightarrow2}(\omega). \tag{27}$$
The total GC is related to the so-called coherence $C_{12}(\omega)$ between the signals (see Section C of the Appendix):
$$I_{1,2}(\omega) = -\ln\left[1 - C_{12}(\omega)\right]. \tag{28}$$
Moreover, we recover the GC in the time domain through [15,34]:
$$F_{a} = \frac{1}{2\pi}\int_{-\pi}^{\pi} I_{a}(\omega)\,d\omega, \tag{29}$$
where the label $a$ stands for any of the indices above ($1\to2$, $2\to1$, $1\leftrightarrow2$ or the total).
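The relations above translate directly into code. The helper below assumes the entries of the spectral matrix have already been estimated (as described in Section 5) and that the frequency axis is one-sided, in radians per sample.

```python
import numpy as np

def coherence(S11, S22, S12):
    """C_12(omega) = |S_12|^2 / (S_11 S_22)."""
    return np.abs(S12) ** 2 / (S11.real * S22.real)

def total_gc(S11, S22, S12):
    """Total GC I_{1,2}(omega) from the coherence, eq. (28)."""
    return -np.log(1.0 - coherence(S11, S22, S12))

def time_domain_total_gc(I, w):
    """Eq. (29) for a one-sided axis w in [0, pi]: I(omega) is symmetric
    for real signals, so the two-sided integral is twice the one-sided one."""
    return np.trapz(I, w) / np.pi
```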

Estimating Granger causality from data
In the last two sections we have mathematically defined the GC in both the time and frequency domains. Here, we discuss how to calculate GC in practice. In Section 5.1, we address a non-parametric estimation method that involves computing the Fourier and wavelet transforms of $X_1(t)$ and $X_2(t)$ [42-44]. In Section D of the Appendix, we address the parametric estimation of GC, which involves fitting the signals $X_1(t)$ and $X_2(t)$ to autoregressive models (Section 2).

Calculating GC through Fourier and Wavelet Transforms
Here, we will give a numerical example of calculating and interpreting GC using a nonparametric estimation approach based on Fourier and wavelet transforms [34]. Our example consists of calculating the spectral matrix $\mathbf{S}(\omega)$ through the Fourier transform of the signals. For two stationary signals $X_1(t)$ and $X_2(t)$, the $(i,j)$ element of the spectral matrix in equation (21) is
$$S_{ij}(\omega) = \frac{1}{T}\left\langle \tilde{X}_i(\omega)\,\tilde{X}^*_j(\omega)\right\rangle, \tag{30}$$
where $T$ is the total duration of the signal, $\tilde{X}_i(\omega)$ is the discrete Fourier transform of $X_i(t)$ (calculated by a fast Fourier transform, FFT, algorithm) and $\tilde{X}^*_j(\omega)$ is the complex conjugate of the transform of $X_j(t)$.
The variable $\omega$ contains the values of the frequency in the interval $[0, f_{max}]$ at which the FFT was calculated. If $\Delta t$ is the sampling interval of the original signals, then the sampling frequency is $f_s = 1/\Delta t$ and $f_{max} = f_s/2$, and the frequency interval contains $n_\omega = 1 + T/(2\Delta t)$ points. Then, for m signals (m = 2 in our example), we have a total of $n_\omega$ spectral matrices $\mathbf{S}$ of dimensions $m \times m$. Recall that the diagonal elements of $\mathbf{S}$ are called the autospectra, whereas the off-diagonal elements are called the cross-spectra.
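A compact way of building these $n_\omega$ spectral matrices from an ensemble of trials is sketched below (the array layout and normalization are our choices):

```python
import numpy as np

def spectral_matrix(trials, fs):
    """S_ij(omega), eq. (30), averaged over trials.

    trials: array (n_trials, m, T) holding m signals of T samples each.
    Returns the frequency axis and an (n_omega, m, m) stack of matrices.
    """
    n_trials, m, T = trials.shape
    Xf = np.fft.rfft(trials, axis=-1)               # (n_trials, m, n_omega)
    S = np.einsum('nif,njf->fij', Xf, np.conj(Xf)) / (n_trials * T)
    freqs = np.fft.rfftfreq(T, d=1.0 / fs)          # [0, fs/2], n_omega points
    return freqs, S
```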
The transfer matrix $\mathbf{H}(\omega)$ and the covariance matrix $\boldsymbol{\Sigma}$ are given by the decomposition of $\mathbf{S}(\omega)$ into the product of equation (20). Wilson's algorithm [34,46,47] (see also Section E of the Appendix) may be used for this decomposition of the spectral matrices.
After determining these two matrices, we may calculate the GC indices through the direct application of equations (24) to (26).
For example, consider the autoregressive system studied in Ref. [34], which is given by:
$$X_1(t) = 0.55\,X_1(t-1) - 0.8\,X_1(t-2) + C\,X_2(t-1) + \epsilon_1(t),$$
$$X_2(t) = 0.55\,X_2(t-1) - 0.8\,X_2(t-2) + \epsilon_2(t). \tag{31}$$
Here, $X_1(t)$ and $X_2(t)$ are AR(2). The variable t is the time-step index, such that the actual time is $t\,\Delta t = t/f_s$. Besides, we know by construction that $X_2(t)$ influences $X_1(t)$ through the coupling constant C (while the opposite does not happen). The terms $\epsilon_1(t)$ and $\epsilon_2(t)$ are defined to have variances $\Sigma_{11} = \Sigma_{22} = 1$ and covariance $\Sigma_{12} = 0$ (they are independent random processes). To obtain a smooth power spectrum, we simulated 5000 trials of the system in equation (31) and computed the average spectra across trials. We set the parameters to C = 0.25, $f_s$ = 200 Hz and T = 25 s, resulting in 5000 data points per trial. When C = 0, $X_1(t)$ and $X_2(t)$ obey identical dynamics: both processes are independent and oscillate mainly at 40 Hz (Fig. 3a). For C > 0, the process $X_1$ receives input from $X_2$, generating a causal coupling that is captured by the GC indices in equations (24) and (25): a peak at 40 Hz in $I_{2\to1}$ indicates that process 2 (which oscillates at 40 Hz) drives process 1 at this very frequency (Fig. 3c). The flat spectrum of $I_{1\to2}$ indicates that, on the other hand, process 2 does not receive input from process 1. The absolute value of C changes the intensity of the GC peak. The instantaneous causality index $I_{1\leftrightarrow2}(\omega)$, equation (26), is zero because $\Sigma_{12} = 0$ for all $\omega$. The total GC in the system is obtained from the spectral coherence, equation (28). However, only the directional GC indices reveal the directionality of the inputs between 1 and 2.
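A sketch of this numerical experiment, writing equation (31) directly in Python (the result can be fed to the spectral_matrix helper above; increase n_trials towards 5000 for smooth spectra):

```python
import numpy as np

def simulate_eq31(C=0.25, fs=200, T_sec=25, n_trials=100, seed=0):
    """Trials of equation (31): X2 drives X1 through the coupling C."""
    rng = np.random.default_rng(seed)
    T = int(T_sec * fs)
    X = np.zeros((n_trials, 2, T))
    eps = rng.normal(size=(n_trials, 2, T))   # Sigma_11 = Sigma_22 = 1
    for t in range(2, T):
        X[:, 0, t] = (0.55 * X[:, 0, t - 1] - 0.8 * X[:, 0, t - 2]
                      + C * X[:, 1, t - 1] + eps[:, 0, t])
        X[:, 1, t] = (0.55 * X[:, 1, t - 1] - 0.8 * X[:, 1, t - 2]
                      + eps[:, 1, t])
    return X
```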
This simple example illustrates the meaning of causality in the GC framework: a Granger-causal link is present whenever a process runs under the temporal influence of the past of another signal. We could have assumed C to be a time-varying function C(t), or even different parameters for the autoregressive part of each process alone; or processes 1 and 2 could have been of different orders, implying complex individual power spectra. These scenarios are more typical of real-world applications [43,48]. Then, instead of observing a clear peak in the GC indices, we could observe a more complex pattern, with peaks that vary in time.
Instead of using the Fourier transform (which yields a single static spectrum for the whole signal), we may use the wavelet transform [43,49,50] to obtain time-varying spectra [51,52]. The auto- and cross-spectra from equation (30) may then be written as
$$S_{ij}(t,\omega) = \left\langle W_i(t,\omega)\,W^*_j(t,\omega)\right\rangle, \tag{32}$$
where $W_i(t,\omega)$ is the wavelet transform of $X_i(t)$. To compute the wavelet transform, we use a Morlet kernel [43] with scale s = 6 oscillation cycles within a wavelet, a typical value for this parameter [53]. Similarly to what we did for the power spectrum, we compute the wavelet transforms of 5000 trials of the system in equation (31) and average the results. It is important to stress that the wavelet transform is applicable in this case because ensemble averages are being taken; otherwise, the estimates would be too unreliable for any meaningful inference.
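One way to obtain $W_i(t,\omega)$ is via SciPy's continuous wavelet transform with the Morlet kernel (scipy.signal.cwt and morlet2 are available in SciPy releases before 1.15; in newer releases a dedicated wavelet package is needed):

```python
import numpy as np
from scipy import signal

def morlet_spectra(x1, x2, fs, freqs, w=6.0):
    """Single-trial wavelet auto- and cross-spectra S_ij(t, omega).

    w is the number of cycles of the Morlet kernel (s = 6 in the text);
    average the returned arrays over trials before factorizing.
    """
    widths = w * fs / (2 * np.pi * freqs)      # scales matching freqs
    W1 = signal.cwt(x1, signal.morlet2, widths, w=w)
    W2 = signal.cwt(x2, signal.morlet2, widths, w=w)
    return W1 * np.conj(W1), W2 * np.conj(W2), W1 * np.conj(W2)
```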
In practice, there is one matrix $\mathbf{S}$ for each pair $(t, \omega)$; or, more intuitively, we have $n_T = T/\Delta t$ spectral matrices $\mathbf{S}(\omega)$, one for each time step t. The decomposition of $\mathbf{S}(\omega)$ in equation (20) is done through Wilson's algorithm. Then we may calculate the GC indices via equations (24) to (26) for each of the $\mathbf{S}(t,\omega)$ matrices with fixed t. This calculation results in $I_{2\to1}(\omega)$, $I_{1\to2}(\omega)$ and $I_{1\leftrightarrow2}(\omega)$ for each time step t. Finally, we concatenate these spectra across the temporal dimension, yielding $I_{2\to1}(t,\omega)$, $I_{1\to2}(t,\omega)$ and $I_{1\leftrightarrow2}(t,\omega)$.
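Schematically, the per-time-step pipeline reads as below; wilson_factorization is the hypothetical helper sketched in Section E of the Appendix, and we assume $\Sigma_{12} \approx 0$ (as in our example), so that the intrinsic term of equation (24) is simply $|H_{11}|^2\Sigma_{11}$.

```python
import numpy as np

def gc_time_frequency(S_tf):
    """I_{2->1}(t, omega) from time-resolved spectral matrices.

    S_tf: array (n_times, n_omega, 2, 2) of trial-averaged wavelet spectra.
    """
    I21 = np.empty(S_tf.shape[:2])
    for ti, S in enumerate(S_tf):
        H, Sigma = wilson_factorization(S)     # S(omega) = H Sigma H^dagger
        intrinsic = np.abs(H[:, 0, 0]) ** 2 * Sigma[0, 0]
        I21[ti] = np.log(S[:, 0, 0].real / intrinsic)
    return I21
```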
For example, consider the same set of processes as in equation (31), but with the constant coupling C replaced by the time-dependent coupling
$$C(t) = C\,H(t_0 - t), \tag{33}$$
where H(x) = 1 for x ≥ 0 (zero otherwise) is the Heaviside step function. The parameter $t_0$ is the time step at which the coupling from 2 to 1 is turned off. This scenario is equivalent to having a set of concatenated constant-C processes, such that the processes with $t > t_0$ have C = 0. Then we expect the analysis in Fig. 3 to be valid for all time steps $t < t_0$, and no coupling whatsoever should be detected for $t > t_0$.
That is exactly what is shown in Fig. 4: a sharp transition in $I_{2\to1}(t,\omega)$ happens exactly at $t = t_0$, when C is turned off. The index $I_{1\to2}(t,\omega)$ remains zero throughout the whole simulation. Again, this illustrates the meaning of GC in our system: whenever there is a directional coupling from one variable to another, there is nonzero GC in that link; in the example, from signal $X_2(t)$ to $X_1(t)$.

Conditional Granger Causality
The concepts developed so far may be applied to a case with m variables. In this case, in order to infer the directionality of the interaction between two signals in a system with m signals, we may use the so-called conditional Granger causality (cGC) [15,33,54,55]. The idea is to infer the GC between signals i and j given the knowledge of all the other m − 2 signals of the system. This is done by comparing the variances obtained considering only i and j with the variances obtained considering all the other signals in the system. The AR model from Eqs. (4) and (5) then has a total of m variables.
We may write the cGC in the time domain as in equation (41) below, and in the frequency domain as in equation (44); the quantities involved are constructed in what follows. But one may ask: "isn't it simpler to just calculate the standard GC between every pair of signals in the system, always reducing the problem to a two-variable case?" To answer that question, consider the case depicted in Fig. 5a: node 1 ($X_1(t)$) sends input to node 2 ($X_2(t)$) with a delay $\delta_{12}$ and to node 3 ($X_3(t)$) with a delay $\delta_{13}$. Measuring the pairwise GC between $X_2(t)$ and $X_3(t)$ suggests the existence of a coupling between them even though it does not physically exist (as in Fig. 5b). This occurs because the signals $X_2(t)$ and $X_3(t)$ are correlated due to their common input from $X_1(t)$, and the simple pairwise GC between $X_2(t)$ and $X_3(t)$ fails to represent the correct relationship between the three nodes of Fig. 5a. The cGC solves this issue by considering the contribution of a third signal ($X_1(t)$ in this example) to the analyzed pair ($X_2(t)$ and $X_3(t)$), as described below.
To describe the system in Fig. 5, equation (19) may be written for three variables as
$$\begin{pmatrix} X_1(\omega) \\ X_2(\omega) \\ X_3(\omega) \end{pmatrix} = \mathbf{H}(\omega)\begin{pmatrix} E_1(\omega) \\ E_2(\omega) \\ E_3(\omega) \end{pmatrix}, \tag{35}$$

Figure 5: A system that pairwise GC fails to describe. a. Node 1 ($X_1(t)$) sends input to node 2 ($X_2(t)$) with delay $\delta_{12}$ and to node 3 ($X_3(t)$) with delay $\delta_{13}$. b. A simple GC calculation wrongly infers a link from $X_2(t)$ to $X_3(t)$ if $\delta_{12} < \delta_{13}$, or from $X_3(t)$ to $X_2(t)$ if $\delta_{13} < \delta_{12}$. These links are not physically present in the system and appear only due to the cross-correlation between $X_2(t)$ and $X_3(t)$ caused by the common input from $X_1(t)$.
where $X_3(t)$ has the noise term $\epsilon_3(t)$ with variance $\Sigma_{33}$. The corresponding spectral matrix $\mathbf{S}(\omega)$ is
$$\mathbf{S}(\omega) = \begin{pmatrix} S_{11}(\omega) & S_{12}(\omega) & S_{13}(\omega) \\ S_{21}(\omega) & S_{22}(\omega) & S_{23}(\omega) \\ S_{31}(\omega) & S_{32}(\omega) & S_{33}(\omega) \end{pmatrix}, \tag{36}$$
and the noise covariance matrix is
$$\boldsymbol{\Sigma} = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\ \Sigma_{21} & \Sigma_{22} & \Sigma_{23} \\ \Sigma_{31} & \Sigma_{32} & \Sigma_{33} \end{pmatrix}. \tag{37}$$
We want to calculate the cGC from $X_2(t)$ to $X_3(t)$ given $X_1(t)$, i.e. $F_{2\to3|1}$ in the time domain and $I_{2\to3|1}(\omega)$ in the frequency domain. The first step is to build a partial system from equation (35), ignoring the coefficients related to the probe signal $X_2(t)$, resulting in the partial spectral matrix $\mathbf{S}_p(\omega)$:
$$\mathbf{S}_p(\omega) = \begin{pmatrix} S_{11}(\omega) & S_{13}(\omega) \\ S_{31}(\omega) & S_{33}(\omega) \end{pmatrix}. \tag{38}$$
We can then factorize both $\mathbf{S}_p(\omega)$ and $\mathbf{S}(\omega)$ using the nonparametric methods already discussed above. Suppose that for $\mathbf{S}(\omega)$ we obtain the transfer matrix $\mathbf{H}(\omega)$ and the covariance matrix $\boldsymbol{\Sigma}$ (equation (37)), whereas for $\mathbf{S}_p(\omega)$ we obtain the transfer matrix $\mathbf{G}(\omega)$ and the covariance matrix $\boldsymbol{\rho}$:
$$\mathbf{S}(\omega) = \mathbf{H}(\omega)\,\boldsymbol{\Sigma}\,\mathbf{H}^\dagger(\omega), \qquad \mathbf{S}_p(\omega) = \mathbf{G}(\omega)\,\boldsymbol{\rho}\,\mathbf{G}^\dagger(\omega). \tag{39}$$
The matrices $\mathbf{H}(\omega)$ and $\boldsymbol{\Sigma}$ are 3 × 3. The matrices $\mathbf{G}(\omega)$ and $\boldsymbol{\rho}$ always have one dimension less than the original ones, because they are built from the leftover rows and columns of the original system, without the coefficients of the probe signal.
In the time domain, $F_{2\to3|1}$ is defined as
$$F_{2\to3|1} = \ln\frac{\rho_{33}}{\Sigma_{33}}, \tag{40}$$
or, in general,
$$F_{i\to j|k} = \ln\frac{\rho_{jj}}{\Sigma_{jj}}, \tag{41}$$
which is used to calculate the cGC from i to j given k in the time domain (here $\rho_{jj}$ denotes the entry of $\boldsymbol{\rho}$ associated with signal j). Note that if the link between i and j is totally mediated by k, then $\rho_{jj} = \Sigma_{jj}$, yielding $F_{i\to j|k} = 0$, whereas the standard GC between i and j would still report a link between these variables. For our example in Fig. 5, we obtain $F_{2\to3|1} \approx 0$, meaning that, once conditioned on the signal $X_1(t)$, the influence from $X_2(t)$ to $X_3(t)$ is almost null.
In the frequency domain, we first must define the transfer matrix $\mathbf{Q}(\omega) = \mathbf{G}(\omega)^{-1}\mathbf{H}(\omega)$. However, the dimensions of the matrix $\mathbf{G}(\omega)$ do not match those of $\mathbf{H}(\omega)$. To fix that, we insert rows and columns from an identity matrix at the positions that were removed from the total system in equation (35) when we built the partial system (i.e., at the row and column corresponding to the probe signal $X_2(t)$ that was removed to generate $\mathbf{S}_p(\omega)$):
$$\mathbf{G}'(\omega) = \begin{pmatrix} G_{11}(\omega) & 0 & G_{13}(\omega) \\ 0 & 1 & 0 \\ G_{31}(\omega) & 0 & G_{33}(\omega) \end{pmatrix}. \tag{42}$$
We can now safely calculate $\mathbf{Q}(\omega) = \mathbf{G}'(\omega)^{-1}\mathbf{H}(\omega)$, from which we obtain $I_{2\to3|1}(\omega)$:
$$I_{2\to3|1}(\omega) = \ln\frac{\rho_{33}}{\left|Q_{33}(\omega)\right|^2\,\Sigma_{33}}, \tag{43}$$
or, in general,
$$I_{i\to j|k}(\omega) = \ln\frac{\rho_{jj}}{\left|Q_{jj}(\omega)\right|^2\,\Sigma_{jj}}. \tag{44}$$
To illustrate the procedure for determining the cGC, consider the system defined in Fig. 6, composed of 5 interacting elements [26]:
$$\begin{aligned}
X_1(t) &= 0.95\sqrt{2}\,X_1(t-1) - 0.9025\,X_1(t-2) + \epsilon_1(t),\\
X_2(t) &= 0.5\,X_1(t-2) + \epsilon_2(t),\\
X_3(t) &= -0.4\,X_1(t-3) + \epsilon_3(t),\\
X_4(t) &= -0.5\,X_1(t-2) + 0.25\sqrt{2}\,X_4(t-1) + 0.25\sqrt{2}\,X_5(t-1) + \epsilon_4(t),\\
X_5(t) &= -0.25\sqrt{2}\,X_4(t-1) + 0.25\sqrt{2}\,X_5(t-1) + \epsilon_5(t).
\end{aligned} \tag{45}$$
Here, $X_1(t)$ sends its signal to $X_2(t)$, $X_3(t)$ and $X_4(t)$ with coupling intensities 0.5, −0.4 and −0.5, respectively. Also, $X_4(t)$ sends input to $X_5(t)$ and vice-versa, with couplings $-0.25\sqrt{2}$ and $0.25\sqrt{2}$, respectively. Note that $X_1$ sends signals to $X_2$ and $X_4$ with 2 time steps of delay, and to $X_3$ with 3 time steps of delay. $X_4$ and $X_5$ exchange signals with only 1 time step of delay.
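A sketch of equations (42)-(44) for a single frequency, using zero-based indices ($X_1 \to 0$, $X_2 \to 1$, $X_3 \to 2$); H and Sigma come from factorizing the full spectral matrix, G and rho from the partial one:

```python
import numpy as np

def conditional_gc_spectral(H, Sigma, G, rho, probe=1, target=2):
    """I_{i->j|k}(omega) at a single frequency, eqs. (42)-(44)."""
    m = H.shape[0]
    keep = [i for i in range(m) if i != probe]
    Gp = np.eye(m, dtype=complex)              # identity-padded G, eq. (42)
    Gp[np.ix_(keep, keep)] = G
    Q = np.linalg.inv(Gp) @ H                  # Q = G'^{-1} H
    jt = keep.index(target)                    # target's row in the partial system
    denom = np.abs(Q[target, target]) ** 2 * Sigma[target, target].real
    return np.log(rho[jt, jt].real / denom)
```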
Calculating the cGC index through equations (41) and (44), we recover the expected structure of the network (Figs. 7a and 8, respectively). Baccalá and Sameshima [26] analysed this same system using the partial directed coherence and the directed transfer function. The gray shades in Fig. 7 and the amplitudes of the peaks in Fig. 8 are proportional to the coupling constants between each pair of elements. For comparison, Fig. 7b shows the simple pairwise GC, which detects connections that are not physically present in the system. Again, this occurs because the hierarchy of the network generates correlations between many pairs of signals that are not directly connected, as discussed in the example of Fig. 5.
It is important to note that the cGC connectivity does not always reflect the underlying physical (or structural) connectivity between elements [35]. The example system in Fig. 6 is an illustrative, simple case in which we obtained a neat result. However, real-world applications, such as inferring neuronal connectivity from brain signals, result in cGC matrices that are noisier, due to multiple incoming signals and multiple delays. Thus, cGC is most generally referred to as measuring "functional" connectivity, rather than structural connectivity.

Conclusion
Granger causality is becoming increasingly popular as a method to determine the dependence between signals in many areas of science. We presented its mathematical formulation and showed examples of its application in general systems of interacting signals. This article also gives a contemporary scientific application of the Fourier transform, a subject that is studied in theoretical physics courses but usually lacks practical applications in the classroom. We also used wavelet transforms, which may motivate students to learn more about the decomposition of signals in the time and frequency domains, and about its limitations through the uncertainty principle.
We showed numerical examples and explained them in an algorithmic way. We included the inference of steady and time-varying couplings, and the inference of connectivity in hierarchical networks via the conditional GC. A limitation of the GC is that it is ultimately based on linear regression models of stochastic processes (the AR models introduced in Section 2). Other measures, such as the transfer entropy, are more suitable to describe nonlinear interactions and do not need to be fitted to an underlying model. Even though the nonparametric estimation of GC does not rely on fitting, it is still a measure of linear interaction. It is also possible to show that both GC and transfer entropy yield the same results for Gaussian variables [56].

Figure 7: cGC and GC matrices in the time domain for a system of many interacting components. The system is given by equation (45) and is depicted in Fig. 6. a. The conditional GC, equation (41), captures exactly the physical interactions of the system. b. The simple pairwise GC, equation (29), captures the interactions, but also captures underlying correlations coming from the hierarchy of the network. Real systems often do not show a cGC matrix as clear as the one in panel a, due to second-order effects. The covariance matrices $\boldsymbol{\Sigma}$ and $\boldsymbol{\rho}$ were calculated using the nonparametric method.
In spite of the existing debate about what exactly the GC captures, especially in the neuroscience community [57,58], GC has become a well-established measure of the flow of information in the nervous system [59]. Here, we hope to have provided the necessary tools to those who wish to learn the basic principles and applications underlying GC.

Supplemental Material
All codes were developed in Python and are available at: https://github.com/ViniciusLima94/pyGC.

References
A. The Yule-Walker equations

The cross-correlation between two time series $X(t)$ and $Y(t)$ may be estimated as
$$R_{XY}(\tau) = \frac{1}{N}\sum_{t} X(t)\,Y(t+\tau), \tag{A1}$$
where N is the number of data points, and $\tau$ the lag applied between the time series. In particular, if X = Y, $R_{XY}(\tau)$ is called the autocorrelation function, and dictates the dependency of future values of a time series on its past values [60], being maximal for $\tau = 0$. Now let us consider the general AR process of order n:
$$X(t) = \sum_{i=1}^{n} a_i\,X(t-i) + \epsilon(t). \tag{A2}$$
Next, by multiplying equation (A2) by $X(t-\tau)$, $\tau \geq 0$, taking the expected value of each term, and isolating the noise term on the right-hand side, we end up with
$$\underbrace{R_{XX}(\tau)}_{\text{autocorrelation}} - \sum_{i=1}^{n} a_i\,\underbrace{R_{XX}(\tau-i)}_{\text{autocorrelation}} = \underbrace{R_{X\epsilon}(\tau)}_{\text{cross-correlation}}, \tag{A3}$$
where the terms indicated by the underbraces are precisely the autocorrelation of $X(t)$ ($R_{XX}$), and the cross-correlation between $X(t)$ and the noise $\epsilon(t)$ ($R_{X\epsilon}$). The next step is to determine the term $R_{X\epsilon}(\tau)$ in equation (A3). To do so, first note that we can write $X(t-\tau)$ from equation (A2) as
$$X(t-\tau) = \sum_{i=1}^{n} a_i\,X(t-i-\tau) + \epsilon(t-\tau), \tag{A4}$$
which leads to
$$R_{X\epsilon}(\tau) = \langle X(t-\tau)\,\epsilon(t)\rangle = \langle \epsilon(t-\tau)\,\epsilon(t)\rangle = R_{\epsilon\epsilon}(\tau). \tag{A5}$$
Note that we have used the fact that the signal and the noise are always uncorrelated, hence $\sum_{i=1}^{n} a_i\,\langle X(t-i-\tau)\,\epsilon(t)\rangle$ is equal to zero.
It is known that the autocorrelation of a white-noise process relates to its power spectrum via the Wiener-Khinchin theorem [62] (the theorem relates the autocorrelation $R(\tau)$ to the power spectrum $S(\omega)$ via the inverse Fourier transform, $R(\tau) = \frac{1}{2\pi}\int_{-\pi}^{\pi} S(\omega)\,e^{j\omega\tau}\,d\omega$), and that $R_{\epsilon\epsilon}(\tau)$ is zero except for $\tau = 0$, where $R_{\epsilon\epsilon}(0) = \Sigma$, the noise variance. With that, we can rewrite equation (A5) as
$$R_{X\epsilon}(\tau) = \Sigma\,\delta_{\tau,0}, \tag{A6}$$
and equation (A3) as
$$R_{XX}(\tau) - \sum_{i=1}^{n} a_i\,R_{XX}(\tau-i) = \Sigma\,\delta_{\tau,0}. \tag{A7}$$
For $\tau > 0$, equation (A7) can be written as
$$R_{XX}(\tau) = \sum_{i=1}^{n} a_i\,R_{XX}(\tau-i). \tag{A8}$$
Rewriting equation (A8) in matrix form, assigning $\tau$ from 1 to n, we get
$$\mathbf{r} = \mathbf{R}\,\mathbf{A}, \tag{A9}$$
where $\mathbf{r} = [R_{XX}(1), \ldots, R_{XX}(n)]^T$ and $\mathbf{A} = [a_1, \ldots, a_n]^T$. The set of n equations and n unknowns in equation (A9) is known as the Yule-Walker equations. Each element of the matrix $\mathbf{R}$ is given by $R_{XX}(\tau - i)$, where $\tau = 1, \ldots, n$ and $i = 1, \ldots, n$ are the row and column indices, respectively. Note that $\mathbf{R}$ is a symmetric matrix, due to the fact that the autocorrelation function is itself symmetric, i.e., $R_{XX}(\tau) = R_{XX}(-\tau)$.
We can solve equation (A9) to find the coefficient matrix $\mathbf{A}$ with
$$\mathbf{A} = \mathbf{R}^{-1}\,\mathbf{r}, \tag{A10}$$
where $\mathbf{R}^{-1}$ is the inverse of the matrix $\mathbf{R}$. Once we find the coefficients using equation (A10), the noise variance $\Sigma$ can be estimated from equation (A7) with $\tau = 0$. In practice, we rewrite equation (A2) in the matrix form
$$\mathbf{x} = \mathbf{X}\,\mathbf{A} + \boldsymbol{\epsilon}, \tag{A11}$$
where $\boldsymbol{\epsilon} = [\epsilon(n), \ldots, \epsilon(N-1)]^T$ is the noise column vector (the superscript T indicates matrix transposition). We can then compute $\mathbf{r}$ and $\mathbf{R}$ in equation (A9) with
$$\mathbf{r} = \frac{1}{N-n}\,\mathbf{X}^T\mathbf{x}, \tag{A12}$$
and
$$\mathbf{R} = \frac{1}{N-n}\,\mathbf{X}^T\mathbf{X}. \tag{A13}$$
Here, $\mathbf{x}$ is a vector whose components are the values of the process $X(t)$ at each time step t, starting at t = n (row 1) up to t = N − 1 (row N − n); and the matrix $\mathbf{X}$ has each of its rows given by the previous n states of $X(t)$, i.e. from $X(t-1)$ in column 1 all the way up to $X(t-n)$ in column n. Then, $\mathbf{A}$ is the column vector of the coefficients $a_i$ to be determined. Note that, although it is possible to find the coefficients and the noise variance via the solution of the Yule-Walker equations, the order n that we should use to fit the AR process is generally unknown. To estimate the order of the AR process we usually compute the Akaike information criterion (AIC) [9,63], given by
$$\mathrm{AIC}(n) = \log(\Sigma) + \frac{2n}{N}, \tag{A14}$$
for different orders n, and use the order that minimizes AIC(n) as the optimal one. Intuitively, the AIC can be thought of as balancing the goodness of the fit, measured through the term $\log(\Sigma)$, against the number of parameters used to fit the model, with the term in 2n acting as a penalty. The procedure to estimate the correlation matrices $\mathbf{R}$ and $\mathbf{r}$ of equation (A9) shown above was done for one variable, but it can be extended to any number of variables. For instance, two variables obey the set of equations
$$X_1(t) = \sum_{i=1}^{n} a_{i,1}\,X_1(t-i) + \sum_{i=1}^{n} c_{i,2}\,X_2(t-i) + \epsilon_1(t),$$
$$X_2(t) = \sum_{i=1}^{n} a_{i,2}\,X_2(t-i) + \sum_{i=1}^{n} c_{i,1}\,X_1(t-i) + \epsilon_2(t), \tag{A15}$$
where we may write $\mathbf{a}_1$ as a column vector with components $a_{i,1}$ (the memory of process 1 from i time steps in the past), and $\mathbf{c}_1$ as a column vector with components $c_{i,1}$ (the constants that couple signal 1 from i time steps in the past to signal 2 in the present). Analogously, we may define $\mathbf{a}_2$ and $\mathbf{c}_2$. Using a procedure similar to the one before, we can write these equations in the matrix form
$$\begin{pmatrix} X_1(n) & X_2(n) \\ X_1(n+1) & X_2(n+1) \\ \vdots & \vdots \\ X_1(N-1) & X_2(N-1) \end{pmatrix} = \bar{\mathbf{X}}\,\bar{\mathbf{A}} + \bar{\boldsymbol{\epsilon}}, \tag{A16}$$
where each row of $\bar{\mathbf{X}}$ contains the n previous states of both $X_1(t)$ and $X_2(t)$.
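In code, the least-squares route of equations (A11)-(A14) may be sketched as follows (the AIC normalization is our reading of equation (A14); any monotone rescaling picks the same order):

```python
import numpy as np

def fit_ar_ls(x, n):
    """AR(n) coefficients and noise variance via least squares (A10)-(A13)."""
    N = len(x)
    X = np.column_stack([x[n - i:N - i] for i in range(1, n + 1)])
    y = x[n:]
    A, *_ = np.linalg.lstsq(X, y, rcond=None)
    return A, np.var(y - X @ A)

def best_order(x, max_order=20):
    """Order minimizing AIC(n) = log(Sigma) + 2n/N, eq. (A14)."""
    N = len(x)
    aic = [np.log(fit_ar_ls(x, n)[1]) + 2 * n / N
           for n in range(1, max_order + 1)]
    return int(np.argmin(aic)) + 1
```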
Notice that the matrix $\bar{\mathbf{A}}$ now contains both the memory coefficients, $a_{i,1}$ and $a_{i,2}$, and the coupling coefficients, $c_{i,1}$ and $c_{i,2}$, for any given lag between i = 1 and i = n in the past. With equation (A16), we can use equations (A12) and (A13) to compute the matrices $\bar{\mathbf{R}}$ and $\bar{\mathbf{r}}$, yielding the Yule-Walker equations for the pair of processes:
$$\bar{\mathbf{r}} = \bar{\mathbf{R}}\,\bar{\mathbf{A}}. \tag{A17}$$
Finally, we plug these two matrices into equation (A10) to find the coefficient matrix $\bar{\mathbf{A}}$ for the process in equation (A15). Note that the matrices $\bar{\mathbf{r}}$ and $\bar{\mathbf{R}}$ are block matrices in which the elements are either vectors (in $\bar{\mathbf{r}}$) or square matrices (in $\bar{\mathbf{R}}$):
$$\bar{\mathbf{r}} = \begin{pmatrix} \mathbf{R}^{\tau}_{11} & \mathbf{R}^{\tau}_{12} \\ \mathbf{R}^{\tau}_{21} & \mathbf{R}^{\tau}_{22} \end{pmatrix}, \tag{A18}$$
$$\bar{\mathbf{R}} = \begin{pmatrix} \mathbf{R}^{\tau,i}_{11} & \mathbf{R}^{\tau,i}_{12} \\ \mathbf{R}^{\tau,i}_{21} & \mathbf{R}^{\tau,i}_{22} \end{pmatrix}, \tag{A19}$$
where the inner blocks of $\bar{\mathbf{r}}$ are the column vectors $\mathbf{R}^{\tau}_{jk} = [R_{jk}(\tau)]^T$ with elements ranging from $\tau = 1$ to n, and the inner blocks of $\bar{\mathbf{R}}$ are the square matrices $\mathbf{R}^{\tau,i}_{jk} = [R_{jk}(\tau - i)]$, with $\tau = 1$ to n as the row index and i = 1 to n as the column index. This system represents a set of equations like equation (A9) for each combination of j = 1, 2 and k = 1, 2.
Hence, the 3-variable case follows exactly the same procedure, with $\bar{\mathbf{x}}$ and $\bar{\mathbf{X}}$ now containing the states of all three signals. These settings generate a Yule-Walker equation with a 3 × 3 matrix of matrices, similar to equations (A18) and (A19), and so on for the case of $N_{vars}$ variables. It is easy to see that the dimensions of the matrices involved in the Yule-Walker equations grow quickly with the number of variables involved. Finally, for the multivariable case, the coefficient matrix $\bar{\mathbf{A}}$ can be obtained in the same way as for the single-variable case, using equation (A10).

B. Derivation of the autospectra
In order to obtain the expressions for the autospectra and cross-spectra given in equations (22)-(23), following the method introduced by Geweke [32], we start by rewriting the product of the matrices given in equation (20), which for the first autospectrum gives
$$S_{11}(\omega) = H_{11}\,\Sigma^*_{11}\,H^*_{11} + 2\,\Sigma^*_{12}\,\mathrm{Re}\!\left(H_{11}H^*_{12}\right) + H_{12}\,\Sigma^*_{22}\,H^*_{12}. \tag{B1}$$
When $\Sigma^*_{12} \neq 0$, the cross term $2\Sigma^*_{12}\mathrm{Re}(H_{11}H^*_{12})$ mixes the intrinsic and causal contributions on the spectra. To overcome this problem, we consider the transformation introduced by Geweke [32], which makes it possible to remove the cross term of equation (B1). The procedure consists in multiplying both sides of equation (18) by the following matrix $\mathbf{P}$:
$$\mathbf{P} = \begin{pmatrix} 1 & 0 \\ -\dfrac{\Sigma^*_{12}}{\Sigma^*_{11}} & 1 \end{pmatrix}, \tag{B2}$$
which gives the following equation:
$$\left[\mathbf{P}\,\tilde{\mathbf{A}}(\omega)\right]\mathbf{X}(\omega) = \mathbf{P}\,\mathbf{E}(\omega). \tag{B3}$$
The transformed noise vector $\mathbf{P}\,\mathbf{E}(\omega)$ has a diagonal covariance matrix, and the newly obtained transfer matrix $\tilde{\mathbf{H}}(\omega)$ resulting from this transformation is therefore the inverse of the new coefficient matrix $\mathbf{P}\tilde{\mathbf{A}}(\omega)$.
For example, the inverse of the matrix $\tilde{\mathbf{A}}(\omega)$ with dimensions 2 × 2 is
$$\tilde{\mathbf{A}}^{-1}(\omega) = \frac{1}{\det\tilde{\mathbf{A}}(\omega)}\begin{pmatrix} \tilde{A}_{22}(\omega) & -\tilde{A}_{12}(\omega) \\ -\tilde{A}_{21}(\omega) & \tilde{A}_{11}(\omega) \end{pmatrix}. \tag{B4}$$
Therefore, given the fact that $\det(\tilde{\mathbf{A}}(\omega)) = \det(\mathbf{P}\tilde{\mathbf{A}}(\omega))$ (since $\det\mathbf{P} = 1$), the elements of the transfer matrix $\tilde{\mathbf{H}}(\omega)$ are
$$\tilde{H}_{11} = H_{11} + \frac{\Sigma^*_{12}}{\Sigma^*_{11}}H_{12}, \qquad \tilde{H}_{12} = H_{12},$$
$$\tilde{H}_{21} = H_{21} + \frac{\Sigma^*_{12}}{\Sigma^*_{11}}H_{22}, \qquad \tilde{H}_{22} = H_{22}. \tag{B5}$$
With this transformation the cross term vanishes, and $S_{11}(\omega)$ splits into the intrinsic and causal terms of equation (22). Similarly, to obtain the autospectrum $S_{22}(\omega)$ we follow the same procedure considering the transpose of the transformation matrix $\mathbf{P}$.

C. Relationship between spectral coherence and Granger causality
Based on the auto-spectra and cross-spectra defined in equation (30), we can define a spectral correlation index called coherence, given by
$$C_{ij}(\omega) = \frac{\left|S_{ij}(\omega)\right|^2}{S_{ii}(\omega)\,S_{jj}(\omega)}, \tag{C1}$$
where $C_{ij}(\omega)$ is bounded as $0 \leq C_{ij}(\omega) \leq 1$ [64,65]. Furthermore, it is possible to establish a relationship between the coherence and the total causality given by equation (27). Here we show how to obtain the total causality as a function of $C_{ij}(\omega)$. The determinant of the matrix $\mathbf{S}(\omega)$ is given by
$$\det\mathbf{S}(\omega) = S_{11}(\omega)\,S_{22}(\omega) - S_{12}(\omega)\,S_{21}(\omega). \tag{C2}$$
Due to the fact that $S_{21}(\omega) = S^*_{12}(\omega)$, and that $|S_{12}(\omega)|^2 = S_{12}(\omega)\,S^*_{12}(\omega)$, equation (C2) can be rewritten as
$$\det\mathbf{S}(\omega) = S_{11}(\omega)\,S_{22}(\omega)\left[1 - C_{12}(\omega)\right]. \tag{C3}$$
Lastly, by inverting the argument of the logarithm in equation (27) and applying equation (C3), the total Granger causality in the frequency domain can be written as a function of the spectral coherence:
$$I_{1,2}(\omega) = \ln\frac{S_{11}(\omega)\,S_{22}(\omega)}{\det\mathbf{S}(\omega)} = -\ln\left[1 - C_{12}(\omega)\right]. \tag{C4}$$

D. Parametric estimation of the GC
To illustrate the procedure for estimating the GC parametrically, consider the same example used in Section 5.1, given by equation (31).
In order to compute the spectral GC between $X_1(t)$ and $X_2(t)$ parametrically, we first have to estimate the AR coefficients by solving the Yule-Walker equations (Section A of the Appendix). Assuming that we know the model order to be n = 2, equation (A16) can be written as
$$\begin{pmatrix} X_1(2) & X_2(2) \\ X_1(3) & X_2(3) \\ \vdots & \vdots \\ X_1(N-1) & X_2(N-1) \end{pmatrix} = \begin{pmatrix} X_1(1) & X_1(0) & X_2(1) & X_2(0) \\ X_1(2) & X_1(1) & X_2(2) & X_2(1) \\ \vdots & \vdots & \vdots & \vdots \\ X_1(N-2) & X_1(N-3) & X_2(N-2) & X_2(N-3) \end{pmatrix}\bar{\mathbf{A}} + \bar{\boldsymbol{\epsilon}}. \tag{D1}$$
Next, we can compute the coefficients of the AR series using equations (A10) and (A12)-(A13).
In Table D1 we show the coefficients and the covariance matrix obtained by solving the Yule-Walker equations for n = 2. The values displayed are averages over 5000 trials of the system given by equation (31). We use such a large number of trials for consistency with the method employed to obtain Fig. 3; however, the parametric method should converge with fewer trials than the non-parametric one.
With the coefficient matrix $\mathbf{A}$, the transfer matrix $\mathbf{H}(\omega)$ can be obtained through
$$\mathbf{H}(\omega) = \left[\mathbf{I} - \sum_{\tau=1}^{n} \mathbf{A}(\tau)\,e^{-j\omega\tau}\right]^{-1}, \tag{D2}$$
where $\mathbf{I}$ is the identity matrix with dimensions $N_{vars} \times N_{vars}$, $\mathbf{A}(\tau)$ is the coefficient matrix for lag $\tau$, j is the imaginary unit, and $\omega$ runs over the frequencies of the signal (computed as explained in Section 5.1). The spectral matrix $\mathbf{S}(\omega)$ can then be obtained from $\mathbf{H}(\omega)$ and $\boldsymbol{\Sigma}$ via equation (20). Subsequently, the GC components can be obtained using equations (24)-(26).

Table D1: Coefficient matrix $\mathbf{A}^T$ and noise covariance matrix $\boldsymbol{\Sigma}$, estimated by solving the Yule-Walker equations for the system given by equation (31). The parameters were averaged over 5000 trials.
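A direct transcription of equation (D2) (the frequency convention $\omega = 2\pi f/f_s$, in radians per sample, is our choice):

```python
import numpy as np

def transfer_matrix(A_lags, freqs, fs):
    """H(omega) from the AR coefficient matrices, eq. (D2).

    A_lags: array (n, m, m), the coefficient matrix for lags 1..n.
    """
    n, m, _ = A_lags.shape
    H = np.empty((len(freqs), m, m), dtype=complex)
    for k, f in enumerate(freqs):
        Abar = np.eye(m, dtype=complex)
        for tau in range(1, n + 1):
            Abar -= A_lags[tau - 1] * np.exp(-1j * 2 * np.pi * f * tau / fs)
        H[k] = np.linalg.inv(Abar)
    return H
```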
In Fig. D1 we show the GC estimated in the frequency domain from $X_1 \to X_2$ and from $X_2 \to X_1$.
As can be seen in Fig. D1, the GC indeed captures the directional influence imposed by the model, and the frequency-dependent GC matches the one computed with the non-parametric method in Fig. 3.

E. Wilson's algorithm
This algorithm is used to calculate the transfer matrix and the covariance matrix that generate the spectral matrix $\mathbf{S}(\omega)$. It relies on the fact that the spectral matrix can be decomposed in terms of basis functions $\psi(e^{i\omega})$, to be determined recursively. Once these functions are determined, the $\mathbf{H}$ and $\boldsymbol{\Sigma}$ matrices that generate the spectral matrices can be inferred.
First, we briefly describe the mathematical details that allow this process, and then we give a pseudo-code of the algorithm for those willing to implement the decomposition. Further mathematical details are given in the original paper by Wilson [46]. We call the square matrix with dimensions $m \times m$ the spectral function $\mathbf{S}(\omega)$, defined in the range $-\pi \leq \omega \leq \pi$.

Figure D1: GC estimated parametrically for the system in equation (31). The result is in accordance with the one obtained in Fig. 3(c), where the GC was obtained with the non-parametric estimation, and captures the directionality of the information flow imposed on the model.
The spectral function can be written, through the Wiener-Khinchin theorem [62], as a function of its correlation coefficients $\mathbf{R}_k$:
$$\mathbf{S}(\omega) = \sum_{k=-\infty}^{\infty} \mathbf{R}_k\,e^{-ik\omega}, \tag{E1}$$
and the correlation coefficients can be obtained via the inverse Fourier transform:
$$\mathbf{R}_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} \mathbf{S}(\omega)\,e^{ik\omega}\,d\omega. \tag{E2}$$
Wilson's factorization theorem [46] states that the spectral matrix $\mathbf{S}(\omega)$ can be written as the product
$$\mathbf{S}(\omega) = \psi(e^{i\omega})\,\psi(e^{i\omega})^\dagger, \tag{E3}$$
where $\psi$ is a generative function, represented as a square matrix with the same dimensions as $\mathbf{S}$, which can be written as a Fourier series
$$\psi(e^{i\omega}) = \sum_{k=0}^{\infty} \mathbf{A}_k\,e^{ik\omega}, \tag{E4}$$
where the $\mathbf{A}_k$ are moving-average coefficients, obtained via the inverse Fourier transform
$$\mathbf{A}_k = \frac{1}{2\pi}\int_{-\pi}^{\pi}\psi(e^{i\omega})\,e^{-ik\omega}\,d\omega. \tag{E5}$$
The function $\psi(e^{i\omega})$ can be extended to the unit circle by making $z^k = e^{ik\omega}$ ($|z| < 1$) [34,46]:
$$\psi(z) = \sum_{k=0}^{\infty} \mathbf{A}_k\,z^k, \tag{E6}$$
where $\psi(0) = \mathbf{A}_0$. The spectral matrix $\mathbf{S}$ and the transfer matrix $\mathbf{H}$ of equation (21) can also be represented in the unit circle, with $\mathbf{H}(0) = \mathbf{I}$ [34]. Knowing that, and noticing that $\mathbf{A}_0^{-1}\mathbf{A}_0\mathbf{A}_0^{T}\mathbf{A}_0^{-T} = \mathbf{I}$, we may rewrite equation (E3) as (from here on we omit the function arguments to simplify the notation)
$$\mathbf{S} = \psi\,\mathbf{A}_0^{-1}\,\mathbf{A}_0\,\mathbf{A}_0^{T}\,\mathbf{A}_0^{-T}\,\psi^\dagger = \left(\psi\,\mathbf{A}_0^{-1}\right)\left(\mathbf{A}_0\,\mathbf{A}_0^{T}\right)\left(\psi\,\mathbf{A}_0^{-1}\right)^\dagger. \tag{E7}$$
Comparing equation (21) to equation (E7) leads to
$$\mathbf{H}(\omega) = \psi(e^{i\omega})\,\mathbf{A}_0^{-1}, \qquad \boldsymbol{\Sigma} = \mathbf{A}_0\,\mathbf{A}_0^{T}. \tag{E8}$$
Before going to the proper Wilson algorithm, let us define the plus operator $[\,\cdot\,]_+$: given a function
$$g(\omega) = \sum_{k=-\infty}^{\infty} g_k\,e^{ik\omega}, \tag{E9}$$
the plus operator keeps its causal part,
$$[g]_+ = \frac{g_0}{2} + \sum_{k=1}^{\infty} g_k\,e^{ik\omega}. \tag{E10}$$
Below we present a pseudo-code for Wilson's algorithm, where FFT and IFFT are the fast Fourier transform and its inverse, respectively, CHOLE is the Cholesky factorization, TRIU [66] returns the upper triangle of a matrix, and NORM returns the Euclidean norm of a vector. The PlusOperator algorithm is also presented below, where X[:] means that we take all values along a specific axis of an array.
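A minimal NumPy rendering of the same procedure (initialization, convergence test and array layout are our choices; the implementation in the pyGC repository may differ in detail):

```python
import numpy as np

def plus_operator(g):
    """[.]_+ of eq. (E10): halve the zero lag (upper triangle, cf. TRIU)
    and discard the negative lags."""
    gam = np.fft.ifft(g, axis=0)               # lag-domain coefficients
    n = gam.shape[0]
    gam[0] = np.triu(0.5 * gam[0])
    gam[(n + 1) // 2:] = 0.0                   # negative lags
    return np.fft.fft(gam, axis=0)

def wilson_factorization(S, n_iter=100, tol=1e-9):
    """Factorize S(omega) = H Sigma H^dagger by Wilson's iteration.

    S: (n_omega, m, m), Hermitian, sampled over the full circle [0, 2 pi).
    """
    n_omega, m, _ = S.shape
    R0 = np.fft.ifft(S, axis=0)[0].real        # zero-lag covariance
    psi = np.tile(np.linalg.cholesky(R0).T, (n_omega, 1, 1)).astype(complex)
    I = np.eye(m)
    for _ in range(n_iter):
        ipsi = np.linalg.inv(psi)
        g = ipsi @ S @ np.conj(ipsi).transpose(0, 2, 1) + I
        psi_new = psi @ plus_operator(g)
        delta = np.max(np.abs(psi_new - psi))
        psi = psi_new
        if delta < tol:
            break
    A0 = np.fft.ifft(psi, axis=0)[0].real      # psi(0) = A_0
    return psi @ np.linalg.inv(A0), A0 @ A0.T  # H(omega) and Sigma, eq. (E8)
```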