mvDFA

Multi-Variate Detrended Fluctuation Analysis using Package mvDFA

Julien P. Irmer

2023-06-15

This R package provides an implementation of multivariate extensions of a well-known fractal analysis technique, Detrended Fluctuations Analysis (DFA; Peng et al., 1995, doi:10.1063/1.166141), for multivariate time series: multivariate DFA (mvDFA). Several coefficients are implemented that take into account the correlation structure of the multivariate time series to varying degrees. These coefficients may be used to analyze long memory and changes in the dynamic structure that would by univariate DFA. Therefore, this R package aims to extend and complement the original univariate DFA (Peng et al., 1995) for estimating the scaling properties of nonstationary time series.

Install the package

Install from CRAN:

install.packages("mvDFA")

Install the latest working version from GitHub

This requires the package devtools. The following code installs devtools, if it is not installed.

if(!"devtools" %in% installed.packages()) install.packages("devtools")
devtools::install_github("jpirmer/mvDFA")

Load the package:

library(mvDFA)
#> Package 'mvDFA' version 0.0.4
#> Type 'citation("mvDFA")' for citing this R package in publications.

Test functions for approximated correlated Time Series with different Hurst Exponent

Simulate correlated time series

Choose a covariance matrix \(\Sigma\)

Sigma <- Sigma <- matrix(.5, 4, 4)
diag(Sigma) <- 1
Sigma[3,4] <- Sigma[4,3] <- -.3
Sigma
#>      [,1] [,2] [,3] [,4]
#> [1,]  1.0  0.5  0.5  0.5
#> [2,]  0.5  1.0  0.5  0.5
#> [3,]  0.5  0.5  1.0 -0.3
#> [4,]  0.5  0.5 -0.3  1.0

Simulate from the covariance matrix with approximate univariate Hurst-exponents \(H:=(.2, .5, .7, .9)'\).

set.seed(2023)
X <- simulate_cMTS(N = 10^3, H = c(.2, .5, .7, .9), 
                   Sigma = Sigma, simulation_process = "FGN0", 
                   decomposition = "chol", cor_increments = FALSE)
head(X)
#>           X1         X2         X3          X4
#> 1 -1.1803464 -2.2468738 -1.0811575 -1.15036111
#> 2  0.6675909 -0.4575880  1.4244876 -1.18225387
#> 3 -0.3257142 -0.4014033 -0.1599604 -0.08143119
#> 4 -0.6453245 -1.4622078  0.1136179 -1.48819077
#> 5  0.8467582  0.3872702  0.3847809  0.54190586
#> 6  1.4111376 -0.1111638  0.5180076  0.57863195

simulation_process = "FGN0" uses longmemo::simFGN0 to generate independent (Fractional) Gaussian Processes, which are then mixed using the Cholesky decomposition \(D\) of \(\Sigma := DD'\). Changing this argument to simulation_process = "FGN.fft" uses longmemo::simFGN.fft (using FFT) to generate independent (Fractional) Gaussian Processes, which are then mixed using the Cholesky decomposition \(D\) of \(\Sigma := DD'\) via decomposition = "chol". The differences may be inspected here:

x1 <- simulate_cMTS(N = 3*10^2, H = c(.5), Sigma = as.matrix(1), 
                    simulation_process = "FGN0",
                    cor_increments = FALSE)
plot(x1$X1, main = "H = 0.5 and FGN0", type = "l")

x2 <- simulate_cMTS(N = 3*10^2, H = c(.5), Sigma = as.matrix(1), 
                    simulation_process = "FGN.fft",
                    cor_increments = FALSE)
plot(x2$X1, main = "H = 0.5 and FGN.fft", type = "l")

Note: the true Hurst Exponents might be slightly off, since the mixing (weighted sums) of the time series makes them slightly more normal (due to the central limits theorem). Hence, small Hurst-exponents (\(H<0.5\)) tend to be upwards biased, while larger Hurst-exponents tend to be downward biased (\(H>0.5\)).

Further, the use of Cholesky might influence the generation of data in a different way than another decomposition of \(\Sigma\) would. An alternative is the eigen-value (or singular value) decomposition via decomposition = "eigen". Further research is needed!

Finally, we did not correlated the increments but the whole time series (cor_increments = FALSE), if instead increments should be correlated use cor_increments = FALSE.

Estimate multivariate extensions of Hurst exponent for correlated time series

The mvDFA function needs only the time series in long format as a matrix or data.frame object. Multiple cores can be used (maximum is parallel::detectCores()), which reduces computation time drastically for longer timer series. The steps argument manipulates the number of window sizes, which are separated logarithmically. Larger numbers of steps result in more precise estimates of the extended Hurst exponents, but require more computation time. The degree option specifies the degree of the polynomial of the time trend used for detrending.

mvDFA_result <- mvDFA(X = X, steps = 50, cores = 1, degree = 1)
mvDFA_result
#> $Ltot
#>  tot 
#> 0.61 
#> 
#> $Lgen
#>   gen 
#> 0.585 
#> 
#> $Lfull
#> varX1 varX2 varX3 varX4  cov1  cov2  cov3  cov4  cov5  cov6 
#> 0.175 0.410 0.714 0.689 0.061 0.221 0.006 0.407 0.255 0.828 
#> 
#> $LmeanUni
#> meanUni 
#>   0.497 
#> 
#> $univariate_DFA
#> uniX1 uniX2 uniX3 uniX4 
#> 0.175 0.410 0.714 0.689

The slope of the log-log regression of the root-mean square fluctuations \(RMS\) vs. window size \(s\) gives an estimate for the long memory coefficient \(L\) within the data. The output argument Ltot corresponds to the total variance approach, which uses root-mean of the sum of the diagonal elements of the covariance matrix \(C_{\nu,s}\) of the detrended fluctuation object \(Y_{\nu,s}\) within each window to compute the corresponding root mean square fluctuation \(RMS_\text{tot}(\nu,s)\) averaged across all the subsequences \(\nu\) of length \(s\). Lgen corresponds to the generalized variance approach, which uses root-mean of the determinant of the covariance matrix \(C_{\nu,s}\) of the detrended fluctuation object \(Y_{\nu,s}\) within each window to compute the corresponding root mean square fluctuation \(RMS_\text{tot}(\nu,s)\) averaged across all the subsequences \(\nu\) of length \(s\). Further, we output the corresponding coefficient for all variances and covariances individually (see Lfull), averaged across all univariate time series (see LmeanUni) and done for each univariate time series individually (see univariate_DFA).

Further arguments hidden in the primary output are the corresponding \(R^2\) per analysis approach:

mvDFA_result[6:10]
#> $R2tot
#>     R2tot 
#> 0.9938666 
#> 
#> $R2gen
#>     R2gen 
#> 0.9960478 
#> 
#> $R2full
#>      R2varX1      R2varX2      R2varX3      R2varX4       R2cov1       R2cov2 
#> 0.9778272936 0.9901501773 0.9950418981 0.9929890034 0.0551671849 0.7633510227 
#>       R2cov3       R2cov4       R2cov5       R2cov6 
#> 0.0001347133 0.7475334587 0.4258491832 0.9966240683 
#> 
#> $R2meanUni
#>   meanUni 
#> 0.9890021 
#> 
#> $R2univariate_DFA
#>     uniX1     uniX2     uniX3     uniX4 
#> 0.9778273 0.9901502 0.9950419 0.9929890

Further, the used RMS objects and the used window sizes can be extracted using the following arguments:

mvDFA_result$S
#>  [1]   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
#> [20]  26  27  29  31  33  36  39  42  46  49  53  58  62  67  73  79  85  92  99
#> [39] 107 116 125 135 146 157 170 183 198 214 231 249
mvDFA_result$RMS_tot
#>  [1]  1.281689  1.368366  1.458884  1.544731  1.611343  1.653864  1.760905
#>  [8]  1.799242  1.863493  1.955846  1.981228  2.028004  2.103870  2.209529
#> [15]  2.231147  2.268106  2.311766  2.325178  2.477286  2.445302  2.591545
#> [22]  2.680717  2.744865  2.859671  2.910058  3.070996  3.440538  3.630025
#> [29]  3.621038  3.828981  4.227114  4.019679  4.464728  4.718095  5.184642
#> [36]  5.506916  5.319111  5.555983  6.547079  6.257566  7.312100  7.144616
#> [43]  8.118529  7.770209  8.364011  9.113272  9.063049  9.330909 10.364417
#> [50] 10.994388
mvDFA_result$RMS_gen
#>  [1]  0.007590369  0.012494294  0.019681587  0.027052130  0.035353029
#>  [6]  0.043959272  0.055966346  0.067520320  0.082665344  0.095836936
#> [11]  0.109724870  0.135821072  0.147451569  0.166060295  0.191202942
#> [16]  0.199565142  0.226332470  0.252164023  0.286231795  0.308713374
#> [21]  0.372355755  0.402438336  0.478707604  0.556214631  0.669916959
#> [26]  0.791285660  0.963593035  1.199463797  1.342740595  1.618126034
#> [31]  1.932335106  2.152975678  2.815872783  3.601071975  4.125751287
#> [36]  5.101225013  5.545780169  6.389909270  8.286718763  9.964057240
#> [41] 11.879997793 13.619662829 19.178717599 17.568449365 21.982872905
#> [46] 23.353103586 23.653585467 30.137254988 38.556737506 43.790424706
head(mvDFA_result$CovRMS_s)
#>           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
#> [1,] 0.6348464 0.6552272 0.6348307 0.6382515 0.4455082 0.4456798 0.4522994
#> [2,] 0.6617175 0.7051219 0.6891878 0.6799842 0.4424144 0.4626567 0.4588896
#> [3,] 0.6758102 0.7392815 0.7545125 0.7455177 0.4614919 0.4602390 0.4867858
#> [4,] 0.7077878 0.8026725 0.7664519 0.8083928 0.4723159 0.4610989 0.5212718
#> [5,] 0.7221552 0.8368071 0.8222754 0.8357834 0.4918202 0.4820956 0.5300174
#> [6,] 0.7383051 0.8367540 0.8562739 0.8699477 0.4750978 0.4990387 0.5187805
#>           [,8]      [,9]     [,10]
#> [1,] 0.4488068 0.4666323 0.3455585
#> [2,] 0.4778061 0.4824361 0.3977616
#> [3,] 0.5149633 0.5038502 0.4669152
#> [4,] 0.5056467 0.5741427 0.4922668
#> [5,] 0.5594221 0.5787164 0.5130264
#> [6,] 0.5358736 0.5812485 0.5720075

which may be used to compute the Hurst exponents by hand or to display the log-log-plots. We present this for the total and the generalized approach:

# total
df_tot <- data.frame(mvDFA_result[c("S", "RMS_tot")])
reg_tot <- lm(I(log10(RMS_tot)) ~ 1 + I(log10(S)), data = df_tot)
coef(reg_tot)[2]
#> I(log10(S)) 
#>    0.610202
mvDFA_result$Ltot
#>      tot 
#> 0.610202

plot(log10(df_tot))
abline(reg_tot)

# generalized
df_gen <- data.frame(mvDFA_result[c("S", "RMS_gen")])
reg_gen <- lm(I(log10(RMS_gen)) ~ 1 + I(log10(S)), data = df_gen)
coef(reg_gen)[2]/4
#> I(log10(S)) 
#>   0.5847321
mvDFA_result$Lgen
#>       gen 
#> 0.5847321

plot(log10(df_gen))
abline(reg_gen)


References

Peng, C. K., Havlin, S., Stanley, H. E., & Goldberger, A. L. (1995). Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time-series. Chaos, 5, 82–87. doi:10.1063/1.166141