Here we show how to use the package's features without the suggested Rgeostats package.

Install and load the package

After downloading the package file “Klovan_0.0.9.tar.gz”, put it in your preferred working directory and run both of the following lines:

# install.packages("Klovan_0.0.9.tar.gz", repos = NULL, type = "source")
# library(klovan)

Alternatively, in your RStudio console, use this code:

Run code to load data and try transforming it

#loading data
data("Klovan_Row80", package = "klovan")
data("Klovan_2D_all_outlier", package = "klovan")

#apply a range transform to your data 
T_klovan <- klovan::range_transform(Klovan_Row80)
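
As a rough illustration of what a range transform does, the sketch below rescales each numeric column to the interval [0, 1] in base R. The helper name range_transform_sketch and the toy data are hypothetical. It is an assumption that klovan::range_transform uses this min-max formula; check the function documentation for the exact behavior.

```r
# Hypothetical min-max rescaling; NOT the klovan implementation
range_transform_sketch <- function(df) {
  as.data.frame(lapply(df, function(x) (x - min(x)) / (max(x) - min(x))))
}

# Toy data with very different scales (not the Klovan data)
toy <- data.frame(Fe = c(10, 20, 40), Mg = c(0.1, 0.3, 0.2))
scaled <- range_transform_sketch(toy)
scaled  # every column now runs from 0 to 1
```

After rescaling, no variable dominates the covariance matrix simply because it was recorded in larger units.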

The data we are using is the Klovan mining data set, which is one of the first applications of FA in the geosciences. Here we know the position of an ore body, and we will use geostatistical techniques to find another one without having to start digging.

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is generally used for “data reduction”: it simplifies data sets and helps avoid models that are unstable because of collinearity. Collinearity occurs when two variables share similar information, so that one variable can be predicted from the other, and it can lead to overfitted models. PCA is an unsupervised method that creates a new set of uncorrelated variables containing the same information as the original data set. The new variables, called principal components (PCs), are ordered so that each one summarizes a decreasing proportion of the total original variation. The first few PCs therefore generally contain most of the total variance. PCA begins with an eigendecomposition of a correlation matrix or a variance-covariance matrix. It produces a number of properties that can give insight into your data. Some of the key properties are:

- Eigenvectors:   also called Principal Components
- Eigenvalues:    the factor by which the eigenvector is scaled
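
The eigendecomposition described above can be sketched with base R's eigen() on a small simulated data set. The data and variable names are hypothetical, and this is only an illustration of the idea, not the code behind the klovan functions.

```r
# Toy data: three variables, two of them strongly related (not the Klovan data)
set.seed(1)
x1 <- rnorm(50)
X <- cbind(x1 = x1, x2 = x1 + rnorm(50, sd = 0.3), x3 = rnorm(50))

S <- cov(X)    # variance-covariance matrix
e <- eigen(S)  # eigenvalues are returned in decreasing order
e$values
e$vectors

# Scores: centered data projected onto the eigenvectors
scores <- scale(X, center = TRUE, scale = FALSE) %*% e$vectors
round(cov(scores), 10)  # off-diagonals are ~0; the diagonal holds the eigenvalues
```

The covariance of the scores is diagonal, which is what “uncorrelated new variables” means in practice, and the eigenvalues add up to the total variance of the original columns.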

Here we can use a covariance matrix, but we must normalize the data first. This is critical because the scales of each variable can be very different and can influence weighting. Thus, a variable with large numbers can have disproportionate influence compared with variables with small numbers.

First, the covariance matrix is calculated (more precisely, the variance-covariance matrix). Recall that the diagonal holds the variances. The variance = Sum((Xi - Mean(X))^2)/n. The covariance = Sum((Xi - Mean(X)) * (Yi - Mean(Y)))/n.
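
To make the definitions concrete, here is a small base R check on made-up numbers, using variance = Sum((Xi - Mean(X))^2)/n and covariance = Sum((Xi - Mean(X)) * (Yi - Mean(Y)))/n. Note that R's built-in var() and cov() divide by n - 1 rather than n. Which divisor klovan::covar_mtrx uses is not stated here, so treat the divisor as an assumption.

```r
# Made-up numbers (not the Klovan data)
x <- c(2, 4, 6, 8)
y <- c(1, 3, 2, 6)
n <- length(x)

var_n <- sum((x - mean(x))^2) / n                 # variance with divisor n
cov_n <- sum((x - mean(x)) * (y - mean(y))) / n   # covariance with divisor n

# R's built-ins divide by n - 1, so rescale them before comparing
all.equal(var_n, var(x) * (n - 1) / n)
all.equal(cov_n, cov(x, y) * (n - 1) / n)
```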

Here we will build a covariance matrix and use PCA to find the eigenvectors and eigenvalues.

#build a covariance matrix
cov_mtrx <- klovan::covar_mtrx(T_klovan)
cov_mtrx

##                   rank      P_Elong         P_Fe      P_Fold     P_Fract
## rank       0.103192043 -0.003208257  0.022142820 -0.01090080 -0.02441241
## P_Elong   -0.003208257  0.063067292  0.049123140  0.05301063  0.03890943
## P_Fe       0.022142820  0.049123140  0.089174087  0.02272592  0.02504509
## P_Fold    -0.010900797  0.053010631  0.022725920  0.05138066  0.03326994
## P_Fract   -0.024412411  0.038909429  0.025045086  0.03326994  0.06031784
## P_Mg       0.024037709  0.045108134  0.086553127  0.01928991  0.01925962
## P_Na       0.021634161  0.050110354  0.086658753  0.02481922  0.02422390
## P_Space   -0.013641520  0.047812205  0.010888847  0.04990160  0.02952942
## P_Sulfide  0.008976645  0.063799741  0.084928536  0.04044762  0.04438451
## P_Veins   -0.025773905  0.017171926  0.007238515  0.01501100  0.05540251
## P_XLSize  -0.036426253  0.010915760 -0.043091477  0.02726559  0.03105227
##                   P_Mg         P_Na      P_Space    P_Sulfide      P_Veins
## rank       0.024037709  0.021634161 -0.013641520  0.008976645 -0.025773905
## P_Elong    0.045108134  0.050110354  0.047812205  0.063799741  0.017171926
## P_Fe       0.086553127  0.086658753  0.010888847  0.084928536  0.007238515
## P_Fold     0.019289909  0.024819218  0.049901599  0.040447625  0.015011001
## P_Fract    0.019259624  0.024223904  0.029529417  0.044384511  0.055402511
## P_Mg       0.084689809  0.084102957  0.007827682  0.080427478  0.002092618
## P_Na       0.084102957  0.084460032  0.013469888  0.083227110  0.005727341
## P_Space    0.007827682  0.013469888  0.050093917  0.029474281  0.012896789
## P_Sulfide  0.080427478  0.083227110  0.029474281  0.091029451  0.023803588
## P_Veins    0.002092618  0.005727341  0.012896789  0.023803588  0.060461400
## P_XLSize  -0.046032117 -0.040531637  0.033547435 -0.020059364  0.032605614
##              P_XLSize
## rank      -0.03642625
## P_Elong    0.01091576
## P_Fe      -0.04309148
## P_Fold     0.02726559
## P_Fract    0.03105227
## P_Mg      -0.04603212
## P_Na      -0.04053164
## P_Space    0.03354743
## P_Sulfide -0.02005936
## P_Veins    0.03260561
## P_XLSize   0.06426804

#calculate eigenvalues
klovan::calc_eigenvalues(cov_mtrx)

##    Cov_Mtrx.eigen.values pc.names1
## 1           4.252014e-01       PC1
## 2           2.314241e-01       PC1
## 3           8.321070e-02       PC1
## 4           6.192743e-02       PC1
## 5           2.191894e-04       PC1
## 6           1.374119e-04       PC1
## 7           1.225983e-05       PC1
## 8           1.104557e-06       PC1
## 9           6.109843e-07       PC1
## 10          2.558324e-07       PC1
## 11          1.203109e-07       PC1

#calculate eigenvectors
klovan::calc_eigenvectors(cov_mtrx)

##            PC1         PC2         PC3         PC4         PC5         PC6
## 1  -0.07856805  0.42668399 -0.62073040  0.65258286 -0.01941089 -0.01472310
## 2  -0.32238445 -0.22314978 -0.26953288 -0.14511938  0.04358417  0.10144199
## 3  -0.43850798  0.15844034  0.13620312 -0.02763257 -0.13063243  0.23409819
## 4  -0.20190350 -0.30116374 -0.36764362 -0.17063755  0.15903064  0.06540556
## 5  -0.19929516 -0.37062341  0.18613226  0.37459016 -0.50891754 -0.23751178
## 6  -0.41818441  0.19349166  0.12462039 -0.06438825  0.39138766 -0.78402529
## 7  -0.43051891  0.14360020  0.08939972 -0.05716795 -0.08356200  0.24853397
## 8  -0.14593107 -0.32615051 -0.41299516 -0.18908752  0.24223746  0.07631821
## 9  -0.46060725 -0.04390937  0.06096861  0.02655036 -0.28054264  0.09625143
## 10 -0.09023159 -0.34192125  0.34666896  0.56624952  0.59351060  0.26777669
## 11  0.12369767 -0.48136049 -0.18895334  0.13552086 -0.21579180 -0.33278086
##              PC7          PC8          PC9          PC10          PC11
## 1   0.0003179157  0.001679805 -0.000627944  0.0005810075  6.334857e-05
## 2  -0.0423928519 -0.229674892  0.513950604  0.5513138083  3.431252e-01
## 3  -0.0220260938  0.781063370  0.007351029  0.2429824984 -1.476012e-01
## 4   0.2576091578  0.213663955  0.341887787 -0.6654802408 -5.405337e-02
## 5   0.5523319595 -0.037973393 -0.085962896  0.0310055571  1.329822e-01
## 6   0.0019480790  0.003257689 -0.010033055 -0.0001836497  3.410105e-03
## 7  -0.2201113703 -0.155860754 -0.363819280 -0.3378343682  6.283169e-01
## 8   0.1744004427 -0.036182473 -0.685148026  0.2532645562 -1.902945e-01
## 9  -0.3133071509 -0.427836076  0.061702960 -0.1143235897 -6.288403e-01
## 10 -0.0901756826 -0.008826905  0.038854895 -0.0025131718 -2.110597e-02
## 11 -0.6641710437  0.285233921 -0.065677698 -0.0428454024  1.139370e-01

Not very exciting yet! It gets better…

In the next step we calculate the sum of all the eigenvalues. This is in preparation to calculate the eigenvalue contribution. Each Eigenvalue will be divided by the sum of the eigenvalues in order to determine the proportional contribution.

The result is the proportion of total variance explained by each eigenvalue of the covariance matrix, reported below both as a running (cumulative) sum and as a cumulative percentage.

eigen_data <- klovan::eigen_contribution(T_klovan)
eigen_data

##     EigenValues    CumSum CumSumPct pc.names
## 1            NA 0.0000000   0.00000      PC0
## 2  5.300873e-01 0.5300873  53.00873      PC1
## 3  2.885103e-01 0.8185977  81.85977      PC2
## 4  1.037366e-01 0.9223342  92.23342      PC3
## 5  7.720330e-02 0.9995375  99.95375      PC4
## 6  2.732577e-04 0.9998108  99.98108      PC5
## 7  1.713078e-04 0.9999821  99.99821      PC6
## 8  1.528401e-05 0.9999974  99.99974      PC7
## 9  1.377022e-06 0.9999988  99.99988      PC8
## 10 7.616980e-07 0.9999995  99.99995      PC9
## 11 3.189395e-07 0.9999999  99.99999     PC10
## 12 1.499884e-07 1.0000000 100.00000     PC11
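
The same proportions can be reproduced by hand from the eigenvalues printed by calc_eigenvalues() above. This is a minimal base R sketch of the arithmetic, not the package's own code. The variable names are hypothetical.

```r
# Eigenvalues copied from the calc_eigenvalues() output above
ev <- c(4.252014e-01, 2.314241e-01, 8.321070e-02, 6.192743e-02,
        2.191894e-04, 1.374119e-04, 1.225983e-05, 1.104557e-06,
        6.109843e-07, 2.558324e-07, 1.203109e-07)

prop   <- ev / sum(ev)        # proportional contribution of each eigenvalue
cumpct <- 100 * cumsum(prop)  # cumulative percent of total variance

data.frame(pc = paste0("PC", seq_along(ev)),
           proportion = round(prop, 7),
           cum_pct = round(cumpct, 5))
```

Dividing each eigenvalue by the sum gives the values in the EigenValues column of the table above, and the running sum gives CumSumPct.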

We can also visualize the proportional contribution using a scree plot.

klovan::scree_plot(eigen_data)

Scree Plot

Here we can also customize how our plot looks.

klovan::scree_plot(eigen_data, bar_fill = "green", outline = "darkgreen", eigen_line = "lightblue")

Scree Plot

Alternatively, we can use a correlation matrix, in which case we do not have to normalize our data first. We can run all of the analysis with this matrix as well.

#build a correlation matrix
klovan::cor_mtrx(Klovan_Row80)

##                  rank         C_X         C_Y     P_Elong        P_Fe
## rank       1.00000000  0.02506345  0.99968586 -0.03976893  0.23082906
## C_X        0.02506345  1.00000000  0.00000000  0.70097957  0.95466075
## C_Y        0.99968586  0.00000000  1.00000000 -0.05735591  0.20696699
## P_Elong   -0.03976893  0.70097957 -0.05735591  1.00000000  0.65503528
## P_Fe       0.23082906  0.95466075  0.20696699  0.65503528  1.00000000
## P_Fold    -0.14970466  0.40728820 -0.15996296  0.93123874  0.33573939
## P_Fract   -0.30943160  0.54374003 -0.32316112  0.63085545  0.34149181
## P_Mg       0.25713098  0.93539066  0.23376029  0.61721603  0.99597365
## P_Na       0.23173484  0.95085566  0.20796845  0.68659362  0.99854463
## P_Space   -0.18973487  0.23983387 -0.19580744  0.85063683  0.16291837
## P_Sulfide  0.09261898  0.96261047  0.06851416  0.84202670  0.94263453
## P_Veins   -0.32630071  0.31994401 -0.33442466  0.27808516  0.09858051
## P_XLSize  -0.44729482 -0.38545137 -0.43777160  0.17145666 -0.56921263
##               P_Fold    P_Fract        P_Mg       P_Na    P_Space   P_Sulfide
## rank      -0.1497047 -0.3094316  0.25713098  0.2317348 -0.1897349  0.09261898
## C_X        0.4072882  0.5437400  0.93539066  0.9508557  0.2398339  0.96261047
## C_Y       -0.1599630 -0.3231611  0.23376029  0.2079684 -0.1958074  0.06851416
## P_Elong    0.9312387  0.6308555  0.61721603  0.6865936  0.8506368  0.84202670
## P_Fe       0.3357394  0.3414918  0.99597365  0.9985446  0.1629184  0.94263453
## P_Fold     1.0000000  0.5976258  0.29242518  0.3767581  0.9836081  0.59142846
## P_Fract    0.5976258  1.0000000  0.26946932  0.3393873  0.5372043  0.59898725
## P_Mg       0.2924252  0.2694693  1.00000000  0.9944205  0.1201780  0.91600515
## P_Na       0.3767581  0.3393873  0.99442050  1.0000000  0.2070837  0.94917925
## P_Space    0.9836081  0.5372043  0.12017804  0.2070837  1.0000000  0.43647540
## P_Sulfide  0.5914285  0.5989872  0.91600515  0.9491792  0.4364754  1.00000000
## P_Veins    0.2693213  0.9174184  0.02924389  0.0801472  0.2343419  0.32085762
## P_XLSize   0.4744795  0.4987385 -0.62394721 -0.5501372  0.5912475 -0.26225796
##               P_Veins   P_XLSize
## rank      -0.32630071 -0.4472948
## C_X        0.31994401 -0.3854514
## C_Y       -0.33442466 -0.4377716
## P_Elong    0.27808516  0.1714567
## P_Fe       0.09858051 -0.5692126
## P_Fold     0.26932130  0.4744795
## P_Fract    0.91741841  0.4987385
## P_Mg       0.02924389 -0.6239472
## P_Na       0.08014720 -0.5501372
## P_Space    0.23434193  0.5912475
## P_Sulfide  0.32085762 -0.2622580
## P_Veins    1.00000000  0.5230651
## P_XLSize   0.52306512  1.0000000
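
The reason a correlation matrix needs no prior normalization is that it equals the covariance matrix of the standardized data. The base R sketch below checks this on simulated columns with deliberately different scales. The data are hypothetical and unrelated to the Klovan set.

```r
set.seed(42)
X <- cbind(a = rnorm(30, mean = 100, sd = 20),  # large-scale variable
           b = runif(30),                       # small-scale variable
           c = rexp(30))

Z <- scale(X)  # center each column and divide by its standard deviation

# Covariance of the standardized data equals the correlation of the raw data
all.equal(cov(Z), cor(X), check.attributes = FALSE)

# Eigenvalues of a correlation matrix sum to the number of variables
sum(eigen(cor(X))$values)
```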

The following code chunk produces a correlation plot called a correlation “circle,” or a “circle” plot. The concept is to plot the loadings from one PC against another. Recall from the cumulative contributions above that the first 3 PCs account for about 92% of the variance in the data set (the first 4 reach 99.95%), so we need only investigate these PCs. The correlation plot is a 2D plot, so you can only compare 2 PCs at a time. In the code chunk below, we compare PC1 against PC2.

klovan::pc_cor_plot(Klovan_Row80, "PC1", "PC2")

#see function documentation for more information on how to interpret this plot
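
As a sketch of what a correlation circle displays, the loadings can be computed in base R as eigenvectors scaled by the square roots of their eigenvalues. For a correlation-matrix PCA these are the correlations between each variable and each PC. The example uses the built-in USArrests data as a stand-in. Whether klovan::pc_cor_plot computes its coordinates in exactly this way is an assumption.

```r
# Stand-in data: the built-in USArrests data set (not the Klovan data)
X <- as.matrix(USArrests)
e <- eigen(cor(X))

# Loadings: eigenvectors scaled by the square root of their eigenvalues
loadings <- e$vectors %*% diag(sqrt(e$values))
rownames(loadings) <- colnames(X)
colnames(loadings) <- paste0("PC", seq_len(ncol(X)))

# The same numbers are the correlations between variables and PC scores
scores <- scale(X) %*% e$vectors
all.equal(unname(loadings), unname(cor(X, scores)))

# Each variable plots as the point (PC1, PC2), inside the unit circle
loadings[, c("PC1", "PC2")]
rowSums(loadings[, c("PC1", "PC2")]^2)  # squared distance from the origin, <= 1
```

Variables that plot close to the circle are well represented by the two PCs shown, and variables that plot close together are positively correlated.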

Factor Analysis (FA)

The primary use of Factor Analysis (FA) is to better interpret the meaning behind the various PCs. While PCA and FA are similar, there are some fundamental differences, particularly in their objectives. In brief, PCA focuses on data reduction: the variance of a data set is explained by a smaller set of new variables, the PCs. In FA, the premise is a bit different: the goal is to uncover one or more “phantom” variables that could not be directly measured.

Run the next code chunk which will perform factor analysis with “Varimax” orthogonal rotation. Additionally, the factor scores will be calculated.

#factor analysis 
klovan::factor_analysis(Klovan_Row80)

##           VariableName   FAC1   FAC2   FAC3
## rank              rank  0.287  0.103 -0.560
## P_Elong        P_Elong  0.557 -0.812  0.174
## P_Fe              P_Fe  0.990 -0.127 -0.006
## P_Fold          P_Fold  0.217 -0.957  0.190
## P_Fract        P_Fract  0.309 -0.387  0.854
## P_Mg              P_Mg  0.990 -0.095 -0.071
## P_Na              P_Na  0.982 -0.176 -0.026
## P_Space        P_Space  0.039 -0.984  0.171
## P_Sulfide    P_Sulfide  0.904 -0.371  0.208
## P_Veins        P_Veins  0.110 -0.068  0.961
## P_XLSize      P_XLSize -0.633 -0.532  0.549
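
For readers who want to see what a varimax rotation does to a loadings matrix, here is a minimal base R sketch using stats::varimax() on PCA loadings of the built-in USArrests data. It is a stand-in example only: how klovan::factor_analysis extracts and rotates its factors internally is not shown here, and the choice of k = 2 factors is arbitrary.

```r
X <- as.matrix(USArrests)  # stand-in data, not the Klovan data
e <- eigen(cor(X))
k <- 2                     # number of factors kept (arbitrary for this sketch)

# Unrotated loadings for the first k components
L <- e$vectors[, 1:k] %*% diag(sqrt(e$values[1:k]))
rownames(L) <- colnames(X)

rot <- stats::varimax(L)      # orthogonal varimax rotation
RL  <- unclass(rot$loadings)  # rotated loadings as a plain matrix
round(RL, 3)

# An orthogonal rotation leaves each variable's communality unchanged
all.equal(rowSums(L^2), rowSums(RL^2))
```

The rotation redistributes the loadings so that each variable tends to load strongly on one factor and weakly on the others, which is what makes tables like the one above easier to interpret.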

Run the next code chunk. The component axes are renamed to reflect that they are now factors and rotated. The R stands for “rotated,” and the L stands for “loadings.”

Note: How is this correlation plot different from the previous one made with principal components?

#make correlation plot using factor data
klovan::factor_cor_plot(klovan::factor_analysis(Klovan_Row80), "FAC1", "FAC2")

#customize color choices 
klovan::factor_cor_plot(Klovan_Row80, "FAC1", "FAC3", text_col = "pink", line_col = "red")

Inverse Distance Weighting (IDW)

The following chunk of code uses the interpolation algorithm, Inverse Distance Weighting (IDW). We will plot the mapped solution for each rotated factor score. Recall from above we have the position of a known ore body. Here it is circled in white.

#use inverse distance weighted method for interpolation
inv_dis_data <- klovan::inv_dis_wt(Klovan_Row80, 3)

## Using C_X & C_Y to make grid

## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]

summary(inv_dis_data) #view data summary

##       C_X            C_Y           value               FA           
##  Min.   : 900   Min.   : 900   Min.   :-2.12254   Length:9792       
##  1st Qu.:2176   1st Qu.:1872   1st Qu.:-0.54952   Class :character  
##  Median :3452   Median :2925   Median : 0.05459   Mode  :character  
##  Mean   :3452   Mean   :2925   Mean   : 0.02533                     
##  3rd Qu.:4727   3rd Qu.:3978   3rd Qu.: 0.60358                     
##  Max.   :6003   Max.   :4950   Max.   : 2.09071

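#plot each interpolated factor score separately (assumes ggplot2 is attached so aes() is available)
klovan::factor_score_plot(inv_dis_data, FALSE, data = Klovan_Row80) + ggforce::geom_ellipse(
    aes(x0 = 3900, y0 = 1700, a = 600, b = 400, angle = pi/2.5),
    color = "white")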

Inverse Distance Weighting Plot
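
The idea behind IDW can be sketched in a few lines of base R: the prediction at a location is a weighted mean of the sample values, with weights proportional to 1/distance^power. The helper idw_predict and the toy samples are hypothetical, and the power used by klovan::inv_dis_wt is not stated here, so power = 2 is an assumption.

```r
# Hypothetical helper: IDW prediction at one location (x0, y0)
idw_predict <- function(x, y, z, x0, y0, power = 2) {
  d <- sqrt((x - x0)^2 + (y - y0)^2)
  if (any(d == 0)) return(mean(z[d == 0]))  # exactly on a sample point
  w <- 1 / d^power                          # closer samples weigh more
  sum(w * z) / sum(w)
}

# Toy samples on the corners of a square (not the Klovan data)
x <- c(0, 10, 0, 10)
y <- c(0, 0, 10, 10)
z <- c(1, 2, 3, 4)

idw_predict(x, y, z, x0 = 5, y0 = 5)  # equidistant from all samples: plain mean
idw_predict(x, y, z, x0 = 1, y0 = 1)  # pulled toward the sample at (0, 0)
```

Because the weights grow without bound near a sample, IDW surfaces tend to show “bullseye” artifacts around the data points.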

Recall that each factor score represents the elements of a “phantom” variable. From the factor analysis, it is determined that the phantom variable RC1 (FAC1 in the loadings table above) represents “Paleo-Temperature” due to the high loadings of Mg, Fe, Na, and Sulfide. RC2 represents “Deformation Intensity” due to the high loadings of Cleavage Spacing, Elongation, and Fold. RC3 represents “Porosity/Permeability” due to the high loadings of Veins and Fractures. Producing a contoured map of each of these phantom variables can help define their relationship to the known ore body. The facet image shows the isolines from each of these variables separately along with the position of the known iron ore body. The isolines for each RC variable define the limits or extents of the ore body. Finding the intersection of these three isoline sets can help locate any new potential ore body.

The plot shows us the cutoffs should be approximately:

RC1 = 0.0 and 0.5

RC2 = -1.0 and 0.0

RC3 = -0.5 and 1.0

The objective now is to determine where all 3 cutoffs from the rotated component scores overlap.
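
The overlap test itself is a logical AND across the three cutoff ranges. The sketch below applies the cutoffs listed above to a small made-up grid in wide format. The data frame, its values, and the candidate column are hypothetical, and how klovan::factor_score_plot computes the overlap internally is not shown here.

```r
# Hypothetical wide-format grid: one row per cell, one column per factor score
grid <- data.frame(
  C_X = c(1000, 2000, 3000, 4000),
  C_Y = c(1000, 2000, 3000, 4000),
  RC1 = c(0.2, 0.9, 0.4, -0.3),
  RC2 = c(-0.5, -0.2, 0.3, -0.8),
  RC3 = c(0.1, 0.5, 0.0, 0.7)
)

# A cell is a candidate only if all three scores fall inside their cutoffs
grid$candidate <- with(grid,
  RC1 >= 0.0  & RC1 <= 0.5 &
  RC2 >= -1.0 & RC2 <= 0.0 &
  RC3 >= -0.5 & RC3 <= 1.0
)

grid[grid$candidate, c("C_X", "C_Y")]  # coordinates of the candidate cells
```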

In the following code chunk the contours are overlain. The intent of this overlaid set of isolines is to assist you in finding a second ore body based on the PCA/FA analysis. Run this code to highlight the new body where the component scores overlap.

klovan::factor_score_plot(inv_dis_data, TRUE, data = Klovan_Row80) + ggforce::geom_ellipse(
    aes(x0 = 3900, y0 = 1700, a = 600, b = 400, angle = pi/2.5),
    color = "white") +
  ggforce::geom_circle(
    aes(x = NULL, y = NULL, x0 = 3300, y0 = 3500, r = 400),
  color = "white", 
  inherit.aes = FALSE)

Inverse Distance Weighting Plot

Unfortunately, those plots look fairly poor, especially around our data points. Next we will use a better method.

Kriging

Kriging has several advantages over Inverse Distance Weighting (IDW) making it a preferable interpolation method. It effectively models spatial autocorrelation, accounting for both distance and spatial arrangement. Kriging provides the best linear unbiased prediction, thus minimizing estimation variance, and offers an estimate of prediction error, allowing for the assessment of prediction quality. Although computationally intensive and requiring model fitting, its flexibility in adapting to different spatial patterns makes it superior to IDW.
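
To show where the “best linear unbiased prediction” and the prediction error come from, here is a compact ordinary kriging sketch in base R. It is an illustration under stated assumptions, not the klovan implementation: the helpers vgm_exp and ok_predict are hypothetical, the data are made up, and an exponential variogram model is used for numerical stability in place of the Gaussian model fitted later in this section.

```r
# Hypothetical exponential variogram model (gamma(0) = 0 by definition)
vgm_exp <- function(h, nugget, psill, range) {
  ifelse(h > 0, nugget + psill * (1 - exp(-h / range)), 0)
}

# Ordinary kriging prediction and variance at one location (x0, y0)
ok_predict <- function(x, y, z, x0, y0, ...) {
  n  <- length(z)
  G  <- vgm_exp(as.matrix(dist(cbind(x, y))), ...)   # sample-to-sample semivariances
  g0 <- vgm_exp(sqrt((x - x0)^2 + (y - y0)^2), ...)  # sample-to-target semivariances
  A  <- rbind(cbind(G, 1), c(rep(1, n), 0))          # add the unbiasedness constraint
  sol <- solve(A, c(g0, 1))                          # weights plus Lagrange multiplier
  w <- sol[1:n]                                      # kriging weights (sum to 1)
  list(pred = sum(w * z), var = sum(sol * c(g0, 1)), weights = w)
}

# Toy samples (not the Klovan data)
x <- c(0, 10, 0, 10, 5)
y <- c(0, 0, 10, 10, 3)
z <- c(1.0, 2.0, 3.0, 4.0, 2.2)

res <- ok_predict(x, y, z, x0 = 4, y0 = 6, nugget = 0.1, psill = 1, range = 8)
res$pred          # predicted value
res$var           # kriging variance (prediction error estimate)
sum(res$weights)  # weights sum to 1, which makes the predictor unbiased
```

Unlike IDW, the weights depend on the variogram model and on how the samples are arranged relative to one another, not only on their distance to the target.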

First we must fit a variogram model that follows the experimental variogram reasonably well.

#plot variogram for use in kriging
klovan::vario_plot(Klovan_Row80, factor = 1, nugget = .214, nlags = 10, sill = 7.64507, range_val = 6271.83, model_name = "Gau1")

We can use these results as parameters for the klovan::kriging function. We can also use the klovan::kriging.auto function to automatically find the best variogram.

Here we will apply the kriging and see a summary of the data

#use kriging for interpolation; the variogram is found automatically
#nugget, psill, range, and model can be customized; see the function documentation for details
krig_data <- klovan::kriging.auto(Klovan_Row80, 3)
summary(krig_data) #view data summary

Now we can see our old and new ore bodies where the component scores overlap.

klovan::factor_score_plot(krig_data, TRUE, data = Klovan_Row80) + ggforce::geom_ellipse(
    aes(x0 = 3900, y0 = 1700, a = 600, b = 400, angle = pi/2.5),
    color = "white") +
  ggforce::geom_circle(
    aes(x = NULL, y = NULL, x0 = 3300, y0 = 3500, r = 400),
  color = "white", 
  inherit.aes = FALSE)

Krige Plot

Conclusion

The Klovan v0.0.9 package is a robust tool for performing Principal Component Analysis (PCA) and Factor Analysis (FA) using R. These techniques enable users to simplify complex datasets and uncover underlying patterns. The utility of Klovan was demonstrated through a detailed analysis of a geological dataset, identifying valuable insights into potential ore body locations.

Klovan v0.0.9

Jonathan Gordon, Hope Omodolor, Eric Helfer, Jeffrey Yarus, Roger French

2024-01-29

What is Klovan v0.0.9 and what does it do?
