Case Study

Henner Simianer1,2, Johannes Heise3, Stefan Rensing3, Torsten Pook1,2,4, Johannes Geibel1,2,5, Christian Reimer1,2,5

1 University of Goettingen, Department of Animal Sciences, Animal Breeding and Genetics Group

2 University of Goettingen, Center for Integrated Breeding Research

3 IT solutions for animal production (vit)

4 Wageningen University and Research, Animal Breeding and Genomics Group

5 Friedrich-Loeffler-Institut, Institute of Farm Animal genetics

E-Mail: johannes.geibel@fli.de; hsimian@gwdg.de

1 The data

The case study handles the situation in German Holstein Frisian (HF) breeding where the general Index (RZG) is based on several traits. The composition of the index changed in 2021, so that there are now eight instead of 6 traits in the index. Also, the exact trait definition was changed. In total, there are 10 “breeding goal traits”. See Simianer et al. “How economic weights translate into genetic and phenotypic progress, and vice versa” Genet Sel Evol 55, 38 (2023) for more details on it.

The economic weights (w) of the indices are as follows, with traits not in the index having zero weight.

tn <- c("RZM", "RZN", "RZEo", "RZEn", "RZR", "RZKm", "RZKd", "RZH", "RZC", "RZS")

w_old <- c(0.45, 0.2, 0.15, 0, 0.1, 0.03, 0, 0, 0, 0.07)
names(w_old) <- tn; w_old
#>  RZM  RZN RZEo RZEn  RZR RZKm RZKd  RZH  RZC  RZS 
#> 0.45 0.20 0.15 0.00 0.10 0.03 0.00 0.00 0.00 0.07

w_new <- c(0.36, 0.18, 0, 0.15, 0.07, 0.015, 0.015, 0.18, 0.03, 0)
names(w_new) <- tn; w_old
#>  RZM  RZN RZEo RZEn  RZR RZKm RZKd  RZH  RZC  RZS 
#> 0.45 0.20 0.15 0.00 0.10 0.03 0.00 0.00 0.00 0.07

Breeding values are scaled to a mean of 100 index points and a additive genetic standard deviation of 12 index points. This makes it easy to set up the genetic variance-covariance matrix ( \(\Gamma = G\) ) from the genetic correlation matrix by simply multiplying the correlation matrix with a constant of 122.

G <- matrix(
  c(1.0,0.13,0.13,0.07,-0.15,0.11,0.07,0.09,-0.02,0.04,
    0.13,1.0,0.23,0.28,0.43,0.25,0.22,0.78,0.13,0.46,
    0.13,0.23,1.0,0.92,0.02,0.09,-0.05,0.25,-0.1,0.19,
    0.07,0.28,0.92,1.0,0.06,0.08,-.03,0.31,-.1,0.25,
    -0.15,0.43,.02,0.06,1.0,0.32,0.19,0.41,0.04,0.15,
    0.11,0.25,0.09,0.08,0.32,1.0,0.0,0.25,0.04,0.13,
    0.07,0.22,-.05,-.03,0.19,0,1.0,0.23,0.05,0.10,
    0.09,0.78,0.25,0.31,0.41,0.25,0.23,1.0,0.1,0.57,
    -.02,0.13,-.10,-.1,0.04,0.04,0.05,0.1,1.0,0.02,
    0.04,0.46,0.19,0.25,0.15,0.13,0.10,0.57,0.02,1.0)
  ,byrow = TRUE, nrow = length(tn), ncol = length(tn), dimnames = list(tn, tn)
)
G <- G*12^2
G
#>         RZM    RZN   RZEo   RZEn    RZR   RZKm   RZKd    RZH    RZC    RZS
#> RZM  144.00  18.72  18.72  10.08 -21.60  15.84  10.08  12.96  -2.88   5.76
#> RZN   18.72 144.00  33.12  40.32  61.92  36.00  31.68 112.32  18.72  66.24
#> RZEo  18.72  33.12 144.00 132.48   2.88  12.96  -7.20  36.00 -14.40  27.36
#> RZEn  10.08  40.32 132.48 144.00   8.64  11.52  -4.32  44.64 -14.40  36.00
#> RZR  -21.60  61.92   2.88   8.64 144.00  46.08  27.36  59.04   5.76  21.60
#> RZKm  15.84  36.00  12.96  11.52  46.08 144.00   0.00  36.00   5.76  18.72
#> RZKd  10.08  31.68  -7.20  -4.32  27.36   0.00 144.00  33.12   7.20  14.40
#> RZH   12.96 112.32  36.00  44.64  59.04  36.00  33.12 144.00  14.40  82.08
#> RZC   -2.88  18.72 -14.40 -14.40   5.76   5.76   7.20  14.40 144.00   2.88
#> RZS    5.76  66.24  27.36  36.00  21.60  18.72  14.40  82.08   2.88 144.00

In our case, reliabilities (\(r^2_{AI}\)) of the estimated breeding values are available for all traits.

r2 <- c(0.743, 0.673, 0.638, 0.717, 0.541, 0.635, 0.604, 0.720, 0.499, 0.764)
names(r2) <- tn; r2
#>   RZM   RZN  RZEo  RZEn   RZR  RZKm  RZKd   RZH   RZC   RZS 
#> 0.743 0.673 0.638 0.717 0.541 0.635 0.604 0.720 0.499 0.764

Note: If you calculate an index for observed phenotypes based on own performances, the reliability ( \(r^2\) ) equals the narrow sense heritability ( \(h^2\) ).

If we regard a situation where breeding value estimation is only performed for the actual index traits, \(r^2\) needs to be subsetted.

r2_old <- r2[w_old != 0]
r2_new <- r2[w_new != 0]

For the case of the old index, estimates of the observed genetic gains of the traits in the index are available from evaluations of the HF breeding program.

deltautr <- c(0.28401392, 0.21637703, 0.17932963, 0.09986764, 0.08767239, 0.13273939)
names(deltautr) <- tn[w_old>0]

We may further need heritabilities (\(h^2\)) of the traits to translate genetic gain into phenotypic gain.

h2 <- c(0.314, 0.090, 0.194, 0.194, 0.013, 0.049, 0.033, 0.061, 0.014, 0.273)
names(h2) <- tn

Further, to allow residual errors to be correlated, we need a variance-covariance matrix of breeding values (\(H\)) as starting point for the internal estimation process.


H <- matrix (
  c(1.00,0.06,0.06,-.20,0.05,0.03,
    0.06,1.00,0.14,0.40,0.20,0.46,
    0.06,0.14,1.00,-.03,0.03,0.15,
    -.20,0.40,-.03,1.00,0.30,0.13,
    0.05,0.20,0.03,0.30,1.00,0.11,
    0.03,0.46,0.15,0.13,0.11,1.00),
  nrow=6, ncol=6,
  dimnames = list(tn[w_old != 0], tn[w_old != 0]))
H <- H * 144
H
#>         RZM    RZN   RZEo    RZR   RZKm    RZS
#> RZM  144.00   8.64   8.64 -28.80   7.20   4.32
#> RZN    8.64 144.00  20.16  57.60  28.80  66.24
#> RZEo   8.64  20.16 144.00  -4.32   4.32  21.60
#> RZR  -28.80  57.60  -4.32 144.00  43.20  18.72
#> RZKm   7.20  28.80   4.32  43.20 144.00  15.84
#> RZS    4.32  66.24  21.60  18.72  15.84 144.00

2 Usage of the package IndexWizard

All index calculations are performed within the function SelInd() . Minimum required input are w, G and r2. Note that all vectors and matrices need to be named to allow checks and sorting based on trait names.

res <- SelInd(
  w = w_old,
  G = G,
  r2 = r2_old
)
#> - only reliabilities given
#>   --> setting up E based on r2 and G
#>   --> residual errors are assumed to be uncorrelated!
#> - no selection intensity provided
#>   --> can only compute the relative genetic and phenotypic trend
#> - no heritabilities provided
#>   --> cannot compute the expected relative phenotypic trend
#> - No observed composition of genetic progress given
#>   --> cannot calculate realized weights

The function by default informs the user, if certain calculations cannot be performed. This behavior can be silenced with setting verbose = FALSE.

res <- SelInd(w = w_old,  G = G,  r2 = r2_old, verbose = FALSE)

The summary() function further gives information on the number of traits, and all available entries in the SelInd object.

summary(res)
#> An object of class SelInd. The index is based on:
#>    - n = 10 breeding goal traits
#>    - 6 traits with economic weight != 0
#>    - m = 6 index traits
#> 
#> Residual errors were assumed to be uncorrelated.
#> 
#> The SelInd object contains the entries w, G, E, r2, b, b_scaled, var_I, d_G_exp_scaled, r_IP, r_IH, del_d_scaled, delta, D, del_d_abs.
#> The SelInd object does not contain the entries H, h2, w_real, b_real, i, d_G_obs, d_G_exp, d_P_exp, d_P_exp_scaled, dG.

The print() function further re-formats the output to a more readable format, which includes rounding to two decimals. If you want to see the “pure” list, use print.default() instead.

res
#> An object of class SelInd. The index is based on:
#>    - n = 10 breeding goal traits
#>    - 6 traits with economic weight != 0
#>    - m = 6 index traits
#> 
#> --------------------------------------------------------------------
#> 
#> Matrices G, E present in SelInd object, but not printed to reduce complexity.
#> Extract them by the use of the `$` operator.
#> 
#> --------------------------------------------------------------------
#> 
#> Vaiance of the index (`var_I`), expected genetic gain (`dG`)
#> and assumed selection intensity (`i`):
#> 
#> $var_I:
#> [1] 41.29
#> 
#> --------------------------------------------------------------------
#> 
#> Economic (`w`) and index weights (`b`). Potentially they are rescaled (`scaled`)
#> so that sum(abs()) = 1. The weights might represent realized (`real`) weights
#> based on an observed composition of the genetic trend:
#> 
#> $w:
#>  RZM  RZN RZEo RZEn  RZR RZKm RZKd  RZH  RZC  RZS 
#> 0.45 0.20 0.15 0.00 0.10 0.03 0.00 0.00 0.00 0.07 
#> 
#> $b:
#>  RZM  RZN RZEo  RZR RZKm  RZS 
#> 0.45 0.24 0.17 0.09 0.07 0.11 
#> 
#> $b_scaled:
#>  RZM  RZN RZEo  RZR RZKm  RZS 
#> 0.40 0.21 0.15 0.08 0.06 0.09 
#> 
#> --------------------------------------------------------------------
#> 
#> reliabilities (`r2`) and heritabilities (`h2`) of the traits:
#> 
#> $r2:
#>  RZM  RZN RZEo  RZR RZKm  RZS 
#> 0.74 0.67 0.64 0.54 0.64 0.76 
#> 
#> --------------------------------------------------------------------
#> 
#> Composition (`d`) of genetic (`G`) / phenotypic (`P`) trend.
#> The composition might be observed (`obs`) or expected (`exp`).
#> The composition might be scaled (`scaled`) so that sum(abs()) = 1:
#> 
#> $d_G_exp_scaled:
#>  RZM  RZN RZEo RZEn  RZR RZKm RZKd  RZH  RZC  RZS 
#> 0.20 0.16 0.11 0.10 0.05 0.08 0.04 0.14 0.00 0.11 
#> 
#> --------------------------------------------------------------------
#> 
#> Analytic measures:
#> Correlation between the overall index and the phenotype of trait j (`r_IP`) and
#> the loss in prediction accuracy when omitting trait j from the index (`r_IH`).
#> Further the approximate first derivative of d_G_exp with respect to w (`del_d_scaled`) with rows
#> being scaled  so that sum(abs()) = 1.
#> 
#> $r_IP:
#>  RZM  RZN RZEo  RZR RZKm  RZS 
#> 0.80 0.68 0.49 0.25 0.37 0.43 
#> 
#> $r_IH:
#>  RZM  RZN RZEo  RZR RZKm  RZS 
#> 0.30 0.05 0.03 0.01 0.01 0.01 
#> 
#> $del_d_scaled:
#>        RZM   RZN  RZEo  RZEn   RZR  RZKm  RZKd   RZH   RZC   RZS
#> RZM   0.15 -0.15 -0.09 -0.10 -0.14 -0.06 -0.02 -0.14 -0.01 -0.13
#> RZN  -0.16  0.23 -0.01  0.02  0.15  0.04  0.05  0.18  0.04  0.12
#> RZEo -0.11 -0.01  0.36  0.33 -0.03 -0.02 -0.05  0.02 -0.05  0.02
#> RZEn -0.13  0.03  0.33  0.31 -0.01 -0.02 -0.04  0.05 -0.04  0.06
#> RZR  -0.14  0.15 -0.02  0.00  0.31  0.12  0.05  0.13  0.02  0.04
#> RZKm -0.09  0.06 -0.03 -0.02  0.19  0.49 -0.01  0.07  0.02  0.01
#> RZKd -0.08  0.16 -0.13 -0.10  0.18 -0.03  0.07  0.13  0.04  0.08
#> RZH  -0.15  0.19  0.01  0.04  0.14  0.05  0.04  0.17  0.03  0.18
#> RZC  -0.06  0.19 -0.16 -0.13  0.11  0.07  0.05  0.12  0.06  0.04
#> RZS  -0.14  0.13  0.02  0.05  0.05  0.01  0.02  0.19  0.01  0.39

Nevertheless, each entry can also be extracted by the $ operator.

res$w
#>  RZM  RZN RZEo RZEn  RZR RZKm RZKd  RZH  RZC  RZS 
#> 0.45 0.20 0.15 0.00 0.10 0.03 0.00 0.00 0.00 0.07
res$b_scaled
#>        RZM        RZN       RZEo        RZR       RZKm        RZS 
#> 0.39887304 0.20882509 0.15191062 0.08378771 0.06243987 0.09416367

3 Case studies

3.1 Index weights

The basic usage of the selection index would be the calculation of index weights (b) to combine several estimated breeding values to one combined selection criterium (I). We can do this for both of our indices as follows:

res_old <- SelInd(w = w_old, G = G,  r2 = r2_old, verbose = FALSE)
res_new <- SelInd(w = w_new, G = G,  r2 = r2_new, verbose = FALSE)

The according index weights would then be:

round(res_old$b,2)
#>  RZM  RZN RZEo  RZR RZKm  RZS 
#> 0.45 0.24 0.17 0.09 0.07 0.11
round(res_new$b,2)
#>  RZM  RZN RZEn  RZR RZKm RZKd  RZH  RZC 
#> 0.36 0.22 0.17 0.07 0.05 0.04 0.23 0.03

Let’s assume an arbitrary individual with following estimated breeding values:


ind <- c(RZM = 130, RZN = 110, RZEo = 105, RZEn = 106, RZR = 95,
         RZKm = 100, RZKd = 101, RZH = 115,  RZC = 90,  RZS = 120)

The RZG_old of this animal would then be calculated as follows:

t(res_old$b_scaled) %*% ind[names(res_old$b_scaled)]
#>          [,1]
#> [1,] 116.2783

The same individual would have a slightly lower RZG given the new index.

t(res_new$b_scaled) %*% ind[names(res_new$b_scaled)]
#>          [,1]
#> [1,] 114.4104

3.2 Expected composition of gain

Further interest might be, how the index translates into genetic gain. We can derive the expected composition of the total genetic gain by extracting d_G_exp_scaled. This is scaled so that the sum of absolute values in the vector sum up to one. Note that genetic gain is also expected in breeding goal traits that are not part of the index as correlated selection response.

res_old <- SelInd(w = w_old, G = G,  r2 = r2_old, verbose = FALSE)
round(res_old$d_G_exp_scaled, 2)
#>  RZM  RZN RZEo RZEn  RZR RZKm RZKd  RZH  RZC  RZS 
#> 0.20 0.16 0.11 0.10 0.05 0.08 0.04 0.14 0.00 0.11

Even though the expected phenotypic trend in natural units equals the expected genotypic trend in natural units, a comparison of phenotypes across traits would usually follow a scaling in phenotypic standard deviations to make traits comparable. Due to different heritabilities of the traits, this also leads to another composition of the genetic gain. To calculate the expected composition of the total phenotypic gain (d_P_exp_scaled), we need to pass also heritabilities to the function. In our example, the phenotypic gain depends relatively more on milk (“RZM”) than the genetic gain, as milk comes with a higher heritability.

h2
#>   RZM   RZN  RZEo  RZEn   RZR  RZKm  RZKd   RZH   RZC   RZS 
#> 0.314 0.090 0.194 0.194 0.013 0.049 0.033 0.061 0.014 0.273
res_old <- SelInd(w = w_old, G = G,  r2 = r2_old, h2 = h2, verbose = FALSE)
round(res_old$d_P_exp_scaled, 2)
#>  RZM  RZN RZEo RZEn  RZR RZKm RZKd  RZH  RZC  RZS 
#> 0.29 0.13 0.13 0.12 0.02 0.05 0.02 0.09 0.00 0.15

The total genetic gain further depends on the selection intensity (i). So if we are interested in the total genetic gain in genetic standard deviations (dG), we need to further assume a selection intensity.

res_old <- SelInd(w = w_old, G = G,  r2 = r2_old, h2 = h2, i = 0.1, verbose = FALSE)
round(res_old$dG, 2)
#> [1] 0.64

We can then also retrieve the expected gain for the single traits (d_G_exp or d_P_exp) scaled in genetic/ phenotypic standard deviations.

round(res_old$d_G_exp, 2)
#>  RZM  RZN RZEo RZEn  RZR RZKm RZKd  RZH  RZC  RZS 
#> 0.83 0.67 0.47 0.44 0.22 0.35 0.16 0.58 0.02 0.45
round(res_old$d_P_exp, 2)
#>  RZM  RZN RZEo RZEn  RZR RZKm RZKd  RZH  RZC  RZS 
#> 0.04 0.02 0.02 0.02 0.00 0.01 0.00 0.01 0.00 0.02

3.3 Changes in expected gain

Of strong practical interest should be the question how the genetic gain changes when the index is changed. This question can either be answered by comparison of different indices, or by calculation of analytic measures.

3.3.1 Comparison of indices

We will first regard a situation where we not only have a numeric change in the economic weights, but also a discrete change in the index traits. Traits in w, but not in r2 have an economic weight of zero, which allows to calculate information on the indirect response through correlations to index traits.

res_old <- SelInd(w = w_old, G = G,  r2 = r2_old, h2 = h2, verbose = FALSE)
res_new <- SelInd(w = w_new, G = G,  r2 = r2_new, h2 = h2, verbose = FALSE)

This then allows to compare the changes between the two indices, and we see that the total gain will be slightly less due to milk (“RZM”), but more due to gain in functional traits.

round(res_new$d_G_exp_scaled - res_old$d_G_exp_scaled,2)
#>   RZM   RZN  RZEo  RZEn   RZR  RZKm  RZKd   RZH   RZC   RZS 
#> -0.06  0.01 -0.01  0.00  0.02 -0.01  0.02  0.03  0.01 -0.01

The same can also be done for the phenotypic trend, where this effect is even stronger due to the higher heritability of Milk.

round(res_new$d_P_exp_scaled - res_old$d_P_exp_scaled,2)
#>   RZM   RZN  RZEo  RZEn   RZR  RZKm  RZKd   RZH   RZC   RZS 
#> -0.07  0.02  0.00  0.01  0.01  0.00  0.01  0.03  0.00  0.00

3.3.2 Use of analytic measures

If we have a certain index and want to see which effect the single traits have on the composition of the total index, several analytic measures might help.

res_old <- SelInd(w = w_old, G = G,  r2 = r2_old, verbose = FALSE)

First one would be the correlation between the overall index \(I\) and a phenotype in trait \(j\) (\(r_{I,y_j}\); r_IP ). It reflects, to what extent a trait contributes to the response in overall genetic merit.

round(res_old$r_IP, 2)
#>  RZM  RZN RZEo  RZR RZKm  RZS 
#> 0.80 0.68 0.49 0.25 0.37 0.43

Second one is the loss of accuracy of prediction if trait \(j\) is omitted from the index (\(\frac{r_{IH_{-j}}}{r_{IH}}\); r_IH), which reflects the contribution of trait \(j\) to the overall selection objective.

round(res_old$r_IH, 2)
#>  RZM  RZN RZEo  RZR RZKm  RZS 
#> 0.30 0.05 0.03 0.01 0.01 0.01

Further, the first derivative of \(d\) with respect to \(w\) (\(\frac{\delta d}{\delta w}\); del_d_scaled) tells us how a change in the economic weight in some trait would affect the composition of the genetic trend. Note that the values are scaled so that the sum of the absolute values within a row sums up to one. The first row thereby tells us that a change of the weight for “RZM” mainly affects the gain in “RZM”, but only moderately affects the functional traits. A change in the weight for “RZN” (second row) also indirectly affects e.g. “RZR”.

round(res_old$del_d_scaled, 2)
#>        RZM   RZN  RZEo  RZEn   RZR  RZKm  RZKd   RZH   RZC   RZS
#> RZM   0.15 -0.15 -0.09 -0.10 -0.14 -0.06 -0.02 -0.14 -0.01 -0.13
#> RZN  -0.16  0.23 -0.01  0.02  0.15  0.04  0.05  0.18  0.04  0.12
#> RZEo -0.11 -0.01  0.36  0.33 -0.03 -0.02 -0.05  0.02 -0.05  0.02
#> RZEn -0.13  0.03  0.33  0.31 -0.01 -0.02 -0.04  0.05 -0.04  0.06
#> RZR  -0.14  0.15 -0.02  0.00  0.31  0.12  0.05  0.13  0.02  0.04
#> RZKm -0.09  0.06 -0.03 -0.02  0.19  0.49 -0.01  0.07  0.02  0.01
#> RZKd -0.08  0.16 -0.13 -0.10  0.18 -0.03  0.07  0.13  0.04  0.08
#> RZH  -0.15  0.19  0.01  0.04  0.14  0.05  0.04  0.17  0.03  0.18
#> RZC  -0.06  0.19 -0.16 -0.13  0.11  0.07  0.05  0.12  0.06  0.04
#> RZS  -0.14  0.13  0.02  0.05  0.05  0.01  0.02  0.19  0.01  0.39

Since V0.2.0.0, we calculate an approximate derivative by incrementing the weight of one trait slightly, while reducing the weight of the other traits accordingly, to be able to account for the side restriction that \(\sum w = 0\).

3.4 Realized gain

A further interesting feature of the method is to compare the expected composition of the gain with the observed genetic trend. Note that we subset w and G to the index traits, to have a matching scaling.

res_old <- SelInd(w = w_old[w_old>0], G = G[w_old>0,w_old>0],  r2 = r2_old, verbose = FALSE)
round(deltautr - res_old$d_G_exp_scaled, 3)
#>    RZM    RZN   RZEo    RZR   RZKm    RZS 
#>  0.007 -0.007  0.022  0.027 -0.031 -0.018

This shows us that the resulting trend matches the expected well with probably slightly more weight on exterior (“RZEo”) and reproduction (“RZR”).

We can also check what this meant for the economic and index weights by calculating “realized weights” given the observed gains

res_old <- SelInd(
  w = w_old[w_old>0],
  G = G[w_old>0, w_old>0],
  r2 = r2_old,
  d_G_obs = deltautr,
  verbose = FALSE)
round(res_old$w_real - res_old$w, 2)
#>   RZM   RZN  RZEo   RZR  RZKm   RZS 
#> -0.05 -0.11  0.03  0.12 -0.10 -0.03
round(res_old$b_real - res_old$b_scaled, 2)
#>   RZM   RZN  RZEo   RZR  RZKm   RZS 
#> -0.01 -0.05  0.04  0.10 -0.08 -0.03

This reveals that there was effectively slightly more weight on “RZEo” and “RZR” in practical breeding decisions than suggested by the index.

3.5 Correlated residuals

The previous section assumed uncorrelated residual errors, which might not hold in reality. SelInd() therefore also allows to include the residual variance-covariance matrix of the traits (H).

res_old_corRes <- SelInd(
  w = w_old[w_old>0],
  G = G[w_old>0, w_old>0],
  r2 = r2_old,
  d_G_obs = deltautr,
  H = H,
  verbose = FALSE)

Modelling those residual correlations between the traits actually increases the fit between expectation and observation:

round(res_old$w_real - res_old$w, 2) # uncorrelated residuals
#>   RZM   RZN  RZEo   RZR  RZKm   RZS 
#> -0.05 -0.11  0.03  0.12 -0.10 -0.03
round(res_old_corRes$w_real - res_old_corRes$w, 2) # correlated residuals
#>   RZM   RZN  RZEo   RZR  RZKm   RZS 
#> -0.05 -0.06  0.01  0.08 -0.08  0.00