demonstration

Thijs Janzen

2024-01-30

Using treestats

The treestats package provides an easy-to-use interface for calculating summary statistics on phylogenetic trees. To obtain a list of all supported summary statistics, use:

library(treestats)
list_statistics()
##  [1] "gamma"                  "sackin"                 "colless"               
##  [4] "beta"                   "blum"                   "crown_age"             
##  [7] "tree_height"            "pigot_rho"              "number_of_lineages"    
## [10] "nltt_base"              "phylogenetic_div"       "avg_ladder"            
## [13] "max_ladder"             "cherries"               "il_number"             
## [16] "pitchforks"             "stairs"                 "laplace_spectrum_a"    
## [19] "laplace_spectrum_p"     "laplace_spectrum_e"     "laplace_spectrum_g"    
## [22] "imbalance_steps"        "j_one"                  "b1"                    
## [25] "b2"                     "area_per_pair"          "average_leaf_depth"    
## [28] "i_stat"                 "ew_colless"             "max_del_width"         
## [31] "max_depth"              "max_width"              "rogers"                
## [34] "stairs2"                "tot_coph"               "var_depth"             
## [37] "symmetry_nodes"         "mpd"                    "psv"                   
## [40] "vpd"                    "mntd"                   "j_stat"                
## [43] "rquartet"               "wiener"                 "max_betweenness"       
## [46] "max_closeness"          "diameter"               "eigenvector"           
## [49] "mean_branch_length"     "var_branch_length"      "mean_branch_length_int"
## [52] "mean_branch_length_ext" "var_branch_length_int"  "var_branch_length_ext"

If your favourite summary statistic is missing, please let the maintainers know. The treestats package is under active development, and the maintainers are always looking for new statistics!

Given a phylogenetic tree, you can use any of the available functions to calculate your summary statistic of choice. Take the Colless statistic, for instance, calculated here on a randomly generated dummy tree:

phy <- ape::rphylo(n = 100, birth = 1, death = 0.1)

treestats::colless(phy)
## [1] 238

The documentation of the colless function (?colless) shows that it also includes options to normalize for tree size, using either ‘pda’ or ‘yule’:

treestats::colless(phy, normalization = "yule")
## [1] -1.109239
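To see the effect of the normalization argument, the three variants can be computed side by side on one tree. This is a minimal sketch: the seed value is arbitrary and only there to make the dummy tree reproducible, so the numbers will differ from the output shown above, and the names raw, yule and pda are chosen for illustration.

```r
library(treestats)

# Arbitrary seed, used only so that the dummy tree is reproducible.
set.seed(42)
phy <- ape::rphylo(n = 100, birth = 1, death = 0.1)

# Colless statistic without an explicit normalization argument.
raw <- colless(phy)

# The two normalizations mentioned in ?colless.
yule <- colless(phy, normalization = "yule")
pda  <- colless(phy, normalization = "pda")

# Collect the three values in one named vector for comparison.
c(raw = raw, yule = yule, pda = pda)
```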

Multiple statistics

The treestats package supports calculating many statistics in one go, and provides several convenience functions for this. First, the function calc_all_stats calculates all statistics:

all_stats <- calc_all_stats(phy)

This generates a named list, which makes it easy to look up your focal statistics by name. Often, however, a vector is more convenient (we use unlist and omit as.vector to retain the names):

unlist(all_stats)
##                  gamma                 sackin                colless 
##          -4.152725e-01           7.540000e+02           2.380000e+02 
##                   beta                   blum              crown_age 
##           1.081738e+00           1.098696e+02           5.332899e+00 
##            tree_height              pigot_rho     number_of_lineages 
##           5.332899e+00           8.397294e-02           1.000000e+02 
##              nltt_base       phylogenetic_div             avg_ladder 
##           7.790139e-01           1.178496e+02           2.166667e+00 
##             max_ladder               cherries              il_number 
##           3.000000e+00           3.500000e+01           3.000000e+01 
##             pitchforks                 stairs     laplace_spectrum_a 
##           1.600000e+01           6.060606e-01          -8.071243e-01 
##     laplace_spectrum_p     laplace_spectrum_e     laplace_spectrum_g 
##           4.218895e+00           7.573827e+00           1.000000e+00 
##        imbalance_steps                  j_one                     b1 
##           8.700000e+01           8.811480e-01           5.435123e+01 
##                     b2          area_per_pair     average_leaf_depth 
##           5.855957e+00           1.276687e+01           7.540000e+00 
##                 i_stat             ew_colless          max_del_width 
##           4.599563e-01           4.373691e-01           1.000000e+01 
##              max_depth              max_width                 rogers 
##           1.200000e+01           3.800000e+01           6.000000e+01 
##                stairs2               tot_coph              var_depth 
##           6.651382e-01           5.725000e+03           2.917576e+00 
##         symmetry_nodes                    mpd                    psv 
##           6.000000e+01           8.422200e+00           4.211100e+00 
##                    vpd                   mntd                 j_stat 
##           7.645229e+00           1.302264e+00           8.422200e-02 
##               rquartet                 wiener        max_betweenness 
##           4.917915e+06           1.442562e+05           1.154700e+04 
##          max_closeness               diameter            eigenvector 
##           1.086454e-03           2.200000e+01           6.761475e-01 
##     mean_branch_length      var_branch_length mean_branch_length_int 
##           5.952001e-01           3.689483e-01           5.381270e-01 
## mean_branch_length_ext  var_branch_length_int  var_branch_length_ext 
##           6.511318e-01           3.545903e-01           3.803586e-01
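Why unlist is preferred over as.vector here can be checked on a small hand-made list. The sketch below uses only base R; the list small_stats is a hypothetical stand-in for the result of calc_all_stats, with three values copied from the output above.

```r
# Hypothetical stand-in for a list of statistics (values from the output above).
small_stats <- list(gamma = -0.4152725, sackin = 754, colless = 238)

# unlist() keeps the element names, giving a named numeric vector.
stats_vec <- unlist(small_stats)
names(stats_vec)
# "gamma" "sackin" "colless"

# as.vector() strips the names, so the statistics can no longer be told apart.
plain_vec <- as.vector(stats_vec)
is.null(names(plain_vec))
# TRUE

# With names retained, focal statistics can be selected by name.
stats_vec[c("sackin", "colless")]
```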

Similarly, we can calculate all balance-associated summary statistics at once:

balance_stats <- calc_balance_stats(phy)
unlist(balance_stats)
##             sackin            colless               beta               blum 
##       7.540000e+02       2.380000e+02       1.081738e+00       1.098696e+02 
##         avg_ladder         max_ladder           cherries          il_number 
##       2.166667e+00       3.000000e+00       3.500000e+01       3.000000e+01 
##         pitchforks             stairs                 b1                 b2 
##       1.600000e+01       6.060606e-01       5.435123e+01       5.855957e+00 
##      area_per_pair average_leaf_depth             i_stat         ew_colless 
##       1.276687e+01       7.540000e+00       4.599563e-01       4.373691e-01 
##      max_del_width          max_depth          max_width             rogers 
##       1.000000e+01       1.200000e+01       3.800000e+01       6.000000e+01 
##            stairs2           tot_coph          var_depth     symmetry_nodes 
##       6.651382e-01       5.725000e+03       2.917576e+00       6.000000e+01 
##           rquartet    imbalance_steps              j_one           diameter 
##       4.917915e+06       8.700000e+01       8.811480e-01       2.200000e+01
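Because the result converts cleanly to a named vector, the same call can be repeated over several trees and collected into a table. The following is a sketch rather than a recommended workflow: the seed, the number of trees, and the names trees and stats_matrix are arbitrary choices made for illustration.

```r
# Arbitrary seed, used only so that the dummy trees are reproducible.
set.seed(1)

# Simulate five dummy trees of 50 tips each, kept as a plain list.
trees <- replicate(5,
                   ape::rphylo(n = 50, birth = 1, death = 0.1),
                   simplify = FALSE)

# Calculate the balance statistics per tree; sapply() returns one column
# per tree, so transpose to get one row per tree, one column per statistic.
stats_matrix <- t(sapply(trees, function(tr) {
  unlist(treestats::calc_balance_stats(tr))
}))

dim(stats_matrix)
stats_matrix[, c("sackin", "colless")]
```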