demonstration

Thijs Janzen

2024-01-12

Using treestats

The treestats package provides an easy-to-use interface for calculating summary statistics on phylogenetic trees. To obtain a list of all supported summary statistics, use:

list_statistics()
##  [1] "gamma"                  "sackin"                 "colless"               
##  [4] "beta"                   "blum"                   "crown_age"             
##  [7] "tree_height"            "pigot_rho"              "number_of_lineages"    
## [10] "nltt_base"              "phylogenetic_div"       "avg_ladder"            
## [13] "max_ladder"             "cherries"               "il_number"             
## [16] "pitchforks"             "stairs"                 "laplace_spectrum_a"    
## [19] "laplace_spectrum_p"     "laplace_spectrum_e"     "laplace_spectrum_g"    
## [22] "imbalance_steps"        "j_one"                  "b1"                    
## [25] "b2"                     "area_per_pair"          "average_leaf_depth"    
## [28] "i_stat"                 "ew_colless"             "max_del_width"         
## [31] "max_depth"              "max_width"              "rogers"                
## [34] "stairs2"                "tot_coph"               "var_depth"             
## [37] "symmetry_nodes"         "mpd"                    "psv"                   
## [40] "vpd"                    "mntd"                   "j_stat"                
## [43] "rquartet"               "wiener"                 "max_betweenness"       
## [46] "max_closeness"          "diameter"               "eigenvector"           
## [49] "mean_branch_length"     "var_branch_length"      "mean_branch_length_int"
## [52] "mean_branch_length_ext" "var_branch_length_int"  "var_branch_length_ext"

If your favourite summary statistic is missing, please let the maintainers know. treestats is under continuous development, and the maintainers are always looking for new statistics!
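To check programmatically whether the statistics you need are supported, the character vector returned by list_statistics() can be compared against a list of names. The following is a minimal sketch; the vector `wanted` and the name "my_new_stat" are hypothetical and serve only as illustration:

```r
# Names of the statistics we would like to use.
# "my_new_stat" is a made-up name, included to show the missing case.
wanted <- c("colless", "sackin", "my_new_stat")

supported <- treestats::list_statistics()

# Statistics that are requested but not offered by the package
missing_stats <- setdiff(wanted, supported)
missing_stats
```

Any name that ends up in `missing_stats` is not available under that name in the installed version of the package.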

Given a phylogenetic tree, you can now use any of the available functions to calculate your summary statistic of choice. Let’s take the Colless statistic as an example, calculated on a simulated dummy tree:

phy <- ape::rphylo(n = 100, birth = 1, death = 0.1)

treestats::colless(phy)
## [1] 272

Looking at the documentation of the Colless statistic (?colless), we find that the function can also normalize for tree size, using either ‘pda’ or ‘yule’:

treestats::colless(phy, normalization = "yule")
## [1] -0.7692387
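To see how the choice of normalization changes the value, the variants can be collected side by side in a named vector. This is a small sketch that uses only the two normalization options mentioned above. The tree is simulated anew without a fixed seed, so the values will differ from those shown here and between runs:

```r
# Simulate a dummy tree, as in the example above
phy <- ape::rphylo(n = 100, birth = 1, death = 0.1)

# Collect the Colless statistic without and with normalization
colless_values <- c(
  raw  = treestats::colless(phy),
  pda  = treestats::colless(phy, normalization = "pda"),
  yule = treestats::colless(phy, normalization = "yule")
)
colless_values
```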

Multiple statistics

The treestats package supports calculating many statistics in one go, and provides several convenience functions for this. First, the function calc_all_stats calculates all statistics:

all_stats <- calc_all_stats(phy)

This generates a named list, which makes it easy to look up your focal statistics by name. Often, however, a vector is more convenient. We convert with unlist and deliberately omit as.vector, which would drop the names:

unlist(all_stats)
##                  gamma                 sackin                colless 
##           2.826523e+00           7.720000e+02           2.720000e+02 
##                   beta                   blum              crown_age 
##           5.517578e-01           1.109022e+02           4.801707e+00 
##            tree_height              pigot_rho     number_of_lineages 
##           4.801707e+00           1.771838e-01           1.000000e+02 
##              nltt_base       phylogenetic_div             avg_ladder 
##           8.144651e-01           8.908840e+01           2.600000e+00 
##             max_ladder               cherries              il_number 
##           3.000000e+00           3.500000e+01           3.000000e+01 
##             pitchforks                 stairs     laplace_spectrum_a 
##           1.600000e+01           6.161616e-01          -1.295555e+00 
##     laplace_spectrum_p     laplace_spectrum_e     laplace_spectrum_g 
##           6.839989e+00           7.485156e+00           2.000000e+00 
##        imbalance_steps                  j_one                     b1 
##           9.000000e+01           8.606031e-01           5.406865e+01 
##                     b2          area_per_pair     average_leaf_depth 
##           5.619141e+00           1.296364e+01           7.720000e+00 
##                 i_stat             ew_colless          max_del_width 
##           4.780108e-01           4.493819e-01           1.000000e+01 
##              max_depth              max_width                 rogers 
##           1.100000e+01           3.600000e+01           6.100000e+01 
##                stairs2               tot_coph              var_depth 
##           6.544133e-01           6.129000e+03           3.132929e+00 
##         symmetry_nodes                    mpd                    psv 
##           6.100000e+01           7.819722e+00           3.909861e+00 
##                    vpd                   mntd                 j_stat 
##           6.371447e+00           7.086605e-01           7.819722e-02 
##               rquartet                 wiener        max_betweenness 
##           4.590837e+06           1.379732e+05           1.288700e+04 
##          max_closeness               diameter            eigenvector 
##           1.167785e-03           2.200000e+01           6.499207e-01 
##     mean_branch_length      var_branch_length mean_branch_length_int 
##           4.499414e-01           2.681197e-01           5.475039e-01 
## mean_branch_length_ext  var_branch_length_int  var_branch_length_ext 
##           3.543302e-01           3.476349e-01           1.742629e-01
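Because the names are retained, focal statistics can be picked from the result by name. A minimal sketch, assuming the names shown in the output above; the variable `focal` is introduced here only for illustration:

```r
# Simulate a dummy tree and calculate all statistics as a named vector
phy <- ape::rphylo(n = 100, birth = 1, death = 0.1)
all_stats <- unlist(treestats::calc_all_stats(phy))

# Select a few focal statistics by name
focal <- all_stats[c("gamma", "colless", "sackin")]
focal
```

The same indexing works on the named list itself, for instance with double brackets and a single name.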

Similarly, we can calculate all balance-associated summary statistics at once:

balance_stats <- calc_balance_stats(phy)
unlist(balance_stats)
##             sackin            colless               beta               blum 
##       7.720000e+02       2.720000e+02       5.517578e-01       1.109022e+02 
##         avg_ladder         max_ladder           cherries          il_number 
##       2.600000e+00       3.000000e+00       3.500000e+01       3.000000e+01 
##         pitchforks             stairs                 b1                 b2 
##       1.600000e+01       6.161616e-01       5.406865e+01       5.619141e+00 
##      area_per_pair average_leaf_depth             i_stat         ew_colless 
##       1.296364e+01       7.720000e+00       4.780108e-01       4.493819e-01 
##      max_del_width          max_depth          max_width             rogers 
##       1.000000e+01       1.100000e+01       3.600000e+01       6.100000e+01 
##            stairs2           tot_coph          var_depth     symmetry_nodes 
##       6.544133e-01       6.129000e+03       3.132929e+00       6.100000e+01 
##           rquartet    imbalance_steps              j_one           diameter 
##       4.590837e+06       9.000000e+01       8.606031e-01       2.200000e+01
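The same pattern extends to a collection of trees, for example to compare balance statistics across replicate simulations. The sketch below is one possible way to do this and is not part of the package interface: the number of trees, the tree size and the seed are arbitrary choices, and the names `trees` and `stat_matrix` are introduced here for illustration only.

```r
set.seed(42)  # arbitrary seed, only to make the simulated trees reproducible

# Simulate a handful of dummy trees
trees <- lapply(1:5, function(i) {
  ape::rphylo(n = 50, birth = 1, death = 0.1)
})

# One named vector of balance statistics per tree.
# sapply() binds them as columns, so we transpose to get
# one row per tree and one column per statistic.
stat_matrix <- t(sapply(trees, function(tr) {
  unlist(treestats::calc_balance_stats(tr))
}))

dim(stat_matrix)
stat_matrix[, c("colless", "sackin")]
```

Each row of `stat_matrix` corresponds to one tree, and columns can be selected by statistic name.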