The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
Sometimes one may wish to compare trees whose tip sets only partially overlap: that is, certain leaves are missing in one tree or the other.
Whilst this process is computationally trivial, interpreting the resultant distances can require a little thought.
Let’s generate some simple trees to follow this through in practice:
library("TreeTools", quietly = TRUE)
balAL <- BalancedTree(letters[1:12])
balAJ <- DropTip(balAL, letters[11:12])
balCL <- DropTip(balAL, letters[1:2])
balGL <- DropTip(balAL, letters[1:6])
pecAL <- PectinateTree(letters[1:12])
pecAF <- DropTip(pecAL, letters[7:12])
pecAJ <- DropTip(pecAL, letters[11:12])
pecCL <- DropTip(pecAL, letters[1:2])
treeList <- list(balAL = balAL, balAJ = balAJ, balCL = balCL, balGL = balGL,
pecAL = pecAL, pecAJ = pecAJ, pecCL = pecCL, pecAF = pecAF)
# Define a function to plot two trees
Plot2 <- function(t1, t2, ..., main2 = "") {
oPar <- par(mfrow = c(1, 2), mar = rep(1, 4), cex = 0.9)
on.exit(par(oPar)) # Restore original parameters
plot(t1, ...)
plot(t2, main = main2)
}
First let’s consider the scenario where we are identifying two
identical trees – except that some leaves present in one tree (here,
a
and b
) are missing from the other.
Plot2(balAL, balCL,
main = "balAL", main2 = "balCL",
font = c(rep(4, 2), rep(3, 10)) # Emphasize missing tips
)
From an information theoretic perspective, all the information present in our reduced tree is also present in the complete tree. The information held in common between the trees is thus equal to the information held in common if only the common leaves are retained:
library("TreeDist")
commonTips <- intersect(TipLabels(balAL), TipLabels(balCL))
# How much information is in tree balCL?
ClusteringEntropy(balCL)
## [1] 5.780608
## [1] 5.780608
# balAL also contains information about leaves A and B,
# so contains more information in total
ClusteringEntropy(balAL)
## [1] 6.845202
Some information is held in common between any pair of trees
with more than a few leaves. Two random trees with many leaves may thus
have more information in common than two identical trees with only a few
leaves, simply because they have more information overall. As such, it
is clearly necessary to perform some form of normalization before
comparing tree distances. The obvious choice – and the default, if
normalize = TRUE
– is to normalize against the maximum
similarity that could be obtained, given the set of leaves that a pair
of trees have in common. A NaN
result is returned when a
pair of trees has no leaves in common.
# Normalized
normalized <- MutualClusteringInfo(treeList, normalize = TRUE)
heatmap(normalized, symm = TRUE)
A more nuanced perspective may be obtained by computing the range of distances that could in principle be obtained if missing tips were added to each tree. This is not yet implemented in ‘TreeDist’.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.