Rtropical package
In this vignette, we demonstrate the main capabilities of the Rtropical package. We show a pipeline for analyzing phylogenetic tree data with tropical SVM and tropical PCA.
We start by loading the Rtropical library.
library(Rtropical)
library(ape)
We will carry out tropical SVM on the simulated tree data included in the package. This data set contains 300 trees; the first 150 are assumed to come from one category and the rest from the other. We first prepare the data by converting it into a data matrix and splitting it into training and testing sets.
set.seed(101)
data(sim_trees)
treevecs = do.call("rbind", lapply(sim_trees, as.vector))
labels = as.factor(rep(c(1, 2), each = nrow(treevecs)/2))
# generate training data set
trn_ind = sample(1: nrow(treevecs), nrow(treevecs)*0.8)
x = treevecs[trn_ind, ]
y = labels[trn_ind]
# generate testing data set
newx = treevecs[-trn_ind, ]
newy = labels[-trn_ind]
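The split above draws 80% of the rows at random, so the two categories are not guaranteed to be equally represented in the training set. As a minimal sketch of one alternative, the base R code below samples within each class separately. It uses a small toy matrix rather than sim_trees, and the names toy_x, toy_labels, and toy_trn_ind are placeholders that are not part of the package.

```r
# Toy stand-in for the vignette's data: 20 rows, two balanced classes.
set.seed(101)
toy_x <- matrix(rnorm(20 * 3), nrow = 20)
toy_labels <- as.factor(rep(c(1, 2), each = 10))

# Group the row indices by class, then keep 80% of each group.
idx_by_class <- split(seq_along(toy_labels), toy_labels)
toy_trn_ind <- unlist(lapply(idx_by_class, function(idx) {
  idx[sample.int(length(idx), size = floor(0.8 * length(idx)))]
}))

# Training and testing subsets built from the stratified indices.
toy_trn_x <- toy_x[toy_trn_ind, ]
toy_tst_x <- toy_x[-toy_trn_ind, ]

table(toy_labels[toy_trn_ind])   # class counts in the training set
table(toy_labels[-toy_trn_ind])  # class counts in the testing set
```

Whether stratification matters for this data set is not something the vignette claims; the sketch only shows how the split could be made class-balanced.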
Next, we run the tropical SVM.
# run tropical svm
start = Sys.time()
trop_fit <- tropsvm(x, y, auto.assignment = TRUE)
end = Sys.time()
# predict for testing data
trop_pred <- predict(trop_fit, newx)
# compute classification accuracy
sum(as.vector(trop_pred) == newy)/length(newy)
#> [1] 0.45
print(paste("The running time is: ", round(end - start, digits = 3), "s", sep = ""))
#> [1] "The running time is: 0.238s"
The accuracy is low here because tropsvm does not tune for a good classification method. When tropsvm performs poorly, we recommend using cv.tropsvm, which automatically carries out cross-validation to improve performance. To keep the computation manageable, we set nassignment = 100. Users can instead set it to 500 to reach an accuracy of up to 90%, with a running time of approximately 4.5 minutes.
# tropical svm with cross-validation
start = Sys.time()
cv_trop_fit <- cv.tropsvm(x, y, nassignment = 100, parallel = TRUE)
end = Sys.time()
cv_trop_pred <- predict(cv_trop_fit, newx)
# compute classification accuracy for testing data
sum(cv_trop_pred == newy)/length(newy)
#> [1] 0.75
print(paste("The running time is: ", round(end - start, digits = 3), "min", sep = ""))
#> [1] "The running time is: 55.371min"
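The timing code above subtracts two Sys.time() values, and R chooses the unit of the resulting difference automatically. A hard-coded suffix such as "s" or "min" can therefore disagree with the number printed next to it. A minimal sketch of one way to pin the unit, using base R's difftime, is shown below. Sys.sleep stands in for the model fit and is only a placeholder.

```r
start <- Sys.time()
Sys.sleep(0.1)  # placeholder for the model fit being timed
end <- Sys.time()

# Request seconds explicitly instead of relying on the automatic unit.
elapsed <- difftime(end, start, units = "secs")

print(paste0("The running time is: ",
             round(as.numeric(elapsed), digits = 3), "s"))
```

Passing units = "mins" instead would give a value that matches a "min" suffix.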
We can also run svm from e1071 as a comparison:
svm_fit <- e1071::svm(x, y)
svm_pred <- predict(svm_fit, newx)
sum(svm_pred == newy)/length(newy)
#> [1] 0.5833333
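The same accuracy expression is repeated for each of the three classifiers above. As a small sketch, it can be wrapped in a helper and paired with a confusion table from base R. The function name accuracy and the toy vectors are placeholders for illustration, not part of Rtropical or e1071.

```r
# Fraction of predictions that match the true labels.
# Both inputs are compared as character strings so that factors and
# plain vectors can be mixed.
accuracy <- function(pred, truth) {
  mean(as.character(pred) == as.character(truth))
}

# Toy labels and predictions (not output from any model above).
toy_truth <- factor(c(1, 1, 2, 2, 2))
toy_pred <- factor(c(1, 2, 2, 2, 1))

accuracy(toy_pred, toy_truth)

# Cross-tabulate predictions against the truth to see which class is missed.
table(predicted = toy_pred, actual = toy_truth)
```

With the vignette's objects, the call would look like accuracy(trop_pred, newy), assuming predict returns one label per test row as it does above.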
Now we analyze some actual phylogenetic tree data with tropical principal component analysis.
data(apicomplexa)
treevecs <- as.matrix(apicomplexa, parallel = TRUE)
pca_fit = troppca.poly(treevecs)
For the second-order principal component and the projections of the data points, we can visualize the result on a 2D plane via an isometric transformation.
plot(pca_fit, fw = TRUE)
The Fermat-Weber point can be regarded as the “tropical mean” of a data set. It minimizes the sum of distances to the points in the data set, and it corresponds to zeroth-order tropical principal component analysis. Unlike higher-order tropical PCA, the tropical Fermat-Weber point can be computed deterministically with tropFW.
tropFW(treevecs)
#> $fw
#> [1] 0.7815775 0.7815775 0.8904941 0.7815775 0.7815775 0.7815775 0.7815775
#> [8] 0.7815775 0.7815775 0.9524738 0.9665469 0.7815775 0.7815775 0.7815775
#> [15] 0.8353162 0.7815775 0.8837696 0.7815775 0.7815775 0.7815775 0.7815775
#> [22] 0.7815775 0.4808759 0.6561294 0.6561294 0.6561294 0.6561294 0.0000000
#>
#> $distsum
#> [1] 434.4991
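To make the objective concrete, the sketch below evaluates the sum of tropical distances from a candidate point to every row of a small matrix. It uses the standard tropical metric, d(u, v) = max(u - v) - min(u - v). We assume, without having checked the package source, that $distsum above reports this quantity at the returned point. The helper names trop_dist and sum_trop_dist and the toy matrix are placeholders, not Rtropical functions.

```r
# Tropical distance between two vectors: the range of their
# coordinate-wise difference.
trop_dist <- function(u, v) {
  d <- u - v
  max(d) - min(d)
}

# Sum of tropical distances from one candidate point to every row of data.
sum_trop_dist <- function(point, data) {
  sum(apply(data, 1, trop_dist, v = point))
}

# Toy data: three points with three coordinates each.
toy <- rbind(c(0, 0, 0),
             c(0, 1, 3),
             c(0, 2, 1))

sum_trop_dist(c(0, 1, 1), toy)  # distances 1 + 2 + 1 = 4 for this toy input
```

The distance does not change when a constant is added to every coordinate of a point, which is why tree vectors can be compared up to such a shift.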