The myTAI package provides analytics tools for datasets that follow the PhyloExpressionSet and DivergenceExpressionSet standards. A PhyloExpressionSet combines a Phylostratigraphic Map with an ExpressionSet, whereas a DivergenceExpressionSet combines a Divergence Map with an ExpressionSet.
The computation of a Phylostratigraphic Map relies on a method named Phylostratigraphy. The computation of a Divergence Map relies on a method named Divergence Stratigraphy. Both methods are computationally expensive and include many methodologies and evolutionary concepts. Nevertheless, the orthologr package aims to automate Divergence Stratigraphy and can be used to obtain a Divergence Map for a query organism of interest.
# install orthologr from CRAN
install.packages("orthologr")
A divergence map quantifies, for each protein coding gene of a given organism, the degree of selection pressure acting on that gene. Selection pressure is estimated as dN/dS, the ratio of nonsynonymous to synonymous substitution rates.
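To make the idea of a divergence map concrete, the following minimal sketch bins dN/dS values into deciles, assuming (as described for Divergence Stratigraphy) that each gene is assigned to a discrete stratum from 1 to 10. The gene names, the dN/dS values, and the helper `to_deciles()` are hypothetical illustrations and not part of orthologr.

```r
# Hypothetical dN/dS estimates for six genes (illustrative numbers only)
dnds <- c(g1 = 0.05, g2 = 0.12, g3 = 0.30, g4 = 0.45, g5 = 0.80, g6 = 1.40)

# Hypothetical helper: assign each value to a decile of the dN/dS distribution
# (1 = lowest dN/dS, 10 = highest dN/dS)
to_deciles <- function(x) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = 11), na.rm = TRUE)
  # rightmost.closed = TRUE places the maximum value in decile 10
  findInterval(x, breaks, rightmost.closed = TRUE)
}

# Toy divergence map: one stratum per gene
toy_map <- data.frame(DivergenceStratum = to_deciles(dnds),
                      GeneID = names(dnds),
                      stringsAsFactors = FALSE)
print(toy_map)
```

Genes with low dN/dS end up in low strata and genes with high dN/dS in high strata, which is the ordering a divergence map is meant to capture.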
To perform divergence stratigraphy using orthologr, you need CDS files for both a query organism and a subject organism.
In the following example, we will use Arabidopsis thaliana as query organism and Arabidopsis lyrata as subject organism.
First, we need to download the CDS sequences for all protein coding genes of A. thaliana and A. lyrata.
The CDS files can be retrieved from a terminal or by manually downloading the files Arabidopsis_thaliana.TAIR10.23.cds.all.fa.gz and Arabidopsis_lyrata.v.1.0.23.cds.all.fa.gz.
# download CDS file of A. thaliana
curl ftp://ftp.ensemblgenomes.org/pub/plants/release-23/fasta/arabidopsis_thaliana/cds/Arabidopsis_thaliana.TAIR10.23.cds.all.fa.gz \
  -o Arabidopsis_thaliana.TAIR10.23.cds.all.fa.gz
# download CDS file of A. lyrata
curl ftp://ftp.ensemblgenomes.org/pub/plants/release-23/fasta/arabidopsis_lyrata/cds/Arabidopsis_lyrata.v.1.0.23.cds.all.fa.gz \
  -o Arabidopsis_lyrata.v.1.0.23.cds.all.fa.gz
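The downloaded archives are gzip-compressed. If you would rather decompress them from within R than in a terminal, the following minimal sketch uses only base R connections. The helper `gunzip_file()` is hypothetical, and the sketch assumes the archives sit in the current working directory.

```r
# Hypothetical helper: decompress a .gz text file using base R only
gunzip_file <- function(gz_path, out_path = sub("\\.gz$", "", gz_path)) {
  con <- gzfile(gz_path, open = "rt")
  on.exit(close(con))
  # read the decompressed lines and write them to the output file
  writeLines(readLines(con), out_path)
  invisible(out_path)
}

# Decompress both CDS archives if they are present in the working directory
cds_archives <- c("Arabidopsis_thaliana.TAIR10.23.cds.all.fa.gz",
                  "Arabidopsis_lyrata.v.1.0.23.cds.all.fa.gz")
for (f in cds_archives) {
  if (file.exists(f)) gunzip_file(f)
}
```

Reading the whole file into memory is acceptable for CDS files of this size; for much larger archives a command-line `gunzip` is the simpler choice.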
When the download is finished you need to unzip the files and then start R to perform the following analyses:
library(orthologr)
# compute the divergence map of A. thaliana
Athaliana_DM <- divergence_stratigraphy(
    query_file      = "path/to/Arabidopsis_thaliana.TAIR10.23.cds.all.fa",
    subject_file    = "path/to/Arabidopsis_lyrata.v.1.0.23.cds.all.fa",
    eval            = "1E-5",
    ortho_detection = "RBH",
    comp_cores      = 1,
    quiet           = TRUE,
    clean_folders   = TRUE )
Note that you can set the comp_cores argument to a value greater than 1 if you work on a multicore machine.
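One way to choose a value for comp_cores is to ask R how many cores are available. The sketch below uses `detectCores()` from the parallel package that ships with R; leaving one core free is merely an assumption about what is convenient on a shared workstation, and the variable name `n_cores` is illustrative.

```r
library(parallel)

# Number of cores reported by the operating system (may be NA on some platforms)
available <- detectCores()

# Leave one core free for other work, but never go below 1
n_cores <- if (is.na(available)) 1L else max(1L, available - 1L)
print(n_cores)

# The value could then be passed on, for example:
# Athaliana_DM <- divergence_stratigraphy(
#     query_file      = "path/to/Arabidopsis_thaliana.TAIR10.23.cds.all.fa",
#     subject_file    = "path/to/Arabidopsis_lyrata.v.1.0.23.cds.all.fa",
#     eval            = "1E-5",
#     ortho_detection = "RBH",
#     comp_cores      = n_cores,
#     quiet           = TRUE,
#     clean_folders   = TRUE )
```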
The next step is to combine the divergence map of A. thaliana (Athaliana_DM) with a gene expression set covering a developmental process of interest (in our case, A. thaliana embryogenesis). We obtain an example gene expression set covering A. thaliana embryogenesis from the ExpressionMatrix stored in PhyloExpressionSetExample. This results in a standard DivergenceExpressionSet object.
library(myTAI)
# load the PhyloExpressionSetExample data set
data(PhyloExpressionSetExample)
# get the ExpressionMatrix covering A. thaliana embryogenesis
# (column 2 holds the gene IDs, columns 3 to 9 the developmental stages)
ExprMatrix <- PhyloExpressionSetExample[ , 2:9]
# match the divergence map with the gene expression set of A. thaliana
# to obtain a DivergenceExpressionSet object
Ath_PhyloExpressionSet <- MatchMap(Map = Athaliana_DM, ExpressionMatrix = ExprMatrix)
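Conceptually, matching a map with an expression matrix is a join on the gene identifier. The sketch below illustrates this with base R `merge()` on toy data; it is not MatchMap's actual implementation, and the data frames `toy_map` and `toy_expr`, their values, and the column name `DivergenceStratum` are hypothetical.

```r
# Toy divergence map: stratum and gene ID (illustrative values only)
toy_map <- data.frame(DivergenceStratum = c(1, 5, 10),
                      GeneID = c("at1g01040.2", "at1g01050.1", "at1g99999.1"),
                      stringsAsFactors = FALSE)

# Toy expression matrix: gene ID followed by two developmental stages
toy_expr <- data.frame(GeneID = c("AT1G01040.2", "AT1G01050.1", "AT1G01070.1"),
                       Zygote = c(2173.6, 1501.0, 1212.8),
                       Mature = c(1696.4, 1071.7, 894.8),
                       stringsAsFactors = FALSE)

# Normalise the gene IDs so that the join does not depend on letter case
toy_map$GeneID  <- tolower(toy_map$GeneID)
toy_expr$GeneID <- tolower(toy_expr$GeneID)

# Keep only genes present in both tables, then put the stratum column first
toy_set <- merge(toy_map, toy_expr, by = "GeneID")
toy_set <- toy_set[, c("DivergenceStratum", "GeneID", "Zygote", "Mature")]
print(toy_set)
```

Genes that appear in only one of the two tables are dropped, so the resulting set can contain fewer genes than either input.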
This way you can create any PhyloExpressionSet or DivergenceExpressionSet of interest. In this example, the structure of Ath_PhyloExpressionSet should be analogous to that of PhyloExpressionSetExample.
# load the PhyloExpressionSetExample data set
data(PhyloExpressionSetExample)
# look at PhyloExpressionSetExample
head(PhyloExpressionSetExample)
  Phylostratum      GeneID     Zygote   Quadrant   Globular      Heart    Torpedo       Bent    Mature
1            1 at1g01040.2  2173.6352  1911.2001  1152.5553  1291.4224  1000.2529   962.9772 1696.4274
2            1 at1g01050.1  1501.0141  1817.3086  1665.3089  1564.7612  1496.3207  1114.6435 1071.6555
3            1 at1g01070.1  1212.7927  1233.0023   939.2000   929.6195   864.2180   877.2060  894.8189
4            1 at1g01080.2  1016.9203   936.3837  1181.3381  1329.4734  1392.6429  1287.9746  861.2605
5            1 at1g01090.1 11424.5667 16778.1685 34366.6493 39775.6405 56231.5689 66980.3673 7772.5617
6            1 at1g01120.1   844.0414   787.5929   859.6267   931.6180   942.8453   870.2625  792.7542
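The output above shows the expected layout: a numeric stratum in the first column, a gene ID in the second, and one numeric column per developmental stage. As a minimal sketch, a self-built set could be checked against this layout with base R; the helper `looks_like_expression_set()` and the toy data are hypothetical and only test the column layout, nothing more.

```r
# Hypothetical helper: check the column layout of a Phylo- or DivergenceExpressionSet
looks_like_expression_set <- function(x) {
  is.data.frame(x) &&
    ncol(x) >= 3 &&                                   # stratum, gene ID, >= 1 stage
    is.numeric(x[[1]]) &&                             # column 1: stratum
    (is.character(x[[2]]) || is.factor(x[[2]])) &&    # column 2: gene ID
    all(vapply(x[-(1:2)], is.numeric, logical(1)))    # remaining columns: expression
}

# Toy set with the expected layout (illustrative values only)
toy_set <- data.frame(Phylostratum = c(1, 1),
                      GeneID = c("at1g01040.2", "at1g01050.1"),
                      Zygote = c(2173.6, 1501.0),
                      Mature = c(1696.4, 1071.7),
                      stringsAsFactors = FALSE)

print(looks_like_expression_set(toy_set))
# Swapping the first two columns breaks the layout
print(looks_like_expression_set(toy_set[, c(2, 1, 3, 4)]))
```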