The myTAI package provides analytics tools for datasets that fulfil the PhyloExpressionSet or DivergenceExpressionSet standard. A PhyloExpressionSet combines a Phylostratigraphic Map with an ExpressionSet, whereas a DivergenceExpressionSet combines a Divergence Map with an ExpressionSet.
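To make this layout concrete, the sketch below builds a two-gene toy PhyloExpressionSet by hand. It copies the first two rows of the PhyloExpressionSetExample output shown later in this vignette. The layout rules are inferred from that output: column 1 holds the phylostratum, column 2 the gene ID, and the remaining columns the expression level per developmental stage. The name toy_PES is hypothetical.

```r
# Toy PhyloExpressionSet: first two rows of the PhyloExpressionSetExample output.
# Column 1: phylostratum, column 2: gene ID, columns 3-9: expression per stage.
toy_PES <- data.frame(
  Phylostratum = c(1L, 1L),
  GeneID       = c("at1g01040.2", "at1g01050.1"),
  Zygote       = c(2173.6352, 1501.0141),
  Quadrant     = c(1911.2001, 1817.3086),
  Globular     = c(1152.5553, 1665.3089),
  Heart        = c(1291.4224, 1564.7612),
  Torpedo      = c(1000.2529, 1496.3207),
  Bent         = c(962.9772, 1114.6435),
  Mature       = c(1696.4274, 1071.6555),
  stringsAsFactors = FALSE
)

# A DivergenceExpressionSet is assumed to share this layout, with the
# divergence stratum instead of the phylostratum in column 1.
str(toy_PES)
```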

A Phylostratigraphic Map is computed with a method named Phylostratigraphy, and a Divergence Map is computed with a method named Divergence Stratigraphy. Both methods are computationally expensive and combine several methodologies and evolutionary concepts. The orthologr package, however, aims to automate Divergence Stratigraphy and can be used to obtain a Divergence Map for a query organism of interest.

Fast Installation guide for orthologr

# install orthologr from CRAN
install.packages("orthologr")

Retrieving a divergence map

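A divergence map quantifies, for each protein-coding gene of a given organism, the degree of selection pressure acting on it. The selection pressure is quantified by estimating dN/dS, the ratio of the nonsynonymous to the synonymous substitution rate between orthologous genes of the query and a subject organism.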
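The sketch below illustrates the idea on made-up substitution rates. It computes dN/dS per gene and then, as an assumption about how divergence strata are formed, bins the ratios into deciles so that each gene receives a stratum from 1 (lowest dN/dS, most conserved) to 10. The helper dnds_to_strata is hypothetical and is not part of orthologr. orthologr's own dN/dS estimation and binning may differ in detail.

```r
# Hypothetical helper (not part of orthologr): convert per-gene dN and dS
# estimates into dN/dS ratios and bin them into deciles ("divergence strata").
# Stratum 1 holds the lowest ratios, i.e. the most conserved genes.
dnds_to_strata <- function(gene_id, dN, dS, n_bins = 10) {
  dNdS <- dN / dS
  keep <- is.finite(dNdS)   # drop genes with dS = 0 or missing estimates
  dNdS <- dNdS[keep]
  breaks <- stats::quantile(dNdS, probs = seq(0, 1, length.out = n_bins + 1),
                            names = FALSE)
  strata <- cut(dNdS, breaks = unique(breaks),
                include.lowest = TRUE, labels = FALSE)
  # stratum first, gene ID second, mirroring the map layout used later on
  data.frame(divergence_strata = strata,
             query_id = gene_id[keep],
             dNdS = dNdS,
             stringsAsFactors = FALSE)
}

# toy input with made-up rates for 20 genes
set.seed(42)
toy_dm <- dnds_to_strata(gene_id = paste0("gene", 1:20),
                         dN = runif(20, 0, 0.5),
                         dS = runif(20, 0.1, 1))
head(toy_dm)
```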

To perform divergence stratigraphy using orthologr, you need the following prerequisites:

In the following example, we use Arabidopsis thaliana as the query organism and Arabidopsis lyrata as the subject organism.

First, we need to download the CDS (coding sequence) files covering all protein-coding genes of A. thaliana and A. lyrata.

The CDS files can be retrieved from the Ensembl Plants FTP server using a terminal, or downloaded manually:


# download CDS file of A. thaliana
curl ftp://ftp.ensemblgenomes.org/pub/plants/release-23/fasta/arabidopsis_thaliana/cds/Arabidopsis_thaliana.TAIR10.23.cds.all.fa.gz \
  -o Arabidopsis_thaliana.TAIR10.23.cds.all.fa.gz

# download CDS file of A. lyrata
curl ftp://ftp.ensemblgenomes.org/pub/plants/release-23/fasta/arabidopsis_lyrata/cds/Arabidopsis_lyrata.v.1.0.23.cds.all.fa.gz \
  -o Arabidopsis_lyrata.v.1.0.23.cds.all.fa.gz
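If you prefer to stay inside R, the same two files can be fetched with base R's utils::download.file() and decompressed through a gzfile() connection. This is a minimal sketch. The names cds_urls, download_cds and gunzip_fasta are hypothetical, and the download calls are commented out because they need network access.

```r
# Ensembl Plants release-23 CDS files used in this example
cds_urls <- c(
  thaliana = "ftp://ftp.ensemblgenomes.org/pub/plants/release-23/fasta/arabidopsis_thaliana/cds/Arabidopsis_thaliana.TAIR10.23.cds.all.fa.gz",
  lyrata   = "ftp://ftp.ensemblgenomes.org/pub/plants/release-23/fasta/arabidopsis_lyrata/cds/Arabidopsis_lyrata.v.1.0.23.cds.all.fa.gz"
)

# Hypothetical helper: download each file into the working directory
download_cds <- function(urls = cds_urls) {
  dest <- basename(urls)
  for (i in seq_along(urls)) {
    utils::download.file(urls[[i]], destfile = dest[[i]], mode = "wb")
  }
  invisible(dest)
}

# Hypothetical helper: write an uncompressed copy of a *.fa.gz file (drops ".gz")
gunzip_fasta <- function(gz_path) {
  out_path <- sub("\\.gz$", "", gz_path)
  con <- gzfile(gz_path, open = "rt")
  on.exit(close(con))
  writeLines(readLines(con), out_path)
  invisible(out_path)
}

# gz_files <- download_cds()                                  # needs network access
# fa_files <- vapply(gz_files, gunzip_fasta, character(1))    # *.cds.all.fa files
```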

Once the downloads have finished, decompress the files (for example with gunzip) and start R to perform the following analyses:

library(orthologr)

# compute the divergence map of A. thaliana
Athaliana_DM <- divergence_stratigraphy(
                         query_file = "path/to/Arabidopsis_thaliana.TAIR10.23.cds.all.fa",
                         subject_file = "path/to/Arabidopsis_lyrata.v.1.0.23.cds.all.fa",
                         eval = "1E-5", ortho_detection = "RBH",
                         comp_cores = 1, quiet = TRUE, clean_folders = TRUE )

Note that you can increase the comp_cores argument to distribute the computations across several cores when working on a multicore machine.
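Rather than hard-coding comp_cores, one option is to derive it from the number of cores R detects. The sketch below uses parallel::detectCores() from base R. choose_cores is a hypothetical helper, and leaving one core free is a convention rather than an orthologr requirement.

```r
# Hypothetical helper: choose a value for comp_cores, leaving some cores free
choose_cores <- function(leave_free = 1L) {
  n <- parallel::detectCores()
  if (is.na(n)) n <- 1L                       # detectCores() may return NA
  max(1L, as.integer(n) - as.integer(leave_free))
}

n_cores <- choose_cores()
# then, e.g.: divergence_stratigraphy(..., comp_cores = n_cores)
```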

The next step is to combine the divergence map of A. thaliana (Athaliana_DM) with a gene expression set covering a developmental process of interest (in our case, A. thaliana embryogenesis). We take an example expression matrix covering A. thaliana embryogenesis from the PhyloExpressionSetExample data set. Matching the two yields a standard DivergenceExpressionSet object.

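# load myTAI, which provides PhyloExpressionSetExample and MatchMap()
library(myTAI)

# load the PhyloExpressionSetExample data set
data(PhyloExpressionSetExample)

# get the expression matrix covering A. thaliana embryogenesis
# (column 2: gene IDs, columns 3-9: expression levels per stage)
ExprMatrix <- PhyloExpressionSetExample[ , 2:9]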

# match the divergence map with the gene expression set of A. thaliana
# to obtain a DivergenceExpressionSet object
Ath_DivergenceExpressionSet <- MatchMap(Map = Athaliana_DM, ExpressionMatrix = ExprMatrix)

This way you can create any DivergenceExpressionSet (or, starting from a Phylostratigraphic Map, any PhyloExpressionSet) of interest. In this example, Ath_DivergenceExpressionSet should have the same format as PhyloExpressionSetExample, with divergence strata instead of phylostrata in the first column.
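Conceptually, matching a map with an expression matrix joins the two tables on their gene IDs and places the stratum column first. The base-R sketch below illustrates that idea with merge() on toy data. It is not MatchMap()'s actual implementation, which may handle duplicated IDs and other details differently. The name match_map_sketch, the lower-casing of IDs, and the toy values (including the placeholder ID gene_not_in_matrix) are all assumptions made for this illustration.

```r
# Hypothetical illustration: join a map (stratum, gene ID) with an
# expression matrix (gene ID, stages...) via an inner join on gene IDs.
match_map_sketch <- function(map, expr) {
  # lower-case IDs on both sides so that e.g. AT1G01040.2 matches at1g01040.2
  map[[2]]  <- tolower(map[[2]])
  expr[[1]] <- tolower(expr[[1]])
  joined <- merge(map, expr, by.x = names(map)[2], by.y = names(expr)[1])
  # reorder: stratum first, gene ID second, then the expression columns
  joined[ , c(names(map)[1], names(map)[2], names(expr)[-1])]
}

toy_map  <- data.frame(divergence_strata = c(3L, 7L, 1L),
                       query_id = c("AT1G01040.2", "AT1G01050.1", "gene_not_in_matrix"),
                       stringsAsFactors = FALSE)
toy_expr <- data.frame(GeneID   = c("at1g01040.2", "at1g01050.1"),
                       Zygote   = c(2173.6352, 1501.0141),
                       Quadrant = c(1911.2001, 1817.3086),
                       stringsAsFactors = FALSE)

toy_DES <- match_map_sketch(toy_map, toy_expr)
toy_DES   # genes without expression data are dropped by the inner join
```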

# load the PhyloExpressionSetExample data set
data(PhyloExpressionSetExample)

# look at PhyloExpressionSetExample
head(PhyloExpressionSetExample)

  Phylostratum      GeneID     Zygote   Quadrant   Globular      Heart    Torpedo       Bent    Mature
1            1 at1g01040.2  2173.6352  1911.2001  1152.5553  1291.4224  1000.2529   962.9772 1696.4274
2            1 at1g01050.1  1501.0141  1817.3086  1665.3089  1564.7612  1496.3207  1114.6435 1071.6555
3            1 at1g01070.1  1212.7927  1233.0023   939.2000   929.6195   864.2180   877.2060  894.8189
4            1 at1g01080.2  1016.9203   936.3837  1181.3381  1329.4734  1392.6429  1287.9746  861.2605
5            1 at1g01090.1 11424.5667 16778.1685 34366.6493 39775.6405 56231.5689 66980.3673 7772.5617
6            1 at1g01120.1   844.0414   787.5929   859.6267   931.6180   942.8453   870.2625  792.7542
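Before passing a self-built set to myTAI functions, it can help to check that it follows this layout. The sketch below is a minimal base-R check. looks_like_expression_set is hypothetical, and myTAI ships its own input checks, which may be stricter. The rules it tests are inferred from the output above: a numeric stratum column, a gene-ID column without duplicates, and numeric expression columns.

```r
# Hypothetical format check for a PhyloExpressionSet / DivergenceExpressionSet
looks_like_expression_set <- function(x) {
  is.data.frame(x) &&
    ncol(x) >= 3 &&
    is.numeric(x[[1]]) &&                               # column 1: stratum
    (is.character(x[[2]]) || is.factor(x[[2]])) &&      # column 2: gene IDs
    !anyDuplicated(x[[2]]) &&                           # one row per gene
    all(vapply(x[-(1:2)], is.numeric, logical(1)))      # columns 3+: expression
}

toy <- data.frame(Phylostratum = c(1L, 1L),
                  GeneID = c("at1g01040.2", "at1g01050.1"),
                  Zygote = c(2173.6352, 1501.0141),
                  stringsAsFactors = FALSE)

looks_like_expression_set(toy)                 # stratum, gene ID, expression
looks_like_expression_set(toy[ , c(2, 1, 3)])  # first two columns swapped
# looks_like_expression_set(PhyloExpressionSetExample)  # requires myTAI
```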