`gsEasy` has a function `gset` for calculating p-values of enrichment for sets (of genes) in ranked/scored lists (of genes) by permutation (see ‘Gene Set Enrichment Analysis’, described by Subramanian et al., 2005). `gset` requires the arguments `N`, the total number of genes, and `S`, the ranks of the genes in the test set among the `N` genes. An optional vector `r` of length `N`, holding scores (e.g. gene expression correlation) in order of rank, can also be passed; if unspecified, it defaults to `1-(i-1)/N` for the `i`-th gene. Finally, a numeric value `p`, used to weight the enrichment scores given by `r`, can be passed (for more details, see Subramanian et al., 2005). The default value of `p` is 1.
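To make the roles of `r` and `p` concrete, here is a minimal base-R sketch of the running-sum enrichment score described by Subramanian et al. (2005). Walking down the ranked list, the sum rises at each gene in the set by that gene's `|r|^p` (normalised over the set) and falls by a constant at each gene outside it; the score is the largest deviation from zero. This illustrates the statistic under that reading of the paper and is not `gsEasy`'s internal code; the helper name `enrichment_score` is hypothetical.

```r
# Hypothetical helper: weighted running-sum enrichment score
# (after Subramanian et al., 2005); not part of gsEasy.
enrichment_score <- function(S, N, r = 1 - (seq_len(N) - 1) / N, p = 1) {
  in_set <- seq_len(N) %in% S          # TRUE at the ranks of the test set
  weights <- abs(r)^p                  # p = 0 gives every set gene equal weight
  # each hit adds its share of the total weight of the set
  hit_step <- ifelse(in_set, weights / sum(weights[in_set]), 0)
  # each miss subtracts an equal share, so all misses together subtract 1
  miss_step <- ifelse(in_set, 0, 1 / sum(!in_set))
  running_sum <- cumsum(hit_step - miss_step)
  # signed value at the point of maximum deviation from zero
  running_sum[which.max(abs(running_sum))]
}

enrichment_score(S = 1:5, N = 1000)       # set at the very top: close to 1
enrichment_score(S = 996:1000, N = 1000)  # set at the very bottom: close to -1
```

With `p = 0` the weights are all equal and the score reduces to an unweighted Kolmogorov-Smirnov-style statistic.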
```r
# highly enriched: the genes in the set all rank near the top of the 1000
gset(S=1:5 * 2, N=1000)
## [1] 1e-05
# random sets...
replicate(n=10, expr=gset(S=sample.int(n=1000, size=5), N=1000))
## [1] 0.06602839 0.62686567 0.86069652 0.92537313 0.18905473 0.28855721
## [7] 0.11940299 0.19402985 0.17412935 0.03287000
```
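Since `gset` estimates its p-values by permutation, it can help to see the general shape of such a test. The sketch below compares a set's score with the scores of random sets of the same size, using a deliberately simple stand-in statistic (the mean distance of the set's ranks from the bottom of the list) rather than the GSEA enrichment score. The names `permutation_p_value` and `top_rank_score`, the default of 10,000 permutations and the add-one correction are all assumptions of this sketch, not details of `gsEasy`.

```r
# Hypothetical sketch of a one-sided permutation test; not gsEasy's code.
permutation_p_value <- function(statistic, S, N, n_perm = 10000) {
  observed <- statistic(S, N)
  # score random sets of the same size drawn from the N ranks
  null_scores <- replicate(n_perm, statistic(sample.int(N, length(S)), N))
  # add-one correction: the estimate can never be exactly 0
  (1 + sum(null_scores >= observed)) / (1 + n_perm)
}

# Stand-in statistic: larger when the set's ranks lie near the top (rank 1).
top_rank_score <- function(S, N) mean(N - S)

set.seed(42)
p_enriched <- permutation_p_value(top_rank_score, S = 1:5 * 2, N = 1000)
p_random <- permutation_p_value(top_rank_score, S = sample.int(1000, 5), N = 1000)
c(enriched = p_enriched, random = p_random)
```

In this sketch the smallest attainable p-value is `1/(n_perm + 1)`, so the number of permutations bounds how small a reported p-value can be.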
Alternatively, you can pass gene names as `S`, together with either a sorted vector of gene names as `r` (in which case the scores default to the ranks in the list) or a numeric vector of scores named by gene as `r`.
```r
gset(S=c("gene 1", "gene 5", "gene 40"), r=paste("gene", 1:100))
## [1] 0.08935361
```
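When genes are given by name, their ranks are simply their positions in the ranked list. A minimal base-R illustration of that mapping follows; the score vector is made up, and ordering named scores from highest to lowest is an assumption of this sketch (consistent with the default `r`, which decreases with rank).

```r
# Ranked list of gene names (position 1 = top of the list)
ranked_genes <- paste("gene", 1:100)
query <- c("gene 1", "gene 5", "gene 40")

# Ranks of the query genes within the ranked list
ranks_from_names <- match(query, ranked_genes)
ranks_from_names

# Hypothetical named scores: sort from highest to lowest first,
# then look the query genes up among the sorted names.
scores <- setNames(c(0.2, 2.5, -1.0, 0.9), c("A", "B", "C", "D"))
ranks_from_scores <- match(c("B", "D"), names(sort(scores, decreasing = TRUE)))
ranks_from_scores
```

`match` returns `NA` for genes absent from the list, so such genes would have to be dropped before their ranks are used.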
Multiple gene sets can thus be tested for enrichment with a single call to a high-level function such as `sapply` (or, if you have many sets to test and multiple cores available, `mclapply` from the `parallel` package), for instance:
```r
gene_sets <- c(list(1:5), replicate(n=10, simplify=FALSE, expr=sample.int(n=1000, size=5)))
names(gene_sets) <- c("enriched set", paste("unenriched set", 1:10))
gene_sets
## $`enriched set`
## [1] 1 2 3 4 5
##
## $`unenriched set 1`
## [1] 533 428 519 988 457
##
## $`unenriched set 2`
## [1] 34 494 137 467 330
##
## $`unenriched set 3`
## [1] 454 21 623 262 794
##
## $`unenriched set 4`
## [1] 353 205 361 985 492
##
## $`unenriched set 5`
## [1] 420 454 49 738 911
##
## $`unenriched set 6`
## [1] 649 926 803 396 446
##
## $`unenriched set 7`
## [1] 965 916 582 247 605
##
## $`unenriched set 8`
## [1] 349 416 960 204 694
##
## $`unenriched set 9`
## [1] 499 263 17 575 325
##
## $`unenriched set 10`
## [1] 186 369 390 641 352
sapply(gene_sets, function(set) gset(S=set, N=1000))
## enriched set unenriched set 1 unenriched set 2 unenriched set 3
## 0.0000100 0.4477612 0.3233831 0.7263682
## unenriched set 4 unenriched set 5 unenriched set 6 unenriched set 7
## 0.2437811 0.6069652 0.9353234 0.8308458
## unenriched set 8 unenriched set 9 unenriched set 10
## 0.5124378 0.4179104 0.3432836
```
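For the multi-core route, the same loop can be written with `mclapply` from the `parallel` package. This sketch assumes `gsEasy` is installed and recreates a `gene_sets` list like the one above so that it runs on its own. `mclapply` works by forking, which is not available on Windows, so the sketch falls back to a single core there; the choice of two cores is arbitrary.

```r
library(gsEasy)
library(parallel)

# Recreate a named list of gene sets (ranks out of N = 1000)
gene_sets <- c(list(1:5), replicate(n=10, simplify=FALSE, expr=sample.int(n=1000, size=5)))
names(gene_sets) <- c("enriched set", paste("unenriched set", 1:10))

# mclapply forks worker processes; forking is not supported on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# one gset() call per set, spread over the workers; unlist() keeps the set names
p_values <- unlist(mclapply(gene_sets, function(set) gset(S=set, N=1000), mc.cores=n_cores))
p_values
```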
`gsEasy` also has a function `get_ontological_gene_sets` for creating lists of gene sets defined by annotation with ontological terms, such that ontological is-a relations are propagated: a gene annotated with a term is also placed in the sets of all of that term's ancestors. `get_ontological_gene_sets` accepts an `ontology_index` object (from the `ontologyIndex` package) and two character vectors, corresponding to genes and terms respectively, such that the n-th elements of the two vectors together form one gene-term annotation pair. The result, a list of character vectors of gene names, can then be used as an argument of `gset`.
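The propagation step can be pictured with a toy ontology in base R: each annotation is expanded to the annotated term plus all of its ancestors, and genes are then grouped by term. The term IDs, the `parents` list and both helper functions are hypothetical; for real ontologies, the `ontologyIndex` package provides `get_ancestors` for the ancestor lookup.

```r
# Toy ontology (hypothetical IDs): each term mapped to its direct parents
parents <- list(
  "T:root" = character(0),
  "T:organ" = "T:root",
  "T:ear" = "T:organ"
)

# Return a term together with all of its ancestors (walks up the is-a links)
ancestors_of <- function(term, parents) {
  found <- term
  frontier <- term
  while (length(frontier) > 0) {
    next_terms <- setdiff(unlist(parents[frontier], use.names = FALSE), found)
    found <- c(found, next_terms)
    frontier <- next_terms
  }
  found
}

# Expand each gene-term pair to the term's ancestors, then group genes by term
propagate_gene_sets <- function(gene, term, parents) {
  expanded <- lapply(seq_along(gene), function(i) {
    terms_i <- ancestors_of(term[i], parents)
    data.frame(gene = rep(gene[i], length(terms_i)), term = terms_i,
               stringsAsFactors = FALSE)
  })
  pairs <- do.call(rbind, expanded)
  lapply(split(pairs$gene, pairs$term), function(genes) sort(unique(genes)))
}

gene_sets <- propagate_gene_sets(gene = c("gene 1", "gene 2"),
                                 term = c("T:ear", "T:organ"), parents = parents)
gene_sets
```

As in the HPO example, the gene annotated with the most specific term also appears in the sets of every more general term above it.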
```r
library(ontologyIndex)
data(hpo)
df <- data.frame(
gene=c("gene 1", "gene 2"),
term=c("HP:0000598", "HP:0000118"),
name=hpo$name[c("HP:0000598", "HP:0000118")],
stringsAsFactors=FALSE,
row.names=NULL)
df
## gene term name
## 1 gene 1 HP:0000598 Abnormality of the ear
## 2 gene 2 HP:0000118 Phenotypic abnormality
get_ontological_gene_sets(hpo, gene=df$gene, term=df$term)
## $`HP:0000001`
## [1] "gene 1" "gene 2"
##
## $`HP:0000118`
## [1] "gene 1" "gene 2"
##
## $`HP:0000598`
## [1] "gene 1"
```
`get_GO_gene_sets` is a specialisation of `get_ontological_gene_sets` for the Gene Ontology (GO), which can be called by passing just the file path of an annotation file (the official human annotation file is available at http://geneontology.org/gene-associations/gene_association.goa_human.gz).
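Conceptually, an annotation file supplies gene-term pairs like the ones passed to `get_ontological_gene_sets` above. As a rough base-R sketch (not the actual implementation of `get_GO_gene_sets`): GAF annotation files are tab-separated, header lines start with `!`, column 3 holds the gene symbol and column 5 the GO ID. Dropping annotations whose qualifier (column 4) contains `NOT` is a choice made here for illustration. The helper `read_gaf_pairs` and the small example file, truncated to its first nine columns, are hypothetical.

```r
# Hypothetical helper: extract gene-term pairs from a GAF file (plain or .gz)
read_gaf_pairs <- function(path) {
  lines <- readLines(path)                  # file() detects gzip compression when reading
  lines <- lines[!startsWith(lines, "!")]   # drop header/comment lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  column <- function(k) vapply(fields, function(f) f[k], character(1))
  pairs <- data.frame(gene = column(3), term = column(5),
                      qualifier = column(4), stringsAsFactors = FALSE)
  # illustration choice: skip negated annotations such as "NOT|enables"
  pairs <- pairs[!grepl("NOT", pairs$qualifier, fixed = TRUE), c("gene", "term")]
  rownames(pairs) <- NULL
  pairs
}

# Tiny made-up GAF file for demonstration
gaf_row <- function(...) paste(c(...), collapse = "\t")
example_gaf <- tempfile(fileext = ".gaf")
writeLines(c(
  "!gaf-version: 2.2",
  gaf_row("UniProtKB", "P00001", "GENE1", "enables", "GO:0005515", "PMID:1", "IPI", "", "F"),
  gaf_row("UniProtKB", "P00002", "GENE2", "NOT|enables", "GO:0005515", "PMID:1", "IPI", "", "F"),
  gaf_row("UniProtKB", "P00003", "GENE3", "involved_in", "GO:0008150", "PMID:2", "IDA", "", "P")
), example_gaf)

pairs <- read_gaf_pairs(example_gaf)
pairs
```

The `gene` and `term` columns have the shape that `get_ontological_gene_sets` takes as its `gene` and `term` arguments, given a GO `ontology_index`.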