xLDenricher | R Documentation |
xLDenricher
conducts LD-based enrichment analysis
for the input genomic region data (genome build hg19), using genomic
annotations (e.g. active chromatin, transcription factor binding
sites/motifs, conserved sites). Enrichment is assessed by
comparing the observed overlaps against the expected overlaps, which are
estimated from a null distribution. Each null LD block is generated
by sampling from the background (for example, all GWAS SNPs or all
common SNPs), respecting the MAF of the best SNP and/or the distance of
the best SNP to the nearest gene, and optionally restricting the
sampling to the same chromosome.
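The comparison of observed against expected overlaps can be sketched in base R as follows. This is a minimal illustration and not the package's internal code: the null counts are simulated with `rpois`, and the names `n_observed` and `null_counts`, as well as the exact formulas used for the fold change, z-score and empirical p-value, are assumptions chosen for illustration.

```r
# Hypothetical inputs: an observed overlap count and simulated null overlap counts
set.seed(825)
n_observed  <- 30
null_counts <- rpois(2000, lambda = 15)  # stand-in for overlaps from 2000 null samples

n_expect <- mean(null_counts)            # expected number of overlaps under the null
std      <- sd(null_counts)              # standard deviation of the null overlaps
fc       <- n_observed / n_expect        # fold change (observed / expected)
zscore   <- (n_observed - n_expect) / std

# One-sided empirical p-value; the +1 correction avoiding p = 0 is an assumption
pvalue <- (sum(null_counts >= n_observed) + 1) / (length(null_counts) + 1)

print(data.frame(nOverlap = n_observed, nExpect = n_expect, std = std,
                 fc = fc, zscore = zscore, pvalue = pvalue))
```

A larger number of null samples gives a finer-grained empirical p-value, at the cost of a longer runtime.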
xLDenricher(bLD, GR.SNP = c("dbSNP_GWAS", "dbSNP_Common", "dbSNP_Single"), num.samples = 2000, respect = c("maf", "distance", "both"), restrict.chr = F, preserve = c("exact", "boundary"), seed = 825, p.adjust.method = c("BH", "BY", "bonferroni", "holm", "hochberg", "hommel"), GR.annotation = NA, verbose = T, RData.location = "http://galahad.well.ox.ac.uk/bigdata")
bLD |
a bLD object, containing a set of blocks based on which to generate a null distribution |
GR.SNP |
the genomic regions of SNPs. By default, it is 'dbSNP_GWAS', that is, SNPs from dbSNP (version 150) restricted to GWAS SNPs and their LD SNPs (hg19). It can be 'dbSNP_Common', that is, Common SNPs from dbSNP (version 150) plus GWAS SNPs and their LD SNPs (hg19). Alternatively, the user can specify the customised GR object directly |
num.samples |
the number of samples randomly generated |
respect |
how to respect the properties of to-be-sampled LD blocks. It can be one of 'maf' (respecting the maf of the best SNP), 'distance' (respecting the distance of the best SNP to the nearest gene), and 'both' (respecting the maf and distance) |
restrict.chr |
logical indicating whether to restrict sampling to the same chromosome. By default, it is set to FALSE |
preserve |
how to preserve the resulting null LD block. It can be one of 'boundary' (preserving the boundary of the LD block) and 'exact' (exactly preserving the relative SNP locations within the LD block). Notably, the choice makes little difference when enrichment analysis involves region-based genomic annotations, but it may make a difference when genomic annotations are largely SNP-based (such as eQTLs) |
seed |
an integer specifying the seed |
p.adjust.method |
the method used to adjust p-values. It can be one of "BH", "BY", "bonferroni", "holm", "hochberg" and "hommel". The first two methods "BH" (widely used) and "BY" control the false discovery rate (FDR: the expected proportion of false discoveries amongst the rejected hypotheses); the last four methods "bonferroni", "holm", "hochberg" and "hommel" are designed to give strong control of the family-wise error rate (FWER). Notes: FDR is a less stringent condition than FWER |
GR.annotation |
the genomic regions of annotation data. By default, it is 'NA' to disable this option. Pre-built genomic annotation data are detailed in xDefineGenomicAnno |
verbose |
logical to indicate whether the messages will be displayed on the screen. By default, it is set to TRUE to display messages |
RData.location |
the characters to tell the location of built-in
RData files. See |
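The `respect` and `restrict.chr` arguments describe a matched sampling from the background. A minimal sketch of that idea in base R is given here, assuming a simulated background and a simple MAF binning scheme. The helper `sample_matched`, its `bin_width` parameter and the simulated data are hypothetical and are not part of XGR.

```r
# Hypothetical background of SNPs with chromosome and MAF (simulated data)
set.seed(825)
background <- data.frame(
  snp = paste0("rs", seq_len(5000)),
  chr = sample(paste0("chr", 1:22), 5000, replace = TRUE),
  maf = runif(5000, min = 0.01, max = 0.5),
  stringsAsFactors = FALSE
)

# Draw one null SNP whose MAF falls in the same bin as the best SNP,
# optionally restricted to the same chromosome as the best SNP
sample_matched <- function(best_maf, best_chr, background,
                           restrict.chr = FALSE, bin_width = 0.05) {
  bin  <- function(x) floor(x / bin_width)
  keep <- bin(background$maf) == bin(best_maf)
  if (restrict.chr) keep <- keep & background$chr == best_chr
  candidates <- background[keep, , drop = FALSE]
  if (nrow(candidates) == 0) return(NULL)  # no background SNP matches
  candidates[sample.int(nrow(candidates), 1), , drop = FALSE]
}

null_snp <- sample_matched(best_maf = 0.23, best_chr = "chr6",
                           background = background, restrict.chr = TRUE)
print(null_snp)
```

Respecting the distance to the nearest gene could follow the same pattern, with a second binned variable added to the `keep` filter.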
a data frame with 13 columns:
name: the annotation name
nAnno: the number of regions from annotation data
nOverlap: the observed number of LD blocks overlapping with annotation data
fc: fold change
zscore: z-score
pvalue: p-value
adjp: adjusted p-value, that is, the p-value after adjustment for multiple comparisons
or: a vector containing the odds ratio
CIl: a vector containing the lower bound of the confidence interval for the odds ratio
CIu: a vector containing the upper bound of the confidence interval for the odds ratio
nData: the number of input LD blocks
nExpect: the expected number of LD blocks overlapping with annotation data
std: the standard deviation of the expected number of LD blocks overlapping with annotation data
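The relationship between the pvalue and adjp columns can be illustrated with `stats::p.adjust`, the base R function that accepts the method names listed for `p.adjust.method`. The p-values here are made up for illustration; whether xLDenricher calls `p.adjust` internally is an assumption.

```r
# Hypothetical p-values for five annotations
pvalue <- c(0.001, 0.01, 0.02, 0.04, 0.5)

adjp_bh   <- stats::p.adjust(pvalue, method = "BH")          # controls the FDR
adjp_bonf <- stats::p.adjust(pvalue, method = "bonferroni")  # controls the FWER

print(data.frame(pvalue, adjp_bh, adjp_bonf))
```

The BH-adjusted values are never larger than the Bonferroni-adjusted ones, consistent with FDR being a less stringent condition than FWER.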
Pre-built genomic annotation data are detailed in xDefineGenomicAnno.
xDefineGenomicAnno
# Load the XGR package and specify the location of built-in data
library(XGR)
RData.location <- "http://galahad.well.ox.ac.uk/bigdata"

## Not run: 
# a) provide the seed SNPs with the significance info
## load ImmunoBase
data(ImmunoBase)
## get lead SNPs reported in AS GWAS and their significance info (p-values)
gr <- ImmunoBase$AS$variant
data <- GenomicRanges::mcols(gr)[,c('Variant','Pvalue')]

# b) get LD block (EUR population)
bLD <- xLDblock(data, include.LD="EUR", LD.r2=0.8, RData.location=RData.location)

## c) perform enrichment analysis using the 'ReMap_Encode_mergedTFBS' annotation
eTerm <- xLDenricher(bLD, GR.annotation="ReMap_Encode_mergedTFBS", RData.location=RData.location)

## d) view enrichment results for the top significant terms
xEnrichViewer(eTerm)

## e) barplot of enriched terms
bp <- xEnrichBarplot(eTerm, top_num='auto', displayBy="fdr")
bp

## f) forest plot of enrichment results
gp <- xEnrichForest(eTerm, FDR.cutoff=0.01)

## g) save enrichment results to the file called 'LD_enrichments.txt'
output <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=TRUE)
utils::write.table(output, file="LD_enrichments.txt", sep="\t", row.names=FALSE)

## h) compare boundary and exact
GR.SNP <- xRDataLoader("dbSNP_GWAS", RData.location=RData.location)
GR.annotation <- xRDataLoader("FANTOM5_CAT_Cell", RData.location=RData.location)
eTerm_boundary <- xLDenricher(bLD, GR.SNP=GR.SNP, GR.annotation=GR.annotation, num.samples=20000, preserve="boundary", RData.location=RData.location)
eTerm_exact <- xLDenricher(bLD, GR.SNP=GR.SNP, GR.annotation=GR.annotation, num.samples=20000, preserve="exact", RData.location=RData.location)
ls_eTerm <- list(boundary=eTerm_boundary, exact=eTerm_exact)
### barplot
bp <- xEnrichCompare(ls_eTerm, displayBy="zscore")
### forest plot
eTerm_boundary$group <- 'boundary'
eTerm_exact$group <- 'exact'
df <- rbind(eTerm_boundary, eTerm_exact)
gp <- xEnrichForest(df, FDR.cutoff=0.01)

## End(Not run)