xGScoreAdv | R Documentation |
xGScoreAdv
calculates per-base scores for an input
list of genomic regions (genome build hg19), using genomic annotations
(e.g. genomic segments, active chromatin, transcription factor binding
sites/motifs, conserved sites). The per-base scores are calculated for
the overlaps with each genomic annotation. Scores for genomic
regions/variants can measure constraint/conservation or
impact/pathogenicity.
xGScoreAdv(data,
    format = c("data.frame", "bed", "chr:start-end", "GRanges"),
    build.conversion = c(NA, "hg38.to.hg19", "hg18.to.hg19"),
    GS.annotation = c("fitCons", "phastCons", "phyloP", "mcap", "cadd"),
    GR.annotation = NA, details = F, verbose = T,
    RData.location = "http://galahad.well.ox.ac.uk/bigdata")
data |
input genomic regions (GR). If formatted as "chr:start-end" (see the next parameter 'format' below), GR should be provided as a vector of strings of the form 'chrN:start-end', where N is either 1-22 or X, and start (or end) is a genomic position; for example, 'chr1:13-20'. If formatted as 'data.frame', the first three columns give the chromosome (1st column), the start position (2nd column), and the end position (3rd column). The 'bed' (browser extensible data) format is the same as 'data.frame', except that positions are 0-based offsets from the chromosome position. If the genomic regions are single positions rather than ranges, the end position (3rd column) may be omitted. The data can also be a 'GRanges' object (in which case the format is 'GRanges') |
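The 'chr:start-end' and 'bed' inputs described above can be illustrated with a minimal base-R sketch. The helpers `parse_regions` and `bed_to_one_based` are hypothetical names used only for illustration and are not part of XGR. The bed conversion assumes the standard BED convention (0-based start, end-exclusive), which may differ in detail from what xGScoreAdv does internally.

```r
# Hypothetical helper (not part of XGR): split 'chrN:start-end' strings
# into a three-column data frame (chromosome, start, end).
parse_regions <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  chr <- vapply(parts, `[`, character(1), 1)
  pos <- strsplit(vapply(parts, `[`, character(1), 2), "-", fixed = TRUE)
  start <- as.integer(vapply(pos, `[`, character(1), 1))
  # A single position has no end; reuse the start in that case
  end <- as.integer(vapply(pos, function(p) {
    if (length(p) > 1) p[2] else p[1]
  }, character(1)))
  data.frame(chr = chr, start = start, end = end, stringsAsFactors = FALSE)
}

# Hypothetical helper (not part of XGR): shift a 0-based BED start to a
# 1-based start, assuming the standard BED convention.
bed_to_one_based <- function(bed) {
  data.frame(chr = bed[[1]], start = bed[[2]] + 1L, end = bed[[3]],
             stringsAsFactors = FALSE)
}

regions <- parse_regions(c("chr1:13-20", "chrX:100"))
bed <- data.frame(chr = "chr1", start = 12L, end = 20L,
                  stringsAsFactors = FALSE)
converted <- bed_to_one_based(bed)
```

Here `regions` and `converted` both describe 'chr1:13-20' in 1-based coordinates, and the single position 'chrX:100' is stored with identical start and end.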
format |
the format of the input data. It can be one of "data.frame", "chr:start-end", "bed" or "GRanges" |
build.conversion |
the conversion from one genome build to another. The conversions supported are "hg38.to.hg19" and "hg18.to.hg19". By default it is NA (no conversion is performed) |
GS.annotation |
which genomic scores (GS) annotation to use. It can be 'fitCons' (the probability of fitness consequences for point mutations; http://www.ncbi.nlm.nih.gov/pubmed/25599402), 'phastCons' (the probability, in [0,1], that each nucleotide belongs to a conserved element, i.e. is under negative selection), 'phyloP' (conservation at individual sites, expressed as -log p-values under a null hypothesis of neutral evolution; positive scores indicate conservation and negative scores indicate acceleration), 'mcap' (eliminating a majority of variants of uncertain significance in clinical exomes at high sensitivity; http://www.ncbi.nlm.nih.gov/pubmed/27776117), or 'cadd' (combined annotation dependent depletion, for estimating relative levels of pathogenicity of potential human variants; http://www.ncbi.nlm.nih.gov/pubmed/24487276) |
GR.annotation |
the genomic regions of annotation data. By
default, it is 'NA' to disable this option. Pre-built genomic
annotation data are detailed in xDefineGenomicAnno |
details |
logical to indicate whether detailed information (i.e. a_nBase, a_GS and ratio) is returned. By default, it is set to false (not included) |
verbose |
logical to indicate whether messages will be displayed on the screen. By default, it is set to true (messages are displayed) |
RData.location |
a character string specifying the location of built-in
RData files. See |
a data frame with up to 6 columns (the last three are included only when "details" is true):
name: the annotation name
o_nBase: the number of bases overlapped between input regions and annotation regions
o_GS: the per-base genomic scores for overlaps between input regions and annotation regions
a_nBase: the number of bases covered by that annotation; optional, only appended when "details" is true
a_GS: the per-base genomic scores for that annotation; optional, only appended when "details" is true
ratio: the ratio of o_GS to a_GS; optional, only appended when "details" is true
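How the returned columns relate to one another can be shown with a toy base-R sketch. The scores and base positions below are made up, and the sketch assumes that a "per base" score is the mean score over the covered bases; the source does not spell out the exact aggregation, so treat this as an illustration rather than the function's implementation.

```r
# Made-up per-base scores along a 10-base stretch (illustration only)
scores <- c(0.1, 0.2, 0.4, 0.8, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1)
annotation <- 3:8   # bases covered by one annotation
input <- 5:10       # bases covered by the input regions

# Bases shared by the input regions and the annotation
overlap <- intersect(input, annotation)

# Assumption: 'per base' score = mean score over the covered bases
res <- data.frame(
  name    = "toy_annotation",
  o_nBase = length(overlap),
  o_GS    = mean(scores[overlap]),
  a_nBase = length(annotation),
  a_GS    = mean(scores[annotation]),
  stringsAsFactors = FALSE
)
# ratio compares the score in the overlaps against the whole annotation
res$ratio <- res$o_GS / res$a_GS
```

Under this assumption, a ratio below 1 means the overlapped bases score lower on average than the annotation as a whole, and a ratio above 1 means they score higher.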
Pre-built genomic annotation data are detailed in xDefineGenomicAnno.
xGScore
## Not run: 
# Load the XGR package and specify the location of built-in data
library(XGR)
RData.location <- "http://galahad.well.ox.ac.uk/bigdata"

# a) provide the genomic regions
## load ImmunoBase
ImmunoBase <- xRDataLoader(RData.customised='ImmunoBase', RData.location=RData.location)
## get lead SNPs reported in AS GWAS
data <- ImmunoBase$AS$variant

# b) in terms of overlaps with genomic segments (Primary monocytes from peripheral blood)
## fitness consequence score
res_df <- xGScoreAdv(data=data, format="GRanges", GS.annotation="fitCons", GR.annotation="EpigenomeAtlas_15Segments_E029", RData.location=RData.location)
## phastCons conservation score
res_df <- xGScoreAdv(data=data, format="GRanges", GS.annotation="phastCons", GR.annotation="EpigenomeAtlas_15Segments_E029", RData.location=RData.location)

# c) in terms of overlaps with genic annotations
## phyloP conservation score
res_df <- xGScoreAdv(data=data, format="GRanges", GS.annotation="phyloP", GR.annotation="Genic_anno", RData.location=RData.location)

## End(Not run)