This vignette summarizes the API functions that UCSCXenaTools provides for UCSC Xena.
Before using the API, users should know some concepts about Xena elements. The following description is copied from the xenaPython __init__.py file.
Data rows are associated with “sample” IDs.
Sample IDs are unique within a “cohort”. A “dataset” is a particular assay of a cohort, e.g. gene expression.
Datasets have associated metadata, specifying their data type and cohort.
There are three primary data types: dense matrix (samples by probes), sparse (sample, position, variant), and segmented (sample, position, value).
Dense matrices can be genotypic or phenotypic. Phenotypic matrices have associated field metadata (descriptive names, codes, etc.). Genotypic matrices may have an associated probeMap, which maps probes to genomic locations. If a matrix has a hugo probeMap, the probes themselves are gene names. Otherwise, a probeMap is used to map a gene location to a set of probes.
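To make the three data types more concrete, here is a minimal base R sketch. Every sample ID, probe name, gene name, coordinate and value in it is made up for illustration, and the column layouts are assumptions rather than the exact shapes returned by a Xena hub.

```r
# Toy data only: all IDs and values below are made up for illustration.

# Dense matrix: samples by probes (one value per sample/probe pair).
dense <- matrix(
  c(1.2, 0.4,
    3.1, 2.2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("sample_A", "sample_B"), c("probe_1", "probe_2"))
)

# Sparse data: one row per (sample, position, variant) observation.
sparse <- data.frame(
  sample  = c("sample_A", "sample_B"),
  chrom   = c("chr1", "chr2"),
  start   = c(100L, 2500L),
  end     = c(100L, 2500L),
  variant = c("A>T", "G>C"),
  stringsAsFactors = FALSE
)

# Segmented data: one row per (sample, position, value) segment.
segmented <- data.frame(
  sample = c("sample_A", "sample_A"),
  chrom  = c("chr1", "chr1"),
  start  = c(1L, 5001L),
  end    = c(5000L, 9000L),
  value  = c(0.1, -0.8),
  stringsAsFactors = FALSE
)

# A probeMap (hypothetical layout) maps probes to genomic locations.
probe_map <- data.frame(
  probe = c("probe_1", "probe_2"),
  gene  = c("GENE_X", "GENE_X"),
  chrom = c("chr1", "chr1"),
  start = c(100L, 400L),
  end   = c(150L, 450L),
  stringsAsFactors = FALSE
)

# Find the probes mapped to a gene, then pull their values from the dense matrix.
probes_for_gene <- probe_map$probe[probe_map$gene == "GENE_X"]
dense[, probes_for_gene, drop = FALSE]
```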
API categories
API functions can be divided into two classes: lower API functions and higher API functions. They differ as follows:
- The main difference is their targets. Higher API functions operate on a XenaHub object, which is an S4 class in R, while lower API functions operate on Xena hub URLs, cohort names or dataset names given as character vectors. The XenaHub object provides more uniform methods and can be used to download the corresponding datasets quickly and easily (see another vignette for details).
- Lower API functions are not exported in the package NAMESPACE, so users cannot call them by name after library(UCSCXenaTools); use UCSCXenaTools:::fun_name instead.
- Lower API functions have no help pages, so ?fun_name will not work for them. However, the API report section of this vignette lists all available API functions with short descriptions.
- Higher API functions are built on lower API functions and return results that are more meaningful and easier to work with. Most lower API functions return nested lists, which users need to tidy before the next step.
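As a rough illustration of the tidying step mentioned in the last point, the sketch below flattens a nested list into a data frame with base R. The `raw_result` object is a hypothetical stand-in with made-up field names and values; the real structure returned by a lower API function may differ.

```r
# Hypothetical stand-in for the nested list a lower API function might return.
# Field names and values are made up for illustration.
raw_result <- list(
  list(name = "dataset_1", type = "genomicMatrix",  cohort = "cohort_A"),
  list(name = "dataset_2", type = "clinicalMatrix", cohort = "cohort_A"),
  list(name = "dataset_3", type = "mutationVector", cohort = "cohort_B")
)

# Turn each inner list into a one-row data frame, then stack the rows.
tidy_result <- do.call(
  rbind,
  lapply(raw_result, function(x) as.data.frame(x, stringsAsFactors = FALSE))
)

tidy_result

# Filtering is now a plain data frame operation.
cohort_a_names <- tidy_result[tidy_result$cohort == "cohort_A", "name"]
cohort_a_names
```

This simple approach assumes every inner list has the same fields and no NULL entries; a real result may need extra handling for missing values.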
Lower API functions
Lower API functions also fall into two classes:
- The first class is generated from .xq files, and the function names all start with .p_. All .xq files are copied from the xenaPython package, which is the official Python API for Xena. These functions are created dynamically when UCSCXenaTools is loaded. Their names are as follows:
#> [1] ".p_all_cohorts"
#> [2] ".p_all_datasets"
#> [3] ".p_all_datasets_n"
#> [4] ".p_all_field_metadata"
#> [5] ".p_cohort_samples"
#> [6] ".p_cohort_summary"
#> [7] ".p_dataset_fetch"
#> [8] ".p_dataset_field"
#> [9] ".p_dataset_field_examples"
#> [10] ".p_dataset_field_n"
#> [11] ".p_dataset_gene_probe_avg"
#> [12] ".p_dataset_gene_probes_values"
#> [13] ".p_dataset_list"
#> [14] ".p_dataset_metadata"
#> [15] ".p_dataset_probe_signature"
#> [16] ".p_dataset_probe_values"
#> [17] ".p_dataset_samples"
#> [18] ".p_dataset_samples_ndense_matrix"
#> [19] ".p_datasets_null_rows"
#> [20] ".p_feature_list"
#> [21] ".p_field_codes"
#> [22] ".p_field_metadata"
#> [23] ".p_gene_transcripts"
#> [24] ".p_match_fields"
#> [25] ".p_probe_count"
#> [26] ".p_probemap_list"
#> [27] ".p_ref_gene_exons"
#> [28] ".p_ref_gene_position"
#> [29] ".p_ref_gene_range"
#> [30] ".p_segment_data_examples"
#> [31] ".p_segmented_data_range"
#> [32] ".p_sparse_data"
#> [33] ".p_sparse_data_examples"
#> [34] ".p_sparse_data_match_field"
#> [35] ".p_sparse_data_match_field_slow"
#> [36] ".p_sparse_data_match_partial_field"
#> [37] ".p_sparse_data_range"
#> [38] ".p_transcript_expression"
- The second class is created in the package itself. The function names all start with . and are as follows:
#> [1] ".host_cohorts" ".cohort_datasets"
#> [3] ".cohort_datasets_count" ".cohort_samples_each"
#> [5] ".cohort_samples_any" ".cohort_samples_all"
#> [7] ".dataset_samples_each" ".dataset_samples_any"
#> [9] ".dataset_samples_all"
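The idea of creating one function per query at load time can be sketched with a function factory. This is only an illustration of the general technique, not the package's actual code: `queries`, `make_query_fun`, `send`, `api_env` and the host URL are all hypothetical, and the query texts are placeholders.

```r
# Hypothetical query texts keyed by name; real .xq files contain Xena queries.
queries <- list(
  all_cohorts     = "placeholder query text A",
  dataset_samples = "placeholder query text B"
)

# Factory: returns a function that remembers its own query text.
# `send` is a placeholder for whatever would post the query to a hub;
# here it just echoes its inputs so the sketch runs offline.
make_query_fun <- function(query_text,
                           send = function(host, query, ...) {
                             list(host = host, query = query, args = list(...))
                           }) {
  force(query_text)
  force(send)
  function(host, ...) send(host, query_text, ...)
}

# Create one function per query and store it under a ".p_" prefixed name.
api_env <- new.env()
for (nm in names(queries)) {
  assign(paste0(".p_", nm), make_query_fun(queries[[nm]]), envir = api_env)
}

# List the generated functions and call one of them.
generated <- ls(api_env, all.names = TRUE)
generated
res <- api_env$.p_all_cohorts("https://example.org", exclude = "probeMap")
res$query
```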
I did not write these queries for Xena hubs myself, so I want to thank the authors of the xenaPython and xenaR packages.
API report
Original Name | Function Name | Level | Description |
---|---|---|---|
cohorts | cohorts | Higher | Return cohorts as character vector |
datasets | datasets | Higher | Return datasets as character vector |
hosts | hosts | Higher | Return hosts as character vector |
samples | samples | Higher | Return samples according to “by” and “how” option |
.cohort_datasets | .cohort_datasets | Lower | Return datasets of cohorts |
.cohort_datasets_count | .cohort_datasets_count | Lower | Return dataset count of cohorts |
.cohort_samples_all | .cohort_samples_all | Lower | Return samples shared by all cohorts |
.cohort_samples_any | .cohort_samples_any | Lower | Return samples present in any cohort |
.cohort_samples_each | .cohort_samples_each | Lower | Return samples present in each cohort |
.dataset_samples_all | .dataset_samples_all | Lower | Return samples shared by all datasets |
.dataset_samples_any | .dataset_samples_any | Lower | Return samples present in any dataset |
.dataset_samples_each | .dataset_samples_each | Lower | Return samples present in each dataset |
.host_cohorts | .host_cohorts | Lower | Return cohorts of hosts |
all_cohorts | .p_all_cohorts | Lower | NA |
all_datasets | .p_all_datasets | Lower | NA |
all_datasets_n | .p_all_datasets_n | Lower | Count the number of datasets with non-null cohort |
all_field_metadata | .p_all_field_metadata | Lower | Metadata for all dataset fields (phenotypic datasets) |
cohort_samples | .p_cohort_samples | Lower | All samples in cohort |
cohort_summary | .p_cohort_summary | Lower | Count datasets per-cohort, excluding the given dataset types |
dataset_fetch | .p_dataset_fetch | Lower | Probe values for given samples |
dataset_field | .p_dataset_field | Lower | All field (probe) names in dataset |
dataset_field_examples | .p_dataset_field_examples | Lower | Field names in dataset, up to |
dataset_field_n | .p_dataset_field_n | Lower | Number of fields in dataset |
dataset_gene_probe_avg | .p_dataset_gene_probe_avg | Lower | Probe average, per-gene, for given samples |
dataset_gene_probes_values | .p_dataset_gene_probes_values | Lower | Probe values in gene, and probe genomic positions, for given samples |
dataset_list | .p_dataset_list | Lower | Dataset metadata for datasets in the given cohorts |
dataset_metadata | .p_dataset_metadata | Lower | Dataset metadata |
dataset_probe_signature | .p_dataset_probe_signature | Lower | Computed probe signature for given samples and weight array |
dataset_probe_values | .p_dataset_probe_values | Lower | Probe values for given samples, and probe genomic positions |
dataset_samples | .p_dataset_samples | Lower | All samples in dataset (optional limit) |
dataset_samples_ndense_matrix | .p_dataset_samples_ndense_matrix | Lower | All samples in dataset (faster, for dense matrix dataset only) |
datasets_null_rows | .p_datasets_null_rows | Lower | NA |
feature_list | .p_feature_list | Lower | Dataset field names and long titles (phenotypic datasets) |
field_codes | .p_field_codes | Lower | Codes for categorical fields |
field_metadata | .p_field_metadata | Lower | Metadata for given fields (phenotypic datasets) |
gene_transcripts | .p_gene_transcripts | Lower | Gene transcripts |
match_fields | .p_match_fields | Lower | Find fields matching names (must be lower-case) |
probe_count | .p_probe_count | Lower | NA |
probemap_list | .p_probemap_list | Lower | Find probemaps |
ref_gene_exons | .p_ref_gene_exons | Lower | Gene model |
ref_gene_position | .p_ref_gene_position | Lower | Gene position from gene model |
ref_gene_range | .p_ref_gene_range | Lower | Gene models overlapping range |
segment_data_examples | .p_segment_data_examples | Lower | Initial segmented data rows, with limit |
segmented_data_range | .p_segmented_data_range | Lower | Segmented (copy number) data overlapping range |
sparse_data | .p_sparse_data | Lower | Sparse (mutation) data rows for genes |
sparse_data_examples | .p_sparse_data_examples | Lower | Initial sparse data rows, with limit |
sparse_data_match_field | .p_sparse_data_match_field | Lower | Genes in sparse (mutation) dataset matching given names |
sparse_data_match_field_slow | .p_sparse_data_match_field_slow | Lower | Genes in sparse (mutation) dataset matching given names, case-insensitive (names must be lower-case) |
sparse_data_match_partial_field | .p_sparse_data_match_partial_field | Lower | Partial match genes in sparse (mutation) dataset |
sparse_data_range | .p_sparse_data_range | Lower | Sparse (mutation) data rows overlapping the given range, for the given samples |
transcript_expression | .p_transcript_expression | Lower | NA |
Note that I have not tested all functions generated from .xq files, although most of them work. Functions sometimes return an error or an empty list(), which may be caused by an invalid input format or a bad network connection; in that case, try again a few times. If you are sure the problem lies in the query itself, you can check the corresponding query variables:
#> [1] ".xq_all_cohorts"
#> [2] ".xq_all_datasets"
#> [3] ".xq_all_datasets_n"
#> [4] ".xq_all_field_metadata"
#> [5] ".xq_cohort_samples"
#> [6] ".xq_cohort_summary"
#> [7] ".xq_dataset_fetch"
#> [8] ".xq_dataset_field"
#> [9] ".xq_dataset_field_examples"
#> [10] ".xq_dataset_field_n"
#> [11] ".xq_dataset_gene_probe_avg"
#> [12] ".xq_dataset_gene_probes_values"
#> [13] ".xq_dataset_list"
#> [14] ".xq_dataset_metadata"
#> [15] ".xq_dataset_probe_signature"
#> [16] ".xq_dataset_probe_values"
#> [17] ".xq_dataset_samples"
#> [18] ".xq_dataset_samples_ndense_matrix"
#> [19] ".xq_datasets_null_rows"
#> [20] ".xq_feature_list"
#> [21] ".xq_field_codes"
#> [22] ".xq_field_metadata"
#> [23] ".xq_gene_transcripts"
#> [24] ".xq_match_fields"
#> [25] ".xq_probe_count"
#> [26] ".xq_probemap_list"
#> [27] ".xq_ref_gene_exons"
#> [28] ".xq_ref_gene_position"
#> [29] ".xq_ref_gene_range"
#> [30] ".xq_segment_data_examples"
#> [31] ".xq_segmented_data_range"
#> [32] ".xq_sparse_data"
#> [33] ".xq_sparse_data_examples"
#> [34] ".xq_sparse_data_match_field"
#> [35] ".xq_sparse_data_match_field_slow"
#> [36] ".xq_sparse_data_match_partial_field"
#> [37] ".xq_sparse_data_range"
#> [38] ".xq_transcript_expression"
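The advice to try again when a function returns an error or an empty list() can be sketched as a small retry wrapper. `with_retry` and `flaky_query` are hypothetical helpers written for this illustration and are not part of UCSCXenaTools; the stand-in query runs offline.

```r
# Retry a call a few times, treating an error or an empty list() as a failure.
# `with_retry` is a hypothetical helper, not part of UCSCXenaTools.
with_retry <- function(fun, ..., times = 3, wait = 1) {
  for (attempt in seq_len(times)) {
    result <- tryCatch(fun(...), error = function(e) e)
    failed <- inherits(result, "error") ||
      (is.list(result) && length(result) == 0)
    if (!failed) return(result)
    # Pause before the next attempt, but not after the last one.
    if (attempt < times) Sys.sleep(wait)
  }
  stop("Query still failing after ", times, " attempts")
}

# Stand-in for a flaky query: returns list() twice, then succeeds.
calls <- 0
flaky_query <- function() {
  calls <<- calls + 1
  if (calls < 3) list() else list("cohort_A", "cohort_B")
}

out <- with_retry(flaky_query, times = 5, wait = 0)
out
```

In real use, `fun` would be a lower API function accessed with `:::` and `wait` would be a second or more, so repeated requests do not hammer the hub.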
For example, to check the query behind the .p_all_cohorts function, take a look at the .xq_all_cohorts object.
.xq_all_cohorts
#> [1] ";allCohorts\n(fn [exclude]\n\t(map :cohort\n\t (query\n\t\t{:select [[#sql/call [:distinct #sql/call [:ifnull :cohort \"(unassigned)\"]] :cohort]]\n\t\t :from [:dataset]\n\t\t :where [:not [:in :type exclude]]})))\n"
Printing it with cat() gives a more readable format.
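The difference between default printing and cat() can be seen with the query string shown above, copied here into a local variable (`query_text` is just an illustrative name) so that the sketch runs without the package.

```r
# The query text shown above, stored in a local variable for illustration.
query_text <- ";allCohorts\n(fn [exclude]\n\t(map :cohort\n\t (query\n\t\t{:select [[#sql/call [:distinct #sql/call [:ifnull :cohort \"(unassigned)\"]] :cohort]]\n\t\t :from [:dataset]\n\t\t :where [:not [:in :type exclude]]})))\n"

# print() shows \n and \t as escape sequences on a single line.
print(query_text)

# cat() renders them as real line breaks and tabs.
cat(query_text)

# Splitting on newlines gives the individual lines of the query.
query_lines <- strsplit(query_text, "\n", fixed = TRUE)[[1]]
length(query_lines)
```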
Use case
More use cases are still to be added. If you have any suggestions, please open an issue on GitHub.
LICENSE
GPL-3
Please note that code from the XenaR package is under the Apache 2.0 license.