This vignette summarizes the API functions that UCSCXenaTools provides for UCSC Xena.

Before using the API, users should know some concepts about Xena elements. The following description is copied from xenaPython __init__.py.

Data rows are associated with “sample” IDs.

Sample IDs are unique within a “cohort”. A “dataset” is a particular assay of a cohort, e.g. gene expression.

Datasets have associated metadata, specifying their data type and cohort.

There are three primary data types: dense matrix (samples by probes), sparse (sample, position, variant), and segmented (sample, position, value).

Dense matrices can be genotypic or phenotypic. Phenotypic matrices have associated field metadata (descriptive names, codes, etc.). Genotypic matrices may have an associated probeMap, which maps probes to genomic locations. If a matrix has a hugo probeMap, the probes themselves are gene names. Otherwise, a probeMap is used to map a gene location to a set of probes.
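The three data types and the role of a probeMap can be pictured with small R objects. This is only a toy sketch: every sample ID, probe name, coordinate, value and column name below is invented for illustration and does not come from any Xena hub.

```r
# Toy illustration only: all IDs, coordinates and values are invented.

# Dense matrix: samples by probes (with a hugo probeMap the probes are gene names).
dense <- matrix(
  c(1.2, 0.4,
    2.5, 1.1,
    0.3, 3.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3"), c("TP53", "KRAS"))
)

# Sparse data: one row per (sample, position, variant).
sparse <- data.frame(
  sample  = c("S1", "S3"),
  chr     = c("chr1", "chr2"),
  start   = c(100, 200),
  end     = c(100, 200),
  variant = c("C>T", "G>A"),
  stringsAsFactors = FALSE
)

# Segmented data: one row per (sample, position, value).
segmented <- data.frame(
  sample = c("S1", "S2"),
  chr    = c("chr1", "chr1"),
  start  = c(1, 5001),
  end    = c(5000, 12000),
  value  = c(0.12, -0.85),
  stringsAsFactors = FALSE
)

# A probeMap relates probes to genes (and genomic locations);
# here it is used to find the set of probes for one gene.
probe_map <- data.frame(
  probe = c("probe_a", "probe_b", "probe_c"),
  gene  = c("TP53", "TP53", "KRAS"),
  stringsAsFactors = FALSE
)
probes_for_tp53 <- probe_map$probe[probe_map$gene == "TP53"]
```

In this sketch the rows of `dense` are samples and the columns are probes, while `sparse` and `segmented` carry one record per sample and position.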

API categories

API functions can be divided into two classes: lower API functions and higher API functions. They differ as follows:

Lower API functions

Lower API functions also fall into two classes:

  • One class is generated from .xq files; these function names all start with .p_. All .xq files are copied from the xenaPython package, which is the official Python API for Xena. These functions are created dynamically when UCSCXenaTools is loaded. Their names are as follows:

    #>  [1] ".p_all_cohorts"                    
    #>  [2] ".p_all_datasets"                   
    #>  [3] ".p_all_datasets_n"                 
    #>  [4] ".p_all_field_metadata"             
    #>  [5] ".p_cohort_samples"                 
    #>  [6] ".p_cohort_summary"                 
    #>  [7] ".p_dataset_fetch"                  
    #>  [8] ".p_dataset_field"                  
    #>  [9] ".p_dataset_field_examples"         
    #> [10] ".p_dataset_field_n"                
    #> [11] ".p_dataset_gene_probe_avg"         
    #> [12] ".p_dataset_gene_probes_values"     
    #> [13] ".p_dataset_list"                   
    #> [14] ".p_dataset_metadata"               
    #> [15] ".p_dataset_probe_signature"        
    #> [16] ".p_dataset_probe_values"           
    #> [17] ".p_dataset_samples"                
    #> [18] ".p_dataset_samples_ndense_matrix"  
    #> [19] ".p_datasets_null_rows"             
    #> [20] ".p_feature_list"                   
    #> [21] ".p_field_codes"                    
    #> [22] ".p_field_metadata"                 
    #> [23] ".p_gene_transcripts"               
    #> [24] ".p_match_fields"                   
    #> [25] ".p_probe_count"                    
    #> [26] ".p_probemap_list"                  
    #> [27] ".p_ref_gene_exons"                 
    #> [28] ".p_ref_gene_position"              
    #> [29] ".p_ref_gene_range"                 
    #> [30] ".p_segment_data_examples"          
    #> [31] ".p_segmented_data_range"           
    #> [32] ".p_sparse_data"                    
    #> [33] ".p_sparse_data_examples"           
    #> [34] ".p_sparse_data_match_field"        
    #> [35] ".p_sparse_data_match_field_slow"   
    #> [36] ".p_sparse_data_match_partial_field"
    #> [37] ".p_sparse_data_range"              
    #> [38] ".p_transcript_expression"
  • The other class is defined in the package itself. These function names all start with . and are as follows:

    #> [1] ".host_cohorts"          ".cohort_datasets"      
    #> [3] ".cohort_datasets_count" ".cohort_samples_each"  
    #> [5] ".cohort_samples_any"    ".cohort_samples_all"   
    #> [7] ".dataset_samples_each"  ".dataset_samples_any"  
    #> [9] ".dataset_samples_all"
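The idea of creating one function per query file at load time can be sketched in plain R. This is not the package's actual code: the query texts, the `make_fun` helper, the `api` environment and the placeholder host URL are all hypothetical, and the generated functions only return what they would send instead of contacting a hub.

```r
# Hypothetical query texts keyed by name; real .xq files are not reproduced here.
queries <- list(
  all_cohorts    = "(hypothetical-all-cohorts-query)",
  cohort_samples = "(hypothetical-cohort-samples-query)"
)

# Build a function that remembers its query text (a closure).
# A real implementation would send the query to the host; this one
# just returns the pieces so the sketch runs offline.
make_fun <- function(query) {
  force(query)
  function(host, ...) {
    list(host = host, query = query, args = list(...))
  }
}

# Create a ".xq_" variable and a ".p_" function for every query.
api <- new.env()
for (nm in names(queries)) {
  assign(paste0(".xq_", nm), queries[[nm]], envir = api)
  assign(paste0(".p_", nm), make_fun(queries[[nm]]), envir = api)
}

# Names starting with a dot are hidden unless all.names = TRUE.
created <- ls(api, all.names = TRUE)

# Call one generated function with a placeholder host.
res <- get(".p_all_cohorts", envir = api)("https://example.org")
```

Pairing each `.p_` function with a `.xq_` variable of the same stem mirrors the naming shown in this vignette, which makes it easy to look up the query behind a function.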

I did not know how to write these queries for Xena Hubs myself, so I would like to thank the authors of the xenaPython and xenaR packages.

API report

API functions in UCSCXenaTools
| Original Name | Function Name | Level | Description |
|---|---|---|---|
| cohorts | cohorts | Higher | Return cohorts as character vector |
| datasets | datasets | Higher | Return datasets as character vector |
| hosts | hosts | Higher | Return hosts as character vector |
| samples | samples | Higher | Return samples according to “by” and “how” option |
| .cohort_datasets | .cohort_datasets | Lower | Return datasets of cohorts |
| .cohort_datasets_count | .cohort_datasets_count | Lower | Return dataset count of cohorts |
| .cohort_samples_all | .cohort_samples_all | Lower | Return samples shared by all cohorts |
| .cohort_samples_any | .cohort_samples_any | Lower | Return samples present in any cohort |
| .cohort_samples_each | .cohort_samples_each | Lower | Return samples present in each cohort |
| .dataset_samples_all | .dataset_samples_all | Lower | Return samples shared by all datasets |
| .dataset_samples_any | .dataset_samples_any | Lower | Return samples present in any dataset |
| .dataset_samples_each | .dataset_samples_each | Lower | Return samples present in each dataset |
| .host_cohorts | .host_cohorts | Lower | Return cohorts of hosts |
| all_cohorts | .p_all_cohorts | Lower | NA |
| all_datasets | .p_all_datasets | Lower | NA |
| all_datasets_n | .p_all_datasets_n | Lower | Count the number of datasets with non-null cohort |
| all_field_metadata | .p_all_field_metadata | Lower | Metadata for all dataset fields (phenotypic datasets) |
| cohort_samples | .p_cohort_samples | Lower | All samples in cohort |
| cohort_summary | .p_cohort_summary | Lower | Count datasets per-cohort, excluding the given dataset types |
| dataset_fetch | .p_dataset_fetch | Lower | Probe values for given samples |
| dataset_field | .p_dataset_field | Lower | All field (probe) names in dataset |
| dataset_field_examples | .p_dataset_field_examples | Lower | Field names in dataset, up to |
| dataset_field_n | .p_dataset_field_n | Lower | Number of fields in dataset |
| dataset_gene_probe_avg | .p_dataset_gene_probe_avg | Lower | Probe average, per-gene, for given samples |
| dataset_gene_probes_values | .p_dataset_gene_probes_values | Lower | Probe values in gene, and probe genomic positions, for given samples |
| dataset_list | .p_dataset_list | Lower | Dataset metadata for datasets in the given cohorts |
| dataset_metadata | .p_dataset_metadata | Lower | Dataset metadata |
| dataset_probe_signature | .p_dataset_probe_signature | Lower | Computed probe signature for given samples and weight array |
| dataset_probe_values | .p_dataset_probe_values | Lower | Probe values for given samples, and probe genomic positions |
| dataset_samples | .p_dataset_samples | Lower | All samples in dataset (optional limit) |
| dataset_samples_ndense_matrix | .p_dataset_samples_ndense_matrix | Lower | All samples in dataset (faster, for dense matrix dataset only) |
| datasets_null_rows | .p_datasets_null_rows | Lower | NA |
| feature_list | .p_feature_list | Lower | Dataset field names and long titles (phenotypic datasets) |
| field_codes | .p_field_codes | Lower | Codes for categorical fields |
| field_metadata | .p_field_metadata | Lower | Metadata for given fields (phenotypic datasets) |
| gene_transcripts | .p_gene_transcripts | Lower | Gene transcripts |
| match_fields | .p_match_fields | Lower | Find fields matching names (must be lower-case) |
| probe_count | .p_probe_count | Lower | NA |
| probemap_list | .p_probemap_list | Lower | Find probemaps |
| ref_gene_exons | .p_ref_gene_exons | Lower | Gene model |
| ref_gene_position | .p_ref_gene_position | Lower | Gene position from gene model |
| ref_gene_range | .p_ref_gene_range | Lower | Gene models overlapping range |
| segment_data_examples | .p_segment_data_examples | Lower | Initial segmented data rows, with limit |
| segmented_data_range | .p_segmented_data_range | Lower | Segmented (copy number) data overlapping range |
| sparse_data | .p_sparse_data | Lower | Sparse (mutation) data rows for genes |
| sparse_data_examples | .p_sparse_data_examples | Lower | Initial sparse data rows, with limit |
| sparse_data_match_field | .p_sparse_data_match_field | Lower | Genes in sparse (mutation) dataset matching given names |
| sparse_data_match_field_slow | .p_sparse_data_match_field_slow | Lower | Genes in sparse (mutation) dataset matching given names, case-insensitive (names must be lower-case) |
| sparse_data_match_partial_field | .p_sparse_data_match_partial_field | Lower | Partial match genes in sparse (mutation) dataset |
| sparse_data_range | .p_sparse_data_range | Lower | Sparse (mutation) data rows overlapping the given range, for the given samples |
| transcript_expression | .p_transcript_expression | Lower | NA |
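The "each", "any" and "all" variants in the table differ only in how per-cohort (or per-dataset) sample lists are combined. The sketch below shows that reading with base R set operations on invented sample IDs. It illustrates the descriptions in the table and is not the package's implementation.

```r
# Hypothetical sample IDs for three cohorts (the same logic applies to datasets).
samples_by_cohort <- list(
  cohortA = c("S1", "S2", "S3"),
  cohortB = c("S2", "S3", "S4"),
  cohortC = c("S3", "S5")
)

# "each": keep one vector of samples per cohort.
samples_each <- samples_by_cohort

# "any": samples present in at least one cohort (set union).
samples_any <- Reduce(union, samples_by_cohort)

# "all": samples shared by every cohort (set intersection).
samples_all <- Reduce(intersect, samples_by_cohort)
```

With these toy inputs, `samples_any` holds the five distinct IDs and `samples_all` holds only "S3", the one ID common to all three cohorts.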

Note that I have not tested all functions generated from .xq files, although most of them work. A function may sometimes return an error or an empty list(), which may be caused by an invalid input format or a bad network connection; in that case, try again a few times. If you are sure the problem lies in the query itself, you can check the corresponding query variables:

#>  [1] ".xq_all_cohorts"                    
#>  [2] ".xq_all_datasets"                   
#>  [3] ".xq_all_datasets_n"                 
#>  [4] ".xq_all_field_metadata"             
#>  [5] ".xq_cohort_samples"                 
#>  [6] ".xq_cohort_summary"                 
#>  [7] ".xq_dataset_fetch"                  
#>  [8] ".xq_dataset_field"                  
#>  [9] ".xq_dataset_field_examples"         
#> [10] ".xq_dataset_field_n"                
#> [11] ".xq_dataset_gene_probe_avg"         
#> [12] ".xq_dataset_gene_probes_values"     
#> [13] ".xq_dataset_list"                   
#> [14] ".xq_dataset_metadata"               
#> [15] ".xq_dataset_probe_signature"        
#> [16] ".xq_dataset_probe_values"           
#> [17] ".xq_dataset_samples"                
#> [18] ".xq_dataset_samples_ndense_matrix"  
#> [19] ".xq_datasets_null_rows"             
#> [20] ".xq_feature_list"                   
#> [21] ".xq_field_codes"                    
#> [22] ".xq_field_metadata"                 
#> [23] ".xq_gene_transcripts"               
#> [24] ".xq_match_fields"                   
#> [25] ".xq_probe_count"                    
#> [26] ".xq_probemap_list"                  
#> [27] ".xq_ref_gene_exons"                 
#> [28] ".xq_ref_gene_position"              
#> [29] ".xq_ref_gene_range"                 
#> [30] ".xq_segment_data_examples"          
#> [31] ".xq_segmented_data_range"           
#> [32] ".xq_sparse_data"                    
#> [33] ".xq_sparse_data_examples"           
#> [34] ".xq_sparse_data_match_field"        
#> [35] ".xq_sparse_data_match_field_slow"   
#> [36] ".xq_sparse_data_match_partial_field"
#> [37] ".xq_sparse_data_range"              
#> [38] ".xq_transcript_expression"

For example, if you would like to check the .p_all_cohorts function, you can take a look at the .xq_all_cohorts object.

Printing it with cat() may give you a more readable format.
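The lookup described above can be sketched as follows. The name mapping uses only base R, and the query text is a made-up placeholder, not the real content of .xq_all_cohorts; it is there only to show how print() and cat() differ on a multi-line string.

```r
fun_name <- ".p_all_cohorts"

# Swap the ".p_" prefix for ".xq_" to get the name of the query variable.
query_name <- sub("^\\.p_", ".xq_", fun_name)

# Hypothetical multi-line query text (placeholder, not a real Xena query).
query <- "; placeholder query\n(outer-form\n  (inner-form))"

# print() shows a single quoted string with "\n" escape sequences.
print(query)

# cat() writes the text itself, so each line break is rendered.
cat(query, "\n")
```

Because query text spans several lines, cat() tends to be the easier way to read it in the console.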

Use case

More use cases are still to be added. If you have any suggestions, please open an issue on GitHub.

LICENSE

GPL-3

Please note that code from the XenaR package is under the Apache 2.0 license.