The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
Reference conditioning models generate expression data conditioned on a real reference sample. This allows you to “anchor” to an existing expression profile while applying perturbations or modifications.
This is useful when you want to:
gem-1-bulk_reference-conditioning:
Bulk RNA-seq reference conditioning modelgem-1-sc_reference-conditioning:
Single-cell RNA-seq reference conditioning modelNote: These endpoints may require 1-2 minutes of startup time if they have been scaled down. Plan accordingly for interactive use.
Reference conditioning encodes the biological and technical characteristics from a real expression sample, then generates new expression data that:
Reference conditioning queries require different inputs than baseline models:
# Get the example query structure
example_query <- get_example_query(model_id = "gem-1-bulk_reference-conditioning")$example_query
# Inspect the query structure
str(example_query)The query structure includes:
inputs: A list where each input
contains:
counts: The reference expression
counts (a named list with a counts vector)metadata: Perturbation-only metadata
(see below)num_samples: How many samples to
generateconditioning: Which latent spaces
to condition on (typically
["biological", "technical"])
sampling_strategy:
"mean estimation" or
"sample generation"
Unlike baseline models, reference conditioning queries only accept perturbation metadata fields:
perturbation_ontology_idperturbation_typeperturbation_timeperturbation_doseAll other biological and technical metadata is inferred from the reference expression.
Here’s a complete example simulating a drug treatment effect on a reference sample:
# Start with example query structure
query <- get_example_query(model_id = "gem-1-bulk_reference-conditioning")$example_query
# Replace with your actual reference counts
# The counts vector must match the model's expected gene order and length
query$inputs[[1]]$counts <- list(counts = your_reference_counts)
# Specify the perturbation
query$inputs[[1]]$metadata <- list(
perturbation_ontology_id = "CHEMBL25", # Aspirin (ChEMBL ID)
perturbation_type = "compound",
perturbation_time = "24h",
perturbation_dose = "10uM"
)
query$inputs[[1]]$num_samples <- 3
# Set the sampling strategy
query$sampling_strategy <- "mean estimation"
# Submit the query
result <- predict_query(query, model_id = "gem-1-bulk_reference-conditioning")Simulate the effect of knocking out a specific gene:
query <- get_example_query(model_id = "gem-1-bulk_reference-conditioning")$example_query
# Your reference sample counts
query$inputs[[1]]$counts <- list(counts = control_sample_counts)
# CRISPR knockout of TP53
query$inputs[[1]]$metadata <- list(
perturbation_ontology_id = "ENSG00000141510", # TP53 Ensembl ID
perturbation_type = "crispr"
)
query$inputs[[1]]$num_samples <- 5
result <- predict_query(query, model_id = "gem-1-bulk_reference-conditioning")Controls which latent spaces are conditioned on the reference.
Default is ["biological", "technical"].
When both are conditioned, the model preserves both biological identity and technical characteristics from the reference sample.
Controls the type of prediction:
Controls whether to preserve the reference’s library size:
FALSE (default): The output’s total
count is taken from the reference expression (sum of its counts). Use
this when you want the synthetic sample to preserve the reference’s
library size.TRUE: Forces the model to use the
total_count parameter value (or default) instead of the
reference’s library size.Library size used when converting predicted log CPM back to raw
counts. Only effective when fixed_total_count = TRUE.
If TRUE, the model uses the mean of each latent
distribution (p(z|metadata) for perturbation,
q(z|x) for conditioned components) instead of sampling.
This produces deterministic, reproducible outputs.
FALSE| Field | Description / Format |
|---|---|
perturbation_ontology_id |
Ensembl gene ID (e.g., ENSG00000141510), ChEBI ID, ChEMBL ID, or NCBI Taxonomy ID |
perturbation_type |
One of: “coculture”, “compound”, “control”, “crispr”, “genetic”, “infection”, “other”, “overexpression”, “peptide or biologic”, “shrna”, “sirna” |
perturbation_time |
Time since perturbation (e.g., “24h”, “48h”) |
perturbation_dose |
Dose of perturbation (e.g., “10uM”, “1mg/kg”) |
The result structure is similar to baseline models:
# Access metadata and expression matrices
metadata <- result$metadata
expression <- result$expression
# Compare to your reference
dim(expression)
head(metadata)When conditioning on both biological and technical latents, you can directly compare the generated expression to your reference to identify perturbation effects:
# Your reference (input) counts
reference_cpm <- your_reference_counts / sum(your_reference_counts) * 1e6
# Generated (perturbed) counts
generated_cpm <- expression[1, ] / sum(expression[1, ]) * 1e6
# Log fold change
log2fc <- log2(generated_cpm + 1) - log2(reference_cpm + 1)
# Identify top changed genes
head(sort(log2fc, decreasing = TRUE), 20)The reference counts vector must match the model’s expected number of genes. If the length doesn’t match, the API will return a validation error.
Use get_example_query() to see the expected structure
and ensure your counts vector has the correct length.
Ensure your reference counts are in the same gene order expected by
the model. The response includes a gene_order field that
specifies the expected order.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.