This is a complete guide to creating a configuration file for all modules supported by the current version of shinyExprPortal. It includes a minimal setup example, as well as optional advanced and customization settings. It assumes that you have already created an initial configuration file and prepared the expression, measures and lookup table files. If you have not completed that first step yet, please check the Data Preparation Guide.
YAML syntax: configuration fields are defined with a colon (:). Lists of values are defined using hyphens or square brackets (e.g. ["A", "B", "C"]). Besides text, lists can contain complex objects, whose fields are given as key-value pairs. Values without spaces do not need double quotes, but using them causes no problems. Examples:
# Hyphenated list example
Name:
  - Value1
  - Value2
  - Value3
# Alternative with square brackets
Name2: ["A", "B", "C"]
# Key-value list example
FieldName:
  Key1: Value1
  "Key2": "Value2"
  Key3: Value3
# Hyphen + Key-value
# Each hyphenated group corresponds to a list
FieldName:
  - Key1: Value1
    Key2: Value2
  - Key1: Value3
    Key2: Value4
Required settings
About
The name of a text, markdown or HTML file located in the same folder as the app.R file. If a file is not provided, the application will show a default placeholder text.
Optional settings
Name
A short name for a project (e.g. the acronym).
Window title name
If no name is provided, you can still define a windowtitle that will appear in the browser.
Logo
A custom logo for the top-left corner can be provided as any web-supported format (e.g. png, jpeg). If a logo is not provided, the short project name will be used instead.
Bootswatch theme
A custom Bootswatch theme can be used instead of the default one. You can use versions 3, 4 or 5 but there is no guarantee of how the style will look across versions. Although dark themes can be used, the visualizations will not be adapted to them.
Under bootstrap, any additional setting will be passed to the bslib::bs_theme() command, allowing further customization. For further information, check the bs_theme documentation for the supported arguments or run ?bslib::bs_theme for the local help page.
Icon menu
An icon menu with highlights can be displayed above the about
section. For that, module names must be included in a list, and PNG
images, with 1:1 aspect ratio and named after these modules, must be
placed in a www
folder (located in the same folder where
app.R is). For example, to include a highlight shortcut to
singleGeneCorr, the module name must be listed and a file
singleGeneCorr.png placed under a www
folder. The images
will be rendered as 250px x 250px. Example:
In the data section, the measures, sample_lookup and expression_matrix files must be defined. Additional files that are shared across multiple modules can also be defined in this section, e.g. the models table for the DE modules (see degModules further down for more details).
data:
  measures: measures.csv
  sample_lookup: sample_lookup.csv
  expression_matrix: matrix.csv
  models: models.tsv
Sample and subject variables
The package does not make any assumptions about the order of the columns containing the subject or sample identifiers in the measures and lookup table files, so the names of these columns must be provided. Alternatively, name them as Subject_ID and Sample_ID and the package will look for these names by default.
Time separator
A custom separator can be used to identify a temporal suffix in measures names. While the default is _, timesep can be used to define another separator. Currently, this is only used in the compareTrajGroups module.
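As a minimal sketch, the fragment below switches the separator to a dot. It assumes that timesep is declared at the top level of the configuration file, which this section does not state explicitly; the measure name in the comment is illustrative.

```yaml
# Assumption: timesep is a top-level setting.
# With this value, a measures column named Platelets.m01 would be read as
# the variable "Platelets" with the temporal suffix "m01".
timesep: "."
```

The value is quoted so that the dot is read as text.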
Sample categories
The property sample_categories defines metadata variables that are used in the interface to select subsets of samples. Each sample class should be defined through its corresponding variable name in the sample lookup file, a nice label to be displayed and a list of valid values for subsetting samples. The valid values can also be assigned nice labels for display through the format Nice Label: originalValue. In this case, hyphens or square brackets should not be used, as in the example below.
Example:
sample_categories:
  - name: variable_name
    label: Variable Name
    values:
      Optional Label 1: value1
      Optional Label 2: value2
  - name: variable_name_2
    label: Another Variable
    values:
      - anothervar_value1
      - anothervar_value2
An advanced option allows selecting all samples for a particular category. To do that, a custom pair of label and value All: NA can be used. This is useful to explore subsets of samples that do not have a sample-based partition (e.g. a subset of samples from patients with a high disease burden).
Example:
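A sketch of how the All: NA pair might be combined with labelled values, following the sample_categories format shown earlier. The category name Tissue and its values are illustrative and not taken from a real lookup table.

```yaml
sample_categories:
  - name: Tissue            # hypothetical column in the sample lookup file
    label: Tissue
    values:
      All: NA               # lets users select every sample, ignoring Tissue
      Blood: blood          # Nice Label: originalValue
      Synovium: synovium
```

Because the values use the Nice Label: originalValue format, they are written as key-value pairs rather than with hyphens or square brackets.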
Default advanced settings
For some modules, the package includes default parameters for some computation methods, such as the correlation metric used. These can be overridden and displayed as options in the portal for users, but the defaults can also be defined in the configuration file and applied to all the modules that would look for them. The following properties can be defined:
Outlier removal for expression values: either the 5/95 percentiles method or Tukey’s IQR method. Package default is No. Affects correlation modules.
Outlier removal for measures: either the 5/95 percentiles method or Tukey’s IQR method. Package default is No. Affects correlation modules.
Correlation method: pearson, spearman or kendall correlation measures. Default is pearson. Affects correlation modules.
Regression fit method: none, linear, quadratic and cubic. Affects scatterplots in singleGeneCorr and compareTrajGroups.
P-value adjustment method: holm, hochberg, hommel, bonferroni, BH, BY, fdr, none or q.value. Default is q.value from the qvalue package. Affects correlation and degDetails modules.
Example:
The final part of the configuration file is the definition of the modules to be included in the portal. These should be specified in the order in which they should appear on the portal menu. The function show_available_modules() can be used to check which modules are currently supported after the package is loaded; this guide is also up to date and can be used as a reference. The following modules and grouped modules are currently supported (grouped modules are pseudo-modules that define sub-menus on the portal):
The package contains suggested dependencies when they are only needed by specific modules. This guide lists all the additional dependencies for each module. The other way to discover an additional dependency is to set up the configuration file and try to run the portal. If a package is missing, the portal will not run and you will be notified about which package dependency is missing.
There are also optional settings supported by all modules, most of them for customization purposes.
title: a custom name can be defined for every module.
description: a description paragraph can be defined to instruct users on usage of a module or to provide additional information, such as descriptions of the data being visualized or links to external references.
Example:
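A sketch of the two customization settings applied to one module. The module singleGeneCorr is used only as an illustration, and the title and description texts are made up; the settings would sit alongside the module's own configuration.

```yaml
singleGeneCorr:
  title: Gene vs. measures
  description: "Select a gene and a subset of samples to compare its expression with the observed measures."
  # The module-specific settings (described later in this guide) follow here.
```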
Modules:
Required packages: {r2d3}
This module displays a grid of small line plots for each subject in the measures file. The line plots show each subject’s trajectory for various measures over time, with the line colors selected by the user from a set of variables.
Minimum configuration
cohortOverview:
  profile_variables:
    Platelets:
      values: [Platelets_m01, Platelets_m02]
  colour_variables: [Age, Platelets_m01, Platelets_m02]
Example:
Optional required packages: [{RColorBrewer}](https://CRAN.R-project.org/package=RColorBrewer)
This module shows the combined expression and measure trajectory of subjects over time. The user can select one gene and one measure that was taken over time. The idea of the module is to split subjects into two or more groups using a sidebyside_category from the lookup table (e.g. two drug groups). Samples for all time values defined by a trajectory_category are taken and filtered from a user selection based on the remaining subset_categories (e.g. if the lookup table contains DrugGroup, Time and Tissue, the only remaining valid category here is Tissue). The measures to be paired with the samples are taken by combining compare_variables with the time values from trajectory_category. For example, if Platelets is one of the possible variables and the times are m01 and m02, the trajectory will be constructed from the observations of Platelets_m01 and Platelets_m02 in the measures table. The expression of the selected gene will also be taken using observations of samples in m01 and m02. In the interface, users will be able to select from any of the variables listed under compare_variables.
Minimum configuration
compareTrajGroups:
  subset_categories:
    - Tissue
  sidebyside_category: DrugGroup
  trajectory_category: Time
  compare_variables:
    - Platelets
Example:
custom_traj_palette: the name of an RColorBrewer palette or a list of valid colours (e.g. hex values) can be supplied using this keyword. The colors will be used in order of appearance to color the points for each value of trajectory_category.
advanced: the following advanced options can be defined, otherwise the defaults described earlier in this guide will be used:
Regression method option: if TRUE or AllowHide, lets users choose a regression method from the available ones (linear, quadratic or cubic). AllowHide includes a None option in the interface.
Example of optional settings:
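A sketch of the minimum configuration extended with a custom palette. The hex colours are illustrative. The advanced options are left out because their key names are not given in this section.

```yaml
compareTrajGroups:
  subset_categories:
    - Tissue
  sidebyside_category: DrugGroup
  trajectory_category: Time
  compare_variables:
    - Platelets
  # One colour per value of trajectory_category (e.g. m01, m02), used in order.
  custom_traj_palette: ["#1b9e77", "#d95f02"]
  # Alternatively, the name of an RColorBrewer palette could be given:
  # custom_traj_palette: Dark2
```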
This module displays volcano plots and a table with the results of differential expression analysis exported by packages such as limma, DESeq2, and edgeR. It depends on a separate table that categorizes the DE results files. This table should contain a high-level category variable that identifies different types of models and a File column with the corresponding file names. Other columns in this table should partially match the sample classes defined in global, when meaningful. The following is a valid minimal model results table:
Model | File |
---|---|
Linear | Model_1.txt |
Nonlinear | Model_2.txt |
The following is an example of a more layered model setup:
Model | Time | Drug | File |
---|---|---|---|
Linear | m01 | d1 | Model_1.txt |
Linear | m02 | d2 | Model_2.txt |
Nonlinear | m01 | d1 | Model_3.txt |
Nonlinear | m02 | d2 | Model_4.txt |
The configuration for this module requires a category_variable to identify the models and a models file with the table; this file can be a tab-separated or comma-separated values file. Individual model results should be placed inside a models folder.
Minimum configuration:
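A sketch of what the minimum configuration might look like, based on the two requirements stated above. It assumes that this module's key is degDetails and that the file name and column name match the example tables (models.tsv and Model).

```yaml
degDetails:                  # assumed module key for the volcano plot module
  category_variable: Model   # column of the models table that identifies model types
  models: models.tsv         # the models table; result files go inside the models folder
```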
Example:
Required packages: {knitr}, {kableExtra}
This module displays a summary table of all models included in the models table (see previous module). It aggregates the number of significant genes by p-value and adjusted p-values (with 0.05 threshold) for each model. A partition_variable, matching a variable in the models table (e.g. Drug), is used to partition the results vertically. With two drugs d1 and d2, this means that the table will contain 4 columns: number of significant d1 p-value genes, d1 adjusted p-value genes, d2 p-value genes and d2 adjusted p-value genes.
Minimum configuration
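A sketch of a possible minimum configuration, assuming the module key is degSummary and that the models table is supplied in the same way as for the previous module. Drug refers to the column in the layered models table shown earlier.

```yaml
degSummary:                  # assumed module key for the summary table module
  models: models.tsv         # assumption: same models table as the previous module
  partition_variable: Drug   # yields p-value and adjusted p-value columns per drug
```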
Required packages: {RColorBrewer}
This module enables visualizing a subset of genes, e.g. a co-expression module, cluster or signature, related to a downstream analysis method such as regulatory networks. It displays a heatmap with the expression of the genes from a user-selected module and scatterplots with the association between the module eigengene and measures defined in the configuration file. Gene modules must be specified in a table that contains a model/analysis category or source, the valid sample categories associated with each module, the module name and a list of associated genes. It is also possible to include annotations in the heatmap by defining a list of valid measures or categorical variables that the user can select. The following is an example of a valid modules table:
Category | Time | Drug | ModuleName | targetGenes | rank |
---|---|---|---|---|---|
Linear | m01 | d1 | ABC (Activated) | PQR,QRS,RST | 1 |
Linear | m01 | d1 | BCD (Inhibited) | QPP,PQQ,RST,TRR,WYX,WEX | 2 |
Linear | m01 | d1 | CDE (Activated) | QQZ,ZZE,YYZ,YYE,PPA | 3 |
Linear | m01 | d1 | DEF (Activated) | PP,APP,BBE | 4 |
Linear | m02 | d1 | EFG (Activated) | HJK,JKL,MNJ | 1 |
Linear | m02 | d1 | FGH (Activated) | MNO,NOP,PQR,QRS,RST,STU | 2 |
This module also supports generic lists of genes that can be made visible to any selection of subsets of samples. To define lists of genes for this purpose, you can use * as a wildcard in a sample category column. The following is an example of that in hypothetical co-expression modules (e.g. outputs from WGCNA or CEMiTool), which allows the user to explore the same modules for any subset of Time and Drug.
Category | Time | Drug | ModuleName | targetGenes | rank |
---|---|---|---|---|---|
Coexp | * | * | gray | PQR,QRS,RST | 1 |
Coexp | * | * | red | QPP,PQQ,RST,TRR,WYX,WEX | 2 |
Coexp | * | * | blue | QQZ,ZZE,YYZ,YYE,PPA | 3 |
Coexp | * | * | orange | PP,APP,BBE | 4 |
Coexp | * | * | green | HJK,JKL,MNJ | 1 |
Coexp | * | * | black | MNO,NOP,PQR,QRS,RST,STU | 2 |
The configuration requires a modules_table pointing to the table file above, a category_variable that identifies the table column with the highest-level category of modules, a modules_variable that identifies the table column with the names of the modules and a genes_variable, which identifies the table column that contains a list of gene symbols separated by commas. In the interface, users will select a Category and subsets from the remaining sample categories on the table (e.g. above, Time and Drug) to see a list of modules associated with that category and sample classes. An optional ordering column can be included in the table, and set in the config file in rank_variable, to choose a custom appearance order for each module, as in the example above.
Minimum configuration
geneModulesHeatmap:
  modules_table: modules.csv
  category_variable: Category
  modules_variable: ModuleName
  genes_variable: targetGenes
Example:
subset_categories: list of categories to display as filter. To be used if not every sample category in the lookup table should be used to create subsets of samples. For example, if a pseudo-category was created to split patients based on a clinical measure, you may not want to show that filter in every module in the app.
scatterplot_variables: a list of measures to correlate with the module eigengene and display in scatterplots similar to those in the single gene module.
annotation_variables: a list of measures can be supplied so that users can add annotations above the heatmap. The variable names should match the measures table, independent of their timepoint.
custom_annotation_colors: by default, the heatmap package (the plotly-powered iheatmapr) will automatically assign colors for the annotations. Alternatively, an RColorBrewer palette or a list of colors (e.g. hex values) can be assigned to selected individual variables (a variable with no custom palette will use a default palette). For continuous variables, interpolation will be applied to a list of colors.
annotation_range: for numeric variables that share a theoretical range of values, but whose range varies in observed data, it is possible to define individual ranges so that the color scale is the same across annotations. For each annotation variable, a minimum and maximum value can be provided through a list. A common use case is for disease activity variables with a maximum value that decreases over time as patients improve.
custom_heatmap_palette: a custom RColorBrewer palette can be provided to override the default RdBu palette used in the heatmap.
Example of optional settings:
geneModulesHeatmap:
  ...
  subset_categories:
    - Cell_Type
  scatterplot_variables:
    - Platelets_m01
    - Platelets_m02
  annotation_variables:
    - Age
    - Platelets_m01
    - Platelets_m02
    - diseaseActivity_m01
    - diseaseActivity_m02
  custom_annotation_colors:
    drugNaive:
      Yes: yellow
      No: green
    Age: Reds
  annotation_range:
    diseaseActivity_m01: [0, 99]
    diseaseActivity_m02: [0, 99]
  custom_heatmap_palette: BrBG
This module allows users to explore expression variation in pre-computed clusters of genes across subsets of subjects, following a pre-computed 2D projection of the genes in the whole dataset. This module was developed considering a 1:1 subject-to-sample ratio rather than multiple samples per subject. It displays the 2D projection in two scatterplots: one displays the points colored by the group membership and the other displays the mean expression of the gene relative to the mean expression of all genes (i.e. a fold change). Sample categories can be used to define subsets of samples based on a characteristic of the subjects, such as a disease activity measure. The scatterplot will then show the expression variation for that subset only. The coordinates file must contain gene symbols, two columns for position and a column for group. The first three must appear in that exact order, as in the following:
gene | x | y | group |
---|---|---|---|
KKPRS | -0.0291362 | 0.6624692 | 1 |
CPTSX | -1.0577354 | -0.3607612 | 2 |
QJPST | -0.5526884 | -0.7241874 | 1 |
QSGTR | -0.3204210 | -0.4011870 | 1 |
PSVCA | 1.4322795 | 0.0085155 | 3 |
KKPRS | 0.4017958 | -0.2336005 | 1 |
CPTSX | -1.5929367 | -0.7769150 | 2 |
QJPST | -0.6696105 | -0.0100598 | 1 |
QSGTR | 0.4818681 | -0.1137370 | 1 |
PSVCA | 1.1927303 | 0.9452788 | 3 |
This table can be an .rds, .csv or .tsv file and must be defined in the coordinates_file setting. The group column must be indicated using the group_variable setting.
Minimum configuration
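A sketch of the two required settings. The module key geneProjectionOverlay is an assumption, since the module's identifier is not stated in this section (show_available_modules() lists the supported names). The file name is illustrative and group refers to the column in the table above.

```yaml
geneProjectionOverlay:               # assumed module key; check show_available_modules()
  coordinates_file: coordinates.csv  # .rds, .csv or .tsv with gene, x, y and group columns
  group_variable: group              # name of the column holding group membership
```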
Example:
The module also includes a heatmap to view the individual expression of samples for the genes that belong to a selected group, and it can display annotations based on the measures table. These should be defined through annotation_variables, and, as with the previous module, there is also the option of defining custom colors through custom_annotation_colors and fixing variable ranges through annotation_range.
annotation_variables: a list of measures can be supplied so that users can add annotations above the heatmap. The variable names should fully match the measures table names, including the time point.
custom_annotation_colors: by default, the heatmap package (the plotly-powered iheatmapr) will automatically assign colors for the annotations. Alternatively, an RColorBrewer palette or a list of colors (e.g. hex values) can be assigned to selected individual variables (a variable with no custom palette will use a default palette). For continuous variables, interpolation will be applied to a list of colors.
annotation_range: for numeric variables that share a theoretical range of values but whose range varies in observed data, it is possible to define individual ranges so that the color scale is the same across annotations. For each annotation variable, a minimum and maximum value can be provided through a list. A common use case is for disease activity variables with a maximum value that decreases over time as patients improve, or blood samples.
Example of optional settings:
This module allows users to compare the expression of a single gene from a selected subset of samples with observed measures through scatterplots. Groups of scatterplots are arranged in sub-tabs. Groups are specified as a list through the tabs field. Each tab contains a name to be displayed, a scale setting, which can be either free (each scatterplot has its own range in the horizontal axis) or fixed (all scatterplots in a tab will have the same horizontal axis), and a list of variables for each scatterplot. This module can be passed as a name to link_to in other modules.
Minimum configuration
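A sketch of what such a configuration might look like, built from the fields described above. The key variables for the per-tab list of measures is an assumption (the text only mentions "a list of variables"), and the tab names and measures are illustrative.

```yaml
singleGeneCorr:
  tabs:
    # Each entry becomes one sub-tab holding a group of scatterplots.
    - name: Platelets
      scale: free          # each scatterplot has its own horizontal range
      variables:           # assumed key name for the list of measures
        - Platelets_m01
        - Platelets_m02
    - name: Disease activity
      scale: fixed         # all scatterplots in this tab share one horizontal axis
      variables:
        - diseaseActivity_m01
        - diseaseActivity_m02
```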
Example:
subset_categories: list of categories to display as filter. To be used if not every sample category in the lookup table should be used to create subsets of samples. For example, if a pseudo-category was created to split patients based on a clinical measure, you may not want to show that filter in every module in the app.
color_variables: a list of variables, either numeric or discrete, can be supplied so that users can assign a color to subjects.
custom_point_colors: optionally, for the color variables, it is possible to supply custom colors as valid Vega schemes, a list of colors or a named list of colors (see DrugNaive in the example below).
advanced: the following advanced options can be defined, otherwise the defaults described earlier in this guide will be used:
Correlation method option: if TRUE, allows users to alternate between Pearson, Spearman and Kendall methods for computing correlation.
Expression outliers option: if TRUE, allows users to exclude expression outliers from scatterplots, either through Tukey’s IQR range filtering, 5/95 percentiles or none.
Measures outliers option: if TRUE, allows users to exclude measures outliers from scatterplots, either through Tukey’s IQR range filtering, 5/95 percentiles or none.
Regression method option: if TRUE or AllowHide, lets users choose a regression method from the available ones (linear, quadratic or cubic). AllowHide includes a None option in the interface.
Example of optional settings:
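A sketch combining the optional settings named above. The variable names and colours are illustrative, variables is an assumed key name, and reds is used as an example of a Vega scheme. The advanced options are omitted because their key names are not given in this section.

```yaml
singleGeneCorr:
  tabs:
    - name: Platelets
      scale: free
      variables: [Platelets_m01, Platelets_m02]   # assumed key name
  subset_categories:
    - Tissue
  color_variables:
    - Age
    - DrugNaive
  custom_point_colors:
    Age: reds              # a Vega colour scheme name
    DrugNaive:             # named list: one colour per value
      "Yes": yellow
      "No": green
```

The Yes and No keys are quoted here as a precaution, since some YAML parsers read unquoted Yes/No as logical values.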
This module allows users to view a table with the correlation between one single measure and all genes from a selected subset. The module can be set up with a list of measures from which users can select in the interface and the optional advanced settings of correlation method and outlier removal. As a default, the module can be defined simply as TRUE to include all numeric measures and no advanced settings:
Minimum configuration
A list of variables can be provided through dropdown_variables:
Example:
subset_categories: list of categories to display as filter. To be used if not every sample category in the lookup table should be used to create subsets of samples. For example, if a pseudo-category was created to split patients based on a clinical measure, you may not want to show that filter in every module in the app.
link_to: name of another module to create a link to in the results table. Every gene on the table will be a clickable link to the module, with the subset matching the ones selected by the user. The module linked to should also support this functionality for this setting to work.
advanced: the following advanced options can be defined, otherwise the defaults described earlier in this guide will be used:
Correlation method option: if TRUE, allows users to alternate between Pearson, Spearman and Kendall methods for computing correlation.
Expression outliers option: if TRUE, allows users to exclude expression outliers from scatterplots, either through Tukey’s IQR range filtering, 5/95 percentiles or none.
Measures outliers option: if TRUE, allows users to exclude measures outliers from scatterplots, either through Tukey’s IQR range filtering, 5/95 percentiles or none.
Example of optional settings:
This module allows users to view a heatmap and a table with the most significantly correlated genes across a predefined set of measures. The genes are taken from a subset selected by the user, based on the sample categories. This module requires the specification of heatmap_variables, with multiple named lists of variables.
Minimum configuration
Example:
subset_categories: list of categories to display as filter. To be used if not every sample category in the lookup table should be used to create subsets of samples. For example, if a pseudo-category was created to split patients based on a clinical measure, you may not want to show that filter in every module in the app.
custom_heatmap_scheme: a valid Vega scheme can be provided to override the default redblue scheme.
link_to: name of another module to create a link to in the results table. Every gene on the table will be a clickable link to the module, with the subset matching the ones selected by the user. The module linked to should also support this functionality for this setting to work.
advanced: the following advanced options can be defined, otherwise the defaults described earlier in this guide will be used:
Correlation method option: if TRUE, allows users to alternate between Pearson, Spearman and Kendall methods for computing correlation.
Expression outliers option: if TRUE, allows users to exclude expression outliers from scatterplots, either through Tukey’s IQR range filtering, 5/95 percentiles or none.
Measures outliers option: if TRUE, allows users to exclude measures outliers from scatterplots, either through Tukey’s IQR range filtering, 5/95 percentiles or none.
Example of optional settings:
Each module defined above will appear as a separate entry on the menu. However, some modules share an underlying functionality or purpose, such as the differential expression modules or the correlation modules. Therefore, the package supports a few grouped modules that can be specified in the configuration file and will be grouped in the menu of the website. The existence of these modules is defined internally in the package and is not customisable when deploying a project.
This group of modules includes degDetails and degSummary. As both modules require a model table and files, when using degModules it is possible to specify the model file as part of the data section of the configuration and avoid reading all models twice. Please note below the indentation that is required when defining the sub-modules.
Configuration example
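A sketch of how the grouped configuration might be laid out, with the models table declared once in the data section. The intermediate modules key that wraps the sub-modules is an assumption about the nesting, which this section does not spell out; the file and column names follow the earlier examples.

```yaml
data:
  measures: measures.csv
  sample_lookup: sample_lookup.csv
  expression_matrix: matrix.csv
  models: models.tsv          # shared by both sub-modules, so it is read once
degModules:
  modules:                    # assumed wrapper key for the sub-modules
    degSummary:
      partition_variable: Drug
    degDetails:
      category_variable: Model
```

Each sub-module is indented one level deeper than its parent key, so its settings belong to that sub-module and not to degModules itself.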