This vignette shows how and why to use the derived attributes and
implied attributes functionalities of the pepr package together.
For basic information about the PEP concept, see the project website.
Make sure to study the dedicated derived attributes and implied attributes vignettes before reading this one.
While either the derived attributes or the implied attributes functionality
alone is often sufficient to describe your samples efficiently in a PEP,
the example below demonstrates how to use derived attributes to
simplify and unclutter the columns of the
sample_table.csv file, after implying the attributes for
samples that follow certain patterns. Combined, the two
functionalities give you a way to build complex,
yet flexible sample annotation tables with little effort. Note that
attribute implication is always performed first, before the attributes
are derived. This means that the newly created (implied) attributes
can be used to construct attributes in the column derivation
process. Please consider the example below for reference:
| sample_name | organism | time | file_path |
|---|---|---|---|
| pig_0h | pig | 0 | data/lab/project/pig_susScr11_untreated.fastq |
| pig_1h | pig | 1 | data/lab/project/pig_susScr11_treated.fastq |
| frog_0h | frog | 0 | data/lab/project/frog_xenTro9_untreated.fastq |
| frog_1h | frog | 1 | data/lab/project/frog_xenTro9_treated.fastq |
Specifying detailed file paths/names (as presented above)
is cumbersome. To make your life easier, find the patterns
that the file names in the file_path column of
sample_table.csv follow, imply the needed attributes, and derive
the file names. This multi-step process is orchestrated by the
project_config.yaml file via the
sample_modifiers.derive and
sample_modifiers.imply sections:
pep_version: 2.0.0
sample_table: sample_table.csv
output_dir: $HOME/hello_looper_results
sample_modifiers:
  derive:
    attributes: file_path
    sources:
      source1: /data/lab/project/{organism}_{genome}_{condition}.fastq
  imply:
    - if:
        organism: pig
      then:
        genome: susScr11
    - if:
        organism: frog
      then:
        genome: xenTro9
    - if:
        time: 0
      then:
        condition: untreated
    - if:
        time: 1
      then:
        condition: treated
The *_untreated files are clearly associated with the
samples that are labeled with time 0. Therefore, the value
untreated of the condition attribute is implied for the samples that have 0
in the time column. Similarly, the codes
susScr11 and xenTro9 are associated with the
values in the organism column. Therefore, the column
genome, which consists of those two codes, is implied from
the values in the organism column according to the
project_config.yaml.
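The implication rules described above can be illustrated with a short base R sketch that does not depend on pepr. It is only a rough model of the idea, not pepr's implementation: the layout of the `rules` list and the `imply_attributes` helper are hypothetical names introduced for this illustration.

```r
# Hypothetical, pepr-independent sketch of attribute implication.
samples <- data.frame(
  sample_name = c("pig_0h", "pig_1h", "frog_0h", "frog_1h"),
  organism = c("pig", "pig", "frog", "frog"),
  time = c(0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Each rule mirrors one if/then pair from sample_modifiers.imply.
rules <- list(
  list(if_attr = "organism", if_value = "pig",
       then_attr = "genome", then_value = "susScr11"),
  list(if_attr = "organism", if_value = "frog",
       then_attr = "genome", then_value = "xenTro9"),
  list(if_attr = "time", if_value = 0,
       then_attr = "condition", then_value = "untreated"),
  list(if_attr = "time", if_value = 1,
       then_attr = "condition", then_value = "treated")
)

imply_attributes <- function(tbl, rules) {
  for (rule in rules) {
    # Create the implied column on first use.
    if (!rule$then_attr %in% names(tbl)) {
      tbl[[rule$then_attr]] <- NA_character_
    }
    # Set the implied value for every sample that satisfies the condition.
    matches <- tbl[[rule$if_attr]] == rule$if_value
    tbl[[rule$then_attr]][matches] <- rule$then_value
  }
  tbl
}

implied <- imply_attributes(samples, rules)
print(implied[, c("sample_name", "genome", "condition")])
```

Every sample ends up with a genome and a condition value even though neither column exists in the input table.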
Let’s introduce a few modifications to the original
sample_table.csv file to imply the attributes
genome and condition and subsequently map the
appropriate data sources from the project_config.yaml with
attributes in the derived column - [file_path]:
| sample_name | organism | time | file_path |
|---|---|---|---|
| pig_0h | pig | 0 | source1 |
| pig_1h | pig | 1 | source1 |
| frog_0h | frog | 0 | source1 |
| frog_1h | frog | 1 | source1 |
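To make the derivation step concrete, the following base R sketch expands the source1 template for a table in which genome and condition have already been implied. It is an illustration under stated assumptions, not pepr's own code: `fill_template` and `derive_attribute` are hypothetical helpers, and the input table is typed in by hand.

```r
# Hypothetical, pepr-independent sketch of attribute derivation.
# Assumption: genome and condition were already implied for each sample.
implied <- data.frame(
  sample_name = c("pig_0h", "pig_1h", "frog_0h", "frog_1h"),
  organism = c("pig", "pig", "frog", "frog"),
  genome = c("susScr11", "susScr11", "xenTro9", "xenTro9"),
  condition = c("untreated", "treated", "untreated", "treated"),
  file_path = "source1",
  stringsAsFactors = FALSE
)

# Mirrors sample_modifiers.derive.sources from the config.
sources <- list(
  source1 = "/data/lab/project/{organism}_{genome}_{condition}.fastq"
)

fill_template <- function(template, row) {
  # Find every {attribute} placeholder in the template.
  placeholders <- regmatches(
    template, gregexpr("\\{[^}]+\\}", template, perl = TRUE)
  )[[1]]
  for (ph in placeholders) {
    attr_name <- substr(ph, 2, nchar(ph) - 1)
    # Replace the placeholder with this sample's attribute value.
    template <- sub(ph, as.character(row[[attr_name]]), template, fixed = TRUE)
  }
  template
}

derive_attribute <- function(tbl, attribute, sources) {
  for (i in seq_len(nrow(tbl))) {
    key <- tbl[[attribute]][i]
    # Only values that name a known source are expanded.
    if (key %in% names(sources)) {
      tbl[[attribute]][i] <- fill_template(sources[[key]], tbl[i, ])
    }
  }
  tbl
}

derived <- derive_attribute(implied, "file_path", sources)
print(derived$file_path)
```

Because the template refers to genome and condition, this step only works once implication has run, which is why the order of the two operations matters.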
Load pepr and read in the project metadata by specifying
the path to the project_config.yaml:
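library(pepr)
# Assumption: the bundled examples live in "example_peps-master",
# as the path in the log output suggests.
branch = "master"
projectConfig = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_derive_imply",
  "project_config.yaml",
  package = "pepr"
)
p = Project(projectConfig)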
## Loading config file: /tmp/RtmpoymTo9/Rinstb3055bff7/pepr/extdata/example_peps-master/example_derive_imply/project_config.yaml
And inspect it:
sampleTable(p)
## sample_name organism time file_path
## 1: pig_0h pig 0 /data/lab/project/pig_susScr11_untreated.fastq
## 2: pig_1h pig 1 /data/lab/project/pig_susScr11_treated.fastq
## 3: frog_0h frog 0 /data/lab/project/frog_xenTro9_untreated.fastq
## 4: frog_1h frog 1 /data/lab/project/frog_xenTro9_treated.fastq
## genome condition
## 1: susScr11 untreated
## 2: susScr11 treated
## 3: xenTro9 untreated
## 4: xenTro9 treated
As you can see, the resulting samples are annotated the same way as
if they had been read from the original, unwieldy annotation file
(enriched with the genome and condition
attributes that were implied).