
Introduction to seeker

Jake Hughey

2024-08-26

RNA-seq data

The seeker package is designed to be a wrapper around various command-line and R-based tools. The main function is, well, seeker(), which is targeted at processing bulk RNA-seq data. seeker()’s main argument is a list of parameters specifying which steps of RNA-seq data processing to perform and how to perform them. The list of parameters can come from a yaml file, an example of which is shown below.

study: 'PRJNA600892' # [string]
metadata:
  run: TRUE # [logical]
  bioproject: 'PRJNA600892' # [string]
  include:
    # [named list or NULL]
    colname: 'run_accession' # [string]
    values: ['SRR10876945', 'SRR10876946'] # [vector]
  # exclude # [named list or NULL]
    # colname # [string]
    # values # [vector]
fetch:
  run: TRUE # [logical]
  # keep # [logical or NULL]
  # overwrite # [logical or NULL]
  # keepSra # [logical or NULL]
  # prefetchCmd # [string or NULL]
  # prefetchArgs # [character vector or NULL]
  # fasterqdumpCmd # [string or NULL]
  # fasterqdumpArgs # [character vector or NULL]
  # pigzCmd # [string or NULL]
  # pigzArgs # [character vector or NULL]
trimgalore:
  run: TRUE # [logical]
  # keep # [logical or NULL]
  # cmd # [string or NULL]
  # args # [character vector or NULL]
  # pigzCmd # [string or NULL]
fastqc:
  run: TRUE # [logical]
  # keep # [logical or NULL]
  # cmd # [string or NULL]
  # args # [character vector or NULL]
salmon:
  run: TRUE # [logical]
  indexDir: '~/refgenie_genomes/alias/mm10/salmon_partial_sa_index/default' # [string]
  # sampleColname # [string or NULL]
  # keep # [logical or NULL]
  # cmd # [string or NULL]
  # args # [character vector or NULL]
multiqc:
  run: TRUE # [logical]
  # cmd # [string or NULL]
  # args # [character vector or NULL]
tximport:
  run: TRUE # [logical]
  tx2gene:
    # [named list or NULL]
    organism: 'mmusculus' # [string]
    # version # [number or NULL]
    # filename # [string or NULL]
  countsFromAbundance: 'lengthScaledTPM' # [string]
  # ignoreTxVersion # [logical or NULL]
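Because yaml::read_yaml() returns a nested named list, the parameters can also be inspected or built directly in R. The sketch below parses an abbreviated copy of the yaml above with yaml::yaml.load() and lists the steps whose `run` field is TRUE. For illustration, fastqc is switched off and most optional fields are omitted. The helper enabledSteps() is hypothetical and not part of seeker.

```r
library('yaml')

# Abbreviated copy of the example parameters (fastqc switched off for illustration)
yamlText = "
study: 'PRJNA600892'
metadata:
  run: TRUE
  bioproject: 'PRJNA600892'
fetch:
  run: TRUE
trimgalore:
  run: TRUE
fastqc:
  run: FALSE
salmon:
  run: TRUE
  indexDir: '~/refgenie_genomes/alias/mm10/salmon_partial_sa_index/default'
"
params = yaml.load(yamlText)

# Hypothetical helper (not part of seeker): names of the steps whose `run` field is TRUE
enabledSteps = function(params) {
  steps = Filter(is.list, params)  # drop scalar entries such as `study`
  names(Filter(function(step) isTRUE(step$run), steps))
}

enabledSteps(params)
```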

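The example yaml file above is available at system.file('extdata', 'PRJNA600892.yml', package = 'seeker'), and an empty template yaml file is available at system.file('extdata', 'params_template.yml', package = 'seeker'). You can copy both files to your working directory like so: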

# Copy the example and template parameter files from the installed package
for (filename in c('PRJNA600892.yml', 'params_template.yml')) {
  file.copy(system.file('extdata', filename, package = 'seeker'), '.')
}

Once the system dependencies are installed (for example, with installSysDeps()), a basic way to run seeker() is:

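library('seeker')
# Register doParallel as the parallel backend for foreach
doParallel::registerDoParallel()

# Read the processing parameters from the yaml file and run the pipeline
yamlPath = 'PRJNA600892.yml'
params = yaml::read_yaml(yamlPath)
seeker(params)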

Beware that even this minimal example could take some time to run.
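Because seeker() takes an ordinary R list, one way to try a smaller run first is to edit the parameters in R before calling seeker(). The sketch below assumes that restricting `metadata$include$values` to a single run reduces the amount of data fetched and processed. It builds only part of the parameter list by hand (in practice you would start from yaml::read_yaml()), keeps the first run, and saves the result with yaml::write_yaml() to a temporary file whose path is purely illustrative. The names trialPath and trialParams are hypothetical.

```r
library('yaml')

# Partial parameter list, mirroring the metadata section of the example yaml
params = list(
  study = 'PRJNA600892',
  metadata = list(
    run = TRUE,
    bioproject = 'PRJNA600892',
    include = list(
      colname = 'run_accession',
      values = c('SRR10876945', 'SRR10876946'))))

# Keep only the first listed run for a quicker trial (assumed effect of the include filter)
params$metadata$include$values = params$metadata$include$values[1]

# Save the modified parameters so the trial run is documented, then read them back
trialPath = tempfile(fileext = '.yml')
write_yaml(params, trialPath)
trialParams = read_yaml(trialPath)
```

With the full parameter list, the modified list would then be passed to seeker() as before.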

Microarray data

For microarray data, use the seekerArray() function, which can process data from NCBI GEO and ArrayExpress as well as raw Affymetrix data stored locally. Its main arguments are study and geneIdType. For example:

library('seeker')

# Accession of the study to process and the type of gene IDs to use
study = 'GSE25585'
geneIdType = 'entrez'
seekerArray(study, geneIdType)
