Type: Package
Title: Comprehensive Reproducibility Framework for R and Bioinformatics Analysis
Version: 0.2.0
Description: A comprehensive reproducibility framework designed for R and bioinformatics workflows. Automatically captures the entire analysis environment including R session info, package versions, external tool versions ('Samtools', 'STAR', 'BWA', etc.), 'conda' environments, reference genomes, data provenance with smart checksumming for large files, parameter choices, random seeds, and hardware specifications. Generates executable scripts with 'Docker', 'Singularity', and 'renv' configurations. Integrates with workflow managers ('Nextflow', 'Snakemake', 'WDL', 'CWL') to ensure complete reproducibility of computational research workflows.
License: MIT + file LICENSE
Encoding: UTF-8
Depends: R (≥ 4.0.0)
Imports: renv, jsonlite, digest, yaml, cli, utils
Suggests: testthat (≥ 3.0.0)
BugReports: https://github.com/SAADAT-Abu/Capsule/issues
RoxygenNote: 7.3.3
NeedsCompilation: no
Packaged: 2025-11-06 13:18:57 UTC; saadat
Author: Abu SAADAT [aut, cre] (affiliation: Università degli Studi della Campania Luigi Vanvitelli)
Maintainer: Abu SAADAT <saadatabu1996@gmail.com>
Repository: CRAN
Date/Publication: 2025-11-11 10:00:29 UTC

Build Dependency Graph

Description

Internal function to build a dependency graph for packages

Usage

.build_dependency_graph(packages)

Arguments

packages

Character vector of package names

Value

List representing dependency relationships
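
A dependency graph of this kind can be sketched with base R tooling. The helper name build_graph is hypothetical and this is not Capsule's implementation; it assumes the packages are installed locally and that only direct Depends and Imports are of interest.

```r
# Hypothetical helper (not Capsule's implementation): map each package to its
# direct Depends/Imports, using metadata of locally installed packages.
build_graph <- function(packages) {
  tools::package_dependencies(
    packages,
    db = utils::installed.packages(),
    which = c("Depends", "Imports")
  )
}

# Base packages are used here because they ship with every R installation.
graph <- build_graph(c("stats", "utils"))
str(graph)
```

The result is a named list with one character vector of dependency names per package.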


Calculate Checksum with Smart Algorithm Selection

Description

Internal function to calculate a file checksum, choosing the algorithm by file size: large files use the faster xxHash, small files use SHA-256.

Usage

.calculate_checksum(file_path, fast_hash, size_threshold_gb, file_size_gb)

Arguments

file_path

Character. Path to file

fast_hash

Logical. Whether to use fast hashing for large files

size_threshold_gb

Numeric. Size threshold in GB

file_size_gb

Numeric. Actual file size in GB

Value

List with checksum and algorithm
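
The size-based selection described above might look roughly like the following. The names choose_hash_algo and checksum_file are hypothetical, the "xxhash64" variant and the strict "greater than" comparison are assumptions, and hashing relies on digest::digest() from the 'digest' package listed under Imports.

```r
# Hypothetical helper: pick a hash algorithm from the file size.
# Assumption: files strictly larger than the threshold count as "large".
choose_hash_algo <- function(file_size_gb, fast_hash = TRUE, size_threshold_gb = 1) {
  if (isTRUE(fast_hash) && file_size_gb > size_threshold_gb) "xxhash64" else "sha256"
}

# Hypothetical wrapper: hash the file with the chosen algorithm.
# Needs the 'digest' package to be installed when it is called.
checksum_file <- function(file_path, fast_hash = TRUE, size_threshold_gb = 1) {
  file_size_gb <- file.size(file_path) / 1024^3
  algo <- choose_hash_algo(file_size_gb, fast_hash, size_threshold_gb)
  list(
    checksum = digest::digest(file_path, algo = algo, file = TRUE),
    algorithm = algo
  )
}
```

Keeping the algorithm name next to the checksum matters because a later verification has to rehash with the same algorithm.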


Compare Package Lists

Description

Internal function to compare package versions between two snapshots

Usage

.compare_packages(pkg1, pkg2)
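
The structure of pkg1 and pkg2 is not documented here. As an illustration only, assuming each snapshot is a named character vector of versions, a comparison could be sketched as follows (compare_versions is a hypothetical name).

```r
# Hypothetical sketch: compare two named character vectors of package versions.
compare_versions <- function(pkg1, pkg2) {
  common <- intersect(names(pkg1), names(pkg2))
  list(
    added   = setdiff(names(pkg2), names(pkg1)),       # only in the second snapshot
    removed = setdiff(names(pkg1), names(pkg2)),       # only in the first snapshot
    changed = common[pkg1[common] != pkg2[common]]     # present in both, version differs
  )
}

old_pkgs <- c(jsonlite = "1.8.7", yaml = "2.3.7", cli = "3.6.1")
new_pkgs <- c(jsonlite = "1.8.8", yaml = "2.3.7", digest = "0.6.33")
delta <- compare_versions(old_pkgs, new_pkgs)
```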

Compare Registry Files

Description

Internal function to compare registry files

Usage

.compare_registries(file1, file2, type)

Create Capsule config

Description

Internal function to create the Capsule configuration

Usage

.create_config()

Create example workflow script

Description

Internal function to create an example workflow script

Usage

.create_example_script()

Create .gitignore

Description

Internal function to create a .gitignore file

Usage

.create_gitignore()

Extract Package Information

Description

Internal function to extract detailed package information

Usage

.extract_package_info(pkg_list)

Arguments

pkg_list

List of package info from sessionInfo()

Value

List of package details


Format R Value for Script

Description

Internal function to format R values as code strings

Usage

.format_r_value(value)

Arguments

value

Any R value

Value

Character representation
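
One common way to turn an R value into code is deparse(). The sketch below assumes that approach and uses the hypothetical name format_value; the package's own formatting rules may differ.

```r
# Hypothetical sketch: render an R value as source code via deparse().
format_value <- function(value) {
  # deparse() may return several lines for long values; join them into one string
  paste(deparse(value), collapse = "\n")
}

params <- list(alpha = 0.05, method = "BH", n = 100L)
code <- format_value(params)

# The generated string can be parsed back into an equivalent value
restored <- eval(parse(text = code))
```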


Format File Size

Description

Internal function to format file size in human-readable format

Usage

.format_size(bytes)

Arguments

bytes

Numeric. Size in bytes

Value

Character. Formatted size
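
A human-readable formatter could be sketched as below. The name format_bytes, the binary (1024-based) units, and the two-decimal output are all assumptions; the exact format produced by the package is not documented here.

```r
# Hypothetical sketch: format a byte count using 1024-based units.
format_bytes <- function(bytes) {
  units <- c("B", "KB", "MB", "GB", "TB")
  idx <- 1
  # Step up one unit at a time until the value drops below 1024
  while (bytes >= 1024 && idx < length(units)) {
    bytes <- bytes / 1024
    idx <- idx + 1
  }
  sprintf("%.2f %s", bytes, units[idx])
}

format_bytes(1536)
```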


Generate Comparison Report

Description

Internal function to generate markdown comparison report

Usage

.generate_comparison_report(
  snap1,
  snap2,
  meta1,
  meta2,
  pkg_diff,
  param_diff,
  data_diff,
  seed_diff
)

Generate docker-compose.yml

Description

Internal function to generate docker-compose configuration

Usage

.generate_docker_compose(output_file, project_name, include_rstudio)

Generate Docker README

Description

Internal function to generate Docker usage instructions

Usage

.generate_docker_readme(output_file, project_name, include_rstudio)

Generate Dockerfile

Description

Internal function to generate Dockerfile content

Usage

.generate_dockerfile(output_file, r_version, base_image, system_deps)

Generate .dockerignore

Description

Internal function to generate .dockerignore file

Usage

.generate_dockerignore(output_file)

Get FASTA Statistics

Description

Internal function to extract basic statistics from a FASTA file

Usage

.get_fasta_stats(fasta_path)

Arguments

fasta_path

Character. Path to FASTA file

Value

List containing FASTA statistics
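
Which statistics are returned is not specified. As a sketch under the assumption that sequence count, total length, and sequence names are of interest, a base R version might look like this (fasta_stats is a hypothetical name, and reading the whole file into memory is only reasonable for small files).

```r
# Hypothetical sketch: basic FASTA statistics with base R.
# Reads the whole file at once, so it is unsuitable for very large genomes.
fasta_stats <- function(fasta_path) {
  lines <- readLines(fasta_path)
  is_header <- startsWith(lines, ">")
  list(
    n_sequences    = sum(is_header),
    total_length   = sum(nchar(lines[!is_header])),
    # Keep the identifier up to the first whitespace character
    sequence_names = sub("^>([^[:space:]]+).*$", "\\1", lines[is_header])
  )
}

fa <- tempfile(fileext = ".fa")
writeLines(c(">chr1 test contig", "ACGT", "ACG", ">chr2", "GG"), fa)
stats <- fasta_stats(fa)
```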


Get Tool Path

Description

Internal function to get the full path to a command-line tool

Usage

.get_tool_path(tool)

Arguments

tool

Character. Tool name

Value

Character. Full path or NA
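
In base R, the usual way to locate an executable on the PATH is Sys.which(), which returns an empty string when nothing is found. A minimal sketch, assuming that approach and using the hypothetical name find_tool:

```r
# Hypothetical sketch: resolve a command-line tool to its full path, or NA.
find_tool <- function(tool) {
  path <- unname(Sys.which(tool))
  if (nzchar(path)) path else NA_character_
}

find_tool("definitely-not-a-real-tool-xyz123")
```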


Get Tool Version String

Description

Internal function to get version string from a command-line tool

Usage

.get_tool_version(tool)

Arguments

tool

Character. Tool name

Value

Character. Version string or "not installed"


Load Conda Registry

Description

Internal function to load the conda registry

Usage

.load_conda_registry(registry_file)

Arguments

registry_file

Character. Path to registry file

Value

List containing registry data


Load Parameter Registry

Description

Internal function to load the parameter registry

Usage

.load_param_registry(registry_file)

Load Reference Registry

Description

Internal function to load the reference registry

Usage

.load_reference_registry(registry_file)

Arguments

registry_file

Character. Path to registry file

Value

List containing registry data


Load Provenance Registry

Description

Internal function to load the provenance registry

Usage

.load_registry(registry_file)

Arguments

registry_file

Character. Path to registry file

Value

List containing registry data


Load Seed Registry

Description

Internal function to load the seed registry

Usage

.load_seed_registry(registry_file)

Load Tools Registry

Description

Internal function to load the tools registry

Usage

.load_tools_registry(registry_file)

Arguments

registry_file

Character. Path to registry file

Value

List containing registry data


Parse Package Dependencies

Description

Internal function to parse package dependency strings

Usage

.parse_deps(dep_string)

Arguments

dep_string

Character. Dependency string from package description

Value

Character vector of package names
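
Dependency fields in a DESCRIPTION file are comma-separated and may carry version constraints in parentheses. The sketch below shows one way to reduce such a string to package names; parse_dep_string is a hypothetical name, and dropping the "R" entry is an assumption.

```r
# Hypothetical sketch: extract package names from a dependency string.
parse_dep_string <- function(dep_string) {
  if (is.null(dep_string) || is.na(dep_string) || !nzchar(dep_string)) {
    return(character(0))
  }
  parts <- strsplit(dep_string, ",", fixed = TRUE)[[1]]
  # Remove version constraints such as "(>= 4.0.0)" and surrounding whitespace
  pkgs <- trimws(sub("\\(.*\\)", "", parts))
  # Assumption: the R version requirement is not a package and is dropped
  pkgs[nzchar(pkgs) & pkgs != "R"]
}

parse_dep_string("R (>= 4.0.0), renv, jsonlite (>= 1.8.0),\n digest")
```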


Save Conda Registry

Description

Internal function to save the conda registry

Usage

.save_conda_registry(registry, registry_file)

Arguments

registry

List. Registry data to save

registry_file

Character. Path to registry file


Save Parameter Registry

Description

Internal function to save the parameter registry

Usage

.save_param_registry(registry, registry_file)

Save Reference Registry

Description

Internal function to save the reference registry

Usage

.save_reference_registry(registry, registry_file)

Arguments

registry

List. Registry data to save

registry_file

Character. Path to registry file


Save Provenance Registry

Description

Internal function to save the provenance registry

Usage

.save_registry(registry, registry_file)

Arguments

registry

List. Registry data to save

registry_file

Character. Path to registry file


Save Seed Registry

Description

Internal function to save the seed registry

Usage

.save_seed_registry(registry, registry_file)

Save Tools Registry

Description

Internal function to save the tools registry

Usage

.save_tools_registry(registry, registry_file)

Arguments

registry

List. Registry data to save

registry_file

Character. Path to registry file


Capture Environment State

Description

Captures the current global environment state including objects and their types

Usage

capture_environment(
  output_file = NULL,
  include_values = FALSE,
  max_size = 1024 * 1024
)

Arguments

output_file

Character. Path to save environment info. If NULL, returns as list.

include_values

Logical. Whether to include object values (for small objects). Default FALSE.

max_size

Numeric. Maximum object size (in bytes) to include values. Default 1MB.

Value

A list containing environment information

Examples

## Not run: 
x <- 1:10
y <- "test"
capture_environment("env_state.json")

## End(Not run)
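
To make the roles of include_values and max_size concrete, here is a rough base R sketch of the idea. It is not the package's implementation: describe_env is a hypothetical name, the recorded fields are assumptions, and a scratch environment stands in for the global environment.

```r
# Hypothetical sketch: summarise the objects in an environment.
describe_env <- function(env, include_values = FALSE, max_size = 1024 * 1024) {
  objs <- mget(ls(env), envir = env)
  lapply(objs, function(obj) {
    size <- as.numeric(utils::object.size(obj))
    info <- list(class = class(obj)[1], size_bytes = size)
    # Values are stored only on request and only for objects under the size cap
    if (include_values && size <= max_size) info$value <- obj
    info
  })
}

demo_env <- new.env()
assign("x", 1:10, envir = demo_env)
assign("y", "test", envir = demo_env)
state <- describe_env(demo_env, include_values = TRUE)
```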

Capture Hardware Information

Description

Capture hardware specifications including CPU, RAM, and GPU information. Useful for documenting computational resources used in analysis.

Usage

capture_hardware(output_file = NULL)

Arguments

output_file

Character. Path to save hardware info. If NULL, returns as list.

Value

List containing hardware information

Examples

## Not run: 
capture_hardware("hardware_info.json")

## End(Not run)

Capture Complete Session Information

Description

Captures comprehensive R session information including R version, platform, loaded packages, system information, and locale settings.

Usage

capture_session(output_file = NULL, format = c("json", "yaml", "rds"))

Arguments

output_file

Character. Path to save the session info. If NULL, returns as list.

format

Character. Output format: "json", "yaml", or "rds". Default is "json".

Value

A list containing session information, returned invisibly

Examples

## Not run: 
# Capture session info to JSON
capture_session("session_info.json")

# Capture and return as list
info <- capture_session()

## End(Not run)

Capture System Libraries

Description

Capture version information for system libraries that R packages depend on (e.g., libcurl, libxml2, BLAS/LAPACK implementations)

Usage

capture_system_libraries(output_file = NULL)

Arguments

output_file

Character. Path to save library info. If NULL, returns as list.

Value

List containing system library information

Examples

## Not run: 
capture_system_libraries("system_libs.json")

## End(Not run)

Compare Two Workflow Snapshots

Description

Compare two Capsule snapshots to identify differences in packages, parameters, data files, and other tracked components

Usage

compare_snapshots(snapshot1, snapshot2, output_file)

Arguments

snapshot1

Character. Name of first snapshot

snapshot2

Character. Name of second snapshot

output_file

Character. Path to save comparison report (required).

Value

List containing comparison results

Examples

## Not run: 
compare_snapshots("analysis_v1", "analysis_v2",
                  output_file = tempfile(fileext = ".md"))

## End(Not run)

Create renv Lockfile

Description

Generate an renv-compatible lockfile for package reproducibility

Usage

create_renv_lockfile(output_file, project_path = ".")

Arguments

output_file

Character. Path to save lockfile (required).

project_path

Character. Path to project. Default is current directory.

Value

Path to created lockfile

Examples

## Not run: 
create_renv_lockfile(output_file = tempfile(fileext = ".lock"))

## End(Not run)

Create Reproducibility Report

Description

Generate a comprehensive markdown report documenting all reproducibility information

Usage

create_repro_report(
  output_file,
  analysis_name = NULL,
  include_package_list = TRUE
)

Arguments

output_file

Character. Path to save the report (required).

analysis_name

Character. Name of the analysis

include_package_list

Logical. Include full package list. Default TRUE.

Value

Path to generated report

Examples

## Not run: 
create_repro_report(tempfile(fileext = ".md"), "main_analysis")

## End(Not run)

Generate CWL (Common Workflow Language) Input

Description

Export Capsule data in YAML format suitable for CWL workflows

Usage

export_for_cwl(output_file)

Arguments

output_file

Character. Path to save inputs (required).

Value

List containing input data

Examples

## Not run: 
export_for_cwl(tempfile(fileext = ".yml"))

## End(Not run)

Export Capsule Data for Nextflow

Description

Export all Capsule tracking data in a format suitable for Nextflow pipelines

Usage

export_for_nextflow(output_file, include_checksums = TRUE)

Arguments

output_file

Character. Path to save manifest (required).

include_checksums

Logical. Include file checksums. Default TRUE.

Value

List containing manifest data

Examples

## Not run: 
export_for_nextflow(tempfile(fileext = ".json"))

## End(Not run)

Export Capsule Data for Snakemake

Description

Export all Capsule tracking data in YAML format for Snakemake pipelines

Usage

export_for_snakemake(output_file, include_checksums = TRUE)

Arguments

output_file

Character. Path to save config (required).

include_checksums

Logical. Include file checksums. Default TRUE.

Value

List containing config data

Examples

## Not run: 
export_for_snakemake(tempfile(fileext = ".yaml"))

## End(Not run)

Create WDL (Workflow Description Language) Config

Description

Export Capsule data in JSON format suitable for WDL workflows

Usage

export_for_wdl(output_file)

Arguments

output_file

Character. Path to save config (required).

Value

List containing config data

Examples

## Not run: 
export_for_wdl(tempfile(fileext = ".json"))

## End(Not run)

Generate Docker Configuration

Description

Generate a Dockerfile and docker-compose.yml for complete environment reproducibility

Usage

generate_docker(
  output_dir,
  r_version = NULL,
  base_image = "rocker/r-ver",
  system_deps = NULL,
  project_name = "reproflow-project",
  include_rstudio = FALSE
)

Arguments

output_dir

Character. Directory to save Docker files (required).

r_version

Character. R version to use. Default is current R version.

base_image

Character. Base Docker image. Default "rocker/r-ver"

system_deps

Character vector. System dependencies to install

project_name

Character. Name for the project

include_rstudio

Logical. Include RStudio Server. Default FALSE.

Value

List of generated file paths

Examples

## Not run: 
generate_docker(
  output_dir = tempdir(),
  project_name = "my_analysis",
  system_deps = c("libcurl4-openssl-dev", "libxml2-dev")
)

## End(Not run)

Generate Reproducible Script

Description

Generate an executable R script that includes all reproducibility information including package versions, seeds, parameters, and data verification.

Usage

generate_repro_script(
  script_file,
  source_script = NULL,
  analysis_name = "analysis",
  include_renv = TRUE,
  include_data_check = TRUE,
  include_session_info = TRUE
)

Arguments

script_file

Character. Path to save the generated script

source_script

Character. Original analysis script to include

analysis_name

Character. Name for this analysis

include_renv

Logical. Include renv initialization. Default TRUE.

include_data_check

Logical. Include data verification. Default TRUE.

include_session_info

Logical. Include session info at end. Default TRUE.

Value

Path to generated script

Examples

## Not run: 
generate_repro_script(
  "analysis_reproducible.R",
  source_script = "analysis.R",
  analysis_name = "main_analysis"
)

## End(Not run)

Generate Singularity Definition File

Description

Generate a Singularity/Apptainer definition file for HPC environments. Singularity is commonly used in HPC clusters where Docker is not available.

Usage

generate_singularity(
  output_dir,
  r_version = NULL,
  base_image = "rocker/r-ver",
  conda_env = NULL,
  system_deps = NULL,
  project_name = "reproflow-project"
)

Arguments

output_dir

Character. Directory to save Singularity files (required).

r_version

Character. R version to use. Default is current R version.

base_image

Character. Base Docker image. Default "rocker/r-ver"

conda_env

Character. Path to conda environment file. Optional.

system_deps

Character vector. System dependencies to install

project_name

Character. Name for the project

Value

List of generated file paths

Examples

## Not run: 
generate_singularity(
  output_dir = tempdir(),
  project_name = "my_analysis",
  system_deps = c("samtools", "bwa")
)

## End(Not run)

Get Conda Environment Info

Description

Retrieve information about tracked conda environments

Usage

get_conda_env_info(
  env_name = NULL,
  registry_file = ".capsule/conda_registry.json"
)

Arguments

env_name

Character. Specific environment name, or NULL for all

registry_file

Character. Path to conda registry

Value

List of environment information


Get Data Lineage

Description

Retrieve complete lineage information for tracked data

Usage

get_data_lineage(
  data_path = NULL,
  registry_file = ".capsule/data_registry.json"
)

Arguments

data_path

Character. Path to data file. If NULL, returns all lineage.

registry_file

Character. Path to provenance registry.

Value

List containing lineage information

Examples

## Not run: 
# Get lineage for specific file
lineage <- get_data_lineage("data/mydata.csv")

# Get all lineage
all_lineage <- get_data_lineage()

## End(Not run)

Get Parameter History

Description

Retrieve parameter tracking history

Usage

get_param_history(
  analysis_name = NULL,
  registry_file = ".capsule/param_registry.json"
)

Arguments

analysis_name

Character. Specific analysis name, or NULL for all

registry_file

Character. Path to parameter registry

Value

List of parameter records


Get Reference Genome Information

Description

Retrieve information about tracked reference genomes

Usage

get_reference_info(
  genome_build = NULL,
  registry_file = ".capsule/reference_registry.json"
)

Arguments

genome_build

Character. Specific genome build, or NULL for all

registry_file

Character. Path to reference registry

Value

List of reference genome information

Examples

## Not run: 
# Get all tracked references
get_reference_info()

# Get specific reference
get_reference_info("GRCh38")

## End(Not run)

Get Seed History

Description

Retrieve seed tracking history

Usage

get_seed_history(
  analysis_name = NULL,
  registry_file = ".capsule/seed_registry.json"
)

Arguments

analysis_name

Character. Specific analysis name, or NULL for all

registry_file

Character. Path to seed registry

Value

List of seed records


Get External Tool Versions

Description

Retrieve version information for previously tracked external tools

Usage

get_tool_versions(
  tool_name = NULL,
  registry_file = ".capsule/tools_registry.json"
)

Arguments

tool_name

Character. Specific tool name, or NULL for all tools

registry_file

Character. Path to tools registry

Value

List of tool version information

Examples

## Not run: 
# Get all tracked tools
get_tool_versions()

# Get specific tool
get_tool_versions("samtools")

## End(Not run)

Initialize Capsule in Project

Description

Initialize Capsule reproducibility framework in the current project. Creates necessary directory structure and configuration files.

Usage

init_capsule(
  project_path = ".",
  use_renv = TRUE,
  use_git = TRUE,
  create_gitignore = TRUE
)

Arguments

project_path

Character. Path to project directory. Default is current directory.

use_renv

Logical. Initialize renv for package management. Default TRUE.

use_git

Logical. Initialize git if not already present. Default TRUE.

create_gitignore

Logical. Create/update .gitignore. Default TRUE.

Value

Invisible NULL

Examples

## Not run: 
# Initialize Capsule in current directory
init_capsule()

# Initialize without renv
init_capsule(use_renv = FALSE)

## End(Not run)

List Common Reference Genome Sources

Description

Display a helpful list of common reference genome sources

Usage

list_reference_sources()

Value

No return value, called for side effects (displays reference sources)

Examples

list_reference_sources()

List Available Snapshots

Description

List all available snapshots with basic metadata

Usage

list_snapshots()

Value

Data frame with snapshot information

Examples

## Not run: 
list_snapshots()

## End(Not run)

Restore Conda Environment

Description

Restore a conda environment from a previously exported environment file

Usage

restore_conda_env(
  env_file = "conda_environment.yml",
  env_name = NULL,
  use_mamba = FALSE,
  force = FALSE
)

Arguments

env_file

Character. Path to environment YAML file. Default "conda_environment.yml"

env_name

Character. Name for the new environment. If NULL, uses name from file.

use_mamba

Logical. Use mamba instead of conda. Default FALSE.

force

Logical. Remove existing environment if it exists. Default FALSE.

Value

Logical. TRUE if successful, FALSE otherwise

Examples

## Not run: 
# Restore environment from file
restore_conda_env("conda_environment.yml")

# Use mamba for faster installation
restore_conda_env("conda_environment.yml", use_mamba = TRUE)

# Force recreate if exists
restore_conda_env("conda_environment.yml", force = TRUE)

## End(Not run)

Restore Random Seed

Description

Restore a previously tracked random seed

Usage

restore_seed(analysis_name, registry_file = ".capsule/seed_registry.json")

Arguments

analysis_name

Character. Name of analysis to restore seed from

registry_file

Character. Path to seed registry

Value

The seed value (invisibly)

Examples

## Not run: 
# Restore previously tracked seed
restore_seed("simulation_1")

## End(Not run)

Set and Track Random Seed

Description

Set a random seed and track it for reproducibility. Note: This function is explicitly designed to set random seeds as requested by the user.

Usage

set_seed(
  seed = NULL,
  kind = NULL,
  normal.kind = NULL,
  sample.kind = NULL,
  analysis_name = NULL,
  registry_file,
  set_seed = TRUE
)

Arguments

seed

Numeric. Random seed to set. If NULL, generates random seed.

kind

Character. RNG kind (see ?set.seed). Default NULL uses current.

normal.kind

Character. Normal RNG kind. Default NULL uses current.

sample.kind

Character. Sample RNG kind. Default NULL uses current.

analysis_name

Character. Name to associate with this seed

registry_file

Character. Path to seed registry (required).

set_seed

Logical. If TRUE, actually sets the seed. If FALSE, only tracks it. Default TRUE.

Value

The seed value (invisibly)

Examples

## Not run: 
# Set and track a specific seed
set_seed(12345, analysis_name = "simulation_1",
         registry_file = tempfile(fileext = ".json"))

# Generate and track a random seed
set_seed(analysis_name = "bootstrap_analysis",
         registry_file = tempfile(fileext = ".json"))

## End(Not run)
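
The reason for tracking a seed is that resetting it reproduces the same random draws. The base R sketch below illustrates this with an in-memory list instead of the JSON registry file used by the package; track_seed and the registry layout are hypothetical.

```r
# Hypothetical sketch: record a seed under an analysis name, then restore it.
track_seed <- function(registry, analysis_name, seed = NULL) {
  # If no seed is supplied, draw one at random
  if (is.null(seed)) seed <- sample.int(.Machine$integer.max, 1)
  set.seed(seed)
  registry[[analysis_name]] <- list(seed = seed, timestamp = format(Sys.time()))
  registry
}

registry <- track_seed(list(), "simulation_1", seed = 12345)
first_run <- runif(3)

# Later: restore the recorded seed and repeat the draws
set.seed(registry[["simulation_1"]]$seed)
second_run <- runif(3)
identical(first_run, second_run)
```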

Track Package Versions and Dependencies

Description

Creates a comprehensive snapshot of all installed packages, their versions, dependencies, and sources for reproducibility.

Usage

snapshot_packages(
  output_file = NULL,
  include_dependencies = TRUE,
  only_attached = FALSE
)

Arguments

output_file

Character. Path to save package info. If NULL, returns as list.

include_dependencies

Logical. Include dependency tree. Default TRUE.

only_attached

Logical. Only track attached packages. Default FALSE.

Value

A list containing package information

Examples

## Not run: 
# Track all installed packages
snapshot_packages("package_manifest.json")

# Track only attached packages
snapshot_packages("packages.json", only_attached = TRUE)

## End(Not run)

Create Complete Workflow Snapshot

Description

Create a comprehensive snapshot of the entire workflow including session info, packages, data, parameters, and generate all reproducibility artifacts.

Usage

snapshot_workflow(
  snapshot_name = NULL,
  analysis_name = "analysis",
  source_script = NULL,
  description = NULL,
  generate_docker = TRUE,
  generate_script = TRUE,
  generate_report = TRUE
)

Arguments

snapshot_name

Character. Name for this snapshot. Default is timestamp.

analysis_name

Character. Name of the analysis

source_script

Character. Path to main analysis script

description

Character. Description of this workflow

generate_docker

Logical. Generate Docker configuration. Default TRUE.

generate_script

Logical. Generate reproducible script. Default TRUE.

generate_report

Logical. Generate reproducibility report. Default TRUE.

Value

List containing paths to generated files

Examples

## Not run: 
# Create complete workflow snapshot
snapshot_workflow(
  snapshot_name = "analysis_v1",
  analysis_name = "main_analysis",
  source_script = "analysis.R",
  description = "Initial analysis run"
)

## End(Not run)

Track Conda Environment

Description

Export and track a conda environment specification for reproducibility. Works with both conda and mamba.

Usage

track_conda_env(env_name = NULL, output_file, use_mamba = FALSE, registry_file)

Arguments

env_name

Character. Name of conda environment. If NULL, uses active environment.

output_file

Character. Path to save environment file (required).

use_mamba

Logical. Use mamba instead of conda. Default FALSE.

registry_file

Character. Path to conda registry (required).

Value

List containing environment information

Examples

## Not run: 
# Track currently active conda environment
track_conda_env(output_file = tempfile(fileext = ".yml"),
                registry_file = tempfile(fileext = ".json"))

# Track specific environment
track_conda_env(env_name = "bioinfo_env",
                output_file = tempfile(fileext = ".yml"),
                registry_file = tempfile(fileext = ".json"))

# Use mamba instead
track_conda_env(use_mamba = TRUE,
                output_file = tempfile(fileext = ".yml"),
                registry_file = tempfile(fileext = ".json"))

## End(Not run)

Track Data Provenance

Description

Records comprehensive provenance information for data files including checksums, sources, timestamps, and metadata. Supports fast hashing for large files.

Usage

track_data(
  data_path,
  source = c("downloaded", "generated", "manual", "reference", "other"),
  source_url = NULL,
  description = NULL,
  metadata = NULL,
  fast_hash = TRUE,
  size_threshold_gb = 1,
  registry_file
)

Arguments

data_path

Character. Path to data file or directory.

source

Character. Source of the data (e.g., "downloaded", "generated", "manual", "reference").

source_url

Character. URL if data was downloaded. Optional.

description

Character. Description of the data. Optional.

metadata

List. Additional metadata. Optional.

fast_hash

Logical. Use the faster xxHash for files larger than size_threshold_gb (1 GB by default). Default TRUE.

size_threshold_gb

Numeric. Size threshold (GB) for using fast hash. Default 1.

registry_file

Character. Path to provenance registry (required).

Value

A list containing data provenance information

Examples

## Not run: 
# Track a downloaded dataset
track_data("data/mydata.csv",
  source = "downloaded",
  source_url = "https://example.com/data.csv",
  description = "Customer data from API",
  registry_file = tempfile(fileext = ".json")
)

# Track generated data
track_data("results/simulation.rds",
  source = "generated",
  description = "Monte Carlo simulation results",
  registry_file = tempfile(fileext = ".json")
)

# Track large file with fast hashing
track_data("data/large_file.bam",
  source = "generated",
  fast_hash = TRUE,
  registry_file = tempfile(fileext = ".json")
)

## End(Not run)

Track External Bioinformatics Tools

Description

Track versions of external command-line tools commonly used in bioinformatics pipelines (e.g., samtools, STAR, BWA)

Usage

track_external_tools(tools = NULL, registry_file)

Arguments

tools

Character vector of tool names to track. If NULL, tracks common tools.

registry_file

Character. Path to tools registry (required).

Value

List containing tool version information

Examples

## Not run: 
# Track common bioinformatics tools
track_external_tools(registry_file = tempfile(fileext = ".json"))

# Track specific tools
track_external_tools(c("samtools", "bwa", "STAR"),
                     registry_file = tempfile(fileext = ".json"))

## End(Not run)

Track Analysis Parameters

Description

Record analysis parameters and configuration settings for reproducibility

Usage

track_params(params, analysis_name = NULL, description = NULL, registry_file)

Arguments

params

Named list of parameters to track

analysis_name

Character. Name/identifier for this analysis

description

Character. Description of what these parameters control

registry_file

Character. Path to parameter registry (required).

Value

List containing parameter information

Examples

## Not run: 
# Track model parameters
params <- list(
  learning_rate = 0.01,
  epochs = 100,
  batch_size = 32,
  model_type = "neural_network"
)
track_params(params, "model_training", "Deep learning model parameters",
             registry_file = tempfile(fileext = ".json"))

## End(Not run)

Track Reference Genome

Description

Track reference genome files, annotations, and indices for reproducibility. This is critical for genomics/transcriptomics pipelines where the exact reference version affects results.

Usage

track_reference_genome(
  fasta_path,
  gtf_path = NULL,
  gff_path = NULL,
  genome_build = NULL,
  species = NULL,
  source_url = NULL,
  indices = list(),
  metadata = list(),
  registry_file,
  data_registry_file
)

Arguments

fasta_path

Character. Path to reference genome FASTA file

gtf_path

Character. Path to GTF annotation file. Optional.

gff_path

Character. Path to GFF annotation file. Optional.

genome_build

Character. Genome build identifier (e.g., "GRCh38", "mm10")

species

Character. Species name (e.g., "Homo sapiens", "Mus musculus")

source_url

Character. URL where reference was downloaded from

indices

Named list. Paths to aligner indices (STAR, BWA, etc.)

metadata

List. Additional metadata about the reference

registry_file

Character. Path to reference registry (required).

data_registry_file

Character. Path to data registry for tracking files (required).

Value

List containing reference genome information

Examples

## Not run: 
track_reference_genome(
  fasta_path = "ref/GRCh38.fa",
  gtf_path = "ref/gencode.v38.annotation.gtf",
  genome_build = "GRCh38",
  species = "Homo sapiens",
  source_url = "https://www.gencodegenes.org/",
  indices = list(
    star = "ref/STAR_index/",
    bwa = "ref/bwa_index/GRCh38"
  ),
  registry_file = tempfile(fileext = ".json"),
  data_registry_file = tempfile(fileext = ".json")
)

## End(Not run)

Verify Data Integrity

Description

Verify that tracked data files have not been modified by comparing checksums

Usage

verify_data(data_path = NULL, registry_file = ".capsule/data_registry.json")

Arguments

data_path

Character. Path to specific file, or NULL to verify all tracked files.

registry_file

Character. Path to provenance registry.

Value

Logical. TRUE if data is unchanged, FALSE otherwise

Examples

## Not run: 
# Verify specific file
verify_data("data/mydata.csv")

# Verify all tracked files
verify_data()

## End(Not run)
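
Checksum-based verification can be illustrated with base R alone. The sketch below uses tools::md5sum() only to avoid extra dependencies; the package documentation mentions SHA-256 and xxHash instead, and the helper names record_checksum and verify_checksum are hypothetical.

```r
# Hypothetical sketch: detect file modification by comparing checksums.
record_checksum <- function(path) unname(tools::md5sum(path))
verify_checksum <- function(path, recorded) identical(record_checksum(path), recorded)

data_file <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10"), data_file)
recorded <- record_checksum(data_file)

# Untouched file: the checksum still matches
unchanged <- verify_checksum(data_file, recorded)

# Appending a row changes the content, so verification fails
cat("2,20\n", file = data_file, append = TRUE)
still_matches <- verify_checksum(data_file, recorded)
```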
