
biostats


Overview

biostats is an R package that functions as a toolbox to streamline biostatistics and clinical data analysis workflows.

Key features

Designed primarily for comparative clinical studies, trial planning, and analysis, the package serves two purposes: it is an analytical toolkit for professional biostatisticians and clinical data analysts, and an educational resource for researchers transitioning to R-based biostatistics, including professionals from other domains, clinical research staff, and medical practitioners involved in developing clinical trials.

Developed by the biostatistics team at Laboratorios Sophia S.A. de C.V.

Installation

# Install from GitHub (remotes is only needed for this step)
install.packages("remotes")
remotes::install_github("sebasquirarte/biostats")
library(biostats)

Usage

This package comprises 14 functions across four analytical domains:

Descriptive Statistics and Exploratory Data Analysis (EDA)

clinical_data()

Description

Creates a simple simulated clinical trial dataset with subject demographics, multiple visits, treatment groups with different effects, numerical and categorical variables, as well as optional missing data and dropout rates.

Parameters
Parameter Description Default
n Integer indicating the number (1-999) of subjects. 100
visits Integer indicating the number of visits including baseline. 3
arms Character vector of treatment arm names. c("Placebo", "Treatment")
dropout Numeric parameter indicating the proportion (0-1) of subjects who drop out. 0
missing Numeric parameter indicating the proportion (0-1) of missing values to introduce, distributed across the biomarker, weight, and response variables in fixed proportions (biomarker = 15%, weight = 25%, response = 60%). 0
Examples
# Simulate basic clinical data
clinical_df <- clinical_data()

str(clinical_df)
#> 'data.frame':    300 obs. of  8 variables:
#>  $ participant_id: chr  "001" "001" "001" "002" ...
#>  $ visit         : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3 1 ...
#>  $ sex           : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ treatment     : Factor w/ 2 levels "Placebo","Treatment": 2 2 2 1 1 1 1 1 1 1 ...
#>  $ age           : num  35 35 35 21 21 21 47 47 47 35 ...
#>  $ weight        : num  55.4 60.3 58.1 68.3 66.3 64 76 77.6 74.9 61.7 ...
#>  $ biomarker     : num  42.2 44.7 44.9 56.5 51 ...
#>  $ response      : Factor w/ 3 levels "Complete","Partial",..: 1 3 2 3 3 3 3 2 3 3 ...

head(clinical_df, 10)
#>    participant_id visit  sex treatment age weight biomarker response
#> 1             001     1 Male Treatment  35   55.4     42.22 Complete
#> 2             001     2 Male Treatment  35   60.3     44.70     None
#> 3             001     3 Male Treatment  35   58.1     44.85  Partial
#> 4             002     1 Male   Placebo  21   68.3     56.51     None
#> 5             002     2 Male   Placebo  21   66.3     51.03     None
#> 6             002     3 Male   Placebo  21   64.0     39.59     None
#> 7             003     1 Male   Placebo  47   76.0     24.92     None
#> 8             003     2 Male   Placebo  47   77.6     49.99  Partial
#> 9             003     3 Male   Placebo  47   74.9     60.69     None
#> 10            004     1 Male   Placebo  35   61.7     50.58     None
# Simulate more complex clinical data
clinical_df_full <- clinical_data(n = 300,
                                  visits = 10,
                                  arms = c('A', 'B', 'C'), 
                                  dropout = 0.10,
                                  missing = 0.05)

str(clinical_df_full)
#> 'data.frame':    3000 obs. of  8 variables:
#>  $ participant_id: chr  "001" "001" "001" "001" ...
#>  $ visit         : Factor w/ 10 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
#>  $ sex           : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ treatment     : Factor w/ 3 levels "A","B","C": 3 3 3 3 3 3 3 3 3 3 ...
#>  $ age           : num  25 25 25 25 25 25 25 25 25 25 ...
#>  $ weight        : num  64.7 65.1 64.2 62.3 62.1 NA NA 61.8 63.7 64.1 ...
#>  $ biomarker     : num  48.2 22.2 51.2 43.4 44.5 ...
#>  $ response      : Factor w/ 3 levels "Complete","Partial",..: 1 1 3 1 3 1 3 3 NA 1 ...

head(clinical_df_full, 20)
#>    participant_id visit    sex treatment age weight biomarker response
#> 1             001     1   Male         C  25   64.7     48.24 Complete
#> 2             001     2   Male         C  25   65.1     22.17 Complete
#> 3             001     3   Male         C  25   64.2     51.21     None
#> 4             001     4   Male         C  25   62.3     43.38 Complete
#> 5             001     5   Male         C  25   62.1     44.52     None
#> 6             001     6   Male         C  25     NA     24.25 Complete
#> 7             001     7   Male         C  25     NA     49.55     None
#> 8             001     8   Male         C  25   61.8     47.78     None
#> 9             001     9   Male         C  25   63.7     23.65     <NA>
#> 10            001    10   Male         C  25   64.1     45.97 Complete
#> 11            002     1 Female         B  72   71.1     34.18     None
#> 12            002     2 Female         B  72   70.7     65.47     None
#> 13            002     3 Female         B  72   71.2     34.29     None
#> 14            002     4 Female         B  72     NA        NA     <NA>
#> 15            002     5 Female         B  72     NA        NA     <NA>
#> 16            002     6 Female         B  72     NA        NA     <NA>
#> 17            002     7 Female         B  72     NA        NA     <NA>
#> 18            002     8 Female         B  72     NA        NA     <NA>
#> 19            002     9 Female         B  72     NA        NA     <NA>
#> 20            002    10 Female         B  72     NA        NA     <NA>
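The dataset above is in long format: one row per participant per visit, which is also the layout that omnibus() expects for repeated measures. As a rough illustration of how such data can be simulated in base R, here is a minimal sketch. The helper simulate_trial(), the 5-unit biomarker shift per arm, and the dropout mechanism (all values after a randomly chosen visit set to NA) are hypothetical assumptions, not the biostats implementation.

```r
# Hypothetical helper (not part of biostats): simulate a long-format trial dataset.
# Assumptions: each successive arm lowers the biomarker mean by 5 units, and a
# dropped-out subject has NA biomarker values after a randomly chosen visit.
simulate_trial <- function(n = 10, visits = 3, arms = c("Placebo", "Treatment"),
                           dropout = 0, seed = 1) {
  set.seed(seed)
  ids <- sprintf("%03d", seq_len(n))
  arm <- factor(sample(arms, n, replace = TRUE), levels = arms)

  # One row per subject per visit, subject-major order
  df <- data.frame(
    participant_id = rep(ids, each = visits),
    visit = factor(rep(seq_len(visits), times = n)),
    treatment = rep(arm, each = visits)
  )
  shift <- 5 * (as.integer(df$treatment) - 1)
  df$biomarker <- round(rnorm(nrow(df), mean = 50 - shift, sd = 10), 2)

  # Dropout: a fixed proportion of subjects lose all values after a random visit
  n_drop <- round(dropout * n)
  if (n_drop > 0 && visits > 1) {
    for (id in sample(ids, n_drop)) {
      last <- sample(seq_len(visits - 1), 1)
      df$biomarker[df$participant_id == id & as.integer(df$visit) > last] <- NA
    }
  }
  df
}

trial <- simulate_trial(n = 20, visits = 4, dropout = 0.25)
head(trial)
```

Because dropout is applied per subject rather than per cell, missingness is monotone: once a subject drops out, every later visit is NA, which mirrors the pattern shown for participant 002 above.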

summary_table()

Description

Generates a summary table for biostatistics and clinical data analysis with automatic normality, effect size, and statistical test calculations. Handles both numeric and categorical variables, performing appropriate descriptive statistics and inferential tests for single-group summaries or two-group comparisons.

Parameters
Parameter Description Default
data Dataframe containing the variables to be summarized. Required
group_by Character string indicating the name of the grouping variable for two-group comparisons. NULL
normality_test Character string indicating the normality test to use: 'S-W' for Shapiro-Wilk or 'K-S' for Kolmogorov-Smirnov. 'S-W'
all Logical parameter that shows all calculated statistics. FALSE
effect_size Logical parameter that includes effect size estimates. FALSE
exclude Character vector of variable names to exclude from the summary. NULL
Examples
# Overall summary without considering treatment groups
summary_table(clinical_df, exclude = c('participant_id', 'visit'))

# Grouped summary by treatment group
summary_table(clinical_df, group_by = 'treatment', exclude = c('participant_id', 'visit'))

# Grouped summary by treatment group with all stats and effect size
summary_table(clinical_df,
              group_by = 'treatment',
              all = TRUE,
              effect_size = TRUE,
              exclude = c('participant_id', 'visit'))
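For intuition about what a single two-group comparison row involves, the following minimal base R sketch tests each group for normality with Shapiro-Wilk and then applies either a Welch t-test (reporting mean and SD) or a Wilcoxon rank-sum test (reporting median and IQR). The helper compare_two_groups() and this decision rule are assumptions for illustration; summary_table() may choose its tests, summaries, and effect sizes differently.

```r
# Hypothetical helper (not part of biostats): one row of a two-group comparison.
compare_two_groups <- function(data, var, group, alpha = 0.05) {
  x <- data[[var]]
  g <- factor(data[[group]])
  stopifnot(nlevels(g) == 2)
  by_group <- split(x, g)

  # Shapiro-Wilk per group (shapiro.test drops NAs; it needs 3 to 5000 values)
  normal <- all(vapply(by_group, function(v) shapiro.test(v)$p.value > alpha,
                       logical(1)))
  if (normal) {
    descr <- vapply(by_group, function(v)
      sprintf("%.2f (%.2f)", mean(v, na.rm = TRUE), sd(v, na.rm = TRUE)), character(1))
    list(test = "Welch t-test", descr = descr, p.value = t.test(x ~ g)$p.value)
  } else {
    descr <- vapply(by_group, function(v)
      sprintf("%.2f (%.2f)", median(v, na.rm = TRUE), IQR(v, na.rm = TRUE)), character(1))
    list(test = "Wilcoxon rank-sum", descr = descr, p.value = wilcox.test(x ~ g)$p.value)
  }
}

# Base R 'mtcars' data: fuel economy by transmission type (am: 0 = automatic, 1 = manual)
res <- compare_two_groups(mtcars, "mpg", "am")
res
```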

normality()

Description

Tests normality using a method suited to the sample size: the Shapiro-Wilk test for n ≤ 50 or the Kolmogorov-Smirnov test for n > 50, accompanied by Q-Q plots and histograms. Evaluates skewness and kurtosis using z-score criteria that depend on sample size. Automatically flags potential outliers (values outside the 95% CI) and provides a combined visual and statistical assessment.

Parameters
Parameter Description Default
data Dataframe containing the variable to be analyzed. Required
x Character string indicating the variable to be analyzed. Required
all Logical parameter that displays the row indices of all values outside the 95% CI. FALSE
color Character string indicating color for plots. "#79E1BE"
Examples
# Filter clinical data to Placebo arm
clinical_df_treat <- clinical_df[clinical_df$treatment == "Placebo", ]

# Normally distributed variable
normality(data = clinical_df_treat, "biomarker")
#> 
#> Normality Test for 'biomarker' 
#> 
#> n = 159 
#> mean (SD) = 49.44 (9.2) 
#> median (IQR) = 50.38 (13.1) 
#> 
#> Kolmogorov-Smirnov: D = 0.054, p = 0.740 
#> Shapiro-Wilk: W = 0.992, p = 0.546 
#> Skewness: 0.06 (z = 0.30) 
#> Kurtosis: -0.03 (z = -0.08) 
#> 
#> Data appears normally distributed.
#> 


# Non-normally distributed variable with points outside 95% CI displayed
normality(data = clinical_df_treat, "weight", all = TRUE)
#> 
#> Normality Test for 'weight' 
#> 
#> n = 159 
#> mean (SD) = 72.56 (12.9) 
#> median (IQR) = 69.20 (21.1) 
#> 
#> Kolmogorov-Smirnov: D = 0.125, p = 0.014 
#> Shapiro-Wilk: W = 0.951, p = < 0.001 
#> Skewness: 0.28 (z = 1.45) 
#> Kurtosis: -1.09 (z = -2.85) 
#> 
#> Data appears not normally distributed.
#>  
#> VALUES OUTSIDE 95% CI (row indices): 40, 41, 47, 22, 3, 16, 71, 105, 125, 72, 90, 89, 129, 34, 93, 103, 69, 65, 59, 2, 66, 109, 114, 107, 110, 95, 111, 58, 70, 1, 106, 113, 152, 32, 112, 115, 57, 20, 84, 29, 142, 21, 55, 102, 143, 56, 86, 144, 83
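To see how skewness and kurtosis z-scores are formed, here is a minimal sketch using the moment estimators g1 and g2 divided by their large-sample standard errors sqrt(6/n) and sqrt(24/n). These estimators and standard errors are assumptions: normality() may use bias-adjusted estimators or exact small-sample standard errors, so the values need not match the output above exactly. The sample-size-dependent cutoff that the package applies to |z| is not reproduced here.

```r
# Hypothetical helper (not part of biostats): moment-based skewness (g1) and
# excess kurtosis (g2) with z-scores from large-sample standard errors.
skew_kurt_z <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)
  skew <- mean(d^3) / m2^1.5
  kurt <- mean(d^4) / m2^2 - 3
  c(skewness = skew, z_skewness = skew / sqrt(6 / n),
    kurtosis = kurt, z_kurtosis = kurt / sqrt(24 / n))
}

skew_kurt_z(mtcars$mpg)
```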

missing_values()

Description

Provides descriptive statistics and visualizations of missing values in a dataframe.

Parameters
Parameter Description Default
data Dataframe containing the variables to be analyzed. Required
color Character string indicating the color for missing values. "#79E1BE"
all Logical parameter that shows all variables including those without missing values. FALSE
Examples
# Missing value analysis of only variables with missing values
missing_values(clinical_df_full)
#> 
#> Missing Value Analysis
#> 
#> Complete rows: 2452 (81.7%)
#> Missing cells: 868 (3.6%)
#> 
#>           n_missing pct_missing
#> response        403       13.43
#> weight          251        8.37
#> biomarker       214        7.13


# Show all variables including those without missing values
missing_values(clinical_df_full, all = TRUE)
#> 
#> Missing Value Analysis
#> 
#> Complete rows: 2452 (81.7%)
#> Missing cells: 868 (3.6%)
#> 
#>                n_missing pct_missing
#> response             403       13.43
#> weight               251        8.37
#> biomarker            214        7.13
#> participant_id         0        0.00
#> visit                  0        0.00
#> sex                    0        0.00
#> treatment              0        0.00
#> age                    0        0.00
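The figures above are easy to cross-check: 403 + 251 + 214 = 868 missing cells, which is 3.6% of the 3000 × 8 = 24000 cells, and 403 / 3000 = 13.43% for response. A minimal base R sketch of the same tabulation follows; missing_summary() is a hypothetical helper, not the package's code.

```r
# Hypothetical helper (not part of biostats): tabulate missing values per column.
missing_summary <- function(data, all = FALSE) {
  n_missing <- colSums(is.na(data))
  tab <- data.frame(n_missing = n_missing,
                    pct_missing = round(100 * n_missing / nrow(data), 2))
  tab <- tab[order(-tab$n_missing), , drop = FALSE]   # most-missing first
  if (!all) tab <- tab[tab$n_missing > 0, , drop = FALSE]
  list(complete_rows = sum(complete.cases(data)),
       missing_cells = sum(n_missing),
       pct_cells = round(100 * sum(n_missing) / (nrow(data) * ncol(data)), 1),
       table = tab)
}

# Base R 'airquality' data has missing Ozone and Solar.R readings
missing_summary(airquality)
```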

outliers()

Description

Identifies outliers using Tukey’s interquartile range (IQR) method and provides descriptive statistics and visualizations for outlier assessment in numeric data.

Parameters
Parameter Description Default
data Dataframe containing the variables to be analyzed. Required
x Character string indicating the variable to be analyzed. Required
threshold Numeric value multiplying the IQR to define outlier boundaries. 1.5
color Character string indicating the color for non-outlier data points. "#79E1BE"
Examples
# Basic outlier detection
outliers(clinical_df_full, "biomarker")
#> 
#> Outlier Analysis
#> 
#> Variable: 'biomarker'
#> n: 2786
#> Missing: 214 (7.1%)
#> Method: Tukey's IQR x 1.5
#> Bounds: [18.971, 74.761]
#> Outliers detected: 19 (0.7%)
#> 
#> Outlier indices: 27, 223, 440, 559, 795, 931, 973, 1175, 1277, 1346, 1381, 1680, 1706, 2288, 2370, 2571, 2584, 2602, 2764


# Using custom threshold
outliers(clinical_df_full, "biomarker", threshold = 1.0)
#> 
#> Outlier Analysis
#> 
#> Variable: 'biomarker'
#> n: 2786
#> Missing: 214 (7.1%)
#> Method: Tukey's IQR x 1.0
#> Bounds: [25.945, 67.788]
#> Outliers detected: 115 (4.1%)
#> 
#> Outlier indices: 2, 6, 9, 24, 27, 38, 42, 47, 56, 130 (...)
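Tukey's fences are Q1 - threshold × IQR and Q3 + threshold × IQR. Both calls above are consistent with the same quartiles: (74.761 - 18.971) / 4 ≈ 13.95 for the IQR gives Q1 ≈ 39.89 and Q3 ≈ 53.84, and with a threshold of 1.0 these give approximately [25.94, 67.79]. A minimal sketch follows, assuming R's default quantile definition (type 7); outliers() may use a different quantile type, and tukey_outliers() is a hypothetical helper.

```r
# Hypothetical helper (not part of biostats): Tukey's IQR fences.
tukey_outliers <- function(x, threshold = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)  # type 7 by default
  iqr <- q[2] - q[1]
  bounds <- c(lower = q[1] - threshold * iqr, upper = q[2] + threshold * iqr)
  # which() drops NA comparisons, so indices refer to positions in the original x
  idx <- which(x < bounds["lower"] | x > bounds["upper"])
  list(bounds = bounds, indices = idx)
}

tukey_outliers(c(1:10, 100))
tukey_outliers(c(1:10, 100), threshold = 1.0)
```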

Sample Size and Power Calculation

sample_size()

Description

Calculates the sample size needed in a clinical trial based on study design and statistical parameters using standard formulas for hypothesis testing (Chow, S. 2008).

Parameters
Parameter Description Default
sample Character string indicating whether the calculation is for a one-sample or a two-sample test. Options: "one-sample" or "two-sample". Required
design Character string indicating study design when sample = "two-sample". Options: "parallel" or "crossover". NULL (for one-sample tests)
outcome Character string indicating the type of outcome variable. Options: "mean" or "proportion". Required
type Character string indicating the type of hypothesis test. Options: "equality", "equivalence", "non-inferiority", or "superiority". Required
alpha Numeric parameter indicating the Type I error rate (significance level). 0.05
beta Numeric parameter indicating the Type II error rate (1 - power). 0.20
x1 Numeric value of the mean or proportion for group 1 (treatment group). Required
x2 Numeric value of the mean or proportion for group 2 (control group or reference value). Required
SD Numeric value indicating the standard deviation. Required for mean outcomes and crossover designs with proportion outcomes. NULL
delta Numeric value indicating the margin of clinical interest. Required for non-equality tests. Must be negative for non-inferiority and positive for superiority/equivalence. NULL
dropout Numeric value indicating the discontinuation rate expected in the study. Must be between 0 and 1. 0
k Numeric value indicating the allocation ratio (n1/n2) for two-sample tests. 1
Examples
# Two-sample parallel non-inferiority test for means with 10% expected dropout
sample_size(sample = 'two-sample', design = 'parallel', outcome = 'mean',
            type = 'non-inferiority', x1 = 5.0, x2 = 5.0, 
            SD = 0.1, delta = -0.05, k = 1, dropout = 0.1)
#> 
#> Sample Size Calculation
#> 
#> Test type: non-inferiority
#> Design: parallel, two-sample
#> Outcome: mean
#> Alpha (α): 0.050
#> Beta (β): 0.200
#> Power: 80.0%
#> 
#> Parameters:
#> x1 (treatment): 5.000
#> x2 (control/reference): 5.000
#> Difference (x1 - x2): 0.000
#> Standard Deviation (σ): 0.100
#> Allocation Ratio (k): 1.00
#> Delta (δ): -0.050
#> Dropout rate: 10.0%
#> 
#> Required Sample Size
#> n1 = 55
#> n2 = 55
#> Total = 110
#> 
#> Note: Sample size increased by 10.0% to account for potential dropouts.
# One-sample equivalence test for means
sample_size(sample = "one-sample", outcome = "mean", type = "equivalence",
            x1 = 0, x2 = 0, SD = 0.1, delta = 0.05)
#> 
#> Sample Size Calculation
#> 
#> Test type: equivalence
#> Design: one-sample
#> Outcome: mean
#> Alpha (α): 0.050
#> Beta (β): 0.200
#> Power: 80.0%
#> 
#> Parameters:
#> x1 (treatment): 0.000
#> x2 (control/reference): 0.000
#> Difference (x1 - x2): 0.000
#> Standard Deviation (σ): 0.100
#> Delta (δ): 0.050
#> 
#> Required Sample Size
#> n = 35
#> Total = 35
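As a worked check of the first example, assume the usual normal-approximation formula for a two-sample parallel non-inferiority comparison of means with a one-sided alpha, as found in standard references such as the Chow (2008) text cited above: n2 = (z(1 - alpha) + z(1 - beta))^2 × SD^2 × (1 + 1/k) / (x1 - x2 - delta)^2 and n1 = k × n2. With the example's values this gives about 49.5, rounded up to 50 per group; 50 increased by 10% is 55, which agrees with the output above. The helper n_ni_means() is hypothetical and is not how sample_size() is implemented internally.

```r
# Hypothetical helper (not part of biostats): per-group sample size for a
# two-sample parallel non-inferiority test on means (normal approximation).
n_ni_means <- function(x1, x2, sd, delta, alpha = 0.05, beta = 0.20, k = 1) {
  z <- qnorm(1 - alpha) + qnorm(1 - beta)   # one-sided alpha for non-inferiority
  epsilon <- x1 - x2                        # assumed true difference
  n2 <- z^2 * sd^2 * (1 + 1 / k) / (epsilon - delta)^2
  c(n1 = ceiling(k * n2), n2 = ceiling(n2))
}

n_ni_means(x1 = 5.0, x2 = 5.0, sd = 0.1, delta = -0.05)  # before dropout adjustment
```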

sample_size_range()

Description

Calculates required sample sizes for specified power levels (70%, 80%, 90%) across a range of treatment group values (x1), while keeping the control group value (x2) fixed. Internally calls sample_size() and generates a plot to visualize how the total sample size changes as x1 varies.

Parameters
Parameter Description Default
x1_range Numeric vector of length 2 specifying the range of values to evaluate for the treatment group mean or proportion (x1). Required
x2 Numeric value for the control group mean or proportion (reference value). Required
step Numeric value indicating the step size to increment across the x1_range. 0.1
... Additional arguments passed to sample_size(), such as sample, design, outcome, type, SD, alpha, etc. Required
Examples
# Two-sample parallel non-inferiority test for proportions with 10% dropout
result <- sample_size_range(x1_range = c(0.65, 0.75), x2 = 0.65, step = 0.01,
                            sample = "two-sample", design = "parallel", outcome = "proportion",
                            type = "non-inferiority", delta = -0.1, dropout = 0.1)

print(result)
#> 
#> Sample Size Range Analysis
#> 
#> Treatment range (x1): 0.650 to 0.660
#> Control/Reference (x2): 0.650
#> Step size: 0.010
#> 
#> 70% Power: Total n = 108 to 474
#> 80% Power: Total n = 144 to 622
#> 90% Power: Total n = 196 to 858
#> 
#> Sample size increased by 10.0% to account for potential dropouts.

result$data
power   x1   x2 diff  n1  n2 total
   70 0.65 0.65 0.00 237 237   474
   70 0.66 0.65 0.01 194 194   388
   70 0.67 0.65 0.02 162 162   324
   70 0.68 0.65 0.03 137 137   274
   70 0.69 0.65 0.04 117 117   234
   70 0.70 0.65 0.05 102 102   204
   70 0.71 0.65 0.06  88  88   176
   70 0.72 0.65 0.07  77  77   154
   70 0.73 0.65 0.08  69  69   138
   70 0.74 0.65 0.09  61  61   122
   70 0.75 0.65 0.10  54  54   108
   80 0.65 0.65 0.00 311 311   622
   80 0.66 0.65 0.01 255 255   510
   80 0.67 0.65 0.02 213 213   426
   80 0.68 0.65 0.03 180 180   360
   80 0.69 0.65 0.04 154 154   308
   80 0.70 0.65 0.05 134 134   268
   80 0.71 0.65 0.06 116 116   232
   80 0.72 0.65 0.07 102 102   204
   80 0.73 0.65 0.08  91  91   182
   80 0.74 0.65 0.09  80  80   160
   80 0.75 0.65 0.10  72  72   144
   90 0.65 0.65 0.00 429 429   858
   90 0.66 0.65 0.01 352 352   704
   90 0.67 0.65 0.02 294 294   588
   90 0.68 0.65 0.03 249 249   498
   90 0.69 0.65 0.04 213 213   426
   90 0.70 0.65 0.05 184 184   368
   90 0.71 0.65 0.06 160 160   320
   90 0.72 0.65 0.07 141 141   282
   90 0.73 0.65 0.08 125 125   250
   90 0.74 0.65 0.09 110 110   220
   90 0.75 0.65 0.10  98  98   196
# One-sample equivalence test for means
result <- sample_size_range(x1_range = c(-0.01, 0.01), x2 = 0, step = 0.005,
                            sample = "one-sample", outcome = "mean", type = "equivalence",
                            SD = 0.1, delta = 0.05, alpha = 0.05)

print(result)
#> 
#> Sample Size Range Analysis
#> 
#> Treatment range (x1): -0.010 to -0.005
#> Control/Reference (x2): 0.000
#> Step size: 0.005
#> 
#> 70% Power: Total n = 29 to 45
#> 80% Power: Total n = 35 to 54
#> 90% Power: Total n = 44 to 68

result$data
power     x1 x2   diff n1 n2 total
   70 -0.010  0 -0.010 45 45    45
   70 -0.005  0 -0.005 36 36    36
   70  0.000  0  0.000 29 29    29
   70  0.005  0  0.005 36 36    36
   70  0.010  0  0.010 45 45    45
   80 -0.010  0 -0.010 54 54    54
   80 -0.005  0 -0.005 43 43    43
   80  0.000  0  0.000 35 35    35
   80  0.005  0  0.005 43 43    43
   80  0.010  0  0.010 54 54    54
   90 -0.010  0 -0.010 68 68    68
   90 -0.005  0 -0.005 54 54    54
   90  0.000  0  0.000 44 44    44
   90  0.005  0  0.005 54 54    54
   90  0.010  0  0.010 68 68    68
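The one-sample equivalence table above can be reproduced by evaluating a single formula over the x1 grid and the three power levels, assuming the common normal-approximation formula n = (z(1 - alpha) + z(1 - beta/2))^2 × SD^2 / (delta - |x1 - x2|)^2. This is a minimal sketch, not the code behind sample_size_range(); the helper n_equiv_one_sample() is hypothetical. Note that n1, n2, and total in the table all equal this single n for a one-sample design.

```r
# Hypothetical helper (not part of biostats): one-sample equivalence test for a mean.
n_equiv_one_sample <- function(x1, x2, sd, delta, alpha = 0.05, beta = 0.20) {
  z <- qnorm(1 - alpha) + qnorm(1 - beta / 2)
  ceiling(z^2 * sd^2 / (delta - abs(x1 - x2))^2)
}

# Evaluate over the same x1 grid and power levels as the example above
grid <- expand.grid(x1 = seq(-0.01, 0.01, by = 0.005), power = c(0.70, 0.80, 0.90))
grid$n <- mapply(function(x1, p) n_equiv_one_sample(x1, x2 = 0, sd = 0.1,
                                                     delta = 0.05, beta = 1 - p),
                 grid$x1, grid$power)
grid
```

The table is symmetric in x1 because only the distance |x1 - x2| from the reference enters the formula.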

Statistical Analysis and Inference

omnibus()

Description

Performs omnibus tests to evaluate overall differences between three or more groups. Automatically selects the appropriate statistical test based on data characteristics and assumption testing. Supports both independent groups and repeated measures designs. Tests include one-way ANOVA, repeated measures ANOVA, Kruskal-Wallis test, and Friedman test. Performs comprehensive assumption checking (normality, homogeneity of variance, sphericity) and post-hoc testing when significant results are detected.

Parameters
Parameter Description Default
data Dataframe containing the variables to be analyzed. Data must be in long format with one row per observation. Required
y Character string indicating the dependent variable (outcome). Required
x Character string indicating the independent variable (group or within-subject variable). Required
paired_by Character string indicating the variable that identifies the subject on which repeated measurements were taken (for example, "participant_id"). If provided, a repeated measures design is assumed; if NULL, an independent groups design is assumed. NULL
alpha Numeric value indicating the significance level for hypothesis tests. 0.05
p_method Character string indicating the method for p-value adjustment in post-hoc multiple comparisons to control for Type I error inflation. Options: "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "none". "holm"
na.action Character string indicating the action to take if NAs are present ("na.omit" or "na.exclude"). "na.omit"
Examples
# Compare numerical variable across treatments
omnibus(data = clinical_df_full, y = "biomarker", x = "treatment")
#> 
#> Omnibus Test: One-way ANOVA
#> 
#> Assumption Testing Results:
#> 
#>   Normality (Shapiro-Wilk Test):
#>   A: W = 0.9980, p = 0.321
#>   B: W = 0.9975, p = 0.237
#>   C: W = 0.9988, p = 0.733
#>   Overall result: Normal distribution assumed.
#> 
#>   Homogeneity of Variance (Bartlett Test):
#>   Chi-squared(2) = 1.3685, p = 0.504
#>   Effect size (Cramer's V) = 0.0151
#>   Result: Homogeneous variances.
#> 
#> Test Results:
#>   Formula: biomarker ~ treatment
#>   alpha: 0.05
#>   Result: significant (p = <0.001)
#> 
#> Post-hoc Multiple Comparisons
#> 
#>   Tukey Honest Significant Differences (alpha: 0.050):
#>   Comparison               Diff    Lower    Upper    p-adj
#>   --------------------------------------------------------- 
#>   B - A                  -3.178   -4.296   -2.060   <0.001*
#>   C - A                  -5.542   -6.618   -4.466   <0.001*
#>   C - B                  -2.364   -3.468   -1.259   <0.001*
#> 
#> The study groups show a moderately imbalanced distribution of sample sizes (Δn = 0.214).
 
# Compare numerical variable changes across visits 
omnibus(y = "biomarker", x = "visit", data = clinical_df, paired_by = "participant_id")
#> 
#> Omnibus Test: Repeated measures ANOVA
#> 
#> Assumption Testing Results:
#> 
#>   Sphericity (Mauchly Test):
#>   W = 0.9881, p = 0.556
#>   Result: Sphericity assumed.
#> 
#>   Normality (Shapiro-Wilk Test):
#>   1: W = 0.9848, p = 0.309
#>   2: W = 0.9926, p = 0.861
#>   3: W = 0.9884, p = 0.536
#>   Overall result: Normal distribution assumed.
#> 
#>   Homogeneity of Variance (Bartlett Test):
#>   Chi-squared(2) = 0.5190, p = 0.771
#>   Effect size (Cramer's V) = 0.0294
#>   Result: Homogeneous variances.
#> 
#> Test Results:
#>   Formula: biomarker ~ visit + Error(participant_id/visit)
#>   alpha: 0.05
#>   Result: not significant (p = 0.609)
#> Post-hoc tests not performed (results not significant).
#> 
#> The study groups show a moderately imbalanced distribution of sample sizes (Δn = 0.203).
# Filter simulated data to the 'Treatment' arm only
clinical_df_trt <- clinical_df[clinical_df$treatment == "Treatment", ]

# Compare numerical variable changes across visits
omnibus(y = "biomarker", x = "visit", data = clinical_df_trt, paired_by = "participant_id")
#> 
#> Omnibus Test: Repeated measures ANOVA
#> 
#> Assumption Testing Results:
#> 
#>   Sphericity (Mauchly Test):
#>   W = 0.9825, p = 0.672
#>   Result: Sphericity assumed.
#> 
#>   Normality (Shapiro-Wilk Test):
#>   1: W = 0.9617, p = 0.125
#>   2: W = 0.9812, p = 0.642
#>   3: W = 0.9904, p = 0.964
#>   Overall result: Normal distribution assumed.
#> 
#>   Homogeneity of Variance (Bartlett Test):
#>   Chi-squared(2) = 0.9232, p = 0.630
#>   Effect size (Cramer's V) = 0.0572
#>   Result: Homogeneous variances.
#> 
#> Test Results:
#>   Formula: biomarker ~ visit + Error(participant_id/visit)
#>   alpha: 0.05
#>   Result: not significant (p = 0.233)
#> Post-hoc tests not performed (results not significant).
#> 
#> The study groups show a moderately imbalanced distribution of sample sizes (Δn = 0.217).
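The independent-groups branch of this selection logic can be sketched in base R. Following the printed output above, the sketch checks normality with Shapiro-Wilk per group and homogeneity of variance with Bartlett's test, then runs a one-way ANOVA followed by Tukey HSD when it is significant; otherwise it falls back to Kruskal-Wallis with, as a hypothetical choice, Holm-adjusted pairwise Wilcoxon tests. The helper select_omnibus() is an assumption, not the package's implementation, and it does not cover repeated measures (Mauchly's test, Friedman test).

```r
# Hypothetical helper (not part of biostats): independent-groups test selection.
select_omnibus <- function(data, y, x, alpha = 0.05, p_method = "holm") {
  data[[x]] <- factor(data[[x]])
  f <- reformulate(x, response = y)
  groups <- split(data[[y]], data[[x]])

  # Assumption checks: Shapiro-Wilk per group, Bartlett for equal variances
  normal <- all(vapply(groups, function(v) shapiro.test(v)$p.value > alpha, logical(1)))
  equal_var <- bartlett.test(f, data = data)$p.value > alpha

  if (normal && equal_var) {
    fit <- aov(f, data = data)
    p <- summary(fit)[[1]][["Pr(>F)"]][1]
    posthoc <- if (p < alpha) TukeyHSD(fit) else NULL
    list(test = "One-way ANOVA", p.value = p, posthoc = posthoc)
  } else {
    p <- kruskal.test(f, data = data)$p.value
    posthoc <- if (p < alpha) {
      pairwise.wilcox.test(data[[y]], data[[x]], p.adjust.method = p_method)
    } else NULL
    list(test = "Kruskal-Wallis", p.value = p, posthoc = posthoc)
  }
}

# Base R 'PlantGrowth' data: dry weight under a control and two treatments
res <- select_omnibus(PlantGrowth, "weight", "group")
res
```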

effect_measures()

Description

Calculates measures of effect: Odds Ratio (OR), Risk Ratio (RR), and either Number Needed to Treat (NNT) or Number Needed to Harm (NNH).

Parameters
Parameter Description Default
exposed_event Numeric value indicating the number of events in the exposed group. Required
exposed_no_event Numeric value indicating the number of non-events in the exposed group. Required
unexposed_event Numeric value indicating the number of events in the unexposed group. Required
unexposed_no_event Numeric value indicating the number of non-events in the unexposed group. Required
alpha Numeric value between 0 and 1 specifying the alpha level for confidence intervals (CI). 0.05
correction Logical parameter that indicates whether a continuity correction (0.5) will be applied when any cell contains 0. TRUE
Examples
effect_measures(exposed_event = 15, 
                exposed_no_event = 85,
                unexposed_event = 5,
                unexposed_no_event = 95)
#> 
#> Odds/Risk Ratio Analysis
#> 
#> Contingency Table:
#>                 Event No Event      Sum
#> Exposed            15       85      100
#> Unexposed           5       95      100
#> Sum                20      180      200
#> 
#> Odds Ratio: 3.353 (95% CI: 1.169 - 9.616)
#> Risk Ratio: 3.000 (95% CI: 1.133 - 7.941)
#> 
#> Risk in exposed: 15.0%
#> Risk in unexposed: 5.0%
#> Absolute risk difference: 10.0%
#> Number needed to harm (NNH): 10.0
#> 
#> Note: Correction not applied (no zero values).
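The numbers above follow from standard 2 × 2 table formulas: OR = (15 × 95) / (85 × 5) ≈ 3.353, RR = 0.15 / 0.05 = 3, and NNH = 1 / 0.10 = 10. A minimal sketch, assuming Wald confidence intervals on the log scale and the 0.5 continuity correction described for the correction parameter, reproduces the printed intervals; effect_sketch() is a hypothetical helper, not the package's code.

```r
# Hypothetical helper (not part of biostats): 2x2 effect measures with Wald CIs
# on the log scale and an optional 0.5 continuity correction for zero cells.
effect_sketch <- function(exposed_event, exposed_no_event,
                          unexposed_event, unexposed_no_event,
                          alpha = 0.05, correction = TRUE) {
  cells <- c(exposed_event, exposed_no_event, unexposed_event, unexposed_no_event)
  if (correction && any(cells == 0)) cells <- cells + 0.5
  a <- cells[1]; b <- cells[2]; c0 <- cells[3]; d <- cells[4]
  z <- qnorm(1 - alpha / 2)

  or <- (a * d) / (b * c0)
  se_log_or <- sqrt(1 / a + 1 / b + 1 / c0 + 1 / d)
  risk_exp <- a / (a + b)
  risk_unexp <- c0 / (c0 + d)
  rr <- risk_exp / risk_unexp
  se_log_rr <- sqrt(1 / a - 1 / (a + b) + 1 / c0 - 1 / (c0 + d))
  ard <- risk_exp - risk_unexp

  list(or = or, or_ci = exp(log(or) + c(-1, 1) * z * se_log_or),
       rr = rr, rr_ci = exp(log(rr) + c(-1, 1) * z * se_log_rr),
       ard = ard,
       # NNH when the event is more frequent among the exposed, NNT otherwise
       nn_label = if (ard > 0) "NNH" else "NNT",
       nn = 1 / abs(ard))
}

effect_sketch(15, 85, 5, 95)
```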

Data Visualization

plot_bar()

Description

Generates publication-ready bar plots with minimal code using ggplot2.

Parameters
Parameter Description Default
data A data frame containing the variables to plot. Required
x Character string specifying the x-axis variable. Required
y Character string specifying the y-axis variable. If NULL, counts are calculated automatically. NULL
group Character string specifying the grouping variable for fill color. NULL
facet Character string specifying the faceting variable. NULL
position Character string specifying bar position: "dodge", "stack", or "fill". Required
stat Character string for statistical aggregation: "mean" or "median". Required
colors Character vector of colors. If NULL, uses TealGrn palette. NULL
title Character string for plot title. NULL
xlab Character string for x-axis label. NULL
ylab Character string for y-axis label. NULL
legend_title Character string for legend title. NULL
flip Logical parameter indicating whether to flip coordinates. FALSE
values Logical parameter indicating whether to display value labels above bars. FALSE
Examples
# Simulated clinical data
clinical_df <- clinical_data()

# Proportion of response by treatment
plot_bar(data = clinical_df, x = "treatment", group = "response", position = "fill", 
         title = "Proportion of response by treatment", values = TRUE)


# Grouped bar plot of response by visit, faceted by treatment, with value labels
plot_bar(data = clinical_df, x = "response", group = "visit", facet = "treatment",
         title = "Response by visit and treatment", values = TRUE)

plot_line()

Description

Generates publication-ready line plots with minimal code using ggplot2.

Parameters
Parameter Description Default
data A data frame containing the variables to plot. Required
x Character string specifying the x-axis variable. Required
y Character string specifying the y-axis variable. Required
group Character string specifying the grouping variable for multiple lines. NULL
facet Character string specifying the faceting variable. NULL
stat Character string for statistical aggregation: "mean" or "median". Required
error Character string for error bars: "se", "sd", "ci", or "none". "se"
error_width Numeric value indicating the width of error bar caps. 0.2
colors Character vector of colors. If NULL, uses TealGrn palette. NULL
title Character string for plot title. NULL
xlab Character string for x-axis label. NULL
ylab Character string for y-axis label. NULL
legend_title Character string for legend title. NULL
points Logical parameter indicating whether to add points to lines. TRUE
line_size Numeric value indicating thickness of lines. 1
point_size Numeric value indicating size of points if shown. 3
y_limits Numeric vector of length 2 for y-axis limits. NULL
x_limits Numeric vector of length 2 for x-axis limits. NULL
Examples
# Line plot with mean and standard error by treatment
plot_line(data = clinical_df_full, x = "visit", y = "biomarker",
          group = "treatment", stat = "mean", error = "se")


# Faceted line plots with median and no error bars
plot_line(data = clinical_df_full, x = "visit", y = "biomarker", group = "treatment", 
          facet = "sex", stat = "median", error = "none", points = FALSE)  
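The points and error bars in a mean-and-SE line plot summarize each x-by-group cell. A minimal sketch of that summary in base R follows, assuming "se" means the usual standard error of the mean, SD / sqrt(n), computed after dropping missing outcomes; plot_line() may compute its summaries differently, and summarise_mean_se() is a hypothetical helper.

```r
# Hypothetical helper (not part of biostats): mean and standard error of y for
# every x-by-group cell, i.e. the points and "se" error bars of a line plot.
summarise_mean_se <- function(data, y, x, group) {
  d <- data[!is.na(data[[y]]), ]                                  # drop missing outcomes
  cells <- split(d[[y]], list(d[[x]], d[[group]]), drop = TRUE)   # names "x.group"
  out <- data.frame(
    cell = names(cells),
    n = vapply(cells, length, integer(1)),
    mean = vapply(cells, mean, numeric(1)),
    se = vapply(cells, function(v) sd(v) / sqrt(length(v)), numeric(1)),
    row.names = NULL
  )
  out$lower <- out$mean - out$se
  out$upper <- out$mean + out$se
  out
}

# Base R 'ChickWeight' data: body weight by time point and diet
head(summarise_mean_se(ChickWeight, "weight", "Time", "Diet"))
```

A cell with a single observation gets an SE of NA, since a standard deviation cannot be estimated from one value.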

plot_hist()

Description

Generates publication-ready histogram plots with minimal code using ggplot2.

Parameters
Parameter Description Default
data A dataframe containing the variables to plot. Required
x Character string specifying the variable for the histogram. Required
group Character string specifying the grouping variable for multiple histograms. NULL
facet Character string specifying the faceting variable. NULL
bins Numeric value indicating the number of bins for the histogram. 30
binwidth Numeric value indicating the width of the bins (overrides bins if specified). NULL
alpha Numeric value indicating the transparency level for the bars. 0.7
colors Character vector of colors. If NULL, uses TealGrn palette. NULL
title Character string for plot title. NULL
xlab Character string for x-axis label. NULL
ylab Character string for y-axis label. NULL
legend_title Character string for legend title. NULL
y_limits Numeric vector of length 2 for y-axis limits. NULL
x_limits Numeric vector of length 2 for x-axis limits. NULL
stat Character string that adds a reference line at the "mean" or "median". NULL
Examples
# Mirror histogram for 2 groups with mean lines
plot_hist(clinical_df, x = "biomarker", group = "treatment", stat = "mean")


# Faceted histogram
plot_hist(clinical_df, x = "biomarker", facet = "treatment")

plot_box()

Description

Generates publication-ready boxplots with minimal code using ggplot2.

Parameters
Parameter Description Default
data A dataframe containing the variables to plot. Required
x Character string specifying the x-axis variable. Required
y Character string specifying the y-axis variable. Required
group Character string specifying grouping variable for fill/color. NULL
facet Character string specifying faceting variable. NULL
colors Character vector of colors. If NULL, uses TealGrn palette. NULL
title Character string for plot title. NULL
xlab Character string for x-axis label. NULL
ylab Character string for y-axis label. NULL
legend_title Character string for legend title. NULL
points Logical parameter indicating if jittered points should be shown. FALSE
point_size Numeric value indicating the size of points. 2
y_limits Numeric vector of length 2 for y-axis limits. NULL
show_mean Logical parameter indicating if mean should be shown. TRUE
Examples
# Boxplot of biomarker by treatment
plot_box(clinical_df, x = "treatment", y = "biomarker", group = "treatment")
#> Ignoring unknown labels:
#> • colour : "treatment"


# Boxplot of biomarker by study visit and treatment
plot_box(clinical_df, x = "visit", y = "biomarker", group = "treatment")
#> Ignoring unknown labels:
#> • colour : "treatment"

plot_corr()

Description

Generates publication-ready correlation matrix heatmaps with minimal code using ggplot2.

Parameters
Parameter Description Default
data A dataframe containing the variables to analyze. Required
vars Character vector specifying which variables to include. NULL
method Character string specifying correlation method: "pearson" or "spearman". "pearson"
type Character string specifying matrix type: "full", "upper", or "lower". "full"
colors Character vector of 3 colors for negative, neutral, and positive correlations. NULL
title Character string for plot title. NULL
show_values Logical parameter indicating whether to display correlation values in cells. TRUE
value_size Numeric value indicating size of correlation value text. 3
show_sig Logical parameter indicating whether to mark significant correlations. FALSE
sig_level Numeric value indicating significance level for marking. 0.05
sig_only Logical parameter indicating whether to show only statistically significant values. FALSE
show_legend Logical parameter indicating whether to show legend. TRUE
p_method Character string indicating the method for adjusting the p-values of the multiple correlation tests to control for Type I error inflation. Options: "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", or "none". "holm"
Examples
# Correlation matrix for base R dataset 'swiss'
plot_corr(data = swiss)


# Lower triangle with significance indicators and filtering
plot_corr(data = swiss, type = "lower", show_sig = TRUE, sig_only = TRUE)
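The show_sig, sig_only, and p_method options imply that a p-value is computed for each pair of variables and then adjusted for multiplicity. The sketch below shows one way to do that in base R, assuming p-values from cor.test() and adjustment with p.adjust() across all pairs; the helper corr_pvalues() is hypothetical and may not match how plot_corr() computes or displays significance.

```r
# Hypothetical helper (not part of biostats): correlation matrix plus
# multiplicity-adjusted p-values for every pair of numeric columns.
corr_pvalues <- function(data, method = "pearson", p_method = "holm") {
  vars <- names(data)[vapply(data, is.numeric, logical(1))]
  r <- cor(data[vars], method = method, use = "pairwise.complete.obs")

  pairs <- which(upper.tri(r), arr.ind = TRUE)   # each pair once
  raw_p <- apply(pairs, 1, function(ij)
    cor.test(data[[vars[ij[1]]]], data[[vars[ij[2]]]], method = method)$p.value)

  p_adj <- matrix(NA_real_, length(vars), length(vars), dimnames = list(vars, vars))
  p_adj[pairs] <- p.adjust(raw_p, method = p_method)   # adjust across all pairs
  p_adj[pairs[, 2:1]] <- p_adj[pairs]                  # mirror to lower triangle
  list(r = r, p_adj = p_adj)
}

res <- corr_pvalues(swiss)
round(res$p_adj, 4)
```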

Contributions & Feedback

This package is still under active development. Features, functions, and examples may change as improvements are made.

We welcome feedback, suggestions, and bug reports. You can share your thoughts via email (sebastian.quirarte@sophia.com.mx) or GitHub issues.
