Introduction to FracFixR

The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.

Alice Cleynen, Agin Ravindran, Nikolay Shirokikh

2025-10-16

Introduction

FracFixR is a compositional statistical framework for analyzing RNA-seq data from fractionation experiments. It addresses the fundamental challenge where library preparation and sequencing depth obscure the original proportions of RNA fractions.

This vignette demonstrates:

Basic usage with polysome profiling data
Differential proportion testing
Visualization of results
Advanced applications

Installation

# From CRAN
install.packages("FracFixR")

# From GitHub (development version)
devtools::install_github("Arnaroo/FracFixR")

Quick Start Example

Let’s simulate a simple polysome profiling experiment:

library(FracFixR)

# Set seed for reproducibility
set.seed(123)

Creating Example Data

# Simulate count data for 500 genes across 12 samples
n_genes <- 100
n_samples <- 12

# Generate count matrix with varying expression levels
total_counts <- matrix(
  rnbinom(n_genes * 4, mu = 100, size = 10),
  nrow = n_genes,
  dimnames = list(
    paste0("Gene", 1:n_genes),
    paste0("Sample", 1:4)
  )
)
prob=c(1/4,1/4,1/2)

# distribute counts evenly accross different fractions in Control
multinom_samples <- apply(total_counts, c(1, 2), function(x) rmultinom(1, size = x, prob = prob))
reshaped <- aperm(multinom_samples, c(3, 2, 1))  # now (n, 4, 3)
reshaped_matrix1 <- matrix(reshaped, nrow = n_genes, ncol = 12)
colnames(reshaped_matrix1) <- paste0("V", rep(1:4, each = 3), "_p", 1:3)

counts<-cbind(total_counts[,1:2],reshaped_matrix1[,c(2:3,5:6)],total_counts[,3:4],reshaped_matrix1[,c(8:9,11:12)])

# Create annotation data frame
annotation <- data.frame(
  Sample = colnames(counts),
  Condition = rep(c("Control", "Treatment"), each = 6),
  Type = rep(c("Total", "Total", "Light_Polysome", "Heavy_Polysome","Light_Polysome", "Heavy_Polysome"), 2),
  Replicate = c("Rep1","Rep2", rep("Rep1", 2), rep("Rep2", 2), "Rep1","Rep2", rep("Rep1", 2), rep("Rep2", 2))
)

print(head(annotation))
#>    Sample Condition           Type Replicate
#> 1 Sample1   Control          Total      Rep1
#> 2 Sample2   Control          Total      Rep2
#> 3   V1_p2   Control Light_Polysome      Rep1
#> 4   V1_p3   Control Heavy_Polysome      Rep1
#> 5   V2_p2   Control Light_Polysome      Rep2
#> 6   V2_p3   Control Heavy_Polysome      Rep2

Running FracFixR

# Run the main analysis
results <- FracFixR(MatrixCounts = counts, Annotation = annotation)
#> Setting up parallel processing...
#> Filtering transcripts based on Total samples...
#> Retained 100 transcripts present in Total samples
#> 
#> Processing condition 1/2: Control
#>   Selecting transcripts with 70-96% quantiles (range: 214.3 - 296.1)
#>   Selected 26 transcripts for regression
#>   Processing replicate 1/2: Rep1
#>   Processing replicate 2/2: Rep2
#> 
#> Processing condition 2/2: Treatment
#>   Selecting transcripts with 70-96% quantiles (range: 217.3 - 280.0)
#>   Selected 26 transcripts for regression
#>   Processing replicate 1/2: Rep1
#>   Processing replicate 2/2: Rep2
#> 
#> Finalizing results...
#> FracFixR analysis complete!

# View the structure of results
names(results)
#> [1] "OriginalData"  "Annotation"    "Propestimates" "Coefficients" 
#> [5] "Fractions"     "plots"

Understanding the Output

The FracFixR output contains several components:

# 1. Fraction proportions for each replicate
print(results$Fractions)
#>   Light_Polysome Heavy_Polysome      Lost      Replicate
#> 1     0.08560270     0.00123615 0.9131612   Control_Rep1
#> 2     0.04790483     0.06823600 0.8838592   Control_Rep2
#> 3     0.03838147     0.05640819 0.9052103 Treatment_Rep1
#> 4     0.08793944     0.10989656 0.8021640 Treatment_Rep2

# 2. Regression coefficients
print(results$Coefficients)
#>       Lost Light_Polysome Heavy_Polysome      Replicate
#> 1 226.1183      0.6616193    0.009933183   Control_Rep1
#> 2 219.9686      0.3981329    0.547867870   Control_Rep2
#> 3 223.3712      0.3011167    0.234843211 Treatment_Rep1
#> 4 201.5765      0.3582424    0.424722648 Treatment_Rep2

# 3. Estimated Proportions (first 5 genes, first 6 samples)
print(results$Propestimates[1:5, 1:6])
#>       Sample1 Sample2      V1_p2        V1_p3      V2_p2      V2_p3
#> Gene1     253     250 0.17673392 0.0021771360 0.05453876 0.15010079
#> Gene2     237     236 0.05319552 0.0006988169 0.04201403 0.03579036
#> Gene3     249     238 0.07548677 0.0007666551 0.02538431 0.03309269
#> Gene4     243     248 0.10875934 0.0031976685 0.05999263 0.12758567
#> Gene5     241     237 0.06175113 0.0011478345 0.02123376 0.05113433

Visualizing Fraction Proportions

# Create fraction plot
frac_plot <- PlotFractions(results)
print(frac_plot)

Fraction proportions across replicates

The plot shows: - The proportion of RNA in each fraction - The “Lost” fraction (grey) representing unrecoverable material - Consistency across replicates

Differential Proportion Testing

Now let’s identify genes with differential polysome association between conditions:

# Test for differential proportion in heavy polysomes
diff_heavy <- DiffPropTest(
  NormObject = results,
  Conditions = c("Control", "Treatment"),
  Types = "Heavy_Polysome",
  Test = "Logit"
)
#> Performing Logit test comparing Conditions Control and Treatment in the Heavy_Polysome fraction
#> Applying FDR correction...
#> Analysis complete. Found 14 transcripts with padj < 0.01

# View top differentially associated genes
top_genes <- diff_heavy[order(diff_heavy$padj), ]
print(head(top_genes, 10))
#>    transcript mean_success_cond1 mean_success_cond2  mean_diff   log2FC
#> 72     Gene72         0.02617524          0.1390912 0.11291599 2.828417
#> 82     Gene82         0.04533465          0.1535843 0.10824963 2.352886
#> 50     Gene50         0.06573491          0.1267624 0.06102752 2.633829
#> 6       Gene6         0.02014019          0.1492147 0.12907447 2.603087
#> 11     Gene11         0.01350989          0.1079133 0.09440345 3.669058
#> 29     Gene29         0.02035831          0.1017006 0.08134232 2.841302
#> 33     Gene33         0.01628909          0.1063028 0.09001371 3.070389
#> 35     Gene35         0.02986100          0.1431196 0.11325857 2.242306
#> 40     Gene40         0.03166690          0.1083089 0.07664199 2.710986
#> 46     Gene46         0.02526712          0.1347122 0.10944504 2.256062
#>            pval        padj
#> 72 3.099748e-05 0.001549874
#> 82 1.675630e-05 0.001549874
#> 50 5.916326e-05 0.001972109
#> 6  8.567553e-05 0.002141888
#> 11 7.595966e-04 0.006124418
#> 29 6.455671e-04 0.006124418
#> 33 7.840268e-04 0.006124418
#> 35 7.961743e-04 0.006124418
#> 40 6.586108e-04 0.006124418
#> 46 7.268264e-04 0.006124418

Interpreting Results

mean_diff: Difference in proportion between conditions
log2FC: Log2 fold change of proportions
padj: FDR-adjusted p-value

Positive mean_diff indicates higher proportion in Treatment.

Creating a Volcano Plot

volcano <- PlotComparison(
  diff_heavy,
  Conditions = c("Control", "Treatment"),
  Types = "Heavy_Polysome",
  cutoff = 20
)
print(volcano)

Volcano plot of differential polysome association

Data Requirements

For real experiments, ensure your data meets these requirements:

Count Matrix: Raw counts (not normalized)
Annotation: Must include:
- Sample names matching count matrix columns
- At least one “Total” sample per condition-replicate
- Consistent fraction names across replicates

Session Info

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
#>  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Australia/Sydney
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] FracFixR_1.0.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] future.apply_1.20.0 gtable_0.3.6        jsonlite_2.0.0     
#>  [4] dplyr_1.1.4         compiler_4.4.2      tidyselect_1.2.1   
#>  [7] parallel_4.4.2      tidyr_1.3.1         jquerylib_0.1.4    
#> [10] globals_0.18.0      scales_1.4.0        yaml_2.3.10        
#> [13] fastmap_1.2.0       ggplot2_3.5.2       R6_2.6.1           
#> [16] labeling_0.4.3      generics_0.1.4      knitr_1.50         
#> [19] future_1.67.0       tibble_3.3.0        bslib_0.9.0        
#> [22] pillar_1.11.0       RColorBrewer_1.1-3  rlang_1.1.6        
#> [25] cachem_1.1.0        xfun_0.52           nnls_1.6           
#> [28] sass_0.4.10         aod_1.3.3           cli_3.6.5          
#> [31] withr_3.0.2         magrittr_2.0.3      digest_0.6.37      
#> [34] grid_4.4.2          lifecycle_1.0.4     vctrs_0.6.5        
#> [37] evaluate_1.0.4      glue_1.8.0          farver_2.1.2       
#> [40] listenv_0.9.1       codetools_0.2-20    parallelly_1.45.1  
#> [43] purrr_1.1.0         rmarkdown_2.29      matrixStats_1.5.0  
#> [46] tools_4.4.2         pkgconfig_2.0.3     htmltools_0.5.8.1

References

Cleynen A, Ravindran A, Shirokikh N (2025). FracFixR: A compositional statistical framework for absolute proportion estimation between fractions in RNA sequencing data.

For more examples and advanced usage, see the FracFixR GitHub repository.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.