
Getting Started with BioMoR

BioMoR: Bioinformatics Modeling with Recursion and Autoencoder-Based Ensembles

BioMoR is an R package for bioinformatics modeling that integrates:

• Recursive Transformer architectures via Mixture-of-Recursions (MoR) (Bae et al. 2025, doi:10.48550/arXiv.2507.10524)
• Autoencoder-based representation learning (Hinton & Salakhutdinov 2006, doi:10.1126/science.1127647)
• Random forests for robust tree-based modeling (Breiman 2001, doi:10.1023/A:1010933404324)
• XGBoost for efficient gradient boosting (Chen & Guestrin 2016, doi:10.1145/2939672.2939785)
• Stacked ensembles that combine diverse models for stronger predictive power

It is designed as a benchmarking framework for predictive workflows in bioinformatics, enabling consistent cross-validation, calibration, and threshold optimization.

Motivation

Modern bioinformatics involves high-dimensional, noisy data from genomics, transcriptomics, and proteomics. BioMoR addresses these challenges by:

• Using Mixture-of-Recursions (MoR) for adaptive recursion depth and computational efficiency.
• Learning latent embeddings through autoencoders to improve classifier generalization.
• Leveraging ensemble methods (random forests, XGBoost) for robustness.
• Providing a standardized benchmarking interface that evaluates models on ROC-AUC, PR-AUC, F1, balanced accuracy, Brier score, calibration, and threshold optimization (a minimal sketch of these metrics follows this list).
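
To make the evaluation concrete, the snippet below computes three of these quantities (Brier score, ROC-AUC, and an F1-optimal threshold) in base R on mock predictions. This is only an illustrative sketch of what the benchmark reports, not BioMoR's internal implementation; the labels and probabilities here are simulated.

# Illustrative only: a base-R sketch of metrics like those biomor_benchmark() reports
set.seed(1)
y    <- rbinom(200, 1, 0.3)                      # mock true labels (1 = "Active")
prob <- plogis(qlogis(0.3) + 2 * y + rnorm(200)) # mock predicted probabilities

# Brier score: mean squared error of the predicted probabilities
brier <- mean((prob - y)^2)

# ROC-AUC via the rank (Mann-Whitney) formula
r   <- rank(prob)
auc <- (sum(r[y == 1]) - sum(y) * (sum(y) + 1) / 2) / (sum(y) * sum(1 - y))

# Threshold optimization: pick the cutoff that maximizes F1
thresholds <- seq(0.05, 0.95, by = 0.01)
f1 <- sapply(thresholds, function(t) {
  pred <- as.integer(prob >= t)
  tp <- sum(pred == 1 & y == 1)
  fp <- sum(pred == 1 & y == 0)
  fn <- sum(pred == 0 & y == 1)
  if (tp == 0) return(0)
  2 * tp / (2 * tp + fp + fn)
})
best <- thresholds[which.max(f1)]

c(brier = brier, auc = auc, best_threshold = best, best_f1 = max(f1))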

Example Workflow

We illustrate the basic workflow with the classic iris dataset, recoded to a binary outcome for simplicity:

library(BioMoR)

# Prepare dataset: recode labels to a binary factor outcome
data(iris)
iris$Label <- factor(ifelse(iris$Species == "setosa", "Active", "Inactive"))
iris$Species <- NULL  # drop Species so it does not leak the outcome into the predictors

# Cross-validation control
ctrl <- get_cv_control(cv = 3)

# Train a Random Forest
fit <- train_rf(iris, outcome_col = "Label", ctrl = ctrl)

# Benchmark the model
results <- biomor_benchmark(fit, iris, outcome_col = "Label")
#> Warning in bake(object$recipe, new_data = newdata, all_predictors()): ! There was 1 column that was a factor when the recipe was prepped:
#> • `Label`
#> ℹ This may cause errors when processing new data.
#> ! There was 1 column that was a factor when the recipe was prepped:
#> • `Label`
#> ℹ This may cause errors when processing new data.
#> Warning in confusionMatrix.default(y_pred, y_true): Levels are not in the same
#> order for reference and data. Refactoring data to match.

# Print metrics
results$metrics
#> NULL

Visualization

# ROC Curve
results$plots$ROC
#> NULL
# Precision-Recall Curve
results$plots$PR
#> NULL
# Threshold Optimization
results$plots$Thresholds
#> NULL
# Calibration Curve
results$plots$Calibration
#> NULL

Extending BioMoR

• Replace train_rf() with train_xgb_caret() for XGBoost.
• Incorporate autoencoder features via train_autoencoder() and get_embeddings().
• Use train_biomor() to stack multiple models.
• Benchmark across models to compare pipelines in one consistent framework (a combined sketch follows below).
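
Putting these pieces together, an extended pipeline might look like the sketch below. The function names are those listed above, but beyond the data / outcome_col / ctrl pattern shown for train_rf() and biomor_benchmark(), the argument names and return values are assumptions and may differ from the package's actual interface.

library(BioMoR)

# Same binary recoding of iris as above
data(iris)
iris$Label <- factor(ifelse(iris$Species == "setosa", "Active", "Inactive"))
iris$Species <- NULL

ctrl <- get_cv_control(cv = 3)

# XGBoost in place of the random forest (assumed to mirror train_rf()'s interface)
fit_xgb <- train_xgb_caret(iris, outcome_col = "Label", ctrl = ctrl)

# Autoencoder embeddings appended as extra features (argument names are assumptions)
ae       <- train_autoencoder(iris, outcome_col = "Label")
emb      <- get_embeddings(ae, iris)
iris_emb <- cbind(iris, emb)

# Stack several base learners (interface assumed)
fit_stack <- train_biomor(iris_emb, outcome_col = "Label", ctrl = ctrl)

# Benchmark each model with the same framework and compare the metrics
res_xgb   <- biomor_benchmark(fit_xgb, iris, outcome_col = "Label")
res_stack <- biomor_benchmark(fit_stack, iris_emb, outcome_col = "Label")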
