The Symlink Tool is a way for researchers to manage the outputs of multiple data pipeline runs.
It’s designed with user flexibility and project officers in mind, and doesn’t require using anything like a database.
This tool assumes you have a large set of output folders for runs of your pipeline, and you store them on the file system.
The Symlink Tool will:
- … H: or J: drive, i.e. on the File System.
OK, that’s nice, but symlinks are pretty easy to make. What else do I (the researcher) get for free if I use this tool?
The most important thing you get is a set of simple logs that automatically keep track of which folders have been ‘best.’
Hmm, is that all? I feel like I could keep an Excel document or HUB page that does the same thing?
That’s very true, but this doesn’t require you to remember, or do any typing yourself.
Also, maybe there are pipeline runs you want to track for different reasons.
And your Project Officer gets things too!
That sounds pretty nice, but didn’t you also say something about deleting folders? Why do I need help doing that?
You get some additional benefits - The SymLink Tool will also:
When you’re ready to delete, you’ll get:
1. Safety - The Symlink Tool will only delete folders that are marked to ‘remove’.
2. Provenance - You’ll get a record in the central log telling you which pipeline runs were deleted, when, and why they were deleted (the user gets to add a comment).
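
The safety rule above can be sketched in base R. This is only an illustration under stated assumptions, not the vmTools implementation: `delete_if_marked()` is a hypothetical helper, the folder names are made up, and the code assumes a Unix-like file system where a `remove_<version>` symlink sits next to the version folder.

```r
# Hypothetical helper (not part of vmTools): delete a version folder only
# if a matching remove_<version> symlink exists in the same root.
delete_if_marked <- function(root, version_name) {
  link   <- file.path(root, paste0("remove_", version_name))
  target <- file.path(root, version_name)

  # Sys.readlink() gives "" for non-links and NA on error (e.g. missing path)
  link_target <- Sys.readlink(link)
  is_marked   <- !is.na(link_target) && nzchar(link_target)
  if (!is_marked) {
    message("Not marked for removal, skipping: ", target)
    return(invisible(FALSE))
  }

  unlink(link)                      # removes the symlink itself, not its target
  unlink(target, recursive = TRUE)  # removes the real folder
  invisible(TRUE)
}

# Demo in a temporary folder
root <- file.path(tempdir(), "slt_sketch_delete")
dir.create(file.path(root, "run_a"), recursive = TRUE)
dir.create(file.path(root, "run_b"))
file.symlink(file.path(root, "run_b"), file.path(root, "remove_run_b"))

delete_if_marked(root, "run_a")  # not marked, so nothing is deleted
delete_if_marked(root, "run_b")  # marked, so the link and the folder are deleted
```

The point of the guard is that the unmarked folder survives even if it is passed in by mistake.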
I’m still reading, and curious to see how this works.
What this demonstration is.
We’ll showcase the life-cycle of a typical pipeline output folder, using the three kinds of marks: best_, keep_, and remove_.
What this demonstration is not.
This won’t be an exhaustive demonstration of all the available options; it is a vignette of an average use-case.
See the symlink_tool_vignette_technical.Rmd file for more detailed technical explanations.
My team uses an output_root folder for all inputs we submit to ST-GPR. This way we can prepare the data, then submit various ST-GPR models with different parameters without needing to re-prep the inputs. The results of the ST-GPR models go into an output folder, which we’ll ignore for simplicity.
library(vmTools)
library(data.table)
# Make the root folder
output_root <- file.path(tempdir(), "slt", "output_root")
dir.create(output_root,
recursive = TRUE,
showWarnings = FALSE)
Call SLT on its own (no parentheses) to print class information and methods (functions linked with the tool).
# For this Intro Vignette, we're only showing public methods
# - for all methods, see the Technical Vignette
SLT
#> <Symlink_Tool> object generator
#> Public:
#> new: function (user_root_list = NULL, user_central_log_root = NULL,
#> return_dictionaries: function (item_names = NULL)
#> return_dynamic_fields: function (item_names = NULL)
#> mark_best: function (version_name, user_entry)
#> mark_keep: function (version_name, user_entry)
#> mark_remove: function (version_name, user_entry)
#> unmark: function (version_name, user_entry)
#> roundup_best: function ()
#> roundup_keep: function ()
#> roundup_remove: function ()
#> roundup_unmarked: function ()
#> roundup_by_date: function (user_date, date_selector)
#> get_common_new_version_name: function (date = "today", root_list = private$DICT$ROOTS)
#> make_new_version_folder: function (version_name = self$get_common_new_version_name())
#> make_new_log: function (version_name)
#> delete_version_folders: function (version_name, user_entry, require_user_input = TRUE)
#> make_reports: function ()
#> Call SLT$new() to make a Symlink Tool, with startup guidance messages!
When you make a new tool, it is tied to a specific output folder. You can’t change the output folder once you’ve made the tool.
See the symlink_tool_vignette_technical.Rmd file for details.
Note: You can define the root for results outputs and logs separately, but we’re using the same root for simplicity.
# Instantiate (create) a new Symlink Tool object
slt_prep <- SLT$new(
user_root_list = list("output_root" = output_root),
user_central_log_root = output_root
)
Note: SLT is an R6 class included with the vmTools package that manages the symlink tool.
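
For readers new to R6, the generator-then-`$new()` pattern can be sketched with a toy class. Everything below is hypothetical and unrelated to the real SLT internals: `FolderTool`, its `root` field, and `get_root()` are made-up names, and the only assumption is that the R6 package is installed. The sketch shows one way a class can tie an object to a folder at creation time and offer no public way to change it afterwards.

```r
library(R6)

# Hypothetical toy class (not the real SLT): the root is stored privately
# at construction, and no public method changes it afterwards.
FolderTool <- R6Class(
  "FolderTool",
  public = list(
    initialize = function(root) {
      stopifnot(is.character(root), length(root) == 1)
      private$root <- root
    },
    get_root = function() private$root
  ),
  private = list(
    root = NULL
  )
)

# The generator's $new() method calls initialize() and returns the object
tool <- FolderTool$new(root = tempdir())
tool$get_root()
```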
Use the Symlink Tool to create a new folder in your output root.
New folders follow the YYYY_MM_DD.VV naming scheme.
date_vers1 <- get_output_dir(output_root, "2024_02_01")
slt_prep$make_new_version_folder(version_name = date_vers1)
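
To make the YYYY_MM_DD.VV scheme concrete, here is a base R sketch of how the next free version name for a date could be derived from the folders already on disk. It is an assumption-laden illustration, not the code behind `get_output_dir()`: `next_version_name()` is a hypothetical helper, and it assumes a two-digit version suffix.

```r
# Hypothetical helper (not the vmTools implementation): find the next free
# YYYY_MM_DD.VV name for a given date inside a root folder.
next_version_name <- function(root, date) {
  existing <- list.dirs(root, full.names = FALSE, recursive = FALSE)
  prefix   <- paste0(date, ".")
  same_day <- existing[startsWith(existing, prefix)]

  # Keep only suffixes that parse as integers, e.g. "01", "02"
  suffix   <- substring(same_day, nchar(prefix) + 1)
  versions <- suppressWarnings(as.integer(suffix))
  versions <- versions[!is.na(versions)]

  next_v <- if (length(versions) == 0) 1L else max(versions) + 1L
  sprintf("%s.%02d", date, next_v)
}

# Demo: two runs on the same day get consecutive version suffixes
root <- file.path(tempdir(), "slt_sketch_naming")
dir.create(root, recursive = TRUE)
first <- next_version_name(root, "2024_02_01")
dir.create(file.path(root, first))
second <- next_version_name(root, "2024_02_01")
c(first, second)
```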
Capture some paths, using the Symlink Tool to help. We’ll use these in a minute.
path_log_central <- slt_prep$return_dictionaries()[["LOG_CENTRAL"]][["path"]]
fname_dv_log <- slt_prep$return_dictionaries()[["log_path"]]
root_dv1 <- slt_prep$return_dynamic_fields()[["VERS_PATHS"]][["output_root"]]
path_log_dv1 <- file.path(root_dv1, fname_dv_log)
Show the file tree.
#> |-- 2024_02_01.01
#> | `-- logs
#> | `-- log_version_history.csv
#> `-- log_symlinks_central.csv
Show central log.
#> log_id timestamp user version_name action comment
#> <int> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111336 ssbyrne CENTRAL_LOG create log created
Show new run version folder log.
#> log_id timestamp user version_name action comment
#> <int> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111336 ssbyrne 2024_02_01.01 create log created
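
The logs shown above are plain CSV files with one row per action. As a rough sketch of that idea (not the package's own logging code), the hypothetical `append_log()` helper below appends a row with the same column names to a CSV file using base R. The timestamp format is inferred from the output above, and `Sys.info()[["user"]]` is assumed to be available, as it is on Unix-like systems.

```r
# Hypothetical helper (not part of vmTools): append one action to a CSV log,
# using the column names seen in the vignette's log output.
append_log <- function(path, version_name, action, comment) {
  log <- NULL
  if (file.exists(path)) {
    log <- read.csv(path, colClasses = "character")
    log$log_id <- as.integer(log$log_id)
  }
  new_row <- data.frame(
    log_id       = if (is.null(log)) 0L else max(log$log_id) + 1L,
    timestamp    = format(Sys.time(), "%Y_%m_%d_%H%M%S"),
    user         = Sys.info()[["user"]],
    version_name = version_name,
    action       = action,
    comment      = comment,
    stringsAsFactors = FALSE
  )
  log <- rbind(log, new_row)
  write.csv(log, path, row.names = FALSE)
  invisible(log)
}

# Demo: create a log, then record a promotion
log_path <- tempfile(fileext = ".csv")
append_log(log_path, "2024_02_01.01", "create", "log created")
append_log(log_path, "2024_02_01.01", "promote_best", "Best model GBD2023")
read.csv(log_path, colClasses = "character")
```

Because nothing is ever overwritten, the last row of a log is always the current status of that folder.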
Now let’s make some files representing models in this folder.
# Make some dummy files
fnames_my_models <- paste0("my_model_", 1:5, ".csv")
invisible(file.create(file.path(root_dv1, fnames_my_models)))
print_tree(output_root)
#> |-- 2024_02_01.01
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> `-- log_symlinks_central.csv
We like the models! We want to elevate this run version folder to best_ status.
Note: All mark_xxxx operations require a user entry as a named list. Only the comment field is currently supported (future versions will expand this).
# Mark best, and take note of messaging
slt_prep$mark_best(version_name = date_vers1,
user_entry = list(comment = "Best model GBD2023"))
#> Marking best: 2024_02_01.01
#> No existing symlinks found - moving on
#> No 'best' symlink found - moving on: /tmp/Rtmp3tDGBK/slt/output_root/best
#> Promoting to 'best': /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01
#> Writing log to /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01/logs/log_version_history.csv
#> Writing central log to /tmp/Rtmp3tDGBK/slt/output_root/log_symlinks_central.csv
Inspect both the central log and …
#> log_id timestamp user version_name action comment
#> <int> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111336 ssbyrne CENTRAL_LOG create log created
#> 2: 1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best Best model GBD2023
…the run version folder log.
#> log_id timestamp user version_name action comment
#> <int> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111336 ssbyrne 2024_02_01.01 create log created
#> 2: 1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best Best model GBD2023
We now have a ‘best’ symlink that points to our ‘best’ run version, 2024_02_01.01.
print_tree(output_root)
#> |-- 2024_02_01.01
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- best
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- log_symlinks_central.csv
#> `-- report_key_versions.csv
resolve_symlink(file.path(output_root, "best"))
#> [1] "/tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01"
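
The tree shows the same files under both the version folder and best because best is a symlink, not a copy. A minimal base R sketch of that behaviour follows. It assumes a Unix-like system where symlinks are supported, and it uses only base functions (`file.symlink()`, `Sys.readlink()`, `normalizePath()`) rather than the vmTools helpers; the folder names are made up.

```r
# Assumes a Unix-like system where symlinks are supported
root <- file.path(tempdir(), "slt_sketch_link")
run  <- file.path(root, "2024_02_01.01")
dir.create(run, recursive = TRUE)
file.create(file.path(run, "my_model_1.csv"))

# Create the link: 'from' is the real folder, 'to' is the link path
link <- file.path(root, "best")
file.symlink(from = run, to = link)

Sys.readlink(link)                         # the path the link points to
list.files(link)                           # files are visible through the link
normalizePath(link) == normalizePath(run)  # both resolve to the same folder
```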
Since we run our pipelines many times, we want to track those runs.
Run the pipeline two more times on the same day, inspect the models, and make a human decision about the result quality.
# Second run
date_vers2 <- get_output_dir(output_root, "2024_02_01")
slt_prep$make_new_version_folder(version_name = date_vers2)
# note - the dynamic fields update when you make new folders, so we won't see the dv1 path anymore
root_dv2 <- slt_prep$return_dynamic_fields()$VERS_PATHS
invisible(file.create(file.path(root_dv2, fnames_my_models)))
# Third run
date_vers3 <- get_output_dir(output_root, "2024_02_01")
slt_prep$make_new_version_folder(version_name = date_vers3)
root_dv3 <- slt_prep$return_dynamic_fields()$VERS_PATHS
invisible(file.create(file.path(root_dv3, fnames_my_models)))
Now let’s look at our file output structure, and central log.
print_tree(output_root)
#> |-- 2024_02_01.01
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- 2024_02_01.02
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- 2024_02_01.03
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- best
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- log_symlinks_central.csv
#> `-- report_key_versions.csv
#> log_id timestamp user version_name action comment
#> <int> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111336 ssbyrne CENTRAL_LOG create log created
#> 2: 1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best Best model GBD2023
After inspecting our results, we decide the third run is actually the best, so we promote it to best_ status.
# Mark best, and take note of messaging
slt_prep$mark_best(version_name = date_vers3,
user_entry = list(comment = "New best model GBD2023"))
#> Marking best: 2024_02_01.03
#> No existing symlinks found - moving on
#> Demoting from 'best': /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01
#> Writing log to /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01/logs/log_version_history.csv
#> Writing central log to /tmp/Rtmp3tDGBK/slt/output_root/log_symlinks_central.csv
#> Promoting to 'best': /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.03
#> Writing log to /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.03/logs/log_version_history.csv
#> Writing central log to /tmp/Rtmp3tDGBK/slt/output_root/log_symlinks_central.csv
Inspect the central log - The third version is now marked best.
#> log_id timestamp user version_name action comment
#> <int> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111336 ssbyrne CENTRAL_LOG create log created
#> 2: 1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best Best model GBD2023
#> 3: 2 2025_07_24_111337 ssbyrne 2024_02_01.01 demote_best New best model GBD2023
#> 4: 3 2025_07_24_111337 ssbyrne 2024_02_01.03 promote_best New best model GBD2023
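
The demote-then-promote sequence in the log amounts to moving a single best link from one folder to another. The sketch below shows that step in base R under stated assumptions: `repoint_best()` is a hypothetical helper, not the logic inside `mark_best()`, it skips all logging, and it assumes a Unix-like system.

```r
# Hypothetical helper (not part of vmTools): point <root>/best at a
# different version folder, leaving the old target folder untouched.
repoint_best <- function(root, version_name) {
  link <- file.path(root, "best")
  old  <- Sys.readlink(link)
  if (!is.na(old) && nzchar(old)) {
    unlink(link)  # drops the old link only
  }
  file.symlink(file.path(root, version_name), link)
}

# Demo: promote the first run, then replace it with the third
root <- file.path(tempdir(), "slt_sketch_repoint")
dir.create(file.path(root, "2024_02_01.01"), recursive = TRUE)
dir.create(file.path(root, "2024_02_01.03"))

repoint_best(root, "2024_02_01.01")
repoint_best(root, "2024_02_01.03")
basename(Sys.readlink(file.path(root, "best")))
```

Only one best link can exist at a time because the link path is fixed; that is what makes best unique.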
Note: Marking a folder that already has that mark does nothing (but reports will still run).
slt_prep$mark_best(version_name = date_vers3,
user_entry = list(comment = "New best model GBD2023"))
#> Marking best: 2024_02_01.03
#> /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.03 - already marked best - moving on.
Let’s also take a look inside each of the run version folder logs.
- The first version was demoted from best automatically, and the third version was marked as best.
- The best symlink points to the third pipeline run.
Looking at all three run-version logs we see:
#> log_id timestamp user version_name action comment
#> <int> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111336 ssbyrne 2024_02_01.01 create log created
#> 2: 1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best Best model GBD2023
#> 3: 2 2025_07_24_111337 ssbyrne 2024_02_01.01 demote_best New best model GBD2023
#> log_id timestamp user version_name action comment
#> <int> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111336 ssbyrne 2024_02_01.02 create log created
#> log_id timestamp user version_name action comment
#> <int> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111336 ssbyrne 2024_02_01.03 create log created
#> 2: 1 2025_07_24_111337 ssbyrne 2024_02_01.03 promote_best New best model GBD2023
print_tree(output_root)
#> |-- 2024_02_01.01
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- 2024_02_01.02
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- 2024_02_01.03
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- best
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- log_symlinks_central.csv
#> `-- report_key_versions.csv
resolve_symlink(file.path(output_root, "best"))
#> [1] "/tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.03"
We want to keep the first run, even though it’s not the best anymore.
… best runs. We can mark this version with a keep_ symlink.
# Mark keep, and take note of messaging
slt_prep$mark_keep(
version_name = date_vers1,
user_entry = list(comment = "Previous best")
)
#> No existing symlinks found - moving on
#> Promoting to 'keep': /tmp/Rtmp3tDGBK/slt/output_root/keep_2024_02_01.01
#> Writing log to /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01/logs/log_version_history.csv
#> Writing central log to /tmp/Rtmp3tDGBK/slt/output_root/log_symlinks_central.csv
The first version is now marked as keep.
#> log_id timestamp user version_name action comment
#> <int> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111336 ssbyrne CENTRAL_LOG create log created
#> 2: 1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best Best model GBD2023
#> 3: 2 2025_07_24_111337 ssbyrne 2024_02_01.01 demote_best New best model GBD2023
#> 4: 3 2025_07_24_111337 ssbyrne 2024_02_01.03 promote_best New best model GBD2023
#> 5: 4 2025_07_24_111337 ssbyrne 2024_02_01.01 promote_keep Previous best
Note: Unlike best_, marking a folder keep_ does not make it unique. Many folders can be marked keep_.
#> log_id timestamp user version_name action comment
#> <int> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111336 ssbyrne 2024_02_01.01 create log created
#> 2: 1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best Best model GBD2023
#> 3: 2 2025_07_24_111337 ssbyrne 2024_02_01.01 demote_best New best model GBD2023
#> 4: 3 2025_07_24_111337 ssbyrne 2024_02_01.01 promote_keep Previous best
print_tree(output_root)
#> |-- 2024_02_01.01
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- 2024_02_01.02
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- 2024_02_01.03
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- best
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- keep_2024_02_01.01
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- log_symlinks_central.csv
#> `-- report_key_versions.csv
resolve_symlink(file.path(output_root, "keep_2024_02_01.01"))
#> [1] "/tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01"
We want to remove the second run, because the model was experimental or performed poorly.
We can mark this version with a remove_ symlink. From here we could use the Symlink Tool to delete the folders, or round them up for ST-GPR model deletion, etc. Either way, we now have a record of which folders are no longer needed, and why.
# Mark remove, and take note of messaging
slt_prep$mark_remove(
version_name = date_vers2,
user_entry = list(comment = "Obsolete dev folder"))
#> No existing symlinks found - moving on
#> Promoting to 'remove': /tmp/Rtmp3tDGBK/slt/output_root/remove_2024_02_01.02
#> Writing log to /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.02/logs/log_version_history.csv
#> Writing central log to /tmp/Rtmp3tDGBK/slt/output_root/log_symlinks_central.csv
Inspect the central log - The second version is now marked as remove_.
Note: Unlike best_, marking a folder remove_ does not make it unique. Many folders can be marked remove_.
#> log_id timestamp user version_name action comment
#> <int> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111336 ssbyrne CENTRAL_LOG create log created
#> 2: 1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best Best model GBD2023
#> 3: 2 2025_07_24_111337 ssbyrne 2024_02_01.01 demote_best New best model GBD2023
#> 4: 3 2025_07_24_111337 ssbyrne 2024_02_01.03 promote_best New best model GBD2023
#> 5: 4 2025_07_24_111337 ssbyrne 2024_02_01.01 promote_keep Previous best
#> 6: 5 2025_07_24_111337 ssbyrne 2024_02_01.02 promote_remove Obsolete dev folder
print_tree(output_root)
#> |-- 2024_02_01.01
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- 2024_02_01.02
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- 2024_02_01.03
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- best
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- keep_2024_02_01.01
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- log_symlinks_central.csv
#> |-- remove_2024_02_01.02
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> `-- report_key_versions.csv
resolve_symlink(file.path(output_root, "remove_2024_02_01.02"))
#> [1] "/tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.02"
Now that we have marked the second run as remove_, we can use the Symlink Tool to delete the folders.
First, we’ll find (roundup) all our remove_ folders.
(dt_to_remove <- slt_prep$roundup_remove())
#> $output_root
#> version_name dir_name dir_name_resolved
#> <char> <char> <char>
#> 1: 2024_02_01.02 /tmp/Rtmp3tDGBK/slt/output_root/remove_2024_02_01.02 /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.02
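
A roundup like the one above can be approximated in base R by listing the root, keeping entries with the remove_ prefix, and resolving each link. This is a sketch under assumptions, not what `roundup_remove()` does internally: `roundup_marked()` is a hypothetical helper, it returns a plain data.frame rather than a data.table, and it assumes a Unix-like system.

```r
# Hypothetical helper (not part of vmTools): find every symlink in a root
# whose name starts with a prefix, and report where each one points.
roundup_marked <- function(root, prefix = "remove_") {
  entries <- list.files(root, full.names = TRUE)
  entries <- entries[startsWith(basename(entries), prefix)]

  # Keep real symlinks only ("" means not a link, NA means an error)
  targets <- Sys.readlink(entries)
  is_link <- !is.na(targets) & nzchar(targets)

  data.frame(
    version_name      = sub(prefix, "", basename(entries[is_link]), fixed = TRUE),
    dir_name          = entries[is_link],
    dir_name_resolved = targets[is_link],
    stringsAsFactors  = FALSE
  )
}

# Demo: two run folders, one of them marked for removal
root <- file.path(tempdir(), "slt_sketch_roundup")
dir.create(file.path(root, "2024_02_01.01"), recursive = TRUE)
dir.create(file.path(root, "2024_02_01.02"))
file.symlink(file.path(root, "2024_02_01.02"),
             file.path(root, "remove_2024_02_01.02"))

dt_sketch <- roundup_marked(root)
dt_sketch$version_name
```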
Next, we can handle them any way we choose. For this demonstration, we’ll delete them.
… remove_-marked runs to free quota space, for example.
for(dir_dv_remove in dt_to_remove$output_root$version_name){
slt_prep$delete_version_folders(
version_name = dir_dv_remove,
user_entry = list(comment = "Deleting dev folder"),
require_user_input = FALSE
)
}
#>
#> Writing central log to /tmp/Rtmp3tDGBK/slt/output_root/log_symlinks_central.csv
#> Deleting /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.02
#> Deleting /tmp/Rtmp3tDGBK/slt/output_root/remove_2024_02_01.02
# The default setting prompts user input, but the process can be automated, as for this vignette.
#
# Do you want to delete the following folders?
# /tmp/RtmpRmKCTu/slt/output_root/2024_02_01.02
# /tmp/RtmpRmKCTu/slt/output_root/remove_2024_02_01.02
#
# 1: No
# 2: Yes
Check the central log - since the folder is gone, this will maintain a record of when this folder was deleted.
#> log_id timestamp user version_name action comment
#> <int> <char> <char> <char> <char> <char>
#> 1: 0 2025_07_24_111336 ssbyrne CENTRAL_LOG create log created
#> 2: 1 2025_07_24_111336 ssbyrne 2024_02_01.01 promote_best Best model GBD2023
#> 3: 2 2025_07_24_111337 ssbyrne 2024_02_01.01 demote_best New best model GBD2023
#> 4: 3 2025_07_24_111337 ssbyrne 2024_02_01.03 promote_best New best model GBD2023
#> 5: 4 2025_07_24_111337 ssbyrne 2024_02_01.01 promote_keep Previous best
#> 6: 5 2025_07_24_111337 ssbyrne 2024_02_01.02 promote_remove Obsolete dev folder
#> 7: 6 2025_07_24_111337 ssbyrne 2024_02_01.02 delete_remove_folder Deleting dev folder
Note: As soon as we marked a folder, there was a report ready in our folder. The report_key_versions.csv file is built by scanning every run-version that has a Tool-created symlink for a log, and shows the log’s last row (current status).
(data.table::fread(file.path(output_root, "report_key_versions.csv")))
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 1 2025_07_24_111337 ssbyrne 2024_02_01.03 /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.03 promote_best New best model GBD2023
#> 2: 3 2025_07_24_111337 ssbyrne 2024_02_01.01 /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01 promote_keep Previous best
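
The "last row is the current status" idea behind this report can be sketched in a few lines of base R. The helper below is hypothetical and much simpler than the package's reporting: `last_row_report()` takes a vector of log file paths the caller already knows, does not look for symlinks, and does not add the version_path column. The two small logs are made-up demo data.

```r
# Hypothetical helper (not part of vmTools): read each version log and keep
# only its last row, which reflects the folder's current status.
last_row_report <- function(log_paths) {
  rows <- lapply(log_paths, function(path) {
    log <- read.csv(path, colClasses = "character")
    log[nrow(log), , drop = FALSE]
  })
  do.call(rbind, rows)
}

# Demo data: two small logs written to a temporary folder
root <- file.path(tempdir(), "slt_sketch_report")
dir.create(root, recursive = TRUE)
log_a <- file.path(root, "log_a.csv")
log_b <- file.path(root, "log_b.csv")
write.csv(
  data.frame(log_id = 0:1,
             version_name = "2024_02_01.01",
             action = c("create", "promote_keep")),
  log_a, row.names = FALSE
)
write.csv(
  data.frame(log_id = 0L,
             version_name = "2024_02_01.03",
             action = "create"),
  log_b, row.names = FALSE
)

report <- last_row_report(c(log_a, log_b))
report$action
```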
We can generate more reports on the pipeline runs and the status of the folders, based on different needs. These reports are useful for tracking the status of the pipeline runs, and for making decisions about which folders to keep, delete, or promote.
… REPORT_DISCREPANCIES.csv that will show issues with the run-version logs, in case some were edited by hand in ways that could cause problems.
# Generate reports
slt_prep$make_reports()
#> Writing last-row log reports for:
#> /tmp/Rtmp3tDGBK/slt/output_root
#> /tmp/Rtmp3tDGBK/slt/output_root
print_tree(output_root)
#> |-- 2024_02_01.01
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- 2024_02_01.03
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- best
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- keep_2024_02_01.01
#> | |-- logs
#> | | `-- log_version_history.csv
#> | |-- my_model_1.csv
#> | |-- my_model_2.csv
#> | |-- my_model_3.csv
#> | |-- my_model_4.csv
#> | `-- my_model_5.csv
#> |-- log_symlinks_central.csv
#> |-- report_all_logs.csv
#> |-- report_all_logs_non_symlink.csv
#> |-- report_all_logs_symlink.csv
#> `-- report_key_versions.csv
The report_all_logs.csv file is built by scanning every run-version for a log, and shows the log’s last row (current status).
(data.table::fread(file.path(output_root, "report_all_logs.csv")))
#> log_id timestamp user version_name version_path action comment
#> <int> <char> <char> <char> <char> <char> <char>
#> 1: 3 2025_07_24_111337 ssbyrne 2024_02_01.01 /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.01 promote_keep Previous best
#> 2: 1 2025_07_24_111337 ssbyrne 2024_02_01.03 /tmp/Rtmp3tDGBK/slt/output_root/2024_02_01.03 promote_best New best model GBD2023
Two other reports that are sometimes diagnostically helpful are:
- The report_all_logs_symlink.csv file will scan run-version folders for logs of any other symlink type (in case the user hand-creates symlinks).
- The report_all_logs_non_symlink.csv file will scan run-version folders that are not currently marked, and show their current status.