The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
featdelta is an R package for incremental feature
engineering with database persistence. It is designed for workflows
where raw observations live in a database, feature logic is easier to
write and test in R, and the computed features should be stored back in
a database table for modelling, reporting, monitoring, or downstream
reuse.
Instead of rebuilding the same feature table from scratch whenever
new rows arrive, featdelta lets you define feature
expressions in R, fetch only rows that have not yet been processed,
compute the features locally, and upsert the results into a persistent
feature table.
install.packages("featdelta")You can install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("LordRudolf/featdelta")The standard pipeline is:
fd_define();The main orchestration function is fd_run(), which
combines the fetch, compute, and upsert steps.
This example uses an in-memory SQLite database and
mtcars, but the same pattern applies to database tables
selected with ordinary SQL.
library(DBI)
library(RSQLite)
library(featdelta)
cars <- mtcars
cars$id <- seq_len(nrow(cars))
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "raw_cars", cars[1:20, ], overwrite = TRUE)
defs <- fd_define(
transmission = ifelse(am == 1, "automatic", "manual"),
hp_per_cyl = hp / cyl,
wt_per_hp = wt / hp
)
run_day_one <- fd_run(
con = con,
sql = "SELECT * FROM raw_cars ORDER BY id",
defs = defs,
key = "id",
feat_table_name = "car_features"
)
dbGetQuery(con, "SELECT * FROM car_features ORDER BY id")When more raw rows arrive, call the same pipeline again. By default,
fd_run(fetch_mode = "new_only") processes only keys that
are not already in the feature table.
dbAppendTable(con, "raw_cars", cars[21:30, ])
run_day_two <- fd_run(
con = con,
sql = "SELECT * FROM raw_cars ORDER BY id",
defs = defs,
key = "id",
feat_table_name = "car_features"
)
dbGetQuery(con, "SELECT * FROM car_features ORDER BY id")If feature definitions change and existing rows should be recomputed,
use fetch_mode = "all" to refresh the rows returned by the
SQL query.
defs_v2 <- fd_define(
transmission = ifelse(am == 1, "automatic", "manual"),
hp_per_cyl = hp / cyl,
wt_per_hp = wt / hp,
mpg_per_cyl = mpg / cyl
)
fd_run(
con = con,
sql = "SELECT * FROM raw_cars ORDER BY id",
defs = defs_v2,
key = "id",
feat_table_name = "car_features",
fetch_mode = "all"
)For feature logic that naturally produces several columns at once,
use fd_block().
defs <- fd_define(
engine_ratios = fd_block({
data.frame(
hp_per_cyl = hp / cyl,
disp_per_cyl = disp / cyl,
wt_per_hp = wt / hp
)
})
)The package includes vignettes covering the getting-started workflow, feature definition patterns, database pipeline details, production patterns, and scheduled runs.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.