veesa is an R package for implementing the VEESA pipeline for an explainable approach to training machine learning models with functional data inputs. See a preprint manuscript describing the approach on arXiv.
veesa can be installed using either of the commands below.
# CRAN
install.packages("veesa")
# Development version from GitHub
remotes::install_github("sandialabs/veesa")
Keep reading for an example using veesa to implement the VEESA pipeline.
# Load R packages
library(cowplot)
library(dplyr)
library(ggplot2)
library(purrr)
library(randomForest)
library(tidyr)
library(veesa)
# Specify a color palette
color_pal = wesanderson::wes_palette("Zissou1", 5, type = "continuous")
# Specify colors for PC direction plots
col_plus1 = "#784D8C"
col_plus2 = "#A289AE"
col_minus1 = "#EA9B44"
col_minus2 = "#EBBC88"
col_pcdir_1sd = c(col_plus1, "black", col_minus1)
col_pcdir_2sd = c(col_plus2, col_plus1, "black", col_minus1, col_minus2)
Simulate data:
sim_data = simulate_functions(M = 100, N = 75, seed = 20211130)
Separate data into training/testing:
set.seed(20211130)
id = unique(sim_data$id)
M_test = length(id) * 0.25
id_test = sample(x = id, size = M_test, replace = FALSE)
sim_data = sim_data %>% mutate(data = ifelse(id %in% id_test, "test", "train"))
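The split is made on function ids rather than on individual rows, so every observation of a given function lands in the same set. A minimal base R sketch of the same idea on a hypothetical toy data frame (`toy`, `ids`, `n_test`, and `test_ids` are illustrative names, not output from `simulate_functions`):

```r
# Toy long-format data: 8 functions (ids) observed at 5 time points each
toy <- data.frame(
  id = rep(1:8, each = 5),
  t  = rep(seq(0, 1, length.out = 5), times = 8)
)

set.seed(1)
ids      <- unique(toy$id)
n_test   <- length(ids) * 0.25   # hold out 25% of the functions
test_ids <- sample(x = ids, size = n_test, replace = FALSE)
toy$data <- ifelse(toy$id %in% test_ids, "test", "train")

# Cross-tabulate ids against sets: each id appears in exactly one column
table(toy$id, toy$data)
```

Sampling ids first and labelling rows afterwards keeps whole functions together, which is what the later matrix preparation step relies on.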
Simulated functions colored by covariates:
Prepare matrices from the data frames:
prep_matrix <- function(df, train_test) {
  df %>%
    filter(data == train_test) %>%
    select(id, t, y) %>%
    ungroup() %>%
    pivot_wider(id_cols = t,
                names_from = id,
                values_from = y) %>%
    select(-t) %>%
    as.matrix()
}
sim_train_matrix = prep_matrix(df = sim_data, train_test = "train")
sim_test_matrix = prep_matrix(df = sim_data, train_test = "test")
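The reshaping step turns the long data frame into a matrix with one row per time point and one column per function. The same layout can be sketched on a hypothetical toy data frame in base R, here using `tapply` in place of `pivot_wider` (the toy data and object names are illustrative only):

```r
# Toy long-format data: 3 functions observed at 4 time points each
toy <- data.frame(
  id = rep(c("a", "b", "c"), each = 4),
  t  = rep(1:4, times = 3),
  y  = 1:12
)

# Rows index time points, columns index functions;
# each (t, id) cell holds exactly one observation
toy_matrix <- tapply(toy$y, list(toy$t, toy$id), FUN = identity)
dim(toy_matrix)
```

Note that `tapply` orders columns by sorted id, whereas `pivot_wider` keeps ids in order of first appearance; the two agree in this toy case.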
Create a vector of times:
times = sim_data$t %>% unique()
Prepare train data:
train_transformed_jfpca <-
  prep_training_data(
    f = sim_train_matrix,
    time = times,
    fpca_method = "jfpca",
    optim_method = "DPo"
  )
Prepare test data:
test_transformed_jfpca <-
  prep_testing_data(
    f = sim_test_matrix,
    time = times,
    train_prep = train_transformed_jfpca,
    optim_method = "DPo"
  )
Plot several PCs:
Compare jfPCA coefficients from train and test data:
Create response variable:
x1_train <-
  sim_data %>% filter(data == "train") %>%
  select(id, x1) %>%
  distinct() %>%
  pull(x1)
Create data frame with PCs and response for random forest:
rf_jfpca_df <-
  train_transformed_jfpca$fpca_res$coef %>%
  data.frame() %>%
  rename_all(.funs = function(x) stringr::str_replace(x, "X", "pc")) %>%
  mutate(x1 = x1_train) %>%
  select(x1, everything())
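`data.frame()` names the columns of an unnamed matrix `X1`, `X2`, and so on, which is why the renaming step replaces `X` with `pc`. A small sketch with a hypothetical coefficient matrix, using base R's `sub` as a stand-in for `stringr::str_replace`:

```r
# Hypothetical 4 x 3 matrix of PC coefficients (4 functions, 3 PCs)
coef_toy <- matrix(seq_len(12), nrow = 4)

# Converting an unnamed matrix produces default column names X1, X2, X3
df_toy <- data.frame(coef_toy)

# Replace the first "X" in each name with "pc"
names(df_toy) <- sub("X", "pc", names(df_toy))
names(df_toy)
```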
Fit random forest:
set.seed(20211130)
rf_jfpca = randomForest(x1 ~ ., data = rf_jfpca_df)
Compute PFI:
set.seed(20211130)
pfi_jfpca <- compute_pfi(
  x = rf_jfpca_df %>% select(-x1),
  y = rf_jfpca_df$x1,
  f = rf_jfpca,
  K = 10,
  metric = "nmse"
)
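Permutation feature importance (PFI) measures how much a model's predictive performance degrades when the values of one feature are shuffled. The sketch below illustrates the general idea in base R with a linear model and mean squared error. `toy_pfi` is a hypothetical helper written for this illustration; it is not a description of how `compute_pfi` is implemented, and its sign and scaling conventions may differ.

```r
# Hypothetical helper: mean increase in MSE over K permutations of each column
toy_pfi <- function(x, y, model, K = 10) {
  mse <- function(pred) mean((y - pred)^2)
  base_mse <- mse(predict(model, newdata = x))
  sapply(names(x), function(col) {
    reps <- replicate(K, {
      x_perm <- x
      x_perm[[col]] <- sample(x_perm[[col]])  # break the link between col and y
      mse(predict(model, newdata = x_perm)) - base_mse
    })
    mean(reps)
  })
}

set.seed(1)
n <- 200
x <- data.frame(pc1 = rnorm(n), pc2 = rnorm(n))
y <- 3 * x$pc1 + rnorm(n, sd = 0.1)           # only pc1 drives the response
fit <- lm(y ~ ., data = cbind(y = y, x))

imp <- toy_pfi(x, y, fit)
imp
```

In this toy setup, shuffling `pc1` should hurt the predictions far more than shuffling `pc2`, so `pc1` receives the larger importance. Averaging over `K` repetitions reduces the noise introduced by any single random permutation.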
PFI results (mean of reps):
PFI results (variability across reps):
Identify the top PC for each elastic fPCA method:
top_pc_jfpca <-
  data.frame(pfi = pfi_jfpca$pfi) %>%
  mutate(pc = 1:n()) %>%
  arrange(desc(pfi)) %>%
  slice(1) %>%
  pull(pc)
Principal directions of top PC for each jfPCA method: