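The ma_projection() function implements a model-assisted projection estimator for combining information from two independent surveys. This method is especially useful in survey sampling scenarios where one survey (Survey 2) observes both the outcome variable and the auxiliary variables, while the other (Survey 1) collects only the auxiliary variables.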
This vignette illustrates how to use ma_projection() for domain-level estimation using various supervised learning models, including machine learning techniques via the parsnip interface.
The approach follows the work of Kim & Rao (2012), where a working model is trained on Survey 2 to predict the outcome variable. Predictions are made for the auxiliary-only Survey 1 data. These predictions are then aggregated by domain to generate small area estimates.
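The three steps above (fit on Survey 2, predict for Survey 1, aggregate by domain) can be sketched in base R. This is only an illustration on simulated data: the variable names, sample sizes, and the plain lm() working model are assumptions made for the example, and it leaves out variance estimation and the cluster and strata information that ma_projection() takes as arguments.

```r
# Simulated data only: names and sizes are made up for illustration.
set.seed(1)

# Survey 2: outcome y and auxiliary variable x are both observed
n2 <- 200
svy2 <- data.frame(x = rnorm(n2))
svy2$y <- 2 + 3 * svy2$x + rnorm(n2)

# Survey 1: auxiliary x, a domain label, and a survey weight (no y)
n1 <- 1000
svy1 <- data.frame(
  x      = rnorm(n1),
  domain = sample(c("A", "B", "C"), n1, replace = TRUE),
  w      = runif(n1, 50, 150)
)

# Step 1: fit the working model on Survey 2
fit <- lm(y ~ x, data = svy2)

# Step 2: predict the outcome for every Survey 1 unit
svy1$y_hat <- predict(fit, newdata = svy1)

# Step 3: weighted mean of the predictions within each domain
proj_est <- sapply(
  split(svy1, svy1$domain),
  function(d) sum(d$w * d$y_hat) / sum(d$w)
)
proj_est
```

Each element of proj_est is a domain mean computed entirely from predicted values, which is why the outcome never needs to be observed in Survey 1.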
# Load the packages used in the examples
# (sae.projection provides ma_projection() and the df_svy22 / df_svy23 data)
library(sae.projection)
library(dplyr)
library(parsnip)

# Filter non-missing values for income
svy22_income <- df_svy22 %>% filter(!is.na(income))
svy23_income <- df_svy23 %>% filter(!is.na(income))

# Fit projection model
lm_result <- ma_projection(
  income ~ age + sex + edu + disability,
  cluster_ids = "PSU",
  weight = "WEIGHT",
  strata = "STRATA",
  domain = c("PROV", "REGENCY"),
  working_model = linear_reg(),
  data_model = svy22_income,
  data_proj = svy23_income,
  nest = TRUE
)

# View results
head(lm_result$df_result)
# Filter youth population for NEET classification
svy22_neet <- df_svy22 %>% filter(between(age, 15, 24))
svy23_neet <- df_svy23 %>% filter(between(age, 15, 24))

# Fit logistic regression model
lr_result <- ma_projection(
  formula = neet ~ sex + edu + disability,
  cluster_ids = ~ PSU,
  weight = ~ WEIGHT,
  strata = ~ STRATA,
  domain = ~ PROV + REGENCY,
  working_model = logistic_reg(),
  data_model = svy22_neet,
  data_proj = svy23_neet,
  nest = TRUE
)

# View results
head(lr_result$df_result)
# The "lightgbm" engine is registered by the bonsai package;
# tune() placeholders come from the tune package
library(bonsai)
library(tune)

# Define LightGBM model with tuning
lgbm_model <- boost_tree(
  mtry = tune(), trees = tune(), min_n = tune(),
  tree_depth = tune(), learn_rate = tune(),
  engine = "lightgbm"
)

# Fit with cross-validation
lgbm_result <- ma_projection(
  formula = neet ~ sex + edu + disability,
  cluster_ids = "PSU",
  weight = "WEIGHT",
  strata = "STRATA",
  domain = c("PROV", "REGENCY"),
  working_model = lgbm_model,
  data_model = svy22_neet,
  data_proj = svy23_neet,
  cv_folds = 3,
  tuning_grid = 5,
  nest = TRUE
)

# View results
head(lgbm_result$df_result)
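To give a feel for what cross-validated tuning does, here is a minimal base R sketch of grid search with 3 folds and 5 candidate settings. It assumes that cv_folds is the number of folds and tuning_grid the number of candidates, which is an interpretation of the argument names rather than something stated here. It does not reproduce how ma_projection() tunes internally; a polynomial degree stands in for the boosting hyperparameters, and the data are simulated.

```r
# Simulated data only; the polynomial degree is a stand-in hyperparameter.
set.seed(42)
n <- 150
dat <- data.frame(x = runif(n, -2, 2))
dat$y <- sin(dat$x) + rnorm(n, sd = 0.3)

k <- 3        # number of folds
grid <- 1:5   # five candidate hyperparameter values

# Randomly assign each row to one of k folds of near-equal size
fold_id <- sample(rep(seq_len(k), length.out = n))

# For each candidate, average the held-out RMSE over the k folds
cv_rmse <- sapply(grid, function(deg) {
  fold_rmse <- sapply(seq_len(k), function(f) {
    train <- dat[fold_id != f, ]
    test  <- dat[fold_id == f, ]
    fit   <- lm(y ~ poly(x, deg), data = train)
    sqrt(mean((test$y - predict(fit, newdata = test))^2))
  })
  mean(fold_rmse)
})

# Keep the candidate with the lowest cross-validated error
best_degree <- grid[which.min(cv_rmse)]
```

The selected candidate would then be refit on the full modelling data before predictions are made for the projection survey.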
ma_projection() supports many working models using the parsnip interface, including:
- linear_reg(), logistic_reg() (also with Stan engine)
- poisson_reg(), mlp(), naive_bayes(), nearest_neighbor()
- decision_tree(), bag_tree(), boost_tree() with LightGBM/XGBoost, rand_forest() (ranger, aorsf), bart()
- svm_linear(), svm_poly(), svm_rbf()
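As a short sketch of how a non-default engine is chosen for some of the models listed above, the specifications below use standard parsnip calls. The hyperparameter values are arbitrary, and the object names are made up for the example. Whether a mode has to be set explicitly before passing a specification to ma_projection() is not stated in the text, so it is left unset here.

```r
library(parsnip)

# Bayesian logistic regression through the Stan engine
stan_spec <- logistic_reg(engine = "stan")

# Random forest through ranger (500 trees chosen arbitrarily)
rf_spec <- rand_forest(trees = 500) %>%
  set_engine("ranger")

# Gradient boosting through XGBoost (values chosen arbitrarily)
xgb_spec <- boost_tree(trees = 200, learn_rate = 0.1) %>%
  set_engine("xgboost")
```

Creating a specification does not fit anything; the engine package (rstanarm, ranger, or xgboost) is only needed once the model is actually fitted, for example when the specification is supplied as working_model.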
Kim, J. K., & Rao, J. N. (2012). Combining data from two independent surveys: a model-assisted approach. Biometrika, 99(1), 85–100. doi:10.1093/biomet/asr063
ma_projection() provides a flexible and robust way to combine survey data using modern modeling tools. It supports a wide range of use cases including socioeconomic indicators, health estimates, and more.