sae.projection: A Model-Assisted Projection Estimator for Combining Independent Surveys

Ridson Al Farizal P (ridsonap@bps.go.id)

2025-07-06

Introduction

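The ma_projection() function implements a model-assisted projection estimator for combining information from two independent surveys. The method is especially useful when one survey (Survey 1, typically the larger one) collects only auxiliary variables, while the other (Survey 2) collects both the outcome variable and the same auxiliary variables.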

This vignette illustrates how to use ma_projection() for domain-level estimation using various supervised learning models, including machine learning techniques via the parsnip interface.

Method Overview

The approach follows the work of Kim & Rao (2012), where a working model is trained on Survey 2 to predict the outcome variable. Predictions are made for the auxiliary-only Survey 1 data. These predictions are then aggregated by domain to generate small area estimates.
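
The three steps above (train on Survey 2, predict for Survey 1, aggregate by domain) can be sketched in base R. This is only a toy illustration on simulated data, not the internals of ma_projection(): the data frames svy1 and svy2, the columns x, y, domain and w, and the use of a weighted mean are all assumptions made for the example. It ignores the sampling design beyond the weights and omits variance estimation and any bias correction.

```r
# Toy sketch of a projection estimator (simulated data, base R only).
# All object and column names here are hypothetical.
set.seed(1)

# Survey 2 (small): outcome y and auxiliary x are both observed
svy2 <- data.frame(x = runif(200, 0, 10))
svy2$y <- 2 + 3 * svy2$x + rnorm(200)

# Survey 1 (large): only auxiliary x, a domain label and a design weight
svy1 <- data.frame(
  x = runif(1000, 0, 10),
  domain = sample(c("A", "B", "C"), 1000, replace = TRUE),
  w = runif(1000, 50, 150)
)

# Step 1: train the working model on Survey 2
fit <- lm(y ~ x, data = svy2)

# Step 2: predict the outcome for every Survey 1 unit
svy1$y_hat <- predict(fit, newdata = svy1)

# Step 3: aggregate the predictions by domain as weighted means
proj_est <- sapply(
  split(svy1, svy1$domain),
  function(d) weighted.mean(d$y_hat, d$w)
)
proj_est
```

The result is a named vector with one projected mean per domain. In practice the design information (clusters, strata) matters for the standard errors, which this sketch does not attempt.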

Required Packages

library(sae.projection)
library(dplyr)
library(tidymodels)
library(bonsai)  # provides the LightGBM engine for parsnip

Example: Income Estimation Using Linear Regression

# Filter non-missing values for income
svy22_income <- df_svy22 %>% filter(!is.na(income))
svy23_income <- df_svy23 %>% filter(!is.na(income))

# Fit projection model
lm_result <- ma_projection(
  formula = income ~ age + sex + edu + disability,
  cluster_ids = "PSU",
  weight = "WEIGHT",
  strata = "STRATA",
  domain = c("PROV", "REGENCY"),
  working_model = linear_reg(),
  data_model = svy22_income,
  data_proj = svy23_income,
  nest = TRUE
)

# View results
head(lm_result$df_result)

Example: Binary Outcome Using Logistic Regression

# Filter youth population for NEET classification
svy22_neet <- df_svy22 %>% filter(between(age, 15, 24))
svy23_neet <- df_svy23 %>% filter(between(age, 15, 24))

# Fit logistic regression model
lr_result <- ma_projection(
  formula = neet ~ sex + edu + disability,
  cluster_ids = ~ PSU,
  weight = ~ WEIGHT,
  strata = ~ STRATA,
  domain = ~ PROV + REGENCY,
  working_model = logistic_reg(),
  data_model = svy22_neet,
  data_proj = svy23_neet,
  nest = TRUE
)

# View results
head(lr_result$df_result)
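
For a binary outcome, one plausible way to form a domain-level proportion is to average the predicted probabilities over the projection survey using the design weights. The base R sketch below illustrates that idea on simulated data. It is an assumption for illustration: whether ma_projection() aggregates probabilities or predicted classes is not stated in this vignette, and the names svy1, svy2, x, y, domain and w are hypothetical.

```r
# Toy sketch: projecting a binary outcome (simulated data, base R only).
# All object and column names here are hypothetical.
set.seed(2)

# Survey 2: binary outcome y and auxiliary x are both observed
svy2 <- data.frame(x = rnorm(300))
svy2$y <- rbinom(300, 1, plogis(-0.5 + 1.2 * svy2$x))

# Survey 1: auxiliary x, a domain label and a design weight only
svy1 <- data.frame(
  x = rnorm(1500),
  domain = sample(c("D1", "D2"), 1500, replace = TRUE),
  w = runif(1500, 20, 80)
)

# Train a logistic working model on Survey 2
fit <- glm(y ~ x, family = binomial(), data = svy2)

# Predicted probabilities for every Survey 1 unit
svy1$p_hat <- predict(fit, newdata = svy1, type = "response")

# Weighted proportion per domain: sum(w * p_hat) / sum(w)
prop_est <- tapply(svy1$w * svy1$p_hat, svy1$domain, sum) /
  tapply(svy1$w, svy1$domain, sum)
prop_est
```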

Example: LightGBM with Hyperparameter Tuning

# Define LightGBM model with tuning
lgbm_model <- boost_tree(
  mtry = tune(), trees = tune(), min_n = tune(),
  tree_depth = tune(), learn_rate = tune(),
  engine = "lightgbm"
)

# Fit with cross-validation
lgbm_result <- ma_projection(
  formula = neet ~ sex + edu + disability,
  cluster_ids = "PSU",
  weight = "WEIGHT",
  strata = "STRATA",
  domain = c("PROV", "REGENCY"),
  working_model = lgbm_model,
  data_model = svy22_neet,
  data_proj = svy23_neet,
  cv_folds = 3,
  tuning_grid = 5,
  nest = TRUE
)

# View results
head(lgbm_result$df_result)
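
The arguments cv_folds = 3 and tuning_grid = 5 presumably request 3-fold cross-validation over 5 candidate hyperparameter combinations; that reading is an assumption based on the argument names and on how tidymodels tuning usually works. The generic idea can be sketched in base R, here tuning the degree of a polynomial regression rather than LightGBM. The data, the candidate grid and the RMSE criterion are all hypothetical choices for the example.

```r
# Toy sketch of k-fold cross-validated grid search (base R only).
# This is not how ma_projection() tunes LightGBM; it only shows the idea.
set.seed(42)
n <- 300
dat <- data.frame(x = runif(n, -2, 2))
dat$y <- sin(2 * dat$x) + rnorm(n, sd = 0.3)

k <- 3        # number of folds (compare cv_folds = 3)
grid <- 1:5   # five candidate polynomial degrees (compare tuning_grid = 5)

# Randomly assign each row to one of the k folds
fold <- sample(rep(seq_len(k), length.out = n))

# For each candidate, average the held-out RMSE over the folds
cv_rmse <- sapply(grid, function(deg) {
  fold_rmse <- sapply(seq_len(k), function(f) {
    train <- dat[fold != f, ]
    test <- dat[fold == f, ]
    fit <- lm(y ~ poly(x, deg), data = train)
    sqrt(mean((test$y - predict(fit, newdata = test))^2))
  })
  mean(fold_rmse)
})

# Pick the candidate with the lowest cross-validated error, then refit on all data
best_deg <- grid[which.min(cv_rmse)]
final_fit <- lm(y ~ poly(x, best_deg), data = dat)
```

The refit on the full data mirrors the usual practice of finalizing a model with the selected hyperparameters before using it for prediction.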

Supported Models

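ma_projection() accepts working models specified through the parsnip interface, including those used in this vignette: linear_reg(), logistic_reg(), and boost_tree() with the "lightgbm" engine.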

References

Kim, J. K., & Rao, J. N. K. (2012). Combining data from two independent surveys: a model-assisted approach. Biometrika, 99(1), 85–100. doi:10.1093/biomet/asr063

Conclusion

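ma_projection() combines data from two independent surveys through a working model specified with parsnip, ranging from linear and logistic regression to tuned LightGBM. The same workflow applies to a wide range of domain-level estimates, including socioeconomic indicators such as income and NEET status, health estimates, and more.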
