
BRM on the house dataset (regression)

Overview

A third regression demonstration, this time on the King County house-sales dataset (21,597 rows, 17 variables), which ships in full with the package. The response is price; the remaining 16 columns are the predictors.

library(blockwise)
data(house)
str(house)
#> 'data.frame':    21597 obs. of  17 variables:
#>  $ bedrooms     : int  3 3 2 4 3 4 3 3 3 3 ...
#>  $ bathrooms    : num  1 2.25 1 3 2 4.5 2.25 1.5 1 2.5 ...
#>  $ sqft_living  : int  1180 2570 770 1960 1680 5420 1715 1060 1780 1890 ...
#>  $ sqft_lot     : int  5650 7242 10000 5000 8080 101930 6819 9711 7470 6560 ...
#>  $ floors       : num  1 2 1 1 1 1 2 1 1 2 ...
#>  $ waterfront   : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ view         : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ condition    : int  3 3 3 5 3 3 3 3 3 3 ...
#>  $ grade        : int  7 7 6 7 8 11 7 7 7 7 ...
#>  $ sqft_above   : int  1180 2170 770 1050 1680 3890 1715 1060 1050 1890 ...
#>  $ sqft_basement: int  0 400 0 910 0 1530 0 0 730 0 ...
#>  $ yr_built     : int  1955 1951 1933 1965 1987 2001 1995 1963 1960 2003 ...
#>  $ yr_renovated : int  0 1991 0 0 0 0 0 0 0 0 ...
#>  $ zip          : int  981 981 980 981 980 980 980 981 981 980 ...
#>  $ sqft_living15: int  1340 1690 2720 1360 1800 4760 2238 1650 1780 2390 ...
#>  $ sqft_lot15   : int  5650 7639 8062 5000 7503 101930 6819 9711 8113 7570 ...
#>  $ price        : num  221900 538000 180000 604000 510000 ...

Induce missingness, split, fit

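# Induce blockwise missingness in two groups of columns:
# a size block (3 columns) and a layout/quality block (4 columns).
house_miss <- simulate_blockwise_missing(
  house,
  blocks = list(
    c("sqft_living", "sqft_lot", "sqft_above"),
    c("bedrooms", "bathrooms", "floors", "grade")
  ),
  prop_missing = 0.30,
  noise        = 0.05
)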

# 75/25 train/test split on row indices.
set.seed(1234)
idx <- sample(nrow(house_miss), floor(0.75 * nrow(house_miss)))
train <- house_miss[idx, ]
test  <- house_miss[-idx, ]

# Separate the predictors from the response (price).
X_train <- train[, setdiff(names(train), "price")]
y_train <- train$price
X_test  <- test[,  setdiff(names(test),  "price")]
y_test  <- test$price

set.seed(1234)
fit <- brm(X_train, y_train, learner = learner_lm())
fit
#> Blockwise Reduced Model (BRM)
#>   blocks        : 4 
#>   overlap       : TRUE 
#>   learner type  : regression 
#>   features      : 16 
#>   cols / block  : 16, 13, 12, 9
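
The printed summary is consistent with one sub-model per observed missingness pattern. Two column groups that can each be present or absent give four patterns, and dropping the 3-column group, the 4-column group, or both from the 16 predictors leaves 16, 13, 12 and 9 columns. This reading is inferred from the output above, not taken from the package documentation. The base-R sketch below counts patterns on a small made-up data frame; `toy`, `pattern`, `pattern_counts` and `cols_per_pattern` are hypothetical names, and no `blockwise` function is called.

```r
# Toy data: columns a and b form one block, c forms another, d is always observed.
toy <- data.frame(
  a = c(1, NA, 3, NA, 5, 6),
  b = c(2, NA, 4, NA, 6, 7),
  c = c(NA, NA, 1, 2, 3, NA),
  d = 1:6
)

# Encode each row's missingness as a string of 0/1 flags (1 = missing).
pattern <- apply(is.na(toy), 1, function(r) paste(as.integer(r), collapse = ""))

# Number of rows that share each distinct pattern.
pattern_counts <- table(pattern)
print(pattern_counts)

# Number of observed columns available to a model fitted on each pattern.
cols_per_pattern <- sapply(
  names(pattern_counts),
  function(p) sum(strsplit(p, "")[[1]] == "0")
)
print(cols_per_pattern)
```

On the vignette's data, the same idea would be applied to `is.na(X_train)`; whether the resulting pattern count matches the `blocks` value reported by `brm()` depends on how the package groups patterns internally.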

pred <- predict(fit, X_test)
cat("RMSE:", round(sqrt(mean((y_test - pred)^2)), 0), "\n")
#> RMSE: 217514
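
An RMSE in dollars is easier to judge against a naive reference. The following is a minimal sketch, assuming a mean-only baseline (predict the training mean for every test row) is an acceptable yardstick. The helper `rmse()` and the toy vectors are hypothetical and not part of `blockwise`; the numbers are made up for illustration.

```r
# Root mean squared error between observed and predicted values.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Made-up prices standing in for y_train, y_test and pred.
y_train_toy <- c(200000, 300000, 400000)
y_test_toy  <- c(250000, 350000)
pred_toy    <- c(240000, 370000)

# Baseline: every test row is predicted as the training mean.
baseline <- rep(mean(y_train_toy), length(y_test_toy))

rmse_model    <- rmse(y_test_toy, pred_toy)
rmse_baseline <- rmse(y_test_toy, baseline)
cat("model RMSE:", round(rmse_model), " baseline RMSE:", round(rmse_baseline), "\n")
```

With the objects from this vignette, the analogous comparison would be `rmse(y_test, pred)` against `rmse(y_test, rep(mean(y_train), length(y_test)))`, assuming `price` contains no missing values.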

Citation

Srinivasan, K., Currim, F., and Ram, S. (2025). A Reduced Modeling Approach for Making Predictions With Incomplete Data Having Blockwise Missing Patterns. INFORMS Journal on Data Science.
