The goal of this document is to introduce `kms` (as in `keras_model_sequential()`), a regression-style function which allows users to call `keras` neural nets with R `formula` objects (hence, `library(kerasformula)`).
First, make sure that `keras` is properly configured:
```r
install.packages("keras")
library(keras)
install_keras()   # see https://keras.rstudio.com/ for details
```
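If you are unsure whether that worked, `library(keras)` also provides `is_keras_available()`, which reports whether the Python side of `keras` can be found:

```r
library(keras)
is_keras_available()   # returns TRUE once keras is properly configured
```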
`kms` splits training and test data into sparse matrices. `kms` also auto-detects whether the dependent variable is categorical, binary, or continuous. `kms` accepts the major parameters found in `library(keras)` as inputs (loss function, batch size, number of epochs, etc.) and allows users to customize basic neural nets (dense neural nets of various input shapes and dropout rates). The final example below also shows how to pass a compiled `keras_model_sequential` to `kms` (preferable for more complex models).
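To make the interface concrete, here is a minimal sketch using `mtcars` (hypothetical example data, not part of this document). `Nepochs`, `seed`, and `scale` appear in the examples below; `batch_size` and `pTraining` are my assumptions about which `keras`-level parameters pass through, so check `?kms` for the exact argument list:

```r
library(kerasformula)

# mpg is continuous, so kms() should detect a regression problem;
# a binary or categorical outcome would instead yield a classifier.
out <- kms("mpg ~ cyl + wt + hp", data = mtcars,
           Nepochs = 15,       # as in the examples below
           batch_size = 32,    # assumed pass-through to keras; see ?kms
           pTraining = 0.8)    # assumed name for the training share; see ?kms
out$evaluations                # out-of-sample fit
```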
This example works with some of the `imdb` movie review data that comes with `library(keras)`. Specifically, this example compares the default dense model that `kms` generates to the `lstm` model described here. To expedite package building and installation, the code below is not actually run, but it can be run in under six minutes on a 2017 MacBook Pro with 16 GB of RAM (the majority of that time is spent on the `lstm`).
```r
max_features <- 5000   # 5,000 words (ranked by popularity) found in movie reviews
maxlen <- 50           # cut texts after 50 words (among top max_features most common words)
Nsample <- 1000

cat('Loading data...\n')
imdb <- keras::dataset_imdb(num_words = max_features)

imdb_df <- as.data.frame(cbind(c(imdb$train$y, imdb$test$y),
                               pad_sequences(c(imdb$train$x, imdb$test$x),
                                             maxlen = maxlen)))   # truncate/pad to maxlen
```
```r
set.seed(2017)   # can also set kms(..., seed = 2017)
demo_sample <- sample(nrow(imdb_df), Nsample)

P <- ncol(imdb_df) - 1
colnames(imdb_df) <- c("y", paste0("x", 1:P))

out_dense <- kms("y ~ .", data = imdb_df[demo_sample, ], Nepochs = 10,
                 scale = NULL)   # scale = NULL means leave data on original scale
```
```r
plot(out_dense$history)   # incredibly useful:
# choose Nepochs to maximize out-of-sample accuracy
```
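The same information can be read off numerically. A sketch, assuming `history` follows the usual `keras_training_history` layout, with per-epoch vectors stored under `$metrics` and the old-style `val_acc` name:

```r
val_acc <- out_dense$history$metrics$val_acc   # validation accuracy, one entry per epoch
which.max(val_acc)                             # epoch with the best out-of-sample accuracy
```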
```r
out_dense$confusion
##      1
##  0 107
##  1 105
cat('Test accuracy:', out_dense$evaluations$acc, "\n")
## Test accuracy: 0.495283
```
Pretty bad: that's a 'broken clock' model that predicts the same class every time. Suppose we want to add some more layers. Below are the default settings for `layers`, apart from an additional softmax layer. Notice that in `layers` below, anything that appears only once is repeated for each layer as appropriate.
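For instance, `use_bias = TRUE` in the call below stands in for one `TRUE` per layer; spelled out, the two forms should be equivalent (an illustrative sketch, not run):

```r
# shorthand, as used in the call below
layers <- list(units = c(512, 256, 128, NA), use_bias = TRUE)

# what the shorthand expands to: one entry per layer
layers <- list(units = c(512, 256, 128, NA),
               use_bias = c(TRUE, TRUE, TRUE, TRUE))
```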
```r
out_dense <- kms("y ~ .", data = imdb_df[demo_sample, ], Nepochs = 10,
                 seed = 123, scale = NULL,
                 layers = list(units = c(512, 256, 128, NA),
                               activation = c("softmax", "relu", "relu", "softmax"),
                               dropout = c(0.75, 0.4, 0.3, NA),
                               use_bias = TRUE,
                               kernel_initializer = NULL,
                               kernel_regularizer = "regularizer_l1",
                               bias_regularizer = "regularizer_l1",
                               activity_regularizer = "regularizer_l1"))
```
```r
out_dense$confusion
##      1
##  0  92
##  1 106
cat('Test accuracy:', out_dense$evaluations$acc, "\n")
## Test accuracy: 0.4816514
```
No progress. Suppose we want to build an `lstm` model and pass it to `kms`.
```r
use_session_with_seed(12345)
k <- keras_model_sequential()
k %>%
  layer_embedding(input_dim = max_features, output_dim = 128) %>%
  layer_lstm(units = 64, dropout = 0.2, recurrent_dropout = 0.2) %>%
  layer_dense(units = 1, activation = 'sigmoid')

k %>% compile(
  loss = 'binary_crossentropy',
  optimizer = 'adam',
  metrics = c('accuracy')
)
```
```r
out_lstm <- kms("y ~ .", imdb_df[demo_sample, ],
                keras_model_seq = k, Nepochs = 10, seed = 12345, scale = NULL)
out_lstm$confusion
##      0  1
##  0  74 23
##  1  23 79
cat('Test accuracy:', out_lstm$evaluations$acc, "\n")
## Test accuracy: 0.7688442
```
76.8% out-of-sample accuracy. That's a marked improvement!
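As a sanity check, the reported accuracy can be recovered from the confusion matrix itself, assuming `confusion` is stored as a base R table: (74 + 79) / 199 = 0.7688442.

```r
sum(diag(out_lstm$confusion)) / sum(out_lstm$confusion)   # (74 + 79) / 199
## [1] 0.7688442
```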
If you're OK with `->` (right assignment), the above is equivalent to:
```r
use_session_with_seed(12345)
keras_model_sequential() %>%
  layer_embedding(input_dim = max_features, output_dim = 128) %>%
  layer_lstm(units = 64, dropout = 0.2, recurrent_dropout = 0.2) %>%
  layer_dense(units = 1, activation = 'sigmoid') %>%
  compile(loss = 'binary_crossentropy',
          optimizer = 'adam', metrics = c('accuracy')) %>%
  kms(input_formula = "y ~ .", data = imdb_df[demo_sample, ],
      Nepochs = 10, seed = 12345, scale = NULL) ->
  out_lstm

plot(out_lstm$history)
```
For another worked example starting with raw data (from `rtweet`), visit here.