The hardware and bandwidth for this mirror is donated by METANET, the Webhosting and Full Service-Cloud Provider.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]metanet.ch.
Please cite Keras in your publications if it helps your research. Here is an example BibTeX entry:
@misc{chollet2017kerasR,
title={R Interface to Keras},
author={Chollet, Fran\c{c}ois and Allaire, JJ and others},
year={2017},
publisher={GitHub},
howpublished={\url{https://github.com/rstudio/keras}},
}
Note that installation and configuration of the GPU-based backends can take considerably more time and effort. So if you are just getting started with Keras you may want to stick with the CPU version initially, then install the appropriate GPU version once your training becomes more computationally demanding.
Below are instructions for installing and enabling GPU support for the various supported backends.
If your system has an NVIDIA® GPU and you have the GPU version of TensorFlow installed then your Keras code will automatically run on the GPU.
Additional details on GPU installation can be found here: https://tensorflow.rstudio.com/installation_gpu.html.
If you are running on the Theano backend, you can set the
THEANO_FLAGS
environment variable to indicate you’d like to
execute tensor operations on the GPU. For example:
Sys.setenv(KERAS_BACKEND = "keras")
Sys.setenv(THEANO_FLAGS = "device=gpu,floatX=float32")
library(keras)
The name ‘gpu’ might have to be changed depending on your device’s
identifier (e.g. gpu0
, gpu1
, etc).
If you have the GPU version of CNTK installed then your Keras code will automatically run on the GPU.
Additional information on installing the GPU version of CNTK can be found here: https://learn.microsoft.com/en-us/cognitive-toolkit/setup-linux-python
We recommend doing so using the TensorFlow backend. There are two ways to run a single model on multiple GPUs: data parallelism and device parallelism.
In most cases, what you need is most likely data parallelism.
Data parallelism consists in replicating the target model once on
each device, and using each replica to process a different fraction of
the input data. Keras has a built-in utility,
multi_gpu_model()
, which can produce a data-parallel
version of any model, and achieves quasi-linear speedup on up to 8
GPUs.
For more information, see the documentation for multi_gpu_model. Here is a quick example:
# Replicates `model` on 8 GPUs.
# This assumes that your machine has 8 available GPUs.
parallel_model <- multi_gpu_model(model, gpus=8)
parallel_model %>% compile(
loss = "categorical_crossentropy",
optimizer = "rmsprop"
)
# This `fit` call will be distributed on 8 GPUs.
# Since the batch size is 256, each GPU will process 32 samples.
parallel_model %>% fit(x, y, epochs = 20, batch_size = 256)
Device parallelism consists in running different parts of a same model on different devices. It works best for models that have a parallel architecture, e.g. a model with two branches.
This can be achieved by using TensorFlow device scopes. Here is a quick example:
# Model where a shared LSTM is used to encode two different sequences in parallel
input_a <- layer_input(shape = c(140, 256))
input_b <- layer_input(shape = c(140, 256))
shared_lstm <- layer_lstm(units = 64)
# Process the first sequence on one GPU
library(tensorflow)
with(tf$device_scope("/gpu:0", {
encoded_a <- shared_lstm(tweet_a)
}):
# Process the next sequence on another GPU
with(tf$device_scope("/gpu:1", {
encoded_b <- shared_lstm(tweet_b)
}):
# Concatenate results on CPU
with(tf$device_scope("/cpu:0", {
merged_vector <- layer_concatenate(list(encoded_a, encoded_b))
}):
Below are some common definitions that are necessary to know and understand to correctly utilize Keras:
evaluation_data
or
evaluation_split
with the fit
method of Keras
models, evaluation will be run at the end of every
epoch.Unlike most R objects, Keras objects are “mutable”. That means that when you modify an object you’re modifying it “in place”, and you don’t need to assign the updated object back to the original name. For example, to add layers to a Keras model you might use this code:
model %>%
layer_dense(units = 32, activation = 'relu', input_shape = c(784)) %>%
layer_dense(units = 10, activation = 'softmax')
Rather than this code:
model <- model %>%
layer_dense(units = 32, activation = 'relu', input_shape = c(784)) %>%
layer_dense(units = 10, activation = 'softmax')
You need to be aware of this because it makes the Keras API a little different than most other pipelines you may have used, but it’s necessary to match the data structures and behavior of the underlying Keras library.
You can use save_model_hdf5()
to save a Keras model into
a single HDF5 file which will contain:
You can then use load_model_hdf5()
to reinstantiate your
model. load_model_hdf5()
will also take care of compiling
the model using the saved training configuration (unless the model was
never compiled in the first place).
Example:
If you only need to save the architecture of a model, and not its weights or its training configuration, you can do:
The generated JSON / YAML files are human-readable and can be manually edited if needed.
You can then build a fresh model from this data:
If you need to save the weights of a model, you can do so in HDF5 with the code below.
Assuming you have code for instantiating your model, you can then load the weights you saved into a model with the same architecture:
If you need to load weights into a different architecture (with some layers in common), for instance for fine-tuning or transfer-learning, you can load weights by layer name:
For example:
# assuming the original model looks like this:
# model <- keras_model_sequential()
# model %>%
# layer_dense(units = 2, input_dim = 3, name = "dense 1") %>%
# layer_dense(units = 3, name = "dense_3") %>%
# ...
# save_model_weights(model, fname)
# new model
model <- keras_model_sequential()
model %>%
layer_dense(units = 2, input_dim = 3, name = "dense 1") %>% # will be loaded
layer_dense(units = 3, name = "dense_3") # will not be loaded
# load weights from first model; will only affect the first layer, dense_1.
load_model_weights(fname, by_name = TRUE)
A Keras model has two modes: training and testing. Regularization mechanisms, such as Dropout and L1/L2 weight regularization, are turned off at testing time.
Besides, the training loss is the average of the losses over each batch of training data. Because your model is changing over time, the loss over the first batches of an epoch is generally higher than over the last batches. On the other hand, the testing loss for an epoch is computed using the model as it is at the end of the epoch, resulting in a lower loss.
One simple way is to create a new Model
that will output
the layers that you are interested in:
To provide training or evaluation data incrementally you can write an
R generator function that yields batches of training data then pass the
function to the fit_generator()
function (or related
functions evaluate_generator()
and
predict_generator()
.
The output of generator functions must be a list of one of these forms:
All arrays should contain the same number of samples. The generator is expected to loop over its data indefinitely. For example, here’s simple generator function that yields randomly sampled batches of data:
sampling_generator <- function(X_data, Y_data, batch_size) {
function() {
rows <- sample(1:nrow(X_data), batch_size, replace = TRUE)
list(X_data[rows,], Y_data[rows,])
}
}
model %>%
fit_generator(sampling_generator(X_train, Y_train, batch_size = 128),
steps_per_epoch = nrow(X_train) / 128, epochs = 10)
The steps_per_epoch
parameter indicates the number of
steps (batches of samples) to yield from generator before declaring one
epoch finished and starting the next epoch. It should typically be equal
to the number of unique samples if your dataset divided by the batch
size.
The above example doesn’t however address the use case of datasets that don’t fit in memory. Typically to do that you’ll write a generator that reads from another source (e.g. a sparse matrix or file(s) on disk) and maintains an offset into that data as it’s called repeatedly. For example, imagine you have a set of text files in a directory you want to read from:
data_files_generator <- function(dir) {
files < list.files(dir)
next_file <- 0
function() {
# move to the next file (note the <<- assignment operator)
next_file <<- next_file + 1
# if we've exhausted all of the files then start again at the
# beginning of the list (keras generators need to yield
# data infinitely -- termination is controlled by the epochs
# and steps_per_epoch arguments to fit_generator())
if (next_file > length(files))
next_file <<- 1
# determine the file name
file <- files[[next_file]]
# process and return the data in the file. note that in a
# real example you'd further subdivide the data within the
# file into appropriately sized training batches. this
# would make this function much more complicated so we
# don't demonstrated it here
file_to_training_data(file)
}
}
The above function is an example of a stateful generator—the function
maintains information across calls to keep track of which data to
provide next. This is accomplished by defining shared state outside the
generator function body and using the <<-
operator to
assign to it from within the generator.
You can also use the flow_images_from_directory()
and
flow_images_from_data()
functions along with
fit_generator()
for training on sets of images stored on
disk (with optional image augmentation/normalization via
image_data_generator()
).
You can also do batch training using the
train_on_batch()
and test_on_batch()
functions. These functions enable you to write a training loop that
reads into memory only the data required for each batch.
You can use an early stopping callback:
early_stopping <- callback_early_stopping(monitor = 'val_loss', patience = 2)
model %>% fit(X, y, validation_split = 0.2, callbacks = c(early_stopping))
Find out more in the callbacks documentation.
If you set the validation_split
argument in
fit
to e.g. 0.1, then the validation data used will be the
last 10% of the data. If you set it to 0.25, it will be the
last 25% of the data, etc. Note that the data isn’t shuffled before
extracting the validation split, so the validation is literally just the
last x% of samples in the input you passed.
The same validation set is used for all epochs (within a same call to
fit
).
Yes, if the shuffle
argument in fit
is set
to TRUE
(which is the default), the training data will be
randomly shuffled at each epoch.
Validation data is never shuffled.
The model.fit
method returns an History
callback, which has a history
attribute containing the
lists of successive losses and other metrics.
To “freeze” a layer means to exclude it from training, i.e. its weights will never be updated. This is useful in the context of fine-tuning a model, or using fixed embeddings for a text input.
You can pass a trainable
argument (boolean) to a layer
constructor to set a layer to be non-trainable:
Additionally, you can set the trainable
property of a
layer to TRUE
or FALSE
after instantiation.
For this to take effect, you will need to call compile()
on
your model after modifying the trainable
property. Here’s
an example:
x <- layer_input(shape = c(32))
layer <- layer_dense(units = 32)
layer$trainable <- FALSE
y <- x %>% layer
frozen_model <- keras_model(x, y)
# in the model below, the weights of `layer` will not be updated during training
frozen_model %>% compile(optimizer = 'rmsprop', loss = 'mse')
layer$trainable <- TRUE
trainable_model <- keras_model(x, y)
# with this model the weights of the layer will be updated during training
# (which will also affect the above model since it uses the same layer instance)
trainable_model %>% compile(optimizer = 'rmsprop', loss = 'mse')
frozen_model %>% fit(data, labels) # this does NOT update the weights of `layer`
trainable_model %>% fit(data, labels) # this updates the weights of `layer`
Finally, you can freeze or unfreeze the weights for an entire model
(or a range of layers within the model) using the
freeze_weights()
and unfreeze_weights()
functions. For example:
# instantiate a VGG16 model
conv_base <- application_vgg16(
weights = "imagenet",
include_top = FALSE,
input_shape = c(150, 150, 3)
)
# freeze it's weights
freeze_weights(conv_base)
# create a composite model that includes the base + more layers
model <- keras_model_sequential() %>%
conv_base %>%
layer_flatten() %>%
layer_dense(units = 256, activation = "relu") %>%
layer_dense(units = 1, activation = "sigmoid")
# compile
model %>% compile(
loss = "binary_crossentropy",
optimizer = optimizer_rmsprop(lr = 2e-5),
metrics = c("accuracy")
)
# unfreeze weights from "block5_conv1" on
unfreeze_weights(conv_base, from = "block5_conv1")
# compile again since we froze or unfroze layers
model %>% compile(
loss = "binary_crossentropy",
optimizer = optimizer_rmsprop(lr = 2e-5),
metrics = c("accuracy")
)
Making a RNN stateful means that the states for the samples of each batch will be reused as initial states for the samples in the next batch.
When using stateful RNNs, it is therefore assumed that:
X1
and X2
are successive batches of
samples, then X2[[i]]
is the follow-up sequence to
X1[[i]
, for every i
.To use statefulness in RNNs, you need to:
batch_size
argument to the first layer in your model. E.g.
batch_size=32
for a 32-samples batch of sequences of 10
timesteps with 16 features per timestep.stateful=TRUE
in your RNN layer(s).shuffle=FALSE
when calling fit().To reset the states accumulated in either a single layer or an entire
model use the reset_states()
function.
Notes that the methods predict()
, fit()
,
train_on_batch()
, predict_classes()
, etc. will
all update the states of the stateful layers in a model. This
allows you to do not only stateful training, but also stateful
prediction.
You can remove the last added layer in a Sequential model by calling
pop_layer()
:
Code and pre-trained weights are available for the following image classification models:
For example:
For a few simple usage examples, see the documentation for the Applications module.
The VGG16 model is also the basis for the Deep dream Keras example script.
By default the Keras Python and R packages use the TensorFlow backend. Other available backends include Theano or CNTK. To learn more about using alternatate backends (e.g. Theano or CNTK) see the article on Keras backends.
PlaidML is an open source portable deep learning engine that runs on most existing PC hardware with OpenCL-capable GPUs from NVIDIA, AMD, or Intel. PlaidML includes a Keras backend which you can use as described below.
First, build and install PlaidML as described on the project website. You must be sure that PlaidML is correctly installed, setup, and working before proceeding further!
Then, to use Keras with the PlaidML backend you do the following:
This should automatically discover and use the Python environment
where plaidml
and plaidml-keras
were
installed. If this doesn’t work as expected you can also force the
selection of a particular Python environment. For example, if you
installed PlaidML in conda environment named “plaidml” you would do
this:
The main consideration in using Keras within another R package is to
ensure that your package can be tested in an environment where Keras is
not available (e.g. the CRAN test servers). To do this, arrange for your
tests to be skipped when Keras isn’t available using the
is_keras_available()
function.
For example, here’s a testthat utility function that can be used to skip a test when Keras isn’t available:
# testthat utilty for skipping tests when Keras isn't available
skip_if_no_keras <- function(version = NULL) {
if (!is_keras_available(version))
skip("Required keras version not available for testing")
}
# use the function within a test
test_that("keras function works correctly", {
skip_if_no_keras()
# test code here
})
You can pass the version
argument to check for a
specific version of Keras.
Another consideration is gaining access to the underlying Keras Python module. You might need to do this if you require lower level access to Keras than is provided for by the Keras R package.
Since the Keras R package can bind to multiple different
implementations of Keras (either the original Keras or the TensorFlow
implementation of Keras), you should use the
keras::implementation()
function to obtain access to the
correct python module. You can use this function within the
.onLoad
function of a package to provide global access to
the module within your package. For example:
If you create custom layers in R or
import other Python packages which include custom Keras layers, be sure
to wrap them using the create_layer()
function so that they
are composable using the magrittr pipe operator. See the documentation
on layer wrapper
functions for additional details.
During development of a model, sometimes it is useful to be able to obtain reproducible results from run to run in order to determine if a change in performance is due to an actual model or data modification, or merely a result of a new random sample.
The use_session_with_seed()
function establishes a
common random seed for R, Python, NumPy, and TensorFlow. It furthermore
disables hash randomization, GPU computations, and CPU parallelization,
which can be additional sources of non-reproducibility.
To use the function, call it immediately after you load the keras package:
This function takes all measures known to promote reproducible
results from Keras sessions, however it’s possible that various
individual features or libraries used by the backend escape its effects.
If you encounter non-reproducible results please investigate the
possible sources of the problem. The source code for
use_session_with_seed()
is here: https://github.com/rstudio/tensorflow/blob/main/R/seed.R.
Contributions via pull request are very welcome!
Please note again that use_session_with_seed()
disables
GPU computations and CPU parallelization by default (as both can lead to
non-deterministic computations) so should generally not be used when
model training time is paramount. You can re-enable GPU computations
and/or CPU parallelism using the disable_gpu
and
disable_parallel_cpu
arguments. For example:
The default directory where all Keras data is stored is:
In case Keras cannot create the above directory (e.g. due to
permission issues), /tmp/.keras/
is used as a backup.
The Keras configuration file is a JSON file stored at
$HOME/.keras/keras.json
. The default configuration file
looks like this:
{
"image_data_format": "channels_last",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
It contains the following fields:
channels_last
or
channels_first
).epsilon
numerical fuzz factor to be used to prevent
division by zero in some operations.Likewise, cached dataset files, such as those downloaded with
get_file()
, are stored by default in
$HOME/.keras/datasets/
.
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.