Gaussian Processes ToolKit - R Package
This page describes examples of how to use the Gaussian Processes
Toolkit (gptk).
RELEASE INFORMATION
Version 1.07
Switched licensing from AGPL-3 to FreeBSD. Now you can use gptk commercially.
Version 1.06
Replaced all occurrences of the deprecated is.real() with is.double().
Version 1.03
Fixed an error that cmpndKernParamInit raised on profiles with zero data variance.
The Matrix sparse matrix class is now used to handle this (adding a dependency on Matrix).
Version 1.02
Removed assignments from .Rd files.
Version 1.01
Demos no longer enforce creation of png and gif files.
Version 1.0
R implementation of the GP toolkit originally written for Matlab by Neil D. Lawrence.
Written in R by Alfredo Kalaitzis. Contributions by Antti Honkela, Pei Gao, Neil D. Lawrence.
Examples
Functions from Gaussians
This example shows how points which look like they come from a
function can be sampled from a Gaussian distribution. The sample is
25-dimensional and comes from a Gaussian with a particular covariance.
> demGpSample()


Left: a single, 25-dimensional sample from a Gaussian distribution.
Right: the covariance matrix of the Gaussian distribution.
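To see roughly what the demo does under the hood, the following sketch draws one such sample directly. It assumes gptk's kernCreate and kernCompute functions; the grid, the jitter term and the plotting calls are illustrative choices rather than the exact code of demGpSample.

library(gptk)
library(MASS)  # mvrnorm, for sampling from a multivariate Gaussian

# Build an RBF covariance on a 25-point grid and draw one sample
# (a minimal sketch of what demGpSample illustrates).
x <- matrix(seq(-1, 1, length = 25), ncol = 1)
kern <- kernCreate(x, 'rbf')
K <- kernCompute(kern, x) + 1e-8 * diag(25)  # jitter for numerical stability
f <- mvrnorm(1, mu = rep(0, 25), Sigma = K)
plot(x, f, type = 'b')  # the function-like sample
image(K)                # the covariance matrix itself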
Joint Distribution over two Variables
Gaussian processes are about conditioning a Gaussian distribution
on the training data to make the test predictions. To illustrate this
process, we can look at the joint distribution over two variables.
> demGpCov2D(c(1,2))
gives the joint distribution for f1 and f2. The plots show the joint
distributions as well as the conditional for f2 given f1.


Left: the blue line is a contour of the joint distribution over the
variables f1 and f2, the green line indicates an observation of f1,
and the red line is the conditional distribution of f2 given f1.
Right: similar, but for f1 and f5.
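The conditioning step itself is ordinary linear algebra on a 2 x 2 covariance matrix. The sketch below computes the conditional moments by hand in base R; the covariance entries and the observed value are made-up illustrative numbers.

# Conditioning a bivariate Gaussian by hand (plain R, not gptk code).
# With joint covariance K over (f1, f2) and an observation f1 = y1,
# p(f2 | f1 = y1) is Gaussian with the moments below.
K <- matrix(c(1.0, 0.8,
              0.8, 1.0), nrow = 2)  # illustrative joint covariance
y1 <- 1.0                           # observed value of f1
cond_mean <- K[2, 1] / K[1, 1] * y1          # mean of f2 given f1
cond_var  <- K[2, 2] - K[2, 1]^2 / K[1, 1]   # variance of f2 given f1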
Different Samples from Gaussian Processes
A function is provided which samples from a Gaussian process with a
given covariance function.
> gpSample('rbf', 10, c(1,1), c(-3,3))
will give 10 samples from an RBF covariance function with a
parameter vector given by [1 1] (inverse width 1, variance 1) across
the range -3 to 3 on the x-axis. The random seed will be set to
1e5.
> gpSample('rbf', 10, c(16,1), c(-3,3))
is similar, but the inverse width is now set to 16 (length scale 0.25).


Left: samples from an RBF-style covariance function with length scale 1.
Right: samples from an RBF-style covariance function with length scale 0.25.
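The two parameterisations are related: for the RBF kernel the length scale is the reciprocal square root of the inverse width, which is why an inverse width of 16 corresponds to a length scale of 0.25.

# Converting between the RBF inverse width and the length scale.
inverse_width <- 16
length_scale <- 1 / sqrt(inverse_width)  # 0.25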
Posterior Samples
Gaussian processes are non-parametric models. They are specified by their covariance function and a mean function. When combined with data observations, a posterior Gaussian process is induced. The demos below show samples from that posterior.
> gpPosteriorSample('rbf', 5, c(1,1), c(-3,3))
and
> gpPosteriorSample('rbf', 5, c(16,1), c(-3,3))


Left: samples from the posterior induced by an RBF-style covariance function
with length scale 1 and 5 "training" data points taken from a sine wave. Right: similar, but for a length scale of 0.25.
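The same computation can be written out by hand. The sketch below uses an inline RBF kernel in base R rather than gptk's own functions, so the kernel, the sine-wave training data and the jitter terms are all illustrative assumptions.

library(MASS)  # mvrnorm

# Hand-rolled posterior sampling (a sketch, not gptk internals).
rbf <- function(a, b, ell = 1) exp(-0.5 * outer(a, b, "-")^2 / ell^2)
xtrain <- seq(-2, 2, length = 5)
ytrain <- sin(xtrain)                        # "training" points from a sine wave
xtest  <- seq(-3, 3, length = 100)

Ktt <- rbf(xtrain, xtrain) + 1e-8 * diag(5)  # jitter for numerical stability
Kst <- rbf(xtest, xtrain)
Kss <- rbf(xtest, xtest)

mu    <- drop(Kst %*% solve(Ktt, ytrain))    # posterior mean
Sigma <- Kss - Kst %*% solve(Ktt, t(Kst))    # posterior covariance
f <- mvrnorm(5, mu, Sigma + 1e-8 * diag(100))  # five posterior draws

matplot(xtest, t(f), type = "l", lty = 1)
points(xtrain, ytrain, pch = 16)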
Simple Interpolation Demo
This simple demonstration plots, consecutively, an increasing
number of data points, followed by an interpolated fit through the
data points using a Gaussian process. This is a noiseless system, and
the data is sampled from a GP with a known covariance function. The
curve is then recovered with minimal uncertainty after only nine data
points are included. The code is run with
> demInterpolation()

Gaussian process prediction after one, three and seven data points,
each shown with a newly sampled data point and again after that point
is included in the prediction.
Simple Regression Demo
The regression demo very much follows the format of the
interpolation demo. Here the difference is that the data is sampled
with noise. Fitting a model with noise means that the regression will
not necessarily pass right through each data point.
The code is run with
> demRegression()

Gaussian process prediction after one, three and seven data points,
each shown with a newly sampled data point and again after that point
is included in the prediction.
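A compact version of the same workflow, using gptk's model-level functions, might look like the sketch below. It assumes gpOptions, gpCreate, gpOptimise and gpPosteriorMeanVar behave as in the package demos; the data and option values are illustrative.

library(gptk)

# Noisy GP regression sketched with gptk's model-level API
# (usage assumed from the package demos; data and settings illustrative).
set.seed(1)
X <- matrix(seq(-1, 1, length = 9), ncol = 1)
y <- matrix(sin(2 * pi * X) + rnorm(9, sd = 0.1), ncol = 1)  # noisy samples

options <- gpOptions(approx = "ftc")    # full covariance, no sparse approximation
model <- gpCreate(1, 1, X, y, options)  # 1 input dimension, 1 output dimension
model <- gpOptimise(model, display = FALSE, iters = 100)

xtest <- matrix(seq(-1.5, 1.5, length = 200), ncol = 1)
pred <- gpPosteriorMeanVar(model, xtest, varsigma.return = TRUE)
# pred$mu holds the predictive mean, pred$varsigma the predictive variance.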
Optimizing Hyper Parameters
One of the advantages of Gaussian processes over pure kernel
interpretations of regression is the ability to select the hyper
parameters of the kernel automatically. The demo
> demOptimiseGp()
shows a series of plots of a Gaussian process with different length
scales fitted to six data points. For each plot there is a
corresponding plot of the log likelihood. The log likelihood peaks for
a length scale close to 1. This was the length scale used to generate
the data.
Left: Gaussian process regression applied to the data with an
increasing length scale. The length scales used were 0.05, 0.1, 0.25,
0.5, 1, 2, 4, 8 and 16.
Right: Log-log plot of the log likelihood of the data against the
length scales. The log likelihood is shown as a black line. It is made
up of a data fit term (the quadratic form), shown by a green line, and
a complexity term (the log determinant), shown by a red line. The data
fit is larger for short length scales, while the complexity is larger
for long length scales. The combination leads to a maximum around the
true length scale value of 1.
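Since the log likelihood decomposes into a data fit term and a complexity term, computing it directly from the standard formula makes the trade-off concrete. The sketch below uses a hand-written RBF kernel and made-up data rather than gptk internals.

# Log marginal likelihood split into data fit and complexity terms.
loglik <- function(ell, X, y, noise = 1e-4) {
  K <- exp(-0.5 * outer(X, X, "-")^2 / ell^2) + noise * diag(length(X))
  datafit    <- -0.5 * drop(t(y) %*% solve(K, y))          # quadratic form
  complexity <- -0.5 * as.numeric(determinant(K)$modulus)  # log determinant
  datafit + complexity - 0.5 * length(y) * log(2 * pi)
}
ells <- c(0.05, 0.1, 0.25, 0.5, 1, 2, 4, 8, 16)  # the grid used in the demo
X <- seq(-1, 1, length = 6)  # six data points, as in the demo
y <- sin(pi * X)             # illustrative data
sapply(ells, loglik, X = X, y = y)  # log likelihood at each length scale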