Gaussian Processes ToolKit - R Package

This page describes examples of how to use the Gaussian Processes Toolkit (gptk).

Release Information

Version 1.07

Switched licensing from AGPL-3 to FreeBSD. Now you can use gptk commercially.

Version 1.06

Replaced all occurrences of the deprecated is.real() with is.double().

Version 1.03

Fixed the error that cmpndKernParamInit gave on profiles with zero data variance. The sparse matrix class from the Matrix package is now used to handle this (new dependency on Matrix).

Version 1.02

Removed assignments from .Rd files.

Version 1.01

Demos no longer enforce creation of png and gif files.

Version 1.0

R implementation of the Gaussian process toolkit for Matlab originally written by Neil D. Lawrence. Written in R by Alfredo Kalaitzis, with contributions by Antti Honkela, Pei Gao and Neil D. Lawrence.

Examples

Functions from Gaussians

This example shows how points which look like they come from a function can be sampled from a Gaussian distribution. The sample is 25-dimensional and is drawn from a Gaussian with a particular covariance.

> demGpSample()


Left: a single, 25-dimensional sample from a Gaussian distribution. Right: the covariance matrix of the Gaussian distribution.
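For readers who want to see what the demo does internally, a sample like this can also be drawn by hand with the package's kernel functions. The following is a minimal sketch, assuming kernCreate() and kernCompute() behave as in the demo scripts; the input grid and the jitter value are illustrative.

library(gptk)

x <- matrix(seq(-1, 1, length = 25), ncol = 1)  # 25 input locations (illustrative grid)
kern <- kernCreate(x, 'rbf')                    # RBF kernel structure with default parameters
K <- kernCompute(kern, x)                       # 25 x 25 covariance matrix

set.seed(1)
L <- t(chol(K + diag(1e-6, nrow(K))))           # small jitter for numerical stability
f <- L %*% rnorm(nrow(K))                       # one 25-dimensional sample, f ~ N(0, K)

plot(x, f, type = 'b')                          # the "function-like" sample
image(K)                                        # the covariance matrix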

Joint Distribution over two Variables

Gaussian processes are about conditioning a Gaussian distribution on the training data to make the test predictions. To illustrate this process, we can look at the joint distribution over two variables.

> demGpCov2D(c(1,2))

Gives the joint distribution for f1 and f2. The plots show the joint distribution as well as the conditional for f2 given f1; the right-hand plot shows the corresponding result for f1 and f5.


Left: the blue line is a contour of the joint distribution over the variables f1 and f2, the green line indicates an observation of f1, and the red line is the conditional distribution of f2 given f1. Right: similar, but for f1 and f5.
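The red conditional in the figure comes from the standard Gaussian conditioning formula. A minimal sketch of that calculation, using the package's kernel functions and a made-up observed value for f1:

library(gptk)

x <- matrix(seq(-1, 1, length = 25), ncol = 1)
kern <- kernCreate(x, 'rbf')
K <- kernCompute(kern, x)          # joint covariance of f1, ..., f25

i <- 1; j <- 2                     # indices of f1 and f2
f1 <- 0.5                          # hypothetical observed value of f1

# f2 | f1 ~ N( K[j,i] / K[i,i] * f1,  K[j,j] - K[j,i]^2 / K[i,i] )
condMean <- K[j, i] / K[i, i] * f1
condVar  <- K[j, j] - K[j, i]^2 / K[i, i]
c(mean = condMean, var = condVar)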

Different Samples from Gaussian Processes

A script is provided which samples from a Gaussian process with a given covariance function.

> gpSample('rbf', 10, c(1,1), c(-3,3))

will give 10 samples from an RBF covariance function with a parameter vector given by [1 1] (inverse width 1, variance 1) across the range -3 to 3 on the x-axis. The random seed will be set to 1e5.

> gpSample('rbf', 10, c(16,1), c(-3,3))

is similar, but the inverse width is now set to 16 (length scale 0.25).


Left: samples from an RBF-style covariance function with length scale 1. Right: samples from an RBF-style covariance function with length scale 0.25.
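Here the RBF covariance is k(x, x') = variance * exp(-inverseWidth/2 * (x - x')^2), so the length scale is 1/sqrt(inverseWidth) and an inverse width of 16 corresponds to a length scale of 0.25. A hand-rolled version of the same prior sampling might look like the sketch below; it assumes the rbf kernel stores its parameters in inverseWidth and variance fields, as in the original Matlab toolbox.

library(gptk)

samplePrior <- function(inverseWidth, numSamps = 10, xlim = c(-3, 3)) {
  x <- matrix(seq(xlim[1], xlim[2], length = 200), ncol = 1)
  kern <- kernCreate(x, 'rbf')
  kern$inverseWidth <- inverseWidth      # assumed parameter field
  kern$variance <- 1                     # assumed parameter field
  K <- kernCompute(kern, x)
  L <- t(chol(K + diag(1e-6, nrow(K))))  # jitter for numerical stability
  f <- L %*% matrix(rnorm(nrow(K) * numSamps), nrow(K), numSamps)
  matplot(x, f, type = 'l', lty = 1, xlab = 'x', ylab = 'f(x)')
}

set.seed(1e5)
samplePrior(1)    # length scale 1
samplePrior(16)   # length scale 0.25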

Posterior Samples

Gaussian processes are non-parametric models. They are specified by their covariance function and a mean function. When combined with data observations, a posterior Gaussian process is induced. The demos below show samples from that posterior.

> gpPosteriorSample('rbf', 5, c(1,1), c(-3,3))

and

> gpPosteriorSample('rbf', 5, c(16,1), c(-3,3))


Left: samples from the posterior induced by an RBF-style covariance function with length scale 1 and 5 "training" data points taken from a sine wave. Right: similar, but for a length scale of 0.25.
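Under the hood, these posterior samples come from conditioning the Gaussian process prior on a handful of noiseless observations. A rough sketch of that conditioning with a made-up sine-wave training set of five points (the inputs, grid and jitter are illustrative):

library(gptk)

Xtrain <- matrix(seq(-2.5, 2.5, length = 5), ncol = 1)   # five "training" inputs (illustrative)
ytrain <- sin(Xtrain)                                     # noiseless sine-wave targets
Xstar  <- matrix(seq(-3, 3, length = 200), ncol = 1)      # test grid

kern <- kernCreate(Xtrain, 'rbf')
Kff <- kernCompute(kern, Xtrain) + diag(1e-6, nrow(Xtrain))   # jittered training covariance
Ksf <- kernCompute(kern, Xstar, Xtrain)                       # cross covariance
Kss <- kernCompute(kern, Xstar)                               # test covariance

postMean <- Ksf %*% solve(Kff, ytrain)                        # posterior mean at the test inputs
postCov  <- Kss - Ksf %*% solve(Kff, t(Ksf))                  # posterior covariance

set.seed(1)
L <- t(chol(postCov + diag(1e-6, nrow(postCov))))
samples <- postMean[, rep(1, 5)] + L %*% matrix(rnorm(200 * 5), 200, 5)   # five posterior draws

matplot(Xstar, samples, type = 'l', lty = 1)
points(Xtrain, ytrain, pch = 19)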

Simple Interpolation Demo

This simple demonstration plots, consecutively, an increasing number of data points, followed by an interpolated fit through the data points using a Gaussian process. This is a noiseless system, and the data is sampled from a GP with a known covariance function. The curve is then recovered with minimal uncertainty after only nine data points are included. The code is run with

> demInterpolation()


Gaussian process prediction after one, three and seven data points, with a new data point sampled and after the new data points are included in the prediction.
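The claim about minimal uncertainty can be checked numerically: in the noiseless case the posterior variance depends only on the input locations, and it collapses towards zero once nine points cover the interval. A rough check along those lines, with an illustrative grid and default kernel parameters:

library(gptk)

Xtrain <- matrix(seq(-1, 1, length = 9), ncol = 1)    # nine observed inputs (illustrative)
Xstar  <- matrix(seq(-1, 1, length = 200), ncol = 1)  # dense test grid

kern <- kernCreate(Xtrain, 'rbf')
Kff <- kernCompute(kern, Xtrain) + diag(1e-6, nrow(Xtrain))
Ksf <- kernCompute(kern, Xstar, Xtrain)

# Pointwise posterior variance; note it does not involve the observed targets at all
postVar <- diag(kernCompute(kern, Xstar)) - rowSums((Ksf %*% solve(Kff)) * Ksf)
max(postVar)   # close to zero everywhere on the grid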

Simple Regression Demo

The regression demo very much follows the format of the interpolation demo. Here the difference is that the data is sampled with noise. Fitting a model with noise means that the regression will not necessarily pass right through each data point. The code is run with

> demRegression()


Gaussian process prediction after one, three and seven data points, with a new data point sampled and after the new data points are included in the prediction.
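The same kind of noisy fit can also be run through the package's model interface rather than by hand. The sketch below follows the pattern used in the package demos (gpOptions, gpCreate, gpOptimise, gpPosteriorMeanVar); the data and the number of optimisation iterations are made up, and the option fields and argument names are assumed to match the demo scripts.

library(gptk)

set.seed(1)
X <- matrix(seq(-1, 1, length = 20), ncol = 1)              # illustrative inputs
y <- sin(2 * pi * X) + matrix(rnorm(20, sd = 0.2), 20, 1)   # noisy targets (made up)

options <- gpOptions()
options$kern$comp <- list("rbf", "white")            # RBF plus white-noise covariance (assumed field)

model <- gpCreate(ncol(X), ncol(y), X, y, options)   # q = 1 input dimension, d = 1 output
model <- gpOptimise(model, display = FALSE, iters = 200)    # argument names as in the demos

Xstar <- matrix(seq(-1.2, 1.2, length = 200), ncol = 1)
post <- gpPosteriorMeanVar(model, Xstar, varsigma.return = TRUE)
str(post)   # predictive mean and variance at the test inputs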

Optimizing Hyperparameters

One of the advantages of Gaussian processes over pure kernel interpretations of regression is the ability to select the kernel hyperparameters automatically. The demo

> demOptimiseGp()

shows a series of plots of a Gaussian process with different length scales fitted to six data points. For each plot there is a corresponding plot of the log likelihood. The log likelihood peaks for a length scale close to 1, which was the length scale used to generate the data.


Left: Gaussian process regression applied to the data with an increasing length scale. The length scales used were 0.05, 0.1, 0.25, 0.5, 1, 2, 4, 8 and 16.
Right: log-log plot of the log likelihood of the data against the length scales. The log likelihood is shown as a black line. It is made up of a data-fit term (the quadratic form), shown as a green line, and a complexity term (the log determinant), shown as a red line. The data fit is larger for short length scales; the complexity is larger for long length scales. The combination leads to a maximum around the true length scale value of 1.
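The decomposition in the caption can be written out directly: for targets y at inputs X with covariance matrix K, the log likelihood is minus one half of the quadratic form y' K^-1 y (data fit) plus the log determinant of K (complexity) plus n log 2*pi. A rough sketch of sweeping the length scale by hand is given below; the data are made up, and the inverseWidth parameter field is assumed, as in the Matlab toolbox.

library(gptk)

set.seed(1)
X <- matrix(seq(-1, 1, length = 6), ncol = 1)        # six data points, as in the demo
kernTrue <- kernCreate(X, 'rbf')                     # default parameters (length scale 1 assumed)
y <- t(chol(kernCompute(kernTrue, X) + diag(1e-6, 6))) %*% rnorm(6)   # data drawn at length scale 1

logLik <- function(lengthScale) {
  kern <- kernCreate(X, 'rbf')
  kern$inverseWidth <- 1 / lengthScale^2             # assumed parameter field
  K <- kernCompute(kern, X) + diag(1e-6, 6)
  dataFit    <- drop(t(y) %*% solve(K, y))                            # quadratic form (data fit)
  complexity <- as.numeric(determinant(K, logarithm = TRUE)$modulus)  # log determinant (complexity)
  -0.5 * (dataFit + complexity + 6 * log(2 * pi))
}

lengthScales <- c(0.05, 0.1, 0.25, 0.5, 1, 2, 4, 8, 16)
plot(lengthScales, sapply(lengthScales, logLik), log = "x", type = "b")   # typically peaks near 1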