GPyTorch Regression Tutorial (GPU)

(This notebook is the same as the simple GP regression tutorial notebook, but does all computations on a GPU for acceleration. Check out the multi-GPU tutorial if you have large datasets that need multiple GPUs!)

Introduction

In this notebook, we demonstrate many of the design features of GPyTorch using the simplest example, training an RBF kernel Gaussian process on a simple function. We’ll be modeling the function

\begin{align}
y &= \sin(2\pi x) + \epsilon \\
\epsilon &\sim \mathcal{N}(0, 0.04)
\end{align}

with 100 training examples, and testing on 51 test examples.

[1]:
import math
import torch
import gpytorch
from matplotlib import pyplot as plt

%matplotlib inline
%load_ext autoreload
%autoreload 2

Set up training data

In the next cell, we set up the training data for this example. We'll be using 100 regularly spaced points on [0,1], on which we evaluate the function and add Gaussian noise to get the training labels.

[2]:
# Training data is 100 points in [0,1] inclusive regularly spaced
train_x = torch.linspace(0, 1, 100)
# True function is sin(2*pi*x) with Gaussian noise
train_y = torch.sin(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * math.sqrt(0.04)

Setting up the model

See the simple GP regression tutorial for a detailed explanation of all the terms.

[3]:
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

# initialize likelihood and model
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)

Using the GPU

To do computations on the GPU, we need to put our data and model onto the GPU. (This requires a PyTorch installation built with CUDA support.)

[4]:
train_x = train_x.cuda()
train_y = train_y.cuda()
model = model.cuda()
likelihood = likelihood.cuda()
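
If you want this notebook to also run on machines without a GPU, a common pattern is to select the device dynamically and move everything with .to(device). The cell below is an optional sketch of that pattern (it is not part of the original tutorial) and uses only standard PyTorch calls:

# Optional sketch: device-agnostic version of the cell above (not part of the original tutorial).
# Falls back to the CPU when CUDA is not available.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_x = train_x.to(device)
train_y = train_y.to(device)
model = model.to(device)
likelihood = likelihood.to(device)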

That’s it! All the training code is the same as in the simple GP regression tutorial.

Training the model

[5]:
# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # Includes GaussianLikelihood parameters

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

training_iter = 50
for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(train_x)
    # Calc loss and backprop gradients
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f   lengthscale: %.3f   noise: %.3f' % (
        i + 1, training_iter, loss.item(),
        model.covar_module.base_kernel.lengthscale.item(),
        model.likelihood.noise.item()
    ))
    optimizer.step()
Iter 1/50 - Loss: 0.944   lengthscale: 0.693   noise: 0.693
Iter 2/50 - Loss: 0.913   lengthscale: 0.644   noise: 0.644
Iter 3/50 - Loss: 0.879   lengthscale: 0.598   noise: 0.598
Iter 4/50 - Loss: 0.841   lengthscale: 0.555   noise: 0.554
Iter 5/50 - Loss: 0.798   lengthscale: 0.514   noise: 0.513
Iter 6/50 - Loss: 0.750   lengthscale: 0.475   noise: 0.474
Iter 7/50 - Loss: 0.698   lengthscale: 0.439   noise: 0.437
Iter 8/50 - Loss: 0.645   lengthscale: 0.405   noise: 0.402
Iter 9/50 - Loss: 0.595   lengthscale: 0.372   noise: 0.369
Iter 10/50 - Loss: 0.548   lengthscale: 0.342   noise: 0.339
Iter 11/50 - Loss: 0.507   lengthscale: 0.315   noise: 0.310
Iter 12/50 - Loss: 0.469   lengthscale: 0.292   noise: 0.284
Iter 13/50 - Loss: 0.432   lengthscale: 0.272   noise: 0.259
Iter 14/50 - Loss: 0.398   lengthscale: 0.255   noise: 0.236
Iter 15/50 - Loss: 0.363   lengthscale: 0.241   noise: 0.215
Iter 16/50 - Loss: 0.329   lengthscale: 0.230   noise: 0.196
Iter 17/50 - Loss: 0.296   lengthscale: 0.222   noise: 0.178
Iter 18/50 - Loss: 0.263   lengthscale: 0.215   noise: 0.162
Iter 19/50 - Loss: 0.230   lengthscale: 0.210   noise: 0.147
Iter 20/50 - Loss: 0.198   lengthscale: 0.207   noise: 0.134
Iter 21/50 - Loss: 0.167   lengthscale: 0.205   noise: 0.122
Iter 22/50 - Loss: 0.136   lengthscale: 0.205   noise: 0.110
Iter 23/50 - Loss: 0.107   lengthscale: 0.206   noise: 0.100
Iter 24/50 - Loss: 0.079   lengthscale: 0.208   noise: 0.091
Iter 25/50 - Loss: 0.053   lengthscale: 0.211   noise: 0.083
Iter 26/50 - Loss: 0.028   lengthscale: 0.215   noise: 0.076
Iter 27/50 - Loss: 0.006   lengthscale: 0.220   noise: 0.069
Iter 28/50 - Loss: -0.013   lengthscale: 0.225   noise: 0.063
Iter 29/50 - Loss: -0.029   lengthscale: 0.231   noise: 0.058
Iter 30/50 - Loss: -0.043   lengthscale: 0.237   noise: 0.053
Iter 31/50 - Loss: -0.053   lengthscale: 0.243   noise: 0.049
Iter 32/50 - Loss: -0.060   lengthscale: 0.249   noise: 0.045
Iter 33/50 - Loss: -0.065   lengthscale: 0.254   noise: 0.042
Iter 34/50 - Loss: -0.066   lengthscale: 0.259   noise: 0.039
Iter 35/50 - Loss: -0.066   lengthscale: 0.262   noise: 0.037
Iter 36/50 - Loss: -0.063   lengthscale: 0.265   noise: 0.035
Iter 37/50 - Loss: -0.060   lengthscale: 0.266   noise: 0.033
Iter 38/50 - Loss: -0.056   lengthscale: 0.266   noise: 0.032
Iter 39/50 - Loss: -0.052   lengthscale: 0.265   noise: 0.031
Iter 40/50 - Loss: -0.049   lengthscale: 0.262   noise: 0.030
Iter 41/50 - Loss: -0.047   lengthscale: 0.259   noise: 0.029
Iter 42/50 - Loss: -0.046   lengthscale: 0.254   noise: 0.029
Iter 43/50 - Loss: -0.046   lengthscale: 0.249   noise: 0.029
Iter 44/50 - Loss: -0.047   lengthscale: 0.243   noise: 0.029
Iter 45/50 - Loss: -0.049   lengthscale: 0.237   noise: 0.029
Iter 46/50 - Loss: -0.051   lengthscale: 0.231   noise: 0.029
Iter 47/50 - Loss: -0.054   lengthscale: 0.225   noise: 0.030
Iter 48/50 - Loss: -0.057   lengthscale: 0.219   noise: 0.030
Iter 49/50 - Loss: -0.059   lengthscale: 0.214   noise: 0.031
Iter 50/50 - Loss: -0.061   lengthscale: 0.210   noise: 0.032
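
Since the whole point of this notebook is GPU acceleration, you may want to measure how long training takes. CUDA kernels are launched asynchronously, so a fair wall-clock measurement should call torch.cuda.synchronize() before reading the clock. The snippet below is a rough sketch (not part of the original tutorial) that re-runs the training loop with timing around it:

import time

# Rough timing sketch (not part of the original tutorial).
# torch.cuda.synchronize() blocks until all queued GPU work has finished,
# so the measured time reflects the actual computation.
if torch.cuda.is_available():
    torch.cuda.synchronize()
start = time.time()

for i in range(training_iter):
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    optimizer.step()

if torch.cuda.is_available():
    torch.cuda.synchronize()
print('%d extra training iterations took %.3f seconds' % (training_iter, time.time() - start))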

Make predictions with the model

First, we need to make some test data and then throw it onto the GPU.

[6]:
test_x = torch.linspace(0, 1, 51).cuda()

Now the rest of the code follows the simple GP regression tutorial.

[7]:
# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()

# Test points are regularly spaced along [0,1]
# Make predictions by feeding model through likelihood
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    observed_pred = likelihood(model(test_x))
    mean = observed_pred.mean
    lower, upper = observed_pred.confidence_region()

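As an aside, confidence_region() returns the region spanned by two standard deviations above and below the predictive mean, so (under that assumption) an equivalent way to compute the bounds by hand would be:

# Sketch of what confidence_region() computes: mean +/- 2 standard deviations
stddev = observed_pred.stddev
lower_manual = mean - 2.0 * stddev
upper_manual = mean + 2.0 * stddev
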
For plotting, we’re going to grab the data from the GPU and put it back on the CPU. We can accomplish this with the .cpu() method.

[8]:
mean = mean.cpu()
lower = lower.cpu()
upper = upper.cpu()

train_x = train_x.cpu()
train_y = train_y.cpu()
test_x = test_x.cpu()
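
As a side note, the transfer above is necessary because .numpy() (which we use for plotting below) only works on CPU tensors; calling it on a CUDA tensor raises a TypeError. A tiny illustration of this, as a sketch rather than part of the original tutorial:

# Aside (not part of the original tutorial): .numpy() requires a CPU tensor.
t = torch.linspace(0, 1, 5)
if torch.cuda.is_available():
    t = t.cuda()
    try:
        t.numpy()       # raises TypeError: the tensor must be copied to host memory first
    except TypeError as err:
        print(err)
print(t.cpu().numpy())  # works once the tensor is back on the CPU
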
[9]:
with torch.no_grad():
    # Initialize plot
    f, ax = plt.subplots(1, 1, figsize=(4, 3))

    # Plot training data as black stars
    ax.plot(train_x.numpy(), train_y.numpy(), 'k*')
    # Plot predictive means as blue line
    ax.plot(test_x.numpy(), mean.numpy(), 'b')
    # Shade between the lower and upper confidence bounds
    ax.fill_between(test_x.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)
    ax.set_ylim([-3, 3])
    ax.legend(['Observed Data', 'Mean', 'Confidence'])
[Figure: training data plotted as black stars, predictive mean as a blue line, and the shaded confidence region over [0,1]]