{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Stochastic Variational GP Regression\n", "\n", "## Overview\n", "\n", "In this notebook, we'll give an overview of how to use stochastic variational GP (SVGP) regression ([Hensman et al.](https://arxiv.org/pdf/1411.2005.pdf)) to train rapidly using minibatches on the `elevators` UCI dataset. This is one of the more common use cases of variational inference for GPs.\n", "\n", "If you are unfamiliar with variational inference, we recommend the following resources:\n", "- [Variational Inference: A Review for Statisticians](https://arxiv.org/abs/1601.00670) by David M. Blei, Alp Kucukelbir, Jon D. McAuliffe.\n", "- [Scalable Variational Gaussian Process Classification](https://arxiv.org/abs/1411.2005) by James Hensman, Alex Matthews, Zoubin Ghahramani." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import tqdm.notebook  # binds `tqdm` and loads the notebook progress bars used below\n", "import math\n", "import torch\n", "import gpytorch\n", "from matplotlib import pyplot as plt\n", "\n", "# Make plots inline\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this example notebook, we'll be using the `elevators` UCI dataset. Running the next cell downloads a copy of the dataset and rescales each input feature to $[-1, 1]$. We then split the data in order, using the first 80% for training and the last 20% for testing.\n", "\n", "**Note**: Running the next cell will attempt to download the dataset file to the parent directory (`../elevators.mat`)." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import urllib.request\n", "import os\n", "from scipy.io import loadmat\n", "from math import floor\n", "\n", "\n", "# this is for running the notebook in our testing framework\n", "smoke_test = ('CI' in os.environ)\n", "\n", "\n", "if not smoke_test and not os.path.isfile('../elevators.mat'):\n", "    print('Downloading \\'elevators\\' UCI dataset...')\n", "    urllib.request.urlretrieve('https://drive.google.com/uc?export=download&id=1jhWL3YUHvXIaftia4qeAyDwVxo6j1alk', '../elevators.mat')\n", "\n", "\n", "if smoke_test:  # this is for running the notebook in our testing framework\n", "    X, y = torch.randn(1000, 3), torch.randn(1000)\n", "else:\n", "    data = torch.Tensor(loadmat('../elevators.mat')['data'])\n", "    X = data[:, :-1]\n", "    X = X - X.min(0)[0]\n", "    X = 2 * (X / X.max(0)[0]) - 1\n", "    y = data[:, -1]\n", "\n", "\n", "train_n = int(floor(0.8 * len(X)))\n", "train_x = X[:train_n, :].contiguous()\n", "train_y = y[:train_n].contiguous()\n", "\n", "test_x = X[train_n:, :].contiguous()\n", "test_y = y[train_n:].contiguous()\n", "\n", "if torch.cuda.is_available():\n", "    train_x, train_y, test_x, test_y = train_x.cuda(), train_y.cuda(), test_x.cuda(), test_y.cuda()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a DataLoader\n", "\n", "The next step is to create a torch `DataLoader` that will handle getting us random minibatches of data. This involves using the standard `TensorDataset` and `DataLoader` modules provided by PyTorch.\n", "\n", "In this notebook we'll be using a fairly large batch size of 1024 just to make optimization run faster, but you could of course change this as you so choose." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from torch.utils.data import TensorDataset, DataLoader\n", "train_dataset = TensorDataset(train_x, train_y)\n", "train_loader = DataLoader(train_dataset, batch_size=1024, shuffle=True)\n", "\n", "test_dataset = TensorDataset(test_x, test_y)\n", "test_loader = DataLoader(test_dataset, batch_size=1024, shuffle=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating an SVGP Model\n", "\n", "\n", "For most variational/approximate GP models, you will need to construct the following GPyTorch objects:\n", "\n", "1. A **GP Model** (`gpytorch.models.ApproximateGP`) - This handles basic variational inference.\n", "1. A **Variational distribution** (`gpytorch.variational._VariationalDistribution`) - This tells us what form the variational distribution q(u) should take.\n", "1. A **Variational strategy** (`gpytorch.variational._VariationalStrategy`) - This tells us how to transform a distribution q(u) over the inducing point values to a distribution q(f) over the latent function values for some input x.\n", "\n", "Here, we use a `VariationalStrategy` with `learn_inducing_locations=True`, and a `CholeskyVariationalDistribution`. These are the most straightforward and common options.\n", "\n", "\n", "#### The GP Model\n", " \n", "The `ApproximateGP` model is GPyTorch's simplest approximate inference model. It approximates the true posterior with a distribution specified by a `VariationalDistribution`, which is most commonly some form of MultivariateNormal distribution. The model defines all the variational parameters that are needed, and keeps all of this information under the hood.\n", "\n", "The components of a user-built `ApproximateGP` model in GPyTorch are:\n", "\n", "1. An `__init__` method that constructs a mean module, a kernel module, a variational distribution object and a variational strategy object. 
This method should also be responsible for constructing whatever other modules might be necessary.\n", "\n", "2. A `forward` method that takes in some $n \\times d$ data `x` and returns a MultivariateNormal with the *prior* mean and covariance evaluated at `x`. In other words, we return the vector $\\mu(x)$ and the $n \\times n$ matrix $K_{xx}$ representing the prior mean and covariance matrix of the GP." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from gpytorch.models import ApproximateGP\n", "from gpytorch.variational import CholeskyVariationalDistribution\n", "from gpytorch.variational import VariationalStrategy\n", "\n", "class GPModel(ApproximateGP):\n", "    def __init__(self, inducing_points):\n", "        variational_distribution = CholeskyVariationalDistribution(inducing_points.size(0))\n", "        variational_strategy = VariationalStrategy(self, inducing_points, variational_distribution, learn_inducing_locations=True)\n", "        super(GPModel, self).__init__(variational_strategy)\n", "        self.mean_module = gpytorch.means.ConstantMean()\n", "        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())\n", "\n", "    def forward(self, x):\n", "        mean_x = self.mean_module(x)\n", "        covar_x = self.covar_module(x)\n", "        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)\n", "\n", "inducing_points = train_x[:500, :]\n", "model = GPModel(inducing_points=inducing_points)\n", "likelihood = gpytorch.likelihoods.GaussianLikelihood()\n", "\n", "if torch.cuda.is_available():\n", "    model = model.cuda()\n", "    likelihood = likelihood.cuda()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Training the Model\n", "\n", "The cell below trains the model above, jointly learning the variational parameters, the inducing point locations, and the kernel and likelihood hyperparameters by maximizing the ELBO.\n", "\n", "Unlike when using the exact GP marginal log likelihood, performing variational inference 
allows us to make use of stochastic optimization techniques. For this example, we'll train for four epochs (a single epoch when running under the smoke test).\n", "\n", "The optimization loop differs from the one seen in our simpler tutorials in that it involves looping over both a number of training iterations (epochs) *and* minibatches of the data. However, the basic process is the same: for each minibatch, we forward through the model, compute the loss (the negative `VariationalELBO`, or ELBO), call `backward`, and do a step of optimization." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "caa4cabd1e484d19b1b1ee90b4eab721", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='Epoch', max=4, style=ProgressStyle(description_width='initial…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='Minibatch', max=13, style=ProgressStyle(description_width='in…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='Minibatch', max=13, style=ProgressStyle(description_width='in…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='Minibatch', max=13, style=ProgressStyle(description_width='in…" ] }, "metadata": {}, "output_type": "display_data" 
}, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, description='Minibatch', max=13, style=ProgressStyle(description_width='in…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "num_epochs = 1 if smoke_test else 4\n", "\n", "\n", "model.train()\n", "likelihood.train()\n", "\n", "optimizer = torch.optim.Adam([\n", "    {'params': model.parameters()},\n", "    {'params': likelihood.parameters()},\n", "], lr=0.01)\n", "\n", "# Our loss object. We're using the VariationalELBO\n", "mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))\n", "\n", "\n", "epochs_iter = tqdm.notebook.tqdm(range(num_epochs), desc=\"Epoch\")\n", "for i in epochs_iter:\n", "    # Within each iteration, we will go over each minibatch of data\n", "    minibatch_iter = tqdm.notebook.tqdm(train_loader, desc=\"Minibatch\", leave=False)\n", "    for x_batch, y_batch in minibatch_iter:\n", "        optimizer.zero_grad()\n", "        output = model(x_batch)\n", "        loss = -mll(output, y_batch)\n", "        minibatch_iter.set_postfix(loss=loss.item())\n", "        loss.backward()\n", "        optimizer.step()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Making Predictions\n", "\n", "The next cell computes the predictive mean for the test set, read from `preds.mean`; the predictive covariance is available on the same `preds` distribution. Although the test set is substantially smaller than the training set, we still make predictions in minibatches by iterating over `test_loader`. Alternatively, you could pass the full `test_x` tensor to the model in a single call." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "model.eval()\n", "likelihood.eval()\n", "means = torch.tensor([0.])\n", "with torch.no_grad():\n", "    for x_batch, y_batch in test_loader:\n", "        preds = model(x_batch)\n", "        means = torch.cat([means, preds.mean.cpu()])\n", "means = means[1:]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test MAE: 0.11148034781217575\n" ] } ], "source": [ "print('Test MAE: {}'.format(torch.mean(torch.abs(means - test_y.cpu()))))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }