Converting Variational Models to TorchScript

This notebook demonstrates how to convert a variational GPyTorch model to a ScriptModule that can, for example, be exported to LibTorch.

In general, the process is quite similar to tracing a standard PyTorch model with torch.jit.trace. However, there are two key differences:

  1. The first time you make predictions with a GPyTorch model (exact or approximate), certain computations are cached. These computations can’t be traced, but their results can be. Therefore, we’ll need to pass data through the untraced model once, and then trace the model.
  2. You can’t trace models that return Distribution objects. Therefore, we’ll write a simple wrapper that unpacks the MultivariateNormal our GPs return into a mean tensor and a variance tensor.

Download Data and Define Model

In this tutorial, we’ll be tracing an SVGP model trained for just 10 epochs on the elevators UCI dataset. The next two cells are copied directly from our variational tutorial: they download the data and define the variational GP model.

[17]:
import torch
import urllib.request
import os
from scipy.io import loadmat
from math import floor


# this is for running the notebook in our testing framework
smoke_test = ('CI' in os.environ)

if not smoke_test and not os.path.isfile('../elevators.mat'):
    print('Downloading \'elevators\' UCI dataset...')
    urllib.request.urlretrieve('https://drive.google.com/uc?export=download&id=1jhWL3YUHvXIaftia4qeAyDwVxo6j1alk', '../elevators.mat')


if smoke_test:  # this is for running the notebook in our testing framework
    X, y = torch.randn(1000, 18), torch.randn(1000)
else:
    data = torch.Tensor(loadmat('../elevators.mat')['data'])
    X = data[:, :-1]
    X = X - X.min(0)[0]
    X = 2 * (X / X.max(0)[0]) - 1  # normalize each feature to [-1, 1]
    y = data[:, -1]


train_n = int(floor(0.8 * len(X)))
train_x = X[:train_n, :].contiguous()
train_y = y[:train_n].contiguous()

test_x = X[train_n:, :].contiguous()
test_y = y[train_n:].contiguous()

if torch.cuda.is_available():
    train_x, train_y, test_x, test_y = train_x.cuda(), train_y.cuda(), test_x.cuda(), test_y.cuda()
[18]:
import gpytorch

from gpytorch.models import ApproximateGP
from gpytorch.variational import CholeskyVariationalDistribution
from gpytorch.variational import VariationalStrategy

class GPModel(ApproximateGP):
    def __init__(self, inducing_points):
        variational_distribution = CholeskyVariationalDistribution(inducing_points.size(0))
        variational_strategy = VariationalStrategy(self, inducing_points, variational_distribution, learn_inducing_locations=True)
        super(GPModel, self).__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(ard_num_dims=18))

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

inducing_points = torch.randn(500, 18)
model = GPModel(inducing_points=inducing_points)
likelihood = gpytorch.likelihoods.GaussianLikelihood()

if torch.cuda.is_available():
    model = model.cuda()
    likelihood = likelihood.cuda()

Load a Trained Model

To keep things simple for this notebook, we won’t be training here. Instead, we’ll be loading the parameters for a pre-trained model on elevators that we trained in the SVGP example notebook.

[19]:
if torch.cuda.is_available():
    model_state_dict, likelihood_state_dict = torch.load('svgp_elevators.pt')
else:
    model_state_dict, likelihood_state_dict = torch.load('svgp_elevators.pt', map_location='cpu')
model.load_state_dict(model_state_dict)
likelihood.load_state_dict(likelihood_state_dict)

[19]:
<All keys matched successfully>

Define a Wrapper

Instead of directly tracing the GP, we’ll need to trace a PyTorch Module that returns tensors. In the next cell, we define a wrapper that calls a GP and then unpacks the resulting Distribution into a mean and variance.

You could also return the full covariance_matrix rather than just the variance; a variant wrapper that does this is sketched after the next cell.

[20]:
class MeanVarModelWrapper(torch.nn.Module):
    def __init__(self, gp):
        super().__init__()
        self.gp = gp

    def forward(self, x):
        output_dist = self.gp(x)
        return output_dist.mean, output_dist.variance
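
If you need the full predictive covariance instead of just the marginal variances, a minimal variant of the wrapper could look like the sketch below (the MeanCovarModelWrapper name is just for illustration):

class MeanCovarModelWrapper(torch.nn.Module):
    def __init__(self, gp):
        super().__init__()
        self.gp = gp

    def forward(self, x):
        # Return the full N x N predictive covariance instead of its diagonal
        output_dist = self.gp(x)
        return output_dist.mean, output_dist.covariance_matrix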

Trace the Model

In the next cell, we trace the model as normal, except that we first pass data through the wrapped model so that GPyTorch can compute and cache all of the quantities that can’t be traced. For variational GPs, this mostly involves some complex linear algebra operations.

Additionally, we’ll need to run with the gpytorch.settings.trace_mode setting enabled, because PyTorch can’t trace custom autograd Functions. Note that this results in some inefficiencies, e.g. for variational models we will always compute the full predictive posterior covariance in the traced model. This is not so bad, because we can always just process minibatches of data.

Note: You’ll get a lot of warnings from the tracer. That’s fine. GPyTorch models are pretty large graphs, and include things like .item() calls that you wouldn’t normally encounter in a basic neural network.
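
If the warning output becomes distracting, one option (a sketch; not required for correctness) is to filter the tracer warnings before tracing:

import warnings
import torch

# Silence only the TracerWarning messages emitted during torch.jit.trace
warnings.filterwarnings('ignore', category=torch.jit.TracerWarning)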

[21]:
wrapped_model = MeanVarModelWrapper(model)

with torch.no_grad(), gpytorch.settings.trace_mode():
    fake_input = test_x[:1024, :]
    pred = wrapped_model(fake_input)  # Compute caches
    traced_model = torch.jit.trace(wrapped_model, fake_input)
[22]:
## Compute Errors on a minibatch

mean1 = wrapped_model(test_x[:1024, :])[0]
mean2 = traced_model(test_x[:1024, :])[0]

print(torch.mean(torch.abs(mean1 - test_y[:1024])))
print(torch.mean(torch.abs(mean2 - test_y[:1024])))
tensor(0.0756, device='cuda:0', grad_fn=<MeanBackward0>)
tensor(0.0756, device='cuda:0', grad_fn=<MeanBackward0>)
[23]:
traced_model.save('traced_model.pt')
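
As a sanity check, the saved ScriptModule can be loaded back (in Python with torch.jit.load, or from C++ via LibTorch) and used for prediction. The snippet below is a sketch; the minibatch of 1024 rows just mirrors the one used above:

loaded_model = torch.jit.load('traced_model.pt')

with torch.no_grad():
    # Predict on a minibatch with the reloaded ScriptModule
    loaded_mean, loaded_var = loaded_model(test_x[:1024, :])
print(torch.mean(torch.abs(loaded_mean - test_y[:1024])))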