Converting Variational Models to TorchScript
The purpose of this notebook is to demonstrate how to convert a variational GPyTorch model to a ScriptModule that can e.g. be exported to LibTorch.
In general, the process is quite similar to that for standard torch models: we trace them using torch.jit.trace. However, there are two key differences:
The first time you make predictions with a GPyTorch model (exact or approximate), we cache certain computations. These computations can’t be traced, but the results of them can be. Therefore, we’ll need to pass data through the untraced model once, and then trace the model.
You can’t trace models that return Distribution objects. Therefore, we’ll write a simple wrapper that unpacks the MultivariateNormal that our GPs return into just a mean and a variance tensor.
Download Data and Define Model
In this tutorial, we’ll be tracing an SVGP model trained for just 10 epochs on the elevators UCI dataset. The next two cells are copied directly from our variational tutorial, and download the data and define the variational GP model.
import torch
import urllib.request
import os
from scipy.io import loadmat
from math import floor

# this is for running the notebook in our testing framework
smoke_test = ('CI' in os.environ)

if not smoke_test and not os.path.isfile('../elevators.mat'):
    print('Downloading \'elevators\' UCI dataset...')
    urllib.request.urlretrieve('https://drive.google.com/uc?export=download&id=1jhWL3YUHvXIaftia4qeAyDwVxo6j1alk', '../elevators.mat')

if smoke_test:  # this is for running the notebook in our testing framework
    X, y = torch.randn(1000, 18), torch.randn(1000)
else:
    data = torch.Tensor(loadmat('../elevators.mat')['data'])
    X = data[:, :-1]
    # Tensor.min(dim) / Tensor.max(dim) return (values, indices); keep only the values
    X = X - X.min(0)[0]
    X = 2 * (X / X.max(0)[0]) - 1  # rescale each feature to [-1, 1]
    y = data[:, -1]

# 80/20 train/test split
train_n = int(floor(0.8 * len(X)))
train_x = X[:train_n, :].contiguous()
train_y = y[:train_n].contiguous()

test_x = X[train_n:, :].contiguous()
test_y = y[train_n:].contiguous()

if torch.cuda.is_available():
    train_x, train_y, test_x, test_y = train_x.cuda(), train_y.cuda(), test_x.cuda(), test_y.cuda()
import gpytorch
from gpytorch.models import ApproximateGP
from gpytorch.variational import CholeskyVariationalDistribution
from gpytorch.variational import VariationalStrategy

class GPModel(ApproximateGP):
    def __init__(self, inducing_points):
        variational_distribution = CholeskyVariationalDistribution(inducing_points.size(0))
        variational_strategy = VariationalStrategy(self, inducing_points, variational_distribution, learn_inducing_locations=True)
        super(GPModel, self).__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(ard_num_dims=18))

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

inducing_points = torch.randn(500, 18)
model = GPModel(inducing_points=inducing_points)
likelihood = gpytorch.likelihoods.GaussianLikelihood()

if torch.cuda.is_available():
    model = model.cuda()
    likelihood = likelihood.cuda()
Load a Trained Model
To keep things simple for this notebook, we won’t be training here. Instead, we’ll be loading the parameters for a pre-trained model on elevators that we trained in the SVGP example notebook.
if torch.cuda.is_available():
    model_state_dict, likelihood_state_dict = torch.load('svgp_elevators.pt')
else:
    model_state_dict, likelihood_state_dict = torch.load('svgp_elevators.pt', map_location='cpu')

model.load_state_dict(model_state_dict)
likelihood.load_state_dict(likelihood_state_dict)
<All keys matched successfully>
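The loading cell above expects the checkpoint to contain a tuple of two state dicts (model first, likelihood second). As a rough sketch of how a file with that layout could be written and read back, the example below uses two small torch.nn.Linear modules as stand-ins for the GP and the likelihood. The stand-in modules and the temporary file name are assumptions made for illustration; this is not necessarily how svgp_elevators.pt was produced.

```python
import os
import tempfile

import torch

# Stand-ins for the GP model and likelihood (hypothetical; any nn.Module works).
toy_model = torch.nn.Linear(18, 1)
toy_likelihood = torch.nn.Linear(1, 1)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this sketch.
    checkpoint_path = os.path.join(tmp_dir, 'toy_checkpoint.pt')

    # Save both state dicts together as a tuple, matching the layout
    # that the loading cell unpacks.
    torch.save((toy_model.state_dict(), toy_likelihood.state_dict()), checkpoint_path)

    # map_location='cpu' lets a checkpoint saved on GPU load on a CPU-only machine.
    model_state, likelihood_state = torch.load(checkpoint_path, map_location='cpu')

# Restore the parameters into freshly constructed modules.
restored_model = torch.nn.Linear(18, 1)
restored_likelihood = torch.nn.Linear(1, 1)
restored_model.load_state_dict(model_state)
restored_likelihood.load_state_dict(likelihood_state)
```

Keeping both state dicts in one file means the model and likelihood parameters cannot drift apart between checkpoints.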
Define a Wrapper
Instead of directly tracing the GP, we’ll need to trace a PyTorch Module that returns tensors. In the next cell, we define a wrapper that calls a GP and then unpacks the resulting Distribution into a mean and variance.
You could also return the full covariance_matrix if you wanted that rather than the variance.
class MeanVarModelWrapper(torch.nn.Module):
    def __init__(self, gp):
        super().__init__()
        self.gp = gp

    def forward(self, x):
        output_dist = self.gp(x)
        return output_dist.mean, output_dist.variance
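For the full-covariance variant mentioned above, a wrapper along the following lines should work. The class name MeanCovarModelWrapper is hypothetical. To keep the sketch runnable without a trained GP, it is exercised on ToyGP, a stand-in module (not a GPyTorch model) that returns a torch.distributions.MultivariateNormal, which exposes the same mean and covariance_matrix attributes.

```python
import torch


class MeanCovarModelWrapper(torch.nn.Module):
    """Hypothetical variant of the wrapper that returns the full covariance."""

    def __init__(self, gp):
        super().__init__()
        self.gp = gp

    def forward(self, x):
        output_dist = self.gp(x)
        # covariance_matrix is n x n for n inputs, so memory grows quadratically.
        return output_dist.mean, output_dist.covariance_matrix


class ToyGP(torch.nn.Module):
    """Stand-in for a GP (assumption for illustration): fixed diagonal covariance."""

    def forward(self, x):
        mean = x.sum(-1)
        covar = 0.5 * torch.eye(x.size(0))
        return torch.distributions.MultivariateNormal(mean, covariance_matrix=covar)


toy_x = torch.randn(5, 18)
toy_mean, toy_covar = MeanCovarModelWrapper(ToyGP())(toy_x)
print(toy_mean.shape, toy_covar.shape)
```

Because the covariance matrix has one row and column per input point, this variant is best suited to small batches.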
Trace the Model
In the next cell, we trace the model as normal, with the exception that we first pass data through the wrapped model so that GPyTorch can compute all of the things it needs to cache that can’t be traced. Mostly, this just involves some complex linear algebra operations for variational GPs.
Additionally, we’ll need to run with the gpytorch.settings.trace_mode setting enabled, because PyTorch can’t trace custom autograd Functions. Note that this results in some inefficiencies, e.g. for variational models we will always compute the full predictive posterior covariance in the traced model. This is not so bad, because we can always just process minibatches of data.
Note: You’ll get a lot of warnings from the tracer. That’s fine. GPyTorch models are pretty large graphs, and include things like .item() calls that you wouldn’t normally encounter in a basic neural network.
wrapped_model = MeanVarModelWrapper(model)

with torch.no_grad(), gpytorch.settings.trace_mode():
    fake_input = test_x[:1024, :]
    pred = wrapped_model(fake_input)  # Compute caches
    traced_model = torch.jit.trace(wrapped_model, fake_input)
## Compute Errors on a minibatch
# The wrapper returns a (mean, variance) tuple; index 0 selects the mean.
mean1 = wrapped_model(test_x[:1024, :])[0]
mean2 = traced_model(test_x[:1024, :])[0]

print(torch.mean(torch.abs(mean1 - test_y[:1024])))
print(torch.mean(torch.abs(mean2 - test_y[:1024])))
tensor(0.0756, device='cuda:0', grad_fn=<MeanBackward0>)
tensor(0.0756, device='cuda:0', grad_fn=<MeanBackward0>)
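Since the traced model always computes the full predictive covariance, larger test sets are best processed in minibatches. The sketch below shows one way to do that, and to serialize the resulting ScriptModule with torch.jit.save so it can later be loaded elsewhere. It uses a toy traced module (ToyMeanVar) in place of the traced GP, and the helper predict_in_batches is a hypothetical name. It also assumes the traced graph accepts batch sizes other than the one used during tracing; if that does not hold for your model, pad the last batch to the traced size.

```python
import io

import torch


class ToyMeanVar(torch.nn.Module):
    """Stand-in for the wrapped GP (assumption): returns a (mean, variance) tuple."""

    def forward(self, x):
        return x.sum(-1), x.pow(2).sum(-1)


toy_traced = torch.jit.trace(ToyMeanVar(), torch.randn(4, 18))


def predict_in_batches(module, x, batch_size=1024):
    """Run module over x in chunks of batch_size and concatenate the outputs."""
    means, variances = [], []
    with torch.no_grad():
        for start in range(0, x.size(0), batch_size):
            mean, var = module(x[start:start + batch_size])
            means.append(mean)
            variances.append(var)
    return torch.cat(means), torch.cat(variances)


batch_x = torch.randn(2500, 18)
batch_mean, batch_var = predict_in_batches(toy_traced, batch_x)

# Serialize the ScriptModule; an in-memory buffer is used here, but a file
# path such as 'traced_model.pt' (hypothetical name) works the same way.
buffer = io.BytesIO()
torch.jit.save(toy_traced, buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer)
reloaded_mean, reloaded_var = reloaded(batch_x[:8])
```

A file saved this way can be loaded without the original Python class definitions, which is what makes it usable from LibTorch.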