gpytorch.variational
There are many possible variants of variational/approximate GPs. GPyTorch uses three composable objects that make it possible to implement most GP approximations:
 VariationalDistribution, which defines the form of the approximate inducing value posterior \(q(\mathbf u)\).
 VariationalStrategy, which defines how to compute \(q(\mathbf f(\mathbf X))\) from \(q(\mathbf u)\).
 _ApproximateMarginalLogLikelihood, which defines the objective function to learn the approximate posterior (e.g. variational ELBO).
All three of these objects should be used in conjunction with a gpytorch.models.ApproximateGP model.
Variational Strategies
VariationalStrategy objects control how certain aspects of variational inference should be performed. In particular, they define two methods that get used during variational inference:
 The prior_distribution() method determines how to compute the GP prior distribution of the inducing points, e.g. \(p(u) \sim N(\mu(X_u), K(X_u, X_u))\). Most commonly, this is done simply by calling the user-defined GP prior on the inducing point data directly.
 The forward() method determines how to marginalize out the inducing point function values. Specifically, forward defines how to transform a variational distribution over the inducing point values, \(q(u)\), into a variational distribution over the function values at specified locations x, \(q(f_x)\), by integrating \(\int p(f_x \mid u) q(u) \: du\).
In GPyTorch, we currently support two categories of this latter functionality. In scenarios where the inducing points are learned (or set to be exactly the training data), we apply the derivation in Hensman et al., 2015 to exactly marginalize out the variational distribution. When the inducing points are constrained to a grid, we apply the derivation in Wilson et al., 2016 and exploit a deterministic relationship between \(\mathbf f\) and \(\mathbf u\).
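To make the marginalization performed by forward() concrete, here is a minimal sketch for the simplest possible case: one inducing point, one test location, a zero-mean prior, and an RBF kernel with unit lengthscale. All of these are assumptions chosen for illustration, and `rbf` and `marginalize` are hypothetical helpers, not GPyTorch functions. With \(q(u) = N(m, s)\), the integral over \(u\) yields a Gaussian whose mean and variance are computed below.

```python
import math
from typing import Tuple


def rbf(a: float, b: float) -> float:
    """RBF kernel with unit lengthscale and outputscale (illustrative choice)."""
    return math.exp(-0.5 * (a - b) ** 2)


def marginalize(x: float, z: float, m: float, s: float) -> Tuple[float, float]:
    """Mean and variance of q(f_x) = integral of p(f_x | u) q(u) du.

    Assumes a zero-mean prior, a single inducing point z, and q(u) = N(m, s).
    """
    k_xx, k_xz, k_zz = rbf(x, x), rbf(x, z), rbf(z, z)
    a = k_xz / k_zz                    # projection weight K_xz K_zz^{-1}
    mean = a * m                       # expectation of E[f_x | u] under q(u)
    var = k_xx - a * k_xz + a * a * s  # conditional variance + propagated q(u) variance
    return mean, var


# If q(u) equals the prior p(u) = N(0, k_zz), q(f_x) recovers the prior N(0, k_xx).
prior_mean, prior_var = marginalize(x=0.3, z=1.0, m=0.0, s=rbf(1.0, 1.0))

# At x == z the function value is the inducing value itself, so q(f_x) equals q(u).
same_mean, same_var = marginalize(x=1.0, z=1.0, m=0.7, s=0.2)
```

The two calls at the end are sanity checks of the formula rather than a description of how the library evaluates it; GPyTorch operates on matrices of inducing points and batches of inputs.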
_VariationalStrategy

class gpytorch.variational._VariationalStrategy(model, inducing_points, variational_distribution, learn_inducing_locations=True)
Abstract base class for all Variational Strategies.

forward(x, inducing_points, inducing_values, variational_inducing_covar=None)
The forward() method determines how to marginalize out the inducing point function values. Specifically, forward defines how to transform a variational distribution over the inducing point values, \(q(u)\), into a variational distribution over the function values at specified locations x, \(q(f_x)\), by integrating \(\int p(f_x \mid u) q(u) \: du\).
Parameters:
 x (torch.Tensor) – Locations \(\mathbf X\) at which to get the variational posterior of the function values.
 inducing_points (torch.Tensor) – Locations \(\mathbf Z\) of the inducing points.
 inducing_values (torch.Tensor) – Samples of the inducing function values \(\mathbf u\) (or the mean of the distribution \(q(\mathbf u)\) if q is a Gaussian).
 variational_inducing_covar (LazyTensor) – If the distribution \(q(\mathbf u)\) is Gaussian, then this variable is the covariance matrix of that Gaussian. Otherwise, it will be None.
Returns: The distribution \(q( \mathbf f(\mathbf X))\)

kl_divergence()
Compute the KL divergence between the variational inducing distribution \(q(\mathbf u)\) and the prior inducing distribution \(p(\mathbf u)\).
Return type: torch.Tensor
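For intuition, the KL term has a closed form when both distributions are Gaussian. The sketch below covers only the simplified case of diagonal covariances, an assumption made for brevity (the strategies documented here work with full covariance matrices). `kl_diag_gaussians` is a hypothetical helper, not part of the library.

```python
import math
from typing import Sequence


def kl_diag_gaussians(
    q_mean: Sequence[float],
    q_var: Sequence[float],
    p_mean: Sequence[float],
    p_var: Sequence[float],
) -> float:
    """KL(q || p) for two Gaussians with diagonal covariances.

    Each dimension contributes
    0.5 * (log(p_var / q_var) + (q_var + (q_mean - p_mean)^2) / p_var - 1).
    """
    total = 0.0
    for mq, vq, mp, vp in zip(q_mean, q_var, p_mean, p_var):
        total += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return total


# Identical distributions have zero divergence.
kl_same = kl_diag_gaussians([0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0])

# Shifting a unit-variance mean by 1 costs 0.5 nats.
kl_shifted = kl_diag_gaussians([1.0], [1.0], [0.0], [1.0])
```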

prior_distribution
The prior_distribution() method determines how to compute the GP prior distribution of the inducing points, e.g. \(p(u) \sim N(\mu(X_u), K(X_u, X_u))\). Most commonly, this is done simply by calling the user-defined GP prior on the inducing point data directly.
Return type: MultivariateNormal
Returns: The distribution \(p( \mathbf u)\)

VariationalStrategy

class gpytorch.variational.VariationalStrategy(model, inducing_points, variational_distribution, learn_inducing_locations=True)
The standard variational strategy, as defined by Hensman et al. (2015). This strategy takes a set of \(m \ll n\) inducing points \(\mathbf Z\) and applies an approximate distribution \(q( \mathbf u)\) over their function values. (Here, we use the common notation \(\mathbf u = f(\mathbf Z)\).) The approximate function distribution for any arbitrary input \(\mathbf X\) is given by:
\[q( f(\mathbf X) ) = \int p( f(\mathbf X) \mid \mathbf u) q(\mathbf u) \: d\mathbf u\]
This variational strategy uses “whitening” to accelerate the optimization of the variational parameters. See Matthews (2017) for more info.
Parameters:
 model (ApproximateGP) – Model this strategy is applied to. Typically passed in when the VariationalStrategy is created in the __init__ method of the user-defined model.
 inducing_points (torch.Tensor) – Tensor containing a set of inducing points to use for variational inference.
 variational_distribution (VariationalDistribution) – A VariationalDistribution object that represents the form of the variational distribution \(q(\mathbf u)\)
 learn_inducing_locations (bool) – (optional, default True): Whether or not the inducing point locations \(\mathbf Z\) should be learned (i.e. whether they are parameters of the model).
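The whitening mentioned above can be sketched as follows. This assumes a zero-mean prior over \(\mathbf u\) with covariance \(\mathbf K_{\mathbf Z \mathbf Z}\); the symbols \(\mathbf L\), \(\mathbf v\), \(\mathbf m\), and \(\mathbf S\) are introduced here for illustration only. Instead of parameterizing \(q(\mathbf u)\) directly, the variational distribution is placed on a whitened variable \(\mathbf v\) whose prior is a standard normal:
\[\mathbf K_{\mathbf Z \mathbf Z} = \mathbf L \mathbf L^\top, \qquad \mathbf u = \mathbf L \mathbf v, \qquad p(\mathbf v) = N(\mathbf 0, \mathbf I), \qquad q(\mathbf v) = N(\mathbf m, \mathbf S) \;\Longrightarrow\; q(\mathbf u) = N(\mathbf L \mathbf m, \mathbf L \mathbf S \mathbf L^\top)\]
Because the prior on \(\mathbf v\) is a standard normal, the KL term written in terms of \(\mathbf m\) and \(\mathbf S\) no longer involves the kernel matrix, which is one common explanation for why the whitened parameters tend to be easier to optimize.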
MultitaskVariationalStrategy

class gpytorch.variational.MultitaskVariationalStrategy(base_variational_strategy, num_tasks, task_dim=1)
MultitaskVariationalStrategy wraps an existing VariationalStrategy to produce a MultitaskMultivariateNormal distribution. This is useful for multioutput variational models.
The base variational strategy is assumed to operate on a batch of GPs. One of the batch dimensions corresponds to the multiple tasks.
Parameters:
 base_variational_strategy (VariationalStrategy) – Base variational strategy
 task_dim (int) – (default=1) Which batch dimension is the task dimension
OrthogonallyDecoupledVariationalStrategy

class gpytorch.variational.OrthogonallyDecoupledVariationalStrategy(model, inducing_points, variational_distribution)
Implements orthogonally decoupled VGPs as defined in Salimbeni et al. (2018). This variational strategy uses a different set of inducing points for the mean and covariance functions. The idea is to use more inducing points for the (computationally efficient) mean and fewer inducing points for the (computationally expensive) covariance.
This variational strategy defines the inducing points/_VariationalDistribution for the mean function. It then wraps a different _VariationalStrategy which defines the covariance inducing points. Example:
>>> import torch
>>> import gpytorch
>>>
>>> # Assumes `model` (an ApproximateGP) and `train_x` (an n x d tensor) are defined elsewhere.
>>> # Many inducing points for the mean, fewer for the covariance.
>>> mean_inducing_points = torch.randn(1000, train_x.size(-1), dtype=train_x.dtype, device=train_x.device)
>>> covar_inducing_points = torch.randn(100, train_x.size(-1), dtype=train_x.dtype, device=train_x.device)
>>>
>>> # size(-2) is the number of inducing points (size(2) would fail on a 2-D tensor).
>>> covar_variational_strategy = gpytorch.variational.VariationalStrategy(
>>>     model, covar_inducing_points,
>>>     gpytorch.variational.CholeskyVariationalDistribution(covar_inducing_points.size(-2)),
>>>     learn_inducing_locations=True
>>> )
>>>
>>> variational_strategy = gpytorch.variational.OrthogonallyDecoupledVariationalStrategy(
>>>     covar_variational_strategy, mean_inducing_points,
>>>     gpytorch.variational.DeltaVariationalDistribution(mean_inducing_points.size(-2)),
>>> )
UnwhitenedVariationalStrategy

class gpytorch.variational.UnwhitenedVariationalStrategy(model, inducing_points, variational_distribution, learn_inducing_locations=True)
Similar to VariationalStrategy, but does not perform the whitening operation. In almost all cases VariationalStrategy is preferable, with a few exceptions:
 When the inducing points are exactly equal to the training points (i.e. \(\mathbf Z = \mathbf X\)). Unwhitened models are faster in this case.
 When the number of inducing points is very large (e.g. >2000). Unwhitened models can use conjugate gradients (CG) for faster computation.
Parameters:
 model (ApproximateGP) – Model this strategy is applied to. Typically passed in when the VariationalStrategy is created in the __init__ method of the user-defined model.
 inducing_points (torch.Tensor) – Tensor containing a set of inducing points to use for variational inference.
 variational_distribution (VariationalDistribution) – A VariationalDistribution object that represents the form of the variational distribution \(q(\mathbf u)\)
 learn_inducing_locations (bool) – (optional, default True): Whether or not the inducing point locations \(\mathbf Z\) should be learned (i.e. whether they are parameters of the model).
GridInterpolationVariationalStrategy

class gpytorch.variational.GridInterpolationVariationalStrategy(model, grid_size, grid_bounds, variational_distribution)
This strategy constrains the inducing points to a grid and applies a deterministic relationship between \(\mathbf f\) and \(\mathbf u\). It was introduced by Wilson et al. (2016).
Here, the inducing points are not learned. Instead, the strategy automatically creates inducing points based on a set of grid sizes and grid bounds.
Parameters:
 model (ApproximateGP) – Model this strategy is applied to. Typically passed in when the VariationalStrategy is created in the __init__ method of the user-defined model.
 grid_size (int) – Size of the grid
 grid_bounds (list) – Bounds of each dimension of the grid (should be a list of (float, float) tuples)
 variational_distribution (VariationalDistribution) – A VariationalDistribution object that represents the form of the variational distribution \(q(\mathbf u)\)
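To illustrate how a grid of inducing points follows from grid_size and grid_bounds, here is a minimal pure-Python sketch. It assumes evenly spaced points that include both bounds in every dimension; the library's actual grid construction may differ in detail (for example in how the bounds are padded), and `make_grid` is a hypothetical helper, not a GPyTorch function.

```python
import itertools
from typing import List, Sequence, Tuple


def make_grid(
    grid_size: int, grid_bounds: Sequence[Tuple[float, float]]
) -> List[Tuple[float, ...]]:
    """Cartesian product of `grid_size` evenly spaced values per dimension.

    Requires grid_size >= 2 so that both bounds can be included.
    """
    axes = []
    for lower, upper in grid_bounds:
        step = (upper - lower) / (grid_size - 1)
        axes.append([lower + i * step for i in range(grid_size)])
    return list(itertools.product(*axes))


# A 2-D grid with 4 points per dimension gives 4 ** 2 = 16 inducing points.
grid = make_grid(grid_size=4, grid_bounds=[(0.0, 1.0), (-1.0, 1.0)])
```

The number of grid points grows as grid_size raised to the number of input dimensions, which is why grid-based strategies are usually applied to low-dimensional inputs.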
Variational Distributions
VariationalDistribution objects represent the variational distribution \(q(\mathbf u)\) over a set of inducing points for GPs. Typically the distributions are some sort of parameterization of a multivariate normal distribution.
_VariationalDistribution

class gpytorch.variational._VariationalDistribution(num_inducing_points, batch_shape=torch.Size([]), mean_init_std=0.001)
Abstract base class for all Variational Distributions.

forward()
Constructs and returns the variational distribution
Return type: MultivariateNormal
Returns: The distribution \(q(\mathbf u)\)

CholeskyVariationalDistribution

class gpytorch.variational.CholeskyVariationalDistribution(num_inducing_points, batch_shape=torch.Size([]), mean_init_std=0.001, **kwargs)
A _VariationalDistribution that is defined to be a multivariate normal distribution with a full covariance matrix.
The most common way this distribution is defined is to parameterize it in terms of a mean vector and a covariance matrix. In order to ensure that the covariance matrix remains positive definite, we only consider the lower triangle.
Parameters:
 num_inducing_points (int) – Size of the variational distribution. This implies that the variational mean should be this size, and the variational covariance matrix should have this many rows and columns.
 batch_shape (torch.Size) – (Optional.) Specifies an optional batch size for the variational parameters. This is useful for example when doing additive variational inference.
 mean_init_std (float) – (default=1e-3) Standard deviation of Gaussian noise to add to the mean initialization.
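The idea of considering only the lower triangle can be sketched in plain Python: entries above the diagonal of a raw parameter matrix are ignored, and the covariance is formed as \(\mathbf L \mathbf L^\top\), which is symmetric and positive semi-definite for any \(\mathbf L\) (and positive definite when the diagonal of \(\mathbf L\) is non-zero). This is an illustration of the parameterization, not the library's internal code, and `covariance_from_lower_triangle` is a hypothetical helper.

```python
from typing import List

Matrix = List[List[float]]


def covariance_from_lower_triangle(raw: Matrix) -> Matrix:
    """Zero the entries above the diagonal of `raw`, then return L @ L^T."""
    m = len(raw)
    lower = [[raw[i][j] if j <= i else 0.0 for j in range(m)] for i in range(m)]
    return [
        [sum(lower[i][k] * lower[j][k] for k in range(m)) for j in range(m)]
        for i in range(m)
    ]


# The 9.0 entries sit above the diagonal and therefore have no effect.
raw = [
    [1.0, 9.0, 9.0],
    [0.5, 2.0, 9.0],
    [-1.0, 0.3, 0.7],
]
covar = covariance_from_lower_triangle(raw)
```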
DeltaVariationalDistribution

class gpytorch.variational.DeltaVariationalDistribution(num_inducing_points, batch_shape=torch.Size([]), mean_init_std=0.001, **kwargs)
This _VariationalDistribution object replaces a variational distribution with a single particle. It is equivalent to doing MAP inference.
Parameters:
 num_inducing_points (int) – Size of the variational distribution. This implies that the variational mean should be this size.
 batch_shape (torch.Size) – (Optional.) Specifies an optional batch size for the variational parameters. This is useful for example when doing additive variational inference.
 mean_init_std (float) – (default=1e-3) Standard deviation of Gaussian noise to add to the mean initialization.
MeanFieldVariationalDistribution

class gpytorch.variational.MeanFieldVariationalDistribution(num_inducing_points, batch_shape=torch.Size([]), mean_init_std=0.001, **kwargs)
A _VariationalDistribution that is defined to be a multivariate normal distribution with a diagonal covariance matrix. This will not be as flexible/expressive as a CholeskyVariationalDistribution.
Parameters:
 num_inducing_points (int) – Size of the variational distribution. This implies that the variational mean should be this size, and the variational covariance matrix should have this many rows and columns.
 batch_shape (torch.Size) – (Optional.) Specifies an optional batch size for the variational parameters. This is useful for example when doing additive variational inference.
 mean_init_std (float) – (default=1e-3) Standard deviation of Gaussian noise to add to the mean initialization.
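The trade-off in expressiveness between the three distributions can be made concrete by counting free parameters for \(m\) inducing points. The sketch below counts the parameters of the mathematical parameterization only (a mean vector plus, where applicable, a diagonal or a lower triangle); the library may store them differently, for example as a full square matrix. `num_variational_parameters` and the string labels are hypothetical names used for illustration.

```python
def num_variational_parameters(kind: str, m: int) -> int:
    """Free parameters of q(u) for m inducing points (no batch dimensions)."""
    if kind == "delta":
        return m                         # mean only (single particle)
    if kind == "mean_field":
        return m + m                     # mean + diagonal variances
    if kind == "cholesky":
        return m + m * (m + 1) // 2      # mean + lower-triangular factor
    raise ValueError(f"unknown kind: {kind}")


counts = {
    kind: num_variational_parameters(kind, 100)
    for kind in ("delta", "mean_field", "cholesky")
}
```

For 100 inducing points this gives 100, 200, and 5150 parameters respectively, so the full-covariance form grows quadratically while the other two grow linearly.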