gpytorch.kernels

If you don’t know what kernel to use, we recommend that you start out with a gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()).

Kernel

class gpytorch.kernels.Kernel(ard_num_dims=None, batch_shape=None, active_dims=None, lengthscale_prior=None, lengthscale_constraint=None, eps=1e-06, **kwargs)[source]

Kernels in GPyTorch are implemented as a gpytorch.Module that, when called on two torch.Tensor objects \(\mathbf x_1\) and \(\mathbf x_2\), returns either a torch.Tensor or a LinearOperator that represents the covariance matrix between \(\mathbf x_1\) and \(\mathbf x_2\).

In the typical use case, extending this class simply requires implementing a forward() method.

Note

The __call__() method does some additional internal work. In particular, all kernels are lazily evaluated so that we can index into the kernel matrix before actually computing it. Furthermore, many built-in kernel modules return LinearOperators that allow for more efficient inference than if we explicitly computed the kernel matrix itself.

As a result, if you want to get an actual torch.Tensor representing the covariance matrix, you may need to call the to_dense() method on the output.

This base Kernel class includes a lengthscale parameter \(\Theta\), which is used by many common kernel functions. There are a few options for the lengthscale:

  • Default: No lengthscale (i.e. \(\Theta\) is the identity matrix).

  • Single lengthscale: One lengthscale can be applied to all input dimensions/batches (i.e. \(\Theta\) is a constant diagonal matrix). This is controlled by setting the attribute has_lengthscale=True.

  • ARD: Each input dimension gets its own separate lengthscale (i.e. \(\Theta\) is a non-constant diagonal matrix). This is controlled by the ard_num_dims keyword argument (as well as has_lengthscale=True).

In batch mode (i.e. when \(\mathbf x_1\) and \(\mathbf x_2\) are batches of input matrices), each batch of data can have its own lengthscale parameter by setting the batch_shape keyword argument to the appropriate number of batches.
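For instance, a minimal sketch of how these options affect the lengthscale shape (using RBFKernel, which sets has_lengthscale=True):

>>> kernel = gpytorch.kernels.RBFKernel(ard_num_dims=3, batch_shape=torch.Size([2]))
>>> kernel.lengthscale.shape  # batch_shape x 1 x ard_num_dims
torch.Size([2, 1, 3])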

Note

You can set a prior on the lengthscale parameter using the lengthscale_prior argument.

Parameters:
  • ard_num_dims (int, optional) – Set this if you want a separate lengthscale for each input dimension. It should be D if \(\mathbf x\) is a … x N x D matrix. (Default: None.)

  • batch_shape (torch.Size, optional) – Set this if you want a separate lengthscale for each batch of input data. It should be \(B_1 \times \ldots \times B_k\) if \(\mathbf x_1\) is a \(B_1 \times \ldots \times B_k \times N \times D\) tensor.

  • active_dims ((int, ...), optional) – Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions. (Default: None.)

  • lengthscale_prior (Prior, optional) – Set this if you want to apply a prior to the lengthscale parameter. (Default: None.)

  • lengthscale_constraint (Interval, optional) – Set this if you want to apply a constraint to the lengthscale parameter. (Default: Positive.)

  • eps (float) – A small positive value added to the lengthscale to prevent divide by zero errors. (Default: 1e-6.)

Variables:
  • batch_shape (torch.Size) – The (minimum) number of batch dimensions supported by this kernel. Typically, this captures the batch shape of the lengthscale and other parameters, and is usually set by the batch_shape argument in the constructor.

  • dtype (torch.dtype) – The dtype supported by this kernel. Typically, this depends on the dtype of the lengthscale and other parameters.

  • is_stationary (bool) – Set to True if the Kernel represents a stationary function (one that depends only on \(\mathbf x_1 - \mathbf x_2\)).

  • lengthscale (torch.Tensor) – The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.

Example

>>> covar_module = gpytorch.kernels.LinearKernel()
>>> x1 = torch.randn(50, 3)
>>> lazy_covar_matrix = covar_module(x1) # Returns a RootLinearOperator
>>> tensor_covar_matrix = lazy_covar_matrix.to_dense() # Gets the actual tensor for this kernel matrix
__call__(x1, x2=None, diag=False, last_dim_is_batch=False, **params)[source]

Computes the covariance between \(\mathbf x_1\) and \(\mathbf x_2\).

Note

Following PyTorch convention, all GP objects should use __call__ rather than forward(). The __call__ method applies additional pre- and post-processing to the forward method, and additionally employs a lazy evaluation scheme to reduce memory and computational costs.

Parameters:
  • x1 (torch.Tensor) – First set of data (… x N x D).

  • x2 (torch.Tensor, optional) – Second set of data (… x M x D). (If None, then x2 is set to x1.)

  • diag (bool) – Should the Kernel compute the whole kernel, or just the diag? If True, it must be the case that x1 == x2. (Default: False.)

  • last_dim_is_batch (bool) – If True, treat the last dimension of x1 and x2 as another batch dimension. (Useful for additive structure over the dimensions). (Default: False.)

Return type:

LazyEvaluatedKernelTensor or LinearOperator or torch.Tensor

Returns:

An object that will lazily evaluate to the kernel matrix or vector. The shape depends on the kernel’s evaluation mode:

  • full_covar: … x N x M

  • full_covar with last_dim_is_batch=True: … x K x N x M

  • diag: … x N

  • diag with last_dim_is_batch=True: … x K x N
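For example, a quick sketch of full vs. diagonal evaluation (shapes shown assume an RBFKernel; the inputs are illustrative):

>>> kernel = gpytorch.kernels.RBFKernel()
>>> x1 = torch.randn(8, 3)
>>> x2 = torch.randn(6, 3)
>>> full_covar = kernel(x1, x2)          # lazily evaluated (8 x 6)
>>> full_covar.to_dense().shape
torch.Size([8, 6])
>>> diag_covar = kernel(x1, diag=True)   # torch.Tensor of shape (8,)
>>> diag_covar.shape
torch.Size([8])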

__getitem__(index)[source]

Constructs a new kernel where the lengthscale (and other kernel parameters) are modified by an indexing operation.

Parameters:

index – Index to apply to all parameters.

Return type:

Kernel

covar_dist(x1, x2, diag=False, last_dim_is_batch=False, square_dist=False, **params)[source]

This is a helper method for computing the Euclidean distance between all pairs of points in \(\mathbf x_1\) and \(\mathbf x_2\).

Parameters:
  • x1 (torch.Tensor) – First set of data (… x N x D).

  • x2 (torch.Tensor) – Second set of data (… x M x D).

  • diag (bool) – Should the Kernel compute the whole kernel, or just the diag? If True, it must be the case that x1 == x2. (Default: False.)

  • last_dim_is_batch (bool) – If True, treat the last dimension of x1 and x2 as another batch dimension. (Useful for additive structure over the dimensions). (Default: False.)

  • square_dist (bool) – If True, returns the squared distance rather than the standard distance. (Default: False.)

Return type:

torch.Tensor

Returns:

The distance matrix or vector. The shape depends on the kernel’s evaluation mode:

  • full_covar: … x N x M

  • full_covar with last_dim_is_batch=True: … x K x N x M

  • diag: … x N

  • diag with last_dim_is_batch=True: … x K x N
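A brief sketch of covar_dist (any kernel instance works, since this helper is defined on the base class):

>>> kernel = gpytorch.kernels.RBFKernel()
>>> x1 = torch.randn(5, 3)
>>> x2 = torch.randn(4, 3)
>>> dist = kernel.covar_dist(x1, x2)                       # Euclidean distances (5 x 4)
>>> sq_dist = kernel.covar_dist(x1, x2, square_dist=True)  # squared distances (5 x 4)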

expand_batch(*sizes)[source]

Constructs a new kernel where the lengthscale (and other kernel parameters) are expanded to match the batch dimension determined by sizes.

Parameters:

sizes (torch.Size or (int, ...)) – The batch shape of the new tensor

Return type:

Kernel

abstract forward(x1, x2, diag=False, last_dim_is_batch=False, **params)[source]

Computes the covariance between \(\mathbf x_1\) and \(\mathbf x_2\). This method should be implemented by all Kernel subclasses.

Parameters:
  • x1 (torch.Tensor) – First set of data (… x N x D).

  • x2 (torch.Tensor) – Second set of data (… x M x D).

  • diag (bool) – Should the Kernel compute the whole kernel, or just the diag? If True, it must be the case that x1 == x2. (Default: False.)

  • last_dim_is_batch (bool) – If True, treat the last dimension of x1 and x2 as another batch dimension. (Useful for additive structure over the dimensions). (Default: False.)

Return type:

torch.Tensor or LinearOperator

Returns:

The kernel matrix or vector. The shape depends on the kernel’s evaluation mode:

  • full_covar: … x N x M

  • full_covar with last_dim_is_batch=True: … x K x N x M

  • diag: … x N

  • diag with last_dim_is_batch=True: … x K x N
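As a sketch of what a custom forward() implementation might look like (a hypothetical stationary kernel, shown only to illustrate the mechanics):

>>> class SincKernel(gpytorch.kernels.Kernel):
...     is_stationary = True
...
...     def forward(self, x1, x2, diag=False, **params):
...         # covar_dist (inherited from Kernel) computes pairwise or diagonal distances
...         dist = self.covar_dist(x1, x2, diag=diag, **params)
...         return torch.sinc(dist)
...
>>> covar_module = SincKernel()
>>> covar = covar_module(torch.randn(10, 1))  # lazily evaluates to a 10 x 10 matrix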

named_sub_kernels()[source]

For compositional Kernel classes (e.g. AdditiveKernel or ProductKernel).

Return type:

iterable((str, Kernel))

Returns:

An iterator over the component kernel objects, along with the name of each component kernel.

num_outputs_per_input(x1, x2)[source]

For most kernels, num_outputs_per_input = 1.

However, some kernels (e.g. multitask kernels or interdomain kernels) return a num_outputs_per_input x num_outputs_per_input matrix of covariance values for every pair of data points.

I.e. if x1 is size … x N x D and x2 is size … x M x D, then the size of the kernel will be … x (N * num_outputs_per_input) x (M * num_outputs_per_input).

Return type:

int

Returns:

num_outputs_per_input (usually 1).

sub_kernels()[source]

For compositional Kernel classes (e.g. AdditiveKernel or ProductKernel).

Return type:

iterable(Kernel)

Returns:

An iterator over the component kernel objects.

Standard Kernels

CosineKernel

class gpytorch.kernels.CosineKernel(period_length_prior=None, period_length_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the cosine kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{Cosine}}(\mathbf{x_1}, \mathbf{x_2}) = \cos \left( \pi \Vert \mathbf{x_1} - \mathbf{x_2} \Vert_2 / p \right) \end{equation*}\]

where \(p\) is the period length parameter.

Parameters:
  • batch_shape (torch.Size, optional) – Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([])

  • active_dims (tuple of ints, optional) – Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions. Default: None.

  • period_length_prior (Prior, optional) – Set this if you want to apply a prior to the period length parameter. Default: None

  • period_length_constraint (Constraint, optional) – Set this if you want to apply a constraint to the period length parameter. Default: Positive.

  • eps (float) – The minimum value that the lengthscale/period length can take (prevents divide by zero errors). Default: 1e-6.

period_length

The period length parameter. Size = *batch_shape x 1 x 1.

Type:

Tensor

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel())
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 10 x 10)

CylindricalKernel

class gpytorch.kernels.CylindricalKernel(num_angular_weights, radial_base_kernel, eps=1e-06, angular_weights_prior=None, angular_weights_constraint=None, alpha_prior=None, alpha_constraint=None, beta_prior=None, beta_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the Cylindrical Kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\). It was proposed in BOCK: Bayesian Optimization with Cylindrical Kernels. See http://proceedings.mlr.press/v80/oh18a.html for more details.

Note

The data must lie completely within the unit ball.

Parameters:
  • num_angular_weights (int) – The number of components in the angular kernel

  • radial_base_kernel (gpytorch.kernel) – The base kernel for computing the radial kernel

  • batch_size (int, optional) – Set this if the data is a batch of input data. It should be b if x1 is a b x n x d tensor. Default: 1

  • eps (float) – Small floating point number used to improve numerical stability in kernel computations. Default: 1e-6

  • param_transform (function, optional) – Set this if you want to use something other than softplus to ensure positiveness of parameters.

  • inv_param_transform (function, optional) – Set this to allow setting parameters directly in transformed space and sampling from priors. Automatically inferred for common transformations such as torch.exp or torch.nn.functional.softplus.

  • angular_weights_prior (Prior, optional) –

  • angular_weights_constraint (Interval, optional) –

  • alpha_prior (Prior, optional) –

  • alpha_constraint (Interval, optional) –

  • beta_prior (Prior, optional) –

  • beta_constraint (Interval, optional) –
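Example

A rough usage sketch (the base kernel and data here are illustrative; remember that the inputs must lie inside the unit ball):

>>> x = torch.rand(10, 3) * 0.5  # all points lie inside the unit ball
>>> radial_base_kernel = gpytorch.kernels.MaternKernel(nu=2.5)
>>> covar_module = gpytorch.kernels.CylindricalKernel(num_angular_weights=4, radial_base_kernel=radial_base_kernel)
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)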

LinearKernel

class gpytorch.kernels.LinearKernel(num_dimensions=None, offset_prior=None, variance_prior=None, variance_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the Linear kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_\text{Linear}(\mathbf{x_1}, \mathbf{x_2}) = v\mathbf{x_1}^\top \mathbf{x_2}. \end{equation*}\]

where

  • \(v\) is a variance parameter.

Note

To implement this efficiently, we use a RootLinearOperator during training and a MatmulLinearOperator during test. These lazy tensors represent matrices of the form \(\mathbf K = \mathbf X \mathbf X^{\prime \top}\). This makes inference efficient because a matrix-vector product \(\mathbf K \mathbf v\) can be computed as \(\mathbf K \mathbf v = \mathbf X( \mathbf X^{\prime \top} \mathbf v)\), where the base multiply \(\mathbf X \mathbf v\) takes only \(\mathcal O(ND)\) time and space.

Parameters:
  • variance_prior (Prior, optional) – Prior over the variance parameter. (Default None.)

  • variance_constraint (Interval, optional) – Constraint to place on variance parameter. (Default: Positive.)

  • active_dims – List of data dimensions to operate on. len(active_dims) should equal num_dimensions.

  • num_dimensions (int, optional) –

  • offset_prior (Prior, optional) –
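
Example

A minimal usage sketch (mirroring the other kernels’ examples; the ScaleKernel wrapper is optional):

>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.LinearKernel())
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)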

MaternKernel

class gpytorch.kernels.MaternKernel(nu=2.5, **kwargs)[source]

Computes a covariance matrix based on the Matern kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{Matern}}(\mathbf{x_1}, \mathbf{x_2}) = \frac{2^{1 - \nu}}{\Gamma(\nu)} \left( \sqrt{2 \nu} d \right)^{\nu} K_\nu \left( \sqrt{2 \nu} d \right) \end{equation*}\]

where

  • \(d = \sqrt{(\mathbf{x_1} - \mathbf{x_2})^\top \Theta^{-2} (\mathbf{x_1} - \mathbf{x_2})}\) is the distance between \(\mathbf{x_1}\) and \(\mathbf{x_2}\) scaled by the lengthscale parameter \(\Theta\).

  • \(\nu\) is a smoothness parameter (takes values 1/2, 3/2, or 5/2). Smaller values correspond to less smooth functions.

  • \(K_\nu\) is a modified Bessel function.

There are a few options for the lengthscale parameter \(\Theta\): See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
  • nu (float (0.5, 1.5, or 2.5)) – (Default: 2.5) The smoothness parameter.

  • ard_num_dims (int, optional) – (Default: None) Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a … x n x d matrix.

  • batch_shape (torch.Size, optional) – (Default: None) Set this if you want a separate lengthscale for each batch of input data. It should be torch.Size([b1, b2]) for a b1 x b2 x n x m kernel output.

  • active_dims (Tuple(int)) – (Default: None) Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions.

  • lengthscale_prior (Prior, optional) – (Default: None) Set this if you want to apply a prior to the lengthscale parameter.

  • lengthscale_constraint (Interval, optional) – (Default: Positive) Set this if you want to apply a constraint to the lengthscale parameter.

  • eps (float, optional) – (Default: 1e-6) The minimum value that the lengthscale can take (prevents divide by zero errors).

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5))
>>> # Non-batch: ARD (different lengthscale for each input dimension)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5, ard_num_dims=5))
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5))
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.MaternKernel(nu=0.5, batch_shape=torch.Size([2]))
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 10 x 10)

PeriodicKernel

class gpytorch.kernels.PeriodicKernel(period_length_prior=None, period_length_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the periodic kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{Periodic}}(\mathbf{x}, \mathbf{x'}) = \exp \left( -2 \sum_i \frac{\sin ^2 \left( \frac{\pi}{p} ({x_{i}} - {x_{i}'} ) \right)}{\lambda} \right) \end{equation*}\]

where

  • \(p\) is the period length parameter.

  • \(\lambda\) is a lengthscale parameter.

The equation is based on David Mackay’s Introduction to Gaussian Processes, equation 47 (albeit without feature-specific lengthscales and period lengths). The exponential coefficient was changed, and the lengthscale is not squared, to maintain backwards compatibility.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
  • ard_num_dims (int, optional) – (Default: None) Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a … x n x d matrix.

  • batch_shape (torch.Size, optional) – (Default: None) Set this if you want a separate lengthscale for each batch of input data. It should be torch.Size([b1, b2]) for a b1 x b2 x n x m kernel output.

  • active_dims (Tuple(int)) – (Default: None) Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions.

  • period_length_prior (Prior, optional) – (Default: None) Set this if you want to apply a prior to the period length parameter.

  • period_length_constraint (Interval, optional) – (Default: Positive) Set this if you want to apply a constraint to the period length parameter.

  • lengthscale_prior (Prior, optional) – (Default: None) Set this if you want to apply a prior to the lengthscale parameter.

  • lengthscale_constraint (Interval, optional) – (Default: Positive) Set this if you want to apply a constraint to the lengthscale parameter.

  • eps (float, optional) – (Default: 1e-6) The minimum value that the lengthscale can take (prevents divide by zero errors).

Variables:

period_length (torch.Tensor) – The period length parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel())
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 10 x 10)

PiecewisePolynomialKernel

class gpytorch.kernels.PiecewisePolynomialKernel(q=2, **kwargs)[source]

Computes a covariance matrix based on the Piecewise Polynomial kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{split}\begin{align} r &= \left\Vert \mathbf{x_1} - \mathbf{x_2} \right\Vert \\ j &= \lfloor \frac{D}{2} \rfloor + q + 1 \\ K_{\text{ppD, 0}}(\mathbf{x_1}, \mathbf{x_2}) &= (1-r)^j_+ , \\ K_{\text{ppD, 1}}(\mathbf{x_1}, \mathbf{x_2}) &= (1-r)^{j+1}_+ ((j + 1)r + 1), \\ K_{\text{ppD, 2}}(\mathbf{x_1}, \mathbf{x_2}) &= (1-r)^{j+2}_+ \left(1 + (j+2)r + \frac{j^2 + 4j + 3}{3}r^2\right), \\ K_{\text{ppD, 3}}(\mathbf{x_1}, \mathbf{x_2}) &= (1-r)^{j+3}_+ \left(1 + (j+3)r + \frac{6j^2 + 36j + 45}{15}r^2 + \frac{j^3 + 9j^2 + 23j + 15}{15}r^3\right) \\ \end{align}\end{split}\]

where \(K_{\text{ppD, q}}\) is positive semidefinite in \(\mathbb{R}^{D}\) and \(q\) is the smoothness coefficient. See Rasmussen and Williams (2006) Equation 4.21.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
  • q (int (0, 1, 2 or 3)) – (default= 2) The smoothness parameter.

  • ard_num_dims (int, optional) – (Default: None) Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a … x n x d matrix.

  • batch_shape (torch.Size, optional) – (Default: None) Set this if you want a separate lengthscale for each batch of input data. It should be torch.Size([b1, b2]) for a b1 x b2 x n x m kernel output.

  • active_dims (Tuple(int)) – (Default: None) Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions.

  • lengthscale_prior (Prior, optional) – (Default: None) Set this if you want to apply a prior to the lengthscale parameter.

  • lengthscale_constraint (Positive, optional) – (Default: Positive) Set this if you want to apply a constraint to the lengthscale parameter.

  • eps (float, optional) – (Default: 1e-6) The minimum value that the lengthscale can take (prevents divide by zero errors).

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch option
>>> covar_module = gpytorch.kernels.ScaleKernel(
                        gpytorch.kernels.PiecewisePolynomialKernel(q = 2))
>>> # Non-batch: ARD (different lengthscale for each input dimension)
>>> covar_module = gpytorch.kernels.ScaleKernel(
                    gpytorch.kernels.PiecewisePolynomialKernel(q = 2, ard_num_dims=5)
                    )
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(
    gpytorch.kernels.PiecewisePolynomialKernel(q = 2, batch_shape=torch.Size([2]))
    )
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 10 x 10)

PolynomialKernel

class gpytorch.kernels.PolynomialKernel(power, offset_prior=None, offset_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the Polynomial kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_\text{Poly}(\mathbf{x_1}, \mathbf{x_2}) = (\mathbf{x_1}^\top \mathbf{x_2} + c)^{d}. \end{equation*}\]

where

  • \(c\) is an offset parameter.

Parameters:
  • offset_prior (gpytorch.priors.Prior) – Prior over the offset parameter (default None).

  • offset_constraint (Constraint, optional) – Constraint to place on offset parameter. Default: Positive.

  • active_dims (list) – List of data dimensions to operate on. len(active_dims) should equal num_dimensions.

  • power (int) –
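
Example

A brief usage sketch (power=2 gives a quadratic kernel; the ScaleKernel wrapper is optional):

>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PolynomialKernel(power=2))
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)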

PolynomialKernelGrad

class gpytorch.kernels.PolynomialKernelGrad(power, offset_prior=None, offset_constraint=None, **kwargs)[source]
Parameters:
  • power (int) –

  • offset_prior (Prior, optional) –

  • offset_constraint (Interval, optional) –

RBFKernel

class gpytorch.kernels.RBFKernel(ard_num_dims=None, batch_shape=None, active_dims=None, lengthscale_prior=None, lengthscale_constraint=None, eps=1e-06, **kwargs)[source]

Computes a covariance matrix based on the RBF (squared exponential) kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{RBF}}(\mathbf{x_1}, \mathbf{x_2}) = \exp \left( -\frac{1}{2} (\mathbf{x_1} - \mathbf{x_2})^\top \Theta^{-2} (\mathbf{x_1} - \mathbf{x_2}) \right) \end{equation*}\]

where \(\Theta\) is a lengthscale parameter. See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
  • ard_num_dims (int, optional) – Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. (Default: None.)

  • batch_shape (torch.Size, optional) – Set this if you want a separate lengthscale for each batch of input data. It should be \(B_1 \times \ldots \times B_k\) if \(\mathbf x1\) is a \(B_1 \times \ldots \times B_k \times N \times D\) tensor.

  • active_dims ((int, ...), optional) – Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions. (Default: None.)

  • lengthscale_prior (Prior, optional) – Set this if you want to apply a prior to the lengthscale parameter. (Default: None)

  • lengthscale_constraint (Interval, optional) – Set this if you want to apply a constraint to the lengthscale parameter. (Default: Positive.)

  • eps (float) – The minimum value that the lengthscale can take (prevents divide by zero errors). (Default: 1e-6.)

Variables:

lengthscale (torch.Tensor) – The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
>>> # Non-batch: ARD (different lengthscale for each input dimension)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(ard_num_dims=5))
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 10 x 10)

RQKernel

class gpytorch.kernels.RQKernel(alpha_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the rational quadratic kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{RQ}}(\mathbf{x_1}, \mathbf{x_2}) = \left(1 + \frac{1}{2\alpha} (\mathbf{x_1} - \mathbf{x_2})^\top \Theta^{-2} (\mathbf{x_1} - \mathbf{x_2}) \right)^{-\alpha} \end{equation*}\]

where \(\Theta\) is a lengthscale parameter, and \(\alpha\) is the rational quadratic relative weighting parameter. See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
  • ard_num_dims – Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. (Default: None.)

  • batch_shape – Set this if you want a separate lengthscale for each batch of input data. It should be \(B_1 \times \ldots \times B_k\) if \(\mathbf x1\) is a \(B_1 \times \ldots \times B_k \times N \times D\) tensor.

  • active_dims – Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions. (Default: None.)

  • lengthscale_prior – Set this if you want to apply a prior to the lengthscale parameter. (Default: None)

  • lengthscale_constraint – Set this if you want to apply a constraint to the lengthscale parameter. (Default: Positive.)

  • alpha_constraint (Interval, optional) – Set this if you want to apply a constraint to the alpha parameter. (Default: Positive.)

  • eps – The minimum value that the lengthscale can take (prevents divide by zero errors). (Default: 1e-6.)

Variables:
  • lengthscale (torch.Tensor) – The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.

  • alpha (torch.Tensor) – The rational quadratic relative weighting parameter. Size/shape of parameter depends on the batch_shape argument

SpectralDeltaKernel

class gpytorch.kernels.SpectralDeltaKernel(num_dims, num_deltas=128, Z_constraint=None, batch_shape=torch.Size([]), **kwargs)[source]

A kernel that supports spectral learning for GPs, where the underlying spectral density is modeled as a mixture of delta distributions (i.e., point masses). This approach has been explored, e.g., in Lazaro-Gredilla et al., 2010.

Conceptually, this kernel is similar to random Fourier features as implemented in RFFKernel, but instead of sampling a Gaussian to determine the spectrum sites, they are treated as learnable parameters.

When using CG for inference, this kernel supports linear space and time (in N) for training and inference.

Parameters:
  • num_dims (int) – Dimensionality of input data that this kernel will operate on. Note that if active_dims is used, this should be the length of the active dim set.

  • num_deltas (int, optional) – Number of point masses to learn.


  • Z_constraint (Interval, optional) –

  • batch_shape (torch.Size, optional) –

initialize_from_data(train_x, train_y)[source]

Initialize the point masses for this kernel from the empirical spectrum of the data. To do this, we estimate the empirical spectrum’s CDF and then simply sample from it. This is analogous to how the SM kernel’s mixture is initialized, but we skip the last step of fitting a GMM to the samples and just use the samples directly.
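Example

A rough usage sketch (the initialization step is optional but typically gives a better starting point for the spectral point masses):

>>> train_x = torch.randn(10, 5)
>>> train_y = torch.randn(10)
>>> covar_module = gpytorch.kernels.SpectralDeltaKernel(num_dims=5)
>>> covar_module.initialize_from_data(train_x, train_y)
>>> covar = covar_module(train_x)  # Output: LinearOperator of size (10 x 10)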

SpectralMixtureKernel

class gpytorch.kernels.SpectralMixtureKernel(num_mixtures=None, ard_num_dims=1, batch_shape=torch.Size([]), mixture_scales_prior=None, mixture_scales_constraint=None, mixture_means_prior=None, mixture_means_constraint=None, mixture_weights_prior=None, mixture_weights_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the Spectral Mixture Kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

It was proposed in Gaussian Process Kernels for Pattern Discovery and Extrapolation.

Note

Unlike other kernels, the SpectralMixtureKernel requires that ard_num_dims match the dimensionality of the input data, and it should not be combined with a gpytorch.kernels.ScaleKernel (the mixture weights already act as an output scale).

Parameters:
  • num_mixtures (int) – The number of components in the mixture.

  • ard_num_dims (int) – Set this to match the dimensionality of the input. It should be d if x1 is a … x n x d matrix. (Default: 1.)

  • batch_shape (torch.Size, optional) – Set this if the data is batch of input data. It should be b_1 x … x b_j if x1 is a b_1 x … x b_j x n x d tensor. (Default: torch.Size([]).)

  • active_dims (tuple of ints, optional) – Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions. (Default: None.)

  • eps (float, optional) – The minimum value that the lengthscale can take (prevents divide by zero errors). (Default: 1e-6.)

  • mixture_scales_prior (Prior, optional) – A prior to set on the mixture_scales parameter

  • mixture_scales_constraint (Interval, optional) – A constraint to set on the mixture_scales parameter

  • mixture_means_prior (Prior, optional) – A prior to set on the mixture_means parameter

  • mixture_means_constraint (Interval, optional) – A constraint to set on the mixture_means parameter

  • mixture_weights_prior (Prior, optional) – A prior to set on the mixture_weights parameter

  • mixture_weights_constraint (Interval, optional) – A constraint to set on the mixture_weights parameter

Variables:
  • mixture_scales (torch.Tensor) – The lengthscale parameter. Given k mixture components, and … x n x d data, this will be of size … x k x 1 x d.

  • mixture_means (torch.Tensor) – The mixture mean parameters (… x k x 1 x d).

  • mixture_weights (torch.Tensor) – The mixture weight parameters (… x k).

Example

>>> # Non-batch
>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4, ard_num_dims=5)
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)
>>>
>>> # Batch
>>> batch_x = torch.randn(2, 10, 5)
>>> covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4, batch_shape=torch.Size([2]), ard_num_dims=5)
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 10 x 10)

initialize_from_data(train_x, train_y, **kwargs)[source]

Initialize mixture components based on batch statistics of the data. You should use this initialization routine if your observations are not evenly spaced.

initialize_from_data_empspect(train_x, train_y)[source]

Initialize mixture components based on the empirical spectrum of the data. This will often be better than the standard initialize_from_data method, but it assumes that your inputs are evenly spaced.


Composition/Decoration Kernels

AdditiveKernel

class gpytorch.kernels.AdditiveKernel(*kernels)[source]

A Kernel that supports summing over multiple component kernels.

Example

>>> covar_module = RBFKernel(active_dims=torch.tensor([0])) + RBFKernel(active_dims=torch.tensor([1]))
>>> x1 = torch.randn(50, 2)
>>> additive_kernel_matrix = covar_module(x1)
Parameters:

kernels (iterable(Kernel)) – Kernels to add together.

MultiDeviceKernel

class gpytorch.kernels.MultiDeviceKernel(base_kernel, device_ids, output_device=None, create_cuda_context=True, **kwargs)[source]

Allocates the covariance matrix on distributed devices, e.g. multiple GPUs.

Parameters:
  • base_kernel (Kernel) – Base kernel to distribute

  • device_ids (list(torch.device)) – list of torch.device objects to place kernel chunks on

  • output_device (torch.device, optional) – Device where outputs will be placed

  • create_cuda_context (bool, optional) –
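
Example

A rough sketch, assuming two CUDA devices are available:

>>> base_covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
>>> covar_module = gpytorch.kernels.MultiDeviceKernel(
...     base_covar_module,
...     device_ids=[torch.device('cuda:0'), torch.device('cuda:1')],
...     output_device=torch.device('cuda:0'),
... )
>>> x = torch.randn(10000, 5).to('cuda:0')
>>> covar = covar_module(x)  # covariance chunks are distributed across the two GPUs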

AdditiveStructureKernel

class gpytorch.kernels.AdditiveStructureKernel(base_kernel, num_dims, active_dims=None)[source]

A Kernel decorator for kernels with additive structure. If a kernel decomposes additively, then this module will be much more computationally efficient.

A kernel function k decomposes additively if it can be written as

\[\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = k'(x_1^{(1)}, x_2^{(1)}) + \ldots + k'(x_1^{(d)}, x_2^{(d)}) \end{equation*}\]

for some kernel \(k'\) that operates on a subset of dimensions.

Given a b x n x d input, AdditiveStructureKernel computes d one-dimensional kernels (using the supplied base_kernel), and then adds the component kernels together. Unlike AdditiveKernel, AdditiveStructureKernel computes each of the additive terms in batch, making it very fast.

Parameters:
  • base_kernel (Kernel) – The kernel to use for each one-dimensional additive component.

  • num_dims (int) – The dimension of the input data.

  • active_dims (tuple of ints, optional) – Passed down to the base_kernel.

property is_stationary: bool

Kernel is stationary if the base kernel is stationary.
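Example

A minimal sketch (the base kernel is applied separately to each of the 5 dimensions and the results are summed):

>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.AdditiveStructureKernel(gpytorch.kernels.RBFKernel(), num_dims=5)
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)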

ProductKernel

class gpytorch.kernels.ProductKernel(*kernels)[source]

A Kernel that supports elementwise multiplying multiple component kernels together.

Example

>>> covar_module = RBFKernel(active_dims=torch.tensor([0])) * RBFKernel(active_dims=torch.tensor([1]))
>>> x1 = torch.randn(50, 2)
>>> kernel_matrix = covar_module(x1) # The RBF Kernel already decomposes multiplicatively, so this is foolish!
Parameters:

kernels (iterable(Kernel)) – Kernels to multiply together.

ProductStructureKernel

class gpytorch.kernels.ProductStructureKernel(base_kernel, num_dims, active_dims=None)[source]

A Kernel decorator for kernels with product structure. If a kernel decomposes multiplicatively, then this module will be much more computationally efficient.

A kernel function k has product structure if it can be written as

\[\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = k'(x_1^{(1)}, x_2^{(1)}) * \ldots * k'(x_1^{(d)}, x_2^{(d)}) \end{equation*}\]

for some kernel \(k'\) that operates on each dimension.

Given a b x n x d input, ProductStructureKernel computes d one-dimensional kernels (using the supplied base_kernel), and then multiplies the component kernels together. Unlike ProductKernel, ProductStructureKernel computes each of the product terms in batch, making it very fast.

See Product Kernel Interpolation for Scalable Gaussian Processes for more detail.

Parameters:
  • base_kernel (Kernel) – The kernel to use for each one-dimensional component of the product.

  • num_dims (int) – The dimension of the input data.

  • active_dims (tuple of ints, optional) – Passed down to the base_kernel.

property is_stationary: bool

Kernel is stationary if the base kernel is stationary.
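Example

A minimal sketch (the base kernel is applied separately to each of the 5 dimensions and the results are multiplied):

>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.ProductStructureKernel(gpytorch.kernels.RBFKernel(), num_dims=5)
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)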

ScaleKernel

class gpytorch.kernels.ScaleKernel(base_kernel, outputscale_prior=None, outputscale_constraint=None, **kwargs)[source]

Decorates an existing kernel object with an output scale, i.e.

\[\begin{equation*} K_{\text{scaled}} = \theta_\text{scale} K_{\text{orig}} \end{equation*}\]

where \(\theta_\text{scale}\) is the outputscale parameter.

In batch-mode (i.e. when \(x_1\) and \(x_2\) are batches of input matrices), each batch of data can have its own outputscale parameter by setting the batch_shape keyword argument to the appropriate number of batches.

Note

The outputscale parameter is parameterized on a log scale to constrain it to be positive. You can set a prior on this parameter using the outputscale_prior argument.

Parameters:
  • base_kernel (Kernel) – The base kernel to be scaled.

  • batch_shape (torch.Size, optional) – Set this if you want a separate outputscale for each batch of input data. It should be torch.Size([b]) if x1 is a b x n x d tensor. Default: torch.Size([])

  • outputscale_prior (Prior, optional) – Set this if you want to apply a prior to the outputscale parameter. Default: None

  • outputscale_constraint (Constraint, optional) – Set this if you want to apply a constraint to the outputscale parameter. Default: Positive.

base_kernel

The kernel module to be scaled.

Type:

Kernel

outputscale

The outputscale parameter. Size/shape of parameter depends on the batch_shape arguments.

Type:

Tensor

Example

>>> x = torch.randn(10, 5)
>>> base_covar_module = gpytorch.kernels.RBFKernel()
>>> scaled_covar_module = gpytorch.kernels.ScaleKernel(base_covar_module)
>>> covar = scaled_covar_module(x)  # Output: LinearOperator of size (10 x 10)
property is_stationary: bool

Kernel is stationary if base kernel is stationary.

Specialty Kernels

ArcKernel

class gpytorch.kernels.ArcKernel(base_kernel, delta_func=None, angle_prior=None, radius_prior=None, **kwargs)[source]

Computes a covariance matrix based on the Arc Kernel (https://arxiv.org/abs/1409.4011) between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\). First it applies a cylindrical embedding:

\[\begin{split}g_{i}(\mathbf{x}) = \begin{cases} [0, 0]^{T} & \delta_{i}(\mathbf{x}) = \text{false}\\ \omega_{i} \left[ \sin{\pi\rho_{i}\frac{x_{i}}{u_{i}-l_{i}}}, \cos{\pi\rho_{i}\frac{x_{i}}{u_{i}-l_{i}}} \right] & \text{otherwise} \end{cases}\end{split}\]

where

  • \(\rho\) is the angle parameter.

  • \(\omega\) is a radius parameter.

then the kernel is built with the particular covariance function, e.g.

\[\begin{equation} k_{i}(\mathbf{x}, \mathbf{x'}) = \sigma^{2}\exp \left(-\frac{1}{2}d_{i}(\mathbf{x}, \mathbf{x^{'}}) \right)^{2} \end{equation}\]

and the product is taken over dimensions

\[\begin{equation} k(\mathbf{x}, \mathbf{x'}) = \prod_{i} k_{i}(\mathbf{x}, \mathbf{x'}) \end{equation}\]

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel. When using an input of b x n x d dimensions, decorate this kernel with gpytorch.kernels.ProductStructureKernel, setting num_dims to d.

Note

This kernel does not have an ARD lengthscale option.

Parameters:
  • base_kernel (Kernel) – (Default gpytorch.kernels.MaternKernel(nu=2.5).) The euclidean covariance of choice.

  • ard_num_dims (int, optional) – (Default: None.) The number of input dimensions. The kernel’s two parameters (angle and radius) are defined separately for each dimension.

  • angle_prior (Prior, optional) – Set this if you want to apply a prior to the period angle parameter.

  • radius_prior (Prior, optional) – Set this if you want to apply a prior to the radius parameter.

Variables:
  • radius (torch.Tensor) – The radius parameter. Size = *batch_shape x 1.

  • angle (torch.Tensor) – The period angle parameter. Size = *batch_shape x 1.

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> base_kernel = gpytorch.kernels.MaternKernel(nu=2.5)
>>> base_kernel.raw_lengthscale.requires_grad_(False)
>>> covar_module = gpytorch.kernels.ProductStructureKernel(
        gpytorch.kernels.ScaleKernel(
            ArcKernel(base_kernel,
                      angle_prior=gpytorch.priors.GammaPrior(0.5,1),
                      radius_prior=gpytorch.priors.GammaPrior(3,2),
                      ard_num_dims=x.shape[-1])),
        num_dims=x.shape[-1])
>>> covar = covar_module(x)
>>> print(covar.shape)
>>> # Now with batch
>>> covar_module = gpytorch.kernels.ProductStructureKernel(
        gpytorch.kernels.ScaleKernel(
            ArcKernel(base_kernel,
                      angle_prior=gpytorch.priors.GammaPrior(0.5,1),
                      radius_prior=gpytorch.priors.GammaPrior(3,2),
                      ard_num_dims=x.shape[-1])),
        num_dims=x.shape[-1])
>>> covar = covar_module(x)
>>> print(covar.shape)
Parameters:

delta_func (callable, optional) –

HammingIMQKernel

class gpytorch.kernels.HammingIMQKernel

Computes a covariance matrix based on an inverse multiquadric (IMQ) function of the Hamming distance between discrete (one-hot encoded) inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

IndexKernel

class gpytorch.kernels.IndexKernel(num_tasks, rank=1, prior=None, var_constraint=None, **kwargs)[source]

A kernel for discrete indices. The kernel is defined by a lookup table.

\[\begin{equation} k(i, j) = \left(BB^\top + \text{diag}(\mathbf v) \right)_{i, j} \end{equation}\]

where \(B\) is a low-rank matrix, and \(\mathbf v\) is a non-negative vector. These parameters are learned.

Parameters:
  • num_tasks (int) – Total number of indices.

  • batch_shape (torch.Size, optional) – Set if the MultitaskKernel is operating on batches of data (and you want different parameters for each batch)

  • rank (int) – Rank of \(B\) matrix. Controls the degree of correlation between the outputs. With a rank of 1 the outputs are identical except for a scaling factor.

  • prior (gpytorch.priors.Prior) – Prior for \(B\) matrix.

  • var_constraint (Constraint, optional) – Constraint for added diagonal component. Default: Positive.

covar_factor

The \(B\) matrix.

raw_var

The element-wise log of the \(\mathbf v\) vector.
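Example

A rough usage sketch: the kernel is called on integer task indices rather than continuous features (as in Hadamard-style multitask models):

>>> task_covar_module = gpytorch.kernels.IndexKernel(num_tasks=3, rank=1)
>>> i = torch.randint(0, 3, (10, 1))   # task index for each of the 10 data points
>>> task_covar = task_covar_module(i)  # Output: LinearOperator of size (10 x 10)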

LCMKernel

class gpytorch.kernels.LCMKernel(base_kernels, num_tasks, rank=1, task_covar_prior=None)[source]

This kernel supports the LCM kernel. It allows the user to specify a list of base kernels to use, and individual MultitaskKernel objects are fit to each of them. The final kernel is the linear sum of the Kronecker product of all these base kernels with their respective MultitaskKernel objects.

The returned object is of type KroneckerProductLinearOperator.

Parameters:
  • base_kernels (list(Kernel)) –

  • num_tasks (int) –

  • rank (int or list(T), optional) –

  • task_covar_prior (Prior, optional) –

num_outputs_per_input(x1, x2)[source]

Given n data points x1 and m datapoints x2, this multitask kernel returns an (n*num_tasks) x (m*num_tasks) covariance matrix.
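Example

A brief sketch (two base kernels, two tasks; the output size follows num_outputs_per_input above):

>>> base_kernels = [gpytorch.kernels.RBFKernel(), gpytorch.kernels.MaternKernel(nu=2.5)]
>>> covar_module = gpytorch.kernels.LCMKernel(base_kernels, num_tasks=2, rank=1)
>>> x = torch.randn(10, 3)
>>> covar = covar_module(x)  # Output: size (20 x 20), i.e. (n * num_tasks) x (n * num_tasks)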

MultitaskKernel

class gpytorch.kernels.MultitaskKernel(data_covar_module, num_tasks, rank=1, task_covar_prior=None, **kwargs)[source]

Kernel supporting Kronecker style multitask Gaussian processes (where every data point is evaluated at every task) using gpytorch.kernels.IndexKernel as a basic multitask kernel.

Given a base covariance module to be used for the data, \(K_{XX}\), this kernel computes a task kernel of specified size \(K_{TT}\) and returns \(K = K_{TT} \otimes K_{XX}\) as a KroneckerProductLinearOperator.

Parameters:
  • data_covar_module (Kernel) – Kernel to use as the data kernel.

  • num_tasks (int) – Number of tasks

  • rank (int, optional) – (default 1) Rank of index kernel to use for task covariance matrix.

  • task_covar_prior (Prior, optional) – (default None) Prior to use for task kernel. See gpytorch.kernels.IndexKernel for details.

  • kwargs (dict) – Additional arguments to pass to the kernel.


num_outputs_per_input(x1, x2)[source]

Given n data points x1 and m datapoints x2, this multitask kernel returns an (n*num_tasks) x (m*num_tasks) covariance matrix.
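Example

A brief sketch (an RBF data kernel with two tasks):

>>> data_covar_module = gpytorch.kernels.RBFKernel()
>>> covar_module = gpytorch.kernels.MultitaskKernel(data_covar_module, num_tasks=2, rank=1)
>>> x = torch.randn(10, 3)
>>> covar = covar_module(x)  # Output: KroneckerProductLinearOperator of size (20 x 20)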

RBFKernelGrad

class gpytorch.kernels.RBFKernelGrad(ard_num_dims=None, batch_shape=None, active_dims=None, lengthscale_prior=None, lengthscale_constraint=None, eps=1e-06, **kwargs)[source]

Computes a covariance matrix of the RBF kernel that models the covariance between the values and partial derivatives for inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
  • ard_num_dims (int, optional) – Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. (Default: None.)

  • batch_shape (torch.Size, optional) – Set this if you want a separate lengthscale for each batch of input data. It should be \(B_1 \times \ldots \times B_k\) if \(\mathbf x1\) is a \(B_1 \times \ldots \times B_k \times N \times D\) tensor.

  • active_dims ((int, ...), optional) – Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions. (Default: None.)

  • lengthscale_prior (Prior, optional) – Set this if you want to apply a prior to the lengthscale parameter. (Default: None)

  • lengthscale_constraint (Interval, optional) – Set this if you want to apply a constraint to the lengthscale parameter. (Default: Positive.)

  • eps (float) – The minimum value that the lengthscale can take (prevents divide by zero errors). (Default: 1e-6.)

Variables:

lengthscale (torch.Tensor) – The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGrad())
>>> covar = covar_module(x)  # Output: LinearOperator of size (60 x 60), where 60 = n * (d + 1)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGrad())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGrad(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 60 x 60)

RBFKernelGradGrad

class gpytorch.kernels.RBFKernelGradGrad(ard_num_dims=None, batch_shape=None, active_dims=None, lengthscale_prior=None, lengthscale_constraint=None, eps=1e-06, **kwargs)[source]

Computes a covariance matrix of the RBF kernel that models the covariance between the values and first and second (non-mixed) partial derivatives for inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
  • ard_num_dims (int, optional) – Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. (Default: None.)

  • batch_shape (torch.Size, optional) – Set this if you want a separate lengthscale for each batch of input data. It should be \(B_1 \times \ldots \times B_k\) if \(\mathbf x1\) is a \(B_1 \times \ldots \times B_k \times N \times D\) tensor.

  • active_dims ((int, ...), optional) – Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions. (Default: None.)

  • lengthscale_prior (Prior, optional) – Set this if you want to apply a prior to the lengthscale parameter. (Default: None)

  • lengthscale_constraint (Interval, optional) – Set this if you want to apply a constraint to the lengthscale parameter. (Default: Positive.)

  • eps (float) – The minimum value that the lengthscale can take (prevents divide by zero errors). (Default: 1e-6.)

Variables:

lengthscale (torch.Tensor) – The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGradGrad())
>>> covar = covar_module(x)  # Output: LinearOperator of size (110 x 110), where 110 = n * (2*d + 1)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGradGrad())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGradGrad(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 110 x 110)

Kernels for Scalable GP Regression Methods

GridKernel

class gpytorch.kernels.GridKernel(base_kernel, grid, interpolation_mode=False, active_dims=None)[source]

If the input data \(X\) are regularly spaced on a grid, then GridKernel can dramatically speed up computations for stationary kernels.

GridKernel exploits Toeplitz and Kronecker structure within the covariance matrix. See Fast kernel learning for multidimensional pattern extrapolation for more info.

Note

GridKernel can only wrap stationary kernels (such as RBF, Matern, Periodic, Spectral Mixture, etc.)

Parameters:
  • base_kernel (Kernel) – The kernel to speed up with grid methods.

  • grid (Tensor) – A g x d tensor where column i consists of the projections of the grid in dimension i.

  • active_dims (tuple of ints, optional) – Passed down to the base_kernel.

  • interpolation_mode (bool) – Used for GridInterpolationKernel where we want the covariance between points in the projections of the grid of each dimension. We do this by treating grid as d batches of g x 1 tensors by calling base_kernel(grid, grid) with last_dim_is_batch to get a d x g x g Tensor which we Kronecker product to get a g x g KroneckerProductLinearOperator.

register_buffer_list(base_name, tensors)[source]

Helper to register several buffers at once under a single base name

update_grid(grid)[source]

Supply a new grid if it ever changes.
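Example

A rough sketch for 1-dimensional, regularly spaced inputs (here the grid points themselves are used as the data):

>>> grid = torch.linspace(0, 1, 25).unsqueeze(-1)   # a 25 x 1 grid (d = 1)
>>> base_covar_module = gpytorch.kernels.RBFKernel()
>>> covar_module = gpytorch.kernels.GridKernel(base_covar_module, grid=grid)
>>> covar = covar_module(grid)  # Output: a structured LinearOperator of size (25 x 25)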

GridInterpolationKernel

class gpytorch.kernels.GridInterpolationKernel(base_kernel, grid_size, num_dims=None, grid_bounds=None, active_dims=None)[source]

Implements the KISS-GP (or SKI) approximation for a given kernel. It was proposed in Kernel Interpolation for Scalable Structured Gaussian Processes, and offers extremely fast and accurate Kernel approximations for large datasets.

Given a base kernel k, the covariance \(k(\mathbf{x_1}, \mathbf{x_2})\) is approximated by using a grid of regularly spaced inducing points:

\[\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = \mathbf{w_{x_1}}^\top K_{U,U} \mathbf{w_{x_2}} \end{equation*}\]

where

  • \(U\) is the set of gridded inducing points

  • \(K_{U,U}\) is the kernel matrix between the inducing points

  • \(\mathbf{w_{x_1}}\) and \(\mathbf{w_{x_2}}\) are sparse vectors based on \(\mathbf{x_1}\) and \(\mathbf{x_2}\) that apply cubic interpolation.

The user should supply the size of the grid (using the grid_size attribute). To choose a reasonable grid value, we highly recommend using the gpytorch.utils.grid.choose_grid_size() helper function. The bounds of the grid will automatically be determined by data.

(Alternatively, you can hard-code bounds using the grid_bounds argument, which will speed up this kernel’s computations.)

Note

GridInterpolationKernel can only wrap stationary kernels (such as RBF, Matern, Periodic, Spectral Mixture, etc.)

Parameters:
  • base_kernel (Kernel) – The kernel to approximate with KISS-GP

  • grid_size (Union[int, List[int]]) – The size of the grid in each dimension. If a single int is provided, then every dimension will have the same grid size.

  • num_dims (int) – The dimension of the input data. Required if grid_bounds=None

  • grid_bounds (tuple(float, float), optional) – The bounds of the grid, if known (high performance mode). The length of the tuple must match the number of dimensions. The entries represent the min/max values for each dimension.

  • active_dims (tuple of ints, optional) – Passed down to the base_kernel.
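
Example

A rough usage sketch using the recommended grid-size helper (the base kernel and data are illustrative):

>>> train_x = torch.randn(100, 2)
>>> grid_size = gpytorch.utils.grid.choose_grid_size(train_x)
>>> covar_module = gpytorch.kernels.GridInterpolationKernel(
...     gpytorch.kernels.RBFKernel(), grid_size=grid_size, num_dims=2
... )
>>> covar = covar_module(train_x)  # Output: LinearOperator of size (100 x 100)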

InducingPointKernel

class gpytorch.kernels.InducingPointKernel(base_kernel, inducing_points, likelihood, active_dims=None)[source]
Parameters:
  • base_kernel (Kernel) – The kernel to approximate with inducing points (sparse GP regression).

  • inducing_points (torch.Tensor) – Initial locations of the inducing points.

  • likelihood (Likelihood) – The likelihood of the GP model.

  • active_dims (tuple of ints, optional) – Set this if you want to compute the covariance of only a few input dimensions.
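
Example

A rough usage sketch (SGPR-style; the inducing points are typically initialized from a subset of the training inputs and are learned during training):

>>> train_x = torch.randn(100, 3)
>>> likelihood = gpytorch.likelihoods.GaussianLikelihood()
>>> covar_module = gpytorch.kernels.InducingPointKernel(
...     gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()),
...     inducing_points=train_x[:10].clone(),
...     likelihood=likelihood,
... )
>>> covar = covar_module(train_x)  # Output: LinearOperator of size (100 x 100)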

RFFKernel

class gpytorch.kernels.RFFKernel(num_samples, num_dims=None, **kwargs)[source]

Computes a covariance matrix based on Random Fourier Features with the RBFKernel.

Random Fourier features were originally proposed in ‘Random Features for Large-Scale Kernel Machines’ by Rahimi and Recht (2008). Instead of the shifted cosine features from Rahimi and Recht (2008), we use the sine and cosine features, which give a lower-variance estimator — see ‘On the Error of Random Fourier Features’ by Sutherland and Schneider (2015).

By Bochner’s theorem, any continuous stationary kernel \(k\) is positive definite if and only if it is the Fourier transform of a non-negative measure \(p(\omega)\), i.e.

\[\begin{equation} k(x, x') = k(x - x') = \int p(\omega) e^{i(\omega^\top (x - x'))} d\omega. \end{equation}\]

where \(p(\omega)\) is a normalized probability measure if \(k(0)=1\).

For the RBF kernel,

\[\begin{equation} k(\Delta) = \exp\left(-\frac{\Delta^2}{2\sigma^2}\right) \quad \text{and} \quad p(\omega) = \exp\left(-\frac{\sigma^2\omega^2}{2}\right) \end{equation}\]

where \(\Delta = x - x'\).

Given datapoint \(x\in \mathbb{R}^d\), we can construct its random Fourier features \(z(x) \in \mathbb{R}^{2D}\) by

\[\begin{split}\begin{equation} z(x) = \sqrt{\frac{1}{D}} \begin{bmatrix} \cos(\omega_1^\top x)\\ \sin(\omega_1^\top x)\\ \cdots \\ \cos(\omega_D^\top x)\\ \sin(\omega_D^\top x) \end{bmatrix}, \omega_1, \ldots, \omega_D \sim p(\omega) \end{equation}\end{split}\]

such that we have an unbiased Monte Carlo estimator

\[\begin{equation} k(x, x') = k(x - x') \approx z(x)^\top z(x') = \frac{1}{D}\sum_{i=1}^D \cos(\omega_i^\top (x - x')). \end{equation}\]

Note

When this kernel is used in batch mode, the random frequencies are drawn independently across the batch dimension as well by default.

Parameters:
  • num_samples (int) – Number of random frequencies to draw. This is \(D\) in the above papers. This will produce \(D\) sine features and \(D\) cosine features for a total of \(2D\) random Fourier features.

  • num_dims (int, optional) – (Default None.) Dimensionality of the data space. This is \(d\) in the above papers. Note that if you want an independent lengthscale for each dimension, set ard_num_dims equal to num_dims. If unspecified, it will be inferred the first time forward is called.

Variables:

randn_weights (torch.Tensor) – The random frequencies that are drawn once and then fixed.

Example

>>> # This will infer `num_dims` automatically
>>> kernel = gpytorch.kernels.RFFKernel(num_samples=5)
>>> x = torch.randn(10, 3)
>>> kxx = kernel(x, x).to_dense()
>>> print(kernel.randn_weights.size())
torch.Size([3, 5])