gpytorch.kernels

If you don’t know what kernel to use, we recommend that you start out with a gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()).

Kernel

class gpytorch.kernels.Kernel(ard_num_dims=None, batch_shape=None, active_dims=None, lengthscale_prior=None, lengthscale_constraint=None, eps=1e-06, **kwargs)[source]

Kernels in GPyTorch are implemented as a gpytorch.Module that, when called on two torch.Tensor objects \(\mathbf x_1\) and \(\mathbf x_2\), returns either a torch.Tensor or a LinearOperator that represents the covariance matrix between \(\mathbf x_1\) and \(\mathbf x_2\).

In the typical use case, extending this class simply requires implementing a forward() method.

Note

The __call__() method does some additional internal work. In particular, all kernels are lazily evaluated so that we can index into the kernel matrix before actually computing it. Furthermore, many built-in kernel modules return LinearOperators that allow for more efficient inference than if we explicitly computed the kernel matrix itself.

As a result, if you want to get an actual torch.Tensor representing the covariance matrix, you may need to call the to_dense() method on the output.

This base Kernel class includes a lengthscale parameter \(\Theta\), which is used by many common kernel functions. There are a few options for the lengthscale:

  • Default: No lengthscale (i.e. \(\Theta\) is the identity matrix).

  • Single lengthscale: One lengthscale can be applied to all input dimensions/batches (i.e. \(\Theta\) is a constant diagonal matrix). This is controlled by setting the attribute has_lengthscale=True.

  • ARD: Each input dimension gets its own separate lengthscale (i.e. \(\Theta\) is a non-constant diagonal matrix). This is controlled by the ard_num_dims keyword argument (as well as has_lengthscale=True).

In batch mode (i.e. when \(\mathbf x_1\) and \(\mathbf x_2\) are batches of input matrices), each batch of data can have its own lengthscale parameter by setting the batch_shape keyword argument to the appropriate number of batches.
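For instance, a minimal sketch of how these options affect the lengthscale shape (using RBFKernel, which sets has_lengthscale=True):

>>> kernel = gpytorch.kernels.RBFKernel(ard_num_dims=3, batch_shape=torch.Size([2]))
>>> kernel.lengthscale.shape  # batch_shape x 1 x ard_num_dims
torch.Size([2, 1, 3])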

Note

You can set a prior on the lengthscale parameter using the lengthscale_prior argument.

Parameters:
  • ard_num_dims (int, optional) – Set this if you want a separate lengthscale for each input dimension. It should be D if \(\mathbf x\) is a … x N x D matrix. (Default: None.)

  • batch_shape (torch.Size, optional) – Set this if you want a separate lengthscale for each batch of input data. It should be \(B_1 \times \ldots \times B_k\) if \(\mathbf x_1\) is a \(B_1 \times \ldots \times B_k \times N \times D\) tensor.

  • active_dims ((int, ...), optional) – Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions. (Default: None.)

  • lengthscale_prior (Prior, optional) – Set this if you want to apply a prior to the lengthscale parameter. (Default: None.)

  • lengthscale_constraint (Interval, optional) – Set this if you want to apply a constraint to the lengthscale parameter. (Default: Positive.)

  • eps (float) – A small positive value added to the lengthscale to prevent divide by zero errors. (Default: 1e-6.)

Variables:
  • batch_shape (torch.Size) – The (minimum) number of batch dimensions supported by this kernel. Typically, this captures the batch shape of the lengthscale and other parameters, and is usually set by the batch_shape argument in the constructor.

  • dtype (torch.dtype) – The dtype supported by this kernel. Typically, this depends on the dtype of the lengthscale and other parameters.

  • is_stationary (bool) – Set to True if the Kernel represents a stationary function (one that depends only on \(\mathbf x_1 - \mathbf x_2\)).

  • lengthscale (torch.Tensor) – The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.

Example

>>> covar_module = gpytorch.kernels.LinearKernel()
>>> x1 = torch.randn(50, 3)
>>> lazy_covar_matrix = covar_module(x1) # Returns a RootLinearOperator
>>> tensor_covar_matrix = lazy_covar_matrix.to_dense() # Gets the actual tensor for this kernel matrix
__call__(x1, x2=None, diag=False, last_dim_is_batch=False, **params)[source]

Computes the covariance between \(\mathbf x_1\) and \(\mathbf x_2\).

Note

Following PyTorch convention, all GP objects should use __call__ rather than forward(). The __call__ method applies additional pre- and post-processing to the forward method, and additionally employs a lazy evaluation scheme to reduce memory and computational costs.

Parameters:
  • x1 (torch.Tensor) – First set of data (… x N x D).

  • x2 (torch.Tensor, optional) – Second set of data (… x M x D). (If None, then x2 is set to x1.)

  • diag (bool) – Should the Kernel compute the whole kernel, or just the diag? If True, it must be the case that x1 == x2. (Default: False.)

  • last_dim_is_batch (bool) – If True, treat the last dimension of x1 and x2 as another batch dimension. (Useful for additive structure over the dimensions). (Default: False.)

Return type:

LazyEvaluatedKernelTensor or LinearOperator or torch.Tensor

Returns:

An object that will lazily evaluate to the kernel matrix or vector. The shape depends on the kernel’s evaluation mode:

  • full_covar: … x N x M

  • full_covar with last_dim_is_batch=True: … x K x N x M

  • diag: … x N

  • diag with last_dim_is_batch=True: … x K x N
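For example, a quick sketch of full vs. diagonal evaluation (shapes shown assume an RBFKernel; the inputs are illustrative):

>>> kernel = gpytorch.kernels.RBFKernel()
>>> x1 = torch.randn(8, 3)
>>> x2 = torch.randn(6, 3)
>>> full_covar = kernel(x1, x2)          # lazily evaluated (8 x 6)
>>> full_covar.to_dense().shape
torch.Size([8, 6])
>>> diag_covar = kernel(x1, diag=True)   # torch.Tensor of shape (8,)
>>> diag_covar.shape
torch.Size([8])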

__getitem__(index)[source]

Constructs a new kernel where the lengthscale (and other kernel parameters) are modified by an indexing operation.

Parameters:

index – Index to apply to all parameters.

Return type:

Kernel

covar_dist(x1, x2, diag=False, last_dim_is_batch=False, square_dist=False, **params)[source]

This is a helper method for computing the Euclidean distance between all pairs of points in \(\mathbf x_1\) and \(\mathbf x_2\).

Parameters:
  • x1 (torch.Tensor) – First set of data (… x N x D).

  • x2 (torch.Tensor) – Second set of data (… x M x D).

  • diag (bool) – Should the Kernel compute the whole kernel, or just the diag? If True, it must be the case that x1 == x2. (Default: False.)

  • last_dim_is_batch (bool) – If True, treat the last dimension of x1 and x2 as another batch dimension. (Useful for additive structure over the dimensions). (Default: False.)

  • square_dist (bool) – If True, returns the squared distance rather than the standard distance. (Default: False.)

Return type:

torch.Tensor

Returns:

The distance matrix or vector. The shape depends on the kernel’s evaluation mode:

  • full_covar: … x N x M

  • full_covar with last_dim_is_batch=True: … x K x N x M

  • diag: … x N

  • diag with last_dim_is_batch=True: … x K x N
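A brief sketch of covar_dist (any kernel instance works, since this helper is defined on the base class):

>>> kernel = gpytorch.kernels.RBFKernel()
>>> x1 = torch.randn(5, 3)
>>> x2 = torch.randn(4, 3)
>>> dist = kernel.covar_dist(x1, x2)                       # Euclidean distances (5 x 4)
>>> sq_dist = kernel.covar_dist(x1, x2, square_dist=True)  # squared distances (5 x 4)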

expand_batch(*sizes)[source]

Constructs a new kernel where the lengthscale (and other kernel parameters) are expanded to match the batch dimension determined by sizes.

Parameters:

sizes (torch.Size or (int, ...)) – The batch shape of the new tensor

Return type:

Kernel

abstract forward(x1, x2, diag=False, last_dim_is_batch=False, **params)[source]

Computes the covariance between \(\mathbf x_1\) and \(\mathbf x_2\). This method should be implemented by all Kernel subclasses.

Parameters:
  • x1 (torch.Tensor) – First set of data (… x N x D).

  • x2 (torch.Tensor) – Second set of data (… x M x D).

  • diag (bool) – Should the Kernel compute the whole kernel, or just the diag? If True, it must be the case that x1 == x2. (Default: False.)

  • last_dim_is_batch (bool) – If True, treat the last dimension of x1 and x2 as another batch dimension. (Useful for additive structure over the dimensions). (Default: False.)

Return type:

torch.Tensor or LinearOperator

Returns:

The kernel matrix or vector. The shape depends on the kernel’s evaluation mode:

  • full_covar: … x N x M

  • full_covar with last_dim_is_batch=True: … x K x N x M

  • diag: … x N

  • diag with last_dim_is_batch=True: … x K x N
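As a sketch of what a custom forward() implementation might look like (a hypothetical stationary kernel, shown only to illustrate the mechanics):

>>> class SincKernel(gpytorch.kernels.Kernel):
...     is_stationary = True
...
...     def forward(self, x1, x2, diag=False, **params):
...         # covar_dist (inherited from Kernel) computes pairwise or diagonal distances
...         dist = self.covar_dist(x1, x2, diag=diag, **params)
...         return torch.sinc(dist)
...
>>> covar_module = SincKernel()
>>> covar = covar_module(torch.randn(10, 1))  # lazily evaluates to a 10 x 10 matrix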

named_sub_kernels()[source]

For compositional Kernel classes (e.g. AdditiveKernel or ProductKernel).

Return type:

iterable((str, Kernel))

Returns:

An iterator over the component kernel objects, along with the name of each component kernel.

num_outputs_per_input(x1, x2)[source]

For most kernels, num_outputs_per_input = 1.

However, some kernels (e.g. multitask kernels or interdomain kernels) return a num_outputs_per_input x num_outputs_per_input matrix of covariance values for every pair of data points.

I.e. if x1 is size … x N x D and x2 is size … x M x D, then the size of the kernel will be … x (N * num_outputs_per_input) x (M * num_outputs_per_input).

Return type:

int

Returns:

num_outputs_per_input (usually 1).

sub_kernels()[source]

For compositional Kernel classes (e.g. AdditiveKernel or ProductKernel).

Return type:

iterable(Kernel)

Returns:

An iterator over the component kernel objects.

Standard Kernels

CosineKernel

class gpytorch.kernels.CosineKernel(period_length_prior=None, period_length_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the cosine kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{Cosine}}(\mathbf{x_1}, \mathbf{x_2}) = \cos \left( \pi \Vert \mathbf{x_1} - \mathbf{x_2} \Vert_2 / p \right) \end{equation*}\]

where \(p\) is the period length parameter.

Parameters:
  • batch_shape (torch.Size, optional) – Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([])

  • active_dims (tuple of ints, optional) – Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions. Default: None.

  • period_length_prior (Prior, optional) – Set this if you want to apply a prior to the period length parameter. Default: None

  • period_length_constraint (Constraint, optional) – Set this if you want to apply a constraint to the period length parameter. Default: Positive.

  • eps (float) – The minimum value that the lengthscale/period length can take (prevents divide by zero errors). Default: 1e-6.

period_length

The period length parameter. Size = *batch_shape x 1 x 1.

Type:

Tensor

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel())
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 10 x 10)

CylindricalKernel

class gpytorch.kernels.CylindricalKernel(num_angular_weights, radial_base_kernel, eps=1e-06, angular_weights_prior=None, angular_weights_constraint=None, alpha_prior=None, alpha_constraint=None, beta_prior=None, beta_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the Cylindrical Kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\). It was proposed in BOCK: Bayesian Optimization with Cylindrical Kernels. See http://proceedings.mlr.press/v80/oh18a.html for more details.

Note

The data must lie completely within the unit ball.

Parameters:
  • num_angular_weights (int) – The number of components in the angular kernel

  • radial_base_kernel (gpytorch.kernel) – The base kernel for computing the radial kernel

  • batch_size (int, optional) – Set this if the data is a batch of input data. It should be b if x1 is a b x n x d tensor. Default: 1

  • eps (float) – Small floating point number used to improve numerical stability in kernel computations. Default: 1e-6

  • param_transform (function, optional) – Set this if you want to use something other than softplus to ensure positiveness of parameters.

  • inv_param_transform (function, optional) – Set this to allow setting parameters directly in transformed space and sampling from priors. Automatically inferred for common transformations such as torch.exp or torch.nn.functional.softplus.

  • angular_weights_prior (Prior, optional) –

  • angular_weights_constraint (Interval, optional) –

  • alpha_prior (Prior, optional) –

  • alpha_constraint (Interval, optional) –

  • beta_prior (Prior, optional) –

  • beta_constraint (Interval, optional) –
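Example

A rough usage sketch (the base kernel and data here are illustrative; remember that the inputs must lie inside the unit ball):

>>> x = torch.rand(10, 3) * 0.5  # all points lie inside the unit ball
>>> radial_base_kernel = gpytorch.kernels.MaternKernel(nu=2.5)
>>> covar_module = gpytorch.kernels.CylindricalKernel(num_angular_weights=4, radial_base_kernel=radial_base_kernel)
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)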

LinearKernel

class gpytorch.kernels.LinearKernel(num_dimensions=None, offset_prior=None, variance_prior=None, variance_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the Linear kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_\text{Linear}(\mathbf{x_1}, \mathbf{x_2}) = v\mathbf{x_1}^\top \mathbf{x_2}. \end{equation*}\]

where

  • \(v\) is a variance parameter.

Note

To implement this efficiently, we use a RootLinearOperator during training and a MatmulLinearOperator during test. These lazy tensors represent matrices of the form \(\mathbf K = \mathbf X \mathbf X^{\prime \top}\). This makes inference efficient because a matrix-vector product \(\mathbf K \mathbf v\) can be computed as \(\mathbf K \mathbf v = \mathbf X( \mathbf X^{\prime \top} \mathbf v)\), where the base multiply \(\mathbf X \mathbf v\) takes only \(\mathcal O(ND)\) time and space.

Parameters:
  • variance_prior (Prior, optional) – Prior over the variance parameter. (Default None.)

  • variance_constraint (Interval, optional) – Constraint to place on variance parameter. (Default: Positive.)

  • active_dims – List of data dimensions to operate on. len(active_dims) should equal num_dimensions.

  • num_dimensions (int, optional) –

  • offset_prior (Prior, optional) –
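
Example

A minimal usage sketch (mirroring the other kernels’ examples; the ScaleKernel wrapper is optional):

>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.LinearKernel())
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)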

MaternKernel

class gpytorch.kernels.MaternKernel(nu=2.5, **kwargs)[source]

Computes a covariance matrix based on the Matern kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{Matern}}(\mathbf{x_1}, \mathbf{x_2}) = \frac{2^{1 - \nu}}{\Gamma(\nu)} \left( \sqrt{2 \nu} d \right)^{\nu} K_\nu \left( \sqrt{2 \nu} d \right) \end{equation*}\]

where

  • \(d = \sqrt{(\mathbf{x_1} - \mathbf{x_2})^\top \Theta^{-2} (\mathbf{x_1} - \mathbf{x_2})}\) is the distance between \(\mathbf{x_1}\) and \(\mathbf{x_2}\) scaled by the lengthscale parameter \(\Theta\).

  • \(\nu\) is a smoothness parameter (takes values 1/2, 3/2, or 5/2). Smaller values correspond to less smooth functions.

  • \(K_\nu\) is a modified Bessel function.

There are a few options for the lengthscale parameter \(\Theta\): See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
  • nu (float (0.5, 1.5, or 2.5)) – (Default: 2.5) The smoothness parameter.

  • ard_num_dims (int, optional) – (Default: None) Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a … x n x d matrix.

  • batch_shape (torch.Size, optional) – (Default: None) Set this if you want a separate lengthscale for each batch of input data. It should be torch.Size([b1, b2]) for a b1 x b2 x n x m kernel output.

  • active_dims (Tuple(int)) – (Default: None) Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions.

  • lengthscale_prior (Prior, optional) – (Default: None) Set this if you want to apply a prior to the lengthscale parameter.

  • lengthscale_constraint (Interval, optional) – (Default: Positive) Set this if you want to apply a constraint to the lengthscale parameter.

  • eps (float, optional) – (Default: 1e-6) The minimum value that the lengthscale can take (prevents divide by zero errors).

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5))
>>> # Non-batch: ARD (different lengthscale for each input dimension)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5, ard_num_dims=5))
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5))
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.MaternKernel(nu=0.5, batch_shape=torch.Size([2]))
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 10 x 10)

PeriodicKernel

class gpytorch.kernels.PeriodicKernel(period_length_prior=None, period_length_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the periodic kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{Periodic}}(\mathbf{x}, \mathbf{x'}) = \exp \left( -2 \sum_i \frac{\sin ^2 \left( \frac{\pi}{p} ({x_{i}} - {x_{i}'} ) \right)}{\lambda} \right) \end{equation*}\]

where

  • \(p\) is the period length parameter.

  • \(\lambda\) is a lengthscale parameter.

The equation is based on David Mackay’s Introduction to Gaussian Processes, equation 47 (albeit without feature-specific lengthscales and period lengths). The exponential coefficient was changed, and the lengthscale is not squared, to maintain backwards compatibility.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
  • ard_num_dims (int, optional) – (Default: None) Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a … x n x d matrix.

  • batch_shape (torch.Size, optional) – (Default: None) Set this if you want a separate lengthscale for each batch of input data. It should be torch.Size([b1, b2]) for a b1 x b2 x n x m kernel output.

  • active_dims (Tuple(int)) – (Default: None) Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions.

  • period_length_prior (Prior, optional) – (Default: None) Set this if you want to apply a prior to the period length parameter.

  • period_length_constraint (Interval, optional) – (Default: Positive) Set this if you want to apply a constraint to the period length parameter.

  • lengthscale_prior (Prior, optional) – (Default: None) Set this if you want to apply a prior to the lengthscale parameter.

  • lengthscale_constraint (Interval, optional) – (Default: Positive) Set this if you want to apply a constraint to the lengthscale parameter.

  • eps (float, optional) – (Default: 1e-6) The minimum value that the lengthscale can take (prevents divide by zero errors).

Variables:

period_length (torch.Tensor) – The period length parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel())
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 10 x 10)

PiecewisePolynomialKernel

class gpytorch.kernels.PiecewisePolynomialKernel(q=2, **kwargs)[source]

Computes a covariance matrix based on the Piecewise Polynomial kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{split}\begin{align} r &= \left\Vert \mathbf{x_1} - \mathbf{x_2} \right\Vert \\ j &= \lfloor \frac{D}{2} \rfloor + q + 1 \\ K_{\text{ppD, 0}}(\mathbf{x_1}, \mathbf{x_2}) &= (1-r)^j_+ , \\ K_{\text{ppD, 1}}(\mathbf{x_1}, \mathbf{x_2}) &= (1-r)^{j+1}_+ ((j + 1)r + 1), \\ K_{\text{ppD, 2}}(\mathbf{x_1}, \mathbf{x_2}) &= (1-r)^{j+2}_+ \left(1 + (j+2)r + \frac{j^2 + 4j + 3}{3}r^2\right), \\ K_{\text{ppD, 3}}(\mathbf{x_1}, \mathbf{x_2}) &= (1-r)^{j+3}_+ \left(1 + (j+3)r + \frac{6j^2 + 36j + 45}{15}r^2 + \frac{j^3 + 9j^2 + 23j + 15}{15}r^3\right) \\ \end{align}\end{split}\]

where \(K_{\text{ppD, q}}\) is positive semidefinite in \(\mathbb{R}^{D}\) and \(q\) is the smoothness coefficient. See Rasmussen and Williams (2006) Equation 4.21.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
  • q (int (0, 1, 2 or 3)) – (default= 2) The smoothness parameter.

  • ard_num_dims (int, optional) – (Default: None) Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a … x n x d matrix.

  • batch_shape (torch.Size, optional) – (Default: None) Set this if you want a separate lengthscale for each batch of input data. It should be torch.Size([b1, b2]) for a b1 x b2 x n x m kernel output.

  • active_dims (Tuple(int)) – (Default: None) Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions.

  • lengthscale_prior (Prior, optional) – (Default: None) Set this if you want to apply a prior to the lengthscale parameter.

  • lengthscale_constraint (Positive, optional) – (Default: Positive) Set this if you want to apply a constraint to the lengthscale parameter.

  • eps (float, optional) – (Default: 1e-6) The minimum value that the lengthscale can take (prevents divide by zero errors).

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch option
>>> covar_module = gpytorch.kernels.ScaleKernel(
                        gpytorch.kernels.PiecewisePolynomialKernel(q = 2))
>>> # Non-batch: ARD (different lengthscale for each input dimension)
>>> covar_module = gpytorch.kernels.ScaleKernel(
                    gpytorch.kernels.PiecewisePolynomialKernel(q = 2, ard_num_dims=5)
                    )
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(
    gpytorch.kernels.PiecewisePolynomialKernel(q = 2, batch_shape=torch.Size([2]))
    )
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 10 x 10)

PolynomialKernel

class gpytorch.kernels.PolynomialKernel(power, offset_prior=None, offset_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the Polynomial kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_\text{Poly}(\mathbf{x_1}, \mathbf{x_2}) = (\mathbf{x_1}^\top \mathbf{x_2} + c)^{d}. \end{equation*}\]

where

  • \(c\) is an offset parameter.

Parameters:
  • offset_prior (gpytorch.priors.Prior) – Prior over the offset parameter (default None).

  • offset_constraint (Constraint, optional) – Constraint to place on offset parameter. Default: Positive.

  • active_dims (list) – List of data dimensions to operate on. len(active_dims) should equal num_dimensions.

  • power (int) –
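
Example

A brief usage sketch (power=2 gives a quadratic kernel; the ScaleKernel wrapper is optional):

>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PolynomialKernel(power=2))
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)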

PolynomialKernelGrad

class gpytorch.kernels.PolynomialKernelGrad(power, offset_prior=None, offset_constraint=None, **kwargs)[source]
Parameters:
  • power (int) –

  • offset_prior (Prior, optional) –

  • offset_constraint (Interval, optional) –

RBFKernel

class gpytorch.kernels.RBFKernel(ard_num_dims=None, batch_shape=None, active_dims=None, lengthscale_prior=None, lengthscale_constraint=None, eps=1e-06, **kwargs)[source]

Computes a covariance matrix based on the RBF (squared exponential) kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{RBF}}(\mathbf{x_1}, \mathbf{x_2}) = \exp \left( -\frac{1}{2} (\mathbf{x_1} - \mathbf{x_2})^\top \Theta^{-2} (\mathbf{x_1} - \mathbf{x_2}) \right) \end{equation*}\]

where \(\Theta\) is a lengthscale parameter. See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
  • ard_num_dims (int, optional) – Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. (Default: None.)

  • batch_shape (torch.Size, optional) – Set this if you want a separate lengthscale for each batch of input data. It should be \(B_1 \times \ldots \times B_k\) if \(\mathbf x1\) is a \(B_1 \times \ldots \times B_k \times N \times D\) tensor.

  • active_dims ((int, ...), optional) – Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions. (Default: None.)

  • lengthscale_prior (Prior, optional) – Set this if you want to apply a prior to the lengthscale parameter. (Default: None)

  • lengthscale_constraint (Interval, optional) – Set this if you want to apply a constraint to the lengthscale parameter. (Default: Positive.)

  • eps (float) – The minimum value that the lengthscale can take (prevents divide by zero errors). (Default: 1e-6.)

Variables:

lengthscale (torch.Tensor) – The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
>>> # Non-batch: ARD (different lengthscale for each input dimension)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(ard_num_dims=5))
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 10 x 10)

RQKernel

class gpytorch.kernels.RQKernel(alpha_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the rational quadratic kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\):

\[\begin{equation*} k_{\text{RQ}}(\mathbf{x_1}, \mathbf{x_2}) = \left(1 + \frac{1}{2\alpha} (\mathbf{x_1} - \mathbf{x_2})^\top \Theta^{-2} (\mathbf{x_1} - \mathbf{x_2}) \right)^{-\alpha} \end{equation*}\]

where \(\Theta\) is a lengthscale parameter, and \(\alpha\) is the rational quadratic relative weighting parameter. See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
  • ard_num_dims – Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. (Default: None.)

  • batch_shape – Set this if you want a separate lengthscale for each batch of input data. It should be \(B_1 \times \ldots \times B_k\) if \(\mathbf x1\) is a \(B_1 \times \ldots \times B_k \times N \times D\) tensor.

  • active_dims – Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions. (Default: None.)

  • lengthscale_prior – Set this if you want to apply a prior to the lengthscale parameter. (Default: None)

  • lengthscale_constraint – Set this if you want to apply a constraint to the lengthscale parameter. (Default: Positive.)

  • alpha_constraint (Interval, optional) – Set this if you want to apply a constraint to the alpha parameter. (Default: Positive.)

  • eps – The minimum value that the lengthscale can take (prevents divide by zero errors). (Default: 1e-6.)

Variables:
  • lengthscale (torch.Tensor) – The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.

  • alpha (torch.Tensor) – The rational quadratic relative weighting parameter. Size/shape of parameter depends on the batch_shape argument

SpectralDeltaKernel

class gpytorch.kernels.SpectralDeltaKernel(num_dims, num_deltas=128, Z_constraint=None, batch_shape=torch.Size([]), **kwargs)[source]

A kernel that supports spectral learning for GPs, where the underlying spectral density is modeled as a mixture of delta distributions (i.e., point masses). This approach has been explored, e.g., in Lazaro-Gredilla et al., 2010.

Conceptually, this kernel is similar to random Fourier features as implemented in RFFKernel, but instead of sampling a Gaussian to determine the spectrum sites, they are treated as learnable parameters.

When using CG for inference, this kernel supports linear space and time (in N) for training and inference.

Parameters:
  • num_dims (int) – Dimensionality of input data that this kernel will operate on. Note that if active_dims is used, this should be the length of the active dim set.

  • num_deltas (int, optional) – Number of point masses to learn.


  • Z_constraint (Interval, optional) –

  • batch_shape (torch.Size, optional) –

initialize_from_data(train_x, train_y)[source]

Initialize the point masses for this kernel from the empirical spectrum of the data. To do this, we estimate the empirical spectrum’s CDF and then simply sample from it. This is analogous to how the SM kernel’s mixture is initialized, but we skip the last step of fitting a GMM to the samples and just use the samples directly.
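Example

A rough usage sketch (the initialization step is optional but typically gives a better starting point for the spectral point masses):

>>> train_x = torch.randn(10, 5)
>>> train_y = torch.randn(10)
>>> covar_module = gpytorch.kernels.SpectralDeltaKernel(num_dims=5)
>>> covar_module.initialize_from_data(train_x, train_y)
>>> covar = covar_module(train_x)  # Output: LinearOperator of size (10 x 10)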

SpectralMixtureKernel

class gpytorch.kernels.SpectralMixtureKernel(num_mixtures=None, ard_num_dims=1, batch_shape=torch.Size([]), mixture_scales_prior=None, mixture_scales_constraint=None, mixture_means_prior=None, mixture_means_constraint=None, mixture_weights_prior=None, mixture_weights_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the Spectral Mixture Kernel between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

It was proposed in Gaussian Process Kernels for Pattern Discovery and Extrapolation.

Note

Unlike other kernels, the SpectralMixtureKernel requires that ard_num_dims match the dimensionality of the input data, and it should not be combined with a gpytorch.kernels.ScaleKernel (the mixture weights already act as an output scale).

Parameters:
  • num_mixtures (int) – The number of components in the mixture.

  • ard_num_dims (int) – Set this to match the dimensionality of the input. It should be d if x1 is a … x n x d matrix. (Default: 1.)

  • batch_shape (torch.Size, optional) – Set this if the data is batch of input data. It should be b_1 x … x b_j if x1 is a b_1 x … x b_j x n x d tensor. (Default: torch.Size([]).)

  • active_dims (tuple of ints, optional) – Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions. (Default: None.)

  • eps (float, optional) – The minimum value that the lengthscale can take (prevents divide by zero errors). (Default: 1e-6.)

  • mixture_scales_prior (Prior, optional) – A prior to set on the mixture_scales parameter

  • mixture_scales_constraint (Interval, optional) – A constraint to set on the mixture_scales parameter

  • mixture_means_prior (Prior, optional) – A prior to set on the mixture_means parameter

  • mixture_means_constraint (Interval, optional) – A constraint to set on the mixture_means parameter

  • mixture_weights_prior (Prior, optional) – A prior to set on the mixture_weights parameter

  • mixture_weights_constraint (Interval, optional) – A constraint to set on the mixture_weights parameter

Variables:
  • mixture_scales (torch.Tensor) – The lengthscale parameter. Given k mixture components, and … x n x d data, this will be of size … x k x 1 x d.

  • mixture_means (torch.Tensor) – The mixture mean parameters (… x k x 1 x d).

  • mixture_weights (torch.Tensor) – The mixture weight parameters (… x k).

Example

>>> # Non-batch
>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4, ard_num_dims=5)
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)
>>>
>>> # Batch
>>> batch_x = torch.randn(2, 10, 5)
>>> covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4, batch_shape=torch.Size([2]), ard_num_dims=5)
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 10 x 10)

initialize_from_data(train_x, train_y, **kwargs)[source]

Initialize mixture components based on batch statistics of the data. You should use this initialization routine if your observations are not evenly spaced.

initialize_from_data_empspect(train_x, train_y)[source]

Initialize mixture components based on the empirical spectrum of the data. This will often be better than the standard initialize_from_data method, but it assumes that your inputs are evenly spaced.


Composition/Decoration Kernels

AdditiveKernel

class gpytorch.kernels.AdditiveKernel(*kernels)[source]

A Kernel that supports summing over multiple component kernels.

Example

>>> covar_module = RBFKernel(active_dims=torch.tensor([0])) + RBFKernel(active_dims=torch.tensor([1]))
>>> x1 = torch.randn(50, 2)
>>> additive_kernel_matrix = covar_module(x1)
Parameters:

kernels (iterable(Kernel)) – Kernels to add together.

MultiDeviceKernel

class gpytorch.kernels.MultiDeviceKernel(base_kernel, device_ids, output_device=None, create_cuda_context=True, **kwargs)[source]

Allocates the covariance matrix on distributed devices, e.g. multiple GPUs.

Parameters:
  • base_kernel (Kernel) – Base kernel to distribute

  • device_ids (list(torch.device)) – list of torch.device objects to place kernel chunks on

  • output_device (torch.device, optional) – Device where outputs will be placed

  • create_cuda_context (bool, optional) –
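
Example

A rough sketch, assuming two CUDA devices are available:

>>> base_covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
>>> covar_module = gpytorch.kernels.MultiDeviceKernel(
...     base_covar_module,
...     device_ids=[torch.device('cuda:0'), torch.device('cuda:1')],
...     output_device=torch.device('cuda:0'),
... )
>>> x = torch.randn(10000, 5).to('cuda:0')
>>> covar = covar_module(x)  # covariance chunks are distributed across the two GPUs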

AdditiveStructureKernel

class gpytorch.kernels.AdditiveStructureKernel(base_kernel, num_dims, active_dims=None)[source]

A Kernel decorator for kernels with additive structure. If a kernel decomposes additively, then this module will be much more computationally efficient.

A kernel function k decomposes additively if it can be written as

\[\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = k'(x_1^{(1)}, x_2^{(1)}) + \ldots + k'(x_1^{(d)}, x_2^{(d)}) \end{equation*}\]

for some kernel \(k'\) that operates on a subset of dimensions.

Given a b x n x d input, AdditiveStructureKernel computes d one-dimensional kernels (using the supplied base_kernel), and then adds the component kernels together. Unlike AdditiveKernel, AdditiveStructureKernel computes each of the additive terms in batch, making it very fast.

Parameters:
  • base_kernel (Kernel) – The kernel to use for each one-dimensional additive component.

  • num_dims (int) – The dimension of the input data.

  • active_dims (tuple of ints, optional) – Passed down to the base_kernel.

property is_stationary: bool

Kernel is stationary if the base kernel is stationary.
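Example

A minimal sketch (the base kernel is applied separately to each of the 5 dimensions and the results are summed):

>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.AdditiveStructureKernel(gpytorch.kernels.RBFKernel(), num_dims=5)
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)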

ProductKernel

class gpytorch.kernels.ProductKernel(*kernels)[source]

A Kernel that supports elementwise multiplying multiple component kernels together.

Example

>>> covar_module = RBFKernel(active_dims=torch.tensor([0])) * RBFKernel(active_dims=torch.tensor([1]))
>>> x1 = torch.randn(50, 2)
>>> kernel_matrix = covar_module(x1) # The RBF Kernel already decomposes multiplicatively, so this is foolish!
Parameters:

kernels (iterable(Kernel)) – Kernels to multiply together.

ProductStructureKernel

class gpytorch.kernels.ProductStructureKernel(base_kernel, num_dims, active_dims=None)[source]

A Kernel decorator for kernels with product structure. If a kernel decomposes multiplicatively, then this module will be much more computationally efficient.

A kernel function k has product structure if it can be written as

\[\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = k'(x_1^{(1)}, x_2^{(1)}) * \ldots * k'(x_1^{(d)}, x_2^{(d)}) \end{equation*}\]

for some kernel \(k'\) that operates on each dimension.

Given a b x n x d input, ProductStructureKernel computes d one-dimensional kernels (using the supplied base_kernel), and then multiplies the component kernels together. Unlike ProductKernel, ProductStructureKernel computes each of the product terms in batch, making it very fast.

See Product Kernel Interpolation for Scalable Gaussian Processes for more detail.

Parameters:
  • base_kernel (Kernel) – The kernel to use for each one-dimensional component of the product.

  • num_dims (int) – The dimension of the input data.

  • active_dims (tuple of ints, optional) – Passed down to the base_kernel.

property is_stationary: bool

Kernel is stationary if the base kernel is stationary.
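Example

A minimal sketch (the base kernel is applied separately to each of the 5 dimensions and the results are multiplied):

>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.ProductStructureKernel(gpytorch.kernels.RBFKernel(), num_dims=5)
>>> covar = covar_module(x)  # Output: LinearOperator of size (10 x 10)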

ScaleKernel

class gpytorch.kernels.ScaleKernel(base_kernel, outputscale_prior=None, outputscale_constraint=None, **kwargs)[source]

Decorates an existing kernel object with an output scale, i.e.

\[\begin{equation*} K_{\text{scaled}} = \theta_\text{scale} K_{\text{orig}} \end{equation*}\]

where \(\theta_\text{scale}\) is the outputscale parameter.

In batch-mode (i.e. when \(x_1\) and \(x_2\) are batches of input matrices), each batch of data can have its own outputscale parameter by setting the batch_shape keyword argument to the appropriate number of batches.

Note

The outputscale parameter is parameterized on a log scale to constrain it to be positive. You can set a prior on this parameter using the outputscale_prior argument.

Parameters:
  • base_kernel (Kernel) – The base kernel to be scaled.

  • batch_shape (torch.Size, optional) – Set this if you want a separate outputscale for each batch of input data. It should be torch.Size([b]) if x1 is a b x n x d tensor. Default: torch.Size([])

  • outputscale_prior (Prior, optional) – Set this if you want to apply a prior to the outputscale parameter. Default: None

  • outputscale_constraint (Constraint, optional) – Set this if you want to apply a constraint to the outputscale parameter. Default: Positive.

base_kernel

The kernel module to be scaled.

Type:

Kernel

outputscale

The outputscale parameter. Size/shape of parameter depends on the batch_shape arguments.

Type:

Tensor

Example

>>> x = torch.randn(10, 5)
>>> base_covar_module = gpytorch.kernels.RBFKernel()
>>> scaled_covar_module = gpytorch.kernels.ScaleKernel(base_covar_module)
>>> covar = scaled_covar_module(x)  # Output: LinearOperator of size (10 x 10)
property is_stationary: bool

Kernel is stationary if base kernel is stationary.

Specialty Kernels

ArcKernel

class gpytorch.kernels.ArcKernel(base_kernel, delta_func=None, angle_prior=None, radius_prior=None, **kwargs)[source]

Computes a covariance matrix based on the Arc Kernel (https://arxiv.org/abs/1409.4011) between inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\). First it applies a cylindrical embedding:

\[\begin{split}g_{i}(\mathbf{x}) = \begin{cases} [0, 0]^{T} & \delta_{i}(\mathbf{x}) = \text{false}\\ \omega_{i} \left[ \sin{\pi\rho_{i}\frac{x_{i}}{u_{i}-l_{i}}}, \cos{\pi\rho_{i}\frac{x_{i}}{u_{i}-l_{i}}} \right] & \text{otherwise} \end{cases}\end{split}\]

where

  • \(\rho\) is the angle parameter.

  • \(\omega\) is a radius parameter.

then the kernel is built with the particular covariance function, e.g.

\[\begin{equation} k_{i}(\mathbf{x}, \mathbf{x'}) = \sigma^{2}\exp \left(-\frac{1}{2}d_{i}(\mathbf{x}, \mathbf{x^{'}}) \right)^{2} \end{equation}\]

and the product is taken over dimensions

\[\begin{equation} k(\mathbf{x}, \mathbf{x'}) = \prod_{i} k_{i}(\mathbf{x}, \mathbf{x'}) \end{equation}\]

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel. When using an input of b x n x d dimensions, decorate this kernel with gpytorch.kernels.ProductStructureKernel, setting num_dims to d.

Note

This kernel does not have an ARD lengthscale option.

Parameters:
  • base_kernel (Kernel) – (Default gpytorch.kernels.MaternKernel(nu=2.5).) The euclidean covariance of choice.

  • ard_num_dims (int, optional) – (Default: None.) The number of input dimensions. The kernel’s two parameters (angle and radius) are defined separately for each dimension.

  • angle_prior (Prior, optional) – Set this if you want to apply a prior to the period angle parameter.

  • radius_prior (Prior, optional) – Set this if you want to apply a prior to the radius parameter.

Variables:
  • radius (torch.Tensor) – The radius parameter. Size = *batch_shape x 1.

  • angle (torch.Tensor) – The period angle parameter. Size = *batch_shape x 1.

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> base_kernel = gpytorch.kernels.MaternKernel(nu=2.5)
>>> base_kernel.raw_lengthscale.requires_grad_(False)
>>> covar_module = gpytorch.kernels.ProductStructureKernel(
        gpytorch.kernels.ScaleKernel(
            ArcKernel(base_kernel,
                      angle_prior=gpytorch.priors.GammaPrior(0.5,1),
                      radius_prior=gpytorch.priors.GammaPrior(3,2),
                      ard_num_dims=x.shape[-1])),
        num_dims=x.shape[-1])
>>> covar = covar_module(x)
>>> print(covar.shape)
>>> # Now with batch
>>> covar_module = gpytorch.kernels.ProductStructureKernel(
        gpytorch.kernels.ScaleKernel(
            ArcKernel(base_kernel,
                      angle_prior=gpytorch.priors.GammaPrior(0.5,1),
                      radius_prior=gpytorch.priors.GammaPrior(3,2),
                      ard_num_dims=x.shape[-1])),
        num_dims=x.shape[-1])
>>> covar = covar_module(x)
>>> print(covar.shape)
Parameters:

delta_func (callable, optional) –

HammingIMQKernel

class gpytorch.kernels.HammingIMQKernel

Computes a covariance matrix based on an inverse multiquadric (IMQ) function of the Hamming distance between discrete (one-hot encoded) inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

IndexKernel

class gpytorch.kernels.IndexKernel(num_tasks, rank=1, prior=None, var_constraint=None, **kwargs)[source]

A kernel for discrete indices. The kernel is defined by a lookup table.

\[\begin{equation} k(i, j) = \left(BB^\top + \text{diag}(\mathbf v) \right)_{i, j} \end{equation}\]

where \(B\) is a low-rank matrix, and \(\mathbf v\) is a non-negative vector. These parameters are learned.

Parameters:
  • num_tasks (int) – Total number of indices.

  • batch_shape (torch.Size, optional) – Set if the MultitaskKernel is operating on batches of data (and you want different parameters for each batch)

  • rank (int) – Rank of \(B\) matrix. Controls the degree of correlation between the outputs. With a rank of 1 the outputs are identical except for a scaling factor.

  • prior (gpytorch.priors.Prior) – Prior for \(B\) matrix.

  • var_constraint (Constraint, optional) – Constraint for added diagonal component. Default: Positive.

covar_factor

The \(B\) matrix.

raw_var

The element-wise log of the \(\mathbf v\) vector.
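Example

A rough usage sketch: the kernel is called on integer task indices rather than continuous features (as in Hadamard-style multitask models):

>>> task_covar_module = gpytorch.kernels.IndexKernel(num_tasks=3, rank=1)
>>> i = torch.randint(0, 3, (10, 1))   # task index for each of the 10 data points
>>> task_covar = task_covar_module(i)  # Output: LinearOperator of size (10 x 10)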

LCMKernel

class gpytorch.kernels.LCMKernel(base_kernels, num_tasks, rank=1, task_covar_prior=None)[source]

This kernel supports the LCM kernel. It allows the user to specify a list of base kernels to use, and individual MultitaskKernel objects are fit to each of them. The final kernel is the linear sum of the Kronecker product of all these base kernels with their respective MultitaskKernel objects.

The returned object is of type KroneckerProductLinearOperator.

Parameters:
  • base_kernels (list(Kernel)) –

  • num_tasks (int) –

  • rank (int or list(T), optional) –

  • task_covar_prior (Prior, optional) –

num_outputs_per_input(x1, x2)[source]

Given n data points x1 and m datapoints x2, this multitask kernel returns an (n*num_tasks) x (m*num_tasks) covariance matrix.
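Example

A brief sketch (two base kernels, two tasks; the output size follows num_outputs_per_input above):

>>> base_kernels = [gpytorch.kernels.RBFKernel(), gpytorch.kernels.MaternKernel(nu=2.5)]
>>> covar_module = gpytorch.kernels.LCMKernel(base_kernels, num_tasks=2, rank=1)
>>> x = torch.randn(10, 3)
>>> covar = covar_module(x)  # Output: size (20 x 20), i.e. (n * num_tasks) x (n * num_tasks)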

MultitaskKernel

class gpytorch.kernels.MultitaskKernel(data_covar_module, num_tasks, rank=1, task_covar_prior=None, **kwargs)[source]

Kernel supporting Kronecker style multitask Gaussian processes (where every data point is evaluated at every task) using gpytorch.kernels.IndexKernel as a basic multitask kernel.

Given a base covariance module to be used for the data, \(K_{XX}\), this kernel computes a task kernel of specified size \(K_{TT}\) and returns \(K = K_{TT} \otimes K_{XX}\) as a KroneckerProductLinearOperator.

Parameters:
  • data_covar_module (Kernel) – Kernel to use as the data kernel.

  • num_tasks (int) – Number of tasks

  • rank (int, optional) – (default 1) Rank of index kernel to use for task covariance matrix.

  • task_covar_prior (Prior, optional) – (default None) Prior to use for task kernel. See gpytorch.kernels.IndexKernel for details.

  • kwargs (dict) – Additional arguments to pass to the kernel.


num_outputs_per_input(x1, x2)[source]

Given n data points x1 and m datapoints x2, this multitask kernel returns an (n*num_tasks) x (m*num_tasks) covariance matrix.
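Example

A brief sketch (an RBF data kernel with two tasks):

>>> data_covar_module = gpytorch.kernels.RBFKernel()
>>> covar_module = gpytorch.kernels.MultitaskKernel(data_covar_module, num_tasks=2, rank=1)
>>> x = torch.randn(10, 3)
>>> covar = covar_module(x)  # Output: KroneckerProductLinearOperator of size (20 x 20)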

RBFKernelGrad

class gpytorch.kernels.RBFKernelGrad(ard_num_dims=None, batch_shape=None, active_dims=None, lengthscale_prior=None, lengthscale_constraint=None, eps=1e-06, **kwargs)[source]

Computes a covariance matrix of the RBF kernel that models the covariance between the values and partial derivatives for inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
  • ard_num_dims (int, optional) – Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. (Default: None.)

  • batch_shape (torch.Size, optional) – Set this if you want a separate lengthscale for each batch of input data. It should be \(B_1 \times \ldots \times B_k\) if \(\mathbf x1\) is a \(B_1 \times \ldots \times B_k \times N \times D\) tensor.

  • active_dims ((int, ...), optional) – Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions. (Default: None.)

  • lengthscale_prior (Prior, optional) – Set this if you want to apply a prior to the lengthscale parameter. (Default: None)

  • lengthscale_constraint (Interval, optional) – Set this if you want to apply a constraint to the lengthscale parameter. (Default: Positive.)

  • eps (float) – The minimum value that the lengthscale can take (prevents divide by zero errors). (Default: 1e-6.)

Variables:

lengthscale (torch.Tensor) – The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGrad())
>>> covar = covar_module(x)  # Output: LinearOperator of size (60 x 60), where 60 = n * (d + 1)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGrad())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGrad(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 60 x 60)

RBFKernelGradGrad

class gpytorch.kernels.RBFKernelGradGrad(ard_num_dims=None, batch_shape=None, active_dims=None, lengthscale_prior=None, lengthscale_constraint=None, eps=1e-06, **kwargs)[source]

Computes a covariance matrix of the RBF kernel that models the covariance between the values and first and second (non-mixed) partial derivatives for inputs \(\mathbf{x_1}\) and \(\mathbf{x_2}\).

See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
  • ard_num_dims (int, optional) – Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. (Default: None.)

  • batch_shape (torch.Size, optional) – Set this if you want a separate lengthscale for each batch of input data. It should be \(B_1 \times \ldots \times B_k\) if \(\mathbf x1\) is a \(B_1 \times \ldots \times B_k \times N \times D\) tensor.

  • active_dims ((int, ...), optional) – Set this if you want to compute the covariance of only a few input dimensions. The ints corresponds to the indices of the dimensions. (Default: None.)

  • lengthscale_prior (Prior, optional) – Set this if you want to apply a prior to the lengthscale parameter. (Default: None)

  • lengthscale_constraint (Interval, optional) – Set this if you want to apply a constraint to the lengthscale parameter. (Default: Positive.)

  • eps (float) – The minimum value that the lengthscale can take (prevents divide by zero errors). (Default: 1e-6.)

Variables:

lengthscale (torch.Tensor) – The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.

Example

>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGradGrad())
>>> covar = covar_module(x)  # Output: LinearOperator of size (110 x 110), where 110 = n * (2*d + 1)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGradGrad())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGradGrad(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LinearOperator of size (2 x 110 x 110)

Kernels for Scalable GP Regression Methods

GridKernel

class gpytorch.kernels.GridKernel(base_kernel, grid, interpolation_mode=False, active_dims=None)[source]

If the input data \(X\) are regularly spaced on a grid, then GridKernel can dramatically speed up computations for stationary kernels.

GridKernel exploits Toeplitz and Kronecker structure within the covariance matrix. See Fast kernel learning for multidimensional pattern extrapolation for more info.

Note

GridKernel can only wrap stationary kernels (such as RBF, Matern, Periodic, Spectral Mixture, etc.)

Parameters:
  • base_kernel (Kernel) – The kernel to speed up with grid methods.

  • grid (Tensor) – A g x d tensor where column i consists of the projections of the grid in dimension i.

  • active_dims (tuple of ints, optional) – Passed down to the base_kernel.

  • interpolation_mode (bool) – Used for GridInterpolationKernel where we want the covariance between points in the projections of the grid of each dimension. We do this by treating grid as d batches of g x 1 tensors by calling base_kernel(grid, grid) with last_dim_is_batch to get a d x g x g Tensor which we Kronecker product to get a g x g KroneckerProductLinearOperator.

register_buffer_list(base_name, tensors)[source]

Helper to register several buffers at once under a single base name

update_grid(grid)[source]

Supply a new grid if it ever changes.
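Example

A rough sketch for 1-dimensional, regularly spaced inputs (here the grid points themselves are used as the data):

>>> grid = torch.linspace(0, 1, 25).unsqueeze(-1)   # a 25 x 1 grid (d = 1)
>>> base_covar_module = gpytorch.kernels.RBFKernel()
>>> covar_module = gpytorch.kernels.GridKernel(base_covar_module, grid=grid)
>>> covar = covar_module(grid)  # Output: a structured LinearOperator of size (25 x 25)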

GridInterpolationKernel

class gpytorch.kernels.GridInterpolationKernel(base_kernel, grid_size, num_dims=None, grid_bounds=None, active_dims=None)[source]

Implements the KISS-GP (or SKI) approximation for a given kernel. It was proposed in Kernel Interpolation for Scalable Structured Gaussian Processes, and offers extremely fast and accurate Kernel approximations for large datasets.

Given a base kernel k, the covariance \(k(\mathbf{x_1}, \mathbf{x_2})\) is approximated by using a grid of regularly spaced inducing points:

\[\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = \mathbf{w_{x_1}}^\top K_{U,U} \mathbf{w_{x_2}} \end{equation*}\]

where

  • \(U\) is the set of gridded inducing points

  • \(K_{U,U}\) is the kernel matrix between the inducing points

  • \(\mathbf{w_{x_1}}\) and \(\mathbf{w_{x_2}}\) are sparse vectors based on \(\mathbf{x_1}\) and \(\mathbf{x_2}\) that apply cubic interpolation.

The user should supply the size of the grid (using the grid_size attribute). To choose a reasonable grid value, we highly recommend using the gpytorch.utils.grid.choose_grid_size() helper function. The bounds of the grid will automatically be determined by data.

(Alternatively, you can hard-code bounds using the grid_bounds argument, which will speed up this kernel’s computations.)

Note

GridInterpolationKernel can only wrap stationary kernels (such as RBF, Matern, Periodic, Spectral Mixture, etc.)

Parameters:
  • base_kernel (Kernel) – The kernel to approximate with KISS-GP

  • grid_size (Union[int, List[int]]) – The size of the grid in each dimension. If a single int is provided, then every dimension will have the same grid size.

  • num_dims (int) – The dimension of the input data. Required if grid_bounds=None

  • grid_bounds (tuple(float, float), optional) – The bounds of the grid, if known (high performance mode). The length of the tuple must match the number of dimensions. The entries represent the min/max values for each dimension.

  • active_dims (tuple of ints, optional) – Passed down to the base_kernel.
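
Example

A rough usage sketch using the recommended grid-size helper (the base kernel and data are illustrative):

>>> train_x = torch.randn(100, 2)
>>> grid_size = gpytorch.utils.grid.choose_grid_size(train_x)
>>> covar_module = gpytorch.kernels.GridInterpolationKernel(
...     gpytorch.kernels.RBFKernel(), grid_size=grid_size, num_dims=2
... )
>>> covar = covar_module(train_x)  # Output: LinearOperator of size (100 x 100)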

InducingPointKernel

class gpytorch.kernels.InducingPointKernel(base_kernel, inducing_points, likelihood, active_dims=None)[source]
Parameters:
  • base_kernel (Kernel) – The kernel to approximate with inducing points (sparse GP regression).

  • inducing_points (torch.Tensor) – Initial locations of the inducing points.

  • likelihood (Likelihood) – The likelihood of the GP model.

  • active_dims (tuple of ints, optional) – Set this if you want to compute the covariance of only a few input dimensions.
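
Example

A rough usage sketch (SGPR-style; the inducing points are typically initialized from a subset of the training inputs and are learned during training):

>>> train_x = torch.randn(100, 3)
>>> likelihood = gpytorch.likelihoods.GaussianLikelihood()
>>> covar_module = gpytorch.kernels.InducingPointKernel(
...     gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()),
...     inducing_points=train_x[:10].clone(),
...     likelihood=likelihood,
... )
>>> covar = covar_module(train_x)  # Output: LinearOperator of size (100 x 100)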

RFFKernel

class gpytorch.kernels.RFFKernel(num_samples, num_dims=None, **kwargs)[source]

Computes a covariance matrix based on Random Fourier Features with the RBFKernel.

Random Fourier features were originally proposed in ‘Random Features for Large-Scale Kernel Machines’ by Rahimi and Recht (2008). Instead of the shifted cosine features from Rahimi and Recht (2008), we use the sine and cosine features, which give a lower-variance estimator — see ‘On the Error of Random Fourier Features’ by Sutherland and Schneider (2015).

By Bochner’s theorem, any continuous stationary kernel \(k\) is positive definite if and only if it is the Fourier transform of a non-negative measure \(p(\omega)\), i.e.

\[\begin{equation} k(x, x') = k(x - x') = \int p(\omega) e^{i(\omega^\top (x - x'))} d\omega. \end{equation}\]

where \(p(\omega)\) is a normalized probability measure if \(k(0)=1\).

For the RBF kernel,

\[\begin{equation} k(\Delta) = \exp\left(-\frac{\Delta^2}{2\sigma^2}\right) \quad \text{and} \quad p(\omega) = \exp\left(-\frac{\sigma^2\omega^2}{2}\right) \end{equation}\]

where \(\Delta = x - x'\).

Given datapoint \(x\in \mathbb{R}^d\), we can construct its random Fourier features \(z(x) \in \mathbb{R}^{2D}\) by

\[\begin{split}\begin{equation} z(x) = \sqrt{\frac{1}{D}} \begin{bmatrix} \cos(\omega_1^\top x)\\ \sin(\omega_1^\top x)\\ \cdots \\ \cos(\omega_D^\top x)\\ \sin(\omega_D^\top x) \end{bmatrix}, \omega_1, \ldots, \omega_D \sim p(\omega) \end{equation}\end{split}\]

such that we have an unbiased Monte Carlo estimator

\[\begin{equation} k(x, x') = k(x - x') \approx z(x)^\top z(x') = \frac{1}{D}\sum_{i=1}^D \cos(\omega_i^\top (x - x')). \end{equation}\]

Note

When this kernel is used in batch mode, the random frequencies are drawn independently across the batch dimension as well by default.

Parameters:
  • num_samples (int) – Number of random frequencies to draw. This is \(D\) in the above papers. This will produce \(D\) sine features and \(D\) cosine features for a total of \(2D\) random Fourier features.

  • num_dims (int, optional) – (Default None.) Dimensionality of the data space. This is \(d\) in the above papers. Note that if you want an independent lengthscale for each dimension, set ard_num_dims equal to num_dims. If unspecified, it will be inferred the first time forward is called.

Variables:

randn_weights (torch.Tensor) – The random frequencies that are drawn once and then fixed.

Example

>>> # This will infer `num_dims` automatically
>>> kernel = gpytorch.kernels.RFFKernel(num_samples=5)
>>> x = torch.randn(10, 3)
>>> kxx = kernel(x, x).to_dense()
>>> print(kernel.randn_weights.size())
torch.Size([3, 5])