# gpytorch.kernels¶

If you don’t know what kernel to use, we recommend that you start out with a gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()).

## Kernel¶

class gpytorch.kernels.Kernel(ard_num_dims=None, batch_shape=torch.Size([]), active_dims=None, lengthscale_prior=None, lengthscale_constraint=None, eps=1e-06, **kwargs)[source]

Kernels in GPyTorch are implemented as a gpytorch.Module that, when called on two torch.tensor objects x1 and x2, returns either a torch.tensor or a gpytorch.lazy.LazyTensor that represents the covariance matrix between x1 and x2.

In the typical use case, extending this class means implementing the forward() method.

Note

The __call__() method does some additional internal work. In particular, all kernels are lazily evaluated so that, in some cases, we can index into the kernel matrix before actually computing it. Furthermore, many built-in kernel modules return LazyTensors that allow for more efficient inference than if we explicitly computed the kernel matrix itself.

As a result, if you want to use a gpytorch.kernels.Kernel object just to get an actual torch.tensor representing the covariance matrix, you may need to call the gpytorch.lazy.LazyTensor.evaluate() method on the output.

This base Kernel class includes a lengthscale parameter $$\Theta$$, which is used by many common kernel functions. There are a few options for the lengthscale:

• Default: No lengthscale (i.e. $$\Theta$$ is the identity matrix).
• Single lengthscale: One lengthscale can be applied to all input dimensions/batches (i.e. $$\Theta$$ is a constant diagonal matrix). This is controlled by setting the attribute has_lengthscale=True.
• ARD: Each input dimension gets its own separate lengthscale (i.e. $$\Theta$$ is a non-constant diagonal matrix). This is controlled by the ard_num_dims keyword argument (as well as has_lengthscale=True).
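
The three lengthscale options can be illustrated with a plain-Python sketch of the scaled squared distance $$(\mathbf{x_1} - \mathbf{x_2})^\top \Theta^{-2} (\mathbf{x_1} - \mathbf{x_2})$$ for a diagonal $$\Theta$$. The helper name `scaled_sq_dist` is hypothetical and is not part of GPyTorch; it only mirrors the math described above.

```python
def scaled_sq_dist(x1, x2, lengthscales):
    """(x1 - x2)^T Theta^{-2} (x1 - x2) for a diagonal Theta.

    `lengthscales` holds the diagonal of Theta, one entry per input dimension.
    """
    return sum(((a - b) / ell) ** 2 for a, b, ell in zip(x1, x2, lengthscales))


x1, x2 = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]

no_lengthscale = scaled_sq_dist(x1, x2, [1.0, 1.0, 1.0])  # Theta is the identity
single = scaled_sq_dist(x1, x2, [2.0, 2.0, 2.0])          # one shared lengthscale
ard = scaled_sq_dist(x1, x2, [1.0, 2.0, 3.0])             # one lengthscale per dimension
```

With ARD, a dimension with a large lengthscale contributes less to the distance, which is how irrelevant inputs can be down-weighted.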

In batch-mode (i.e. when $$x_1$$ and $$x_2$$ are batches of input matrices), each batch of data can have its own lengthscale parameter by setting the batch_shape keyword argument to the appropriate number of batches.

Note

The lengthscale parameter is parameterized on a log scale to constrain it to be positive. You can set a prior on this parameter using the lengthscale_prior argument.

Base Args:
ard_num_dims (int, optional):
Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. Default: None
batch_shape (torch.Size, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b1 x … x bk if x1 is a b1 x … x bk x n x d tensor.
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
lengthscale_prior (Prior, optional):
Set this if you want to apply a prior to the lengthscale parameter. Default: None
lengthscale_constraint (Constraint, optional):
Set this if you want to apply a constraint to the lengthscale parameter. Default: Positive.
eps (float):
The minimum value that the lengthscale can take (prevents divide by zero errors). Default: 1e-6.
Base Attributes:
lengthscale (Tensor):
The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.
Example:
>>> covar_module = gpytorch.kernels.LinearKernel()
>>> x1 = torch.randn(50, 3)
>>> lazy_covar_matrix = covar_module(x1) # Returns a RootLazyTensor
>>> tensor_covar_matrix = lazy_covar_matrix.evaluate() # Gets the actual tensor for this kernel matrix
covar_dist(x1, x2, diag=False, last_dim_is_batch=False, square_dist=False, dist_postprocess_func=<function default_postprocess_script>, postprocess=True, **params)[source]

This is a helper method for computing the Euclidean distance between all pairs of points in x1 and x2.

Args:
x1 (Tensor n x d or b1 x … x bk x n x d):
First set of data.
x2 (Tensor m x d or b1 x … x bk x m x d):
Second set of data.
diag (bool):
Should we return the whole distance matrix, or just the diagonal? If True, we must have x1 == x2.
last_dim_is_batch (bool, optional):
Is the last dimension of the data a batch dimension or not?
square_dist (bool):
Should we square the distance matrix before returning?
Returns:
(Tensor, Tensor) corresponding to the distance matrix between x1 and x2. The shape depends on the kernel’s mode:

• diag=False
• diag=False and last_dim_is_batch=True: (b x d x n x n)
• diag=True
• diag=True and last_dim_is_batch=True: (b x d x n)
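
As a rough illustration of what an all-pairs distance computation produces, here is a plain-Python sketch. The function `pairwise_dist` is a hypothetical stand-in and not the GPyTorch implementation; it ignores batching and the post-processing options.

```python
import math


def pairwise_dist(x1, x2, square_dist=False):
    """Return the n x m matrix of Euclidean distances between rows of x1 and x2."""
    out = []
    for a in x1:
        row = []
        for b in x2:
            sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
            row.append(sq if square_dist else math.sqrt(sq))
        out.append(row)
    return out


x1 = [[0.0, 0.0], [3.0, 4.0]]               # n = 2 points in d = 2 dimensions
x2 = [[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]]   # m = 3 points

dist = pairwise_dist(x1, x2)                         # 2 x 3 distance matrix
sq_dist = pairwise_dist(x1, x2, square_dist=True)    # squared distances
diag = [pairwise_dist([a], [a])[0][0] for a in x1]   # diagonal only (x1 == x2)
```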
forward(x1, x2, diag=False, last_dim_is_batch=False, **params)[source]

Computes the covariance between x1 and x2. This method should be implemented by all Kernel subclasses.

Args:
x1 (Tensor n x d or b x n x d):
First set of data
x2 (Tensor m x d or b x m x d):
Second set of data
diag (bool):
Should the Kernel compute the whole kernel, or just the diag?
last_dim_is_batch (bool, optional):
If this is true, it treats the last dimension of the data as another batch dimension. (Useful for additive structure over the dimensions). Default: False
Returns:
Tensor or gpytorch.lazy.LazyTensor.

The exact size depends on the kernel’s evaluation mode:

• full_covar: n x m or b x n x m
• full_covar with last_dim_is_batch=True: k x n x m or b x k x n x m
• diag: n or b x n
• diag with last_dim_is_batch=True: k x n or b x k x n
is_stationary

Property to indicate whether kernel is stationary or not.

num_outputs_per_input(x1, x2)[source]

The number of outputs produced per input. If x1 is of size n x d and x2 is of size m x d, then the size of the kernel will be (n * num_outputs_per_input) x (m * num_outputs_per_input). Default: 1

## Standard Kernels¶

### CosineKernel¶

class gpytorch.kernels.CosineKernel(period_length_prior=None, period_length_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the cosine kernel between inputs $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$:

$\begin{equation*} k_{\text{Cosine}}(\mathbf{x_1}, \mathbf{x_2}) = \cos \left( \pi \Vert \mathbf{x_1} - \mathbf{x_2} \Vert_2 / p \right) \end{equation*}$

where $$p$$ is the period length parameter.
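
A minimal plain-Python sketch of the formula above for a single pair of points. The function name `cosine_kernel` is illustrative only and is not the GPyTorch API.

```python
import math


def cosine_kernel(x1, x2, period_length):
    """cos(pi * ||x1 - x2||_2 / p) for two points given as lists of floats."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
    return math.cos(math.pi * dist / period_length)


x1, x2 = [0.0, 0.0], [3.0, 4.0]            # Euclidean distance is 5

k_same = cosine_kernel(x1, x1, 5.0)        # identical points
k_one_p = cosine_kernel(x1, x2, 5.0)       # distance equals p
k_half_p = cosine_kernel(x1, x2, 10.0)     # distance equals p / 2
```

Note that the kernel takes negative values, for example when the distance equals $$p$$.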

Args:
batch_shape (torch.Size, optional):
Set this if you want a separate period length for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([])
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
period_length_prior (Prior, optional):
Set this if you want to apply a prior to the period length parameter. Default: None
period_length_constraint (Constraint, optional):
Set this if you want to apply a constraint to the period length parameter. Default: Positive.
eps (float):
The minimum value that the lengthscale/period length can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
period_length (Tensor):
The period length parameter. Size = *batch_shape x 1 x 1.
Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel())
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.CosineKernel(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 10 x 10)

### CylindricalKernel¶

class gpytorch.kernels.CylindricalKernel(num_angular_weights: int, radial_base_kernel: gpytorch.kernels.kernel.Kernel, eps: Optional[int] = 1e-06, angular_weights_prior: Optional[gpytorch.priors.prior.Prior] = None, angular_weights_constraint: Optional[gpytorch.constraints.constraints.Interval] = None, alpha_prior: Optional[gpytorch.priors.prior.Prior] = None, alpha_constraint: Optional[gpytorch.constraints.constraints.Interval] = None, beta_prior: Optional[gpytorch.priors.prior.Prior] = None, beta_constraint: Optional[gpytorch.constraints.constraints.Interval] = None, **kwargs)[source]

Computes a covariance matrix based on the Cylindrical Kernel between inputs $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$. It was proposed in BOCK: Bayesian Optimization with Cylindrical Kernels. See http://proceedings.mlr.press/v80/oh18a.html for more details.

Note

The data must lie completely within the unit ball.

Args:
num_angular_weights (int):
The number of components in the angular kernel
radial_base_kernel (gpytorch.kernels.Kernel):
The base kernel for computing the radial kernel
batch_size (int, optional):
Set this if the data is a batch of input data. It should be b if x1 is a b x n x d tensor. Default: 1
eps (float):
Small floating point number used to improve numerical stability in kernel computations. Default: 1e-6
param_transform (function, optional):
Set this if you want to use something other than softplus to ensure positiveness of parameters.
inv_param_transform (function, optional):
Set this to allow setting parameters directly in transformed space and sampling from priors. Automatically inferred for common transformations such as torch.exp or torch.nn.functional.softplus.

### LinearKernel¶

class gpytorch.kernels.LinearKernel(num_dimensions=None, offset_prior=None, variance_prior=None, variance_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the Linear kernel between inputs $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$:

$\begin{equation*} k_\text{Linear}(\mathbf{x_1}, \mathbf{x_2}) = v\mathbf{x_1}^\top \mathbf{x_2}. \end{equation*}$

where

• $$v$$ is a variance parameter.

Note

To implement this efficiently, we use a gpytorch.lazy.RootLazyTensor during training and a gpytorch.lazy.MatmulLazyTensor during test. These lazy tensors represent matrices of the form $$K = XX^{\top}$$ and $$K = XZ^{\top}$$. This makes inference efficient because a matrix-vector product $$Kv$$ can be computed as $$Kv=X(X^{\top}v)$$, where the base multiply $$X^{\top}v$$ takes only $$O(nd)$$ time and space.
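
The saving described in this note can be checked with a small plain-Python sketch. The helpers `matvec` and `transpose` are hypothetical, the variance parameter is taken to be 1, and the vector is named `vec` to avoid clashing with the variance symbol.

```python
def matvec(A, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # n = 3 points, d = 2 dimensions
vec = [1.0, 0.0, -1.0]

# Efficient route: never form the n x n matrix, two O(nd) products instead.
Xt_vec = matvec(transpose(X), vec)  # d-dimensional intermediate
fast = matvec(X, Xt_vec)

# Explicit route: build K = X X^T (n x n), then multiply.
K = [[sum(a * b for a, b in zip(r1, r2)) for r2 in X] for r1 in X]
slow = matvec(K, vec)
```

Both routes give the same vector, but the first one needs only $$O(nd)$$ memory instead of $$O(n^2)$$.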

Args:
variance_prior (gpytorch.priors.Prior):
Prior over the variance parameter (default None).
variance_constraint (Constraint, optional):
Constraint to place on variance parameter. Default: Positive.
active_dims (list):
List of data dimensions to operate on. len(active_dims) should equal num_dimensions.

### MaternKernel¶

class gpytorch.kernels.MaternKernel(nu=2.5, **kwargs)[source]

Computes a covariance matrix based on the Matern kernel between inputs $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$:

$\begin{equation*} k_{\text{Matern}}(\mathbf{x_1}, \mathbf{x_2}) = \frac{2^{1 - \nu}}{\Gamma(\nu)} \left( \sqrt{2 \nu} d \right)^{\nu} K_\nu \left( \sqrt{2 \nu} d \right) \end{equation*}$

where

• $$d = (\mathbf{x_1} - \mathbf{x_2})^\top \Theta^{-2} (\mathbf{x_1} - \mathbf{x_2})$$ is the distance between $$x_1$$ and $$x_2$$ scaled by the lengthscale parameter $$\Theta$$.
• $$\nu$$ is a smoothness parameter (takes values 1/2, 3/2, or 5/2). Smaller values are less smooth.
• $$K_\nu$$ is a modified Bessel function.

See gpytorch.kernels.Kernel for descriptions of the options available for the lengthscale parameter $$\Theta$$.
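
For the three supported values of $$\nu$$, the general formula reduces to well-known closed forms that avoid the Bessel function. The sketch below assumes that d is the lengthscale-scaled Euclidean distance between the two points; the function name `matern` is illustrative and is not the GPyTorch API.

```python
import math


def matern(d, nu):
    """Matern kernel value for a non-negative scaled distance d."""
    if nu == 0.5:
        return math.exp(-d)
    if nu == 1.5:
        s = math.sqrt(3.0) * d
        return (1.0 + s) * math.exp(-s)
    if nu == 2.5:
        s = math.sqrt(5.0) * d
        return (1.0 + s + s * s / 3.0) * math.exp(-s)
    raise ValueError("nu must be 0.5, 1.5, or 2.5")


values_at_zero = [matern(0.0, nu) for nu in (0.5, 1.5, 2.5)]
values_at_one = [matern(1.0, nu) for nu in (0.5, 1.5, 2.5)]
```

At the same distance, the smoother kernels (larger $$\nu$$) retain more correlation in this example.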

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Parameters:
• nu (float (0.5, 1.5, or 2.5)) – (Default: 2.5) The smoothness parameter.
• ard_num_dims (int, optional) – (Default: None) Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a … x n x d matrix.
• batch_shape (torch.Size, optional) – (Default: None) Set this if you want a separate lengthscale for each batch of input data. It should be torch.Size([b1, b2]) for a b1 x b2 x n x m kernel output.
• active_dims (Tuple(int)) – (Default: None) Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions.
• lengthscale_prior (Prior, optional) – (Default: None) Set this if you want to apply a prior to the lengthscale parameter.
• lengthscale_constraint (Interval, optional) – (Default: Positive) Set this if you want to apply a constraint to the lengthscale parameter.
• eps (float, optional) – (Default: 1e-6) The minimum value that the lengthscale can take (prevents divide by zero errors).

Attributes:
• lengthscale (torch.Tensor) – The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.
Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5))
>>> # Non-batch: ARD (different lengthscale for each input dimension)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5, ard_num_dims=5))
>>> covar = covar_module(x)  # Output: LazyTensor of size (10 x 10)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=0.5))
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.MaternKernel(nu=0.5, batch_shape=torch.Size([2]))
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 10 x 10)

### PeriodicKernel¶

class gpytorch.kernels.PeriodicKernel(period_length_prior=None, period_length_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the periodic kernel between inputs $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$:

$\begin{equation*} k_{\text{Periodic}}(\mathbf{x_1}, \mathbf{x_2}) = \exp \left( -\frac{2 \sin^2 \left( \pi \Vert \mathbf{x_1} - \mathbf{x_2} \Vert_1 / p \right) } { \ell^2 } \right) \end{equation*}$

where

• $$p$$ is the period length parameter.
• $$\ell$$ is a lengthscale parameter.
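
A minimal plain-Python sketch of the periodic kernel for a single pair of points. It uses the standard form with a negative exponent, so the covariance is largest when the inputs are a whole number of periods apart. The function name `periodic_kernel` is illustrative and is not the GPyTorch API.

```python
import math


def periodic_kernel(x1, x2, period_length, lengthscale):
    """exp(-2 sin^2(pi * ||x1 - x2||_1 / p) / l^2) for two points."""
    l1 = sum(abs(a - b) for a, b in zip(x1, x2))  # 1-norm of the difference
    s = math.sin(math.pi * l1 / period_length)
    return math.exp(-2.0 * s * s / lengthscale ** 2)


p, ell = 3.0, 1.0
k_same = periodic_kernel([0.0], [0.0], p, ell)        # identical inputs
k_full = periodic_kernel([0.0], [p], p, ell)          # one full period apart
k_half = periodic_kernel([0.0], [p / 2.0], p, ell)    # half a period apart
```

Inputs that are one full period apart are as strongly correlated as identical inputs, while inputs half a period apart are the least correlated.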

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Note

This kernel does not have an ARD lengthscale option.

Args:
batch_shape (torch.Size, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([]).
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
period_length_prior (Prior, optional):
Set this if you want to apply a prior to the period length parameter. Default: None.
lengthscale_prior (Prior, optional):
Set this if you want to apply a prior to the lengthscale parameter. Default: None.
lengthscale_constraint (Constraint, optional):
Set this if you want to apply a constraint to the value of the lengthscale. Default: Positive.
period_length_constraint (Constraint, optional):
Set this if you want to apply a constraint to the value of the period length. Default: Positive.
eps (float):
The minimum value that the lengthscale/period length can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
lengthscale (Tensor):
The lengthscale parameter. Size = *batch_shape x 1 x 1.
period_length (Tensor):
The period length parameter. Size = *batch_shape x 1 x 1.
Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel())
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 10 x 10)

### PolynomialKernel¶

class gpytorch.kernels.PolynomialKernel(power: int, offset_prior: Optional[gpytorch.priors.prior.Prior] = None, offset_constraint: Optional[gpytorch.constraints.constraints.Interval] = None, **kwargs)[source]

Computes a covariance matrix based on the Polynomial kernel between inputs $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$:

$\begin{equation*} k_\text{Poly}(\mathbf{x_1}, \mathbf{x_2}) = (\mathbf{x_1}^\top \mathbf{x_2} + c)^{d}. \end{equation*}$

where

• $$c$$ is an offset parameter.
• $$d$$ is the power of the polynomial, set by the power argument.
Args:
offset_prior (gpytorch.priors.Prior):
Prior over the offset parameter (default None).
offset_constraint (Constraint, optional):
Constraint to place on offset parameter. Default: Positive.
active_dims (list):
List of data dimensions to operate on. len(active_dims) should equal num_dimensions.
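
A minimal plain-Python sketch of the polynomial kernel formula for a single pair of points. The function name `polynomial_kernel` is illustrative and is not the GPyTorch API.

```python
def polynomial_kernel(x1, x2, offset, power):
    """(x1 . x2 + c)^d for two points given as lists of floats."""
    inner = sum(a * b for a, b in zip(x1, x2))
    return (inner + offset) ** power


x1, x2 = [1.0, 2.0], [3.0, 4.0]                            # inner product is 11

k_quadratic = polynomial_kernel(x1, x2, offset=1.0, power=2)
k_linear = polynomial_kernel(x1, x2, offset=0.0, power=1)  # plain inner product
```

With a zero offset and a power of 1, the expression reduces to the inner product used by the linear kernel (with unit variance).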

class gpytorch.kernels.PolynomialKernelGrad(power: int, offset_prior: Optional[gpytorch.priors.prior.Prior] = None, offset_constraint: Optional[gpytorch.constraints.constraints.Interval] = None, **kwargs)[source]

### RBFKernel¶

class gpytorch.kernels.RBFKernel(ard_num_dims=None, batch_shape=torch.Size([]), active_dims=None, lengthscale_prior=None, lengthscale_constraint=None, eps=1e-06, **kwargs)[source]

Computes a covariance matrix based on the RBF (squared exponential) kernel between inputs $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$:

$\begin{equation*} k_{\text{RBF}}(\mathbf{x_1}, \mathbf{x_2}) = \exp \left( -\frac{1}{2} (\mathbf{x_1} - \mathbf{x_2})^\top \Theta^{-2} (\mathbf{x_1} - \mathbf{x_2}) \right) \end{equation*}$

where $$\Theta$$ is a lengthscale parameter. See gpytorch.kernels.Kernel for descriptions of the lengthscale options.
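
The effect of a shared versus an ARD lengthscale on the RBF kernel matrix can be seen in a small plain-Python sketch. The function `rbf_kernel` is a hypothetical helper, not the GPyTorch implementation.

```python
import math


def rbf_kernel(x1, x2, lengthscales):
    """exp(-0.5 * (x1 - x2)^T Theta^{-2} (x1 - x2)) with diagonal Theta."""
    sq = sum(((a - b) / ell) ** 2 for a, b, ell in zip(x1, x2, lengthscales))
    return math.exp(-0.5 * sq)


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]  # three points in two dimensions

# Shared lengthscale of 1 for both dimensions.
K_shared = [[rbf_kernel(a, b, [1.0, 1.0]) for b in X] for a in X]
# ARD: the second dimension has lengthscale 2, so distances along it count less.
K_ard = [[rbf_kernel(a, b, [1.0, 2.0]) for b in X] for a in X]
```

The resulting matrices are symmetric with ones on the diagonal, which reflects the absence of an outputscale parameter.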

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Args:
ard_num_dims (int, optional):
Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. Default: None
batch_shape (torch.Size, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([]).
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
lengthscale_prior (Prior, optional):
Set this if you want to apply a prior to the lengthscale parameter. Default: None.
lengthscale_constraint (Constraint, optional):
Set this if you want to apply a constraint to the lengthscale parameter. Default: Positive.
eps (float):
The minimum value that the lengthscale can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
lengthscale (Tensor):
The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.
Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
>>> # Non-batch: ARD (different lengthscale for each input dimension)
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(ard_num_dims=5))
>>> covar = covar_module(x)  # Output: LazyTensor of size (10 x 10)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 10 x 10)

### RQKernel¶

class gpytorch.kernels.RQKernel(alpha_constraint=None, **kwargs)[source]

Computes a covariance matrix based on the rational quadratic kernel between inputs $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$:

$\begin{equation*} k_{\text{RQ}}(\mathbf{x_1}, \mathbf{x_2}) = \left(1 + \frac{1}{2\alpha} (\mathbf{x_1} - \mathbf{x_2})^\top \Theta^{-2} (\mathbf{x_1} - \mathbf{x_2}) \right)^{-\alpha} \end{equation*}$

where $$\Theta$$ is a lengthscale parameter, and $$\alpha$$ is the rational quadratic relative weighting parameter. See gpytorch.kernels.Kernel for descriptions of the lengthscale options.
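
A minimal plain-Python sketch of the rational quadratic formula, written in terms of the scaled squared distance. The function name `rq_kernel` is illustrative and is not the GPyTorch API. The last line checks the known limit in which the kernel approaches the RBF kernel as $$\alpha$$ grows.

```python
import math


def rq_kernel(sq_dist, alpha):
    """(1 + sq_dist / (2 * alpha))^(-alpha), with sq_dist the scaled squared distance."""
    return (1.0 + sq_dist / (2.0 * alpha)) ** (-alpha)


k_zero = rq_kernel(0.0, alpha=1.0)         # identical inputs
k_alpha_one = rq_kernel(2.0, alpha=1.0)    # (1 + 1)^(-1)
k_large_alpha = rq_kernel(2.0, alpha=1e6)  # close to the RBF value
rbf_value = math.exp(-0.5 * 2.0)
```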

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Args:
ard_num_dims (int, optional):
Set this if you want a separate lengthscale for each input dimension. It should be d if x1 is a n x d matrix. Default: None
batch_shape (torch.Size, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([]).
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
lengthscale_prior (Prior, optional):
Set this if you want to apply a prior to the lengthscale parameter. Default: None.
lengthscale_constraint (Constraint, optional):
Set this if you want to apply a constraint to the lengthscale parameter. Default: Positive.
alpha_constraint (Constraint, optional):
Set this if you want to apply a constraint to the alpha parameter. Default: Positive.
eps (float):
The minimum value that the lengthscale can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
lengthscale (Tensor):
The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.
alpha (Tensor):
The rational quadratic relative weighting parameter. Size/shape of parameter depends on the batch_shape argument.

### SpectralDeltaKernel¶

class gpytorch.kernels.SpectralDeltaKernel(num_dims, num_deltas=128, Z_constraint=None, batch_shape=torch.Size([]), **kwargs)[source]

A kernel that supports spectral learning for GPs, where the underlying spectral density is modeled as a mixture of delta distributions (e.g., with point masses). This has been explored e.g. in Lazaro-Gredilla et al., 2010.

Conceptually, this kernel is similar to random Fourier features as implemented in RFFKernel, but instead of sampling a Gaussian to determine the spectrum sites, they are treated as learnable parameters.

When using CG for inference, this kernel supports linear space and time (in N) for training and inference.
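
The idea can be sketched in plain Python under a few stated assumptions: the spectrum sites Z are fixed, there is no lengthscale, the kernel is normalized by the number of sites, and frequencies enter with a factor of 2π. None of these names belong to the GPyTorch API. The sketch shows that an explicit finite feature map reproduces the kernel, which is what allows the kernel matrix to be handled as a low-rank product.

```python
import math


def features(x, Z):
    """Map x to [cos(2*pi*z.x), sin(2*pi*z.x)] / sqrt(M) over the M sites in Z."""
    scale = 1.0 / math.sqrt(len(Z))
    proj = [2.0 * math.pi * sum(zi * xi for zi, xi in zip(z, x)) for z in Z]
    return [scale * math.cos(p) for p in proj] + [scale * math.sin(p) for p in proj]


def delta_kernel(x1, x2, Z):
    """Direct form: average of cos(2*pi*z.(x1 - x2)) over the spectrum sites."""
    tau = [a - b for a, b in zip(x1, x2)]
    return sum(
        math.cos(2.0 * math.pi * sum(zi * ti for zi, ti in zip(z, tau))) for z in Z
    ) / len(Z)


Z = [[0.5, 0.0], [0.1, 0.3], [1.0, 0.25]]  # hypothetical spectrum sites (M = 3, d = 2)
x1, x2 = [0.2, 0.7], [0.9, 0.4]

k_direct = delta_kernel(x1, x2, Z)
k_features = sum(a * b for a, b in zip(features(x1, Z), features(x2, Z)))
```

Because each input maps to a vector of length 2M, a kernel matrix over N points factors as an N x 2M matrix times its transpose.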

Parameters:
• num_dims (int) – Dimensionality of input data that this kernel will operate on. Note that if active_dims is used, this should be the length of the active dim set.
• num_deltas (int) – Number of point masses to learn.
initialize_from_data(train_x, train_y)[source]

Initialize the point masses for this kernel from the empirical spectrum of the data. To do this, we estimate the empirical spectrum’s CDF and then simply sample from it. This is analogous to how the SM kernel’s mixture is initialized, but we skip the last step of fitting a GMM to the samples and just use the samples directly.

### SpectralMixtureKernel¶

class gpytorch.kernels.SpectralMixtureKernel(num_mixtures: int = None, ard_num_dims: int = 1, batch_shape: torch.Size = torch.Size([]), mixture_scales_prior: gpytorch.priors.prior.Prior = None, mixture_scales_constraint: gpytorch.constraints.constraints.Interval = None, mixture_means_prior: gpytorch.priors.prior.Prior = None, mixture_means_constraint: gpytorch.constraints.constraints.Interval = None, mixture_weights_prior: gpytorch.priors.prior.Prior = None, mixture_weights_constraint: gpytorch.constraints.constraints.Interval = None, **kwargs)[source]

Computes a covariance matrix based on the Spectral Mixture Kernel between inputs $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$.

It was proposed in Gaussian Process Kernels for Pattern Discovery and Extrapolation.

Note

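Unlike other kernels,

• ard_num_dims must equal the dimensionality of the data.
• This kernel should not be combined with a gpytorch.kernels.ScaleKernel.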

Parameters:
• num_mixtures (int) – The number of components in the mixture.
• ard_num_dims (int) – Set this to match the dimensionality of the input. It should be d if x1 is a … x n x d matrix. (Default: 1.)
• batch_shape (torch.Size, optional) – Set this if the data is a batch of input data. It should be b_1 x … x b_j if x1 is a b_1 x … x b_j x n x d tensor. (Default: torch.Size([]).)
• active_dims (tuple of ints, optional) – Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. (Default: None.)
• eps (float, optional) – The minimum value that the lengthscale can take (prevents divide by zero errors). (Default: 1e-6.)
• mixture_scales_prior (Prior, optional) – A prior to set on the mixture_scales parameter
• mixture_scales_constraint (Interval, optional) – A constraint to set on the mixture_scales parameter
• mixture_means_prior (Prior, optional) – A prior to set on the mixture_means parameter
• mixture_means_constraint (Interval, optional) – A constraint to set on the mixture_means parameter
• mixture_weights_prior (Prior, optional) – A prior to set on the mixture_weights parameter
• mixture_weights_constraint (Interval, optional) – A constraint to set on the mixture_weights parameter

Attributes:
• mixture_scales (torch.Tensor) – The lengthscale parameter. Given k mixture components, and … x n x d data, this will be of size … x k x 1 x d.
• mixture_means (torch.Tensor) – The mixture mean parameters (… x k x 1 x d).
• mixture_weights (torch.Tensor) – The mixture weight parameters (… x k).

Example:

>>> # Non-batch
>>> x = torch.randn(10, 5)
>>> covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4, ard_num_dims=5)
>>> covar = covar_module(x)  # Output: LazyTensor of size (10 x 10)
>>>
>>> # Batch
>>> batch_x = torch.randn(2, 10, 5)
>>> covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4, batch_shape=torch.Size([2]), ard_num_dims=5)
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 10 x 10)
initialize_from_data(train_x: torch.Tensor, train_y: torch.Tensor, **kwargs)[source]

Initialize mixture components based on batch statistics of the data. You should use this initialization routine if your observations are not evenly spaced.

Parameters:
• train_x (torch.Tensor) – Training inputs
• train_y (torch.Tensor) – Training outputs
initialize_from_data_empspect(train_x: torch.Tensor, train_y: torch.Tensor)[source]

Initialize mixture components based on the empirical spectrum of the data. This will often be better than the standard initialize_from_data method, but it assumes that your inputs are evenly spaced.

Parameters:
• train_x (torch.Tensor) – Training inputs
• train_y (torch.Tensor) – Training outputs

## Composition/Decoration Kernels¶

### AdditiveKernel¶

class gpytorch.kernels.AdditiveKernel(*kernels)[source]

A Kernel that supports summing over multiple component kernels.

Example:
>>> covar_module = RBFKernel(active_dims=torch.tensor([0])) + RBFKernel(active_dims=torch.tensor([1]))
>>> x1 = torch.randn(50, 2)
>>> additive_kernel_matrix = covar_module(x1)
is_stationary

Kernel is stationary if all components are stationary.

### MultiDeviceKernel¶

class gpytorch.kernels.MultiDeviceKernel(base_kernel, device_ids, output_device=None, create_cuda_context=True, **kwargs)[source]

Allocates the covariance matrix on distributed devices, e.g. multiple GPUs.

Args:
• base_kernel: Base kernel to distribute
• device_ids: list of torch.device objects to place kernel chunks on
• output_device: Device where outputs will be placed

### AdditiveStructureKernel¶

class gpytorch.kernels.AdditiveStructureKernel(base_kernel, num_dims, active_dims=None)[source]

A Kernel decorator for kernels with additive structure. If a kernel decomposes additively, then this module will be much more computationally efficient.

A kernel function k decomposes additively if it can be written as

$\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = k'(x_1^{(1)}, x_2^{(1)}) + \ldots + k'(x_1^{(d)}, x_2^{(d)}) \end{equation*}$

for some kernel $$k'$$ that operates on a subset of dimensions.

Given a b x n x d input, AdditiveStructureKernel computes d one-dimensional kernels (using the supplied base_kernel), and then adds the component kernels together. Unlike AdditiveKernel, AdditiveStructureKernel computes each of the additive terms in batch, making it very fast.
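
The additive decomposition can be sketched in plain Python for a single pair of points, assuming a one-dimensional RBF kernel as the base kernel $$k'$$. Both function names are hypothetical, and the sketch does not reproduce the batched computation that makes the GPyTorch module fast.

```python
import math


def rbf_1d(a, b, lengthscale=1.0):
    """One-dimensional RBF base kernel k'."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def additive_structure_kernel(x1, x2):
    """Sum of the base kernel applied to each input dimension separately."""
    return sum(rbf_1d(a, b) for a, b in zip(x1, x2))


k_same = additive_structure_kernel([0.3, 1.2, -0.5], [0.3, 1.2, -0.5])
k_diff = additive_structure_kernel([0.0, 0.0], [1.0, 0.0])
```

For identical inputs each of the d terms equals 1, so the additive kernel evaluates to d rather than 1.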

Args:
base_kernel (Kernel):
The kernel to apply to each input dimension
num_dims (int):
The dimension of the input data.
active_dims (tuple of ints, optional):
Passed down to the base_kernel.
is_stationary

Kernel is stationary if the base kernel is stationary.

### ProductKernel¶

class gpytorch.kernels.ProductKernel(*kernels)[source]

A Kernel that supports elementwise multiplying multiple component kernels together.

Example:
>>> covar_module = RBFKernel(active_dims=torch.tensor([0])) * RBFKernel(active_dims=torch.tensor([1]))
>>> x1 = torch.randn(50, 2)
>>> kernel_matrix = covar_module(x1) # The RBF Kernel already decomposes multiplicatively, so this is foolish!
is_stationary

Kernel is stationary if all components are stationary.

### ProductStructureKernel¶

class gpytorch.kernels.ProductStructureKernel(base_kernel, num_dims, active_dims=None)[source]

A Kernel decorator for kernels with product structure. If a kernel decomposes multiplicatively, then this module will be much more computationally efficient.

A kernel function k has product structure if it can be written as

$\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = k'(x_1^{(1)}, x_2^{(1)}) * \ldots * k'(x_1^{(d)}, x_2^{(d)}) \end{equation*}$

for some kernel $$k'$$ that operates on each dimension.

Given a b x n x d input, ProductStructureKernel computes d one-dimensional kernels (using the supplied base_kernel), and then multiplies the component kernels together. Unlike ProductKernel, ProductStructureKernel computes each of the product terms in batch, making it very fast.

See Product Kernel Interpolation for Scalable Gaussian Processes for more detail.
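
As a small check of what product structure means, the plain-Python sketch below multiplies one-dimensional RBF kernels and compares the result with a full RBF kernel on the same points, assuming unit lengthscales. The function names are hypothetical and are not the GPyTorch API.

```python
import math


def rbf_1d(a, b):
    """One-dimensional RBF base kernel k' with unit lengthscale."""
    return math.exp(-0.5 * (a - b) ** 2)


def product_structure_kernel(x1, x2):
    """Product of the base kernel applied to each input dimension separately."""
    result = 1.0
    for a, b in zip(x1, x2):
        result *= rbf_1d(a, b)
    return result


def rbf_full(x1, x2):
    """RBF kernel on the full input vectors, unit lengthscale."""
    return math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x1, x2)))


x1, x2 = [0.2, -1.0, 0.5], [1.1, 0.3, 0.0]
k_product = product_structure_kernel(x1, x2)
k_full = rbf_full(x1, x2)
```

The two values agree because the exponential of a sum is the product of exponentials, which is why the RBF kernel is said to decompose multiplicatively.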

Args:
• base_kernel (Kernel):
The kernel to apply to each input dimension
• num_dims (int):
The dimension of the input data.
• active_dims (tuple of ints, optional):
Passed down to the base_kernel.
is_stationary

Kernel is stationary if the base kernel is stationary.

### ScaleKernel¶

class gpytorch.kernels.ScaleKernel(base_kernel, outputscale_prior=None, outputscale_constraint=None, **kwargs)[source]

Decorates an existing kernel object with an output scale, i.e.

$\begin{equation*} K_{\text{scaled}} = \theta_\text{scale} K_{\text{orig}} \end{equation*}$

where $$\theta_\text{scale}$$ is the outputscale parameter.

In batch-mode (i.e. when $$x_1$$ and $$x_2$$ are batches of input matrices), each batch of data can have its own outputscale parameter by setting the batch_shape keyword argument to the appropriate number of batches.

Note

The outputscale parameter is parameterized on a log scale to constrain it to be positive. You can set a prior on this parameter using the outputscale_prior argument.

Args:
base_kernel (Kernel):
The base kernel to be scaled.
batch_shape (torch.Size, optional):
Set this if you want a separate outputscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([])
outputscale_prior (Prior, optional):
Set this if you want to apply a prior to the outputscale parameter. Default: None
outputscale_constraint (Constraint, optional):
Set this if you want to apply a constraint to the outputscale parameter. Default: Positive.
Attributes:
base_kernel (Kernel):
The kernel module to be scaled.
outputscale (Tensor):
The outputscale parameter. Size/shape of parameter depends on the batch_shape arguments.
Example:
>>> x = torch.randn(10, 5)
>>> base_covar_module = gpytorch.kernels.RBFKernel()
>>> scaled_covar_module = gpytorch.kernels.ScaleKernel(base_covar_module)
>>> covar = scaled_covar_module(x)  # Output: LazyTensor of size (10 x 10)
is_stationary

Kernel is stationary if base kernel is stationary.

## Specialty Kernels¶

### ArcKernel¶

class gpytorch.kernels.ArcKernel(base_kernel, delta_func: Optional = None, angle_prior: Optional[gpytorch.priors.prior.Prior] = None, radius_prior: Optional[gpytorch.priors.prior.Prior] = None, **kwargs)[source]

Computes a covariance matrix based on the Arc Kernel (https://arxiv.org/abs/1409.4011) between inputs $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$. First it applies a cylindrical embedding:

$\begin{split}g_{i}(\mathbf{x}) = \begin{cases} [0, 0]^{T} & \delta_{i}(\mathbf{x}) = \text{false}\\ \omega_{i} \left[ \sin{\pi\rho_{i}\frac{x_{i}}{u_{i}-l_{i}}}, \cos{\pi\rho_{i}\frac{x_{i}}{u_{i}-l_{i}}} \right] & \text{otherwise} \end{cases}\end{split}$

where

• $$\rho$$ is the angle parameter.
• $$\omega$$ is a radius parameter.

then the kernel is built with the particular covariance function, e.g.

$$$k_{i}(\mathbf{x}, \mathbf{x'}) = \sigma^{2}\exp \left(-\frac{1}{2}d_{i}(\mathbf{x}, \mathbf{x^{'}}) \right)^{2}$$$

and the product between dimensions

$$$k(\mathbf{x}, \mathbf{x'}) = \prod_{i} k_{i}(\mathbf{x}, \mathbf{x'})$$$
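
The construction above can be sketched in plain Python. This is an illustrative reading, not GPyTorch's implementation. It assumes a squared-exponential covariance on the Euclidean distance between embedded points, omits $$\sigma^{2}$$, and takes the activity flags $$\delta_{i}$$ and the bounds $$l_{i}, u_{i}$$ as explicit arguments. The names `embed` and `arc_kernel` are hypothetical.

```python
import math

def embed(x_i, active, rho, omega, lower, upper):
    """Cylindrical embedding g_i of one coordinate; inactive coordinates map to the origin."""
    if not active:
        return (0.0, 0.0)
    theta = math.pi * rho * x_i / (upper - lower)
    return (omega * math.sin(theta), omega * math.cos(theta))

def arc_kernel(x, y, active_x, active_y, rho, omega, bounds):
    """Product over dimensions of a squared-exponential kernel on the embedded points."""
    k = 1.0
    for i, (lower, upper) in enumerate(bounds):
        gx = embed(x[i], active_x[i], rho[i], omega[i], lower, upper)
        gy = embed(y[i], active_y[i], rho[i], omega[i], lower, upper)
        d = math.hypot(gx[0] - gy[0], gx[1] - gy[1])  # distance in the embedded space
        k *= math.exp(-0.5 * d ** 2)
    return k

bounds = [(0.0, 1.0), (0.0, 2.0)]
rho, omega = [0.5, 0.5], [1.0, 1.0]
x, y = [0.2, 1.5], [0.9, 0.4]

# Both dimensions active in both points.
k_both = arc_kernel(x, y, [True, True], [True, True], rho, omega, bounds)
# Second dimension inactive in y: its embedding collapses to the origin.
k_mixed = arc_kernel(x, y, [True, True], [True, False], rho, omega, bounds)
```

When a dimension is active in only one of the two points, the embedded distance in that dimension equals the radius $$\omega_{i}$$, so that dimension contributes a constant factor to the product.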

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel. When using it with an input of b x n x d dimensions, decorate this kernel with gpytorch.kernels.ProductStructureKernel, setting the number of dims, num_dims, to d.

Note

This kernel does not have an ARD lengthscale option.

Parameters:
• base_kernel (Kernel) – (Default gpytorch.kernels.MaternKernel(nu=2.5).) The Euclidean covariance of choice.
• ard_num_dims (int, optional) – (Default None.) The number of dimensions to compute the kernel for. The kernel has two parameters which are individually defined for each dimension.
• angle_prior (Prior, optional) – Set this if you want to apply a prior to the period angle parameter.
• radius_prior (Prior, optional) – Set this if you want to apply a prior to the radius parameter.
• radius (torch.Tensor) – The radius parameter. Size = *batch_shape x 1.
• angle (torch.Tensor) – The period angle parameter. Size = *batch_shape x 1.
Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> base_kernel = gpytorch.kernels.MaternKernel(nu=2.5)
>>> covar_module = gpytorch.kernels.ProductStructureKernel(
...     gpytorch.kernels.ScaleKernel(
...         gpytorch.kernels.ArcKernel(base_kernel,
...                                    angle_prior=gpytorch.priors.GammaPrior(0.5, 1),
...                                    ard_num_dims=x.shape[-1])),
...     num_dims=x.shape[-1])
>>> covar = covar_module(x)
>>> print(covar.shape)
>>> # Now with batch
>>> covar_module = gpytorch.kernels.ProductStructureKernel(
...     gpytorch.kernels.ScaleKernel(
...         gpytorch.kernels.ArcKernel(base_kernel,
...                                    angle_prior=gpytorch.priors.GammaPrior(0.5, 1),
...                                    ard_num_dims=x.shape[-1])),
...     num_dims=x.shape[-1])
>>> covar = covar_module(x)
>>> print(covar.shape)

### IndexKernel¶

class gpytorch.kernels.IndexKernel(num_tasks, rank=1, prior=None, var_constraint=None, **kwargs)[source]

A kernel for discrete indices. Kernel is defined by a lookup table.

$$$k(i, j) = \left(BB^\top + \text{diag}(\mathbf v) \right)_{i, j}$$$

where $$B$$ is a low-rank matrix, and $$\mathbf v$$ is a non-negative vector. These parameters are learned.

Args:
num_tasks (int):
Total number of indices.
batch_shape (torch.Size, optional):
Set if the MultitaskKernel is operating on batches of data (and you want different parameters for each batch)
rank (int):
Rank of $$B$$ matrix.
prior (gpytorch.priors.Prior):
Prior for $$B$$ matrix.
var_constraint (Constraint, optional):
Constraint for added diagonal component. Default: Positive.
Attributes:
covar_factor:
The $$B$$ matrix.
raw_var:
The element-wise log of the $$\mathbf v$$ vector.
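
The lookup-table formula can be illustrated without GPyTorch. The sketch below builds $$BB^\top + \text{diag}(\mathbf v)$$ from hand-picked values and reads off entries by index. The function name `index_covariance` and the numbers are hypothetical; in the library these parameters are learned.

```python
def index_covariance(B, v):
    """Return the dense matrix B B^T + diag(v) as nested lists."""
    n = len(B)
    return [
        [
            sum(bi * bj for bi, bj in zip(B[i], B[j])) + (v[i] if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]

# Three indices (tasks) with a rank-1 factor B and a non-negative vector v.
B = [[1.0], [2.0], [0.5]]
v = [0.1, 0.2, 0.3]
K = index_covariance(B, v)

# k(i, j) is a table lookup into K.
k_01 = K[0][1]  # off-diagonal entry: B[0] . B[1]
k_22 = K[2][2]  # diagonal entry: B[2] . B[2] + v[2]
```

The rank of $$B$$ controls how many latent factors are shared between indices, while $$\mathbf v$$ adds index-specific variance on the diagonal.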

### LCMKernel¶

This kernel implements the linear model of coregionalization (LCM) kernel. It allows the user to specify a list of base kernels to use, and an individual MultitaskKernel object is fit to each of them. The final kernel is the sum of the Kronecker products of all these base kernels with their respective MultitaskKernel objects.

The returned object is of type gpytorch.lazy.KroneckerProductLazyTensor.

num_outputs_per_input(x1, x2)[source]

Given n data points x1 and m datapoints x2, this multitask kernel returns an (n*num_tasks) x (m*num_tasks) covariance matrix.

### MultitaskKernel¶

Kernel supporting Kronecker style multitask Gaussian processes (where every data point is evaluated at every task) using gpytorch.kernels.IndexKernel as a basic multitask kernel.

Given a base covariance module to be used for the data, $$K_{XX}$$, this kernel computes a task kernel of specified size $$K_{TT}$$ and returns $$K = K_{TT} \otimes K_{XX}$$ as a gpytorch.lazy.KroneckerProductLazyTensor.

Parameters:
• data_covar_module (Kernel) – Kernel to use as the data kernel.
• num_tasks (int) – Number of tasks.
• rank (int) – (default 1) Rank of index kernel to use for task covariance matrix.
• task_covar_prior (Prior) – (default None) Prior to use for task kernel. See gpytorch.kernels.IndexKernel for details.
• kwargs (dict) – Additional arguments to pass to the kernel.
num_outputs_per_input(x1, x2)[source]

Given n data points x1 and m datapoints x2, this multitask kernel returns an (n*num_tasks) x (m*num_tasks) covariance matrix.
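
To make the output size concrete, here is a minimal dense sketch of the Kronecker product $$K_{TT} \otimes K_{XX}$$ in plain Python. It follows the block ordering written in the formula above and uses made-up matrices. The `kron` helper is hypothetical; the library itself keeps the product lazy rather than forming it densely.

```python
def kron(A, B):
    """Kronecker product of two dense matrices given as nested lists."""
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]

# Task covariance for num_tasks = 2 and data covariance for n = 3 points.
K_TT = [[1.0, 0.5], [0.5, 2.0]]
K_XX = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.8], [0.3, 0.8, 1.0]]

K = kron(K_TT, K_XX)
shape = (len(K), len(K[0]))  # (n * num_tasks, n * num_tasks)
```

Entry `K[t1 * n + i][t2 * n + j]` equals `K_TT[t1][t2] * K_XX[i][j]`, which is why n data points produce n*num_tasks rows.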

### RBFKernelGrad¶

class gpytorch.kernels.RBFKernelGrad(ard_num_dims=None, batch_shape=torch.Size([]), active_dims=None, lengthscale_prior=None, lengthscale_constraint=None, eps=1e-06, **kwargs)[source]

Computes a covariance matrix of the RBF kernel that models the covariance between the values and partial derivatives for inputs $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$.

See gpytorch.kernels.Kernel for descriptions of the lengthscale options.

Note

This kernel does not have an outputscale parameter. To add a scaling parameter, decorate this kernel with a gpytorch.kernels.ScaleKernel.

Args:
batch_shape (torch.Size, optional):
Set this if you want a separate lengthscale for each batch of input data. It should be b if x1 is a b x n x d tensor. Default: torch.Size([]).
active_dims (tuple of ints, optional):
Set this if you want to compute the covariance of only a few input dimensions. The ints correspond to the indices of the dimensions. Default: None.
lengthscale_prior (Prior, optional):
Set this if you want to apply a prior to the lengthscale parameter. Default: None.
lengthscale_constraint (Constraint, optional):
Set this if you want to apply a constraint to the lengthscale parameter. Default: Positive.
eps (float):
The minimum value that the lengthscale can take (prevents divide by zero errors). Default: 1e-6.
Attributes:
lengthscale (Tensor):
The lengthscale parameter. Size/shape of parameter depends on the ard_num_dims and batch_shape arguments.
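Example:
>>> x = torch.randn(10, 5)
>>> # Non-batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGrad())
>>> covar = covar_module(x)  # Output: LazyTensor of size (60 x 60), where 60 = n * (d + 1)
>>>
>>> batch_x = torch.randn(2, 10, 5)
>>> # Batch: Simple option
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGrad())
>>> # Batch: different lengthscale for each batch
>>> covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernelGrad(batch_shape=torch.Size([2])))
>>> covar = covar_module(batch_x)  # Output: LazyTensor of size (2 x 60 x 60)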

## Kernels for Scalable GP Regression Methods¶

### GridKernel¶

class gpytorch.kernels.GridKernel(base_kernel: gpytorch.kernels.kernel.Kernel, grid: List[torch.Tensor], interpolation_mode: bool = False, active_dims: bool = None)[source]

If the input data $$X$$ are regularly spaced on a grid, then GridKernel can dramatically speed up computations for stationary kernels.

GridKernel exploits Toeplitz and Kronecker structure within the covariance matrix. See Fast kernel learning for multidimensional pattern extrapolation for more info.
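
The Toeplitz structure is easy to see in one dimension. The plain-Python sketch below assumes an RBF kernel with unit lengthscale and a small regular grid; it is an illustration, not the library's code. For a stationary kernel on evenly spaced points, the whole matrix is determined by its first row.

```python
import math

def rbf(a, b):
    """Stationary RBF kernel with unit lengthscale."""
    return math.exp(-0.5 * (a - b) ** 2)

# Regularly spaced 1-D grid.
grid = [0.25 * i for i in range(6)]
n = len(grid)

# Only the first row needs kernel evaluations: n values instead of n * n.
first_row = [rbf(grid[0], u) for u in grid]

# Entry (i, j) depends only on |i - j| because the kernel is stationary.
toeplitz = [[first_row[abs(i - j)] for j in range(n)] for i in range(n)]

# Dense reference for comparison.
dense = [[rbf(a, b) for b in grid] for a in grid]
max_diff = max(abs(toeplitz[i][j] - dense[i][j]) for i in range(n) for j in range(n))
```

In several dimensions, a grid that is a Cartesian product of such 1-D grids yields a Kronecker product of Toeplitz factors, which is the structure the paragraph above refers to.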

Note

GridKernel can only wrap stationary kernels (such as RBF, Matern, Periodic, Spectral Mixture, etc.)

Args:
base_kernel (Kernel):
The kernel to speed up with grid methods.
grid (Tensor):
A g x d tensor where column i consists of the projections of the grid in dimension i.
active_dims (tuple of ints, optional):
Passed down to the base_kernel.
interpolation_mode (bool):
Used for GridInterpolationKernel where we want the covariance between points in the projections of the grid of each dimension. We do this by treating grid as d batches of g x 1 tensors by calling base_kernel(grid, grid) with last_dim_is_batch to get a d x g x g Tensor which we Kronecker product to get a g x g KroneckerProductLazyTensor.
register_buffer_list(base_name, tensors)[source]

Helper to register several buffers at once under a single base name

update_grid(grid)[source]

Supply a new grid if it ever changes.

### GridInterpolationKernel¶

class gpytorch.kernels.GridInterpolationKernel(base_kernel: gpytorch.kernels.kernel.Kernel, grid_size: Union[int, List[int]], num_dims: int = None, grid_bounds: Optional[Tuple[float, float]] = None, active_dims: Tuple[int, ...] = None)[source]

Implements the KISS-GP (or SKI) approximation for a given kernel. It was proposed in Kernel Interpolation for Scalable Structured Gaussian Processes, and offers extremely fast and accurate Kernel approximations for large datasets.

Given a base kernel k, the covariance $$k(\mathbf{x_1}, \mathbf{x_2})$$ is approximated by using a grid of regularly spaced inducing points:

$\begin{equation*} k(\mathbf{x_1}, \mathbf{x_2}) = \mathbf{w_{x_1}}^\top K_{U,U} \mathbf{w_{x_2}} \end{equation*}$

where

• $$U$$ is the set of gridded inducing points
• $$K_{U,U}$$ is the kernel matrix between the inducing points
• $$\mathbf{w_{x_1}}$$ and $$\mathbf{w_{x_2}}$$ are sparse vectors based on $$\mathbf{x_1}$$ and $$\mathbf{x_2}$$ that apply cubic interpolation.
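
The approximation can be sketched in one dimension with plain Python. To keep the example short it uses linear interpolation weights instead of the cubic interpolation described above, so it is a simplification rather than the library's method. The helper names `linear_weights` and `ski` are hypothetical.

```python
import math

def rbf(a, b):
    """RBF kernel with unit lengthscale."""
    return math.exp(-0.5 * (a - b) ** 2)

def linear_weights(x, grid):
    """Sparse interpolation vector w_x with at most two non-zero entries."""
    h = grid[1] - grid[0]
    j = min(int((x - grid[0]) / h), len(grid) - 2)  # index of the left neighbour
    t = (x - grid[j]) / h
    w = [0.0] * len(grid)
    w[j], w[j + 1] = 1.0 - t, t
    return w

def ski(x1, x2, grid):
    """Approximate k(x1, x2) as w_{x1}^T K_{U,U} w_{x2}."""
    w1, w2 = linear_weights(x1, grid), linear_weights(x2, grid)
    K_uu = [[rbf(u, v) for v in grid] for u in grid]
    m = len(grid)
    return sum(w1[i] * K_uu[i][j] * w2[j] for i in range(m) for j in range(m))

grid = [0.1 * i for i in range(31)]  # inducing points U on [0, 3]
approx = ski(0.73, 1.58, grid)
exact = rbf(0.73, 1.58)
```

Because each weight vector is sparse, only a few entries of $$K_{U,U}$$ are touched per kernel evaluation; a finer grid reduces the interpolation error at the cost of a larger $$K_{U,U}$$.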

The user should supply the size of the grid (using the grid_size argument). To choose a reasonable grid value, we highly recommend using the gpytorch.utils.grid.choose_grid_size() helper function. The bounds of the grid will automatically be determined by data.

(Alternatively, you can hard-code bounds using the grid_bounds argument, which will speed up this kernel’s computations.)

Note

GridInterpolationKernel can only wrap stationary kernels (such as RBF, Matern, Periodic, Spectral Mixture, etc.)

Args:
• base_kernel (Kernel):
The kernel to approximate with KISS-GP
• grid_size (Union[int, List[int]]):
The size of the grid in each dimension. If a single int is provided, then every dimension will have the same grid size.
• num_dims (int):
The dimension of the input data. Required if grid_bounds=None
• grid_bounds (tuple(float, float), optional):
The bounds of the grid, if known (high performance mode). The length of the tuple must match the number of dimensions. The entries represent the min/max values for each dimension.
• active_dims (tuple of ints, optional):
Passed down to the base_kernel.

### InducingPointKernel¶

class gpytorch.kernels.InducingPointKernel(base_kernel, inducing_points, likelihood, active_dims=None)[source]

### RFFKernel¶

class gpytorch.kernels.RFFKernel(num_samples: int, num_dims: Optional[int] = None, **kwargs)[source]

Computes a covariance matrix based on Random Fourier Features with the RBFKernel.

Random Fourier features were originally proposed in ‘Random Features for Large-Scale Kernel Machines’ by Rahimi and Recht (2008). Instead of the shifted cosine features from Rahimi and Recht (2008), we use the sine and cosine features, which give a lower-variance estimator; see ‘On the Error of Random Fourier Features’ by Sutherland and Schneider (2015).

By Bochner’s theorem, any continuous kernel $$k$$ is positive definite if and only if it is the Fourier transform of a non-negative measure $$p(\omega)$$, i.e.

$$$k(x, x') = k(x - x') = \int p(\omega) e^{i(\omega^\top (x - x'))} d\omega.$$$

where $$p(\omega)$$ is a normalized probability measure if $$k(0)=1$$.

For the RBF kernel,

$$$k(\Delta) = \exp{(-\frac{\Delta^2}{2\sigma^2})} \quad \text{and} \quad p(\omega) = \exp{(-\frac{\sigma^2\omega^2}{2})}$$$

where $$\Delta = x - x'$$.

Given datapoint $$x\in \mathbb{R}^d$$, we can construct its random Fourier features $$z(x) \in \mathbb{R}^{2D}$$ by

$\begin{split}z(x) = \sqrt{\frac{1}{D}} \begin{bmatrix} \cos(\omega_1^\top x)\\ \sin(\omega_1^\top x)\\ \cdots \\ \cos(\omega_D^\top x)\\ \sin(\omega_D^\top x) \end{bmatrix}, \quad \omega_1, \ldots, \omega_D \sim p(\omega)\end{split}$

such that we have an unbiased Monte Carlo estimator

$$$k(x, x') = k(x - x') \approx z(x)^\top z(x') = \frac{1}{D}\sum_{i=1}^D \cos(\omega_i^\top (x - x')).$$$
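
The estimator can be checked numerically with a short plain-Python sketch. It assumes an RBF kernel with lengthscale $$\sigma = 1$$, for which the frequencies are drawn from a Gaussian with standard deviation $$1/\sigma$$. The function names are hypothetical and this is not GPyTorch's implementation.

```python
import math
import random

def sample_frequencies(num_samples, num_dims, lengthscale=1.0, seed=0):
    """Draw D frequency vectors from N(0, lengthscale^-2 I), the RBF spectral density."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(num_dims)]
        for _ in range(num_samples)
    ]

def features(x, omegas):
    """Random Fourier feature vector z(x) of length 2D (cosine and sine pairs)."""
    scale = math.sqrt(1.0 / len(omegas))
    z = []
    for w in omegas:
        proj = sum(wi * xi for wi, xi in zip(w, x))
        z += [scale * math.cos(proj), scale * math.sin(proj)]
    return z

def rbf(x, y, lengthscale=1.0):
    """Exact RBF kernel for comparison."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)

omegas = sample_frequencies(num_samples=2000, num_dims=3)
x, y = [0.2, -0.4, 1.0], [0.5, 0.1, 0.3]

# Monte Carlo estimate z(x)^T z(y) versus the exact kernel value.
approx = sum(a * b for a, b in zip(features(x, omegas), features(y, omegas)))
exact = rbf(x, y)
```

The error of the estimate shrinks roughly like $$1/\sqrt{D}$$, so `num_samples` trades accuracy against the size of the feature vector.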

Note

When this kernel is used in batch mode, the random frequencies are drawn independently across the batch dimension as well by default.

Parameters:
• num_samples (int) – Number of random frequencies to draw. This is $$D$$ in the above papers. This will produce $$D$$ sine features and $$D$$ cosine features for a total of $$2D$$ random Fourier features.
• num_dims (int, optional) – (Default None.) Dimensionality of the data space. This is $$d$$ in the above papers. Note that if you want an independent lengthscale for each dimension, set ard_num_dims equal to num_dims. If unspecified, it will be inferred the first time forward is called.
• randn_weights (torch.Tensor) – The random frequencies that are drawn once and then fixed.

Example:

>>> # This will infer num_dims automatically
>>> kernel = gpytorch.kernels.RFFKernel(num_samples=5)
>>> x = torch.randn(10, 3)
>>> kxx = kernel(x, x).evaluate()
>>> print(kernel.randn_weights.size())
torch.Size([3, 5])