{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hyperparameters in GPyTorch\n",
    "\n",
    "The purpose of this notebook is to explain how GP hyperparameters in GPyTorch work, how they are handled, what options are available for constraints and priors, and how things may differ from other packages.\n",
    "\n",
    "**Note:** This is a *basic* introduction to hyperparameters in GPyTorch. If you want to use GPyTorch hyperparameters with things like Pyro distributions, that will be covered in a less \"basic usage\" tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# smoke_test (this makes sure this example notebook gets tested)\n",
    "\n",
    "import math\n",
    "import torch\n",
    "import gpytorch\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "from IPython.display import Markdown, display\n",
    "def printmd(string):\n",
    "    display(Markdown(string))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Defining an example model\n",
    "\n",
    "In the next cell, we define our simple exact GP from the <a href=\"../01_Exact_GPs/Simple_GP_Regression.ipynb\">Simple GP Regression</a> tutorial. We'll be using this model to demonstrate certain aspects of hyperparameter creation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_x = torch.linspace(0, 1, 100)\n",
    "train_y = torch.sin(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2\n",
    "\n",
    "# We will use the simplest form of GP model, exact inference\n",
    "class ExactGPModel(gpytorch.models.ExactGP):\n",
    "    def __init__(self, train_x, train_y, likelihood):\n",
    "        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)\n",
    "        self.mean_module = gpytorch.means.ConstantMean()\n",
    "        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())\n",
    "    \n",
    "    def forward(self, x):\n",
    "        mean_x = self.mean_module(x)\n",
    "        covar_x = self.covar_module(x)\n",
    "        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)\n",
    "\n",
    "# initialize likelihood and model\n",
    "likelihood = gpytorch.likelihoods.GaussianLikelihood()\n",
    "model = ExactGPModel(train_x, train_y, likelihood)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Viewing model hyperparameters\n",
    "\n",
    "Let's take a look at the model parameters. By \"parameters\", here I mean explicitly objects of type `torch.nn.Parameter` that will have gradients filled in by autograd. To access these, there are two ways of doing this in torch. One way is to use `model.state_dict()`, which we demonstrate the use of for saving models <a href=\"Saving_and_Loading_Models.ipynb\">here</a>.\n",
    "\n",
    "In the next cell we demonstrate another way to do this, by looping over the `model.named_parameters()` generator:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Parameter name: likelihood.noise_covar.raw_noise           value = 0.0\n",
      "Parameter name: mean_module.constant                       value = 0.0\n",
      "Parameter name: covar_module.raw_outputscale               value = 0.0\n",
      "Parameter name: covar_module.base_kernel.raw_lengthscale   value = 0.0\n"
     ]
    }
   ],
   "source": [
    "for param_name, param in model.named_parameters():\n",
    "    print(f'Parameter name: {param_name:42} value = {param.item()}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Raw vs Actual Parameters\n",
    "\n",
    "The most important thing to note here is that the actual learned parameters of the model are things like `raw_noise`, `raw_outputscale`, `raw_lengthscale`, etc. The reason for this is that these parameters **must be positive**. This brings us to our next topic for parameters: constraints, and the difference between *raw* parameters and *actual* parameters.\n",
    "\n",
    "In order to enforce positiveness and other constraints for hyperparameters, GPyTorch has **raw** parameters (e.g., `model.covar_module.raw_outputscale`) that are transformed to actual values via some constraint. Let's take a look at the raw outputscale, its constraint, and the final value:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "raw_outputscale,  Parameter containing:\n",
      "tensor(0., requires_grad=True)\n",
      "\n",
      "raw_outputscale_constraint1 Positive()\n"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "\n",
       "\n",
       "**Printing all model constraints...**\n"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Constraint name: likelihood.noise_covar.raw_noise_constraint             constraint = GreaterThan(1.000E-04)\n",
      "Constraint name: covar_module.raw_outputscale_constraint                 constraint = Positive()\n",
      "Constraint name: covar_module.base_kernel.raw_lengthscale_constraint     constraint = Positive()\n"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "\n",
       "**Getting raw outputscale constraint from model...**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Positive()\n"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "\n",
       "**Getting raw outputscale constraint from model.covar_module...**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Positive()\n"
     ]
    }
   ],
   "source": [
    "raw_outputscale = model.covar_module.raw_outputscale\n",
    "print('raw_outputscale, ', raw_outputscale)\n",
    "\n",
    "# Three ways of accessing the raw outputscale constraint\n",
    "print('\\nraw_outputscale_constraint1', model.covar_module.raw_outputscale_constraint)\n",
    "\n",
    "printmd('\\n\\n**Printing all model constraints...**\\n')\n",
    "for constraint_name, constraint in model.named_constraints():\n",
    "    print(f'Constraint name: {constraint_name:55} constraint = {constraint}')\n",
    "\n",
    "printmd('\\n**Getting raw outputscale constraint from model...**')\n",
    "print(model.constraint_for_parameter_name(\"covar_module.raw_outputscale\"))\n",
    "\n",
    "\n",
    "printmd('\\n**Getting raw outputscale constraint from model.covar_module...**')\n",
    "print(model.covar_module.constraint_for_parameter_name(\"raw_outputscale\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### How do constraints work?\n",
    "\n",
    "Constraints define `transform` and `inverse_transform` methods that turn raw parameters in to real ones. For a positive constraint, we expect the **transformed** values to always be positive. Let's see:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Transformed outputscale tensor(0.6931, grad_fn=<SoftplusBackward>)\n",
      "tensor(0., grad_fn=<LogBackward>)\n",
      "True\n",
      "Transform a bunch of negative tensors:  tensor([0.3133, 0.1269, 0.0486])\n"
     ]
    }
   ],
   "source": [
    "raw_outputscale = model.covar_module.raw_outputscale\n",
    "constraint = model.covar_module.raw_outputscale_constraint\n",
    "\n",
    "print('Transformed outputscale', constraint.transform(raw_outputscale))\n",
    "print(constraint.inverse_transform(constraint.transform(raw_outputscale)))\n",
    "print(torch.equal(constraint.inverse_transform(constraint.transform(raw_outputscale)), raw_outputscale))\n",
    "\n",
    "print('Transform a bunch of negative tensors: ', constraint.transform(torch.tensor([-1., -2., -3.])))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Convenience Getters/Setters for Transformed Values\n",
    "\n",
    "Because dealing with raw parameter values is annoying (e.g., we might know what a noise variance of 0.01 means, but maybe not a `raw_noise` of `-2.791`), virtually all built in GPyTorch modules that define raw parameters define convenience getters and setters for dealing with transformed values directly.\n",
    "\n",
    "In the next cells, we demonstrate the \"inconvenient way\" and the \"convenient\" way of getting and setting the outputscale."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Actual outputscale: 0.6931471824645996\n",
      "Actual outputscale after setting: 2.0\n"
     ]
    }
   ],
   "source": [
    "# Recreate model to reset outputscale\n",
    "model = ExactGPModel(train_x, train_y, likelihood)\n",
    "\n",
    "# Inconvenient way of getting true outputscale\n",
    "raw_outputscale = model.covar_module.raw_outputscale\n",
    "constraint = model.covar_module.raw_outputscale_constraint\n",
    "outputscale = constraint.transform(raw_outputscale)\n",
    "print(f'Actual outputscale: {outputscale.item()}')\n",
    "\n",
    "# Inconvenient way of setting true outputscale\n",
    "model.covar_module.raw_outputscale.data.fill_(constraint.inverse_transform(torch.tensor(2.)))\n",
    "raw_outputscale = model.covar_module.raw_outputscale\n",
    "outputscale = constraint.transform(raw_outputscale)\n",
    "print(f'Actual outputscale after setting: {outputscale.item()}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ouch, that is ugly! Fortunately, there is a better way:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Actual outputscale: 0.6931471824645996\n",
      "Actual outputscale after setting: 2.0\n"
     ]
    }
   ],
   "source": [
    "# Recreate model to reset outputscale\n",
    "model = ExactGPModel(train_x, train_y, likelihood)\n",
    "\n",
    "# Convenient way of getting true outputscale\n",
    "print(f'Actual outputscale: {model.covar_module.outputscale}')\n",
    "\n",
    "# Convenient way of setting true outputscale\n",
    "model.covar_module.outputscale = 2.\n",
    "print(f'Actual outputscale after setting: {model.covar_module.outputscale}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Changing Parameter Constraints\n",
    "\n",
    "If we look at the actual noise of the model, GPyTorch defines a default lower bound of `1e-4` for the noise variance:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Actual noise value: tensor([0.6932], grad_fn=<AddBackward0>)\n",
      "Noise constraint: GreaterThan(1.000E-04)\n"
     ]
    }
   ],
   "source": [
    "print(f'Actual noise value: {likelihood.noise}')\n",
    "print(f'Noise constraint: {likelihood.noise_covar.raw_noise_constraint}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can change the noise constraint either on the fly or when the likelihood is created:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Noise constraint: GreaterThan(1.000E-03)\n",
      "Noise constraint: Positive()\n"
     ]
    }
   ],
   "source": [
    "likelihood = gpytorch.likelihoods.GaussianLikelihood(noise_constraint=gpytorch.constraints.GreaterThan(1e-3))\n",
    "print(f'Noise constraint: {likelihood.noise_covar.raw_noise_constraint}')\n",
    "\n",
    "## Changing the constraint after the module has been created\n",
    "likelihood.noise_covar.register_constraint(\"raw_noise\", gpytorch.constraints.Positive())\n",
    "print(f'Noise constraint: {likelihood.noise_covar.raw_noise_constraint}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Priors\n",
    "\n",
    "In GPyTorch, priors are things you register to the model that act on any arbitrary function of any parameter. Like constraints, these can usually be defined either when you create an object (like a Kernel or Likelihood), or set afterwards on the fly.\n",
    "\n",
    "Here are some examples:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Registers a prior on the sqrt of the noise parameter \n",
    "# (e.g., a prior for the noise standard deviation instead of variance)\n",
    "likelihood.noise_covar.register_prior(\n",
    "    \"noise_std_prior\",\n",
    "    gpytorch.priors.NormalPrior(0, 1),\n",
    "    lambda module: module.noise.sqrt()\n",
    ")\n",
    "\n",
    "# Create a GaussianLikelihood with a normal prior for the noise\n",
    "likelihood = gpytorch.likelihoods.GaussianLikelihood(\n",
    "    noise_constraint=gpytorch.constraints.GreaterThan(1e-3),\n",
    "    noise_prior=gpytorch.priors.NormalPrior(0, 1)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Putting it Together\n",
    "\n",
    "In the next cell, we augment our `ExactGP` definition to place several priors over hyperparameters and tighter constraints when creating the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We will use the simplest form of GP model, exact inference\n",
    "class FancyGPWithPriors(gpytorch.models.ExactGP):\n",
    "    def __init__(self, train_x, train_y, likelihood):\n",
    "        super(FancyGPWithPriors, self).__init__(train_x, train_y, likelihood)\n",
    "        self.mean_module = gpytorch.means.ConstantMean()\n",
    "        \n",
    "        lengthscale_prior = gpytorch.priors.GammaPrior(3.0, 6.0)\n",
    "        outputscale_prior = gpytorch.priors.GammaPrior(2.0, 0.15)\n",
    "        \n",
    "        self.covar_module = gpytorch.kernels.ScaleKernel(\n",
    "            gpytorch.kernels.RBFKernel(\n",
    "                lengthscale_prior=lengthscale_prior,\n",
    "            ),\n",
    "            outputscale_prior=outputscale_prior\n",
    "        )\n",
    "        \n",
    "        # Initialize lengthscale and outputscale to mean of priors\n",
    "        self.covar_module.base_kernel.lengthscale = lengthscale_prior.mean\n",
    "        self.covar_module.outputscale = outputscale_prior.mean\n",
    "    \n",
    "    def forward(self, x):\n",
    "        mean_x = self.mean_module(x)\n",
    "        covar_x = self.covar_module(x)\n",
    "        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)\n",
    "\n",
    "likelihood = gpytorch.likelihoods.GaussianLikelihood(\n",
    "    noise_constraint=gpytorch.constraints.GreaterThan(1e-2),\n",
    ")\n",
    "\n",
    "model = FancyGPWithPriors(train_x, train_y, likelihood)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initializing hyperparameters in One Call\n",
    "\n",
    "For convenience, GPyTorch modules also define an `initialize` method that allow you to update a full dictionary of parameters on submodules. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.0000001192092896 0.4999999701976776 2.0\n"
     ]
    }
   ],
   "source": [
    "hypers = {\n",
    "    'likelihood.noise_covar.noise': torch.tensor(1.),\n",
    "    'covar_module.base_kernel.lengthscale': torch.tensor(0.5),\n",
    "    'covar_module.outputscale': torch.tensor(2.),\n",
    "}\n",
    "\n",
    "model.initialize(**hypers)\n",
    "print(\n",
    "    model.likelihood.noise_covar.noise.item(),\n",
    "    model.covar_module.base_kernel.lengthscale.item(),\n",
    "    model.covar_module.outputscale.item()\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}