{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"import warnings\n",
"# Ignore numpy dtype warnings. These warnings are caused by an interaction\n",
"# between numpy and Cython and can be safely ignored.\n",
"# Reference: https://stackoverflow.com/a/40846742\n",
"warnings.filterwarnings(\"ignore\", message=\"numpy.dtype size changed\")\n",
"warnings.filterwarnings(\"ignore\", message=\"numpy.ufunc size changed\")\n",
"\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"%matplotlib inline\n",
"import ipywidgets as widgets\n",
"from ipywidgets import interact, interactive, fixed, interact_manual\n",
"\n",
"sns.set()\n",
"sns.set_context('talk')\n",
"np.set_printoptions(threshold=20, precision=2, suppress=True)\n",
"pd.options.display.max_rows = 7\n",
"pd.options.display.max_columns = 8\n",
"pd.set_option('precision', 2)\n",
"# This option stops scientific notation for pandas\n",
"# pd.set_option('display.float_format', '{:.2f}'.format)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises\n",
"\n",
"- Another loss function called the Huber loss combines the absolute and\n",
" squared loss to create a loss function that is both smooth and robust\n",
" to outliers. The Huber loss accomplishes this by behaving like the squared loss\n",
" for $\\theta$ values close to the minimum and switching to absolute loss for\n",
" $\\theta$ values far from the minimum. Below is a formula for a simplified\n",
" version of Huber loss. Use this definition of Huber loss to\n",
" - Write a function called `mhe` to compute the mean Huber error.\n",
" - Plot the smooth `mhe` curve for the bus times data where $\\theta$ ranges from -2\n",
" to 8.\n",
" - Use trial and error to find the minimizing $\\hat \\theta$ for bus times.\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"l(\\theta, y)\n",
"&= \\frac{1}{2} (y - \\theta)^2 &\\textrm{for}~ |y-\\theta| \\leq 2\\\\\n",
"&= 2(|y - \\theta| - 1) &\\textrm{otherwise.}\\\\\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"- Continue with Huber loss and the function `mhe` in the previous problem:\n",
" - Plot the smooth `mhe` for the five data points $[-2, 0, 1, 5, 10]$.\n",
" - Describe the curve. \n",
" - For these five points, what is the minimizing $\\hat \\theta$? \n",
" - What happens when the data point 10 is swapped for 100? Compare the minimizer to the\n",
" mean and median of the five points."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Consider a loss function that has 0 loss for negative values of $y$ and quadratic loss for positive $y$. \n",
" - Write a function, called `m0e` that computes the average loss for this function.\n",
" - Plot the `m0e` curve for many $\\theta$s given the data $\\mathbf{y} = [-2, 0, 1, 5, 10]$\n",
" - Use trial and error to find the minimizing $\\hat \\theta$.\n",
" - Intuitively, what should the minimizing value be? What if we use linear loss instead? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- In this exercise, we again show that the mean minimizes the mean square error, but we will use calculus instead.\n",
" - Take the derivative of the average loss with respect to $\\theta$.\n",
" - Set the derivative to 0 and solve for $\\hat{\\theta}$.\n",
" - To be thorough, take a second derivative to confirm that $\\bar{y}$ is a minimizer. (Recall that if the second derivative is positive than the quadratic is concave.) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Follow the steps below to establish that MAE is minimized for the median. \n",
" - Split the summation, $\\frac{1}{n} \\sum_{i = 1}^{n}|y_i - \\theta|$ into\n",
" three terms for when $y_i - \\theta$ is negative, 0, and positive. \n",
" - Set the middle term to 0 so that the equations are easier to work with.\n",
" Use the fact that the derivative of the absolute value is -1 or +1 to\n",
" differentiate the remaining two terms with respect to $\\theta$. \n",
" - Set the derivative to 0 and simplify terms. Explain why when there are an\n",
" odd number of points, the solution is the median.\n",
" - Explain why when there are an even number of points, the minimizing\n",
" $\\theta$ is not uniquely defined (just as with the median). "
]
}
],
"metadata": {
"celltoolbar": "Tags",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.4"
},
"toc": {
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}