4.4. Exercises

• Another loss function, called the Huber loss, combines the absolute and squared loss to create a loss function that is both smooth and robust to outliers. The Huber loss accomplishes this by behaving like the squared loss for $$\theta$$ values close to the minimum and switching to absolute loss for $$\theta$$ values far from the minimum. Below is a formula for a simplified version of Huber loss. Use this definition of Huber loss to:

• Write a function called mhe to compute the mean Huber error.

• Plot the smooth mhe curve for the bus times data where $$\theta$$ ranges from -2 to 8.

• Use trial and error to find the minimizing $$\hat \theta$$ for bus times.

$$
l(\theta, y) =
\begin{cases}
\frac{1}{2} (y - \theta)^2 & \textrm{for}~ |y-\theta| \leq 2 \\
2(|y - \theta| - 1) & \textrm{otherwise.}
\end{cases}
$$
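As a starting point, the piecewise formula above can be turned into a vectorized function. The sketch below is one possible implementation, assuming NumPy is available. The argument names `theta` and `y_vals` and the values in `toy_data` are illustrative choices and are not the bus times data.

```python
import numpy as np

def mhe(theta, y_vals):
    """Mean Huber error of a constant theta, using the threshold of 2 from the formula."""
    # Absolute deviation of each data point from theta
    abs_dev = np.abs(np.asarray(y_vals, dtype=float) - theta)
    squared_part = 0.5 * abs_dev**2   # applies when |y - theta| <= 2
    linear_part = 2 * (abs_dev - 1)   # applies otherwise
    return np.mean(np.where(abs_dev <= 2, squared_part, linear_part))

# Hypothetical values, for illustration only
toy_data = [0.5, 1.0, 3.0, 7.0]

# Evaluate the curve on a grid of theta values from -2 to 8
thetas = np.linspace(-2, 8, 101)
errors = np.array([mhe(t, toy_data) for t in thetas])
```

The arrays `thetas` and `errors` can then be handed to any plotting library to draw the curve. The two branches agree at $$|y - \theta| = 2$$, where both equal 2, which is why the curve has no jump.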
• Continue with Huber loss and the function mhe in the previous problem:

• Plot the smooth mhe for the five data points $$[-2, 0, 1, 5, 10]$$.

• Describe the curve.

• For these five points, what is the minimizing $$\hat \theta$$?

• What happens when the data point 10 is swapped for 100? Compare the minimizer to the mean and median of the five points.
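Trial and error over $$\theta$$ can be automated with a simple grid search. The sketch below is one way to set this up, assuming NumPy. It repeats a compact version of `mhe` so that it runs on its own, and `grid_minimizer` is a hypothetical helper name introduced here for illustration.

```python
import numpy as np

def mhe(theta, y_vals):
    """Mean Huber error with the threshold of 2 from the simplified formula."""
    abs_dev = np.abs(np.asarray(y_vals, dtype=float) - theta)
    return np.mean(np.where(abs_dev <= 2, 0.5 * abs_dev**2, 2 * (abs_dev - 1)))

def grid_minimizer(loss_fn, y_vals, thetas):
    """Return the grid value of theta with the smallest average loss (hypothetical helper)."""
    losses = [loss_fn(t, y_vals) for t in thetas]
    return thetas[int(np.argmin(losses))]

# Grid with a step of 0.01 that covers both versions of the data
thetas = np.linspace(-5, 105, 11001)

for data in ([-2, 0, 1, 5, 10], [-2, 0, 1, 5, 100]):
    theta_hat = grid_minimizer(mhe, data, thetas)
    print(f"data={data}: Huber minimizer ~ {theta_hat:.2f}, "
          f"mean = {np.mean(data):.2f}, median = {np.median(data):.2f}")
```

A grid search only locates the minimizer to within the grid spacing, so a finer grid around the first estimate may be needed for more precision.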

• Consider a loss function that has 0 loss for negative values of $$y$$ and quadratic loss for positive $$y$$.

• Write a function called m0e that computes the average of this loss.

• Plot the m0e curve for many values of $$\theta$$ given the data $$\mathbf{y} = [-2, 0, 1, 5, 10]$$.

• Use trial and error to find the minimizing $$\hat \theta$$.

• Intuitively, what should the minimizing value be? What if we use linear loss instead?

• In this exercise, we again show that the mean minimizes the mean square error, but we will use calculus instead.

• Take the derivative of the average loss with respect to $$\theta$$.

• Set the derivative to 0 and solve for $$\hat{\theta}$$.

• To be thorough, take the second derivative to confirm that $$\bar{y}$$ is a minimizer. (Recall that if the second derivative is positive, then the quadratic is convex, so its critical point is a minimum.)
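A calculus result like this can be sanity-checked numerically before or after working it out by hand. The sketch below, assuming NumPy, approximates the first and second derivatives of the mean squared error at $$\bar{y}$$ with central differences. The data values, the step size `h`, and the function name `mse` are illustrative assumptions.

```python
import numpy as np

def mse(theta, y_vals):
    """Mean squared error of a constant theta for the data y_vals."""
    return np.mean((np.asarray(y_vals, dtype=float) - theta) ** 2)

y = np.array([-2, 0, 1, 5, 10])  # illustrative data
y_bar = y.mean()
h = 1e-3                          # small step for the finite differences

# Central differences approximate the derivatives at theta = y_bar
first = (mse(y_bar + h, y) - mse(y_bar - h, y)) / (2 * h)
second = (mse(y_bar + h, y) - 2 * mse(y_bar, y) + mse(y_bar - h, y)) / h**2

print("first derivative at the mean:", first)
print("second derivative at the mean:", second)
```

A first derivative near zero together with a positive second derivative is what the analytic argument should reproduce.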

• Follow the steps below to establish that the median minimizes the MAE.

• Split the summation $$\frac{1}{n} \sum_{i = 1}^{n}|y_i - \theta|$$ into three terms, one each for when $$y_i - \theta$$ is negative, 0, and positive.

• Set the middle term to 0 so that the equations are easier to work with. Use the fact that the derivative of the absolute value is -1 or +1 to differentiate the remaining two terms with respect to $$\theta$$.

• Set the derivative to 0 and simplify terms. Explain why, when there is an odd number of points, the solution is the median.

• Explain why, when there is an even number of points, the minimizing $$\theta$$ is not uniquely defined (just as with the median).
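The odd and even cases can be explored numerically alongside the derivation. The sketch below, assuming NumPy, evaluates the mean absolute error at a few values of $$\theta$$ for one odd-sized and one even-sized data set. Both data sets and the function name `mae` are illustrative choices.

```python
import numpy as np

def mae(theta, y_vals):
    """Mean absolute error of a constant theta for the data y_vals."""
    return np.mean(np.abs(np.asarray(y_vals, dtype=float) - theta))

odd_data = [-2, 0, 1, 5, 10]  # illustrative; the median is 1
even_data = [-2, 0, 1, 5]     # illustrative; the two middle values are 0 and 1

# Compare the error at values of theta around the middle of the data
for theta in [0, 0.5, 1, 1.5]:
    print(f"theta={theta}: odd-sized MAE={mae(theta, odd_data):.3f}, "
          f"even-sized MAE={mae(theta, even_data):.3f}")
```

Comparing the two columns of output shows how the shape of the MAE curve near its minimum differs between the two cases.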