4.4. Exercises

  • Another loss function, the Huber loss, combines the absolute and squared losses into a loss function that is both differentiable and robust to outliers. It does this by behaving like the squared loss when \(\theta\) is close to the observation \(y\) and switching to the absolute loss when \(\theta\) is far from \(y\). Below is a formula for a simplified version of the Huber loss:

\[\begin{split} \begin{aligned} l(\theta, y) &= \frac{1}{2} (y - \theta)^2 &\textrm{for}~ |y-\theta| \leq 2\\ &= 2(|y - \theta| - 1) &\textrm{otherwise.}\\ \end{aligned} \end{split}\]
  • Using the definition of Huber loss:

    • Write a function called mhe to compute the mean Huber error.

    • Plot the smooth mhe curve for the bus times and \(\theta\) ranging from -2 to 8.

    • Use trial and error to find the minimizing \(\hat \theta\) for the bus times.

    • Plot the smooth mhe for the five data points \([-2, 0, 1, 5, 10]\). Describe the curve.

    • For these five points, what is the minimizing \(\hat \theta\)?

    • What happens when the 10 is swapped for 100? Compare the minimizer to the mean and median of the five points.
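As a starting point for the Huber exercise, the piecewise formula can be written for a single observation. This is only a sketch: the name `huber_loss` is an assumption (the exercise itself asks for `mhe`), the transition point is fixed at 2 to match the simplified formula, and averaging over a dataset is left to the reader.

```python
def huber_loss(theta, y):
    """Simplified Huber loss for one observation y and one candidate theta.

    Hypothetical helper name; the transition point of 2 comes from the
    simplified formula in the exercise.
    """
    resid = abs(y - theta)
    if resid <= 2:
        # Squared-loss branch: used when theta is within 2 of y.
        return 0.5 * resid ** 2
    # Absolute-loss branch: grows linearly, so outliers count for less.
    return 2 * (resid - 1)
```

The two branches agree at the transition point: when \(|y - \theta| = 2\), both give a loss of 2, so the curve has no jump there.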

  • For this exercise, follow the steps below to establish that the MAE is minimized at the median.

    • Split the summation \(\frac{1}{n} \sum_{i = 1}^{n}|y_i - \theta|\) into three sums: one each for the terms where \(y_i - \theta\) is negative, zero, and positive.

    • Set the middle term to 0 so that the equations are easier to work with. Use the fact that the derivative of the absolute value is -1 or +1 to differentiate the remaining two terms with respect to \(\theta\).

    • Set the derivative to 0 and simplify terms. Explain why, when there is an odd number of points, the solution is the median.

    • Explain why, when there is an even number of points, the minimizing \(\theta\) is not uniquely defined (just as with the median).
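Before working through the derivation, it can help to check the claim about the MAE numerically. The sketch below is a sanity check, not a proof. It reuses the five points from the Huber exercise. The grid range, the step size, the helper name `mae`, and the four-point list `ys_even` are assumptions chosen for illustration.

```python
from statistics import median

def mae(theta, ys):
    """Mean absolute error of the constant theta for the data ys."""
    return sum(abs(y - theta) for y in ys) / len(ys)

# Odd number of points: search a grid of candidate theta values.
ys = [-2, 0, 1, 5, 10]
grid = [i / 100 for i in range(-300, 1101)]  # -3.00 to 11.00 in steps of 0.01
best_theta = min(grid, key=lambda t: mae(t, ys))
print(best_theta, median(ys))  # the grid minimizer matches the median

# Even number of points (illustrative data): the MAE is flat between
# the two middle values, so every theta in that interval is a minimizer.
ys_even = [-2, 0, 1, 5]
print(mae(0, ys_even), mae(0.5, ys_even), mae(1, ys_even))
```

A grid search can only suggest where the minimum lies. Explaining why it lands on the median is the point of the steps above.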