4.4. Exercises

• Another loss function, the Huber loss, combines the absolute and squared loss to create a loss function that is both differentiable and robust to outliers. The Huber loss accomplishes this by behaving like the squared loss when $$\theta$$ is close to the observation $$y$$ (where the loss is smallest) and switching to the absolute loss when $$\theta$$ is far from $$y$$. Below is a formula for a simplified version of the Huber loss:

\begin{split} l(\theta, y) = \begin{cases} \frac{1}{2} (y - \theta)^2 & \textrm{for}~ |y-\theta| \leq 2\\ 2(|y - \theta| - 1) & \textrm{otherwise.} \end{cases} \end{split}
• Using this definition of the Huber loss:

• Write a function called mhe to compute the mean Huber error.

• Plot the smooth mhe curve for the bus times, with $$\theta$$ ranging from -2 to 8.

• Use trial and error to find the minimizing $$\hat \theta$$ for the bus times.

• Plot the smooth mhe curve for the five data points $$[-2, 0, 1, 5, 10]$$. Describe the curve.

• For these five points, what is the minimizing $$\hat \theta$$?

• What happens when the 10 is swapped for 100? Compare the minimizer to the mean and median of the five points.
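
As a possible starting point for this exercise, the piecewise formula above can be translated into code roughly as follows. This is a minimal sketch, not a reference solution: it assumes NumPy is available, the helper name `huber_loss` and the grid of 1,001 candidate values are illustrative choices, and it uses the five data points from the exercise rather than the bus times, which are not reproduced here.

```python
import numpy as np

def huber_loss(theta, y):
    """Simplified Huber loss from the formula above (threshold fixed at 2)."""
    resid = np.abs(np.asarray(y, dtype=float) - theta)
    # Squared-loss branch for small residuals, absolute-loss branch otherwise.
    return np.where(resid <= 2, 0.5 * resid**2, 2 * (resid - 1))

def mhe(theta, y_vals):
    """Mean Huber error of the constant theta over the data y_vals."""
    return np.mean(huber_loss(theta, y_vals))

# Assumption: a plain grid search stands in for "trial and error".
points = np.array([-2, 0, 1, 5, 10])
thetas = np.linspace(-2, 8, 1001)
errors = np.array([mhe(t, points) for t in thetas])
best_theta = thetas[np.argmin(errors)]
print(best_theta, errors.min())
```

Plotting `errors` against `thetas` with any plotting library gives the smooth curve the exercise asks for. Changing the last entry of `points` from 10 to 100 and rerunning lets you compare `best_theta` with `np.mean(points)` and `np.median(points)`.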

• Follow the steps below to establish that the MAE is minimized at the median.

• Split the summation $$\frac{1}{n} \sum_{i = 1}^{n}|y_i - \theta|$$ into three sums, for the cases where $$y_i - \theta$$ is negative, zero, and positive.

• The middle sum equals 0, so drop it to make the equations easier to work with. Use the fact that the derivative of the absolute value is -1 or +1 (wherever its argument is nonzero) to differentiate the remaining two terms with respect to $$\theta$$.

• Set the derivative to 0 and simplify terms. Explain why, when there is an odd number of points, the solution is the median.

• Explain why, when there is an even number of points, the minimizing $$\theta$$ is not uniquely defined (just as with the median).
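
Before writing out the argument, it can help to check the steps above numerically. The sketch below is one way to do that, under stated assumptions: it uses NumPy, the function names `mae` and `mae_derivative` are hypothetical, and the two small data sets are illustrative (the odd-sized one reuses the five points from the previous exercise, and the even-sized one drops its last point). The derivative is written as the count of points below $$\theta$$ minus the count of points above $$\theta$$, divided by $$n$$, which is what differentiating the two remaining sums yields.

```python
import numpy as np

def mae(theta, y_vals):
    """Mean absolute error of the constant theta over the data y_vals."""
    y = np.asarray(y_vals, dtype=float)
    return np.mean(np.abs(y - theta))

def mae_derivative(theta, y_vals):
    """Derivative of the MAE with respect to theta, valid when no y_i equals theta."""
    y = np.asarray(y_vals, dtype=float)
    # Each point below theta contributes +1; each point above contributes -1.
    return (np.sum(y < theta) - np.sum(y > theta)) / len(y)

odd_points = np.array([-2, 0, 1, 5, 10])   # illustrative data, n = 5
even_points = np.array([-2, 0, 1, 5])      # illustrative data, n = 4

# Odd n: the derivative changes sign as theta crosses the middle point.
print(mae_derivative(0.5, odd_points), mae_derivative(1.5, odd_points))

# Even n: the derivative is 0 everywhere between the two middle points.
print(mae_derivative(0.25, even_points), mae_derivative(0.75, even_points))
print(mae(0.25, even_points), mae(0.75, even_points))
```

Comparing the printed values with `np.median(odd_points)` and `np.median(even_points)` should suggest what the written explanation needs to show.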