# 4.4. Exercises

Another loss function called the Huber loss combines the absolute and squared loss to create a loss function that is both smooth and robust to outliers. The Huber loss accomplishes this by behaving like the squared loss for \(\theta\) values close to the minimum and switching to absolute loss for \(\theta\) values far from the minimum. Below is a formula for a simplified version of Huber loss. Use this definition of Huber loss to:

- Write a function called `mhe` to compute the mean Huber error.
- Plot the smooth `mhe` curve for the bus times data where \(\theta\) ranges from -2 to 8.
- Use trial and error to find the minimizing \(\hat \theta\) for bus times.
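One possible starting point for the first part is sketched below. It assumes the common Huber form with a threshold `gamma`: half the squared error when \(|y - \theta| \le \gamma\), and \(\gamma(|y - \theta| - \gamma/2)\) otherwise. Both that form and the default `gamma=1` are assumptions, so the simplified definition referred to above may differ in its constants. The helper `huber` and the list `sample` are hypothetical names, and `sample` holds made-up stand-in values, not the bus times data.

```python
def huber(theta, y, gamma=1.0):
    """Huber loss for one observation (assumed standard form, threshold gamma)."""
    err = abs(y - theta)
    if err <= gamma:
        return 0.5 * err ** 2           # squared-loss regime near the minimum
    return gamma * (err - 0.5 * gamma)  # absolute-loss regime far from it


def mhe(theta, y_vals, gamma=1.0):
    """Mean Huber error of the constant theta over y_vals."""
    return sum(huber(theta, y, gamma) for y in y_vals) / len(y_vals)


# Hypothetical stand-in values; substitute the bus times data here.
sample = [-1.0, 0.5, 2.0, 3.0, 7.5]

# Evaluate the curve on a grid from -2 to 8, ready to plot against thetas.
thetas = [-2 + 0.1 * k for k in range(101)]
curve = [mhe(t, sample) for t in thetas]
```

The two branches agree at `err == gamma`, which is what keeps the curve smooth where the loss switches regimes.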

Continue with Huber loss and the function `mhe` in the previous problem:

- Plot the smooth `mhe` for the five data points \([-2, 0, 1, 5, 10]\). Describe the curve.
- For these five points, what is the minimizing \(\hat \theta\)?
- What happens when the data point 10 is swapped for 100? Compare the minimizer to the mean and median of the five points.
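The comparison in the last part can be set up along the following lines. This is a sketch, not a reference solution: it again assumes the common Huber form with threshold `gamma = 1`, and `grid_minimizer` is a hypothetical helper that stands in for trial and error by scanning a grid of \(\theta\) values.

```python
from statistics import mean, median


def mhe(theta, y_vals, gamma=1.0):
    """Mean Huber error (assumed standard form with threshold gamma)."""
    total = 0.0
    for y in y_vals:
        err = abs(y - theta)
        total += 0.5 * err ** 2 if err <= gamma else gamma * (err - 0.5 * gamma)
    return total / len(y_vals)


def grid_minimizer(y_vals, lo=-2.0, hi=8.0, steps=1000):
    """Return the grid value of theta with the smallest mean Huber error."""
    thetas = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(thetas, key=lambda t: mhe(t, y_vals))


# Compare the grid minimizer with the mean and median, with and without
# the large value 100 in place of 10.
for data in ([-2, 0, 1, 5, 10], [-2, 0, 1, 5, 100]):
    theta_hat = grid_minimizer(data)
    print(data, "minimizer:", round(theta_hat, 2),
          "mean:", mean(data), "median:", median(data))
```

Under these assumptions the grid search should land near the median for both data sets, while the mean moves with the large value. A different `gamma` can change where the minimizer falls.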

Consider a loss function that has 0 loss for negative values of \(y\) and quadratic loss for positive \(y\).

- Write a function called `m0e` that computes the average loss for this function.
- Plot the `m0e` curve for many \(\theta\)s given the data \(\mathbf{y} = [-2, 0, 1, 5, 10]\).
- Use trial and error to find the minimizing \(\hat \theta\).
- Intuitively, what should the minimizing value be? What if we use linear loss instead?
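A minimal sketch of `m0e` follows. It reads the loss as a function of the error \(y - \theta\): zero when the error is negative or zero, and the squared error when it is positive. That reading is an assumption, and the grid bounds of -5 to 15 are arbitrary choices for illustration.

```python
def m0e(theta, y_vals):
    """Average loss: squared error when y - theta > 0, zero otherwise."""
    return sum(max(y - theta, 0.0) ** 2 for y in y_vals) / len(y_vals)


y = [-2, 0, 1, 5, 10]

# Evaluate the curve on a grid of thetas from -5 to 15 in steps of 0.5.
thetas = [-5 + 0.5 * k for k in range(41)]
curve = [m0e(t, y) for t in thetas]

# Trial and error: scan the grid for the smallest average loss and
# collect every theta that attains it.
best_loss = min(curve)
best_thetas = [t for t, loss in zip(thetas, curve) if loss == best_loss]
```

Collecting every tied value in `best_thetas`, rather than a single argmin, makes it easier to see whether the minimizer is unique.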

In this exercise, we again show that the mean minimizes the mean squared error, but this time we use calculus.

- Take the derivative of the average loss with respect to \(\theta\).
- Set the derivative to 0 and solve for \(\hat{\theta}\).
- To be thorough, take a second derivative to confirm that \(\bar{y}\) is a minimizer. (Recall that if the second derivative is positive, then the quadratic is convex, so it opens upward.)
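As a check on the steps above, the calculation can be sketched as follows, writing the average loss as \(L(\theta) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \theta)^2\). The symbol \(L\) is introduced here only for illustration.

\[
\begin{aligned}
\frac{d}{d\theta} L(\theta) &= -\frac{2}{n} \sum_{i=1}^{n} (y_i - \theta), \\
0 = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{\theta})
  &\;\Longrightarrow\; \hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}, \\
\frac{d^2}{d\theta^2} L(\theta) &= 2 > 0.
\end{aligned}
\]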

Follow the steps below to establish that the MAE is minimized at the median.

- Split the summation \(\frac{1}{n} \sum_{i = 1}^{n}|y_i - \theta|\) into three terms for when \(y_i - \theta\) is negative, 0, and positive.
- Set the middle term to 0 so that the equations are easier to work with. Use the fact that the derivative of the absolute value is -1 or +1 to differentiate the remaining two terms with respect to \(\theta\).
- Set the derivative to 0 and simplify terms. Explain why, when there is an odd number of points, the solution is the median.
- Explain why, when there is an even number of points, the minimizing \(\theta\) is not uniquely defined (just as with the median).
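A quick numerical check can complement the argument. The sketch below scans a grid of \(\theta\) values for two small data sets, one with an odd and one with an even number of points. The data sets, the grid, and the names `odd`, `even`, and `minimizers` are illustrative choices.

```python
from statistics import median


def mae(theta, y_vals):
    """Mean absolute error of the constant theta over y_vals."""
    return sum(abs(y - theta) for y in y_vals) / len(y_vals)


odd = [-2, 0, 1, 5, 10]   # median is the middle point, 1
even = [-2, 0, 1, 5]      # the two middle points are 0 and 1

thetas = [-2 + 0.5 * k for k in range(25)]  # grid from -2 to 10

for data in (odd, even):
    losses = [mae(t, data) for t in thetas]
    smallest = min(losses)
    # Keep every grid value that attains the smallest loss.
    minimizers = [t for t, loss in zip(thetas, losses) if loss == smallest]
    print(data, "median:", median(data), "grid minimizers:", minimizers)
```

For the odd-sized set a single grid value attains the minimum, while for the even-sized set every grid value between the two middle points ties.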