{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"import sys\n",
"import os\n",
"if not any(path.endswith('textbook') for path in sys.path):\n",
" sys.path.append(os.path.abspath('../../..'))\n",
"from textbook_utils import *"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(sec:linear_exercises)=\n",
"# Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- When we fit models to the Opportunity data, we removed four\n",
" commuting zones (CZs): 34105, 34113, 34112, and 34106.\n",
" These commuting zones were *outliers* in the data, since they had abnormally\n",
" small AUMs given the values of their predictor variables.\n",
" All four of these CZs are in the same state. Which one?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Even a few extreme outliers can harm our model's fit to the data.\n",
" To show this, fit another simple linear model that uses the\n",
" fraction with a ≤15-minute commute to predict AUM, but include the outlier\n",
" commuting zones (34105, 34113, 34112, and 34106) in the\n",
" training data. How does including the outliers change the model and\n",
" residual plot?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Let $ f_{\\hat{\\theta}}(X) $ be the $ n $-dimensional vector of a\n",
" linear model's predictions after fitting, and\n",
" let $ \\epsilon = y - f_{\\hat{\\theta}}(X) $ be the\n",
" $ n $-dimensional vector of the residuals. Prove that\n",
" $ f_{\\hat{\\theta}}(X) \\cdot \\epsilon = 0 $.\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- We derived that $ \\hat{\\theta} = (X^\\top X)^{-1} X^\\top y $. Construct a\n",
" design matrix $ X $ where $ \\hat{\\theta} $ is undefined.\n",
" Hint: this is the same as finding a matrix $ X $ where $ X^\\top X $ is not\n",
" invertible.\n",
" What does this mean about $ \\hat{\\theta} $?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Create a design matrix that uses the nine predictor variables we discussed in\n",
" {numref}`Section %s ` and the one-hot encoded\n",
" US Census regions. This design matrix should have 13 columns total.\n",
" Then, fit a linear model to predict AUM using this design matrix.\n",
" How does this model perform on the test set compared to the model without\n",
" the US Census regions?\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}