{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"# Reference: https://jupyterbook.org/interactive/hiding.html\n",
"# Use {hide, remove}-{input, output, cell} tags to hiding content\n",
"\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"%matplotlib inline\n",
"import ipywidgets as widgets\n",
"from ipywidgets import interact, interactive, fixed, interact_manual\n",
"from IPython.display import display\n",
"\n",
"sns.set()\n",
"sns.set_context('talk')\n",
"np.set_printoptions(threshold=20, precision=2, suppress=True)\n",
"pd.set_option('display.max_rows', 7)\n",
"pd.set_option('display.max_columns', 8)\n",
"pd.set_option('precision', 2)\n",
"# This option stops scientific notation for pandas\n",
"# pd.set_option('display.float_format', '{:.2f}'.format)\n",
"\n",
"def display_df(df, rows=pd.options.display.max_rows,\n",
" cols=pd.options.display.max_columns):\n",
" with pd.option_context('display.max_rows', rows,\n",
" 'display.max_columns', cols):\n",
" display(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- In the {ref}`ch:pa_collocated` section, we used an approximation to find AQS and\n",
" PurpleAir sensors within 50 meters of each other. Geospatial data appears in\n",
" all kinds of domains and data scientists have a variety of tools for working\n",
" with this kind of data. One such tool is the `geopandas` package ([link][gpd]).\n",
" Use the `geopandas` package to create a map of the US with the AQS sites marked.\n",
" \n",
"[gpd]: https://geopandas.org/\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Use a `geopandas` spatial join to find the closest PurpleAir sensor to each AQS sensor. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Although our data cleaning process closely followed Barkjohn's, we had to\n",
" omit some steps for brevity. Read Section 3 (Quality assurance) of BarkJohn's\n",
" paper, and note down all the additional steps that the original analysis took\n",
" that we did not include in this chapter.\n",
" Which steps might be most important to include?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Barkjohn's paper also distinguishes AQS sensors by whether they are\n",
" FRM (Federal Reference Method) or FEM (Federal Equivalent Method).\n",
" Do some research of your own to answer: what's the difference between\n",
" these two types of sensors? Which type is more accurate, if any?\n",
" Why did Barkjohn decide to include both types of sensors in their analysis?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- When we analyzed the PurpleAir data, we pointed out that PurpleAir sensors\n",
" apply two different types of corrections on the raw laser readings. One\n",
" correction is named CF1, and the other is named ATM.\n",
" Conduct your own EDA to find out how these two corrections differ in the data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- In {numref}`ch:pa_modeling`, we wrote Model 2 as:\n",
"\n",
" $$\n",
" \\begin{aligned}\n",
" f_{\\theta}(x_i) = \\text{PA}_i + \\theta\n",
" \\end{aligned}\n",
" $$\n",
" \n",
" Derive that $ \\hat{\\theta} = \\frac{1}{n} \\sum_i(\\text{AQS}_i - \\text{PA}_i) $\n",
" is the value for $ \\theta $ that minimizes the mean squared loss."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Consider the simple linear model without the intercept term. That is:\n",
"\n",
" $$\n",
" \\begin{aligned}\n",
" f_{\\theta}(x_i) = \\theta \\cdot \\text{PA}_i\n",
" \\end{aligned}\n",
" $$\n",
" \n",
" Derive $ \\hat{\\theta} $, the model parameter that minimizes the mean squared\n",
" loss. Then, fit this model on the data and compare the test set RMSE against\n",
" the other models. How does it compare?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- (Needs background from Chapter {numref}`%s `.) For Model 3,\n",
" we fit a calibration model, then inverted it to find the prediction model.\n",
" Fit a prediction model directly, *without* fitting a calibration model.\n",
" You might be surprised to see that the RMSE of this model\n",
" is lower than using Model 3.\n",
" Why will the training set RMSE of the direct linear regression model\n",
" *always* be lower than inverting a calibration model?\n",
" Why might we prefer the calibration model anyway?"
]
}
],
"metadata": {
"celltoolbar": "Tags",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}