{ "cells": [ { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "import sys\n", "import os\n", "if not any(path.endswith('textbook') for path in sys.path):\n", " sys.path.append(os.path.abspath('../../..'))\n", "from textbook_utils import *" ] }, { "cell_type": "markdown", "metadata": { "user_expressions": [] }, "source": [ "(ch:pa_eda)=\n", "# Exploring PurpleAir and AQS Measurements " ] }, { "cell_type": "markdown", "metadata": { "user_expressions": [] }, "source": [ "Let's explore the cleaned dataset of matched AQS and PurpleAir PM2.5 readings and look for insights that might help us in modeling.\n", "Our main interest is in the relationship between the two sources of air quality measurements. But we want to keep in mind the scope of the data, such as how these data are situated in time and place. We learned from our data cleaning that we are working with daily averages of PM2.5 for a couple of years and that we have data from dozens of locations across the US." ] }, { "cell_type": "markdown", "metadata": { "user_expressions": [] }, "source": [ "First, we review the entire cleaned dataframe:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "csv_file = 'data/cleaned_purpleair_aqs/Full24hrdataset.csv'\n", "usecols = ['Date', 'ID', 'region', 'PM25FM', 'PM25cf1', 'TempC', 'RH', 'Dewpoint']\n", "# Rename by column name rather than by position: read_csv returns the\n", "# usecols columns in file order, not in the order of the usecols list\n", "new_names = {'Date': 'date', 'ID': 'id', 'PM25FM': 'pm25aqs', 'PM25cf1': 'pm25pa',\n", " 'TempC': 'temp', 'RH': 'rh', 'Dewpoint': 'dew'}\n", "full_df = (pd.read_csv(csv_file, usecols=usecols, parse_dates=['Date'])\n", " .dropna()\n", " .rename(columns=new_names))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
" | date | id | region | pm25aqs | pm25pa | temp | rh | dew |\n",
"---|---|---|---|---|---|---|---|---|\n",
"0 | 2019-05-17 | AK1 | Alaska | 6.7 | 8.62 | 18.03 | 38.56 | 3.63 |\n",
"1 | 2019-05-18 | AK1 | Alaska | 3.8 | 3.49 | 16.12 | 49.40 | 5.44 |\n",
"2 | 2019-05-21 | AK1 | Alaska | 4.0 | 3.80 | 19.90 | 29.97 | 1.73 |\n",
"... | ... | ... | ... | ... | ... | ... | ... | ... |\n",
"12427 | 2019-02-20 | WI6 | North | 15.6 | 25.30 | 1.71 | 65.78 | -4.08 |\n",
"12428 | 2019-03-04 | WI6 | North | 14.0 | 8.21 | -14.38 | 48.21 | -23.02 |\n",
"12429 | 2019-03-22 | WI6 | North | 5.8 | 9.44 | 5.08 | 52.20 | -4.02 |\n",
"12246 rows × 8 columns\n", "