{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"# Reference: https://jupyterbook.org/interactive/hiding.html\n",
"# Use {hide, remove}-{input, output, cell} tags to hide content\n",
"\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"%matplotlib inline\n",
"import ipywidgets as widgets\n",
"from ipywidgets import interact, interactive, fixed, interact_manual\n",
"from IPython.display import display\n",
"\n",
"sns.set()\n",
"sns.set_context('talk')\n",
"np.set_printoptions(threshold=20, precision=2, suppress=True)\n",
"pd.set_option('display.max_rows', 7)\n",
"pd.set_option('display.max_columns', 8)\n",
"pd.set_option('display.precision', 2)\n",
"# This option stops scientific notation for pandas\n",
"# pd.set_option('display.float_format', '{:.2f}'.format)\n",
"\n",
"def display_df(df, rows=pd.options.display.max_rows,\n",
" cols=pd.options.display.max_columns):\n",
" with pd.option_context('display.max_rows', rows,\n",
" 'display.max_columns', cols):\n",
" display(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. In cluster sampling, the population is divided into non-overlapping subgroups called clusters, which tend to be smaller than strata. We take a simple random sample (SRS) of the clusters and include every unit of each selected cluster in the sample. Use our urn analogy to express cluster sampling. As a simple example, suppose our population of $7$ starship prototypes is divided into $4$ clusters as follows: $\\left(A,B\\right)~ \\left(C, D\\right)~ \\left(E, F\\right) ~ \\left(G\\right).$ Suppose we take an SRS of $2$ clusters.\n",
"    1. List all of the possible samples that might result.\n",
"    2. What is the chance that $A$ is in the sample?\n",
"    3. What is the chance that $A$, $C$, and $E$ are in the sample?\n",
"\n",
"Cluster sampling has the distinct advantage of making sample collection easier. For example, it is much easier to poll 100 homes of 2 to 4 people each than to poll 300 individuals. However, since people in a cluster tend to be similar to one another, we need to keep the sampling procedure in mind as we generalize from the sample to the population.\n",
"\n",
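"One way to check your answers is to enumerate the urn directly. In this sketch, which is only an illustration and not code from the text, each cluster is a single marble, so an SRS of $2$ clusters is a draw of $2$ marbles without replacement, and each of the $\\binom{4}{2} = 6$ pairs of clusters is equally likely. The names `clusters`, `samples`, and `chance` are hypothetical.\n",
"\n",
"```python\n",
"from itertools import combinations\n",
"from fractions import Fraction\n",
"\n",
"# Clusters from the exercise; each tuple is one marble in the urn (hypothetical names)\n",
"clusters = [('A', 'B'), ('C', 'D'), ('E', 'F'), ('G',)]\n",
"\n",
"# SRS of 2 clusters: every pair of marbles is equally likely,\n",
"# and the sample contains all units of both chosen clusters\n",
"samples = [set(c1) | set(c2) for c1, c2 in combinations(clusters, 2)]\n",
"\n",
"def chance(units):\n",
"    # Fraction of the equally likely samples that contain every given unit\n",
"    hits = sum(1 for s in samples if set(units) <= s)\n",
"    return Fraction(hits, len(samples))\n",
"\n",
"for s in samples:\n",
"    print(sorted(s))\n",
"print(chance('A'))\n",
"print(chance('ACE'))\n",
"```\n",
"\n",
"Because a sample is always a union of whole clusters, $A$ and $B$ are either both in the sample or both out of it.\n",
"\n",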
"2. Systematic sampling is another popular technique. To start, the population is ordered, and the first unit is selected at random from the first $k$ elements. Then every $k^{th}$ unit after that is placed in the sample. As a simple example, suppose our population of $7$ prototypes is ordered alphabetically, and we select one of the first two prototypes, $A$ or $B$, at random and then every second element after that.\n",
"    1. List all of the possible samples that might result.\n",
"    2. What is the chance that $A$ is in the sample?\n",
"    3. What is the chance that $A$ and $B$ are in the sample? $A$ and $C$?\n",
"\n",
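"The same enumeration works for the systematic design. This sketch assumes the random start is chosen from the first $k = 2$ prototypes, so there are only $k$ equally likely samples; `population`, `k`, `samples`, and `chance` are hypothetical names.\n",
"\n",
"```python\n",
"from fractions import Fraction\n",
"\n",
"# Population ordered alphabetically, as in the exercise\n",
"population = ['A', 'B', 'C', 'D', 'E', 'F', 'G']\n",
"k = 2  # take every k-th unit (assumed from 'every second element')\n",
"\n",
"# One sample per possible random start among the first k units;\n",
"# each start, and hence each sample, is equally likely\n",
"samples = [population[start::k] for start in range(k)]\n",
"\n",
"def chance(units):\n",
"    # Fraction of the equally likely samples that contain every given unit\n",
"    hits = sum(1 for s in samples if set(units) <= set(s))\n",
"    return Fraction(hits, len(samples))\n",
"\n",
"for s in samples:\n",
"    print(s)\n",
"print(chance('A'), chance('AB'), chance('AC'))\n",
"```\n",
"\n",
"With only $k$ possible samples, some pairs of units can never be sampled together, unlike in an SRS.\n",
"\n",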
"In an intercept survey, a pop-up window asks you to complete a brief questionnaire. If every $k^{th}$ visitor to a website is asked to complete the survey, then we have a systematic sample. Here the population consists of visits to the site, and the ordering for systematic sampling is the order of the visits. It seems reasonable to expect that this ordering wouldn't introduce a selection bias into the sampling process."
]
},
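{
"cell_type": "markdown",
"metadata": {},
"source": [
"The selection rule for an intercept survey can be sketched in a few lines. This is an illustration under assumed values, not data from a real site: the 1,000 visits, the interval `k = 50`, and the random seed are all hypothetical. The second half checks that every visit falls in exactly one of the $k$ possible samples, so each visit has the same chance, $1/k$, of being surveyed.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(42)  # hypothetical seed\n",
"\n",
"# Hypothetical log of 1,000 visits, numbered in the order they arrive\n",
"n_visits = 1000\n",
"k = 50  # survey every 50th visit (an assumed value)\n",
"\n",
"# Random start among the first k visits, then every k-th visit after that\n",
"start = rng.integers(k)\n",
"surveyed = np.arange(start, n_visits, k)\n",
"print(start, len(surveyed))\n",
"\n",
"# Each of the k possible starts gives one possible sample; count how many\n",
"# of those samples contain each visit\n",
"all_samples = [np.arange(s, n_visits, k) for s in range(k)]\n",
"counts = np.bincount(np.concatenate(all_samples), minlength=n_visits)\n",
"print(counts.min(), counts.max())\n",
"```"
]
},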
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"celltoolbar": "Tags",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}