{ "cells": [ { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "import sys\n", "import os\n", "if not any(path.endswith('textbook') for path in sys.path):\n", " sys.path.append(os.path.abspath('../../..'))\n", "from textbook_utils import *" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "(sec:sql_transforming)=\n", "# Transforming and Common Table Expressions\n", "\n", "In this section, we show how to call functions to transform columns of data using built-in SQL functions. We also demonstrate how to use common table expressions to build up complex queries from simpler ones. As usual, we start by loading the database:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Set up connection to database\n", "import sqlalchemy\n", "db = sqlalchemy.create_engine('sqlite:///babynames.db')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## SQL Functions\n", "\n", "SQLite provides a variety of _scalar functions_, or functions that transform single data\n", "values. When called on a column of data, SQLite will apply these functions on\n", "each value in the column. In contrast, aggregation functions like `SUM` and\n", "`COUNT` take a column of values as input and compute a single value as output.\n", "\n", "SQLite provides a comprehensive list of the built-in scalar functions in [its online documentation][funcs]. For instance, to find the number of characters in each name, we use the\n", "`LENGTH` function:\n", "\n", "[funcs]: https://www.sqlite.org/lang_corefunc.html" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | Name | \n", "LENGTH(Name) | \n", "
---|---|---|
0 | \n", "Liam | \n", "4 | \n", "
1 | \n", "Noah | \n", "4 | \n", "
2 | \n", "Oliver | \n", "6 | \n", "
... | \n", "... | \n", "... | \n", "
7 | \n", "Lucas | \n", "5 | \n", "
8 | \n", "Henry | \n", "5 | \n", "
9 | \n", "Alexander | \n", "9 | \n", "
10 rows × 2 columns
\n", "\n", " | Name | \n", "LENGTH(Name) | \n", "AVG(Count) | \n", "
---|---|---|---|
0 | \n", "Liam | \n", "4 | \n", "174.47 | \n", "
\n", " | Name | \n", "SUBSTR(Name, 1, 1) | \n", "
---|---|---|
0 | \n", "Liam | \n", "L | \n", "
1 | \n", "Noah | \n", "N | \n", "
2 | \n", "Oliver | \n", "O | \n", "
... | \n", "... | \n", "... | \n", "
7 | \n", "Lucas | \n", "L | \n", "
8 | \n", "Henry | \n", "H | \n", "
9 | \n", "Alexander | \n", "A | \n", "
10 rows × 2 columns
\n", "\n", " | Name | \n", "Sex | \n", "Count | \n", "Year | \n", "Firsts | \n", "
---|---|---|---|---|---|
0 | \n", "Liam | \n", "M | \n", "19659 | \n", "2020 | \n", "L | \n", "
1 | \n", "Noah | \n", "M | \n", "18252 | \n", "2020 | \n", "N | \n", "
2 | \n", "Oliver | \n", "M | \n", "14147 | \n", "2020 | \n", "O | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
7 | \n", "Lucas | \n", "M | \n", "11281 | \n", "2020 | \n", "L | \n", "
8 | \n", "Henry | \n", "M | \n", "10705 | \n", "2020 | \n", "H | \n", "
9 | \n", "Alexander | \n", "M | \n", "10151 | \n", "2020 | \n", "A | \n", "
10 rows × 5 columns
\n", "\n", " | Name | \n", "Sex | \n", "Count | \n", "Year | \n", "Firsts | \n", "
---|---|---|---|---|---|
0 | \n", "Liam | \n", "M | \n", "19659 | \n", "2020 | \n", "L | \n", "
1 | \n", "Noah | \n", "M | \n", "18252 | \n", "2020 | \n", "N | \n", "
2 | \n", "Oliver | \n", "M | \n", "14147 | \n", "2020 | \n", "O | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
7 | \n", "Lucas | \n", "M | \n", "11281 | \n", "2020 | \n", "L | \n", "
8 | \n", "Henry | \n", "M | \n", "10705 | \n", "2020 | \n", "H | \n", "
9 | \n", "Alexander | \n", "M | \n", "10151 | \n", "2020 | \n", "A | \n", "
10 rows × 5 columns
\n", "\n", " | Firsts | \n", "Year | \n", "Count | \n", "
---|---|---|---|
0 | \n", "L | \n", "1880 | \n", "12799 | \n", "
1 | \n", "L | \n", "1881 | \n", "12770 | \n", "
2 | \n", "L | \n", "1882 | \n", "14923 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "
138 | \n", "L | \n", "2018 | \n", "246251 | \n", "
139 | \n", "L | \n", "2019 | \n", "249315 | \n", "
140 | \n", "L | \n", "2020 | \n", "239760 | \n", "
141 rows × 3 columns
\n", "