{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "C9asOtmc0K6v" }, "source": [ "# HTTP\n", "\n", "HTTP (AKA **H**yper**T**ext **T**ransfer **P**rotocol) is a *request-response* protocol that allows one computer to talk to another over the Internet." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "PfSmQf_a0K6w" }, "source": [ "## Requests and Responses\n", "\n", "The Internet allows computers to send text to one another, but does not impose any restrictions on what that text contains. HTTP defines a structure on the text communication between one computer (client) and another (server). In this protocol, a client submits a *request* to a server, a specially formatted text message. The server sends a text *response* back to the client.\n", "\n", "The command line tool `curl` gives us a simple way to send HTTP requests. In the output below, lines starting with `>` indicate the text sent in our request; the remaining lines are the server's response." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "PfSmQf_a0K6w" }, "source": [ "```bash\n", "$ curl -v https://httpbin.org/html\n", "```" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "PfSmQf_a0K6w" }, "source": [ "```\n", "> GET /html HTTP/1.1\n", "> Host: httpbin.org\n", "> User-Agent: curl/7.55.1\n", "> Accept: */*\n", "> \n", "< HTTP/1.1 200 OK\n", "< Connection: keep-alive\n", "< Server: meinheld/0.6.1\n", "< Date: Wed, 11 Apr 2018 18:15:03 GMT\n", "< \n", "\n", " \n", "

Herman Melville - Moby-Dick

\n", "

\n", " Availing himself of the mild...\n", "

\n", " \n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Running the `curl` command above causes the client's computer to construct a text message that looks like:\n", "\n", "```\n", "GET /html HTTP/1.1\n", "Host: httpbin.org\n", "User-Agent: curl/7.55.1\n", "Accept: */*\n", "{blank_line}\n", "```\n", "\n", "This message follows a specific format: it starts with `GET /html HTTP/1.1`, which indicates that the message is an HTTP `GET` request for the `/html` page. The three lines that follow are HTTP headers, additional information that `curl` sends to the server along with the request. Each HTTP header has the format `{name}: {value}`. Finally, the blank line at the end of the message tells the server that the headers are complete; since this request has no body, the message ends after the three headers. Note that we've marked the blank line with `{blank_line}` in the snippet above; in the actual message, `{blank_line}` is replaced with a blank line." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The client's computer then uses the Internet to send this message to the web server at `https://httpbin.org`. The server processes the request and sends the following response:\n", "\n", "```\n", "HTTP/1.1 200 OK\n", "Connection: keep-alive\n", "Server: meinheld/0.6.1\n", "Date: Wed, 11 Apr 2018 18:15:03 GMT\n", "{blank_line}\n", "```\n", "\n", "The first line of the response states that the request completed successfully. The following three lines are the HTTP response headers, additional information that the server sends back to the client. Finally, the blank line at the end of the message tells the client that the server has finished sending its response headers and will next send the response body:\n", "\n", "```\n", "\n", " \n", "

Herman Melville - Moby-Dick

\n", "

\n", " Availing himself of the mild...\n", "

\n", " \n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This HTTP protocol is used in almost every application that interacts with the Internet. For example, visiting https://httpbin.org/html in your web browser makes the same basic HTTP request as the `curl` command above. Instead of displaying the response as plain text as we have above, your browser recognizes that the text is an HTML document and will display it accordingly.\n", "\n", "In practice, we will not write out full HTTP requests in text. Instead, we use tools like `curl` or Python libraries to construct requests for us." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## In Python" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "YkeIABvo8PqG" }, "source": [ "The Python **requests** library allows us to make HTTP requests in Python. The code below makes the same HTTP request as running `curl -v https://httpbin.org/html`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 346, "status": "ok", "timestamp": 1523471119234, "user": { "displayName": "TIFFANY THERESA JANN", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "107371321413131637774" }, "user_tz": 420 }, "id": "bWtSMNiJ8PUu", "outputId": "ad66ba9e-17d8-4a48-e09e-7434f9995974" }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import requests\n", "\n", "url = \"https://httpbin.org/html\"\n", "response = requests.get(url)\n", "response" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "yw9leEpU0K62" }, "source": [ "### The Request" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "i8o7txd68Ln-" }, "source": [ "Let's take a closer 
look at the request we made. We can access the original request using the `response` object; we display the request's HTTP headers below:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "cellView": "code", "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 86 }, "colab_type": "code", "executionInfo": { "elapsed": 340, "status": "ok", "timestamp": 1523471831894, "user": { "displayName": "TIFFANY THERESA JANN", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "107371321413131637774" }, "user_tz": 420 }, "id": "H7uZ1npO0K67", "outputId": "6380dea2-0764-4841-bc1c-06acd2539b4b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "User-Agent: python-requests/2.12.4\n", "Accept-Encoding: gzip, deflate\n", "Accept: */*\n", "Connection: keep-alive\n" ] } ], "source": [ "request = response.request\n", "for key in request.headers: # The request headers are stored in a dictionary-like object.\n", "    print(f'{key}: {request.headers[key]}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Every HTTP request has a type, called its method. In this case, we used a `GET` request, which retrieves information from a server." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'GET'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "request.method" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "lxI7P4jm0K7A" }, "source": [ "### The Response\n", "\n", "Let's examine the response we received from the server. First, we will print the response's HTTP headers." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "38aHtUBe0K7B", "outputId": "28e49080-9321-41ae-acfc-8537ca7fa43a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connection: keep-alive\n", "Server: gunicorn/19.7.1\n", "Date: Wed, 25 Apr 2018 18:32:51 GMT\n", "Content-Type: text/html; charset=utf-8\n", "Content-Length: 3741\n", "Access-Control-Allow-Origin: *\n", "Access-Control-Allow-Credentials: true\n", "X-Powered-By: Flask\n", "X-Processed-Time: 0\n", "Via: 1.1 vegur\n" ] } ], "source": [ "for key in response.headers:\n", " print(f'{key}: {response.headers[key]}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An HTTP response contains a status code, a special number that indicates whether the request succeeded or failed. The status code `200` indicates that the request succeeded." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 306, "status": "ok", "timestamp": 1523471565520, "user": { "displayName": "TIFFANY THERESA JANN", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "107371321413131637774" }, "user_tz": 420 }, "id": "vTFq3Tdm0K7H", "outputId": "ebd84dfe-04ff-4da2-e452-ce227b2f0dbc" }, "outputs": [ { "data": { "text/plain": [ "200" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response.status_code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we display the first 100 characters of the response's content (the entire response content is too long to display nicely here)." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 54 }, "colab_type": "code", "executionInfo": { "elapsed": 336, "status": "ok", "timestamp": 1523471596644, "user": { "displayName": "TIFFANY THERESA JANN", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "107371321413131637774" }, "user_tz": 420 }, "id": "NRTwNVJa-6wG", "outputId": "235943c7-8a4e-42fb-eb8b-d37907169432" }, "outputs": [ { "data": { "text/plain": [ "'\\n\\n \\n \\n \\n

Herman Melville - Moby-Dick

\\n\\n '" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response.text[:100]" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "jWRTnKQ90K7V" }, "source": [ "## Types of Requests\n", "\n", "The request we made above was a `GET` HTTP request. There are multiple HTTP request types; the two most important are `GET` and `POST` requests." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "QJ5v9MQ70K7W" }, "source": [ "### GET Requests\n", "\n", "The `GET` request is used to retrieve information from the server. Since your web browser makes a `GET` request whenever you enter a URL into its address bar, `GET` requests are the most common type of HTTP request.\n", "\n", "`curl` uses `GET` requests by default, so running `curl https://www.google.com/` makes a `GET` request to `https://www.google.com/`." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "CEmCrcQ10K7X" }, "source": [ "### POST Requests\n", "\n", "The `POST` request is used to send information from the client to the server. For example, some web pages contain forms for the user to fill out, such as a login form. When the user clicks the \"Submit\" button, most web browsers make a `POST` request to send the form data to the server for processing." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "CEmCrcQ10K7X" }, "source": [ "Let's look at an example of a `POST` request that sends `'sam'` as the parameter `'name'`. We can make this request by running **`curl -d 'name=sam' https://httpbin.org/post`** on the command line.\n", "\n", "Notice that our request has a body this time (filled with the parameters of the `POST` request), and the content of the response is different from our `GET` response from before." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Like HTTP headers, the data sent in a `POST` request uses a key-value format. 
In Python, we can make a `POST` request by calling `requests.post` and passing in a dictionary as the `data` argument." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<Response [200]>" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "post_response = requests.post(\"https://httpbin.org/post\",\n", "                              data={'name': 'sam'})\n", "post_response" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The server will respond with a status code to indicate whether the `POST` request completed successfully. In addition, the server will usually send a response body to display to the client." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "200" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "post_response.status_code" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'{\\n \"args\": {}, \\n \"data\": \"\", \\n \"files\": {}, \\n \"form\": {\\n \"name\": \"sam\"\\n }, \\n \"headers\": {\\n \"Accept\": \"*/*\", \\n \"Accept-Encoding\": \"gzip, deflate\", \\n \"Connection\": \"close\", \\n \"Content-Length\": \"8\", \\n \"Content-Type\": \"application/x-www-form-urlencoded\", \\n \"Host\": \"httpbin.org\", \\n \"User-Agent\": \"python-requests/2.12.4\"\\n }, \\n \"json\": null, \\n \"origin\": \"136.152.143.72\", \\n \"url\": \"https://httpbin.org/post\"\\n}\\n'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "post_response.text" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "0nkJHMtJ0K7b" }, "source": [ "## Types of Response Status Codes\n", "\n", "The previous HTTP responses had the HTTP status code `200`. This status code indicates that the request completed successfully. There are dozens of other HTTP status codes. 
Thankfully, they are grouped into categories to make them easier to remember:\n", "\n", "- **100s** - Informational: The request was received and processing is continuing *(e.g. 100 Continue, 102 Processing)*\n", "- **200s** - Success: The client's request was successful *(e.g. 200 OK, 202 Accepted)*\n", "- **300s** - Redirection: The requested URL is located elsewhere; the client may need to take further action *(e.g. 300 Multiple Choices, 301 Moved Permanently)*\n", "- **400s** - Client Error: The request itself was invalid or not permitted *(e.g. 400 Bad Request, 403 Forbidden, 404 Not Found)*\n", "- **500s** - Server Error: The server failed or was unable to perform the request *(e.g. 500 Internal Server Error, 503 Service Unavailable)*\n", "\n", "Let's look at examples of two of these error codes." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "5SANbo8w0K7c", "outputId": "a0bc9752-f9c0-44c9-8a14-758e38ca02d1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<Response [404]>\n" ] } ], "source": [ "# This page doesn't exist, so we get a 404 page not found error\n", "url = \"https://www.youtube.com/404errorwow\"\n", "errorResponse = requests.get(url)\n", "print(errorResponse)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "mtlnDxXR0K7d", "outputId": "52c7897a-ad4d-4e91-aba9-f13e3ebdcc68" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<Response [500]>\n" ] } ], "source": [ "# This specific page results in a 500 server error\n", "url = \"https://httpstat.us/500\"\n", "serverResponse = requests.get(url)\n", "print(serverResponse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "We have introduced HTTP, the basic communication method for applications that use the Web. 
Although the protocol defines a specific text format, we typically rely on tools to construct HTTP requests for us, such as the command line tool `curl` and the Python library `requests`." ] } ], "metadata": { "colab": { "collapsed_sections": [], "default_view": {}, "name": "ajkim@berkeley.edu_task1.ipynb", "provenance": [], "version": "0.3.2", "views": {} }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" }, "toc": { "nav_menu": {}, "number_sections": false, "sideBar": false, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }