# Clear previously defined variables
%reset -f

# Set directory for data loading to work properly
import os

Web Technologies

Before the Internet, data scientists had to physically move hard disk drives to share data with others. Now, we can freely retrieve datasets from computers across the world.

Although we often use the Internet simply to download and share data files, web pages themselves contain huge amounts of information in the form of text, images, and videos. By learning web technologies, we can use the Web as a data source. In this chapter, we introduce HTTP, the primary communication protocol for the Web, and XML/HTML, the primary document formats for web pages.
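As a small preview of treating a web page as data, the sketch below pulls the list items out of an HTML document using Python's standard-library `html.parser` module. The page contents in `PAGE` and the `ListItemCollector` class are hypothetical, made up for illustration; the example parses a string directly rather than fetching anything over HTTP.

```python
from html.parser import HTMLParser

# Hypothetical page contents, invented for illustration only.
PAGE = """
<html>
  <body>
    <h1>Rainfall</h1>
    <ul>
      <li>Monday: 3 mm</li>
      <li>Tuesday: 0 mm</li>
    </ul>
  </body>
</html>
"""


class ListItemCollector(HTMLParser):
    """Collect the text found inside every <li> element."""

    def __init__(self):
        super().__init__()
        self.in_item = False  # True while the parser is inside an <li>
        self.items = []       # one string per <li> seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        # Text outside of <li> elements (such as the heading) is ignored.
        if self.in_item:
            self.items[-1] += data


collector = ListItemCollector()
collector.feed(PAGE)
items = [item.strip() for item in collector.items]
print(items)
```

The same idea applies to a page retrieved over HTTP: once the HTML text is in hand, its tags give us structure we can use to extract the pieces we care about.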