5. Relational Databases and SQL

In Data 8, you worked with data stored in CSV files. However, CSV files are inconvenient in many real-world scenarios. Data scientists often work in teams to analyze a shared dataset, and that dataset may change frequently. For instance, an astrophysics group might receive new telescope data every hour. Instead of downloading a new CSV file each hour, data scientists prefer shared data storage that always reflects the most up-to-date data.

Database systems are software systems specifically designed for large-scale data storage and retrieval. Industry, academic research, and governments all rely on database systems. One common and useful type of database system is a relational database management system (RDBMS). These systems allow data scientists to use a query language called SQL to quickly retrieve and process large amounts of data. In this chapter, we introduce the relational database model and SQL.
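
To give a first taste of what querying an RDBMS looks like, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module. The `observations` table, its columns, and its rows are made up for illustration, and a real shared database would typically run on a server rather than in memory.

```python
import sqlite3

# Open a temporary in-memory database (assumption: a stand-in for a shared RDBMS).
conn = sqlite3.connect(":memory:")

# Hypothetical table of telescope readings; names and values are illustrative only.
conn.execute(
    "CREATE TABLE observations (telescope TEXT, hour INTEGER, brightness REAL)"
)
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [("A", 1, 10.0), ("A", 2, 14.0), ("B", 1, 7.0), ("B", 2, 9.0)],
)

# A single SQL query asks the database to compute the average brightness
# for each telescope, so no CSV file has to be downloaded or parsed.
rows = conn.execute(
    "SELECT telescope, AVG(brightness) "
    "FROM observations "
    "GROUP BY telescope "
    "ORDER BY telescope"
).fetchall()

print(rows)
conn.close()
```

The query describes the result we want (one average per telescope) and leaves it to the database system to decide how to compute it.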