Lecture
Contributors: Alicia Wang, Conner Swenberg
Last updated
Contributors: Alicia Wang, Conner Swenberg
Last updated
So far, we have discussed databases in the context of SQL and we use SQLite in our assignments. SQL databases are also known as relational databases! A key aspect of relational databases is that they have a table-based data structure that uses a strict, predefined schema. A powerful feature of relational databases is that we can join data from different tables, which allows for the representation of relationships between different data models.
Example
Suppose we have an application with users that can create posts. A database for this application may look like this:
If I have a user and post table, how do I keep track of who posted what? Given a post, we may want to know who the author of the post is. We can model this by having each post refer to the id of the user that authored it, like so:
The addition of this user-identifying column is the basis for the formation of a relationship between users and posts.
The relationship between users and posts in the previous example can be described as one-to-many. One user can author multiple posts, but each post can be written by only one person. There are many types of relationships between entities in the world, so there are many types of relationships we can describe with a relational database. There are three main types of relationships:
One-to-One
One-to-Many
Many-to-Many
A row x in table A is related to only row y in table B, and row y in table B is related only to row x. An example is Cornell students to NetIDs. Each student can only have one NetID, and each NetID can correspond only to one student. A one-to-one relationship is modeled with SQL by creating a foreign key with a unique constraint in table B that references the primary key of table A.
Example
In a Slack-like application, we want to keep track of people and their profiles. One person can only have one profile, and one profile can only belong to one person. A sample database may look like this:
Each profile row has its own primary key id, a status, and a foreign key person_id referencing the person that has that profile. In this example, the profile with id = 30 belongs to the person with id = 20, which is Conner.
Here are the SQL commands used to generate the Person and Profile tables:
Why is it not necessary that we have a column in the Person table specifying which profile a Person has? If we want to find the profile corresponding to Person.id = 10, we can instead make the following query:
The query will return (40, 'away', 10), which is exactly what we wanted. Thus, adding another column to Person will just result in redundant information.
A row in table A can be related to many rows in table B, but a row in table B can only be related to one row in table A.
The users and posts example is a one-to-many relationship. Like the one-to-one relationship, a one-to-many relation can by modeled by using a foreign key column in table B that references a primary key in table A. However, the values in the foreign key column don’t need to be unique, as we want many rows in table B to be related to one row in table A.
Example
In a Spotify-like application, we will want to keep track of albums and their songs. In this example, we’ll assume that one album can contain many songs, but one song can only belong to one album. Here is a sample database modeling this:
The album “IGOR” (with id = 105) has two songs: “EARFQUAKE” (id = 111) and “I THINK” (id = 112).
Here are the SQL commands used to generate the Album and Song tables:
Note that Song.album_id does not have a unique constraint - this allows for the “many” part of “one-to-many”. Again, we can select all the songs belonging to a specific album through a simple query:
This will return the songs “4:44” (id = 400) and “Moonlight” (id = 401) for the album “4:44” (id = 444).
Many rows in table A can be related to many rows in table B, and vice versa. More formally, row x in table A relates to many rows y1, y2, … in table B, and row y in table B relates to many rows x1, x2, … in table A.
An example would be posts and hashtags in a social media platform. A user’s post can contain many hashtags (#coding, #programmerhumor), and a hashtag can be associated with many posts (“How do I exit Vim?”, “I love CS 1998!”).
Example
Implementing a many-to-many relationship is less straightforward than the previous relationships. You may think of a few intuitive solutions, two of which are not ideal:
You can create a column in table A to store a list of IDs from table B and vice versa. However, deleting rows from one table can result in rows in the other table referencing non-existing rows.
You can create a new column for each new foreign key. This is space inefficient and may lead to many unnecessary NULL values in the added columns.
You can create an association table that relates the primary key of table A to the primary key of table B. This is the generally accepted solution that we will be using! Say that we want to model Canvas, a learning platform with students enrolled in multiple courses and courses containing multiple students. The database may look like:
To create a many-to-many relationship between rows in Student and rows in Course, we could create a new table with two columns: one referencing Student.id and one referencing Course.id:
This new table Join
, like its name implies, serves to join together students with courses. Each row in Join represents a relation between the student with Student.id = Join.student_id and the course with Course.id = Join.course_id. For example, the student with id = 1 is present in two rows in Join: one with course_id = 1110 and one with course_id = 1998. This tells us that student abc123 is enrolled in CS1110 and CS1998. On the other side, the course with id = 1110 is present in 3 rows in Join with its associated student_id being 1, 2, and 3 respectively. Then we know that the course CS1110 has students abc123, def456, and ghi789 enrolled in it.
Here are the SQL commands used to generate Student, Join, and Course:
Note that Join has its own primary key id in addition to its foreign keys course_id and student_id.