Lecture
Contributors: Alicia Wang, Conner Swenberg
Last updated
Contributors: Alicia Wang, Conner Swenberg
Last updated
We have been using raw SQL as our main method of retrieving data from the database. It has been working well for our simpler applications, but there are a few reasons why we may not want to directly make SQL queries going forward.
SQL, while written with the intent of being user-friendly, can have some serious readability issues. Queries can get complicated when multiple tables and sub-queries are involved. Can you immediately tell what the following query does?
The nested sub-queries, conditions, and renaming of temporary tables in complex queries are difficult for a programmer to easily read.
Our server application is written in Python but we are using a supporting Python package to execute raw SQL commands. If the database management system raises an exception (e.g. attempting to insert a NULL value into a NOT NULL column), our application won’t automatically handle the error and the server may crash
One of the biggest drawbacks to using raw SQL in our codebase is possibility of vulnerabilities to malicious attacks. One such vulnerability is called a SQL injection, where a specially crafted input is used to trick SQL semantics and allow database access to an external source. The attacker may be able to query unauthorized data or corrupt or delete existing data. Suppose we have a login for an application. To use the application, a user needs to login with their e-mail and password.
When the user submits their login information to the client, the client will send the data over to the backend server to check if this login is valid. The server will want to see if this email-password combination exists in the database, and may query its User table like so:
Suppose the attacker submits the following login information:
email = xxx@xxx.xxx
password = xxx') OR 1 = 1 --]
These values are inserted into the placeholder values '$email' and '$password'. Then the query will look like this with the placeholders filled in:
Now the query does something completely unintended. The password field is prematurely ended with xxx') and an additional WHERE condition is added: OR 1 = 1. The double hyphen -- designates a comment in SQL, so everything after it is ignored. When the WHERE conditions are evaluated by SQL, the query looks like this:
The email and password conditions evaluates to FALSE as that email and password combination does not exist in the table. However, 1 = 1 will always evaluate to TRUE. Because of the additional OR clause, the entire WHERE condition will then evaluate to TRUE
SELECT will return only rows where the WHERE condition evaluates to TRUE, so in this case, all rows will be returned. The attacker now has data from the entire User table. Uh oh!
To avoid the previously mentioned issues, we can use Object Relational Mappers (ORMs) to connect to our database. You may have noticed that the relational model we use for our databases are very similar to the object-oriented paradigm we learned in CS 2110. ORMs will allow us to query and manipulate data from a database using objects to model the data. In effect, they represent a “database of objects” that can be used in the language of our backend (in our case, Python). We can keep data consistent between our internal server and our database.
Pros of ORMs
Cons of ORMs
Readable code that resembles query reasoning
Increased amount of application code
Same-language database and application logic
Abstracted fundamentals of SQL
Under-the-hood connection handling
Slower query performance