Lecture

Contributors: Alicia Wang, Conner Swenberg

Lecture Slides

Integrating a Database

We have been using raw SQL as our main method of retrieving data from the database. It has been working well for our simpler applications, but there are a few reasons why we may not want to directly make SQL queries going forward.

Readability

SQL, while written with the intent of being user-friendly, can have some serious readability issues. Queries can get complicated when multiple tables and sub-queries are involved. Can you immediately tell what the following query does?

SELECT Country.name, T.language 
FROM country Country, countrylanguage CL2, 
    (SELECT tmp.language, code FROM country, 
        (SELECT CL.language, MAX(C.surfacearea) 
        FROM country C, countrylanguage CL 
        WHERE C.code=CL.countrycode 
        GROUP BY CL.language) as tmp 
    WHERE (surfacearea = tmp.max)) T 
    WHERE T.code = Country.code 
        AND T.language = CL2.language 
        AND Country.code = CL2.countrycode;

The nested sub-queries, conditions, and renaming of temporary tables in complex queries are difficult for a programmer to easily read.

Runtime Errors

Our server application is written in Python but we are using a supporting Python package to execute raw SQL commands. If the database management system raises an exception (e.g. attempting to insert a NULL value into a NOT NULL column), our application won’t automatically handle the error and the server may crash

Vulnerabilities

One of the biggest drawbacks to using raw SQL in our codebase is possibility of vulnerabilities to malicious attacks. One such vulnerability is called a SQL injection, where a specially crafted input is used to trick SQL semantics and allow database access to an external source. The attacker may be able to query unauthorized data or corrupt or delete existing data. Suppose we have a login for an application. To use the application, a user needs to login with their e-mail and password.

When the user submits their login information to the client, the client will send the data over to the backend server to check if this login is valid. The server will want to see if this email-password combination exists in the database, and may query its User table like so:

SELECT * FROM User WHERE email = '$email' AND password = md5('$password');

Suppose the attacker submits the following login information:

email = xxx@xxx.xxx
password = xxx') OR 1 = 1 --]

These values are inserted into the placeholder values '$email' and '$password'. Then the query will look like this with the placeholders filled in:

SELECT * FROM User 
WHERE email = 'xxx@xxx.xxx' AND password = md5('xxx') OR 1 = 1 --]');

Now the query does something completely unintended. The password field is prematurely ended with xxx') and an additional WHERE condition is added: OR 1 = 1. The double hyphen -- designates a comment in SQL, so everything after it is ignored. When the WHERE conditions are evaluated by SQL, the query looks like this:

SELECT * FROM User WHERE FALSE AND FALSE OR TRUE;

The email and password conditions evaluates to FALSE as that email and password combination does not exist in the table. However, 1 = 1 will always evaluate to TRUE. Because of the additional OR clause, the entire WHERE condition will then evaluate to TRUE

SELECT * FROM User WHERE TRUE;

SELECT will return only rows where the WHERE condition evaluates to TRUE, so in this case, all rows will be returned. The attacker now has data from the entire User table. Uh oh!

Object Relational Mapping

To avoid the previously mentioned issues, we can use Object Relational Mappers (ORMs) to connect to our database. You may have noticed that the relational model we use for our databases are very similar to the object-oriented paradigm we learned in CS 2110. ORMs will allow us to query and manipulate data from a database using objects to model the data. In effect, they represent a “database of objects” that can be used in the language of our backend (in our case, Python). We can keep data consistent between our internal server and our database.

Pros of ORMs

Cons of ORMs

Readable code that resembles query reasoning

Increased amount of application code

Same-language database and application logic

Abstracted fundamentals of SQL

Under-the-hood connection handling

Slower query performance