The Anatomy of SQL: Navigating the Layers of Complex Query Execution
In my daily work as a software engineer, I find myself engaging with relational databases on a regular basis. Whether it’s through tools like DBeaver or by making raw queries in the terminal, interacting with databases is a fundamental part of my routine.
Now, there was a particular instance when I faced the challenge of constructing a more complex query. This one involved three joins, filtering, ordering, and grouping of a substantial number of records. It marked the first time I delved deeper into the inner workings of SQL. 🧐
🤔 What I discovered is that queries aren’t these monolithic entities. Instead, they are broken down into smaller execution steps, each following a specific order to produce the desired result.
If this idea seems a bit confusing, don’t worry! Let’s go through it together using the example below.👇
From the user’s perspective, queries appear to initiate from the initial clause and conclude with the final clause. Nonetheless, during execution, queries don’t follow a linear top-to-bottom reading pattern.
The table and query setup 🍽️
Let’s go trough an example, where I query users and their expenses (I’ve used this query once in my MyPFinance project 🤓) 👇
We will be running the following query on these tables 👇
SELECT
users.username,
SUM(expenses.expense_amount) AS expense_amount
FROM
users
JOIN
expenses ON users.user_id = expenses.user_id
WHERE
users.registration_date >= '2022-09-25'
AND expenses.expense_status = 'PAID'
GROUP BY
users.username
HAVING
SUM(expenses.expense_amount) > 500
ORDER BY
expense_amount DESC
LIMIT 5 OFFSET 1;
Order of Query Execution (In steps 🐾)
FROM clauses
Firstly, SQL identifies which tables are being used in the query. In our case, those are users & expenses.
...
FROM
users
JOIN
expenses ...
JOIN clauses
❗️Many times, we execute queries that don’t have JOIN clauses in them, so if that’s the case this step of execution is skipped!
Next up, the query language looks for the condition/s on which the tables are joined. In ours case, users are joined to expenses based on the user_id field.
JOIN expenses ON users.user_id = expenses.user_id
WHERE clauses
After all tables and their respective fields are joined, the WHERE clause is ran in order to filter out only the records of interest.
WHERE
users.registration_date >= '2022-09-25'
AND expenses.expense = 'PAID'
In our case, those are all users that have registered after the 25th of September 2023 (My BDay 🎊) and we want to extract only their PAID expenses.
GROUP BY clause
Now that we have the list of users and expenses we’re interested in, we utilize the GROUP BY clause in order to group together users witht he same username.
GROUP BY users.username
HAVING clause
In our example, we use it to make sure that all of the groups we put together (via the GROUP BY clause) have total expenses exceeding 500$.
HAVING SUM(expenses.expense_amount) > 500
🚨 The HAVING clause is used when the WHERE clause cannot be used with the aggregate functions (groups of records)!
The HAVING clause is applied to the filtered data and not the single rows.
SELECT clause
Look at how the first step got all the way down here. 👀
After all data manipulation we did with the above clauses, it’s time for the SELECT one to shine and specify which columns to include into the final result!
SELECT
users.username,
SUM(expenses.expense_amount) AS expense_amount
What we want is only the username and the amount of PAID expenses 🤑this user did.
ORDER BY clause
Now, that we have the columns of interest as the final result, we can order them in descending order by expense_amount!
ORDER BY expense_amount DESC;
We use the expense_amount alias defined by the SELECT clause!
LIMIT/OFFSET clauses
Finally, if you were to use pagination and wanted to put some limit on the data you want returned by the query ⬇️
LIMIT 5 OFFSET 1;
This makes sure you skip the first user and provide the data only for the next 5 users.
The result of running the query 🏁
We got all users that have registered after the 25th of September 2022 (imagine there was a feature launch then) and successfully spent more than 500$ (on it)! 🤩
As you can see, SQL query execution isn’t as straight forward as some may’ve expected. At least, I didn’t at first. 😅 We actually first get all data from the tables and then we implement filtering, SELECT and all else.
✍️ Basically, the order is as follows:
FROM -> JOIN -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT/OFFSET
💡 Important to note is that, while the logical order of the clauses in a SQL query remains the same, the way these clauses are executed and optimized may differ between different DBMS (such as PostgreSQL, MySQL, MariaDB)!
💚 Wish you all a productive day and week! Give a clap 👏 to this article so that I can get a quick dopamine hit when I see that someone found this article helpful! 🍭