The Power of Common Table Expressions (CTEs) in SQL
Discover the power of CTEs in SQL: streamline complex queries, enhance performance, and choose wisely between CTEs, temp tables, and subqueries for cleaner code.
Read this article to know:
- What is a Common Table Expression (CTE)?
- Why use CTEs?
- When to use CTEs?
- How to create a CTE?
- Syntax
- Example
- Performance considerations
- CTE vs. subqueries
- CTE vs. temporary table
- Final thoughts
What Is a Common Table Expression (CTE)?
A common table expression (CTE) is a named temporary result set that exists within the scope of a single statement and that can be referred to later within that statement, possibly multiple times.
You may think of a CTE as a table subquery. A table subquery, also sometimes referred to as derived table, is a query that is used as the starting point to build another query.
CTEs are a standard SQL feature and essentially temporary named result sets. While CTEs are often used like temporary result sets, they're not stored as temporary tables in the database. They're more akin to query-level views, existing only during the execution of the query they're part of.
CTEs are a versatile tool for various SQL operations, because they are of two kinds:
- Non-recursive for straightforward tasks; it is basically a query-local view,
- Recursive for handling hierarchical data, such as organisational structures or category trees.
SQL is generally poor at recursive structures. CTEs permit a query to reference itself. In fact, enabling recursion within SQL was the main inspiration for CTEs.
CTEs first appeared in the SQL standard in 1999 (ISO SQL:1999, also known as SQL3), and the first implementations began appearing in 2001. Various database systems started incorporating CTEs into their SQL implementations in the years following the standard's release.
Notably, MySQL as the second popular DBMS in the world, incorporated CTEs only in 2018 starting from version 8.0.
Why Use CTEs?
CTEs were intended to overcome some practical limitations of subqueries.
The advantages of using CTEs:
- Improved query readability and structure, which facilitates easier maintenance and understanding,
- Enabling the construction of complex queries in a more organised manner,
- Support recursive queries essential for hierarchical data,
- Can enhance performance in certain scenarios by eliminating the need for repeated subqueries or complex joins.
In general, CTEs serve as a powerful tool for data analysts and developers, streamlining SQL operations and contributing to cleaner, more efficient code.
When to Use CTEs?
CTE use cases:
- Multiple reference in the same statement to a subquery within a larger query, simplifying complex queries by breaking them down into manageable parts.
- Multiple join on the same data set within a single query.
- Facilitation of advanced data analysis tasks, such as calculating running totals, executing multiple aggregations, or performing recursive joins.
- A convenient, temporary alternative to creating views in the database, eliminating the need to store definitions in metadata.
How to Create a CTE?
- Initiate a CTE using
WITH
clause. - Provide a name for the result soon-to-be defined query (the reverse of how aliasing usually works in SQL).
- After assigning a name, follow with
AS
- Optional: Specify column names.
- Define the query to produce the desired result set and use parentheses around the derived table’s query itself.
- If it is needed to add more derived tables, we can add more as long as we separate each one with a comma, and repeat steps 2-5.
- Reference the above-defined CTE(s) in a subsequent query.
Syntax
Example
The following query answers the business question: “Who are the top 5 users on the platform by total payments in 2023 in each of the top 5 countries?”
Performance Considerations
CTEs themselves don't inherently enhance performance. They are recalculated each time they're referenced, which can affect performance. Therefore:
- When using CTEs, it's important to ensure they're used judiciously for readability and structure without impacting performance negatively.
- CTEs should not replace all subqueries, especially when performance is a priority. They're ideal for complex data hierarchies and cases requiring multiple subquery references.
- Testing and comparing execution plans with and without CTEs can help determine the most efficient approach for a specific scenario.
However, by structuring complex queries more efficiently and avoiding repeated subqueries, CTEs can indirectly contribute to performance optimisation by simplifying the execution plan.
CTE vs. Subqueries
Choosing between a CTE and a subquery often comes down to readability and the specific use case.
- CTEs can make complex queries more understandable and are particularly useful for recursive operations or when a temporary result set is referenced multiple times within a query.
- Subqueries, on the other hand, might perform better in simple cases since they are usually executed once.
CTE vs. Temporary Table
Choosing between CTEs and temporary tables depends on the specific needs of your query and database environment.
- CTEs are generally preferred for enhancing readability and structure in complex queries, especially when dealing with hierarchical data or when a temporary result is needed just for the duration of a query.
- Temporary tables might be better for performance-intensive operations, particularly when dealing with very large datasets, as they can be indexed and persisted across multiple queries.
Final Thoughts
When I was first introduced to the concept of CTE and started to use it, I was fascinated by CTE’s ability to simplify the development of sophisticated data transformations and aggregations in my data analysis tasks. CTE is a great alternative to break down complex queries, especially complex joins and subqueries, into well-structured and manageable parts.
However, it’s important not to forget about query performance, as CTEs can negatively affect it. The main aim, as usual, is to keep balance between readability and efficiency in SQL queries.
Thus, experimenting and comparing execution plans with and without CTEs help to determine the most efficient approach for a specific scenario.