27 February 2024

How to Improve SQL Query Performance with Indexing?

Discover how smart indexing transforms SQL query performance, avoiding common mistakes and mastering best practices.


In my experience working with databases, I’ve found that proper use of indexes can significantly improve performance while improper use can lead to slower queries and wasted resources. Indeed, SQL professional literature often highlights indexing due to its considerable impact on the effectiveness of SQL queries.

Read this article to learn:

  • How an SQL index works
    • Index order
    • Indexes as constraints
    • Index limitations
  • Types of indexes
  • Index structures
  • The impact of indexing on query performance
  • Best practices for SQL indexing
  • Step-by-step guide to implementing indexing
    • How to create indexes in a SQL database
    • How to delete indexes in a SQL database
  • Final thoughts

How SQL Index Works

If you want to improve SQL query performance via indexing, the first step is understanding what an index actually is and how it works within SQL. This knowledge will help you make more intelligent choices about how to use indexes.

Joel Murach, a renowned technical writer and editor, describes indexes as follows:

An index speeds up joins and searches by providing a way for a database management system to go directly to a row rather than having to search through all the rows until it finds the one you want [1].

A commonly used analogy to explain the concept of indexing is that of a book's index. In a book, we use the index to quickly find specific information without flipping through every page. An SQL index serves the same purpose for a database: it makes retrieval more efficient by reducing the number of disk accesses.

Just as the book author or publisher may choose to create an index of the important concepts and terms in the book, you can choose to create an index on a particular column of a database table [3].

By default, MySQL creates indexes for the primary keys, foreign keys, and unique keys of a table. Usually, that’s what is needed. In addition, you may want to create indexes for other columns that are used frequently in search conditions or joins. Other popular database management systems (DBMS) automatically create indexes only for primary and unique keys.

Index Order

Imagine you want to find a piece of information that is within a large database:

  1. Without an index, the computer has to look through every row until it finds the information. If the data you are looking for is towards the very end, the query will take a long time to run.
  2. If the table were sorted, the search could be much faster, because we could skip rows: jump to the row halfway through the data and check whether the value we want comes before or after it, then halve the remaining rows and make the same comparison again.

Indexes allow us to create sorted lists without having to create all new sorted tables, which would take up a lot of storage space.
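To make the two search strategies concrete, here is a minimal sketch in plain Python rather than SQL. The `rows` list and both function names are made up for illustration; a real database index is more sophisticated, but the halving idea is the same.

```python
def linear_search(rows, target):
    """Check every row in turn; returns (position, steps taken)."""
    steps = 0
    for position, value in enumerate(rows):
        steps += 1
        if value == target:
            return position, steps
    return -1, steps


def binary_search(sorted_rows, target):
    """Halve the remaining range on each step; returns (position, steps taken)."""
    low, high = 0, len(sorted_rows) - 1
    steps = 0
    while low <= high:
        middle = (low + high) // 2
        steps += 1
        if sorted_rows[middle] == target:
            return middle, steps
        if sorted_rows[middle] < target:
            low = middle + 1   # target can only be in the upper half
        else:
            high = middle - 1  # target can only be in the lower half
    return -1, steps


# Hypothetical sorted "column": one million even numbers.
rows = list(range(0, 2_000_000, 2))
target = rows[-1]  # worst case for the linear scan: the value is at the very end

linear_position, linear_steps = linear_search(rows, target)
binary_position, binary_steps = binary_search(rows, target)
print(linear_steps, binary_steps)
```

The linear scan has to touch all one million rows, while the halving search needs at most about twenty steps for the same list.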

Some databases store index entries in a fixed sort order and are optimised for reading them only in that order. MySQL is smart enough to “traverse the index backwards” when necessary, so a query that needs the rows in the opposite order (for example, one that sorts an indexed column with ORDER BY ... DESC) can still run very quickly.
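A quick way to see this kind of behaviour is to ask the database for its query plan. The sketch below uses SQLite through Python's built-in `sqlite3` module as a stand-in, because it runs without a server; MySQL's EXPLAIN output looks different. The `phone_book` table, its rows, and the index name are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone_book (id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT)"
)
conn.executemany(
    "INSERT INTO phone_book (last_name, first_name) VALUES (?, ?)",
    [("Adams", "Ann"), ("Baker", "Bob"), ("Clark", "Cy"), ("Young", "Zed")],
)
# An ordinary index, stored in ascending order of last_name.
conn.execute("CREATE INDEX idx_last_name ON phone_book (last_name)")

query = "SELECT last_name FROM phone_book ORDER BY last_name DESC"

# The fourth column of EXPLAIN QUERY PLAN is the human-readable plan step.
plan_details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
descending_names = [row[0] for row in conn.execute(query)]

print(plan_details)
print(descending_names)
```

If the plan mentions `idx_last_name` and no temporary sort step, the engine is reading the existing index in reverse instead of sorting the rows again.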

Indexes as Constraints

In the real world, it’s rare to find dynamic data that just happens to be sorted (and stays sorted). Books are a special case; they tend to remain static. Because any database server needs to maintain a separate list of indexes’ values and keep them updated as your data changes, you really don’t want to index every column in a table. Indexes are a trade-off between space and time. You’re sacrificing:

  • some extra disk space,
  • a bit of CPU overhead on each INSERT, UPDATE, and DELETE query

in exchange for making most (if not all) of your queries much faster.

Thus, it’s important to strike a balance between read optimisation and write performance based on the application needs.
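The space side of the trade-off is easy to observe. The following sketch again uses SQLite via Python's `sqlite3` module as a convenient stand-in (other DBMSs report storage differently), with a hypothetical `phone_book` table filled with generated rows. It compares the number of database pages before and after an index is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone_book (id INTEGER PRIMARY KEY, last_name TEXT, phone TEXT)"
)
# Generated sample data: 5,000 made-up rows.
conn.executemany(
    "INSERT INTO phone_book (last_name, phone) VALUES (?, ?)",
    [(f"Surname{n:05d}", f"555-{n:05d}") for n in range(5000)],
)
conn.commit()


def page_count():
    """Number of storage pages the database currently occupies."""
    return conn.execute("PRAGMA page_count").fetchone()[0]


pages_without_index = page_count()
conn.execute("CREATE INDEX idx_last_name ON phone_book (last_name)")
conn.commit()
pages_with_index = page_count()

print(pages_without_index, pages_with_index)
```

The extra pages hold the sorted copy of `last_name` that the engine must now keep up to date on every write.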

Index Limitations

There are many instances when a DBMS simply can’t use an index to satisfy a query. To understand and mitigate these limitations, let’s look at the main impediments to using an index:

  • Wildcard matches: A wildcard at the beginning of a search string (e.g., %term) forces the DBMS to scan every row, because an ordinary index is sorted by the start of the value. A full-text index does not help either: full-text indexes are designed to work with complete words, which makes them unsuitable for leading wildcard searches.
  • Regular expressions: Regular expression searches face similar challenges. The optimiser in many DBMSs, including MySQL, often cannot optimise queries based on regular expressions, leading to potentially inefficient full-table scans.
  • Poor statistics or corruption: Corrupted or inaccurate internal index statistics, possibly due to a crash or unexpected shutdown, can lead to erratic behaviour. If the statistics are misleading, a DBMS might inconsistently use or ignore an index, affecting query performance.
  • Too many matching rows: When a query matches a large proportion of the rows in a table, reading the index and then fetching each row can be slower than a full-table scan, so the optimiser may ignore the index altogether.

Note: While the limitation examples are specific to MySQL, similar principles apply to other DBMSs, though the details and available features may vary.
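The wildcard limitation can be checked with a query plan. This is a sketch in SQLite (through Python's `sqlite3` module), not MySQL, so the details are an assumption about one engine: SQLite's LIKE is case-insensitive by default, and its optimiser only uses an index for LIKE when the index is declared with the NOCASE collation. The table, rows, and index name are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone_book (id INTEGER PRIMARY KEY, last_name TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO phone_book (last_name, phone) VALUES (?, ?)",
    [("Smith", "555-0101"), ("Smythe", "555-0102"), ("Goldsmith", "555-0103")],
)
# NOCASE matches the default case-insensitive LIKE, which SQLite's LIKE optimisation requires.
conn.execute("CREATE INDEX idx_last_name ON phone_book (last_name COLLATE NOCASE)")


def plan_for(sql):
    """Return the plan steps reported by EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Pattern anchored at the start: can be turned into a range lookup on the index.
prefix_plan = plan_for("SELECT * FROM phone_book WHERE last_name LIKE 'Sm%'")
# Leading wildcard: the sort order of the index is of no use.
leading_wildcard_plan = plan_for("SELECT * FROM phone_book WHERE last_name LIKE '%smith'")

print(prefix_plan)
print(leading_wildcard_plan)
```

The first plan should mention `idx_last_name`, while the second falls back to scanning the table.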

Types of Indexes

Types of indexes refer to the logical application of indexes based on the database schema, query patterns, and specific requirements of data access. There are several types of indexes tailored for different kinds of queries and data structures. Here's an overview of some of the most common types of SQL indexes:

  • Primary key index: Automatically created when a primary key is defined on a table.
  • Unique index: Automatically created when a unique key is defined on a table. A unique index ensures that two rows of a table do not have the same value in the indexed columns.
  • Foreign key index: Although not always explicitly created as indexes, indexing foreign keys can significantly improve the performance of join operations.
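  • Prefix index (sometimes called a partial index in MySQL): MySQL gives you a lot of control over how much space is used by the indexes. Instead of indexing the entire ‘last name’, you might index only its first 4 characters. This is different from a filtered index, which covers only a subset of the rows:

ALTER TABLE phone_book ADD INDEX (last_name(4));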

  • Single-column index: This is the simplest form of an index, where the index is created on a single column of a table. It is effective for queries that involve only that specific column.
  • Multicolumn (composite) index: Many relational database engines allow you to create indexes that are composed of multiple columns. Such indexes can improve the query speed if you often query all columns together in the WHERE clause or if a single column doesn’t have sufficient variety.

ALTER TABLE phone_book ADD INDEX (last_name, first_name);

  • Covering index: A regular index that provides all the data required for a query without having to access the actual table. When a query is executed, the database looks for the required data in the index tree, retrieves it, and returns the result. For an index to be considered a covering index, it must have all the data needed for a particular query: the columns being selected, the columns being filtered on, and the columns being used for sorting.
  • Clustered index: Determines the physical order of data in a table based on the index key. There can be only one clustered index per table because it defines the physical storage order of the data. In many systems, the primary key automatically becomes a clustered index unless specified otherwise.
  • Non-clustered index: A type of index where the physical order of the rows is not the same as the index order. The index contains pointers to the location of the data in the table, allowing multiple non-clustered indexes to exist on a single table.
  • Full-text index: A special type of index that can quickly retrieve the locations of every distinct word in a field. Designed for searching text content within a column, allowing for complex searches involving phrases, word proximity, and more.
  • Spatial index: Designed for indexing spatial data, such as geographical coordinates, shapes, or regions. Optimised for spatial queries, like finding all locations within a certain distance from a point, or determining if spatial objects overlap.
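To illustrate the difference between a covering and a non-covering use of the same composite index, here is a small sketch. It uses SQLite through Python's `sqlite3` module as a stand-in, so the plan wording is SQLite's own; the `phone_book` table and the index name `idx_name` are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone_book ("
    "id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, phone TEXT)"
)
# Composite index on two columns, in the same column order as the ALTER TABLE example.
conn.execute("CREATE INDEX idx_name ON phone_book (last_name, first_name)")


def plan_for(sql):
    """Return the plan steps reported by EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Every column the query needs (last_name, first_name) lives in the index.
covered_plan = plan_for("SELECT first_name FROM phone_book WHERE last_name = 'Smith'")
# phone is not in the index, so the table row must be fetched as well.
uncovered_plan = plan_for("SELECT phone FROM phone_book WHERE last_name = 'Smith'")

print(covered_plan)
print(uncovered_plan)
```

SQLite labels the first plan as using a covering index; the second still uses `idx_name` to find the rows but then has to visit the table for `phone`.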

Index Structures

Index structures refer to the underlying data architectures used by databases to implement and manage indexes efficiently.

The most common index structures are:

  • B-Tree indexes
  • Hash indexes
  • R-Tree indexes
  • Bitmap indexes

B-Tree indexes

The most common type of index. B-Trees are usually the default because of their unique combination of flexibility, size, and overall good performance. They maintain data in sorted order, allowing for efficient searches, insertions, deletions, and range queries. Each node in a B-Tree contains a number of keys sorted in ascending order, and pointers to child nodes, dividing the data range into segments. B-Trees are balanced, meaning the path from the root to any leaf node is the same length, ensuring consistent performance.

Hash indexes

The second most popular indexes. They resemble a hash table rather than a tree. Hash indexes use a hash function to compute the location of data based on the key value. They provide very fast access for point queries (single value lookups) but are not efficient for range queries. They are also less flexible and less predictable than other indexes. Hash indexes work relatively well for most text and numeric data types. Because hash functions effectively reduce arbitrarily sized keys to a small hash value, they tend not to use as much space as many tree-based indexes.
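The contrast between hash-style and sorted (tree-style) lookups can be sketched in plain Python. A dictionary plays the role of the hash index and a sorted list searched with `bisect` stands in for a B-Tree; the sample data and all names are made up for illustration, and real index structures are considerably more involved.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows: person -> age.
ages = {"ann": 31, "bob": 45, "cy": 27, "dee": 38, "eve": 52}

# Hash-style index: key value -> matching rows. Great for exact matches only.
hash_index = {}
for name, age in ages.items():
    hash_index.setdefault(age, []).append(name)

# Sorted index: (key, row) pairs kept in key order, as a tree-based index would.
sorted_index = sorted((age, name) for name, age in ages.items())
sorted_keys = [age for age, _ in sorted_index]

# Point query: a single hash lookup.
point_lookup = hash_index.get(38, [])


def range_lookup(low, high):
    """All rows with low <= key <= high, found by two binary searches."""
    start = bisect_left(sorted_keys, low)
    end = bisect_right(sorted_keys, high)
    return [name for _, name in sorted_index[start:end]]


between_30_and_45 = range_lookup(30, 45)
print(point_lookup, between_30_and_45)
```

Answering the range question from `hash_index` alone would mean checking every key, since hashed keys carry no ordering.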

R-Tree indexes

Used for spatial or N-dimensional data. They are quite popular in mapping and geoscience applications but work equally well in other situations in which records are often queried based on two axes or dimensions: length and width, height and weight, etc.

Bitmap indexes

Use a bitmap for each key value and a bit array to represent the presence or absence of a value in a row. Efficient for columns with a low cardinality (i.e., a small number of distinct values), such as gender, boolean flags, etc. Particularly effective for queries that involve multiple conditions combined using AND, OR, or NOT operations.
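As a rough illustration of the idea, the sketch below builds one bitmap per distinct value using Python integers, where bit N is set when row N holds that value. The rows, column names, and helper functions are hypothetical; production bitmap indexes add compression and other refinements.

```python
rows = [
    {"id": 0, "gender": "F", "active": True},
    {"id": 1, "gender": "M", "active": False},
    {"id": 2, "gender": "F", "active": False},
    {"id": 3, "gender": "M", "active": True},
    {"id": 4, "gender": "F", "active": True},
]


def build_bitmaps(rows, column):
    """Map each distinct value of the column to an integer used as a bit array."""
    bitmaps = {}
    for position, row in enumerate(rows):
        value = row[column]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << position)
    return bitmaps


def matching_positions(bitmap):
    """Row positions whose bit is set in the bitmap."""
    return [p for p in range(bitmap.bit_length()) if (bitmap >> p) & 1]


gender_bitmaps = build_bitmaps(rows, "gender")
active_bitmaps = build_bitmaps(rows, "active")

# Conditions combine with cheap bitwise operations instead of row-by-row checks.
female_and_active = matching_positions(gender_bitmaps["F"] & active_bitmaps[True])
male_or_active = matching_positions(gender_bitmaps["M"] | active_bitmaps[True])
print(female_and_active, male_or_active)
```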

The Impact of Indexing on Query Performance

Indexes are particularly beneficial for queries that retrieve a small subset of data (high selectivity), for example queries whose WHERE clause filters on indexed columns.

Indexes also improve the performance of JOIN operations and aggregate functions (like COUNT, SUM) by quickly locating the relevant rows in the joined tables.
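Here is a small sketch of a join with aggregates that can take advantage of an index on the join column. It runs on SQLite via Python's `sqlite3` module as a stand-in for a server DBMS; the `customers` and `orders` tables, their contents, and the index name are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders (customer_id, amount) VALUES (1, 10.0), (1, 15.0), (2, 7.5);
    -- Index on the column used to join orders to customers.
    CREATE INDEX idx_orders_customer ON orders (customer_id);
    """
)

join_sql = """
    SELECT c.name, COUNT(*) AS order_count, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = 1
    GROUP BY c.name
"""

join_plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + join_sql)]
join_result = conn.execute(join_sql).fetchone()

print(join_plan)
print(join_result)
```

In the plan, the step for `orders` should reference `idx_orders_customer`, meaning only that customer's orders are read before being counted and summed.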

Real-world example is coming soon. ⏳

Best Practices for SQL Indexing

  • Use indexes on columns used in WHERE clauses: Indexing columns used in WHERE clauses can significantly speed up query execution by allowing the database to quickly locate the rows that meet the query criteria.
  • Index JOIN columns: For queries that involve JOIN operations, indexing the columns used for joining tables can improve performance by reducing the time required to match rows between the tables.
  • Consider composite indexes for multi-column queries: When queries filter or sort on multiple columns, a composite index (an index on multiple columns) can be more effective than separate indexes on each column.
  • Keep indexes narrow: Use the narrowest columns possible in indexes, such as integers or short strings, because they consume less space and allow the database to search and sort more efficiently.
  • Be cautious with indexing frequently updated columns: Indexes can slow down data modification operations such as INSERT, UPDATE, and DELETE, because the index must be updated as well. Be mindful of indexing columns that are frequently updated.
  • Limit the number of indexes per table: While indexes can speed up query performance, having too many can degrade write performance and increase storage requirements. Evaluate the cost-benefit of each index.
  • Use partial indexes for large tables: If only a small subset of rows is queried frequently, consider creating a partial (filtered) index that only includes those rows, where your DBMS supports it. This can save space and improve efficiency.
  • Use indexes for sorting and grouping: Indexes can also improve performance for queries that involve ORDER BY, GROUP BY, or DISTINCT operations, by reducing the need for the database to perform costly sort operations.
  • Balance between read and write performance: Consider the read-write ratio of your application. In read-heavy applications, more indexes can be beneficial, but in write-heavy applications, minimise indexing to avoid performance penalties.
  • Analyse query performance: Use query execution plans to understand how queries are executed and how indexes are used. This can help identify opportunities for indexing improvements.
  • Regularly review and optimise indexes: Over time, the way an application accesses data can change. Regularly review query performance and index usage to remove unused indexes and add new ones as needed.
  • Consider the database engine and version: Different database systems and versions have unique features and behaviours regarding indexing. Tailor your indexing strategy to the specific capabilities and limitations of your database system.
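Two of these practices, partial indexes and checking the execution plan, can be tried together in a short sketch. It assumes SQLite (through Python's `sqlite3` module), which accepts a WHERE clause on CREATE INDEX, as PostgreSQL does; SQL Server offers the same idea as a filtered index, and other systems may not support it at all. The `orders` table and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(1, "open"), (1, "shipped"), (2, "shipped"), (2, "open"), (3, "shipped")],
)
# Partial index: only rows with status = 'open' get an index entry.
conn.execute(
    "CREATE INDEX idx_open_orders ON orders (customer_id) WHERE status = 'open'"
)


def plan_for(sql):
    """Return the plan steps reported by EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# The query repeats the index condition, so the partial index is usable.
open_plan = plan_for(
    "SELECT id FROM orders WHERE status = 'open' AND customer_id = 1"
)
# Shipped orders are not in the index, so it cannot help here.
shipped_plan = plan_for(
    "SELECT id FROM orders WHERE status = 'shipped' AND customer_id = 1"
)

print(open_plan)
print(shipped_plan)
```

Comparing the two plans shows both the benefit and the limit of the approach: the index stays small, but it only serves queries that ask for the rows it contains.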

Step-by-Step Guide to Implementing Indexing

❗️Note: The exact syntax might vary slightly depending on the specific DBMS you're using, but the general process is quite similar across systems like MySQL, PostgreSQL, SQL Server, etc.

How to Create Indexes in a SQL Database

Step 1: Identify query patterns and the need for an index

Analyse your queries to determine which columns are frequently used in the WHERE clauses, JOIN conditions, or as part of an ORDER BY.

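Step 2: Choose the right columns

Favour the columns that appear most often in search conditions and joins, and keep the indexed columns as narrow as possible.

Step 3: Choose the type of index

Decide which of the index types described earlier (single-column, composite, unique, and so on) fits the query pattern.

Step 4: Check existing indexes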

Creating redundant indexes can waste storage and negatively impact write performance.

Step 5: Create the index

-- Based on a single column:
CREATE INDEX index_name ON table_name (column_name);

-- Based on multiple columns:
CREATE INDEX index_name ON table_name (column1, column2, ...);

Step 6: Verify the index creation

After creating the index, you can usually view all indexes on a table with a command specific to your DBMS, like:

SHOW INDEXES FROM table_name;

Step 7: Monitor performance

Monitor your system's performance to ensure the index is having the desired effect in query execution times and overall system performance. Periodically review and maintain your indexes. Unused or inefficient indexes should be removed to avoid unnecessary overhead.
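Steps 4 to 7 can be walked through end to end in a small script. This sketch uses SQLite via Python's `sqlite3` module, so it lists indexes with `PRAGMA index_list` where MySQL would use SHOW INDEXES; the table, query, and index name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone_book (id INTEGER PRIMARY KEY, last_name TEXT, phone TEXT)"
)

INDEX_NAME = "idx_phone_book_last_name"
query = "SELECT phone FROM phone_book WHERE last_name = 'Smith'"


def index_names(table):
    """Names of the indexes on a table (column 1 of PRAGMA index_list)."""
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]


def plan_for(sql):
    """Return the plan steps reported by EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan_for(query)

# Step 4: check existing indexes so we do not create a redundant one.
if INDEX_NAME not in index_names("phone_book"):
    # Step 5: create the index.
    conn.execute(f"CREATE INDEX {INDEX_NAME} ON phone_book (last_name)")

# Step 6: verify that the index now exists.
indexes_after = index_names("phone_book")

# Step 7: confirm that the query we care about actually uses it.
plan_after = plan_for(query)

print(plan_before)
print(indexes_after)
print(plan_after)
```

On a real system, the final check would be complemented by measuring execution times under a realistic workload, since a plan alone does not show the effect on writes.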

How to Delete Indexes in a SQL Database

Step 1: Identify the index to be deleted

An index is a candidate for deletion when it is no longer used, it duplicates another index, or it provides no performance benefit. Use database-specific tools or queries to check index usage. For example, in SQL Server, you can use Dynamic Management Views (DMVs) to track index usage.

Step 2: Check dependencies

Ensure that the index is not involved in any database constraints or relationships, such as a foreign key constraint.

Step 3: Delete the index

DROP INDEX index_name ON table_name;

Step 4: Verify the deletion

SHOW INDEXES FROM table_name;

Step 5: Monitor performance

Deletion of an index can affect query performance, both positively (faster writes) and negatively (slower reads). Ensure that the removal of the index has not negatively impacted the performance of critical queries.
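The deletion steps can be sketched the same way. Again this is SQLite through Python's `sqlite3` module, where the statement is `DROP INDEX index_name` without the `ON table_name` part shown above; the table and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone_book (id INTEGER PRIMARY KEY, last_name TEXT, phone TEXT)"
)
conn.execute("CREATE INDEX idx_phone_book_last_name ON phone_book (last_name)")

query = "SELECT phone FROM phone_book WHERE last_name = 'Smith'"


def index_names(table):
    """Names of the indexes on a table (column 1 of PRAGMA index_list)."""
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]


def plan_for(sql):
    """Return the plan steps reported by EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


indexes_before_drop = index_names("phone_book")

# Step 3: delete the index (SQLite syntax; MySQL also needs ON table_name).
conn.execute("DROP INDEX idx_phone_book_last_name")

# Step 4: verify the deletion.
indexes_after_drop = index_names("phone_book")

# Step 5: check how a critical query is executed now.
plan_after_drop = plan_for(query)

print(indexes_before_drop, indexes_after_drop)
print(plan_after_drop)
```

Here the query falls back to scanning the table, which is the kind of regression to look for before dropping an index in production.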

Final Thoughts

Improving SQL query performance through effective indexing is like fine-tuning a high-performance engine: it requires attention to detail, a deep understanding of the system, and a strategic approach. Indexing, when done right, can dramatically speed up data retrieval times, making applications more responsive and efficient.

Remember, though, that while indexes are powerful tools, they're not a cure-all. They need to be used judiciously. Over-indexing can slow down write operations and consume additional storage resources. It's all about finding the right balance tailored to specific needs and workload patterns.

So keep experimenting, and learn from each change you make!

REFERENCES

  1. Joel Murach, Murach’s MySQL. Training & reference. First Edition (Fresno, CA: Mike Murach & Associates, Incorporated, 2012).
  2. Indexing, from Atlassian.
  3. Jeremy D. Zawodny, Derek J. Balling, High Performance MySQL (O'Reilly Media, Inc., 2004).

Thank you for reading!

I'm Nadya, a data analyst based in Zurich, Switzerland.
