A Guide to SQL Server 2005 indexes

This tutorial describes some ways to get the best from SQL Server by setting up good indexes on your database tables. It begins with a description of Clustered and Non-Clustered indexes, then goes on to explain what covering indexes are, and how these can improve query performance. Finally some general rules-of-thumb for creating indexes are given.

Introduction

Every database table has at least one index, and sometimes SQL Server makes a index for you, behind the scenes, without you even realising it. For example, if you create a primary key on a table, then SQL Server will create an index for this primary key, called something like PK_mytable. This index will be a clustered index on the primary key(s) column(s).

Description of Clustered indexes

The actual data that comprises a database table is always stored on disk in a certain order. The data is physically arranged on the disk in this order, even though indexes let you view the data in other orders. So, for example, you may have a table of people and view this table in date-of-birth order, or surname order, but the data on disk may well be stored in order of the person_id, if that is the Primary Key.

The order that the data is stored in is determined by the Clustered Index for the table, and there can only be one index that is clustered for any table. A clustered index has one big advantage over all other indexes: it is very efficient to retrieve data from a table by using it. So, in the example of the people table, it would be more efficient (i.e. faster) to select the records in order of the Primary Key than to get them in Date-of-birth order. This is because each record that needs to be accessed from disk is right next to the previous one, making the disk access more efficient.

Description of Non-Clustered Indexes

By contrast, Non-Clustered indexes are all the other indexes on a table, that do not dictate the order that records are stored on disk. The disadvantage of a non-clustered index is that it is slightly slower than a clustered index, but the advantage is that we can have many non-clustered indexes, as apposed to just one clustered index per table.

Getting down to the nitty-gritty of how SQL Server stores non-clustered indexes on disk, a non-clustered index stores all the fields that comprise the index, plus a link to the actual full record on disk. This link is the primary key (clustered index), so in fact a clustered index contains all the columns that you specify for them plus the clustered index columns too.

Covering Indexes and how they improve performance

A covering index is the term given to a non-clustered index that has been increased to include some extra fields in it that are not used for ordering, but are stored in the index anyway. The idea of setting up a covering index is so that all the fields required in a particular "select" query are "covered" by the index, i.e. all the fields requested in the select are found in the index. You create a covering index by adding certain columns as "included columns" or "non-key columns" in the index. These columns don't affect the order of the records, but do help performance.

To explain how this works and why it is a good thing, consider an index on my "invoice" table. This non-clustered index is composed of the invoice_date. If we wanted to perform this query:

select date, invoice_id, customer_id from invoice order by date

then this index would be a good one to choose. However, when SQL Server executes this query, it must do the following:

  1. Use this index to go through the records in the table in date order.
  2. For each record, retrieve the date from this index.
  3. Due to the rule stated above, the invoice_id will also be found in this index, because invoice_id is the primary index on this table.
  4. Use the invoice_id to lookup the actual invoice record and find the customer_id for the invoice.

Now imagine a second index that had date in it, and also had customer_id in it, even though this is not really important when sorting the records. This index would be a "covering" index for the above query, because it would contain invoice_date, customer_id, and also invoice_id (due to the rule above). When SQL Server wants to answer the query, this time it does this:

  1. Uses the index to go through the records in date order.
  2. For each record, it can find the date, invoice_id and customer_id from the index without needing to lookup the actual data row.

You can see that this covering index will be faster for this particular query.

All this is a slightly advanced performance-tuning topic for SQL Server, and you can survive perfectly well without it, but you may be able to speed up your queries if you add a few well chosen "included columns" here and there.

General rules of thumb for SQL Server indexes

Here are some general rules to use when making indexes on database tables:

  1. For main or "header" tables such as a table of invoices, make a clustered index on the Primary Key of the table.
  2. For secondary or "details" tables such as "invoice_row", make a clustered index on the foreign key that groups the child records together (which in this example is "invoice_id"). This is because the majority of queries on the invoice_row table will be made in the invoice_id order rather than the invoice_row_id order.
  3. For all tables, make a non-clustered index on each of the foreign keys of the table. Don't concern yourself with covering indexes at this point.
  4. Think about the selects queries that you will be performing on the table. What sort of "where" and "order by" statements will you be using? Make a non-clustered index on these columns.
  5. It is now time to start timing your typical queries and look for any slow ones. If you identify a particularly slow one, see if there is a way that you can add extra non-key columns to an index so that it becomes a covering index for that query.

Ok, that is a brief introduction to indexes in SQL Server, covering Clustered indexes, non-clustered indexes, covering indexes, and performance tuning.