I am the Mayor of Indexes. OK, I am definitely not, but I wanted to get the bluster out of the way first. I’m simply a DBA who enjoys writing and has you as a captive audience until you close this tab on your browser or hit BACK and select a better hit from Google on your search of SQL INDEX “THINGS YOU SHOULD ALREADY KNOW“. No, I am only writing this post tonight, because:
I vowed to write a post every day for the next 2 weeks.
I just finished my book chapter on Index Dynamic Management Object and feel like I just birthed a baby. I guess that makes this my non-clustered placenta yet I still feel like I had a little more in me.
I needed to look busy or else my wife would put me to work.
I wanted to contribute just a little bit tonight back to the community, though they should already know everything in this post.
So, here we go, what I think all of you should (and hopefully already) know about indexing when dealing with Microsoft SQL Server (in no particular order, because it’s the weekend; math and weekends are mutually-exclusive.)
1. Don’t confuse the default scheduling of indexing tasks in Maintenance Plans for a Microsoft Best Practice.
I recently found myself in a sparring match with the SQL Server support expert at a leading supplier of turnkey healthcare applications over just this matter. I don’t use maintenance plans for any of my scheduled maintenance. Don’t get me wrong, I think they’re great for the Mom and Pop shops that either don’t have an Information Technology Department, or don’t have expertise in SQL Server. It’s better to have something in place in regards to scheduled maintance over not having anything at all. However, I schedule my indexing maintenance daily, against only those indexes that are 15% fragmented (these will be reorganized; any indexes fragmented beyond a 30% threshold will be rebuilt.) A performance issue arose and I was on the phone with the vendor since we’re unable to make schema changes (includes index alterations or additions) I was recommending without consultation. The SQL Expert (his company’s opinion, not mine) was aghast that I would even attempt to rebuild/reorganize indexes on a daiy basis.
In his words: “Microsoft recommends you only rebuild indexes once a week.”
I jumped on this, “where did you get this idea? I’ve never heard of such a thing”
“Well it’s right there when you create a Maintenance Plan. It defaults to once a week.
Then we hit a 10 minute long heated discussion on the fact that defaults <> Best Practices. In the end I never did get to make the indexing changes I wanted to make and he still thinks he’s right. Sometimes someone is so far away from the truth that you can’t lead them back to it. I’ve turned the database over to him and the vendor’s support team and I am simply responsible for backups, restores, and keeping the database up – performance tuning is totally in their court and I hope they enjoy the Hell they live in. Sad thing is that this is not the first time I’ve heard this from a supposed “Expert.”
2. Understand how to determine the fragmentation state of your indexes
Fragmentation happens as a by-product of user inserts, updates, and deletes from a database. There is a sure-fire cure for never having to deal with fragmentation; only publish read-only databases. So, understanding that is never an option what do you do? Dynamic Management Views and Functions have been around since the release of SQL Server 2005. There are a set of these Dynamic Management Objects that focus specifically on index characteristics, usage, and even indexes that don’t even exist. It’s like Microsoft has a laboratory full of Magicians! I’m getting ahead of myself there though, I’m am most interested here though on the DMF associated with physical statistics for the indexes. This DMF, sys.dm_db_index_physical_stats accepts parameters for the following values: database_id, object_id, index_id, partition_number, mode. Any of these parameters can be NULL; mode accepts any of the following values in order of least-to-most restrictive: DEFAULT, NULL, LIMITED, SAMPLED and DETAILED. DEFAULT or NULL is the same as explicitly stating LIMITED. I suggest the following T/SQL statement as good first step with becoming associated with sys.dm_db_index_physical_stats:
Example: Return listing of all indexes on current database where fragmentation is 15% or greater, index page count is greater than 50 pages; ignore any HEAPS.
SI.[name] AS index_name,
OBJECT_SCHEMA_NAME(SDDIPS.[object_id]) + ‘.’ + OBJECT_NAME(SDDIPS.[object_id]) AS [object_name],
FROM sys.[dm_db_index_physical_stats](DB_ID(), NULL, NULL, NULL, ‘Detailed’) SDDIPS
INNER JOIN sys.[indexes] SI ON SDDIPS.[object_id] = SI.[object_id]
AND SDDIPS.[index_id] = SI.[index_id]
WHERE SDDIPS.[avg_fragmentation_in_percent] > 15
AND SDDIPS.[page_count] > 2
AND SDDIPS.[index_id] > 0
ORDER BY OBJECT_SCHEMA_NAME(SDDIPS.[object_id]), OBJECT_NAME(SDDIPS.[object_id])
From here you could work out a plan to rebuild your heavily fragmented indexes and reorganize your lesser-fragmented ones.
3. Don’t index every column in a table.
This is almost as bad as not indexing any columns in a table. Indexes consume resources (space, cpu, memory) when writes occur to a table. If they serve no benefit from a read perspective then they are pure overhead. Granted this is an extreme case. No one would ever index all columns on a table without good reasons right? Seen it happen my friends. Many times. The reasoning is usually something along the lines of “well, if a user ever wants to search on any of the columns they’ll be all set. If you don’t index a field, then you can’t search on it.” Look into the sys.dm_db_index_usage_stats DMV for insight into which indexes are being written to, but not read from. Details will be in the up-coming book from Red Gate so I can’t divulge too much here. Oh, and if you have any developers in your company who insist on indexing anything they touch then take the “I” key from their keyboard and shove it up their <REDACTED>. They will only be able to CREATE CLUSTERED NDEX es, which is pretty damn harmless. They may have trouble sitting down for a few days as well, but that is just one of those side effects of having a vigilant Database Administrator on staff. Oh, and if they insist on calling it a field then a certain Mr. Celko may want to have a word with them.
4. Make darn sure the data type for your surrogate primary keys is correct
So you have a surrogate key on a banking transactions table (just sayin’) and one day you get a call that users can’t add records to what is probably the most important table in the database. You determine that there are 2,147,483,647 records in the table and that clicks a cobwebby little mass in the nether regions of the soft matter between your ears. “Oh ****, the field is an int data type!” So, you have a downed database and an instant need to propogate a data type change in your primary table and any referencing foreign key. Make sure your database is scalable. Granted this is stretching on the Indexing Best Practices thing, but so be it. I’ve been there and it is about as pleasant as 18th century dental surgery.
5. Understand the falacies
There are certain myths that have been propgated about indexes in Microsoft SQL Server none (n-o-n-e) of the following are true.
If you reorganize an clustered index all non-clustered indexes are also reorganized. Absolutely not true. There was a time, long ago in a SQL Server version far, far away.
Non-clustered indexes are automatically rebuilt when a clustered index is rebuilt on a table. This is ONLY true if you change the structure of the clustered index and then REBUILD WITH DROP EXISTING.
A Fill Factor of 0% is not the same as a Fill Factor of 100%. Fill Factor is the setting for how dense you wish to have your index pages populated. When an index is created or rebuilt it will be done so in a manner to populate the index page at that Fill Factor percentage. When an index page fills, a page split occurs, which adds to fragmentation. A fill factor of either 0% or 100% essentially instructs SQL Server to pack the index pages almost completely full (yes, there is a tad bit of free space). You can more or less be assured that a page split will occur upon the first insert into the index.
You can’t fix fragmentation on a HEAP. In theory this is true, but there is always a solution. Simply create a clustered index on the column(s) you wish to physically sort the table. Once done you can drop the index after the fact. I strongly suggest that you consider creating a clustered index on your HEAPS however. If you feel that you need to go through this process, you probably really would benefit from a permanent index.
Okay, there are so many more items that can be on this list and should. I’d like to hear what you would add to this list though. Have at it!