Steal This Letter
Sometimes the Pen is Mightier than the Query
It’s review time here at Day Job Central and it’s a reminder that I’m in a position where I find more and more of my time being spent inside of Outlook, Excel, and Microsoft Word more than I do with SQL Server Management Studio. Just today I had to put aside working on something fun, challenging, and potentially awesome in order to dig into the politics of being the Lead DBA. It’s okay though, because in the end I was able to inform an end user, raise the bar a bit for a software vendor’s support staff through an educational exercise, and was able to bank the work to use again the next time this issue arises. Oh, and I also am in a sharing mood so you benefit as well! In the end I was able to accomplish performance tuning with Outlook – not by fixing a poor query but rather by preventing 100s or 1000s of poor queries from being run.
Stop Spinning Your Wheels
I’m on a huge kick lately on trying to eliminate all repeatable tasks. This is a cornerstone of the presentation I’m developing for Dev Connections this Fall in Las Vegas titled “Performance Enhanced Laziness”. This morning I received an email from a coworker stating that users were complaining of slowness and the independent software vendor (ISV) that designed the system in question wants us to implement some new indexes to fix the issue – 84 new indexes to be precise.
After I gave it some careful thought I composed the following email back to my coworker, my fellow DBA, my architect, and most importantly the representative from the ISV that was the messenger of said solution to all the woes of the end users. I made sure it at no point mentioned the product because my intention will be to re-use this email the next time this issue arises somewhere else in the organization; because it will. That is why I’m sharing it with you now. With the exception of the wait stats information from question 5 – those will differ in everyone’s environment – and the possible exception of question 6 – where I state that migrating to Enterprise Edition from Standard Edition could help if periodic performance issues at specific times of day are contributing to performance concerns you should be able to arm yourself with this email when your customers come to you with a request to add indexes to fix your performance issues.
Please feel free to steal this letter and make it your own.
Good morning, I have a few questions that need to be addressed before you proceed further.
Adding new indexes have consequences:
- They add overhead to existing queries through the additional time it takes to insert, update, and delete from the indexes when adding/altering/deleting data in the associated tables.
- They add space to the database – quite often that space is significant.
- They add maintenance time for keeping index fragmentation in check and updating statistics for associated indexes.
- While solving issues for some slow queries they can invalidate existing execution plans and cause existing fast-performing queries run slow.
- It’s unwise to implement new indexes directly into production without first testing their impact in a non-prod environment for any of these reasons. Your issue is compounded because you also host your non-production environments on the same SQL Server as production – another issue to address separately at a later date.
I mentioned I had questions, please work with the vendor’s technical representatives to address them in detail:
- How were these indexes determined to be the resolution to your performance concerns? I suspect that they were collected by the Database Tuning Advisor or the Missing Index DMVs based upon the prolific occurrence of included columns in the suggested new indexes’ creation scripts they supplied.
- Included columns tend to satisfy only a small subset of queries – perhaps only a single query at the cost of invalidating or slowing performance on many or all other queries. What specific slow queries are you trying to speed up? The vendor can reduce their scope of tuning by looking at the performance metrics for the cached queries by looking at the sys.dm_exec_query_stats Dynamic Management View. I stress cached queries since this information is contained in a DMV and will work its way out of cache depending on RAM, activity, and other issues. DMV data also is transient – when the SQL service is restarted, or under some manual commands being run, that data is flushed from cache. Bulk adding of indexes to solve global slowness performance is not the right course of action. If you’re encountering global slowness you need to look at:
- Hardware resource deficiencies
- Network resource deficiencies
- Competing server activity for limited resources
- Are these slow “queries” ad-hoc queries or utilizing stored procedures? One of the slowest tasks on a SQL Server is compiling execution plans. A stored procedure is only going to have to compile an execution once (unless statistics are rebuilt or an associated index is rebuilt.)
- If this is as critical a system as it’s believed to be it should be:
- On its own SQL instance
- Production separated from non-prod instances of the application’s databases
- Has the vendor looked at the wait stats to determine a good place to start in terms of what the underlying issue with performance is? When I do so I see that SQL is waiting most-frequently on:
- CXPACKET – this is an indicator of waits associated with parallel queries. No, the solution is not to remove the ability to let queries go parallel, the solution is to see what the slowest thread in a parallel query is waiting on. The vendor would need to observe parallel queries when occurring and collect that info.
- OLEDB – this is an indicator that the driver the application is using is causing performance issues. This is nothing for the DBA to tune out – the issue with slowness resides outside of SQL Server. In versions of SQL Server newer than 2005 then the focus of OLEDB waits should be upon the use of linked servers on the SQL Server instance and the linked server API used by some of the Dynamic Management Views.
- PAGEIOLATCH_SH – this indicates a bottleneck with pulling data off disk and into the buffer pool. SQL does its work in memory and needs to pull data off disk to RAM in order to do that work. If a query needs to pull in 19 million records in order to satisfy a query that returns only three rows to a report/screen it will flood the IO channel, cause PAGEIOLATCH_SH waits, and slow everyone down as all the queries compete for buffer pool resources. This is nothing that will be solved primarily with a new index – this is solved by re-writing queries to be more-selective. You’d find better results in adding more RAM than adding more indexes.
- Does the performance wane at specific times of day? Perhaps overhead of index maintenance or backups are issue. Both cause consumption of resources for RAM, disk, and CPU and defined times of day. This instance is running on SQL Server Standard Edition which means that some features that can improve uptime during maintenance windows are not able to be used: specifically, online index rebuilds and backup compression.
To summarize: Resolving specific query performance issues should be first handled by reducing the overall impact of poorly-formed and performing queries which can be identified using the SQL Server Dynamic Management Objects. Fixing the query, rather than making overall changes affecting all queries is the best practice and best solution. Under no circumstances do I recommend implementing any of indexes in production until these questions are addressed by vendor support and reviewed extensively. If performance suffers at various points during the day then you should look at reducing impact of scheduled (necessary) maintenance and stopping unnecessary load as well as upgrading the instance to Enterprise Edition. Adding RAM and upgrading hardware would probably result in better performance gains immediately over adding/implementing new indexes. Migrating critical databases to their own SQL Server instance is also recommended.
I know there is much here to digest, but I think it’s well worth taking the time to do so. Please let me know if you have any questions.
I’ve updated the section on OLEDB waits to clarify pre/post SQL 2005 behavior based upon information that Adam Machanic gave me on Twitter. I didn’t want to mention his name in the letter should this letter be used and have his name tossed into a firestorm at Widgets, Inc. but he deserves credit. Credit should be considered given. Thanks Adam!
What are the details about the OLEDB waits pre and post 2005? Does Adam have a post about it? I couldn’t find one if he does.
Nice! And it’s recyclable to boot!
You don’t mention version, but just so everyone remembers….backup compression is available in the standard edition for 2008 R2. It is only available in the Enterprise Edition for 2008.
[…] Steal This Letter – I probably won’t but you might like to. Tim Ford (Blog|Twitter) on handling a tricky customer case. […]
I’ll be interested in hearing about the response to your email. I hope it does some good rather than antagonizing its recipients. Support people do tend to get defensive rather easily …