Henry Lau
Microsoft Corporation
October 1998
Summary: Provides database administrators and developers with valuable information on Microsoft® SQL Server™ 7.0 index selection, disk I/O subsystem tuning, and SQL Server 7.0 performance tuning tools. (47 printed pages)
Audience
Principles of Performance Tuning SQL Server
Performance Items for SQL Server Configuration
Components That Consume CPU and Disk I/O Resources
Disk I/O Performance
SQL Server Indexes
SQL Server Performance Tuning Tools
Miscellaneous Performance Topics
Finding More Information
This performance tuning guide is designed to help database administrators configure Microsoft SQL Server for maximum performance and help determine the cause of poor performance in a SQL Server environment. It also provides SQL Server application developers guidance on how to use SQL Server indexes and SQL Server tools for analyzing the input/output (I/O) performance efficiency of SQL queries.
SQL Server 6.x vs. 7.0 Performance Tuning Comparative Information
SQL 6.x considerations | SQL Server 7.0 enhancements and philosophies |
There were many configuration options to consider during performance tuning. | One major enhancement in SQL Server 7.0 is that the database engine has become largely self-configuring, self-tuning, and self-managing. The LazyWriter and Read-Ahead Manager are self-tuning. Max async I/O is likely the only sp_configure option that will need to be configured initially on servers with larger amounts of storage. This reduction in tuning requirements saves valuable administrative time that can be applied to other tasks. Readers familiar with the earlier version of this document, "Microsoft SQL Server 6.5: I/O Performance Tuning Quick Reference," will notice that far fewer sp_configure options need to be adjusted manually by the database administrator to achieve good SQL Server performance. While it is still possible to manually configure and adjust many of the sp_configure options available in previous versions of SQL Server, it is recommended that database administrators allow SQL Server to automatically configure and tune all sp_configure options for which SQL Server provides defaults. This allows SQL Server to automatically adjust its configuration as factors affecting the database server change. (Examples of such factors include the RAM and CPU utilization of SQL Server and of other applications running on the database server.) |
Manual tuning of LazyWriter was sometimes necessary. | SQL Server 7.0 automatically configures and tunes the LazyWriter. There is no longer the need to manually tune "free buffer" and "max lazywriter io." Free buffer and LazyWriter I/O activity can still be monitored with the new SQL Server 7.0 Performance Monitor objects described later. |
Manual tuning of Checkpoint was sometimes necessary. | In versions of SQL Server prior to 7.0, "recovery interval" was also used to tune the checkpoint process. SQL Server 7.0 automatically tunes the recovery interval option. The SQL Server 7.0 default of 0 for recovery interval indicates that SQL Server will take responsibility for automatically monitoring and tuning recovery interval. This default setting maintains recovery times of less than one minute for all databases, as long as there are no exceptionally long-running transactions present on the system. For more information, search on the strings "Checkpoints and the Active Portion of the Log" and "troubleshooting recovery interval" in SQL Server Books Online. (See the section "Finding More Information" at the end of this document for information on installing SQL Server Books Online.) |
SQL Server 6.x log pages shared RAM cache with data pages. Manual tuning of the Log Manager was sometimes necessary. | SQL Server Log Manager has changed significantly in SQL Server 7.0 from previous versions of SQL Server. SQL Server 7.0 Log Manager manages its own log cache. There is no longer the dependency on the SYSLOGS table that existed in earlier versions of SQL Server. This separation of log file management from data cache management enhances the performance of both components. SQL Server Log Manager is also capable of performing disk I/O in larger sizes than before. The larger I/O size, combined with the sequential nature of SQL Server logging, helps make disk I/O performance very good for the Log Manager. SQL Server 7.0 automatically tunes the performance of SQL Server Log Manager. There is no longer the need to manually tune the sp_configure option logwrite sleep. This option has been removed from SQL Server 7.0. For more information, search in SQL Server Books Online for the string "logwrite sleep." |
SQL Server 7.0 has been dramatically enhanced in order to create a largely auto-configuring and self-tuning database server. Take advantage of SQL Server's auto-tuning settings. This helps SQL Server run at peak performance even as user load and queries change over time.
A major part of any database server environment is the management of random access memory (RAM) buffer cache. Access to data in RAM cache is much faster than access to the same information from disk. But RAM is a limited resource. If database I/O (input/output operations to the physical disk subsystem) can be reduced to the minimal required set of data and index pages, these pages will stay in RAM longer. Too much unneeded data and index information flowing into buffer cache will quickly push out valuable pages. The driving focus of performance tuning is to reduce I/O so that buffer cache is best utilized.
A key factor in maintaining minimum I/O for all database queries is to ensure that good indexes are created and maintained.
The physical disk subsystem must provide a database server with sufficient I/O processing power in order for the database server to run without disk queuing. Disk queuing results in bad performance. This document describes how to detect disk I/O problems and how to resolve them.
This becomes especially important when a database server will be servicing requests from hundreds or thousands of connections through a given application. Because applications typically determine the SQL queries that will be executed on a database server, it is very important for application developers to understand SQL Server architectural basics and how to take full advantage of SQL Server indexes to minimize I/O.
SQL Server Profiler can be used to monitor and log a SQL Server's workload. This logged workload can then be submitted to the SQL Server Index Tuning Wizard so that index changes can be made to help performance if necessary. Regular use of SQL Server Profiler and the Index Tuning Wizard helps SQL Server perform well as the overall query workload changes over time.
SQL Server 7.0 provides a revised set of Performance Monitor objects and counters, which are designed to provide helpful information for monitoring and analyzing the operations of SQL Server. This document describes key Performance Monitor counters to watch.
SQL Server 7.0 introduces Graphical ShowPlan, an easy method to analyze problematic SQL queries. Statistics I/O is another important aspect of Query Analyzer that this document will describe.
The default of 32 is sufficient for lower-end disk subsystems. With a higher-end RAID (Redundant Array of Inexpensive Disks) storage subsystem attached to the database server, the setting of 32 may underutilize the RAID subsystem, because such a subsystem is capable of completing many more than 32 simultaneous disk transfer requests. If, in addition, SQL Server write activity dictates that more disk transfer capability is needed, max async I/O should be set higher.
Note On Microsoft Windows® 95/98 platforms asynchronous I/O is not supported, so this option is not applicable.
A good value for max async I/O is one that allows Checkpoint to be "fast enough." The goal is to make Checkpoint fast enough to finish before another checkpoint is needed (based upon desired recovery characteristics), but not so fast that the system is seriously perturbed by the event (disk queuing, which will be discussed in further detail later in this document).
A rule of thumb for setting max async I/O for SQL Servers running on larger disk subsystems is to multiply the number of physical drives available to do simultaneous I/O by 2 or 3. Then watch Performance Monitor for signs of disk activity or queuing issues. The negative impact of setting this configuration option too high is that it may cause Checkpoint to monopolize disk subsystem bandwidth that is required by other SQL Server I/O operations, such as reads.
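The rule of thumb above can be sketched as a quick calculation. This is a minimal illustration, assuming a hypothetical drive count; the result is only a starting point to be validated with Performance Monitor.

```python
# Rough sizing for the 'max async io' sp_configure option, following the
# rule of thumb above: number of physical drives capable of simultaneous
# I/O, multiplied by 2 or 3. The 10-drive array is a hypothetical example.

def suggest_max_async_io(physical_drives, factor=3):
    """Starting point for 'max async io'; watch disk queuing after setting it."""
    return physical_drives * factor

# Example: a RAID array built from 10 physical drives.
print(suggest_max_async_io(10, factor=2))  # low end of the rule of thumb: 20
print(suggest_max_async_io(10, factor=3))  # high end of the rule of thumb: 30
```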
To set this value, execute the following command in SQL Server Query Analyzer: "sp_configure 'max async io', <value>", where <value> is the number of simultaneous disk I/O requests that SQL Server can submit to Windows during a checkpoint operation; Windows, in turn, submits the requests to the physical disk subsystem. (See the section "Disk I/O Performance" later in this document for more information.) This configuration option is dynamic (that is, it does not require a stop and restart of SQL Server to take effect).
For more information, search in SQL Server Books Online for the strings "I/O architecture" and "max async I/O option."
SQL Server maintains a pool of Windows threads that are used to service batches of SQL Server commands being submitted to the database server. The total number of these threads (referred to in SQL Server terminology as "worker threads") available to service all incoming command batches is dictated by the setting for the sp_configure option max worker threads. If the number of connections actively submitting batches is greater than the number specified for max worker threads, worker threads will be shared among connections actively submitting batches. The default of 255 will work well for many installations. Note that the majority of connections spend most of their time waiting for batches to be received from the client.
Worker threads take on most of the responsibility of writing out dirty 8-kilobyte (KB) pages from the SQL Server buffer cache. Worker threads schedule their I/O operations asynchronously for maximum performance.
For more information, search on the strings "max worker threads option," "sql server threads," "Optimizing Server Performance Using Memory Configuration Options," "sql server memory pool," "Transaction Recovery," "write ahead transaction log," and "Freeing and Writing Buffer Pages" in SQL Server Books Online.
SQL Server LazyWriter helps to produce free buffers, which are 8-KB data cache pages without any data contained in them. As LazyWriter flushes each 8-KB cache buffer out to disk, it needs to initialize the cache page's identity so that other data may be written into the free buffer. LazyWriter aims to produce free buffers during periods of low disk I/O, such that disk I/O resources are readily available for use and there will be minimal impact on other SQL Server operations.
SQL Server 7.0 automatically configures and manages the level of free buffers. Monitor "SQL Server: Buffer Manager—Free Buffers" to see if this value dips. Optimally, LazyWriter keeps this counter steady throughout SQL Server operations, which means that LazyWriter is keeping up with the user demand for free buffers. If the "SQL Server: Buffer Manager—Free Buffers" counter hits zero, there were times when the user load demanded more free buffers than LazyWriter was able to provide.
If LazyWriter is having problems keeping the free buffer level steady, or at least above zero, it could mean that the disk subsystem cannot provide LazyWriter with the disk I/O performance it needs. (Compare drops in the free buffer level with disk queuing to see if this is true.) The solution is to add more physical disk drives (also commonly referred to as "spindles") to the database server disk subsystem in order to provide more disk I/O processing power. "SQL Server: Buffer Manager—Lazy Writes/sec" indicates the number of 8-KB pages being physically written out to disk.
Monitor the current level of disk queuing in Performance Monitor by looking at the (Logical or Physical) Disk counters "Avg. Disk Queue Length" and "Current Disk Queue Length," and ensure the disk queue is less than 2 for each physical drive associated with any SQL Server activity. For database servers that employ hardware RAID controllers and disk arrays, remember to divide the number reported by the Logical/Physical Disk counters by the number of actual hard drives associated with that logical drive letter or physical hard drive number (as reported by Disk Administrator), because Windows and SQL Server are unaware of the actual number of physical hard drives attached to a RAID controller. It is very important to know the number of drives associated with the RAID array controller in order to interpret correctly the disk queue numbers that Performance Monitor reports.
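The per-drive interpretation described above can be sketched as a small calculation. The reported queue length and drive count here are hypothetical examples.

```python
# Interpreting Performance Monitor disk queue numbers for a RAID array.
# Windows reports one queue figure for the logical drive; divide by the
# number of physical drives behind the RAID controller to estimate the
# per-spindle queue, which should stay below 2.

def per_spindle_queue(reported_queue_length, drives_in_array):
    return reported_queue_length / drives_in_array

# Example: Performance Monitor reports a queue length of 12 on a logical
# drive backed by a 10-drive RAID array.
q = per_spindle_queue(12, 10)
print(q)       # 1.2 queued requests per drive
print(q < 2)   # True: within the acceptable range
```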
Adjust LazyWriter disk I/O request behavior with the use of max async I/O. The sp_configure option max async I/O controls the number of 8-KB disk write requests (including requests coming in from LazyWriter, CheckPoint, and the worker threads) that SQL Server can simultaneously submit to Windows and, in turn, to the disk I/O subsystem. If disk queuing occurs at unacceptable levels, decrease max async I/O. If it is imperative that SQL Server maintains its currently configured level of max async I/O, add more disks to the disk subsystem until disk queuing comes down to acceptable levels.
For more information, search for the strings "freeing and writing buffer pages" and "write-ahead transaction log" in SQL Server Books Online.
Checkpoint writes dirty pages out to the SQL Server data files. Dirty pages are any buffer cache pages that have been modified since being brought into the buffer cache. A buffer written to disk by Checkpoint still contains the page and users can read or update it without rereading it from disk, which is not the case for free buffers created by LazyWriter.
Checkpoint aims to let the worker threads and LazyWriter do the majority of the work of writing out dirty pages. Checkpoint does this by allowing a dirty page an extra checkpoint wait before writing it out, if possible. This gives the worker threads and LazyWriter more time in which to write out the dirty pages. The conditions under which this extra wait time for a dirty page occurs are detailed in SQL Server Books Online under the section "Checkpoints and the Active Portion of the Log." The main idea to remember is that, with this extra checkpoint wait, Checkpoint aims to even out SQL Server disk I/O activity over a longer time period.
To make Checkpoint more efficient when there are a large number of pages to flush out of cache, SQL Server will sort the data pages to be flushed in the order the pages appear on disk. This will help minimize disk arm movement during cache flush and potentially take advantage of sequential disk I/O. Checkpoint also submits 8-KB disk I/O requests asynchronously to the disk subsystem. This allows SQL Server to finish submitting required disk I/O requests faster because Checkpoint doesn't wait for the disk subsystem to report back that the data has been actually written to disk.
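The flush ordering described above can be illustrated with a minimal sketch. The tuple representation of a page is an assumption for illustration only, not SQL Server's internal structure.

```python
# Sketch of the cache-flush ordering described above: sort dirty pages by
# (file id, page number) before issuing writes, so the flush proceeds in
# disk order, minimizing disk arm movement and enabling sequential I/O.

dirty_pages = [(1, 907), (1, 12), (2, 45), (1, 300), (2, 3)]  # (file_id, page_no)

# Sorting tuples orders by file first, then by page position within the file.
flush_order = sorted(dirty_pages)
print(flush_order)
# [(1, 12), (1, 300), (1, 907), (2, 3), (2, 45)]
```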
It is important to watch disk queuing on the hard drives associated with SQL Server data files to notice whether SQL Server is sending down more disk I/O requests than the disk(s) can handle. If so, more disk I/O capacity must be added to the disk subsystem so that it can handle the load.
Adjust Checkpoint's dirty page flushing behavior with the use of max async I/O. The sp_configure option max async I/O controls the number of 8-KB cache flushes that Checkpoint can simultaneously submit to Windows (and, in turn, to the disk I/O subsystem). If disk queuing occurs at unacceptable levels, decrease max async I/O. If it is imperative that SQL Server maintains its currently configured level of max async I/O, add more disks to the disk subsystem until disk queuing comes down to acceptable levels.
If, on the other hand, it is necessary to increase the speed with which SQL Server executes Checkpoint and the disk subsystem is already powerful enough to handle the increased disk I/O while avoiding disk queuing, increase max async I/O to allow SQL Server to send more disk I/O requests at the same time, potentially improving I/O performance. Watch the disk queuing counters carefully after changing max async I/O. Be sure to watch disk read queuing in addition to disk write queuing. If max async I/O is set too high for a given disk subsystem, Checkpoint may tend to queue up a large number of disk write I/O requests and this can cause SQL Server read activity to be blocked. DBCC PERFMON provides Reads Outstanding as one of its outputs and is a good counter to watch in Performance Monitor to look for queued SQL Server reads. Physical Disk and Logical Disk objects in Performance Monitor provide the Average Disk Read Queue Length counter, which can also be used to monitor queued disk read I/O requests. If disk read queuing is being caused by Checkpoint, the choices are to either decrease max async I/O or to add more hard drives so that the checkpoint and read requests can be simultaneously handled.
Like all other major relational database management system (RDBMS) products, SQL Server ensures that write activity (insert, update, and delete) performed on the database will not be lost if something interrupts SQL Server's online status (such as a power failure, disk drive failure, or fire in the data center). One thing that helps guarantee recoverability is the SQL Server logging process. Before any implicit transaction (a single SQL query) or explicit transaction (a defined transaction that issues a BEGIN TRAN and a COMMIT or ROLLBACK command) can complete, SQL Server's Log Manager must receive a signal from the disk subsystem that all data changes associated with that transaction have been written successfully to the associated log file. This rule guarantees that if SQL Server is shut down abruptly for whatever reason, and transactions written into the data cache have not yet been flushed to the data files (remember that flushing data buffers is the responsibility of Checkpoint and LazyWriter), the transaction log can be read and reapplied when SQL Server is restarted. Reading the transaction log and applying the transactions to SQL Server after a server stoppage is referred to as recovery.
Because SQL Server must wait for the disk subsystem to complete I/O to SQL Server log files as each transaction is completed, it is important that the disks containing SQL Server log files have sufficient disk I/O handling capacity for the anticipated transaction load.
The method of watching for disk queuing associated with SQL Server log files is different from that for SQL Server database files. Use the Performance Monitor counters "SQL Server: Databases <database instance>: Log Flush Wait Time" and "SQL Server: Databases <database instance>: Log Flush Waits/sec" to see if log writer requests are waiting on the disk subsystem for completion.
It is okay to use a caching controller for SQL Server log files (in fact, it's a necessity for the highest performance) if the controller guarantees that data entrusted to it will be written to disk eventually, even if the power fails, for example. For more information on caching controllers, refer to the section later in this document titled "Effect of on-board cache of hardware RAID controllers."
For more information, search in SQL Server Books Online for the strings "Transaction Recovery," "SQL Server: Log Manager Object," and "optimizing transaction log performance."
SQL Server 7.0 Read-Ahead Manager is completely self-configuring and self-tuning. Read-Ahead Manager is tightly integrated with the operations of SQL Server Query Processor. Query Processor communicates situations that would benefit from read-ahead scans to Read-Ahead Manager. Large table scans, large index range scans, and probes into clustered and nonclustered index B-trees (balanced trees) are situations that benefit from read-ahead. This is because read-aheads occur with 64-KB I/Os, which provide higher disk throughput potential for the disk subsystem than 8-KB I/Os do. When it is necessary to retrieve a large amount of data from SQL Server, read-ahead is the best way to do it.
Read-Ahead Manager benefits from the simpler and more efficient Index Allocation Map (IAM) storage structure. The IAM is SQL Server 7.0's new method of recording the location of extents (8 pages of SQL Server data or index information for a total of 64 KB of information per extent). The IAM is an 8-KB page that tightly packs information (through a bitmap) about which extents within the range of extents covered by the IAM contain required data. The compact nature of IAM pages make them fast to read and tends to keep regularly used IAM pages in buffer cache.
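The bitmap idea behind the IAM can be illustrated with a toy sketch, assuming a simple integer bitmap; real IAM pages are 8-KB on-disk structures, and this only shows why such a lookup is compact and fast.

```python
# Toy sketch of the IAM bitmap concept: one bit per extent in the range
# the IAM page covers, set when that extent holds data for the object.
# The bitmap value below is hypothetical.

EXTENT_SIZE_KB = 8 * 8  # 8 pages of 8 KB each: 64 KB per extent

def allocated_extents(bitmap, extents_covered):
    """Return the extent numbers whose bit is set in the bitmap."""
    return [i for i in range(extents_covered) if bitmap >> i & 1]

iam_bitmap = 0b101101  # extents 0, 2, 3, and 5 belong to the object
print(allocated_extents(iam_bitmap, 6))  # [0, 2, 3, 5]
```

Because one 8-KB page can describe a large range of extents this way, scanning it to find every extent that must be read is far cheaper than chasing page-to-page links.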
By combining the query information from Query Processor with the locations of all extents that need to be read, quickly retrieved from the IAM page(s), Read-Ahead Manager can construct multiple sequential read requests. Sequential 64-KB disk reads provide extremely good disk I/O performance.
"SQL Server: Buffer Manager—Read-Ahead Pages" is the important Performance Monitor counter to watch with respect to SQL Server Read-Ahead Manager. More information can be found by executing the command DBCC PERFMON (IOSTATS). Among the information is "RA Pages Found in Cache" and "RA Pages Placed in Cache." If the page is already hashed (that is, the application read it in first and read-ahead wasted a read), it counts as a "page found in cache." If the page is not already hashed (that is, a successful read-ahead), it counts as a "page placed in cache."
One caveat about Read-Ahead Manager is that too much read-ahead can be detrimental to overall performance, because it can fill cache with pages that are not needed, consuming I/O and CPU that could have served other purposes. The solution is a general performance tuning goal: make sure that all SQL queries are tuned so that a minimal number of pages are brought into buffer cache. This includes using the right index for the right job. Save clustered indexes for efficient range scans, and define nonclustered indexes to help quickly locate single rows or smaller rowsets. Note that if you plan to have only one index on a table, and that index is for the purpose of fetching single rows or smaller rowsets, you should make the index clustered, because clustered indexes are faster than nonclustered indexes (but not by the same dramatic scale as in the range scan case).
For more information, search in SQL Server Books Online for the strings "reading pages," "DBCC PERFMON," "Table and Index Architecture," "Heap Structures," and "read-ahead pages."
When configuring a SQL Server that will contain only a few gigabytes (GB) of data and will not sustain heavy read or write activity, it is not as important to be concerned with disk I/O and the balancing of SQL Server I/O activity across hard drives for maximum performance. But to build larger SQL Server databases that will contain hundreds of gigabytes of data and/or sustain heavy read and/or write activity, it is necessary to drive the configuration around maximizing SQL Server disk I/O performance by load balancing across multiple hard drives.
One of the most important aspects of database performance tuning is I/O performance tuning. SQL Server is certainly no exception to this philosophy. Unless SQL Server is running on a machine with enough RAM to hold the entire database, I/O performance will be dictated by how fast reads and writes of SQL Server data can be processed by the disk I/O subsystem.
A good rule of thumb to remember is that the typical Wide Ultra SCSI-3 hard drive is capable of providing Windows and SQL Server with about 75 nonsequential (random) and 150 sequential I/O operations per second. Advertised transfer rates in terms of megabytes (MB) for these hard drives range around 40 MB/second. Keep in mind that it is much more likely for a database server to be constrained by the 75/150 I/O transfers per second than the 40 MB/second transfer rate. This is illustrated by the following calculations:
(75 random I/O operations per second) X (8-KB transfer) = 600 KB per second
The preceding calculation indicates that by doing strictly random read or write SQL Server operations on a given hard drive (single-page reads and writes), it is reasonable to expect at most 600 kilobytes (0.6 MB) per second of I/O processing capability from that hard drive. This is much lower than the advertised 40-MB-per-second I/O handling capacity of the drive. SQL Server worker threads, Checkpoint, and LazyWriter perform I/O in 8-KB transfer sizes.
(150 sequential I/O operations per second) X (8-KB transfer) = 1,200 KB per second
The preceding calculation indicates that by doing strictly sequential read or write SQL Server operations on a given hard drive (single-page reads and writes), it is reasonable to expect at most 1,200 kilobytes (1.2 MB) per second of I/O processing capability from that hard drive.
(75 random I/O operations per second) X (64-KB transfer) = 4,800 KB (4.8 MB) per second
The preceding calculation illustrates a worst-case scenario for read-aheads, assuming all random I/O. Note that even in the completely random situation, the 64-KB transfer size still provides much better disk I/O transfer rate from disk (4.8 MB/second) than the single page transfer rates (0.6 and 1.2 MB/second):
(150 sequential I/O operations per second) X (64-KB transfer) = 9,600 KB (9.6 MB) per second
The preceding calculation indicates that by doing strictly sequential read or write SQL Server operations on a given hard drive, it is reasonable to expect at most 9.6 MB per second of I/O processing capability from that hard drive. This is much better than the random I/O case. SQL Server Read-Ahead Manager performs disk I/O in 64-KB transfer sizes and attempts to arrange its reads so that read-ahead scans are done sequentially (often referred to as "serially" or "in disk order"). While Read-Ahead Manager aims to perform its I/O operations sequentially, page splitting tends to cause extents to be read nonsequentially rather than sequentially. This is one reason why it is important to eliminate and prevent page splitting.
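The four calculations above can be expressed as one small sketch, using the same rule-of-thumb figures: 75 nonsequential and 150 sequential transfers per second per drive, with 8-KB single-page I/O and 64-KB read-ahead I/O.

```python
# Throughput ceilings per hard drive, from transfer rate x transfer size.

def throughput_kb_per_sec(transfers_per_sec, transfer_kb):
    return transfers_per_sec * transfer_kb

print(throughput_kb_per_sec(75, 8))    # 600 KB/s:  random 8-KB I/O
print(throughput_kb_per_sec(150, 8))   # 1200 KB/s: sequential 8-KB I/O
print(throughput_kb_per_sec(75, 64))   # 4800 KB/s: random 64-KB read-ahead
print(throughput_kb_per_sec(150, 64))  # 9600 KB/s: sequential 64-KB read-ahead
```

Even the worst of the 64-KB cases beats the best of the 8-KB cases by a factor of four, which is why read-ahead is so effective.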
Log Manager will write sequentially to the log files in sizes ranging up to 32 kilobytes.
The terms sequential and nonsequential (random) have been used quite a bit to refer to hard disk operations. It is worthwhile to take a moment to explain what these terms mean in relation to a disk drive. A single hard drive consists of a set of drive platters. Each of these drive platters provides services for read/write operations with a set of arms with read/write heads that can move across the platters and read information from the platter or write data onto it. With respect to SQL Server, there are two important points to remember about hard drives: moving the disk arm and read/write head to locate the requested position on the platter (a nonsequential operation) takes significant time, while reading or writing data that is laid out consecutively (a sequential operation) requires little or no arm movement. This is why a hard drive can service roughly twice as many sequential operations per second as nonsequential ones.
Readers interested in more detail about physical hard drives should refer to the Compaq white paper "Disk Subsystem Performance and Scalability," the location of which is mentioned in the "Finding More Information" section at the end of this document.
A typical hard disk provides a maximum transfer rate of about 40 megabytes per second or 75 nonsequential/150 sequential disk transfers per second. Typical RAID controllers have an advertised transfer rate of about 40 megabytes per second or (very approximately) 2,000 disk transfers per second. Peripheral Component Interconnect (PCI) buses have an advertised transfer rate of about 133 megabytes per second and higher. The actual transfer rates achievable for a device will differ from the advertised rate, but that is not important for our discussion here. What is important to understand is how to use these transfer rates as a rough starting point for determining the number of hard drives to associate with each RAID controller and, in turn, how many drives + RAID controllers can be attached to a PCI bus without I/O bottleneck problems.
In the earlier section titled "How Advertised Disk Transfer Rates Relate to SQL Server," it was calculated that the maximum amount of SQL Server data that can be read from or written to a hard drive in one second is 9.6 MB. Assuming a RAID controller can handle 40 MB per second, it is possible to roughly calculate the number of hard drives that should be associated with one RAID controller by dividing 40 by 9.6 to get approximately 4. This means that at most 4 drives should be associated with that one controller when SQL Server is doing nothing but sequential I/O of 64 KB. Similarly, it was previously calculated that with all nonsequential I/O of 64 KB, the maximum data sent up from the hard drive to the controller would be 4.8 MB/second. Dividing 40 MB/second by 4.8 MB/second gives us the result of about 8. This means that at most 8 hard drives should be associated with the single controller in the nonsequential 64-KB scenario. The random 8-KB data transfer scenario requires the most drives. Divide 40 by 0.6 to determine that about 66 drives are needed to saturate a RAID controller doing 100 percent random 8-KB reads and writes. This is not a realistic scenario because read-ahead and log writing will use transfer sizes greater than 8 KB and it is very unlikely that a SQL Server will perform 100 percent random I/O.
Another way to figure out how many drives should be associated with a RAID controller is to look at disk transfers per second instead of looking at megabytes per second. If a hard drive is capable of 75 nonsequential (random) I/Os per second, it follows that about 26 hard drives working together could theoretically produce 2,000 nonsequential I/Os per second, enough to hit the maximum I/O handling capacity of a single RAID controller. On the other hand, it would only take about 13 hard drives working together to produce 2,000 sequential I/Os per second and keep the RAID controller running at maximum throughput, because a single hard drive can sustain 150 sequential I/Os per second.
Moving on to the PCI bus: note that RAID controller and PCI bus bottlenecks are not nearly as common as I/O bottlenecks related to hard drives. But for the sake of illustration, assume that it is possible to keep a set of hard drives associated with a RAID controller busy enough to push 40 MB/second of throughput through the controller. The next question is: how many RAID controllers can safely be attached to the PCI bus without risking a PCI bus I/O bottleneck? To make a rough estimate, divide the I/O processing capacity of the PCI bus by that of the RAID controller. 133 MB/second divided by 40 MB/second indicates that approximately three RAID controllers can be attached to a single PCI bus. Note that most large servers come with more than one PCI bus, which increases the number of RAID controllers that can be installed in a single server.
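The sizing chain described in the last few paragraphs (drives per controller, controllers per bus) can be sketched as follows. All figures are the rule-of-thumb numbers used in this section, not measurements.

```python
# Rough component sizing from the advertised rates discussed above.

CONTROLLER_MB_SEC = 40       # advertised RAID controller throughput
CONTROLLER_XFERS_SEC = 2000  # very approximate controller transfer ceiling
PCI_BUS_MB_SEC = 133         # advertised PCI bus throughput
DRIVE_SEQ_MB_SEC = 9.6       # per drive, sequential 64-KB I/O
DRIVE_RAND_MB_SEC = 4.8      # per drive, random 64-KB I/O

# Drives needed to saturate one controller, by megabytes per second:
print(CONTROLLER_MB_SEC / DRIVE_SEQ_MB_SEC)    # about 4 drives, all sequential
print(CONTROLLER_MB_SEC / DRIVE_RAND_MB_SEC)   # about 8 drives, all random

# The same sizing by transfers per second (75 random / 150 sequential per drive):
print(CONTROLLER_XFERS_SEC // 75)    # about 26 drives of random I/O
print(CONTROLLER_XFERS_SEC // 150)   # about 13 drives of sequential I/O

# Controllers per PCI bus before the bus itself becomes the bottleneck:
print(PCI_BUS_MB_SEC // CONTROLLER_MB_SEC)     # about 3 controllers
```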
These calculations help illustrate the relationship of the transfer rates of the various components that comprise a disk I/O subsystem (hard drives, RAID controllers, and PCI bus) and are not meant to be taken literally. This is because the calculations assume all sequential or all nonsequential data access, which is not likely to ever be the case in a production database server environment. In reality, a mixture of sequential, nonsequential, 8-KB, and 64-KB I/O will occur. And other factors will make it difficult to estimate exactly how many I/O operations can be pushed through a set of hard drives at one time. On-board read/write caching available for RAID controllers increases the amount of I/O that a set of drives can effectively produce. How much more is hard to estimate for the same reason that it is hard to place an exact number on the number of 8-KB versus 64-KB I/O a SQL Server environment will need.
But hopefully this section has helped to foster some insight into what advertised transfer rates really mean to SQL Server.
When scaling databases past a few gigabytes it is important to have at least a basic understanding of RAID (Redundant Array of Inexpensive Disks) technology and how it relates to database performance.
The benefits of RAID are fault tolerance (protection from data loss when a drive fails, provided through mirroring or parity) and better disk I/O performance (provided through striping data across multiple drives).
Mirroring is implemented by writing information onto two sets of drives, one on each side of the mirrored pairs of drives. If a drive is lost with mirroring in place, the data for the lost drive can be rebuilt by replacing the failed drive and rebuilding the data from the failed drive's matching drive on the other side of the mirrorset. Most RAID controllers provide the ability to perform this failed drive replacement and rebuilding from the other side of the mirrored pair while Windows and SQL Server are online (commonly referred to as "Hot Plug" capable drives). One advantage of mirroring is that it is the best-performing RAID option when fault tolerance is required. Each SQL Server write in the mirroring situation costs two disk I/O operations, one to each side of the mirrorset. The other advantage is that mirroring provides more fault tolerance than parity RAID implementations. Mirroring can sustain at least one failed drive and may be able to survive failure of up to half of the drives in the mirrorset without forcing the system administrator to shut down the server and recover from file backup. The disadvantage of mirroring is cost: one drive for each drive's worth of data. RAID 1 and its hybrid, RAID 0+1 (sometimes referred to as RAID 10 or 0/1), are implemented through mirroring.
Parity is implemented by calculating recovery information about data written to disk and writing this parity information on the other drives that form the RAID array. If a drive should fail, a new drive is inserted into the RAID array and the data on that failed drive is recovered by taking the recovery information (parity) written on the other drives and using this information to regenerate the data from the failed drive. RAID 5 and its hybrids are implemented through parity. The advantage of parity is cost. To protect any number of drives with RAID 5, only one additional drive is required. Parity information is evenly distributed among all drives participating in the RAID 5 array. The disadvantages of parity are performance and fault tolerance. Due to the additional costs associated with calculating and writing parity, RAID 5 requires four disk I/O operations for each Windows NT/SQL Server write as compared to two disk I/O operations for mirroring. Read I/O operation costs are the same for mirroring and parity. Also, RAID 5 can sustain only one failed drive before the array must be taken offline and recovery from backup media must be performed to restore data.
General Rule of Thumb: Be sure to stripe across as many disks as necessary to achieve solid disk I/O performance. Performance Monitor will indicate if there is a disk I/O bottleneck on a particular RAID array. Be ready to add disks and redistribute data across RAID arrays and/or small computer system interface (SCSI) channels as necessary to balance disk I/O and maximize performance.
Many hardware RAID controllers have some form of read and/or write caching. Take advantage of this available caching with SQL Server because it can significantly enhance the effective I/O handling capacity of the disk subsystem. The principle of these controller-based caching mechanisms is to gather smaller and potentially nonsequential I/O requests coming in from the host server (hence, SQL Server) and try to batch them together with other I/O requests for a few milliseconds so that the batched I/Os can form larger (32–128 KB) and possibly sequential I/O requests to send to the hard drives. This, in keeping with the principle that sequential and larger I/O is good for performance, helps produce more disk I/O throughput given the fixed number of I/Os that hard disks are able to provide to the RAID controller. It is not that the RAID controller caching magically allows the hard disks to process more I/Os per second; the RAID controller cache simply organizes incoming I/O requests to make the best possible use of the underlying hard disks' fixed amount of I/O processing ability.
These RAID controllers usually protect their caching mechanism with some form of backup power. The backup power can help preserve the data written in cache for some period of time (perhaps days) in case of a power outage. In production environments, give the database server further protection with an adequate uninterruptible power supply (UPS). The UPS gives the RAID controller additional time, on top of its own battery backup, to flush data to disk in the event that power to the server is disrupted.
RAID 1 and RAID 0+1 offer the best data protection and best performance among RAID levels, but cost more in terms of the disks required. When the cost of hard disks is not a limiting factor, RAID 1 or RAID 0+1 is the best RAID choice in terms of both performance and fault tolerance.
RAID 5 provides fault tolerance at the best cost but has half the write performance of RAID 1 and 0+1 because of the additional I/O that RAID 5 has to do reading and writing parity information onto disk. RAID 5 is not as fault tolerant as RAID 1 and 0+1.
Best disk I/O performance is achieved with RAID 0 (disk striping with no fault tolerance protection). But because RAID 0 provides no fault tolerance, this RAID level is typically suitable only for development database servers or other testing environments.
Many RAID array controllers provide the option of RAID 0+1 (also referred to as RAID 1/0 and RAID 10) over physical hard drives. RAID 0+1 is a hybrid RAID solution. On the lower level, it mirrors all data just like normal RAID 1. On the upper level, the controller stripes data across all of the drives (like RAID 0). Thus, RAID 0+1 provides maximum protection (mirroring) with high performance (striping). These striping and mirroring operations are transparent to Windows and SQL Server because they are managed by the RAID controller. The difference between RAID 1 and RAID 0+1 is on the hardware controller level. RAID 1 and RAID 0+1 require the same number of drives for a given amount of storage. For more specifics on RAID 0+1 implementation of specific RAID controllers, contact the hardware vendor that produced the controller.
Figure 1 illustrates the difference between RAID 0, RAID 1, RAID 5, and RAID 0+1. Note that in order to hold four disks' worth of data, RAID 1 (and RAID 0+1) needs eight disks, whereas RAID 5 needs five disks. Be sure to involve the appropriate hardware vendors to learn more about RAID implementation specific to the hardware running the database server.
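The disk counts in Figure 1 follow directly from each RAID level's overhead, and generalize to any amount of data. As a T-SQL sketch of the arithmetic (the four-disk starting point matches the figure):

```sql
-- Disks required to hold @data_disks worth of data at each RAID level
declare @data_disks int
select @data_disks = 4

select @data_disks     as raid0_disks,  -- striping only, no redundancy: 4
       @data_disks * 2 as raid1_disks,  -- mirroring doubles the count: 8
       @data_disks + 1 as raid5_disks   -- one disk's worth of parity: 5
```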
Figure 1. Common RAID levels
This is a very handy feature that allows disks to be added dynamically to a physical RAID array while SQL Server is online, as long as there are hot plug slots available. Many hardware vendors offer hardware RAID controllers that are capable of providing this functionality. Data is automatically re-striped across all drives evenly, including the newly added drive, and there is no need to shut down SQL Server or Windows. It is good to take advantage of this functionality by leaving hot plug hard drive slots free in the disk array cages. Thus, if SQL Server is regularly overtaxing a RAID array with I/O requests (this will be indicated by Disk Queue Length for the Windows logical drive letter associated with that RAID array), it is possible to install one or more new hard drives into the hot plug slot while SQL Server is still running. The RAID controller will redistribute some existing SQL data to these new drives so that SQL data is evenly distributed across all drives in the RAID array. Then, the I/O processing capacity of the new drives (75 nonsequential/150 sequential I/Os per second, per drive) is added to the overall I/O processing capacity of the RAID array.
In Performance Monitor, Logical and Physical Disk Objects provide effectively the same information. The difference is that Logical Disks in Performance Monitor are associated with what Windows sees as a logical drive letter, whereas Physical Disks in Performance Monitor are associated with what Windows sees as a single physical hard disk.
To enable Performance Monitor counters, use the command diskperf.exe from the command line of a Windows command prompt window. Use "diskperf –y" so that Performance Monitor will report Logical and Physical disk counters. This works when using hard drives or sets of hard drives and RAID controllers, without the use of Windows NT software RAID.
When utilizing Windows NT software RAID, use "diskperf –ye" so that Performance Monitor will report Physical counters across the Windows NT stripesets correctly. When "diskperf –ye" is used in conjunction with Windows NT stripesets, Logical counters will not report correct information and need to be disregarded. If Logical disk counter information is required in conjunction with Windows NT stripesets, use "diskperf –y" instead. With "diskperf –y" and Windows NT stripesets, Logical disk counters will be reported correctly but Physical disk counters will not report correct information and need to be disregarded.
Note that the effects of the diskperf command do not occur until Windows NT has been restarted.
Also note that hardware RAID controllers present the multiple physical hard drives that compose a single RAID mirrorset or stripeset to Windows as one single physical disk. Disk Administrator is used to associate logical drive letters with this single physical disk, and does not need to be concerned with how many hard drives are actually behind the one physical disk that the RAID controller has presented.
But from a performance tuning perspective, it is very important to be aware of how many physical hard drives are associated with a RAID array because this information will be needed when determining the number of disk I/O requests that Windows and SQL Server are sending to each physical hard drive. Divide the number of disk I/O requests that Performance Monitor reports as being associated with a hard drive by the number of actual physical hard drives known to be in that RAID array.
In order to get a rough estimate of I/O activity per hard drive in a RAID array, it is also important to multiply the number of disk write I/Os reported by Performance Monitor by either 2 (RAID 1 and 0+1) or 4 (RAID 5). This will give a more accurate account of the number of actual I/O requests being sent to the physical hard drives, because it is at this physical level that the I/O capacity numbers for hard drives apply (75 nonsequential and 150 sequential per drive). But don't expect to be able to calculate exactly how much I/O is hitting the hard drives this way, when the hardware RAID controller is using caching, because caching can significantly change the amount of I/O that is actually hitting the hard drives for the reasons just explained.
It is best to concentrate on disk queuing rather than on actual I/O per disk because, after all, why worry about I/O if it is not causing a problem? Windows cannot see the number of physical drives in a RAID array, so to assess disk queuing per physical disk accurately, divide the Disk Queue Length by the number of physical drives participating in the hardware RAID disk array that contains the logical drive being observed. Keep this number under 2 for hard drives containing SQL Server files.
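The per-drive arithmetic just described can be sketched in T-SQL. The Performance Monitor readings below are hypothetical values chosen for illustration:

```sql
-- Hypothetical Performance Monitor readings for one RAID 1 array
declare @reads_per_sec  float, @writes_per_sec float,
        @disk_queue     float, @drives_in_array int
select @reads_per_sec = 300, @writes_per_sec = 100,
       @disk_queue    = 6,   @drives_in_array = 4

-- Each write costs 2 physical I/Os on RAID 1 and 0+1 (use 4 for RAID 5)
select (@reads_per_sec + 2 * @writes_per_sec) / @drives_in_array
           as physical_io_per_drive,  -- compare to 75 random/150 sequential
       @disk_queue / @drives_in_array
           as queue_per_drive         -- keep this under 2
```

With these sample numbers, each drive is handling 125 physical I/Os per second and a queue of 1.5, which is within the guidelines above.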
For more information about SQL Server and RAID, search in SQL Server Books Online for the strings "RAID Levels and SQL Server," "Comparing Different Implementations of RAID Levels," "Monitoring Disk Activity," "Performance Monitoring Example: Identifying Bottlenecks," "About Hardware-based Solutions," and "RAID."
Windows NT provides fault tolerance against hard disk failure through mirrorsets and stripesets (with or without fault tolerance) implemented by the Windows NT operating system instead of a hardware RAID controller. Windows NT Disk Administrator is used to define either mirrorsets (RAID 1) or stripesets with parity (RAID 5). Windows NT Disk Administrator also allows the definition of stripesets with no fault tolerance (RAID 0).
Software RAID will utilize more CPU resources, because Windows NT manages the RAID operations instead of a hardware RAID controller. Thus, with the same number of disk drives, performance with Windows NT software RAID may be a few percent less than with a hardware RAID solution if the system processors are near 100 percent utilization. But Windows NT software RAID will generally help a set of drives service SQL Server I/O better than those drives could separately, reducing the potential for an I/O bottleneck and leading to higher CPU utilization by SQL Server and better throughput. Software RAID can also be a lower-cost way to provide fault tolerance to a set of hard drives.
For more information about configuring Windows NT software RAID, refer to Chapter 4, "Planning a Reliable Configuration," in Windows NT Server Online Help. Also search in SQL Server Books Online for the strings "About Windows NT-based Disk Mirroring and Duplexing" and "About Windows NT-based Disk Striping and Striping with Parity."
When dealing with smaller SQL Server databases located on a few disk drives, disk I/O parallelism will likely not come into play. But when dealing with large SQL Server databases stored on many disk drives, performance will be enhanced by using disk I/O parallelism to make optimal use of the I/O processing power of the disk subsystem.
The simplest technique for creating disk I/O parallelism is to create a single "pool of drives" that serves all SQL Server database files, excluding transaction log files. The pool may be a single RAID array that is represented in Windows NT as a single physical drive. Or a larger pool may be set up using multiple RAID arrays and SQL Server files/filegroups. A SQL Server file can be associated with each RAID array and the files can be combined into a SQL Server filegroup. Then a database can be built on the filegroup so that the data will be spread evenly across all of the drives and RAID controllers. The "drive pool" methodology depends on RAID to divide data across all physical drives to help ensure parallel access to that data during database server operations.
This pool methodology simplifies SQL Server I/O performance tuning because database administrators know that there is only one physical location to create database objects. The single pool of drives can be watched for disk queuing and, if necessary, more hard drives can be added to the pool to prevent disk queuing. This technique helps optimize for the common case, where it is not known which parts of databases may see the most usage. It is better not to have a portion of the total available I/O capacity segregated away on some other disk partition just because 5 percent of the time SQL Server might be doing I/O to it. The "single pool of drives" methodology may help make all available I/O capacity always available for SQL Server operations.
Note that SQL Server log files should always be physically separated onto different hard drives from all other SQL Server database files. For SQL Servers with very busy databases, transaction log files should be physically separated from each other. Transaction logging is primarily sequential write I/O. There is a lot of I/O performance benefit associated with separating transaction logging activity from other nonsequential disk I/O activity. That allows the hard drives containing the log files to concentrate on sequential I/O. Note that there are times when the transaction log will need to be read as part of SQL Server operations such as replication, rollbacks, and deferred updates. SQL Servers that participate in replication should pay particular attention to making sure that all transaction log files have sufficient disk I/O processing power because of the reads that need to occur.
There is additional administration involved with physically separating SQL Server objects from the rest of their associated database through SQL Server files and filegroups. For the purposes of investigating very active tables and indexes, this may be very worthwhile. By separating table or index away from all other database objects, accurate assessments can be made of the I/O requirements of that object. This is not as easy to do when all database objects are placed within one big drive pool. This type of physical I/O separation may be appropriate during database development and benchmarking so that database I/O information can be gathered and applied to capacity planning for the production database server environment.
Here are the areas of SQL Server activity that can be separated across different hard drives, RAID controllers, PCI channels (or combinations of the three):
The physical separation of SQL Server I/O activities is quite convenient with the use of hardware RAID controllers, RAID hot plug drives, and online RAID expansion. The approach that provides the most flexibility is arranging the RAID controllers so that a separate RAID SCSI channel is provided for each of the separate SQL activities mentioned above. Each RAID SCSI channel should be attached to a separate RAID hot plug cabinet to take full advantage of online RAID expansion (if available through the RAID controller). Windows logical drive letters are associated to each RAID array and SQL Server files may be separated between distinct RAID arrays based on known I/O usage patterns.
With this configuration it is possible to relate disk queuing back to a distinct RAID SCSI channel and its drive cabinet as Performance Monitor reports the queuing behavior during load testing or heavy production loads. If a RAID controller and drive array cabinet support online RAID expansion and slots for hot plug hard drives are available in the cabinet, disk queuing on that RAID array is resolved by simply adding more drives to that RAID array until Performance Monitor reports that disk queuing for that RAID array has reached acceptable levels (less than 2 for SQL Server files). This can be done while SQL Server is online.
Tempdb is a database created by SQL Server to be used as a shared working area for a variety of activities, including temporary tables, sorting, subqueries, and aggregates with GROUP BY or ORDER BY, queries using DISTINCT (temporary worktables have to be created to remove duplicate rows), cursors, and hash joins. It is good to enable tempdb I/O operations to occur in parallel to the I/O operations of related transactions. Because tempdb is a scratch area and very update intensive, RAID 5 would not be as good a choice for tempdb as RAID 1 or 0+1. Because tempdb is rebuilt every time the database server is restarted, RAID 0 is a possibility for tempdb on production SQL Server machines. RAID 0 provides best RAID performance for tempdb with the least number of physical drives. The main concern with using RAID 0 for tempdb in a production environment is that SQL Server would need to be stopped and restarted if any physical drive failure were to occur in the RAID 0 array, and this would not necessarily be the case if tempdb were placed on a RAID 1 or 0+1 array.
To move the tempdb database, use the ALTER DATABASE command to change the physical file location of the SQL Server logical file name associated with tempdb. For example, to move tempdb and its associated log to the new file locations e:\mssql7 and c:\temp, use the following commands:
alter database tempdb modify file (name = 'tempdev',
filename = 'e:\mssql7\tempnew_location.mdf')
alter database tempdb modify file (name = 'templog',
filename = 'c:\temp\tempnew_loglocation.ldf')
The master, msdb, and model databases are not used much during production compared to user databases, so it is typically not necessary to consider them in I/O performance tuning considerations. The master database is normally used just for adding new logins, databases, devices, and other system objects.
Nonclustered indexes reside in B-tree structures, which can be separated from their related database tables with the ALTER DATABASE command. In the example below, the first ALTER DATABASE creates a filegroup. The second ALTER DATABASE adds a file with a separate physical location to that filegroup. At this point, indexes can be created on the filegroup, as illustrated in the following code by the creation of the index called index1. sp_helpfile reports the files and filegroups present for a given database. sp_help <tablename> has a section in its output that provides information on a table's indexes and their filegroup relationships. For more information, search in SQL Server Books Online for the strings "ALTER DATABASE" and "sp_helpfile."
alter database testdb add filegroup testgroup1
alter database testdb add file (name = 'testfile',
filename = 'e:\mssql7\test1.ndf') to filegroup testgroup1
create table test1(col1 char(8))
create index index1 on test1(col1) on testgroup1
sp_helpfile
sp_help test1
For more information, search in SQL Server Books Online for the strings "Files and Filegroups," "Placing Indexes on Filegroups,"" Monitoring Disk Activity," "Physical Database Files and Filegroups," and "Adding and Deleting Data and Transaction Log Files."
The I/O characteristics of the hardware devices on the server have been discussed. Now the discussion moves to how SQL Server data and index structures are physically placed on disk drives. These structures are described in just enough detail to apply that knowledge to disk I/O performance.
SQL Server data and index pages are both 8 kilobytes in size. SQL Server data pages contain all of the data associated with the rows of a table, except text and image data. In the case of text and image data, the SQL Server data page that contains the row associated with the text/image column contains a pointer to a B-tree structure of one or more 8-KB pages that contain the text/image data.
SQL Server index pages contain only the data from the columns that comprise a particular index. This means that an 8-KB index page effectively compresses information about many more rows than an 8-KB data page holds. An important I/O performance concept to visualize is that the I/O performance benefit of indexes comes from this information compression, provided the columns picked to be part of the index form a relatively low percentage of the row size of the table. When a SQL query asks for a set of rows from a table in which columns in the query match certain values in the rows, SQL Server can save I/O operations and time by reading the index pages to look for the values and then accessing only the rows in the table required to satisfy the query, instead of having to perform I/O operations to scan all rows in the table to locate the required rows. Again, this holds true if the indexes are selected well.
There are two types of SQL Server indexes, and both are built upon B-tree structures formed out of 8-KB index pages. The difference is at the bottom of the B-tree structures, which is referred to as the leaf level in SQL Server documentation. The upper parts of index B-tree structures are referred to as nonleaf levels of the index. A separate B-tree structure is built for every index defined on a SQL Server table.
Figure 2 illustrates the structural difference between nonclustered and clustered indexes. Key points to remember are that in the nonclustered index case the leaf level nodes contain only the data that participates in the index, along with pointers to locate quickly the remaining row data on the associated data page. In the worst-case scenario, each row access from the nonclustered index will require an additional nonsequential disk I/O to retrieve the row data. In a best-case scenario, many of the required rows will be on the same data page and thus allow retrieval of several required rows with each data page fetched. In the clustered index case, the leaf level nodes of the index are the actual data rows for the table. Therefore, no pointer jumps are required for retrieval of table data. Range scans based on clustered indexes will perform well because the leaf level of the clustered index (hence, all rows of that table) is physically ordered on disk by the columns that comprise the clustered index and, due to this fact, will perform I/O in 64-KB extents. And hopefully, if there is not a lot of page splitting on the clustered index B-tree (nonleaf and leaf levels), these 64-KB I/Os will be physically sequential. The dotted lines indicate that there are other 8-KB pages present in the B-tree structures but not shown.
Figure 2. Clustered and nonclustered index B-tree structures
There can only be one clustered index per table. There is a simple physical reason for this. While the upper parts (commonly referred to in SQL Server documentation as nonleaf levels) of the clustered index B-tree structure are organized just like the nonclustered index B-tree structures, the bottom level of the clustered index B-tree are the actual 8-KB data pages associated with the table. There are two performance implications here:
Nonclustered indexes are most useful for fetching, with good selectivity, a few rows from large SQL Server tables based on a key value. As mentioned before, nonclustered indexes are B-trees formed out of 8-KB index pages. The bottom, or leaf, level of this B-tree of index pages contains all the data from the columns that comprise the index. When a nonclustered index is used to retrieve information from a table based on a match with the key value, the index B-tree is traversed until a key match is found at the leaf level of the index. A pointer jump is made if columns from the table are needed that did not form part of the index. This pointer jump will likely require a nonsequential I/O operation on the disk. It might even require the data to be read from another disk, if the table and its accompanying index B-tree(s) are large. If multiple pointers lead to the same 8-KB data page, less of an I/O performance penalty is paid because it is only necessary to read the page into data cache once. For each row returned by a SQL query that searches with a nonclustered index, one pointer jump is required. These pointer jumps are the reason that nonclustered indexes are better suited to SQL queries that return only one or a few rows from the table. Queries that return many rows are better served with a clustered index.
For more information, search in SQL Server Books Online for the string "nonclustered index."
A special situation with nonclustered indexes is the covering index: a nonclustered index built upon all of the columns required to satisfy a SQL query, both in the select list and in the WHERE predicate. Covering indexes can save a huge amount of I/O, and hence bring a lot of performance benefit to a query. But it is necessary to balance the cost of creating a new index (with its associated B-tree index structure maintenance) against the I/O performance gain the covering index will bring. If a covering index will greatly benefit a query or set of queries that will be run very often on SQL Server, the creation of that covering index may be worth it.
Create index indexname1 on table1(col2, col1, col3)
–or–
Use Create Index Wizard in SQL Server Enterprise Manager to create the index. Select Tools/Wizards from the SQL Server Enterprise Manager menu bar, left-click the + icon beside Database to expose the database wizards, and then double-click Create Index Wizard to start the wizard.
The index created, called "indexname1" in this example, is a covering index because it includes all columns from the SELECT statement and the WHERE predicate. This means that during the execution of this query, SQL Server does not need to access the data pages associated with table1. SQL Server can obtain all of the information required to satisfy the query by using the index called indexname1. Once SQL Server has traversed the B-tree associated with indexname1 and found the range of index keys where col2 is equal to "value," SQL Server knows that it can fetch all required data (col1,col2,col3) from the leaf level (bottom level) of the covering index. This provides I/O performance in two ways:
In general, if the covering index is small in terms of the number of bytes from all the columns in the index compared to the number of bytes in a single row of that table and it is certain that the query taking advantage of the covered index will be executed frequently, it may make sense to use a covering index. But before building a lot of covered indexes, consider the next section, which describes how SQL Server 7.0 can intelligently and automatically create covered indexes for queries on the fly.
SQL Server 7.0's new query processor provides index intersection. Index intersection allows the query processor to consider multiple indexes from a given table, build a hash table based on those indexes, and utilize the hash table to reduce I/O for a given query. The hash table that results from the index intersection becomes, in essence, a covering index and provides the same I/O performance benefits that covering indexes do. Index intersection provides greater flexibility for database user environments in which it is difficult to predetermine all of the queries that will be run against the database. A good strategy to follow in this case is to define single-column, nonclustered indexes on all columns that will be frequently queried and let index intersection handle situations where a covering index is needed.
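As an illustration of this strategy, a hypothetical orders table (table, column, and index names are invented for this example) might carry one single-column nonclustered index per frequently queried column:

```sql
create index index_customer  on orders(customer_id)
create index index_orderdate on orders(order_date)

-- The query processor can intersect the two indexes, building a hash
-- table that acts as an on-the-fly covering index for this query,
-- without touching the orders data pages:
select customer_id, order_date
from orders
where customer_id = 1001 and order_date >= '19981001'
```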
For more information, search in SQL Server Books Online for the strings "query tuning recommendations" and "designing an index."
Create index indexname2 on table1(col3)
–or–
Use Create Index Wizard in SQL Server Enterprise Manager to create the indexes. Select Tools/Wizards from the SQL Server Enterprise Manager menu bar, left-click the + icon beside Database to expose the database wizards, and then double-click Create Index Wizard to start the wizard.
How indexes are chosen significantly affects the amount of disk I/O generated and, subsequently, performance. The previous sections described why nonclustered indexes are good for retrieval of a small number of rows and clustered indexes are good for range-scans. Here is some additional information:
Samples of WHERE predicates that benefit from clustered indexes:
WHERE <column_name> > some_value
WHERE <column_name> BETWEEN some_value AND some_value
WHERE <column_name> < some_value
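A clustered index on the column used in range predicates like those above lets the scan read the physically ordered leaf level (the data rows themselves) in sequential 64-KB extents. A minimal sketch, with table, column, and index names invented for this example:

```sql
-- Cluster on the range-scanned column; the leaf level of the clustered
-- index is the table data, physically ordered by col1
create clustered index clustered_col1 on table2(col1)

-- This range scan can now be satisfied with sequential I/O
select * from table2
where col1 between 100 and 200
```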
Clustered index selection really involves two major decisions: First, determining which column of the table will benefit most from the clustered index in terms of providing sequential I/O for range scans. Second, using the clustered index to affect the physical placement of table data while avoiding hot spots. A hot spot occurs when data is placed on hard drives such that many queries are trying to read or write data in the same area of the disk(s) at the same time. That creates a disk I/O bottleneck, because more concurrent disk I/O requests are being received by that hard disk than it can handle. Solutions are to either stop fetching as much data from this disk or to spread the data across multiple disks to support the I/O demand. This type of consideration for the physical placement of data can be critical for good concurrent access to data among hundreds or thousands of SQL Server users.
These two decisions often conflict with each other and the best overall decision will have to balance the two. In high user load environments, improved concurrency (by avoiding hot spots) may often be more valuable than the performance benefit of placing the clustered index on that column.
In previous versions of SQL Server, for tables without a clustered index (tables like this are referred to as heaps), inserted rows were always placed at the physical end of the table on disk. This created the possibility of a hot spot at the end of a very busy table. SQL Server 7.0's new storage management algorithms provide free space management, which removes this behavior. Now, when rows are inserted into a heap, SQL Server uses Page Free Space (PFS) pages to quickly locate available free space anywhere in the table in which the row can be inserted. PFS pages track free space throughout the table. This recovers deleted space and avoids insertion hot spots, because inserts are spread across the physical disk space occupied by the table. Free space management also affects clustered index selection. Because clustered indexes determine physical data placement, a hot spot may occur when a clustered index physically sequences data on a column where many concurrent inserts occur at the highest column value, which places them all at the same physical disk location. For columns with monotonically increasing values, be aware that a clustered index on that column will order the data rows sequentially on disk by that column; placing the clustered index on another column, or omitting a clustered index altogether, moves this sequential placement to the other column or removes it entirely.
Here is a common scenario to help illustrate clustered index selection. Suppose a table contains an invoice date column, a unique invoice number column, and other data. Suppose that about 10,000 new records are inserted into this table every day, that SQL queries often need to search this table for one week's worth of data, and that many users need concurrent access to this table. The invoice number would not be a good candidate for the clustered index for several reasons. First, invoice number is unique and users don't tend to search on ranges of invoice numbers, so placing invoice numbers physically in sequential order on disk is not likely to be helpful, because range scans on invoice numbers will likely not happen. Second, the values for invoice number likely increase monotonically (1001, 1002, 1003, and so on). If the clustered index is placed on invoice number, inserts of new rows into this table will all happen at the end of the table (beside the highest invoice number) and, therefore, on the same physical disk location, creating a hot spot.
Next, consider the invoice date column. To maximize sequential I/O, invoice date would be a good candidate for a clustered index because users often search for one week's worth of data (about 70,000 rows). But from the concurrency perspective, invoice date may not be a good candidate for the clustered index. If the clustered index is placed on invoice date, all data will tend to be inserted at the end of the table, given the nature of dates, and a hot spot may occur on the hard disk that holds the end of the table. Note that the fact that the insertions happen at the end of the table is somewhat offset by the fact that 10,000 rows are inserted for the same date, so invoice date is much less likely to create a hot spot than invoice number. Also, a hardware RAID controller would help spread the 10,000 rows across multiple disks, which would further minimize the possibility of an insertion hot spot.
There is no perfect answer to the above scenario. It may be necessary to decide that it is worth the risk of hot spotting and choose to place the clustered index on invoice date in order to speed up queries involving invoice date ranges. If this is the case, carefully monitor disk queuing on the disks associated with this table and keep in mind that the inserts may queue up behind each other trying to get at the end of the table. My recommendation in this scenario would be to define the clustered index on invoice date because of the benefit to range scans based on invoice date and so that invoice numbers are not physically sequential on disk.
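To make the recommendation concrete, a sketch of the suggested index follows. The table and column names (invoice_table, invoice_date, invoice_number) are hypothetical, used only for illustration:

```sql
-- Hypothetical invoice table; names are illustrative only.
-- Cluster on invoice_date so that date-range scans get sequential I/O,
-- and keep invoice_number out of the clustered key so invoice numbers
-- are not physically sequential on disk.
create clustered index ci_invoice_date
    on invoice_table (invoice_date)

-- A typical one-week range scan that benefits from this index:
select invoice_number, invoice_date
from invoice_table
where invoice_date between '19981001' and '19981007'
```

Remember to monitor disk queuing on the drives holding this table, since inserts will still gravitate toward the highest date values.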
Let's consider another example that is more pleasant to work with. Suppose a table consists of invoice number, invoice date, invoice amount, the sales office where the sale originated, and other data. Suppose 10,000 records are inserted into this table every day. In this case, users most often query invoice amounts based on sales office. Thus, sales office should be the column on which the clustered index is created, because that is the column on which range scans are based. And because newly inserted rows will likely contain a mix of different sales offices, inserts should be spread evenly across the table and across the disks on which the table is located.
In some cases, range scans may not be the important issue. Suppose a very large employee table has employee number, social security number, and other data. As rows are inserted, employee number is incremented. Let's assume that there are 100,000 retrievals from this table every day and that each retrieval is a single record fetch based on social security number. A nonclustered index created on social security number would provide excellent query performance in this scenario. A clustered index on social security number would provide slightly better query performance than the nonclustered index, but may be a bit of overkill because range scans are not involved. The question in this case is whether to bother defining a clustered index on this table at all. In previous versions of SQL Server, it was important to always define a clustered index on a table, even if not required for queries, because it helped with deleted row space recovery. This is not an issue with SQL Server 7.0's new space allocation algorithms and storage structures. The recommendation in this example would be to create the clustered index on social security number. The reason is that it is good to find a column whose data does not follow the sequential pattern of employee number, and social security numbers tend to have a fairly even distribution. If a clustered index is created on this evenly distributed column, the employee records will be evenly distributed on disk. This distribution, in conjunction with FILLFACTOR and PAD_INDEX (discussed in the next section), will provide open data page areas throughout the table into which to insert data. Assuming that newly inserted employee records have an even distribution of social security numbers, the employee table will fill evenly and page splitting will be avoided.
If a column with even distribution did not exist on the table, it might be worthwhile to add an integer column to the table, populate it with evenly distributed values, and create the clustered index on that column. This "filler" or "dummy" column with a clustered index defined on it is not used for queries, but to distribute data I/O evenly across disk drives to improve table access concurrency and overall I/O performance. This can be a very effective methodology with large and heavily accessed SQL tables. Another possible solution in this example would be to not create a clustered index on this table. In this case, SQL Server 7.0 manages all aspects of space management. SQL Server finds free space in which to insert the row, reuses space from deleted rows, and automatically reorganizes the physical ordering of data pages on disk when it makes sense (to allow greater amounts of sequential I/O). The reorganization of data pages happens during database file autoshrink operations. For more information, search in SQL Server Books Online for the strings "Managing Space Used by Objects" and "Space Allocation and Reuse."
Another way to think about hot spots is within the context of selects. If many users are selecting data with key values that are very close to but not in the same actual row as each other, a majority of disk I/O activity will tend to occur within the same physical region of the disk I/O subsystem. This disk I/O activity can be spread out more evenly by defining the clustered index for this table on a column that will spread these key values evenly across the disk. If all selects are using the same unique key value, using a clustered index will not help balance the disk I/O activity of this table. Use of RAID (either hardware or software) would help alleviate this problem as well by spreading the I/O across many disk drives. The type of behavior described here can be viewed as disk access contention. It is not locking contention.
If a SQL Server database will be experiencing a large amount of insert activity, it is important to plan for providing and maintaining open space on index and data pages to prevent page splitting. Page splitting occurs when an index page or data page can no longer hold any new rows and a row needs to be inserted into the page because of the logical ordering of data defined in that page. When this occurs, SQL Server needs to divide up the data on the full page and move about half of the data to a new page so that both pages now have some open space. This consumes some system resources and time.
When indexes are initially built, SQL Server places the index B-tree structures on contiguous physical pages, which allows optimal I/O performance when scanning the index pages with sequential I/O. When page splitting occurs and new pages need to be inserted into the logical B-tree structure of the index, SQL Server must allocate new 8-KB index pages somewhere else on the hard drive, which breaks up the physically sequential nature of the index pages. This switches I/O operations from sequential to nonsequential and can cut performance roughly in half. Excessive page splitting should be resolved by rebuilding the index to restore the physically sequential order of the index pages. The same behavior can be encountered on the leaf level of the clustered index, thereby affecting the data pages of the table.
In Performance Monitor, keep an eye on "SQL Server: Access Methods – Page Splits." Nonzero values for this counter indicate that page splitting is occurring and that further analysis should be done with DBCC SHOWCONTIG. For details on how to use this command, search for "DBCC SHOWCONTIG" in SQL Server Books Online.
The DBCC SHOWCONTIG is a very helpful command that can be used to reveal whether excessive page splitting has occurred on a table. Scan Density is the key indicator that DBCC SHOWCONTIG provides. It is good for this value to be as close to 100 percent as possible. If this value is well below 100 percent, rebuild the clustered index on that table using the DROP_EXISTING option to defragment the table. The DROP_EXISTING option of the CREATE INDEX statement permits re-creation of existing indexes and provides better index rebuild performance than dropping and re-creating the index. For details, search in SQL Server Books Online for the strings "Create Index" and "rebuilding an index."
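As a sketch, the check-and-rebuild sequence looks like the following, assuming the sample table testtable and its clustered index testtable2 used later in this document:

```sql
-- DBCC SHOWCONTIG takes the table ID in SQL Server 7.0.
-- Watch the Scan Density value in its output.
declare @id int
set @id = object_id('testtable')
dbcc showcontig (@id)

-- If Scan Density is well below 100 percent, rebuild the clustered
-- index in place. DROP_EXISTING avoids the extra cost of dropping
-- and re-creating the index separately.
create clustered index testtable2
    on testtable (ckey1)
    with drop_existing
```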
The FILLFACTOR option on the CREATE INDEX and DBCC DBREINDEX commands provides a way to specify the percentage of open space to leave on index and data pages. The PAD_INDEX option for CREATE INDEX applies what has been specified for FILLFACTOR to the nonleaf level index pages. Without the PAD_INDEX option, FILLFACTOR mainly affects the leaf level index pages of the clustered index. It is a good idea to use the PAD_INDEX option with FILLFACTOR. For more information, search in SQL Server Books Online for "page split" and "pad_index."
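For example, a clustered index that leaves roughly 30 percent of each page open, on both the leaf level and (via PAD_INDEX) the nonleaf levels, could be created as follows. The table and index names are the sample ones used later in this document; the 70 percent figure is only an illustrative choice:

```sql
-- FILLFACTOR = 70 leaves about 30 percent of each leaf-level page open;
-- PAD_INDEX applies the same fill percentage to the nonleaf index pages.
create clustered index testtable2
    on testtable (ckey1)
    with pad_index, fillfactor = 70
```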
The optimal value to specify for FILLFACTOR depends upon how much new data will be inserted within a given time frame into an 8-KB index or data page. It is important to keep in mind that SQL Server index pages typically contain many more rows than data pages, because index pages only contain the data for columns associated with that index, whereas data pages hold the data for the entire row. Also bear in mind how often there will be a maintenance window that permits rebuilding indexes to avoid page splitting. Strive toward rebuilding the indexes only when the majority of the index and data pages have become filled with data. Part of what allows this to happen is the proper selection of clustered index for a given table. If the clustered index distributes data evenly so that new row inserts into the table happen across all of the data pages associated with the table, the data pages will fill evenly. Overall, this will provide more time before page splitting starts to occur and it is necessary to rebuild the clustered index. The other part of the decision is the FILLFACTOR, which should be selected partly on the estimated number of rows that will be inserted within the key range of an 8-KB page for a given time frame and how often scheduled index rebuilds can occur on the system.
This is another situation in which a judgment call must be made, based on the performance tradeoff between leaving a lot of open space on pages and page splitting. If a small percentage for FILLFACTOR is specified, it will leave large open spaces on the index and data pages. This helps avoid page splitting but also negates some of the performance benefit of compressing data onto a page. SQL Server performs faster if more data is compressed on index and data pages, because it can generally fetch more data with fewer pages and I/Os. Specifying too high a FILLFACTOR may leave too little open space on pages and allow pages to overflow too quickly, causing page splitting.
Before using FILLFACTOR and PAD_INDEX, remember that reads tend to far outnumber writes, even in an online transaction processing (OLTP) system. Using FILLFACTOR will slow down all reads, because it spreads tables over a wider area (reduction of data compression). Before using FILLFACTOR and PAD_INDEX, it is a good idea to use Performance Monitor to compare SQL Server reads to SQL Server writes and to only use these options if writes are a substantial fraction of reads (say, more than 30 percent).
If writes are a substantial fraction of reads, the best approach in a very busy OLTP system is to specify as high a FILLFACTOR as feasible that will leave a minimal amount of free space per 8-KB page but still prevent page splitting and still allow SQL Server to reach the next available time window for rebuilding the index. This methodology balances I/O performance (keeping the pages as full as possible) and page splitting avoidance (not letting pages overflow). This may take some experimentation: rebuild the index with varying FILLFACTOR values and then simulate load activity on the table to validate an optimal value. Once the optimal FILLFACTOR value has been determined, automate the scheduled rebuilding of the index as a SQL Server task. For more information on automating tasks, search in SQL Server Books Online for the string "creating a task."
In the situation where there will be no write activity into the SQL Server database, FILLFACTOR should be set at 100 percent so that all index and data pages are filled completely for maximum I/O performance.
For the purposes of illustrating the use of the SQL Server performance tools, use the following example. First, the following table is constructed:
create table testtable (nkey1 int identity, col2 char(300) default 'abc', ckey1 char(1))
Next, the table is loaded with 10,000 rows of test data. The data being loaded into column nkey1 lends itself to a nonclustered index. The data in column ckey1 lends itself to a clustered index and the data in col2 is merely filler to increase the size of each row by 300 bytes:
declare @counter int
set @counter = 1
while (@counter <= 2000)
begin
    insert testtable (ckey1) values ('a')
    insert testtable (ckey1) values ('b')
    insert testtable (ckey1) values ('c')
    insert testtable (ckey1) values ('d')
    insert testtable (ckey1) values ('e')
    set @counter = @counter + 1
end
The following two queries make up the database server workload:
select ckey1,col2 from testtable where ckey1 = 'a'
select nkey1,col2 from testtable where nkey1 = 5000
SQL Server Profiler records detailed information about activity occurring on the database server. Profiler can be configured to watch and record one or many users executing queries on SQL Server, with a widely configurable amount of performance information. Performance information available for recording in Profiler includes I/O statistics, CPU statistics, locking requests, T-SQL and RPC statistics, index and table scans, warnings and errors raised, database object creates/drops, connection connects/disconnects, stored procedure operations, cursor operations, and more. For full information on what SQL Server Profiler can record, search in SQL Server Books Online for the string "Profiler."
Profiler and Index Tuning Wizard provide a very powerful tool combination to help database administrators create proper indexes on tables. Profiler records resource consumption for queries into a .trc file. This .trc file can be read by Index Tuning Wizard, which will consider both the .trc information and the database tables and will then provide recommendations about any indexes that should be created. Index Tuning Wizard provides a choice of automatically creating the proper indexes for the database, scheduling the automatic index creation for a later time, or generating a T-SQL script that can be reviewed and executed at a later time.
Here are the steps for analyzing a query load:
Set up Profiler.
Run the workload:
select ckey1,col2 from testtable where ckey1 = 'a'
select nkey1,col2 from testtable where nkey1 = 5000
Stop Profiler.
Load the .trc file into Index Tuning Wizard. For this sample workload, Index Tuning Wizard generated the following T-SQL script:
/* Created by: Index Tuning Wizard */
/* Date: 9/7/98 */
/* Time: 6:42:00 PM */
/* Server: HENRYLNT2 */
/* Database : test */
/* Workload file : E:\mssql7\Binn\profiler_load.sql */
USE [test]
BEGIN TRANSACTION
CREATE CLUSTERED INDEX [testtable2] ON [dbo].[testtable] ([ckey1])
if (@@error <> 0) rollback transaction
CREATE NONCLUSTERED INDEX [testtable1] ON [dbo].[testtable] ([nkey1])
if (@@error <> 0) rollback transaction
COMMIT TRANSACTION
The indexes recommended by Index Tuning Wizard for the sample table and data are what we would have expected. There are only five unique values for ckey1 and 2,000 rows of each value. Given that one of the sample queries (select ckey1, col2 from testtable where ckey1 = 'a') requires retrieval from the table based on one of the values in ckey1, it makes sense to create a clustered index on the ckey1 column. The second query (select nkey1, col2 from testtable where nkey1 = 5000) fetches one row based on the value of the column nkey1. Nkey1 is unique and there are 10,000 rows, so it makes sense to create a nonclustered index on this column.
The Profiler/Index Tuning Wizard combination becomes very powerful in real database server environments, where many tables and many queries are involved. Use Profiler to record a .trc file while the database server is experiencing a representative set of queries. Then load the .trc file into Index Tuning Wizard to determine the proper indexes to build. Follow the prompts in Index Tuning Wizard to automatically generate and schedule index creation jobs to run at off-peak times. Run Profiler/Index Tuning Wizard regularly (perhaps weekly) to see if queries being executed on the database server have changed significantly, thus possibly requiring different indexes. Regular use of Profiler/Index Tuning Wizard helps database administrators keep SQL Server running in top form as query workloads change and database size increases over time.
For more information, search in SQL Server Books Online for "Index Tuning Wizard" and "Index Tuning Recommendations."
Another option that Profiler provides is to log information into a SQL Server table. Once completed, the table can be queried to determine if certain queries are using up excessive resources.
Log Profiler information into a SQL Server table
After the information is recorded into the SQL Server table, Query Analyzer can be used to determine which queries on the system are consuming the most resources. This way, database administrators can concentrate their time on improving the queries that need the most help. For example, the following query would be typical of the type of analysis done on data recorded from Profiler into a SQL Server table. The query retrieves the top three consumers of CPU resources on the database server. Read and write I/O information, along with the duration of the queries in milliseconds, is returned as well. If a large amount of information is recorded with Profiler, it makes sense to create indexes on the table to help speed analysis queries. For example, if CPU is going to be an important criterion for analyzing this table, it would be a good idea to create a nonclustered index on the CPU column:
select top 3 TextData,CPU,Reads,Writes,Duration from profiler_out_table order by cpu desc
For more information, search in SQL Server Books Online for the strings "Viewing and Analyzing Traces," "Troubleshooting SQL Server Profiler," "Tips for Using SQL Server," "Common SQL Server Profiler Scenarios," "Starting SQL Server Profiler," and "Monitoring with SQL Server Profiler."
Query Analyzer provides a Show stats I/O option on the General tab of the Connections Options dialog box. Select this check box to obtain information about how much I/O is being consumed for the query just executed in Query Analyzer.
For example, the query "select ckey1, col2 from testtable where ckey1 = 'a'" returns the following I/O information in addition to the resultset when the Show stats I/O option is selected:
Table 'testtable'. Scan count 1, logical reads 400, physical reads 382, read-ahead reads 400.
Similarly, the query "select nkey1, col2 from testtable where nkey1 = 5000" returns the following I/O information in addition to the resultset when the Show stats I/O option is selected:
Table 'testtable'. Scan count 1, logical reads 400, physical reads 282, read-ahead reads 400.
Using statistics I/O is a great way to monitor the effect of query tuning. For example, create the two indexes on this sample table as recommended above by Index Tuning Wizard and then run the queries again.
In the case of the query "select ckey1,col2 from testtable where ckey1 = 'a'", the clustered index improved performance as indicated below. Given that the query needs to fetch 20 percent of the table, the performance improvement seems reasonable:
Table 'testtable'. Scan count 1, logical reads 91, physical reads 5, read-ahead reads 32.
In the case of the query "select nkey1,col2 from testtable where nkey1 = 5000", the creation of the nonclustered index had a very dramatic effect on the performance of the query. Given that only one row of the 10,000-row table needs to be retrieved for this query, the performance improvement with the nonclustered index seems reasonable:
Table 'testtable'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0.
ShowPlan can be used to focus attention on problematic SQL queries by displaying detailed information on what the query optimizer is doing. SQL Server 7.0 provides both text and graphical versions of ShowPlan. Graphical ShowPlan output can be displayed in the Results pane of Query Analyzer by executing a SQL query with CTRL+L. Icons indicate the operations that the query optimizer would have performed if it had executed the query. Arrows indicate the direction of data flow for the query. Details about each operation can be displayed by holding the mouse pointer over the operation icon. The equivalent information can be displayed in text-based ShowPlan by executing the command, "set showplan_all on." For reduced output from text-based ShowPlan that displays the query optimizer operations but skips the display of operation details, execute the command "set showplan_text on."
For more information, search in SQL Server Books Online for the strings "Graphical ShowPlan," "Using ShowPlan to Monitor a Database Query," "worktables," and "Understanding Nested Loops Joins."
Using the example queries defined earlier and "set showplan_text on" executed in Query Analyzer:
Query:
select ckey1,col2 from testtable where ckey1 = 'a'
Text-based ShowPlan output:
|--Clustered Index Seek(OBJECT:([test].[dbo].[testtable].[testtable2]), SEEK:([testtable].[ckey1]='a') ORDERED)
The preceding query takes advantage of the clustered index on column ckey1 as indicated by "Clustered Index Seek."
Equivalent Graphical ShowPlan output:
Figure 3. Graphical ShowPlan output for query taking advantage of a clustered index
If the clustered index is removed from the table, the query needs to use a table scan. The following ShowPlan output indicates the change in behavior.
Text-based ShowPlan output:
|--Table Scan(OBJECT:([test].[dbo].[testtable]), WHERE:([testtable].[ckey1]='a'))
Equivalent Graphical ShowPlan output:
Figure 4. Graphical ShowPlan output for query performing a table scan
Note that table scans on small tables are nothing to worry about. Table scans are the most efficient way to retrieve information from small tables. But on larger tables, table scans indicated by ShowPlan are a warning that the table may need better indexes or that the existing indexes need to have their statistics updated (this can be done using the UPDATE STATISTICS command). SQL Server 7.0 provides automatically updated index statistics. It is a good idea to let SQL Server automatically maintain index statistics because it helps guarantee that queries will always have good index statistics to work with.
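If automatic statistics maintenance has been turned off for a table and ShowPlan reveals an unexpected table scan, statistics can be refreshed manually. A minimal sketch against the sample table:

```sql
-- Refresh distribution statistics for all indexes on testtable so the
-- query optimizer has current information for choosing access paths.
update statistics testtable
```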
Query:
select nkey1,col2 from testtable where nkey1 = 5000
Text-based ShowPlan output:
|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([test].[dbo].[testtable]))
|--Index Seek(OBJECT:([test].[dbo].[testtable].[testtable1]), SEEK:([testtable].[nkey1]=5000) ORDERED)
Equivalent Graphical ShowPlan output:
Figure 5. Graphical ShowPlan output for query taking advantage of a nonclustered index, Part 1
Figure 6. Graphical ShowPlan output for query taking advantage of a nonclustered index, Part 2
The preceding query uses the nonclustered index on the column nkey1. This is indicated by the "Index Seek" operation on the column nkey1. The "Bookmark Lookup" operation indicates that SQL Server needed to perform a pointer jump from the index page to the data page of the table to retrieve the requested data. The pointer jump was required because the query asked for the column col2, which was not part of the nonclustered index.
Query:
select nkey1 from testtable where nkey1 = 5000
Text-based ShowPlan output:
|--Index Seek(OBJECT:([test].[dbo].[testtable].[testtable1]), SEEK:([testtable].[nkey1]=[@1]) ORDERED)
Equivalent Graphical ShowPlan output:
Figure 7. Graphical ShowPlan output for a covered query
The preceding query uses the nonclustered index on nkey1 as a covering index. Note that no "Bookmark Lookup" operation was needed for this query, because all of the information required for the query (both the SELECT and WHERE clauses) is provided by the nonclustered index. This means that no pointer jumps to the data pages are required from the nonclustered index pages. I/O is reduced in comparison to the case where a bookmark lookup was required.
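Covering can also be engineered deliberately. As a sketch (the index name nc_cover is hypothetical), widening the nonclustered index to include col2 would let the earlier query "select nkey1,col2 from testtable where nkey1 = 5000" be answered entirely from index pages:

```sql
-- A composite nonclustered index on (nkey1, col2) covers queries that
-- reference only these two columns, eliminating the bookmark lookup.
-- The tradeoff: each index row now carries the 300-byte col2 value,
-- so the index holds far fewer rows per 8-KB page.
create nonclustered index nc_cover
    on testtable (nkey1, col2)
```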
Performance Monitor provides a wealth of information about what is happening on the database server in terms of Windows and SQL Server operations. For SQL Server specific counters, search in SQL Server Books Online for the string combination "SQL Server:" and "object."
In Performance Monitor graph mode, take note of the Max and Min values. Don't put too much emphasis on the average, because heavily polarized data points throw this off. Study the graph shape and compare to Min/Max to gather an accurate feel for the behavior. Use the <BACKSPACE> key to highlight counters with a white line.
It is possible to use Performance Monitor to log all available Windows NT and SQL Server performance monitor objects/counters in a log file while at the same time looking at Performance Monitor interactively (chart mode). The setting of sampling interval determines how quickly the logfile grows in size. Logfiles can get pretty big very fast (for example, 100 megabytes in one hour with all counters turned on and a sampling interval of 15 seconds). Hopefully, on the test server there will be a couple of gigabytes free to store these types of files. But if conserving space is important, try running with a large log interval so that Performance Monitor does not sample the system as often. Try 30 or 60 seconds. This way all of the counters are resampled with reasonable frequency but a smaller logfile size is maintained.
Performance Monitor also consumes a small amount of CPU and disk I/O resources. If a system doesn't have a lot of disk I/O and/or CPU to spare, consider running Performance Monitor from another machine to monitor the SQL Server over the network (graph mode only—it tends to be more efficient to log perfmon information locally on the SQL Server as compared to sending the information across a local area network, or LAN) or maybe reduce the logging to only the most critical counters.
It is a good practice to log all counters available during performance test runs into a file for later analysis. That way any counter can be examined further at a later time. Configure Performance Monitor to log all counters into a logfile and at the same time monitor the most interesting counters in one of the other modes, like graph mode. This way, all of the information is recorded but the most interesting counters are presented in an uncluttered Performance Monitor graph while the performance run is taking place.
Starting the logging feature
Stopping the logging feature
Load the logged information into Performance Monitor for analysis
This feature is very handy for observing what the database server was doing for a given time period:
This section requires observation of several Performance Monitor disk counters. In order to enable these counters, run the command "diskperf -y" from a Windows NT command window and restart Windows NT.
Physical hard drives that are experiencing disk queuing will hold back disk I/O requests while they catch up on I/O processing. SQL Server response time will be degraded for these drives. This costs query execution time.
If using RAID, it is necessary to know how many physical hard drives are associated with each drive array that Windows NT sees as a single physical drive in order to calculate disk queuing per physical drive. Ask a hardware expert to explain the SCSI channel and physical drive distribution in order to understand how SQL Server data is held by each physical drive and how much SQL Server data is distributed on each SCSI channel.
There are several choices for looking at disk queuing through Performance Monitor. Logical disk counters are associated with the logical drive letters assigned through Disk Administrator, whereas physical disk counters are associated with what Disk Administrator sees as a single physical disk device. Note that what looks to Disk Administrator like a single physical device may be either a single hard drive or a RAID array, which consists of several hard drives. Current Disk Queue is an instantaneous measure of disk queuing, whereas Average Disk Queue averages the disk queuing measurement over the Performance Monitor sampling period. Take note of any counter where Logical Disk: Average Disk Queue > 2, Physical Disk: Average Disk Queue > 2, Logical Disk: Current Disk Queue > 2, or Physical Disk: Current Disk Queue > 2.
These recommended measurements are specified per physical hard drive. If a RAID array is associated with a disk queue measurement, the measurement needs to be divided by the number of physical hard drives in the RAID array to determine the disk queuing per physical hard drive.
Note On physical hard drives or RAID arrays that hold SQL Server log files, disk queuing is not a useful measure because SQL Server Log Manager does not queue more than a single I/O request to SQL Server logfile(s).
For more information, search in SQL Server Books Online for the string "monitoring disk activity."
System: Processor Queue Length > 2 means that the server's processors are receiving more work requests than they can handle as a collective group. Therefore, Windows NT must place these requests in a queue.
Some processor queuing is actually an indicator of good overall SQL Server I/O performance. If there is no processor queuing and if CPU utilization is low, it may be an indication that there is a performance bottleneck somewhere else in the system, the most likely candidate being the disk subsystem. Having a reasonable amount of work in the processor queue means that the CPUs are not idle and the rest of the system is keeping pace with the CPUs.
A general rule of thumb for a good processor queue number is to multiply the number of CPUs on the database server by 2.
Processor queuing significantly above this calculation needs to be investigated. Excessive processor queuing costs query execution time. Several different activities could be contributing to processor queuing. Eliminating hard and soft paging will help save CPU resources. Other methodologies that help reduce processor queuing include SQL query tuning, picking better SQL indexes to reduce disk I/O (and, hence, CPU), or adding more CPUs (processors) to the system.
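The rule of thumb above can be expressed as a one-line calculation (the 4-CPU server is a hypothetical example):

```python
def acceptable_processor_queue(cpu_count):
    """Rule of thumb from the text: number of CPUs multiplied by 2."""
    return cpu_count * 2

# Hypothetical 4-CPU database server
threshold = acceptable_processor_queue(4)
print(threshold)  # 8: sustained queue lengths well above this warrant investigation
```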
Memory: Pages/sec > 0 or Memory: Page Reads/sec > 5 means that Windows NT is going to disk to resolve memory references (hard page faults). This costs both disk I/O and CPU resources. Memory: Pages/sec is a good indicator of the amount of paging that Windows NT is performing and of the adequacy of the database server's current RAM configuration. A subset of the hard paging information in Performance Monitor is the number of times per second Windows NT had to read from the paging file to resolve memory references, which is represented by Memory: Page Reads/sec. If Memory: Page Reads/sec > 5, this is bad for performance.
Automatic SQL Server memory tuning will do its best to adjust SQL Server memory utilization dynamically such that paging is avoided. A small amount of pages per second is normal, but excessive paging requires corrective action.
If SQL Server is automatically tuning memory, adding more RAM or removing other applications from the database server are possible options to help bring Memory: Pages/sec to a reasonable level.
If SQL Server memory is being manually configured on the database server, it may be necessary to reduce memory given to SQL Server, remove other applications from the database server, or add more RAM to the database server.
Keeping Memory: Pages/sec at or close to zero helps database server performance. It means Windows NT and all its applications (including SQL Server) are not going to the paging file to satisfy requests for data in memory, so the amount of RAM on the server is sufficient. If Pages/sec is greater than zero by a small amount, this is okay, but remember that a relatively high performance penalty (disk I/O) is paid every time data is retrieved from the paging file rather than from RAM.
It is worth taking a moment to understand the difference between "Memory: Pages Input/sec" and "Memory: Page Reads/sec." "Memory: Pages Input/sec" indicates the actual number of Windows NT 4-KB pages being brought in from disk to satisfy page faults. "Memory: Page Reads/sec" indicates how many disk I/O requests are made per second in order to satisfy page faults, which provides a slightly different point of view on the faulting that is occurring. A single page read can contain several 4-KB pages. Disk I/O performs better as the size of each transfer increases (64 KB or more), so it is worthwhile to consider both of these counters at the same time. It is also important to remember that for a hard disk, completing a single read or write of 4 KB takes almost as long as a single read or write of 64 KB. Consider the following situation: 200 page reads consisting of eight 4-KB pages per read could conceivably finish faster than 300 page reads consisting of a single 4-KB page each. That is, 1,600 4-KB pages (moved in 200 transfers) can finish faster than 300 4-KB pages (moved in 300 transfers). The key point is applicable to all disk I/O analysis: do not watch only Disk Bytes/sec; also watch Disk Transfers/sec, because both are relevant. This will be discussed further in the disk I/O sections to follow.
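A rough sketch of this comparison, using the numbers from the scenario above:

```python
PAGE_KB = 4  # Windows NT page size

def pages_moved(reads_per_sec, pages_per_read):
    """Total 4-KB pages brought in per second for a given transfer shape."""
    return reads_per_sec * pages_per_read

large_io = pages_moved(200, 8)  # 200 disk transfers moving 1,600 pages
small_io = pages_moved(300, 1)  # 300 disk transfers moving only 300 pages
print(large_io, small_io)       # 1600 300
# Fewer, larger transfers can move far more data in roughly the same time,
# which is why Disk Transfers/sec matters alongside Disk Bytes/sec.
```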
It is useful to compare "Memory: Pages Input/sec" to "Logical Disk: Disk Reads/sec" across all drives associated with the Windows NT paging file, and "Memory: Pages Output/sec" to "Logical Disk: Disk Writes/sec" across those same drives, because together they provide a measure of how much disk I/O is strictly related to paging versus other applications (that is, SQL Server). Another easy way to isolate paging file I/O activity is to place the paging file on a separate set of drives from all SQL Server files. Separating the paging file from the SQL Server files can also help disk I/O performance because it allows the disk I/O associated with paging to be performed in parallel with the disk I/O associated with SQL Server.
Memory: Page Faults/sec > 0 indicates that Windows NT is paging, but this counter includes both hard and soft paging. The previous section discussed hard paging. Soft paging means that applications on the database server are requesting memory pages that are still in RAM but outside the requesting process's working set. Memory: Page Faults/sec is helpful for deriving the amount of soft paging that is occurring. There is no counter for soft faults per second; instead, calculate the number of soft faults happening per second with this computation:
"Memory: Pages Faults/sec" - "Memory: Pages Input/sec" = Soft Page Fault/sec
To determine whether SQL Server rather than another process is causing excessive paging, monitor the Process: Page Faults/sec counter for the SQL Server process and note whether the number of page faults per second for sqlservr.exe is similar to the number of Memory: Pages/sec.
Soft faults are generally less harmful to performance than hard faults because they consume only CPU resources, whereas hard faults consume disk I/O resources. The best environment for performance is to have no faulting of any kind.
Note that until SQL Server actually accesses all of its data cache pages for the first time, the first access to each page will cause a soft fault. So do not be concerned with initial soft faulting occurring as SQL Server first starts up and the data cache is first being exercised.
For more information on memory tuning, search for the string "monitoring memory usage" in SQL Server Books Online.
Keep all of the server's processors busy enough to maximize performance, but not so busy that processor bottlenecks occur. The performance tuning challenge is that if CPU is not the bottleneck, something else is (a primary candidate being the disk subsystem), and CPU capacity is being wasted. CPU is usually the hardest resource to expand (above some configuration-specific level, such as four or eight processors on many current systems), so CPU utilization above 95 percent can actually be seen as a good sign. At the same time, monitor the response time of transactions to ensure they are within reason; if they are not, CPU usage above 95 percent may simply mean that the workload is too much for the available CPU resources, and either CPU must be added or the workload must be reduced or tuned.
Look at the Performance Monitor counter "Processor: Processor Time %" to make sure all processors are consistently below 95 percent utilization on each CPU. "System:Processor Queue" is the processor queue for all CPUs on a Windows NT system. If "System: Processor Queue" is greater than 2 per CPU, it indicates a CPU bottleneck. When a CPU bottleneck is detected, it is necessary to either add processors to the server or reduce the workload on the system. Reducing workload could be accomplished by query tuning or improving indexes to reduce I/O and, subsequently, CPU usage.
Another Performance Monitor counter to watch when a CPU bottleneck is suspected is "System: Context Switches/sec" because it indicates the number of times per second that Windows NT and SQL Server had to change from executing on one thread to executing on another. This costs CPU resources. Context switching is a normal component of a multithreaded, multiprocessor environment, but excessive context switching will bog down a system. The approach to take is to only worry about context switching if there is processor queuing. If processor queuing is observed, use the level of context switching as a gauge when performance tuning SQL Server. Consider using the lightweight pooling option so that SQL Server switches to a fiber-based scheduling model versus the default thread-based scheduling model. Think of fibers as lightweight threads. Use the command sp_configure 'lightweight pooling',1 to enable fiber-based scheduling. Watch processor queuing and context switching to monitor the effect.
DBCC SQLPERF (THREADS) provides more information about I/O, memory, and CPU usage mapped back to spids. Execute the following SQL query to take a survey of current top consumers of CPU time: "select * from master.sysprocesses order by cpu desc."
"Disk Write Bytes/sec" and "Disk Read Bytes/sec" counters provide an idea of the data throughput in terms of bytes per second per logical drive. Weigh these numbers carefully along with "Disk Reads/sec" and "Disk Writes/sec." Don't let a low amount of bytes per second lead you to believe that the disk I/O subsystem is not busy! Remember that a single hard drive is capable of supporting a total of 75 nonsequential and 150 sequential disk reads and disk writes per second.
Monitor the "Disk Queue Length" for all drives associated with SQL Server files and determine which files are associated with excessive disk queuing.
If Performance Monitor indicates that some of the drives are not as busy as others, there is the opportunity to move SQL Server files from drives that are bottlenecking to drives that are not as busy. This will help spread disk I/O activity more evenly across hard drives. If one large drive pool is being used for SQL Server files, the resolution to disk queuing is to make the I/O capacity of the pool bigger by adding more physical drives to the pool.
Disk queuing may be a symptom that one SCSI channel is being saturated with I/O requests. Performance Monitor cannot directly detect if this is the case. Hardware vendors may be able to provide tools to help detect the amount of I/O being serviced by a RAID controller and whether the controller is queuing I/O requests. This would be more likely to occur if many disk drives (10 or more) are attached to the SCSI channel and they are all performing I/O at full speed. In this case, the solution would be to take half of the disk drives and connect them to another SCSI channel or RAID controller to balance that I/O. Typically, rebalancing drives across SCSI channels requires a rebuild of the RAID arrays and full backup/restore of the SQL Server database files.
Figure 8 indicates some typical counters that Performance Monitor can be used to observe. Note that Processor Queue Length is the current counter being observed. The <BACKSPACE> key was pressed in order to highlight the current counter in bright white. This helps to distinguish the current counter from other counters being observed and can be particularly helpful when observing many counters at the same time with Performance Monitor.
Note that the Max value for Processor Queue Length was 22.000. Max, Min, and Average values for the Performance Monitor Graph cover just the current time window for the graph as indicated by Graph Time. In the default case, Graph Time covers 100 seconds. To monitor longer periods of time and to be sure to get representative Max, Min, and Average values for those time periods, use the logging feature of Performance Monitor.
The shape of the Processor Queue graph line indicates that the Max of 22 only occurred for a short period of time. But there is a period of time preceding the 22 value where Processor Queue Length is greater than 5. (This is determined by looking at the graph, knowing that 100 percent is 22, and noting that there is a period of time prior to the 22 value where the graph has values of more than 25 percent, which is approximately 5.) In this example, the database server named \\HENRYLNT2 has only one processor and should not sustain a Processor Queue Length greater than 2. Therefore, Performance Monitor is indicating that the processor on this machine is at times being overtaxed, and that either the load on the processor should be reduced or more processors should be added to \\HENRYLNT2 to handle these periods of higher processor workload.
Figure 8. Performance Monitor graph output
Database programmers who do SQL work through easy-to-use interfaces such as the ADO/RDO/DAO database APIs still have a responsibility to stay aware of the resultsets they are building. ADO/RDO/DAO provide programmers with great database development interfaces that allow rich SQL rowset functionality without requiring a lot of SQL programming experience, but this comes at a cost. Programmers can expose themselves to performance problems if they do not carefully account for the amount of data their application returns to the client, and if they do not stay aware of where the SQL Server indexes are placed and how the SQL Server data is arranged. SQL Profiler, Index Tuning Wizard, and ShowPlan are very helpful tools for pinpointing and fixing these problem queries.
Look for opportunities to reduce the size of the resultset being returned, by eliminating columns in the select list that do not need to be returned, or returning only the required rows. This helps to reduce I/O and CPU consumption.
For more information, search in SQL Server Books Online for the strings "Optimizing Application Performance Using Efficient Data Retrieval," "Understanding and Avoiding Blocking," and "Application Design."
If applications accessing SQL Server are architected so that transactions access tables in the same chronological order across all user transactions, deadlocking will be avoided. It is worthwhile to clearly explain this concept of chronological table access to SQL application developers as early as possible during the application design process. It will help avoid deadlocking problems that will be more expensive to solve later on.
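The same lock-ordering principle can be sketched outside of SQL Server. In this hypothetical Python threading example, the two locks stand in for locks on two tables, and every "transaction" acquires them in the same fixed order, so a deadlock cycle cannot form:

```python
import threading

lock_table_a = threading.Lock()  # stands in for locks held on tableA
lock_table_b = threading.Lock()  # stands in for locks held on tableB

def transaction(name, results):
    # Every transaction accesses "tableA" before "tableB"; because no
    # transaction ever waits on A while holding B, no cycle can form.
    with lock_table_a:
        with lock_table_b:
            results.append(name)

results = []
threads = [threading.Thread(target=transaction, args=("txn%d" % i, results))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # all four transactions complete; no deadlock
```

If one of the transactions instead acquired the locks in the opposite order, two concurrent transactions could each hold one lock while waiting on the other, which is exactly the deadlock the consistent-ordering rule prevents.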
Reduce SQL query I/O and shorten transaction time. This is a roundabout way to prevent deadlocking, and something that should be done with all queries anyway, but it helps: faster queries hold lock resources for a shorter period of time, so all locking contention (including deadlocking) is reduced. Use SQL Query Analyzer's "Show stats I/O" option to determine the number of logical page fetches associated with large queries. Review the indexes used by selecting SQL Query Analyzer's "Show query plan" option. Consider index placement or SQL query redesign that will be more efficient and thus use less I/O.
For more information, search in SQL Server Books Online for the strings "Avoiding Deadlocks," "Troubleshooting Deadlocking," "Detecting and Ending Deadlocks," and "Analogy to Nonserializable Transactions."
Use of inequality operators in SQL queries forces the database to use table scans to evaluate the inequalities. This generates high I/O when such queries regularly run against very large tables.
Examples:
WHERE <column_name> != some_value
WHERE <column_name> <> some_value
Any WHERE expression with NOT in it
If these types of queries need to be run, try to restructure the queries to get rid of the NOT keyword.
Example:
Instead of:
select * from tableA where col1 != "value"
Try using:
select * from tableA where col1 < "value" or col1 > "value"
This lets SQL Server make use of an index (preferably clustered in this case) if one is built on col1, instead of resorting to a table scan.
On very hot (heavily accessed) tables, if there are columns that a SQL application does not need regularly, it makes sense to move them to another table. The more columns that are eliminated, the better for reducing I/O and increasing performance. For more information, search in SQL Server Books Online for the strings "Logical Database Design" and "Normalization."
SQL Server 7.0 provides for horizontal partitioning of tables through views. This provides I/O performance benefits when database users wish to maintain SQL queries that refer to one table name but the nature of the data retrieval is that queries will always query fixed subsections of the data. For example, suppose there is a very large table that documents sales for all sales departments for a year and also assume that all retrievals from this table will be based on a single sales department. This is a scenario where a partitioned view might be employed. A sales table would be defined for each sales department, a constraint needs to be defined on the sales department column on each table, and then a view would be created on all of the tables to form a partitioned view. The constraint on the sales department column is used by the query optimizer. When the view is queried, all of the sales department tables that do not match the sales department value provided in the query will be ignored by the query optimizer and no I/O will be performed against those base tables. This improves query performance by reducing I/O.
For more information, search in SQL Server Books Online for the strings "Scenarios for Using Views," "Create View," "Using Views with Partitioned Data," "Modifying Data Through a View," "Copying To or From a View," and "Partitioning."
Ensuring that the disk I/O subsystem and CPUs are performing well will provide performance benefits to all SQL Server operations. This definitely includes replication and backups. Transactional replication and transaction log backups involve reading from transaction log files. Snapshot replication and backups perform serial scans of database files. SQL Server 7.0's new storage structures have been enhanced to make these operations very fast and efficient, so long as there is no queuing occurring in the database server's CPUs or disk subsystems.
For more information on performance tuning replication and backup/restores, search in SQL Server Books Online for the strings "Replication Performance," "Optimizing Backup and Restore Performance," "Creating and Restoring Differential Database Backups," "Creating and Applying Transaction Log Backups," "Using Multiple Media or Devices," "Minimizing Backup and Recovery Times in Mission-Critical Environments," "Backup/Restore Architecture," and "SQL Server 7.0 on Large Servers."
For those implementing SQL Server database systems on EMC Symmetrix Enterprise Storage Systems, there are some disk I/O balancing methods that should be kept in mind because of the unique nature of EMC Symmetrix storage that will help avoid disk I/O bottleneck problems and maximize performance.
Symmetrix storage systems contain up to 16 gigabytes of RAM cache and contain internal processors within the disk array that help speed the I/O processing of data without using host server CPU resources. Within the Symmetrix box, there are four major components to be concerned with in order to balance disk I/O. First is the 16-GB cache inside the Symmetrix. Second, there are up to 32 SA channels that can be used to cable up to 32 SCSI cards from Windows NT host servers into the Symmetrix; all of these SA channels can simultaneously request data from the 16-GB cache. Third, within the Symmetrix box there are up to 32 connectors, called DA controllers, which are internal SCSI controllers that connect all of the internal disk drives within the Symmetrix to the internal cache. And finally, there are the hard drives within the Symmetrix.
A note about the EMC hard drives: They are SCSI hard drives with the same I/O capability of the other SCSI drives that have been discussed in this document (75/150 rule applies here). One feature commonly used with EMC technology is referred to as "hyper-volumes." A hyper-volume is defined as a logical division of an EMC hard drive such that to Windows NT Disk Administrator, the hyper-volume looks just like another physical drive, so they can be manipulated with Windows NT Disk Administrator just like any other disk drive. Multiple hyper-volumes can be defined on each physical drive. It is very important when conducting database performance tuning on EMC storage to be sure to work closely with EMC field engineers to identify how hyper-volumes are defined (if there are any) because it is important to avoid overloading a physical drive with database I/O. This can happen easily if two or more hyper-volumes are believed to be separate physical drives but in reality are two or more hyper-volumes on the same physical drive.
SQL Server I/O activities should be divided evenly among distinct DA controllers, because each DA controller is assigned to a defined set of hard drives. As described earlier in this document, SCSI controller bottlenecks are unlikely, so DA controllers are not likely to suffer an I/O bottleneck; the set of hard drives associated with a DA controller is more susceptible. Within the context of DA controllers and their associated disk drives, SQL Server disk I/O balancing is accomplished the same way as with any other vendor's disk drives and controllers.
In terms of monitoring the I/O on a DA channel or on separate physical hard drives, get help from EMC technical support staff, because this I/O activity occurs beneath the EMC internal cache and is not visible to Performance Monitor. EMC storage units have internal monitoring tools that allow an EMC technical support engineer to monitor I/O statistics within the Symmetrix. Performance Monitor can only see the I/O coming to and from an EMC storage unit on an SA channel. This is enough information to indicate that a given SA channel is queuing disk I/O requests, but it cannot tell which disk or disks are causing the disk queuing. If an SA channel is queuing, it is not necessarily the case that the SA channel itself is the bottleneck; it could be (and more likely is) the disk drives that are causing problems. One way to isolate the disk I/O bottleneck between the SA channels and the DA channels plus drives is to add another SCSI card to the host server and connect it to another SA channel. If Performance Monitor indicates that I/O across both SA channels has not changed in volume and disk queuing is still occurring, it indicates that the SA channels are not causing the bottleneck. Another way to isolate the I/O bottleneck is to have an EMC engineer monitor the EMC system through EMC monitoring tools and analyze which drives or DA channels are bottlenecking.
Divide up SQL Server activities evenly across as many of the disk drives as are available. If working with a smaller database that will sustain a large amount of I/O, consider carefully the size of the hyper-volumes to have EMC technical engineers define. Suppose the SQL Server will consist of a 30-GB database. EMC hard drives can provide up to 23 GB in capacity, so it is possible to fit the entire database onto two drives. From a manageability and cost standpoint, this might seem appealing, but from an I/O performance standpoint, it is not. An EMC storage unit may have more than 100 internal drives to work with; involving only two drives for SQL Server may lead to I/O bottlenecks. Consider defining smaller hyper-volumes, perhaps 2 GB each. This means close to 12 hyper-volumes may be associated with a given 23-GB hard drive. Assuming 2-GB hyper-volumes, 15 hyper-volumes will be required to store the database. Make sure that each hyper-volume is associated with a separate physical hard drive. Do not use 12 hyper-volumes from one physical drive and another three hyper-volumes from a second physical drive, because that is effectively the same as using two physical drives (150 nonsequential / 300 sequential I/Os per second across the two drives). But with 15 hyper-volumes, each associated with a separate physical drive, SQL Server will be utilizing 15 physical drives for I/O (1,125 nonsequential / 2,250 sequential I/Os per second across the 15 drives).
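The 75/150 arithmetic used in this example can be sketched as:

```python
# Rule of thumb from the text: each physical drive supports about
# 75 nonsequential and 150 sequential I/O operations per second.
NONSEQ_PER_DRIVE = 75
SEQ_PER_DRIVE = 150

def io_capacity(physical_drives):
    """(nonsequential, sequential) I/Os per second for a set of drives."""
    return (physical_drives * NONSEQ_PER_DRIVE,
            physical_drives * SEQ_PER_DRIVE)

print(io_capacity(2))   # (150, 300): all hyper-volumes packed onto 2 drives
print(io_capacity(15))  # (1125, 2250): 15 hyper-volumes on 15 separate drives
```

The capacity difference (over seven times the I/O throughput) is the reason for insisting that each hyper-volume map to a separate physical drive.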
Also consider employing several SA channels from the host server to divide the I/O work across controllers. This makes a lot of sense for host servers that support more than a single PCI bus. In this case, consider using one SA channel per host server PCI bus to divide I/O work across PCI buses as well as SA channels. On EMC storage systems, each SA channel is associated to a specific DA channel and, hence, a specific set of physical hard drives. Because SA channels read and write their data to and from the EMC internal cache, it is unlikely that the SA channel will be a point of I/O bottleneck. In keeping with the idea that SCSI controller bottlenecks are not likely, it is probably best to invest time to concentrate on balancing SQL Server activities across physical drives versus worrying too much about how many SA channels to utilize.
This book contains some really helpful information, including solutions to common problems such as representing and querying hierarchical data. Chapter 28 is dedicated to optimizing SQL queries.