Microsoft SQL Server Optimization and Tuning Guidelines

The following optimization guidelines will aid in the optimization of Microsoft SQL Server as part of a Microsoft BackOffice solution. Optimal application of these guidelines are unique to each environment. Thus, you may wish to experiment with different configurations and values, in order to arrive at the best combination of settings for your particular Microsoft BackOffice system.

The SQL Server Setup program allows for the selection of two options that affect the priority at which SQL Server runs, thereby influencing system performance. These options can be found under the "Set Server Options" selection of the Setup program.

Since these options affect the priority at which SQL Server threads run, the following definitions are necessary for basic understanding of Windows NT thread scheduling.

Thread Priority—Windows NT uses 32 levels of thread priorities, ranging from 1 (the lowest) to 31 (the highest), with 0 being reserved for system use. There are two thread priority categories, variable and real-time.

Variable Priority—Variable priority threads range from 1 through 15. The Windows NT thread scheduler adjusts the priority based on thread behavior. Hence, variable priority threads may have their priority adjusted up or down.

Real-Time Priority—Real-time priority threads range from 16 through 31. The priority of these threads are not adjusted by the Windows NT thread scheduler based on behavior. Consequently, this provides "near real-time" execution.

Dispatch Queue—A list of "ready threads" that are in order of priority. The scheduler examines the dispatch queue for the highest-priority ready thread and selects it for execution for the allotted time slice.

Boost Microsoft SQL Server Priority

Boosting Microsoft SQL Server's priority can improve performance and throughput on single- and multiple-processor hardware platforms. By default this option is turned off and SQL Server runs at a priority of 7. When selected on a single-processor platform, SQL Server runs at priority 13. When selected on a dedicated SMP platform, SQL Server runs at priority 24. The significance is of course that the Windows NT thread scheduler will favor SQL Server threads over threads of other processes.

If this option is turned on, it may degrade the performance of other processes. Hence, this option should only be turned on for dedicated SQL Server machines, or if slower performance of other processes is tolerable.

Dedicated Multiprocessor Performance

Microsoft SQL Server can take advantage of SMP platforms without this option being turned on. In this off state SQL Server runs at a priority level of 7. When this option is turned on the priority is increased to 13, thus increasing the scalability improvement multiple CPUs have on SQL Server performance.

As with the Boost SQL Server priority option, if turned on it may degrade the performance of other processes. Hence, this option should only be turned on for dedicated SQL Server–based machines.

If both options are turned on for SMP platforms, SQL Server runs at a priority of 24.

The following SQL Server configuration parameters are those that impact performance or performance-related resources. Each configuration parameter is defined with respect to its function and its impact on performance. See the SQL Server Configuration Guide for more details on sp_configure settings. It is recommended that you start with the default values and experiment with changing parameter values once you have obtained a baseline of performance. When adjusting parameters to tune performance, adjust one parameter at a time, and measure the difference in performance; changing multiple parameters in an ad hoc fashion is generally not productive.

Memory: As a general rule; on machines with 32 MB or less, you should allocate at least 8 to 16 MB to Windows NT, and configure Microsoft SQL Server to use the rest. When you have more than 32 MB in your machine, allocate 16 to 20 MB to Windows NT, and allocate the rest of the memory to SQL Server.

Performance impact: Physical memory is used by SQL Server for server operation overhead, data (buffer) cache, and procedure cache. Hence, in order to reduce SQL Server page faults, an appropriate amount of memory should be configured. Please refer to the previous discussion in this paper concerning memory.

Max async IO: The "max async IO" parameter controls the number of outstanding asynchronous batch writes performed by checkpoint and Lazywriter. The default is 8, and this is likely to be sufficient in most cases. However, you may wish to experiment with this number to try to improve performance. Increasing the parameter will allow more asynchronous batch writes to be done, effectively shortening the period that the system is checkpointing or doing Lazywriting. However, if your I/O subsystem cannot sustain the increased write activity, the increased writes will flood the I/O systems and can interfere with the ability of SQL Server (or other processes) to read from the disk, resulting in decreased throughput. The behavior of this parameter is thus dependent to a large degree on the underlying I/O subsystem.

Performance impact: SQL Server for Windows NT uses the asynchronous I/O capability of the Windows NT operating system. Examples of these are the Win32 API calls ReadFile(), ReadFileEx(), WriteFile(), and WriteFileEx(). See the Win32 Software Development Kit (SDK) for more information. Asynchronous, or overlapped I/O, refers to the ability of a calling program to issue an I/O request and without waiting for completion to continue with another activity. When the I/O finishes, the operating system will notify the program via a callback or other Win32 synchronization mechanism.

Procedure cache: By default, 20% of the available memory is reserved for procedure cache. In systems with large amounts of memory, this is often excessive.

Performance impact: Having a properly sized procedure cache will result in fewer page faults with respect to use of stored procedures, triggers, rules, and defaults. Please refer to the previous discussion in this paper concerning memory.

Tempdb in RAM: This option can improve performance when processing involves sorting, group by, or joins without supporting indexes. Allocating memory to Tempdb effectively reduces the amount of memory available to allocate to the SQL Server data cache. Accordingly, you need enough physical memory to store the entire Tempdb in RAM without impacting the memory required for Windows NT Server, SQL Server, and applications. Thus you should only consider using Tempdb in RAM when you have large amounts of memory available. The default is off (0).

Performance impact: Forcing Tempdb into RAM may result in increased performance if a significant amount of processing involves the creation and use of "WORKTABLES" by the SQL Server optimizer. Execution of such processing in RAM is inherently faster than corresponding disk I/O from paging.

The tuning of system resources typically involves the discovery of "bottlenecks." A bottleneck is the single resource that consumes the most time during a task's execution. In the case of Microsoft SQL Server, such resource bottlenecks adversely affect the performance of normal relational database operations as well as causing contention with other Microsoft BackOffice applications. Hence, the following information pertains to the detection of SQL Server resource bottlenecks and the subsequent adjustment of the resource in order to relieve the demand and increase performance.

Processor Tuning

Processor tuning involves the detection of CPU-bound operations. The following processor bottleneck monitoring guidelines will aid in determining such problems.

Symptoms and Actions:

Symptom: If Processor: % Processor Time consistently registers in the range of 80% to 100%, the processor(s) may be the bottleneck. (System: % Total Processor Time can be viewed for multiprocessor systems).

Action: If this occurs you need to determine which Microsoft SQL Server User process is consuming the CPU. To determine which process is using up most of the CPU's time, monitor the SQLServer-Users: CPUtime for all of the process instances (spid). One or more will appear as using the greatest cumulative time. Having determined the offending process instance, examine the query for inefficient design. In addition, examine indexes and database design for inefficiencies with respect to excessive I/O, which consumes CPU cycles. (Wide tables and indexes cause more I/Os to occur as do table scans.)

Symptom: If Processor: % Privlieged Time is consistently over 20% and Processor: % User Time is consistently below 80%, then SQL Server is likely generating excessive I/O requests to the system.

Action: Examine the disk controller card and the network interface card. (See the topic under General Actions below.) In addition, if this is not a dedicated SQL Server system, look for other processes that meet the above criteria via Process: % Privlieged Time and Process: % User Time. If you find such processes eliminate them or schedule them to run at more convenient times.

General Actions for Processor Tuning:

Excessive processing may indicate a need to further denormalize the database.

Schedule CPU-intensive queries during off-peak hours.

Investigate tuning the query to be less CPU-intensive (see query tuning).

Boost the Microsoft SQL Server priority as previously discussed.

Determine that no unnecessary processes are running on the SQL Server platform. If so, turn them off in order to see if this improves performance.

Set Windows NT Tasking to "Foreground and Background Applications Equally Responsive."

Assuming you have at least a 486-based server, part of your problem may be the network or disk adapter cards you have chosen. The 8-bit cards use more processor time than 16-bit or 32-bit cards. The number of bits here refers to the amount of data moved to memory from the adapter on each transfer. The most efficient cards use 32-bit transfers to adapter memory or direct memory access (DMA) to move their data. Adapters that don't use memory-mapped buffers or DMA must use processor instructions to move data, and that makes the processor busy. DMA uses the memory, and that can slow the processor down but it is still more efficient than individual instructions.

Placing Tempdb in RAM may help by reducing I/O requests.

Assuming that Microsoft SQL Server has sufficient memory to run (see memory tuning), increasing CPU power is the most effective hardware-related way to improve performance. You can add CPU power in two ways: using a faster CPU or adding additional CPUs.

Faster CPU: When performance is important, you should consider using as fast a CPU as possible. In general, a faster CPU will probably realize a bigger performance gain over adding an additional CPU. This is because while adding CPUs provides additional power, the operating system and SQL Server (or any application) incur an overhead in managing the work performed by multiple CPUs. Of course, once you are running the fastest CPU available in your chosen architecture, you can add additional CPUs to increase performance.

Additional CPUs: Windows NT supports symmetric multiprocessing (SMP). Since SQL Server is implemented using native Windows NT threads, it can automatically take advantage of multiple CPUs. SQL Server takes good advantage of SMP platforms, and you can boost performance significantly by moving your application to an SMP platform, or adding additional CPUs to an existing SMP platform.

Memory Tuning

Memory tuning involves the detection of memory-constrained operations. The following memory bottleneck monitoring guidelines will aid in determining such problems.

Symptoms and Actions:

Symptom: If the SQLServer: Cache Hit Ratio is less than 80% then memory may be a bottleneck.

Action: Either allocate more memory to Microsoft SQL Server or increase the amount of system memory.

Symptom: If the SQLServer: I/O - Lazy Writes/sec counter indicates a great deal of activity over a long interval of time, then a memory bottleneck may be indicated. Typically this counter should be zero until the LRUthreshold (default 3%) is reached with respect to free buffer pages. However, frequent activity may indicate not enough memory is available for data page caching.

Action: Compare the SQLServer: Cache - Number of Free Buffers value against the LRUtheshold value. This value is derived by obtaining the total number of buffers allocated via the DBCC MEMUSAGE command and multiplying this value by the LRUthreshold percentage (default 0.03). If the number of free buffers is close to the derived value, then either allocate more memory to SQL Server or increase the amount of system memory.

If Memory: Page Faults/sec is consistently high (other than during SQL Server initialization), then not enough memory is dedicated to Windows NT or possibly the system as a whole.

Action: Increase the system memory or increase the memory dedicated to Windows NT, by decreasing the memory allocated to SQL Server or other processes. Moreover, you may also eliminate noncritical processes as these also utilize memory resources.

If SQLServer: I/O - Page Reads/sec is consistently high, it may indicate a memory bottleneck. Typically, if enough memory is available for database activities during normal operations, this counter will generally reach an equilibrium point. However, if this counter remains high over a long time interval, lack of sufficient memory may be causing excessive SQL Server page faults. This may be due to a less than adequate data cache, poor index design, or poor database design.

Action: Increase the memory allocated to SQL Server or decrease the procedure cache percentage, thereby increasing the data cache. If indexes are not being utilized, design intelligent indexes. If database tables are too wide, thus resulting in fewer data rows per data page, redesign the tables to be narrower.

If SQLServer: Cache - Ave. Free Page Scan or SQLServer: Cache - Max. Free Page Scan are over 10, a memory bottleneck may be indicated due to excessive buffer scanning while searching for a free page.

Action: Increase the data cache size or the frequency of checkpoints. Checkpoints can be increased via the recovery interval value or by manual execution.

General Actions for Memory Tuning:

Check the amount of memory dedicated to Microsoft SQL Server via the SQL Server configuration options. Follow the memory allocation guidelines previously discussed, increasing memory appropriately. Allocate at least 16 MB to Windows NT.

Make sure the PAGEFILE.SYS file is of adequate size. However, it should not be growing in size, as this would indicate that virtual memory requirements are exceeding capacity. On a dedicated SQL Server system, there should be virtually no paging as long as Windows NT has been allotted sufficient memory; at least 16 MB.

If you make extensive use of extended stored procedures, you should unload the DLL after execution via the DBCC dllname(FREE) command. This releases the memory used by the DLL.

Disk Subsystem Tuning

Disk subsystem tuning involves the detection of disk I/O constrained operations. Such bottleneck constraints may be caused by the disk controller, the physical disk drives, or lack of some other resource that results in excessive disk I/O generating activity. Furthermore, poor disk subsystem performance may also be caused by poor index or database design. The goal is to operate the Microsoft SQL Server with as few physical I/Os and associated interrupts as possible. The following disk I/O bottleneck monitoring guidelines will aid in achieving this goal.

Note In order to monitor low-level disk activity with respect to the PhysicalDisk Performance Monitor counters, it is necessary to enable the diskperf option. This can be accomplished by issuing the following command from the system prompt: diskperf -y. Running with this option enabled may result in a slight (0.1%–1.5%) degradation in performance. Hence, disable it when not required for use (diskperf -n).

When performance tuning the SQL Server disk subsystem, you should first attempt to isolate the disk I/O bottleneck with the SQLServer counters, using the PhysicalDisk and LogicalDisk counters for more detailed monitoring and refinement of an action plan.

Symptoms and Actions:

Symptom: If SQLServer: I/O - Lazy Writes/sec is active for extended intervals, this may indicate the disk subsystem is not adequate with respect to the current I/O demands. (Also see memory bottleneck symptoms.)

Action: Observing either LogicalDisk: Disk Queue Length or PhysicalDisk: Disk Queue Length can reveal significant disk congestion. Typically, a value over 2 indicates disk congestion. Increasing the number of disk drives or obtaining faster drives will help performance.

Symptom: If SQLServer: I/O Outstanding Reads and/or I/O Outstanding Writes are high for extended intervals of time, the disk subsystem may be a bottleneck.

Symptom: If SQLServer: Log Writes/sec seems to reach a maximum level, you may have encountered the maximum sequential write capability of the disk drives in the system. You will see this occur most frequently on systems that have a dedicated disk drive for logging. On systems without a dedicated log disk drive, you will observer a greater number of outstanding I/O requests as discussed previously.

Action: Obtaining faster disk drives or disk controllers will help to improve this value.

General Actions for Disk I/O Tuning:

If you have a disk subsystem that supports asynchronous I/O (one that supports disk arrays and/or includes a smart SCSI controller), you may consider increasing the max async io option (default 8) in order to obtain higher throughput. In order to determine if higher throughput is gained, monitor SQLServer: I/O - Batch Writes/sec before the change to max async io and after the change. You should also notice an increase in the value of SQLServer: I/O Transactions/sec. This assumes the current physical disks are not experiencing I/O congestion. In addition, increasing the max async io value to high may result in excessive overhead.

Install a faster disk and/or controller. Determine if the controller card does 8-bit, 16-bit, or 32-bit transfers. The more bits in the transfer operation, the faster the controller moves data. You may also want to choose a different drive technology such as IDE (Integrated Drive Electronic), which has a 2.5MB/s throughput; ESDI, which has a 3 MB/sec; SCSI-2, which has a 5MB/s throughput; or a Fast SCSI-2, which has a 10MB/sec throughput.

Create mirrored data sets. The I/O system can issue concurrent reads to 2 partitions. The first portion of the read will be to partition A, while the next portion of the read will be to partition B. (Assuming the disk driver and controller can handle asynchronous I/O.)

Create striped data sets. Multiple disks can process I/O requests concurrently (assuming the disk driver and controller can handle asynchronous I/O).

Add memory to increase the data cache size, thereby decreasing physical page reads.

Change to a different I/O bus architecture. EISA, MCA, and local bus (VESA or PCI) buses transfer data at a much higher rate than ISA buses. PCI is fast because it transfers data at 33 MHz, a double word at a time (33 MHz * 4 = 132 Mb/sec) where as ISA maximizes out at about 5 Mb/sec and EISA about 32 Mb/sec (EISA transfers at 8 MHz * 4 bytes). There has been talk about raising the PCI clock rate to 66 MHz (to get a 264 Mb/sec transfer rate) but most manufacturers are resisting the idea (at about 50 MHz or so, getting past FCC class B certification is a nightmare—the main reason Intel drops the 50-MHz 486 and goes with clock doubler).

When choosing a I/O device such as a disk controller, consider the architecture of the card. For example here are some of the points to consider about each architecture:

PIO: PIO (programmed I/O) requires intervention by the CPU. For example, the Adaptec 1522 is a PIO device and can do either 16-bit PIO or 32-bit PIO. However, CPU-usage is quite intensive (30%–40%) and it will slow down your system during a large transfer or a CD-ROM access. As such, most high-performance systems don't use PIO device because it adversely impacts system throughput. BYTE magazine did a comparison of Adaptec 2940 (PCI) against a Future Domain adapter (PIO). While the Future Domain and Adaptec 2940 provide almost identical benchmark results, the Future Domain consumes a hefty 40% of CPU time whereas the 2940 does not. However, all PIO devices are much cheaper to manufacture—the FD is about half the price of the 2940. A standard ATDISK disk (IDE drive) does PIO.

DMA: ISA DMA has only 24 address lines so it can physically address 16 MB. However, if you happen to have 32 MB of RAM, the OS can see all of the memory. Therefore, if the OS wants to transfer a block of memory (which happens to be located at memory location above 16 MB, which the ISA DMA card, such as the Adaptec 1542C, cannot physically see), it will have to copy that block down to an area in the 0–15 MB range (where the Adaptec 1542c can see) so the 1542C can initiate the DMA transfer (double buffering). This copying down to 0–15 MB range and also copying up (16 MB and up) takes quite a bit of time (using Intel repsb, repsw, repsd) so that explains the slow down. However, you don't have that problem with either VL, PCI, or EISA as they all have 32-bit DMA address lines and can physically see up to 4 GB. PIO devices can see all of the memory, including those above 16 MB. The only problem is that it takes processor cycles to do any kind of data transfer.

Bus Master: Bus master devices have their own intelligence and offload this work from the CPU. The CPU can resume doing its own work while the bus-master device is doing all the I/O. When it's done, it hands the result to the CPU.

On a 2 SCSI disk daisy-chained system, the SCSI controller has more of an impact on your total performance than your disk drive. You would be better off buying a slower, cheaper disk and investing in a better SCSI controller.

Choose a disk with a low seek time (the time required to move the disk drive's heads from one track of data to another). The ratio of time spent seeking to time spent transferring data is usually 10 to 1, and often much higher.

Distribute the workload as evenly as possible among different I/O systems and disk drives. This will allow you to take full advantage of the system's I/O bandwidth.

Network Tuning

Network tuning with respect to Microsoft SQL Server performance is affected by the following:

Throughput of the LAN or WAN.

Throughput of the server's network interface card.

Availability of resources on the server to service client requests.

Regarding the throughput of the LAN or WAN, this is beyond the scope of this paper and is not critical to the tuning of a specific SQL Server. However, when considering remote procedure calls between SQL Servers or data replication, LAN and/or WAN throughput will be an important concern. Thus, this section will deal with tuning issues related to the network interface card and system or SQL Server resources that affect the SQL Server's network performance. Accordingly, the following network bottleneck monitoring guidelines will deal with these issues.

Symptoms and Actions:

Symptom: On a dedicated SQL Server system, if SQLServer: Network Reads/sec is substantially lower than Server: Bytes Received/sec, this may indicate network activity outside that generated by SQL Server. This may also be caused by periodic Windows NT Server administrative processes if the system also serves as a Primary Domain Controller or Backup Domain Controller.

Action: Determine if any processes or protocols extraneous to the operation of SQL Server are running. If so, eliminate them.

Symptom: On a dedicated SQL Server system, if SQLServer: Network Writes/sec is substantially lower than Server: Bytes Transmitted/sec, this may indicate network activity outside that generated by SQL Server. This may also be caused by periodic Windows NT Server administrative processes if the system also serves as a Primary Domain Controller or Backup Domain Controller.

Action: Determine if any processes or protocols extraneous to the operation of SQL Server are running. If so, eliminate them.

Symptom: If SQLServer: Network Reads/sec or SQLServer: Network Writes/sec is high, this indicates a great deal of network traffic.

Action: Look at the number of SQLServer: User Connections and the SQLServer: Network Command Queue Length. If these values are also high, especially Network Command Queue Length, then consider increasing the number of available worker threads via sp_configure and/or increase memory allocated to SQL Server. However, you may wish to restrict user connections via sp_configure in order to decrease the workload on the SQL Server. Remember, user connections and worker threads are counted as overhead against the SQL Server memory allocation. Thus, plan accordingly when adjusting these values.

General Actions for Network Tuning:

Tabular data stream (TDS) is the application protocol used by Microsoft SQL Server for the transfer of requests and request results between clients and servers. TDS data is sent in fixed-size chunks, called packets. TDS packets have a default size of 512 bytes. If an application does bulk copy operations, or sends or receives large amounts of text or image data, a packet size larger than 512 bytes may improve network performance, since it results in fewer reads and writes. For large data transfers, a packet size between 4092 and 8192 is usually best. Any larger size packet may degrade performance.

An application can change the packet size by using the DB-Library DBSETLOACKET() call. The packet size may also be changed while using the BCP and ISQL utilities using the [/a packetsize] parameter. Increasing the packet size will only work for name pipes clients to SQL Server on Windows NT.

You can monitor the improvement in network read and write efficiency by viewing the SQLServer: Network Reads/sec and SQLServer: Network Writes/sec counters before and after changing the TDS packet size. Fewer reads and writes should occur after increasing the packet size.

Make sure the server throughput is set to "Maximize Throughput for Network Applications"; this is the default for SQL Server. However, with this option set, user application access has priority over file cache access to memory (4 MB of memory is allocated to available memory for starting up and running local applications).

Increase Netlogon service update notice periods on the SQL Server if your SQL Server is a Primary Domain Controller, as well as the server announcement period if you are concerned with the amount of maintenance traffic Windows NT Server is creating and the load on the primary domain controller.

Value Name	Default Value	Minimum Value	Maximum Value
PulseConcurrency	20	1	500
Pulse	300 (5 minutes)	60 (1 minute)	3600 (1 hour)
Randomize	1 (1 second)	0 (0 seconds)	120 (2 minutes)

Pulse defines the typical pulse frequency (in seconds). All SAM/LSA (User/Security account database) changes made within this time are collected together. After this time, a pulse is sent to each BDC needing the changes. No pulse is sent to a BDC that is up to date.

PulseConcurrency defines the maximum number of simultaneous pulses the PDC will send to BDCs.

Netlogon sends pulses to individual BDCs. The BDCs respond by asking for any database changes. To control the maximum load these responses place on the PDC, the PDC will only have PulseConcurrency pulses "pending" at once. The PDC should be sufficiently powerful to support this many concurrent replication RPC calls (related directly to Server service tuning as well as the amount of memory in the machine).

Increasing PulseConcurrency increases the load on the PDC. Decreasing PulseConcurrency increases the time it takes for a domain with a large number of BDCs to get a SAM/LSA change to all of the BDCs. Consider that the time to replicate a SAM/LSA change to all the BDCs in a domain will be greater than: (Randomize/2) * NumberOfBdcsInDomain) / PulseConcurrency

If you need to transfer huge amounts of data between different computer systems, Ethernet may not be the appropriate medium to use; the basic Ethernet cable is limited to 10 MB per second (considerably less when you include network overhead). Other media are now available that offer significantly higher sustained transfer rates (FDDI, ATM, and so on).

The Network Monitor (provided with Microsoft Systems Management Server) is a very good tool to use to monitor the general network performance. It offers additional Performance Monitor counters as well as a few unique statistics from within the application.

Match the network interface card to the system bus. If you have a 16-bit bus, use a 16-bit network adapter; if you have a 32-bit bus, use a 32-bit network adapter. In addition, the card should be a bus-master card in order to minimize the processing overhead associated with network interrupts.

For more in-depth information concerning Microsoft SQL Server optimization and tuning, refer to Appendix A.