Disk I/O Parallelism

With smaller SQL Server databases located on a few disk drives, disk I/O parallelism is not a likely performance factor. But with large SQL Server databases stored on many disk drives, performance is enhanced by using disk I/O parallelism to make optimal use of the I/O processing power of the disk subsystem.

Microsoft SQL Server 7.0 introduces files and filegroups, which replace the device and segment model of earlier versions of SQL Server. The files and filegroups provide a more convenient method for spreading data proportionately across disk drives or RAID arrays. For more information, see SQL Server Books Online.

A technique for creating disk I/O parallelism is to create a single "pool of drives" that serves all SQL Server database files, excluding transaction log files. The pool can be a single RAID array that is represented in Windows NT as a single physical disk drive. Or a larger pool can be set up using multiple RAID arrays and SQL Server files/filegroups. A SQL Server file can be associated with each RAID array, and the files can be combined into a SQL Server filegroup. Then a database can be built on the filegroup so that the data is spread evenly across all of the drives and RAID controllers. The drive pool methodology depends on RAID to divide data across all physical disk drives to help ensure parallel access to the data during database server operations.

The pool methodology simplifies SQL Server I/O performance tuning because there is only one physical location to create database objects. The single pool of drives can be watched for disk queuing and, if necessary, more hard disk drives can be added to the pool to prevent disk queuing. This technique helps optimize for the common case, in which it is not known which parts of databases will see the most use. Do not segregate part of available I/O capacity on another disk partition because SQL Server might do I/O to it 5 percent of the time. The single pool of drives methodology can make all available I/O capacity available for SQL Server operations.

SQL Server log files should always be physically separated onto different hard disk drives from all other SQL Server database files. For SQL Servers with busy databases, transaction log files should be physically separated from each other. Transaction logging is primarily sequential write I/O. Separating transaction logging activity from other nonsequential disk I/O activity can result in I/O performance benefits. That allows the hard disk drives containing the log files to concentrate on sequential I/O. There are times when the transaction log must be read as part of SQL Server operations such as replication, rollbacks, and deferred updates. If you administer SQL Servers that participate in replication, make sure all transaction log files have sufficient disk I/O processing power because of the reads that must occur.

Physically separating SQL Server objects from the rest of their associated database through SQL Server files and filegroups requires additional administration. Separating the objects can be worthwhile to investigate active tables and indexes. By separating table or index from all other database objects, accurate assessments can be made of the object I/O requirements. This is not as easy to do when all database objects are placed within one drive pool. Physical I/O separation can be appropriate during database development and benchmarking so that database I/O information can be gathered and applied to capacity planning for the production database server environment.

These are the areas of SQL Server activity that can be separated across different hard disk drives, RAID controllers, PCI channels, or combinations of the three:

Transaction log files
tempdb
Database files
Tables associated with considerable query or write activity
Nonclustered indexes associated with considerable query or write activity

The physical separation of SQL Server I/O activities is made convenient by using hardware RAID controllers, RAID hot plug drives, and online RAID expansion. The approach that provides the most flexibility consists of arranging the RAID controllers so that a separate RAID SCSI channel is provided for each database activity. Each RAID SCSI channel should be attached to a separate RAID hot plug cabinet to take full advantage of online RAID expansion (if it is available through the RAID controller). Windows logical drive letters are associated with each RAID array and SQL Server files can be separated between distinct RAID arrays based on known I/O usage patterns.

With this configuration you can relate disk queuing to a distinct RAID SCSI channel and its drive cabinet as Performance Monitor reports the queuing behavior during load testing or heavy production loads. If a RAID controller and drive array cabinet support online RAID expansion and if slots for hot-plug hard disk drives are available in the cabinet, disk queuing on that RAID array is resolved by adding more drives to the RAID array until Performance Monitor reports that disk queuing for that RAID array has reached acceptable levels (less than 2 for SQL Server files.) This can be accomplished while SQL Server is online.

The tempdb database is created by SQL Server to be a shared working area for a variety of activities, including temporary tables, sorting, subqueries and aggregates with GROUP BY or ORDER BY, queries using DISTINCT (temporary worktables must be created to remove duplicate rows), cursors, and hash joins. You should enable the tempdb database I/O operations to occur in parallel to the I/O operations of related transactions. Because tempdb is a scratch area and update-intensive, RAID 5 is not as good a choice for tempdb as RAID 1 or 0+1. The tempdb database is rebuilt every time the database server is restarted; therefore, RAID 0 is a possibility for tempdb on production SQL Server computers. RAID 0 provides the best RAID performance for the tempdb database with the least physical drives. The concern with using RAID 0 for tempdb in a production environment is that SQL Server must be stopped and restarted if physical drive failure occurs in the RAID 0 array-this does not necessarily occur if tempdb is placed on a RAID 1 or 0+1 array.

To move the tempdb database, use the ALTER DATABASE statement to change the physical file location of the SQL Server logical file name associated with the tempdb database. For example, to move the tempdb database and its associated log to the new file locations E:\Mssql7 and C:\Temp, use these statements:

ALTER DATABASE tempdb MODIFY FILE (NAME ='tempdev',FILENAME= 'e:\mssql7\tempnew_location.mDF')

ALTER DATABASE tempdb MODIFY FILE (NAME ='templog',FILENAME= 'c:\temp\tempnew_loglocation.LDF')

The master, msdb, and model databases are not used much during production compared to user databases, so they are not typically a consideration in I/O performance tuning. The master database is used only for adding new logins, databases, and other system objects.

Nonclustered indexes reside in B-tree structures that can be separated from their related database tables with the ALTER DATABASE statement. In this example, the first ALTER DATABASE creates a filegroup. The second ALTER DATABASE creates a file with a separate physical location associated with the filegroup. At this point, indexes can be created on the filegroup as illustrated by creating the index called index1. The sp_helpfile stored procedure reports files and filegroups present for a given database. The sp_help tablename has a section in its output that provides information about the table indexes and filegroup relationships.

ALTER DATABASE testdb ADD FILEGROUP testgroup1

ALTER DATABASE testdb ADD FILE (NAME = 'testfile',

    FILENAME = 'e:\mssql7\test1.ndf') TO FILEGROUP testgroup1

CREATE TABLE test1(col1 char(8))

CREATE INDEX index1 ON test1(col1) ON testgroup1

sp_helpfile

sp_help test1

For more information, see SQL Server Books Online.