Multiuser locking, clustered indexes, and transaction logs are only a few of the benefits of using SQL Server databases.
Microsoft Access offers only two levels of locking: locked or unlocked. It uses the operating system to lock portions of the data so that two or more users can hold simultaneous locks on different areas of the file. However, the Microsoft Access default is to open the .mdb file for exclusive use. In contrast, SQL Server expects many simultaneous users and has much more sophisticated locking strategies.
SQL Server offers these types of locks:
Read operations (SELECT) generally hold shared locks.
Write operations (INSERT, UPDATE, and DELETE) generally hold exclusive locks.
In some circumstances, a write operation can obtain a non-exclusive update lock. In this way, the write operation initially shares the datapage with read operations, then promotes its non-exclusive update lock to an exclusive lock when the write action is ready to begin.
An intent lock indicates the intention to acquire a shared or exclusive lock on a data page. An intent lock prevents another transaction from acquiring an exclusive lock on the table containing that page.
An extent lock is held on a group of eight database pages while they are allocated or freed. An INSERT statement that causes the server to allocate more data or index pages uses an extent lock. CREATE and DROP statements also generate extent locks.
SQL Server 6.5 has the ability to lock individual records, so that two users can simultaneously insert different records on the same 2K datapage. When Insert Row-Level Locking is selected, the lock is initially established as a page lock and escalated to a row lock if another lock is requested. This helps to reduce overhead.
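IRL is enabled on a per-table basis. As a rough sketch (the Members table name is hypothetical, and you should confirm the exact option name against your server's documentation), the sp_tableoption stored procedure turns the feature on:
EXECUTE sp_tableoption 'Members', 'insert row lock', 'true'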
Usually locks are held on the 2K data pages. If a large number of data pages are affected, the server escalates the datapage lock to a table lock to save overhead and speed up the action. The point at which the lock escalates, called the Lock Escalation (LE) threshold, can be configured in the Server Configuration dialog box or by using the sp_configure stored procedure. The relevant settings are the LE threshold maximum, LE threshold minimum, and LE threshold percent settings.
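These thresholds are exposed through sp_configure like any other server setting. The values in the following sketch are arbitrary examples, and on some installations these settings appear only when advanced options are displayed:
EXECUTE sp_configure 'LE threshold maximum', 500
EXECUTE sp_configure 'LE threshold percent', 25
RECONFIGURE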
As a system administrator, you can determine the locks that are being held by using the Current Activity window of SQL Enterprise Manager. This indicates the type of lock, the user holding the lock, the program holding the lock, and the last Transact-SQL statement received for that connection. It also shows if any server process ID (SPID) is blocking other SPIDs or being blocked by other SPIDs. Alternatively, you can view this information with the stored procedures sp_who and sp_lock.
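For example, you can run both procedures from a query window to get a snapshot of current connections and the locks they hold; sp_lock can also be passed a SPID if you want to examine a single connection:
EXECUTE sp_who
EXECUTE sp_lock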
Blocking problems usually arise when an uncompleted transaction holds locks. For information about blocking problems, see Knowledge Base article Q162361, "INF: Understanding and Resolving SQL Server Blocking Problems."
With Microsoft Access, data is initially stored on the data pages in the order entered. As individual records grow and shrink (such as name changes from Rob to Robert) and as records are inserted and deleted, the records are moved around. After a moderate amount of activity, you can no longer predict where any given record is kept in the table.
Conversely, SQL Server has the ability to precisely control datapage placement. One and only one index on each table can be designated as the clustered index. The data will be stored in the table in this clustered index order.
If the clustered index in our table is the part number field, the records are physically stored in part number order. If the clustered index is the name field, the records are kept in order by name. Initially, you might think that the primary key (in this example, the part number) should be the clustered index. However, if you were to search for one specific part number, it wouldn't really matter whether the index was clustered or nonclustered; you would still find the record that matched that specific part number. The best use for a clustered index is to assemble groups of data, such as all orders for a particular company or an inventory report in alphabetical order by name.
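For example, to serve the inventory report described above, you might keep the part number lookup on a nonclustered index and cluster a hypothetical Inventory table on its name column (all table and column names here are illustrative only):
CREATE UNIQUE NONCLUSTERED INDEX idx_PartNumber ON Inventory (PartNumber)
CREATE CLUSTERED INDEX idx_PartName ON Inventory (PartName)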
By default, the Upsizing Wizard transfers all indexes as nonclustered indexes. You can modify the Upsizing Wizard to transfer the primary key index to a clustered index. To make this change, start Microsoft Access and open the upsizing wizard library database. For Microsoft Access 95, the filename is Wzcs.mda. For Microsoft Access 97, the filename is Wzcs97.mda. When the database is open, click the Modules tab and open the UT_ModUserConstants module. Search down to the UT_CLUSTERED constant. Change the default value from False to True.
If this flag is set to True and you're using SQL Server 6.0 or later, tables are created with a clustered primary key. If you're using SQL Server 4.21, the index on the primary key is created as a clustered index. The default behavior (UT_CLUSTERED set to False) is to use a nonclustered primary key. Before you change this flag, review the section on the locking implications of clustered indexes in the Microsoft SQL Server Database Developer's Companion.
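For reference, with the flag set to True on SQL Server 6.x, the table the wizard generates is roughly equivalent to declaring the primary key as clustered yourself. This hypothetical Parts table is only a sketch:
CREATE TABLE Parts
(
PartNumber int NOT NULL CONSTRAINT pk_Parts PRIMARY KEY CLUSTERED,
PartName varchar(40) NOT NULL
)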
Another purpose of the clustered index is to distribute write activity efficiently. For example, you have 40 people registering new members for your pledge drive. They are recording the name and address of each new member, and the table provides an automatically increasing member ID number. This is called a counter field in Microsoft Access 2.0, an AutoNumber field in Microsoft Access 95, and an identity field in SQL Server. Assume that 20 member records fit on each data page. Now you have a problem. To insert a new record, a user must get an exclusive lock on the data page, and because the member number always increases, every new record belongs on the same last page of the table. If the member number field is the clustered index and 20 people call at once, the 20 customer representatives who simultaneously attempt to enter a record all request an exclusive lock on that one page. Even if the order entry application is written so that each record can be saved in a flash, with the lock held for only one second, blocking is still a problem: at the rate of one lock per second, the maximum speed of data entry is no more than 60 records per minute. Insert row-level locking (IRL) in SQL Server 6.5 is intended to solve this problem.
To change our scenario, we move the clustered index to the last name field. Now when new members are inserted, the member number still increases by one for each record, but the records are physically written to the table alphabetically by last name. Record 2044 for Smith is written to the S data pages, and record 2045 for Jones is written to the J data pages. Because these are not the same data page, both INSERT operations can hold exclusive page locks at the same time. Blocking is a problem only when customer representatives simultaneously attempt to enter records for very similar names, such as Andersen and Anderson. As the number of records increases and is dispersed over an ever-increasing number of pages, the likelihood of blocking decreases.
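A minimal sketch of this design, with hypothetical names, keeps the identity column for member numbers but clusters the table on the last name:
CREATE TABLE Members
(
MemberID int IDENTITY(1, 1) NOT NULL,
LastName varchar(30) NOT NULL,
FirstName varchar(30) NOT NULL
)
CREATE CLUSTERED INDEX idx_LastName ON Members (LastName)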
Although the presence of useful indexes speeds data retrieval, every index on your table slows down the data INSERT function slightly. Therefore, limit your indexes to fields and groups of fields frequently searched and fields required for table-to-table relationships. In the pledge drive example, it does little good to make the last name field the clustered index if a nonclustered index is placed on the member ID number field. In this example, the developer should create a temporary table for the membership drive, and then merge that data with the full membership table the next day. During the membership drive, the full membership table can be made available for read-only searching. Such use does not require exclusive locks.
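One way to implement the temporary-table approach, again with hypothetical names, is to take the drive's inserts in a separate work table and fold them into the main table afterward:
-- During the drive, all inserts go to the work table.
CREATE TABLE NewMembers
(
LastName varchar(30) NOT NULL,
FirstName varchar(30) NOT NULL
)
-- The next day, merge the new rows; Members assigns the permanent member IDs.
INSERT INTO Members (LastName, FirstName)
SELECT LastName, FirstName
FROM NewMembers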
A table that does not have a clustered index is called a heap. New data is always added to the end of the table. Empty space created when a record is deleted is never reused. In contrast, if you have a clustered index, new records are inserted among the data pages. If a record is deleted from page 230, a record added after the deletion reuses the available space. If you don't specify a clustered index, more new data pages are created, resulting in a large table and slow data entry. Only rarely can you achieve better data management with a heap than with a clustered index.
Microsoft Access fits as many records as possible on each data page. SQL Server, however, has an index feature called the fill factor. The fill factor, which can be set for both clustered and nonclustered indexes, is a percentage that tells SQL Server to leave some empty room on each data page for new records. A fill factor of 50 percent causes SQL Server to write 10 of 20 records to the first data page and then start a new data page. If you designate a 100 percent fill factor, SQL Server writes all 20 records to one page. The default fill factor of 0 percent fills the data page except for room for two records, or 18 records in this example. When you want to add a new record to a full page, SQL Server splits the page in two and adds your record to one of the resulting data pages. The splitting algorithm is the same regardless of the original fill factor.

The fill factor is applied once, when the index is first created. An initial 100 percent fill factor decreases as data pages are split in two. A fill factor originally set at 20 percent gradually increases as the pages fill. If the original fill factor is 50 percent, the value increases as the pages fill up, then decreases as they split in two, until they once again reach 50 percent. Eventually, no page is less than 50 percent full and some pages are 100 percent full, so the table averages about 75 percent full. However, areas of your table that see little change retain their original fill factor, whereas active areas of the same table drift toward 75 percent. Tables that have many deletions, conversely, migrate toward low fill factors.
A low fill factor reduces the need for page splits, significantly increasing the speed of data inserts. But there is a major drawback to this technique. If you only fill pages halfway, your table and your database will be twice as large as a table and database with a higher fill factor. When you search the resulting table for data, SQL Server must read more pages than would be required by a higher fill factor, since fewer records are stored on each page. Your search will be slower than it would be if the fill factor were higher. Low fill factors can be great for tables that experience frequent data insertion and can help reduce page locking conflict by storing fewer records per page. However, low fill factors are poor for tables that experience frequent data reads. Therefore, before you set the fill factor, ask yourself how the data on the table will be used.
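The fill factor is specified when the index is built. A minimal sketch for an insert-heavy table (the names are hypothetical) looks like this:
CREATE CLUSTERED INDEX idx_LastName
ON Members (LastName)
WITH FILLFACTOR = 50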
The body of a nonclustered index contains records like any other table. Each record contains an index value and a pointer to the corresponding original table record. If your index is a multi-field index covering many large character (char) or variable character (varchar) fields, each index record will be quite large, and only a few will fit on one datapage. More 2K index pages must be read from the hard disk in order to search for values in such an index, slowing performance. Conversely, if the index is a single integer field, many records can be stored on each index page, resulting in a faster index.
If you frequently need data from just a few fields of a table, create an index containing those fields. This query fills a list box with names and IDs of preferred customers.
SELECT CustName, CustID
FROM Customers
WHERE Status = 'Preferred'
ORDER BY CustName
To improve performance, create an index on the Status field. The server then reads the index to identify the preferred customers and uses the record pointers in the index to find only those records in the Customers table. Though faster than reading the entire table, this method requires reading both the index pages and the underlying table data pages, generating many disk reads.
You can speed up this query by creating a multi-field index over the Status/CustName/CustID fields. When the server searches the index for the preferred customers, it simultaneously obtains the CustName and CustID information. (The Status column should be the first field listed in the index.)
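Assuming the field names used in the query above, the covering index might be created like this:
CREATE NONCLUSTERED INDEX idx_StatusNameID
ON Customers (Status, CustName, CustID)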
In summary, smaller indexes are faster than larger indexes, but if you can get all of the required data from the index, that is faster still. Carefully match your indexes to your data needs since each index slows INSERT, UPDATE, and DELETE operations.
The Microsoft Access utility for compacting databases does not exist for SQL Server. Instead, SQL Server rebuilds indexes. As records are deleted from a table in SQL Server, gaps in the table and its indexes remain untouched. When an entire datapage is emptied, it is reassigned. However, as long as one record remains, the datapage remains untouched, even if it is 90 percent empty. When you rebuild an index, these data pages are rewritten and filled as specified by the fill factor. (If no clustered index exists on the table, create one and then drop it to achieve the same effect.) To fully compact a database, rebuild every index on every table. Of course, tables that experience little activity will benefit only slightly. Though it initially requires some work to set up such a maintenance plan, this system is easy to customize so you can rebuild only selected tables.
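On SQL Server 6.5, one way to rebuild every index on a table in a single statement is DBCC DBREINDEX; the table name and fill factor below are only examples, so check the exact syntax for your version:
DBCC DBREINDEX ('Customers', '', 70)
Passing an empty string for the index name rebuilds every index on the table, and the final argument supplies the fill factor to use.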
SQL Server has an important feature common to most production-scale databases that is entirely absent from Microsoft Access and other "desktop" databases. The transaction log is a chronological record of all activity that occurs on the server. When a record is added to a table, a copy of the record is written to the transaction log. When a record is deleted, the transaction log records the data that was removed. Every time an existing record is changed, the transaction log records both the old value and the new value. The same details concerning the creation and deletion of tables, views, stored procedures, rules, defaults, and so on are also stored. Every multi-step transaction is recorded, along with whether it was committed or rolled back. Every entry in the transaction log carries a timestamp, an incrementing number that orders all entries.
Transaction logs have many benefits. First, you can make a full backup of the database (which can take many hours on a large installation) and then, at later intervals, you need only back up the transaction log. By backing up transaction logs in lieu of backing up the complete database, you save time that can be used to make more frequent backups. By using a transaction log, you can fine-tune your backup system for a wide variety of data integrity requirements. The transaction log is often kept on a separate disk drive or disk array. If the data resides on a disk that fails, you can restore the database from a dump and then apply the transaction log that is still intact on the other disk.
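In SQL Server 6.x, full database backups and transaction log backups are made with the DUMP DATABASE and DUMP TRANSACTION statements. A minimal sketch, with a hypothetical database name and backup path:
DUMP DATABASE MyDatabase TO DISK = 'C:\MSSQL\BACKUP\mydb.dat'
DUMP TRANSACTION MyDatabase TO DISK = 'C:\MSSQL\BACKUP\mydb_log.dat'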
Should you decide that you don't need a transaction log, you can set the trunc. log on chkpt. option. With trunc. log on chkpt., you can ignore maintenance of the transaction log, and your database behaves much like a Microsoft Access database. SQL Server is still using the transaction log and you are still gaining some of its benefits, but less space and no maintenance are required.
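The option is set per database with the sp_dboption stored procedure; the database name below is hypothetical:
EXECUTE sp_dboption 'MyDatabase', 'trunc. log on chkpt.', 'true'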
The checkpoint process, which verifies that all data changes are "flushed" to the hard disk and thus saved, occurs regularly on every database. The trunc. log on chkpt. option can be set ON or OFF for each database without regard to the settings of the other databases on the server. With this option set ON, the contents of the transaction log are discarded every time the checkpoint process completes. Checkpoints occur automatically at regular intervals, governed by the server's recovery interval setting and the amount of change activity in the database. The trunc. log on chkpt. option is set on for the pubs sample database, the master database, and the tempdb temporary database.
With Microsoft Access, if you experience a power failure, you may be left with an incomplete transaction upon restart. SQL Server's transaction log prevents this. Upon startup after a power failure, the server uses the transaction log to roll the database forward to a stable condition. Transactions that completed in the log are applied to the data pages; incomplete transactions are rolled back. This is an advantage even if you don't use transactions in your code. Normal database activity produces transactions of its own, such as the page splits required when a data page is full. Such transactions are completed or rolled back, leaving your database in an internally consistent state. This occurs regardless of the trunc. log on chkpt. setting.
Database replication is also based on the transaction log. If you have marked a table for replication, changes to that table are recorded in the database transaction log. A special process called the log reader monitors the log and sends changes marked for replication to the distribution database.