SQL Server 7.0: The New Kid on the Block

Michael Levy

SQL Server 7.0 was due for a November release, and the improvements are amazing. Virtually every aspect of the product has been improved upon. Some of these changes will directly influence the way you develop Visual FoxPro applications. For this article, Michael will introduce some of the fundamentals of SQL Server 7.0 and, in order to provide a frame of reference, will be drawing comparisons to VFP.

The purpose of this article is to explore the foundation of SQL Server 7.0. I'll be discussing the components that make up SQL Server, databases and files, indexes, locks, and the transaction log. Though seldom discussed, knowledge of these fundamentals is important to successfully implement any application that uses SQL Server 7.0 as its data store.

If you're like me when trying to learn a new programming language, you refer back to what you already know. For example, you might think, "This is how I would do it in VFP, what's the complement in x?" I'll take this same approach in this article.

The family
SQL Server 7.0 is really made up of three services: MSSQLServer, SQLServerAgent, and MSDTC.

The MSSQLServer service is the actual database engine. It contains the components that handle the processing of Transact SQL statements, the query processor, data storage management, locks, connections, and network facilities.

SQLServerAgent maintains and schedules Jobs, handles Alerts, and manages the list of operators for Alert notification.

MSDTC, or the Microsoft Distributed Transaction Coordinator, has the job of coordinating transactions that stretch over multiple servers. If the MSDTC sounds familiar, it should. It's one of the major components of the Microsoft Transaction Server. MSDTC uses a two-phase commit protocol to guarantee that operations against multiple data sources either complete as a unit or are all cancelled.

If you install the Full-Text Search option, SQL Server 7.0's setup program will install an additional service called MSSearch, or the Microsoft Search Service. MSSearch gives SQL Server the ability to perform text-searching queries similar to the search screen for the Microsoft Knowledge Base.

Databases, files, and filegroups
A VFP database is made up of an assortment of files. Each table is usually made up of three files. The main file has a DBF extension. Large text and BLOBs (Binary Large Objects) are stored in a secondary file that has an FPT extension. Indexes associated with the table are usually stored in a file that has a CDX extension. The meta-data about the database is also stored in three files. The main file has a DBC extension, and the two secondary files have DCT and DCX extensions.

SQL Server also uses multiple files. When you create a database, you'll specify at least two database files. The first is called the Primary, and it will contain all database objects, such as tables, views, and stored procedures. By convention, the Primary database file will have an MDF extension. The other file contains the transaction log -- used for committing and/or rolling back transactions -- and it usually has an LDF extension. It's very likely that most databases will be made up of just these two database files, since the maximum size of a data file is 32TB and the maximum for the transaction log file is 4TB. If you require more space, you can allocate additional database files. These secondary files usually have an NDF extension and need not be created on the same physical device as the primary file.

Multiple database files can be organized into user-defined filegroups, with a filegroup containing one or more database files. Individual database objects can then be assigned to a filegroup, allowing for explicit placement of an object on a physical device. For instance, suppose that you'd like all the indexes placed on the fastest drive that's attached to the system. The following code segment creates a filegroup and places an index in it:

 CREATE DATABASE sales 
 ON PRIMARY 
   (NAME = salesdata, 
   FILE = 'c:\mssql\data\salesdata.mdf', 
   SIZE = 10MB)
 LOG ON 
   (NAME = saleslog, 
   FILE = 'e:\mssql\data\saleslog.ldf', 
   SIZE = 4MB)
 
 ALTER DATABASE sales ADD FILEGROUP IndexGroup
 
 ALTER DATABASE sales ADD FILE 
   (NAME = salesdata2, 
   FILE = 'd:\mssql\data\salesdata2.ndf',
   SIZE = 10MB)
 TO FILEGROUP IndexGroup
 
 CREATE NONCLUSTERED INDEX Customer_NC1 
 ON Customer(zip, state, city) ON IndexGroup


Space allocations
It is important for you to understand how SQL Server 7.0 allocates space so that you can accurately size the database and estimate the amount of work SQL Server will do to solve a query. The basic unit of storage is called a page. Each page is 8K in length. Since a row can't span multiple pages, the maximum row size that can be created is 8,060 bytes (actually a little less), not including text and image data. The remaining amount of space is used for overhead.

For efficiency, SQL Server will never allocate just a single page. It allocates space in groups of eight pages, called an extent. SQL Server 7.0 will allow small database objects to share a single extent, and when one of the objects grows to the point where it takes up eight pages, it will be allocated its own extent.

Indexes
SQL Server uses indexes for many of the same reasons that VFP does. An index can be used to speed up data retrieval and joins, enforce uniqueness, and alleviate sorts. But also, like VFP, indexes have to be dynamically maintained, and maintaining indexes can be a very expensive process. So a balance has to be achieved. You need to have the necessary indexes to allow the query optimizer to find the best way to solve SELECT queries without creating such a load that INSERT, UPDATE, and DELETE queries become expensive (performance-intensive).

SQL Server has two types of indexes: clustered and nonclustered. The difference between the two is subtle but important. A clustered index is implemented as a b-tree or balanced tree (see Figure 1). But the leaf-level pages of the clustered index are the actual data pages of the table. This means that the table is physically sorted in order of the index. Also, each table can only have one clustered index.

Nonclustered indexes are also implemented as b-trees, but the leaf level contains pointers to the appropriate date pages in the table (see Figure 2). If the table doesn't have a clustered index, the leaf-level pointers will contain a row id. If the table does have a clustered index, the leaf-level pointers will contain a copy of the clustered index key. VFP indexes are very similar SQL Server's nonclustered indexes. Since the clustered index physically sorts the data, it's very good for solving range queries. A range query is a query that returns a large amount of contiguous data. Queries that rely on the BETWEEN clause are usually considered range queries. Nonclustered indexes are useful for solving Point queries. Point queries are queries that return a small number of rows, including a single row.

Index statistics
SQL Server uses a cost-based query optimizer to produce the best plan for solving a query. SQL Server stores some statistical information about each index within the index. The query optimizer uses this information to estimate the amount of work that the execution engine would incur if the index were used. It's very important that the index statistics be as up-to-date as possible. SQL Server will automatically update the index statistics periodically, depending on the amount of data in the table and its volatility. The DBA can manually update the statistics at any time by issuing the Transact SQL UPDATE STATISTICS command.

Locks
It's a basic property of all database management systems that two process aren't allowed to make changes to the same piece of data at the same time. SQL Server and VFP both use the locking of resources to enforce this property. With SQL Server, locks are usually described in terms of mode and resource.

VFP and SQL Server are both capable of locking different type of resources. In VFP, you normally discuss locking at the Row, Table, or Database level. In addition to Row, Table, and Database locks, SQL Server 7.0 also supports locking at the Page, Key, and Extent level. Page locks are similar to Row locks but affect all the rows on a database page. An Extent lock is use by SQL Server when it's allocating space for a database object. And Key locks are used for maintaining indexes.

VFP determines the granularity of a lock based on the command and its scope. For instance, if I issue this command:

 REPLACE nTotal WITH lnAmount IN order


VFP will only lock the affected row. On the other hand, if I issue the following command, VFP will lock the entire table:

 REPLACE nPrice WITH nPrice * 1.1 IN product


Microsoft Knowledge Base article Q92345, "Using File Locking and Record Locking in FoxPro," summarizes the granularity of locking applied by VFP based on the command and the scope.

The SQL Server query optimizer also determines the granularity to apply locks. It will try to lock at the smaller-sized resources, like rows, to improve concurrency, but more locks incur a higher overhead. During the execution of the query, SQL Server might decide to escalate the locking to a larger resource if it determines that it's applying an excessive amount of locks. Therefore, SQL Server might escalate many Page locks to a single Table lock. The server dynamically determines the boundaries that SQL Server uses to determine when lock escalation takes place.

SQL Server basically has four locking modes: Shared, Exclusive, Update, and Intent. Shared locks are used for operations that don't update data, such as SELECT. As SQL Server is reading the data, it will acquire a Shared lock, read the data, and then remove the lock. Applying the Shared lock prevents another process from changing the data at the same time that it's being read. Exclusive locks are used for operations that modify data, such as INSERT, UPDATE, and DELETE. Applying the Exclusive lock to the data prevents another process from modifying or reading the data while the modification occurs. By default, Exclusive locks are held until the transaction commits.

Update locks are used by data-modifying operations while the process is looking for the data to modify. They're used to prevent a form of deadlock that could occur if two processes were trying to modify data on the same resource. If both processes applied Shared locks and then waited for other Shared locks to be removed, a deadlock would occur. Instead, SQL Server uses an Update lock while searching for the data to modify. In this way, only one Update lock can be applied to the data.

Intent locks are more of a flag than a real lock. If a process acquires a lock at the row or page level, SQL Server will place an Intent lock on the table. This prevents other processes from acquiring a Table lock. Without the Intent lock, if a process wanted to acquire a Table lock, it would have to scan the lock chain to determine if there was a lock on any of the resources lower in the hierarchy.

Transaction log
VFP 3.0 was the first version of FoxPro to support Transactions. A Transaction is used to treat operations that affect multiple tables as a single unit. Either all the operations succeed, or none of them do. VFP implements Transactions using a transaction buffer. Changes made to the tables are put in the buffer. The changes in the buffer are flushed to disk only after an END TRANSACTION has been issued. If the application were to abruptly terminate, the changes wouldn't make it out of the buffer to corrupt the data.

SQL Server 7.0 implements Transactions very differently. Instead of storing the changes in a memory resident buffer, SQL Server uses a special database file called the transaction log. When a data modification occurs, SQL Server goes through the following steps:

1. If the required data pages aren't already present in the data cache, they're read in.

2. The changes are made to the pages in the data cache.

3. A record of the changes is written to the transaction log, and the log is flushed to disk. The dirty data pages aren't immediately flushed to the disk.

4. SQL Server has a special process, called the Checkpoint Process, whose job is to periodically flush dirty cache pages to disk.

As you can see, there's a delay between when the changes are made to the data and when the page is written to disk. If the server were to fail, those changes that hadn't been flushed would be lost. But a record of the changes is in the transaction log. This is why a SQL Server Transaction is sometimes described as a unit of work and a unit of recovery. When the server restarted, it would use the information in the transaction log to restore the database to a stable state.

The end?
Hopefully you now feel that making the jump from VFP to SQL Server isn't the nightmare that you thought it might be. While there are differences, both products are database management systems and both products have similarities. It's these similarities that provide the path for us to follow to SQL Server 7.0.

Sidebar: Installing SQL Server 7.0
I've always felt that SQL Server was one of the easiest back office products to install, and SQL Server 7.0 continues the legacy. I'd like to take a moment to discuss some of the installation options and their impact.

Bad news for you Netscape fans. One of SQL Server 7.0's prerequisites is the installation of Internet Explorer 4.01. If you plan on installing on Windows NT, you'll also have to install Service Pack 3 and a Mini-Service Pack.

The next big decision that you'll have to make is which variation of the server to install: Desktop, Standard, or Enterprise. The README file that ships on the CD goes into detail about the differences, but the big items are:


Two of the biggest decisions that you'll have to make are which Character Set and Sort Order to install. Once the server has been installed with these selections, they can't be changed without rebuilding the master database and reloading all the user databases. You'll also have to specify the Unicode Collation. It's recommended that your selection of the Unicode Collation match your selection for Character Set and Sort Order. If you don't do this, it's possible that Unicode and non-Unicode data might sort differently, thoroughly confusing the users. The setup program will default the Unicode Collation to the appropriate option.

It's important that you get the correct network libraries installed if you want to be able to communicate with SQL Server over a network. The network libraries determine which network protocols SQL Server will listen to for requests (remember, it's a service!). Setup will default to installing Named Pipes, Multi-Protocol, and TCP/IP. After SQL Server has been installed, you may add and remove network libraries at any time by running the SQL Server Network Utility.

The last configuration option is the logon account that the SQL Server services will use. You basically have two choices: the Local System account or a Domain account. If you select the Local System account, your SQL Server won't be able to access any network resources. If you want to use a Domain account, the account must have the following properties:


The Log on as a service right has to be granted after the account is created. While within User Manager for Domains, select Policies | User Rights to display the User Rights Policy dialog box. You'll have to select the Show Advanced User Rights check box before Log on as a service will show up in the right-hand dropdown.

After that and a reboot, your installation of SQL Server 7.0 will be completed.

Michael Levy is a consultant with ISResearch, Inc., a Microsoft Solution Provider and Certified Technical Education Center. He is also a Microsoft Certified Solution Developer and Trainer. Michael specializes in using Visual Studio and SQL Server to solve business problems. mlevy@isresearch.com.