SQL Server 7.0: The New Kid on the Block
Michael Levy
SQL Server 7.0 was due for a
November release, and the improvements are amazing. Virtually every aspect
of the product has been improved upon. Some of these changes will directly
influence the way you develop Visual FoxPro applications. For this article,
Michael will introduce some of the fundamentals of SQL Server 7.0 and, in
order to provide a frame of reference, will be drawing comparisons to VFP.
The purpose of this article is to explore the foundation of SQL Server 7.0.
I'll be discussing the components that make up SQL Server, databases and
files, indexes, locks, and the transaction log. Though seldom discussed,
knowledge of these fundamentals is important to successfully implement any
application that uses SQL Server 7.0 as its data store.
If you're like me when trying to learn a new programming language, you refer
back to what you already know. For example, you might think, "This
is how I would do it in VFP, what's the complement in x?" I'll
take this same approach in this article.
The family
SQL Server 7.0 is really made up of three
services: MSSQLServer, SQLServerAgent, and MSDTC.
The MSSQLServer service is the actual database engine. It contains the components
that handle the processing of Transact SQL statements, the query processor,
data storage management, locks, connections, and network facilities.
SQLServerAgent maintains and schedules Jobs, handles Alerts, and manages
the list of operators for Alert notification.
MSDTC, or the Microsoft Distributed Transaction Coordinator, has the job
of coordinating transactions that stretch over multiple servers. If the
MSDTC sounds familiar, it should. It's one of the major components of the
Microsoft Transaction Server. MSDTC uses a two-phase commit protocol to
guarantee that operations against multiple data sources either complete
as a unit or are all cancelled.
If you install the Full-Text Search option, SQL Server 7.0's setup program
will install an additional service called MSSearch, or the Microsoft Search
Service. MSSearch gives SQL Server the ability to perform text-searching
queries similar to the search screen for the Microsoft Knowledge Base.
Databases, files, and filegroups
A VFP database is made up of an assortment
of files. Each table is usually made up of three files. The main file has
a DBF extension. Large text and BLOBs (Binary Large Objects) are stored
in a secondary file that has an FPT extension. Indexes associated with the
table are usually stored in a file that has a CDX extension. The meta-data
about the database is also stored in three files. The main file has a DBC
extension, and the two secondary files have DCT and DCX extensions.
SQL Server also uses multiple files. When you create a database, you'll
specify at least two database files. The first is called the Primary, and
it will contain all database objects, such as tables, views, and stored
procedures. By convention, the Primary database file will have an MDF extension.
The other file contains the transaction log -- used for committing and/or
rolling back transactions -- and it usually has an LDF extension. It's very
likely that most databases will be made up of just these two database files,
since the maximum size of a data file is 32TB and the maximum for the transaction
log file is 4TB. If you require more space, you can allocate additional
database files. These secondary files usually have an NDF extension and
need not be created on the same physical device as the primary file.
Multiple database files can be organized into user-defined filegroups,
with a filegroup containing one or more database files. Individual
database objects can then be assigned to a filegroup, allowing for explicit
placement of an object on a physical device. For instance, suppose that
you'd like all the indexes placed on the fastest drive that's attached to
the system. The following code segment creates a filegroup and places an
index in it:
CREATE DATABASE sales
ON PRIMARY
(NAME = salesdata,
FILE = 'c:\mssql\data\salesdata.mdf',
SIZE = 10MB)
LOG ON
(NAME = saleslog,
FILE = 'e:\mssql\data\saleslog.ldf',
SIZE = 4MB)
ALTER DATABASE sales ADD FILEGROUP IndexGroup
ALTER DATABASE sales ADD FILE
(NAME = salesdata2,
FILE = 'd:\mssql\data\salesdata2.ndf',
SIZE = 10MB)
TO FILEGROUP IndexGroup
CREATE NONCLUSTERED INDEX Customer_NC1
ON Customer(zip, state, city) ON IndexGroup
|
Space allocations
It is important for you to understand how
SQL Server 7.0 allocates space so that you can accurately size the database
and estimate the amount of work SQL Server will do to solve a query. The
basic unit of storage is called a page. Each page is 8K in length.
Since a row can't span multiple pages, the maximum row size that can be
created is 8,060 bytes (actually a little less), not including text
and image data. The remaining amount of space is used for overhead.
For efficiency, SQL Server will never allocate just a single page. It allocates
space in groups of eight pages, called an extent. SQL Server 7.0
will allow small database objects to share a single extent, and when one
of the objects grows to the point where it takes up eight pages, it will
be allocated its own extent.
Indexes
SQL Server uses indexes for many of the
same reasons that VFP does. An index can be used to speed up data retrieval
and joins, enforce uniqueness, and alleviate sorts. But also, like VFP,
indexes have to be dynamically maintained, and maintaining indexes can be
a very expensive process. So a balance has to be achieved. You need to have
the necessary indexes to allow the query optimizer to find the best way
to solve SELECT queries without creating such a load that INSERT, UPDATE,
and DELETE queries become expensive (performance-intensive).
SQL Server has two types of indexes: clustered and nonclustered. The difference
between the two is subtle but important. A clustered index is implemented
as a b-tree or balanced tree (see Figure
1). But the leaf-level pages of the clustered
index are the actual data pages of the table. This means that the table
is physically sorted in order of the index. Also, each table can only have
one clustered index.
Nonclustered indexes are also implemented as b-trees, but the leaf level
contains pointers to the appropriate date pages in the table (see Figure
2). If the table doesn't have a clustered index,
the leaf-level pointers will contain a row id. If the table does have a
clustered index, the leaf-level pointers will contain a copy of the clustered
index key. VFP indexes are very similar SQL Server's nonclustered indexes.
Since the clustered index physically sorts the data, it's very good for
solving range queries. A range query is a query that returns a large
amount of contiguous data. Queries that rely on the BETWEEN clause are usually
considered range queries. Nonclustered indexes are useful for solving Point
queries. Point queries are queries that return a small number of
rows, including a single row.
Index statistics
SQL Server uses a cost-based query
optimizer to produce the best plan for solving a query. SQL Server stores
some statistical information about each index within the index. The query
optimizer uses this information to estimate the amount of work that the
execution engine would incur if the index were used. It's very important
that the index statistics be as up-to-date as possible. SQL Server will
automatically update the index statistics periodically, depending on the
amount of data in the table and its volatility. The DBA can manually update
the statistics at any time by issuing the Transact SQL UPDATE STATISTICS
command.
Locks
It's a basic property of all database management
systems that two process aren't allowed to make changes to the same piece
of data at the same time. SQL Server and VFP both use the locking of resources
to enforce this property. With SQL Server, locks are usually described in
terms of mode and resource.
VFP and SQL Server are both capable of locking different type of resources.
In VFP, you normally discuss locking at the Row, Table, or Database level.
In addition to Row, Table, and Database locks, SQL Server 7.0 also supports
locking at the Page, Key, and Extent level. Page locks are similar to Row
locks but affect all the rows on a database page. An Extent lock is use
by SQL Server when it's allocating space for a database object. And Key
locks are used for maintaining indexes.
VFP determines the granularity of a lock based on the command and its scope.
For instance, if I issue this command:
REPLACE nTotal WITH lnAmount IN order
|
VFP will only lock the affected row. On the other hand, if I issue the following
command, VFP will lock the entire table:
REPLACE nPrice WITH nPrice * 1.1 IN product
|
Microsoft Knowledge Base article Q92345, "Using File Locking and Record
Locking in FoxPro," summarizes the granularity of locking applied by
VFP based on the command and the scope.
The SQL Server query optimizer also determines the granularity to apply
locks. It will try to lock at the smaller-sized resources, like rows, to
improve concurrency, but more locks incur a higher overhead. During the
execution of the query, SQL Server might decide to escalate the locking
to a larger resource if it determines that it's applying an excessive amount
of locks. Therefore, SQL Server might escalate many Page locks to a single
Table lock. The server dynamically determines the boundaries that SQL Server
uses to determine when lock escalation takes place.
SQL Server basically has four locking modes: Shared, Exclusive,
Update, and Intent. Shared locks are used for operations that
don't update data, such as SELECT. As SQL Server is reading the data, it
will acquire a Shared lock, read the data, and then remove the lock. Applying
the Shared lock prevents another process from changing the data at the same
time that it's being read. Exclusive locks are used for operations that
modify data, such as INSERT, UPDATE, and DELETE. Applying the Exclusive
lock to the data prevents another process from modifying or reading the
data while the modification occurs. By default, Exclusive locks are held
until the transaction commits.
Update locks are used by data-modifying operations while the process is
looking for the data to modify. They're used to prevent a form of deadlock
that could occur if two processes were trying to modify data on the same
resource. If both processes applied Shared locks and then waited for other
Shared locks to be removed, a deadlock would occur. Instead, SQL Server
uses an Update lock while searching for the data to modify. In this way,
only one Update lock can be applied to the data.
Intent locks are more of a flag than a real lock. If a process acquires
a lock at the row or page level, SQL Server will place an Intent lock on
the table. This prevents other processes from acquiring a Table lock. Without
the Intent lock, if a process wanted to acquire a Table lock, it would have
to scan the lock chain to determine if there was a lock on any of the resources
lower in the hierarchy.
Transaction log
VFP 3.0 was the first version of FoxPro
to support Transactions. A Transaction is used to treat operations that
affect multiple tables as a single unit. Either all the operations succeed,
or none of them do. VFP implements Transactions using a transaction buffer.
Changes made to the tables are put in the buffer. The changes in the buffer
are flushed to disk only after an END TRANSACTION has been issued. If the
application were to abruptly terminate, the changes wouldn't make it out
of the buffer to corrupt the data.
SQL Server 7.0 implements Transactions very differently. Instead of storing
the changes in a memory resident buffer, SQL Server uses a special database
file called the transaction log. When a data modification occurs, SQL Server
goes through the following steps:
1. If the required data pages aren't already present in the data cache,
they're read in.
2. The changes are made to the pages in the data cache.
3. A record of the changes is written to the transaction log, and the log
is flushed to disk. The dirty data pages aren't immediately flushed to the
disk.
4. SQL Server has a special process, called the Checkpoint Process, whose
job is to periodically flush dirty cache pages to disk.
As you can see, there's a delay between when the changes are made to the
data and when the page is written to disk. If the server were to fail, those
changes that hadn't been flushed would be lost. But a record of the changes
is in the transaction log. This is why a SQL Server Transaction is sometimes
described as a unit of work and a unit of recovery. When the server restarted,
it would use the information in the transaction log to restore the database
to a stable state.
The end?
Hopefully you now feel that making the
jump from VFP to SQL Server isn't the nightmare that you thought it might
be. While there are differences, both products are database management systems
and both products have similarities. It's these similarities that provide
the path for us to follow to SQL Server 7.0.
Sidebar: Installing SQL Server
7.0
I've always felt that SQL Server was one
of the easiest back office products to install, and SQL Server 7.0 continues
the legacy. I'd like to take a moment to discuss some of the installation
options and their impact.
Bad news for you Netscape fans. One of SQL Server 7.0's prerequisites is
the installation of Internet Explorer 4.01. If you plan on installing on
Windows NT, you'll also have to install Service Pack 3 and a Mini-Service
Pack.
The next big decision that you'll have to make is which variation of the
server to install: Desktop, Standard, or Enterprise. The README file that
ships on the CD goes into detail about the differences, but the big items
are:
- Enterprise only installs on NT Enterprise. Standard
will install only on NT Server. Desktop will install on all 32-bit Windows
platforms (yes -- even Windows 95!).
- SQL Server 7.0 Enterprise can access more than
2GB of RAM.
- Desktop supports full Merge replication but can
only participate as a subscriber in Transactional replication.
- Desktop's maximum database size is 4GB.
Two of the biggest decisions that you'll have to make are which Character
Set and Sort Order to install. Once the server has been installed with these
selections, they can't be changed without rebuilding the master database
and reloading all the user databases. You'll also have to specify the Unicode
Collation. It's recommended that your selection of the Unicode Collation
match your selection for Character Set and Sort Order. If you don't do this,
it's possible that Unicode and non-Unicode data might sort differently,
thoroughly confusing the users. The setup program will default the Unicode
Collation to the appropriate option.
It's important that you get the correct network libraries installed if you
want to be able to communicate with SQL Server over a network. The network
libraries determine which network protocols SQL Server will listen to for
requests (remember, it's a service!). Setup will default to installing Named
Pipes, Multi-Protocol, and TCP/IP. After SQL Server has been installed,
you may add and remove network libraries at any time by running the SQL
Server Network Utility.
The last configuration option is the logon account that the SQL Server services
will use. You basically have two choices: the Local System account or a
Domain account. If you select the Local System account, your SQL Server
won't be able to access any network resources. If you want to use a Domain
account, the account must have the following properties:
- Member of the local administrators group
- All logon hours
- Password never expires
- Log on as a service right
The Log on as a service right has to be granted after the account is created.
While within User Manager for Domains, select Policies | User Rights to
display the User Rights Policy dialog box. You'll have to select the Show
Advanced User Rights check box before Log on as a service will show up in
the right-hand dropdown.
After that and a reboot, your installation of SQL Server 7.0 will be completed.
Michael Levy is a consultant with
ISResearch, Inc., a Microsoft Solution Provider and Certified Technical
Education Center. He is also a Microsoft Certified Solution Developer and
Trainer. Michael specializes in using Visual Studio and SQL Server to solve
business problems. mlevy@isresearch.com.