January 15, 1997
Kevin Collins
Microsoft Jet Program Management
Microsoft® Jet 3.5 is the database engine that is used in Microsoft Office 97, which includes Microsoft Access 97, and in Microsoft Visual Basic® version 5.0. Any application that has access to the version of Visual Basic for Applications that shipped with Office 97 will have access to Jet data. There is no format change in Jet 3.5; thus, it is possible to access a database that was created with Jet 3.0 without converting it. (Note: Microsoft Access 97 does require a conversion process, primarily due to a new method of storing Visual Basic for Applications.) Even though the database format has not changed, an application cannot access Jet 3.5 using DAO 3.0. Thus, if you are using Microsoft Visual Basic version 4.0, you will not be able to reference the DAO 3.5 type library to gain access to Jet 3.5. Jet 3.5 also has new file names for the three primary DLLs that it uses (MSJET35.DLL, MSJINT35.DLL, and MSJTER35.DLL).
The primary goal of this paper is to disseminate information regarding features of Jet 3.5 that you can use to improve and optimize performance. Since many of the features are primarily exposed through DAO, this paper will also illustrate some of the new properties and methods included in DAO 3.5. This paper will go over each feature and, where applicable, include sample code to illustrate the features along with performance numbers generated from the Jet performance lab. In addition, tips and techniques discovered during our performance analysis will be shown along with other pertinent information discovered from customers in the field. Below is a high level list of the major enhancements that were put into Jet/DAO 3.5 to improve and fine tune performance.
To improve multi-user performance over that of Jet 3.0, the Jet team took a hard look at where concurrency bottlenecks were occurring. This analysis leads to a modification of locking algorithms in two areas: write locks on index pages and read locks on long value (LV) data pages.
When a Jet 3.0 database used enforced referential integrity, Jet would always place a read lock on any index page that was being read. This was necessary to prevent other users from changing data in an upper-level index btree that might be necessary to enforce referential integrity. However, this did not come without cost as write locks on a series of index pages would prevent other users from even seeking on that index.
This was discovered too late in the Jet 3.0 time frame for the Jet team to make an architectural change to reduce the concurrency hit. This was resolved in Jet 3.5 by replacing the write locks with read locks on index pages that were not being modified. This allows Jet to continue to enforce referential integrity while still allowing other users to read the index pages.
The other locking enhancement in Jet 3.5 concerns placing read locks on LV pages. LV pages typically contain columns with the Memo and OLE data types used in Microsoft Access. In Jet 3.0 a read lock would be placed on every LV page that was being read. The primary reason for using read locks was to ensure that the user would see unchanged data when reading large LV pages. However, while we examined more customer databases, we discovered that many users were using a Memo field with only a minimal amount of data being stored. Since LV data that did not span more than one page did not meet the original criteria for placing read locks, the Jet team devised a way to eliminate read locks in those scenarios. The outcome of this was to remove read locks on LV pages that did not span more than one page. This substantially reduced concurrency conflicts and substantially increased performance.
Figure 1 shows the multi-user performance throughput improvements in Jet 3.5:
Since Jet 3.0, a large effort has been made to measure performance with Jet. The Jet 3.5 team expanded on this effort by increasing test suites to over 1,500 performance benchmarks. The chart above and graphs that will follow are results from some of those tests.
The lab was also upgraded during the 3.5 cycle to reflect the operating system and machine hardware that a high-end customer might use. Of course, testing was still done with memory restrictions as low as 5 MB of RAM to represent users with low-end hardware. The majority of the multi-user tests were conducted on 36 machines. 27 of them were identically configured Pentium 60 MHz machines with 32 MB of RAM while the remaining nine were Pentium 120 machines with 64 MB of RAM. All machines had a 540-MB IDE hard disk drive and many had a second 1.2- or 2.5-GB EIDE hard disk drive with a PCI network interface card (NIC). The tests were run using only DAO/SQL commands on Windows NT® Server/Workstation 3.51 SP5, Windows NT Server/Workstation 4.0, Windows® 95 SP1 or OSR2. When run in a network environment, the network operating system was NetWare 4.11 on a Dell XPE PowerEdge P120 with four, 1-GB RAID drives running off a dedicated EISA RAID SCSI host adapter, two four-port PCI full-duplexed Ethernet NICs and 64 MB of RAM.
Jet 3.5 was improved to use Rushmore technology on more operands in predicates. In particular, the FALSE and <> operands now provide substantial performance improvements.
Even with all the work in Jet 3.0 to eliminate transactions in order to obtain better performance, SQL DML statements were still placed in an implicit transaction. This negated some of the performance work and typically resulted in DAO looping code to substantially outperform SQL DML statements. Jet 3.5 has removed that restriction and SQL DML statements now are no longer placed in an implicit transaction. This results in a substantial performance benefit when running SQL DML statements that affect many rows of data.
While this change provides a substantial performance increase, it also introduces a change to the behavior of SQL DML statements. Previously, a SQL DML statement would roll back if any part of the SQL DML statement could not be completed. This was due to the fact that Jet placed it in an implicit transaction. It is now possible to have some of the rows committed by a SQL DML statement while others are not. An example of this would be when Jet’s cache is exceeded. The data in the cache will be flushed to disk and the next set of rows will be modified and placed in the cache. Therefore, if the connection were terminated, there would be a possibility that some of the rows were saved to disk and others were not. This behavior is the exact behavior that Jet 3.0 users would see when using DAO looping constructs to update data without an explicit transaction. If this behavior is not desired, then the user will need to add explicit transactions around the SQL DML statement to define a set of work and sacrifice the performance gains.
Jet 3.5 has been enhanced to not reseek an index page if the index is a primary or unique index. This reduces the amount of I/O that occurs since Jet does not need as many reads.
Jet 3.0 would always refresh its cache on a page when a read lock was promoted to a write lock. In Jet 3.5 it is no longer necessary to refresh the cache when this operation occurs, thus resulting in reduced I/O.
While working with the Microsoft Access team, we discovered that their implementation of Jet through the user interface required frequent checking for security on objects. Jet 3.5 now keeps the MSysObjects and MSysACEs tables open for the duration of a Jet instance in order to improve performance through the Microsoft Access user interface and other areas where security is implemented and objects are being referenced through DAO. This reduces the overhead of constantly opening and closing those tables when security is implemented and reduces I/O.
When adding data to a table, Jet 3.0 would always flush its cache every two seconds after each extent (up to 16K) was added. By examining the performance characteristics when adding one million rows of data, Jet 3.5 was modified to only flush its cache two seconds after all new extent data was added to the database. This results in less I/O.
Jet 3.0 would allocate up to 8 pages (16K) at a time when adding data to a table. This was done to allow data to be contiguous within a table, thus optimizing sequential reads and Jet’s read-ahead cache. It was determined during performance testing that increasing this maximum allocation resulted in better performance. Jet 3.5 has been modified to add up to 32 pages (64K) at a time when adding data. Like Jet 3.0, Jet 3.5 is smart about adding large extents to tables and will only do so when large numbers of rows exist in a table.
One of the big performance improvements in Jet 3.0 was the reduced time it took to remove a table or delete all the rows in a table. While Jet 3.0 offered substantial performance improvements, there was still room for improvement and Jet 3.5 took advantage of that by optimizing how it removed pages in the database by reading and processing less directory pages.
During the course of performance tests, it was determined that users who used temporary queries (for example, DB.Execute SQL) or constantly modified stored queries in code were experiencing slow performance. Jet 3.5 addressed this issue by optimizing the query plan generation process.
With Jet 3.0 it was difficult for users to manipulate registry values to control Jet because the key names, types, and values were not created in the registry. Jet 3.5 remedies that by explicitly loading the key names and default values when Jet 3.5 is registered. Explicit registry names and values are also loaded for the Jet 2.x installable ISAM. Below is a listing of the default registry entries for Jet 3.5. New registry entries will be discussed later in this paper.
When using the default value of zero for the MaxBufferSize registry setting in Jet 3.0, Jet would calculate how much RAM to dynamically allocate for its cache. This formula was ((Total in MB – 12) / 4 + 512K)
. However, it was discovered that machines with large amounts of RAM (that is, over 64 MB) were becoming more common and causing Jet to start using a large amount of RAM for the cache. This could potentially cause a lot of swapping to occur due to the use of virtual memory, especially when Jet was being used with Microsoft Internet Information Server (IIS). Jet 3.5 addressed this issue by putting a cap of 13,824K on the result of the formula. The user can override the formula and the cap by putting a value other than zero for the MaxBufferSize setting.
From talking with customers and examining the results of different registry settings while running performance tests, we realized that there was only so much self-tuning that Jet could do to optimize all scenarios. The only way to fine-tune Jet was by exposing programmatic run-time controls via DAO.
Probably the most exciting enhancement to Jet 3.5 is the ability to control the majority of Jet’s registry value settings during run time. This provides the developer with ultimate control over performance and functionality for Jet 3.5. These new features are exposed through DAO 3.5 with the new SetOption method of the DBEngine object. Below is a list of the Jet registry values and their corresponding DAO constants that can be manipulated during run time.
It is important to note that using the SetOption method only affects the run-time values of the registry and does not physically change the values in the registry. Thus, once Jet is restarted, it will read the values in the registry. This means that in order to control Jet’s registry setting the developer must use the SetOption method in code that executes every time an application starts. Below is a code sample that illustrates how a developer might use the SetOption method to optimize code to take advantage of Jet’s buffer setting:
Sub Main()
On Error GoTo ErrorHandler
Dim db As Database, rs As Recordset, ws As Workspace
Dim strCompanyName As String, _
strContactName As String, _
lngReads As Long, lngWrites As Long
Set db = _
OpenDatabase("c:\northwind.mdb", False, False)
DBEngine.SetOption dbMaxBufferSize, 128
Set rs = db.OpenRecordset _
("SELECT * FROM Customers", dbOpenDynaset)
Set ws = Workspaces(0)
lngReads = DBEngine.ISAMStats(0, True)
lngWrites = DBEngine.ISAMStats(1, True)
While Not rs.EOF
rs.Edit
strCompanyName = rs!CompanyName
strContactName = rs!ContactName
rs!CompanyName = strCompanyName
rs!ContactName = strContactName
rs.Update
rs.MoveNext
Wend
' The null transaction ensures no more asynchronous
' activity that could yield inaccurate statistics.
ws.BeginTrans
ws.CommitTrans
lngReads = DBEngine.ISAMStats(0)
lngWrites = DBEngine.ISAMStats(1)
MsgBox "Total reads " & CStr(lngReads) & _
" Total writes " & CStr(lngWrites)
Exit Sub
ErrorHandler:
MsgBox "An error has occurred " & Err & " " & Error
Resume Next
End Sub
Note If the dbMaxBufferSize value above is modified from 128 to 2048, the number of writes will decrease from 20 to 11. This is an extreme example to illustrate the point while using the Northwind database. Using these numbers for a real world scenario is not recommended.
While programmatically flushing Jet’s write cache isn’t necessarily a performance enhancement, it allows developers to guarantee that their data has been written to disk after they issue the DAO CommitTrans method. While Jet does tell the operating system to write its data to disk after a CommitTrans, the operating system has a lazy-write cache that does not necessarily write data to disk, but does notify Jet that the data has been written to its cache. While Jet could tell the operating system to bypass its lazy-write cache, doing so would cause significant performance degradation. Therefore, to give developers the best default performance, but allow for increased functionality, Jet exposed an interface to bypass Windows 95’s and Windows NT Server/Workstation’s lazy-write cache. (Note: This feature works only for Windows 95 and Windows NT Server/Workstation. If the .mdb file is used with any other operating system, the feature will not work.) You can programmatically flush the cache by using an additional property on the CommitTrans method called dbForceOSFlush (for example, ws.CommitTrans dbForceOSFlush
).
This property should be used only when it is critical to know that all the data in a transaction has been written to disk before proceeding with the next command. Below is a chart demonstrating the performance impact when utilizing this property.
A feature of Jet 3.0 was the ability to have performance parity between a database being opened as shared and a database being opened exclusively. This was largely accomplished by improved buffer refreshing. However, to implement that feature, it was necessary to modify the behavior of PageTimeout to check the database header page (DBH). The default for checking the DBH was determined by the PageTimeOut setting in the registry and by default would retry it every five seconds. While this proved optimal for performance, the drawback was that a user could go for almost ten seconds before seeing changes made by other users. The workaround for this behavior was to set the PageTimeOut registry setting to a lower value. However, this resulted in a performance hit as Jet was forced to read the DBH more frequently, resulting in increased I/O.
Jet 3.5 addressed this problem by exposing an interface to force the cache to be refreshed regardless of the PageTimeOut setting. DAO exposed this interface by adding a new argument, dbRefreshCache, to the Idle method of the DBEngine object. The DBEngine.Idle dbRefreshCache
statement forces Jet to immediately read the DBH to see if any changes have occurred. If they have, Jet’s cache will be refreshed and the user that issued the command will see any changes made by other users. This change allows for the PageTimeOut setting to be left at its default, thus providing optimal performance, and giving the developer control over when to check for other users’ changes. However, just as may happen when PageTimeOut is set to a low value, using DBEngine.Idle dbRefreshCache
inappropriately may cause performance problems. Below is a chart illustrating the increased I/O that is caused by repeatedly calling dbRefreshCache.
Depending on the network operating system, Jet can cause short bursts of network traffic when attempting to retry for a lock. In order to prevent these short bursts of network traffic that could cause performance problems for the network administrator, Jet 3.5 introduced a new registry setting called LockDelay. This registry setting works in conjunction with the LockRetry registry setting and places a default delay of 100 milliseconds between every lock retry. Besides eliminating the short burst of lock retries that could occur on certain network operating systems, this feature provides a more consistent feel to users when they encounter locking conflicts.
All versions of Jet place locks on the .ldb file while modifying data. Many locks can accumulate when data is being manipulated inside a transaction. The accumulation of locks caused particular problems with NetWare servers because they could handle only up to 10,000 locks per connection. This limit ensured good performance on a NetWare server, but caused Jet not to complete large transactions. When users would encounter this scenario, they would experience long delays before they would even get an error message stating that the transaction would need to be rolled back. The only workaround was to break transactions that affected many rows into mini transactions and replace SQL DML statements (prior to version 3.5) with the equivalent DAO looping scenarios that would use mini transactions that affected fewer rows.
Jet 3.5 addresses this problem by introducing a new registry setting called MaxLocksPerFile. With Jet 3.5, an explicit transaction will automatically partially commit when more than the default setting of 9500 locks have accumulated. While this setting is primarily for NetWare users, it is also useful to Windows NT Server/Workstation 3.5x users because performance will start to degrade when Windows NT Server/Workstation has to manage that many locks. Windows NT Server/Workstation 4.0 addresses many of those performance issues, but still runs optimally with the MaxLocksPerFile registry setting as its default.
Many applications would experience database bloat when manipulating LV data types. This behavior was caused by performance enhancements with LV data in Jet 3.0. A particular area of database bloat would occur when developers would manipulate form, report, or Visual Basic for Application modules in Microsoft Access. There were only two workarounds to recover the empty space from modified LV data types. The first workaround required a user to be the last user to close the database. Once the database was reopened, the empty LV space would be recycled, but the size of the .mdb file would not shrink. The other solution was to compact the database, which would reclaim the LV data pages and shrink the database’s physical size.
Jet 3.5 resolved this issue by modifying the way that it recycles pages. Jet 3.5 can now recycle LV pages once a new allocation of LV pages has been created. The size of the database will still grow initially, but it will remain stable with only small increases of size over time. However, the Jet performance team discovered that this feature caused a performance hit when manipulating LV data. To circumvent the performance issue, a new registry entry called RecycleLVs was added. By default, RecycleLVs is turned off to ensure performance parity with Jet 3.0. However, Microsoft Access will dynamically turn this feature on immediately before manipulating any of its objects that utilize LV data and will turn it off when those objects are no longer being manipulated. This provides the user with optimal performance while minimizing database bloat.
DAO users who are manipulating LV data should leave the RecycleLVs registry setting turned off and turn it on only when necessary by utilizing the SetOption method of the DBEngine object discussed above.
The biggest performance change for Jet 3.5 was a modification to how Jet uses its internal cache for manipulating data outside transactions. One of the biggest performance improvements in Jet 3.0 was the introduction of asynchronous writes. However, due to potential concurrency issues, this feature could not be fully exploited.
In Jet 3.0, asynchronous writes were controlled by two registry entries: MaxBufferSize and SharedAsyncDelay/ExclusiveAsyncDelay. The SharedAsyncDelay setting is used for databases opened in shared mode; likewise, ExclusiveAsyncDelay is used for databases opened in exclusive mode. The combination of the two settings determined how long modified data pages could be held in Jet’s cache before being flushed to disk. The most restrictive of the two registry entries was SharedAsyncDelay. The default SharedAsyncDelay value was 50 milliseconds. This value prevents concurrency issues that could arise with Microsoft Visual Basic or Microsoft Access users modifying data through forms, because any modified data that is held in Jet’s cache also has a corresponding write lock. Therefore it was necessary to keep the SharedAsyncDelay setting low as to prevent pages from sitting in Jet’s cache and holding write locks. This scenario would cause a significant concurrency issue if a higher SharedAsyncDelay value were used as the pages in Jet’s cache would continue to hold write locks. However, leaving the value low also directly affected performance on operations that manipulated many rows of data at once (such as looping constructs in DAO or SQL DML statements) due to the fact that Jet was not utilizing its cache as it flushed its cache every 50 milliseconds.
This situation made it very difficult for a developer to take advantage of the asynchronous writing techniques that were present in Jet 3.0 without affecting concurrency. The only way to take advantage of these techniques was to have one machine with a high SharedAsyncDelay setting designated as the machine to only run operations that affected many rows.
Jet 3.5 introduces a new method of determining when to flush the cache for asynchronous writes that eliminates this problem. This was accomplished by adding a new registry setting called FlushTransactionTimeout that would ignore the AsyncDelay registry settings. While Jet 3.0 would use the SharedAsyncDelay setting to determine a maximum amount of time to elapse before flushing the cache to disk, Jet 3.5 uses the FlushTransactionTimeout setting to determine an amount of time of inactivity before flushing the cache to disk.
This eliminates the problem with users modifying data in forms because 500 milliseconds, the default setting for FlushTransactionTimeout, would expire before the user could modify and save the next row. Therefore, the maximum amount of time that a lock would be held on the data and index pages associated with the row being edited would be 500 milliseconds. This eliminated any concurrency issues when using forms in Microsoft Visual Basic or Microsoft Access. On the performance side, users would see significant performance increases, as Jet would now be able to use the maximum amount of cache before flushing to disk. This is due to the fact that any operation that modifies many rows (for example, a DAO looping construct or a SQL DML statement) would be adding rows to Jet’s cache before the 500 milliseconds setting would expire. This behavior results in Jet fully utilizing its cache and substantially less I/O because Jet does fewer reads and writes. Below is an example of the performance differences with the new behavior.
As the chart illustrates, almost a 50% reduction in I/O was accomplished with this new setting, while not increasing concurrency with forms usage. While the Jet performance team did not encounter any reason to not use this feature, setting the FlushTransactionTimeout value to zero disables the feature. Disabling this feature causes Jet to use the AysncDelay settings in the same manner as Jet 3.0.
This section is intended to illustrate performance optimization tips and tricks that the Jet performance team discovered during our internal tests and while examining performance issues provided by customers.
An area that can cause performance degradation is the constant recompiling of stored queries. We have seen this occur when DAO objects are not explicitly closed. The reason for this is that stored queries reference a particular memory address in order to use the Visual Basic for Applications expression service. If a DAO object is opened followed by the execution of a stored query, re-executing that query may cause Jet to recompile because the original memory address space is occupied. In certain circumstances this can also cause the database to increase in size. Many of these issues were resolved in Jet 3.5, but some of the issues can be resolved only by explicitly closing DAO objects when they are no longer being used. The developer should not rely on Visual Basic for Applications to implicitly close objects when a Sub or Function procedure is exited.
If explicitly closing the DAO object is not an option due to nested explicit transactions or some other scenario, another possible workaround is to use the dbForwardOnly
syntax when opening a recordset. Using the dbOpenShapshot, dbForwardOnly
syntax prevents Jet from recompiling the query. We have found this to be most useful when using the data grid that comes with Microsoft Visual Basic version 4.0, because the developer has no control to explicitly close the DAO objects being used.
From a performance perspective, there are many reasons to frequently compact a database. One reason is that compacting will create a new database that stores all table rows in a contiguous order. If a primary key or unique index is defined, the rows will be sorted in order of the primary key or unique index. This allows Jet to take full advantage of its read-ahead cache and also reduces disk I/O when doing sequential scans of a table. Compacting also causes all the statistics in the database to be recalculated. Statistics can become out of date during the course of database operations, thus resulting in inaccurate query plans. Probably the most important performance reason for compacting the database is that the CompactDatabase command or CompactDatabase method switches a flag in all stored queries that causes them to recompile the next time they are executed. This is important because it ensures that the query plan retrieves the latest statistics and creates the best execution path to retrieve the data.
Compacting is also important from a stability standpoint because it removes all deleted pages, recopies all pages (thus ensuring integrity in the pages), and recreates all index pages.
A somewhat related issue to this concerns repairing a database. A bug was found in Jet 3.0 where issuing the RepairDatabase command (or the RepairDatabase method) before compacting the database could result in a database that could no longer be opened. This problem (due to a very rare bug that could allow duplicate indexes on the system tables) has been resolved in Jet 3.5 and a special release of Jet 3.0 is now available on http://www.microsoft.com/kb/articles/q151/1/86.htm. Note that the problem will never occur if the database is compacted before it is repaired. Previously it was recommended to repair the database before compacting it. This was primarily for Jet 2.x databases, because the RepairDatabase command and RepairDatabase method had additional functionality to recover truncated rows of data. This is no longer true for Jet 3.x file formats and it is recommended that users only repair a database if a Jet error message indicates that this is necessary.
Probably the biggest performance issue that we have seen from customer databases results from the use of expressions in queries. Having an expression in a query prevents Jet’s query optimizer from recognizing the column in the expression, thus not using the column for index or Rushmore optimization. This alone can cause a substantial performance hit. In addition, expressions are evaluated for every row, not just the rows that are returned. This can cause substantial overhead. One customer’s query went from two minutes to two seconds by removing the embedded expressions. The best place to put expressions is at the form or report level because then expressions are only evaluated for the rows returned.
While having indexes can typically decrease data-retrieval times, they always carry a cost in maintenance and concurrency issues. Below is a test showing the throughput differences when randomly updating one row of data in one table from six workstations.
By simply adding an index to the column that was being updated, overall throughput diminished over five times! The question then becomes: When should a column be indexed? There is no concrete answer for this, as it depends on the type of application. The first rule of thumb is that highly duplicated data types should not be indexed (for example, Boolean data types, and columns that represent gender, state abbreviations, or country codes). The second rule of thumb is to not add indexes to columns simply to force Rushmore to use more than one index. An example of this would be indexing a column called City and a column called ZipCode in a customer table when the application is always going to be using both columns for retrieval purposes. In this instance, ZipCode is going to be the most unique index and would return a faster result set if City was not indexed. This is because Rushmore need not use the index on City, thus reducing overall I/O. Of course, if both values were not always being entered and they were used alternatively and equally, then having an index on both columns would probably be advantageous. Rushmore is best utilized when combined indexes generate a unique result set.
It is also important to remember that indexes create concurrency issues, as one index page represents many data pages. Therefore, modifying an index page can cause users with data on an entirely different data page to be locked out when trying to update the indexed column. This is illustrated in the chart above.
To see this behavior, open the Northwind database in Microsoft Access 97 and turn pessimistic locking on. Update a value in one indexed field in the Customer table but don’t move to the next record. On another workstation, open the Customer table and try to edit another value in the same indexed field that the other workstation is editing. Next try updating a value in an non-indexed field in the Customer table. What will become evident is that substantially more records of data are locked when you try to update a value in an indexed field than when you try to update a value in a non-indexed field.
While we are not stating that developers should not index, we are saying that developers and database administrators should be aware of the pros and cons of indexing.
In the majority of cases (Jet 3.5 only), it is better to use a single SQL DML statement than a looping DAO construct. Below are two examples of code, with a chart following showing the time differences.
Set rs = gdbOrder.OpenRecordset("Customer")
While Not rs.EOF
rs.Edit
rs![AddressStatus] = rs![AddressStatus] + 1
rs![CTitle] = "President"
rs![CCompany] = "Olivetti"
rs![EntryDate] = "10:20AM"
rs![Country] = 12
rs.Update
rs.MoveNext
Wend
gdbOrder.Execute "UPDATE Customer SET AddressStatus = (AddressStatus+1), Ctitle = ‘President’, Ccompany = ‘Olivetti’, EntryDate = #10:20AM#, Country=12;", dbFailOnError
While this is one of the more extreme examples that we have seen and is only inherent in Jet 3.5 due to removal of implicit transactions for SQL DML statements, it demonstrates that the developer should examine the code for potential performance enhancements by re-coding DAO looping constructs with SQL DML statements.
Another performance issue raised by customers using Jet 3.0 was due to a change in how Jet dealt with the .ldb file. The .ldb file is used to track which users have the database open and to track the locking of pages in the .mdb file. In Jet 3.0, the .ldb file was deleted when the last user closed the database. This was done to prevent littering of .ldb files when replication was being used. However, in situations where only one user was accessing a linked table, particularly in a looping construct, a significant performance hit was experienced. This was because linked tables do not keep persistent connections to the database where they reside. This causes a lot of additional I/O to delete, recreate, and establish locks on the .ldb file.
The workaround for this is to have the application keep a persistent connection to any table where other linked tables reside in another database file. This prevents Jet from constantly deleting, creating, and obtaining lock information on the .ldb file. One customer’s scenario was improved from three minutes to thirty seconds by employing this technique.
Due to the changes in Jet 3.5 regarding the FlushTransactionTimeout registry setting, it is possible to see significant throughput decreases due to concurrency issues when OLTP is being implemented from many workstations. The reason for this is that Jet 3.5 will cache modified pages until no activity is encountered or the cache is full. Thus the fast throughput of OLTP statements will cause Jet to not flush its cache and release the locks on index and data pages. This may cause many pages to be locked during this time period, which will cause concurrency issues with many users trying to update one row at a time on the same table. The workaround is to use explicit transactions for these scenarios to force Jet to flush its cache when the transaction is committed. Below is an extreme example of throughput degradation when not using an explicit transaction for OLTP type work to quickly update one random row at a time from six workstations.
While this illustrates why you should use explicit transactions for this type of operation, it also illustrates, again, the cost of updating an indexed column in regards to how many rows of indexed columns are locked.
Parameterized queries can only be implemented by using a stored query. Since stored queries have a precompiled query plan, parameterized queries that contain parameters on indexed columns may not execute efficiently. Since the query engine does not know the values to be passed in a parameter in advance, it can only guess as to the most efficient query plan. Based on customer performance scenarios that we have examined, we have discovered that in some instances substantial performance gains can be achieved by replacing a stored parameterized query with a temporary query. This means creating the SQL string in code and passing it to the DAO OpenRecordset or Execute methods of the Database object.
The information below has been available, but not thoroughly documented, since Jet 3.0. Microsoft will not support this information and the sole intention of documenting it here is to have a valid measurement to experiment with the ideas in this white paper. Much of the information is excerpted from the Microsoft Jet Database Engine Programmer’s Guide (available from Microsoft Press). These features were implemented primarily for use by the Jet performance team as a way to better measure and improve Jet performance, thus the availability or similar functionality of these features in future releases of Jet is not guaranteed.
The DAO object model exposes a function that allows the developer to get information about the raw disk reads, writes, locks, and caching. The graphs in this white paper used these functions. The following syntax is used for the ISAMStats function:
ISAMStats(StatNum as Long[, Reset as Boolean]) as Long
This method returns the value of a given engine statistic as defined by StatNum, which is defined in the following table. If the optional Reset argument is supplied, then the statistic defined by StatNum is reset and no value is returned. A Reset argument value of False is equivalent to not supplying the argument. The statistics returned apply to the whole engine, regardless of how many databases or sessions are active, including temporary databases.
StatNum | Description |
0 | Number of disk reads. |
1 | Number of disk writes. |
2 | Number of reads from cache. |
3 | Number of reads from read-ahead cache. |
4 | Number of locks placed. |
5 | Number of release lock calls. |
The following code illustrates a sample use of the ISAMStats function:
Sub Main()
Dim dbs As Database, ws As Workspace
Dim strSQL As String
Dim lngDiskRead As Long, lngDiskWrite As Long, _
lngCacheRead As Long, _
lngCacheReadAheadCache As Long, _
lngLocksPlaced As Long, _
lngLocksReleased As Long
' Explicitly set the counters to zero.
lngDiskRead = DBEngine.ISAMStats(0, True)
lngDiskWrite = DBEngine.ISAMStats(1, True)
lngCacheRead = DBEngine.ISAMStats(2, True)
lngCacheReadAheadCache = DBEngine.ISAMStats(3, True)
lngLocksPlaced = DBEngine.ISAMStats(4, True)
lngLocksReleased = DBEngine.ISAMStats(5, True)
Set dbs = OpenDatabase("northwind.mdb", False, False)
Set ws = Workspaces(0)
strSQL = _
"UPDATE Customers SET ContactName = ContactName"
dbs.Execute strSQL, dbFailOnError
' The null transaction ensures no more asynchronous
' activity that could yield inaccurate statistics.
ws.BeginTrans
ws.CommitTrans
' The following ISAMStats calls will retrieve the latest
' values. The values will accumulate until they
' are reset.
lngDiskRead = DBEngine.ISAMStats(0)
lngDiskWrite = DBEngine.ISAMStats(1)
lngCacheRead = DBEngine.ISAMStats(2)
lngCacheReadAheadCache = DBEngine.ISAMStats(3)
lngLocksPlaced = DBEngine.ISAMStats(4)
lngLocksReleased = DBEngine.ISAMStats(5)
Debug.Print "Disk reads " & lngDiskRead
Debug.Print "Disk writes " & lngDiskWrite
Debug.Print "Cache reads " & lngCacheRead
Debug.Print "Cache reads from RA cache " & lngCacheReadAheadCache
Debug.Print "Locks placed " & lngLocksPlaced
Debug.Print "Locks released " & lngLocksReleased
End Sub
The number of disk reads and writes include all reads and writes in all circumstances, including background read-ahead in separate threads. One read or write doesn’t necessarily equal one page: one read or write could represent many pages that were read or written simultaneously. An example of this is commands wrapped in a transaction. This is why it is important to issue a null transaction to ensure accurate statistics. A null transaction is defined as issuing the CommitTrans and BeginTrans methods with no commands in between the two statements.
There are two types of cached reads returned. The CacheRead statistic reflects pages that are read from previously modified pages that still remain in the cache and have not been modified in the physical database by other users. The read-ahead cache statistic shows reads that occurred when Jet anticipates that a sequential read activity is about to occur. This is done to reduce reads to disk.
The number of locks placed and released may not be balanced. A single call to release a lock may result in the release of many locks.
Using these statistics in conjunction with the SetOption method can allow the developer to instantly see results that may produce timing differences on a LAN that may not be apparent from running on a local machine. This is why many of the graphs above show I/O instead of timings as it more accurately represents what is occurring.
The Microsoft Jet query engine implements a cost-based query optimizer. When a query is compiled, the query engine creates a query plan. This plan is used internally to find the quickest way to execute a query. Using the ShowPlan key in the registry will cause Jet to create a text file containing the query execution plans.
The ShowPlan function was available in Jet 3.0 by adding the following key to the registry:
WARE\MICROSOFT\JET\3.0\Engines\Debug
Under this key, add a string data type called JETSHOWPLAN (make sure to use all capital letters). To turn on ShowPlan, set the value to ON. To turn it off, set the value to OFF.
When ShowPlan is turned on, Jet appends text to a file called SHOWPLAN.OUT every time a query is compiled. You must modify or compact the database in order to have a stored query show its query plan. It is also important to note that SHOWPLAN.OUT appends new data for every new query plan. Leaving ShowPlan on could result in an extremely large SHOWPLAN.OUT file.
Jet 3.5 includes enhancements and bug fixes to ShowPlan. For example, you can now determine the inputs to the query. This is very useful in determining the uniqueness of an index; thus determining how useful that index is in retrieving the overall result set and what affect it may have on concurrency. Since Jet 3.5 utilizes a different registry key structure, the physical location of ShowPlan has changed. To activate ShowPlan for Jet 3.5 you must use this key location:
WARE\MICROSOFT\JET\3.5\Engines\Debug
Below is a sample output generated by running the Invoices query. This query comes with the Northwind database in Microsoft Access 97.
---------------------------------------------
DATE: 01/19/97
VER: 3.50.3428
NOTE: Currently does not handle subqueries, vt [virtual table] parameters, and subqueries.
NOTE: You may see ERROR messages in these cases.
--- Invoices ---
- Inputs to Query -
Table 'Orders'
Table 'Order Details'
Using index 'OrdersOrder Details'
Having Indexes:
OrdersOrder Details 2155 entries, 8 pages, 830 values
which has 1 column, fixed
OrderID 2155 entries, 8 pages, 830 values
which has 1 column, fixed
Table 'Customers'
Using index 'PrimaryKey'
Having Indexes:
PrimaryKey 91 entries, 1 page, 91 values
which has 1 column, fixed, unique, primary-key, no-nulls
PostalCode 91 entries, 1 page, 87 values
which has 1 column, fixed
CompanyName 91 entries, 3 pages, 91 values
which has 1 column, fixed
City 91 entries, 1 page, 69 values
which has 1 column, fixed
Table 'Employees'
Table 'Products'
Table 'Shippers'
- End inputs to Query -
01) Sort table 'Orders'
02) Inner Join table 'Shippers' to result of '01)'
using temporary index
join expression "Shippers.ShipperID=Orders.ShipVia"
03) Sort table 'Employees'
04) Inner Join result of '02)' to result of '03)'
using temporary index
join expression "Orders.EmployeeID=Employees.EmployeeID"
05) Inner Join result of '04)' to table 'Customers'
using index 'Customers!PrimaryKey'
join expression "Orders.CustomerID=Customers.CustomerID"
06) Inner Join result of '05)' to table 'Order Details'
using index 'Order Details!OrdersOrder Details'
join expression "Orders.OrderID=[Order Details].OrderID"
07) Sort table 'Products'
08) Inner Join result of '06)' to result of '07)'
using temporary index
join expression "[Order Details].ProductID=Products.ProductID"
--- temp query ---
- Inputs to Query -
Table 'Customers'
Using index 'CompanyName'
Having Indexes:
CompanyName 91 entries, 3 pages, 91 values
which has 1 column, fixed
City 91 entries, 1 page, 69 values
which has 1 column, fixed
- End inputs to Query -
01) Scan table 'Customers'
Using index 'CompanyName'
--- temp query ---
- Inputs to Query -
Table 'Products'
Using index 'ProductName'
Having Indexes:
ProductName 77 entries, 1 page, 77 values
which has 1 column, fixed
PrimaryKey 77 entries, 1 page, 77 values
which has 1 column, fixed, unique, clustered and/or counter, primary-key, no-nulls
CategoryID 77 entries, 1 page, 8 values
which has 1 column, fixed
CategoriesProducts 77 entries, 1 page, 8 values
which has 1 column, fixed
- End inputs to Query -
01) Scan table 'Products'
Using index 'ProductName'
The Jet team takes performance very seriously and has had a dedicated team for performance since Jet 3.0. However, even with our 1,500 plus performance tests, we cannot encounter all the scenarios that our customers put Jet through. If you believe that you have encountered a performance issue with Jet, please contact Microsoft support with a precise reproducible scenario. They will forward this information to the Jet performance team and we will examine it and try to come up with workarounds or attempt to correct the issue in the next release of Jet.
Another method of letting the Jet team know about features or performance requests specific to the Jet database engine is to send us e-mail at JetWish@Microsoft.com.
If this paper interests you, I would highly recommend purchasing the Microsoft Jet Database Engine Programmer’s Guide from Microsoft Press. While it does not address Jet 3.5 (a revised release is currently being planned), it does have pertinent information on understanding how Jet works and the best ways to use Jet.