INF: SQL Server and Caching Disk ControllersLast reviewed: April 28, 1997Article ID: Q86903 |
The information in this article applies to:
- Microsoft SQL Server versions 4.2x for OS/2
SUMMARYUse of a write-caching disk controller can seriously jeopardize the normally reliable SQL Server data integrity. Significant data corruption can result from a system failure when a write-caching controller is used. The normally reliable SQL Server recovery mechanism can be compromised by this type of controller. Recent advances in hardware design coupled with a need for high disk performance on server platforms make it increasingly likely that a SQL Server hardware platform utilizes a write-caching disk controller. It is advisable to determine whether a given SQL Server has a write-caching controller, or whether the disk drives themselves contain a write cache. You should check with the hardware vendor for this information. Explain to the vendor that your system is to be used as a database server, and the write-ahead log mechanism generally requires that writes not be cached. The only exception is if the hardware write caching mechanism on the server was designed with a database server in mind. It is technically possible for a hardware write cache to be safe for SQL Server, but only if certain criteria are met by the hardware write cache design. Essentially all possible conditions that could result in the discarding of dirty or updated pages in the write cache must be considered and protected against.
MORE INFORMATIONSQL Server data modification statements generate logical page writes. This stream of writes can be pictured as going two places: the log and the database itself. For performance reasons, SQL Server defers writes to the database via its own cache buffer system. Writes to the log are only momentarily deferred until COMMIT time. They are not cached in the same manner as data writes. Because log writes for a given page always precede the page's data writes, the log is sometimes referred to as a "write-ahead" log. Transactional integrity is one of the fundamental concepts of a relational database system. Transactions are considered to be atomic units of work that are either totally applied or totally rolled back. The SQL Server write-ahead transaction log is a vital component in implementing transactional integrity. Any relational database system must also deal with a concept closely related to transactional integrity, which is recovery from unplanned system failure. A variety of non-ideal, real-world effects may cause this failure. On many database management systems, system failure may result in a lengthy human-directed manual recovery process. In contrast, the SQL Server recovery mechanism is completely automatic and operates without human intervention. For example, SQL Server could be supporting a mission-critical production application, and experience a system failure due to a momentary power fluctuation. Upon restoration of power, the server hardware would restart, networking software would load and initialize, and SQL Server would restart. As SQL Server initializes, it will automatically run its recovery process based on data in the transaction log. This entire process occurs without human intervention. Whenever the client workstations restarted, users would find all of their data present, up to the last transaction they entered. SQL Server transactional integrity and automatic recovery constitute a very powerful time-and-labor saving capability. Unfortunately, use of a write- caching disk controller can compromise the ability of SQL Server to recover. Such a controller intercepts SQL Server transaction log writes, buffering them in a hardware cache on the controller board. This improves performance significantly, but if system failure occurs for any reason, the volatile data in the hardware cache may be lost, jeopardizing data integrity. Most caching controllers perform write caching. The write caching function cannot always be disabled. Even if the server uses an uninterruptible power supply (UPS), this does not guarantee the security of the cached writes. Many types of system failures can occur that a UPS does not address. For example, a memory parity error, an operating system trap, or a hardware glitch that causes a system reset can produce an uncontrolled system interruption. A memory failure in the hardware write cache can also result in the loss of vital log information. Another possible problem related to a write-caching controller may occur at system shutdown. It is not uncommon to "cycle" the operating system or reboot the system during configuration changes. Even if a careful operator follows the operating system recommendation to wait until all disk activity has ceased before rebooting the system, cached writes can still be present in the controller. When the CTRL+ALT+DEL key combination is pressed, or the RESET button is pressed, cached writes can be discarded, potentially damaging the database. It is possible to design a hardware write cache which takes into account all possible causes of discarding dirty cache data, which would thus be safe for use by a database server. Some of these design features would include intercepting the RST bus signal to avoid uncontrolled reset of the caching controller, on-board battery backup, and mirrored or ERC (error checking & correcting) memory. Check with your hardware vendor to ensure that the write cache includes these and any other features necessary to avoid data loss.
|
Additional query words:
© 1998 Microsoft Corporation. All rights reserved. Terms of Use. |