Insufficient Resource Space

If a Microsoft SQL Server configuration option is set too high for the amount of available resources, SQL Server fails to start. For example, if the max server memory setting is too high, other applications may take some time to start.

Reset configuration options to their default values as described in SQL Server Books Online, or start SQL Server with minimal configuration by using the -f startup option of the sqlservr application.

Determining When SQL Server Causes a Windows NT Blue Screen

Infrequently, Microsoft Windows NT may either halt with a STOP screen or the console may become completely frozen and unresponsive. This is commonly called a blue screen. This may sometimes happen on a computer where Microsoft SQL Server is running, or may coincide with a particular SQL Server operation such as the bcp utility, a long-running query, and so on.

The vast majority of time, this indicates an operating system, device driver, or hardware problem and should be pursued as such. The Windows NT user or kernel mode process isolation ensures that a user mode application problem does not cause the operating system to stop responding. This section discusses exceptions to this and ways to determine whether to troubleshoot the problem at the system or application layer.

Sometimes the cause of a computer failing to respond or blue screen may be a nonmaskable interrupt (NMI) error. This is sometimes visible as an error code stating NMI, parity check, or I/O parity check. NMI errors are almost always hardware. Usually they are caused by a memory failure; however, they can originate in other hardware subsystems such as video boards. Even if the NMI error happens only during certain SQL Server operations, and if the system passes initial hardware diagnostics, it should still be considered a hardware problem and pursued as such. It may be necessary to use a dedicated memory SIMM testing device, which can often find a transient memory error that eludes software-based diagnostics.

Processes exist on Windows NT in either user mode or kernel mode (sometimes called supervisor or privileged mode). In the Intel x86 architecture, user mode maps to ring 3 and kernel mode to ring 0 of the 4-ring protection system. The x86 architecture has been carried forward with little change in all Intel and compatible processors to date, including the Pentium Pro and Pentium II. Processors such as the Alpha AXP typically have unprivileged and privileged modes as well.

Kernel mode is a privileged processor mode in which a thread has access to system-wide memory (including that of all user-mode processes) and to hardware. By contrast, user mode is a nonprivileged processor mode in which a thread can only access system resources by calling system services.

A user mode process cannot access kernel mode memory, or access memory of another user mode process. This is enforced by processor hardware, in conjunction with kernel mode data structures such as Page Tables. For more information, see the 80386 Programmer's Reference Manual, the 80386 System Software Writer's Guide, or equivalent Alpha AXP documentation.

As a result of this protection system, a user mode application generally cannot stop responding, cause a blue screen, or otherwise cause a failure in the Windows NT operating system. Such problems should be pursued primarily at the system layer as an operating system, device driver, or hardware issue.

Although an application error cannot cause a failure in the operating system, an operating system error can cause an application to stop responding. This is because of the general rule: applications must call inward (to kernel mode), but the operating system can reference outward to user mode freely at any time. A microkernel-influenced architecture such as Windows NT may in turn dispatch certain work to a user-mode system process rather than perform the work in kernel mode. However, the overall principal remains the same: processor hardware enforces process-context isolation, which prevents one process from causing a failure in another, whether one or both are in user mode.

If a user mode application passes an invalid parameter in a Win32 API call, it is the operating system's responsibility to validate this parameter. In very rare cases, passing an invalid parameter may cause a Windows NT blue screen error. However, this is an operating system issue, and should be debugged and pursued as such.

There are a few narrow exceptions to the above guidelines. These exceptions can be easily and quickly eliminated.

Winlogon Problem Caused by SQL Extensible Performance Counters

Current Microsoft Windows NT architecture stipulates that any extensible performance counters added by a service will run in the process context of the Windows NT Winlogon process. Because Winlogon is a vital component of the operating system, a bug or resource leak in any performance counter DLL may disrupt Winlogon, and hence the operating system. The Microsoft SQL Server extensible performance counter DLL is called Sqlctr70.dll, and it exports several SQL Server-specific objects from SQL Server to the operating system. You can use Windows NT Performance Monitor to monitor these objects.

Although it is very rare for Sqlctr70.dll to cause a Winlogon problem, you can find problems more quickly when pursuing a Windows NT failure or blue screen problem on a computer running SQL Server by renaming this DLL. This eliminates the use of SQL Server performance counters; however, you can still use Windows NT Performance Monitor to monitor SQL Server by using regular Windows NT performance counters (such as threads, process, memory, and so on).

If renaming Sqlctr70.dll fixes the problem, and if this is confirmed by reinstating and removing the DLL several times, the problem should be pursued as a SQL Server issue. Otherwise, it should be pursued as a system layer issue.

Resource Leak

If a resource leak continues for a long period of time, the operating system should return the appropriate return code to the application, which should log this. For example, if you receive operating system error 8, "Not enough storage," the operating system should handle the situation by not granting further resource requests. However, a continued application resource leak may not be handled by the operating system under all conditions, resulting in a blue screen or operating system or application failure.

Almost all resource leaks will manifest themselves as a gradual increase in consumption of some resource, such as handles, virtual memory, private bytes, and so on. Therefore, the easiest way to rule in or out a resource leak is to run Windows NT Performance Monitor and log all objects to a file. When the problem occurs, examine the logged performance data for signs of a leak. Some good counters to examine are: handle count, page file bytes, pool paged bytes, pool nonpaged bytes, private bytes, thread count, virtual bytes, and working set for each process running on the computer.

It is not necessary to classify certain values as normal or abnormal. Focus on identifying leaks by the continuous nature of the increase, not by the absolute value at a given time. Remember it is normal for the Windows NT Performance Monitor private bytes counter for Microsoft SQL Server to start well below the configured min server memory setting value, and then increase with activity until it roughly approaches, but does not significantly exceed, that value.

If one of the logged Windows NT Performance Monitor counters continuously increases for the Sqlservr.exe process, and if reaching a certain value repeatedly coincides with a Microsoft Windows NT blue screen or operating system failure, it should be pursued temporarily as a Microsoft Windows operating system issue until the cause of the continuous SQL Server resource leak is understood. Otherwise, it should be pursued as a system layer problem.

CPU Monopolization

If a process spawns high priority threads that are continuously in a runnable state, this process can dominate the computer and prevent the operating system from running. A properly configured Microsoft SQL Server computer will not cause this problem. However, under some conditions, the operating system may appear to stop responding. For example, boosting the priority boost too high may drain resources from essential operating system and network functions, resulting in problems shutting down SQL Server or using other Microsoft Windows NT tasks on the server. In general, you should leave priority boost setting at the default.

When pursuing an operating system failure, verify that the SQL Server configuration settings mentioned above are at their default values. Then, if the operating system or application failure recurs, it should be pursued as a system layer problem.