Memory Problems
If your PC crashes randomly and inconsistently, you may have memory problems. Memory problems are not uncommon and are documented in the Knowledge Base.
This section presents general information on preventing memory problems, followed by a more detailed discussion of memory errors.
In general, you should first carefully clean the system of dust, including the ventilation areas, so that heat does not build up abnormally. Clean the contacts of all boards and SIMMs, and be certain that all boards are firmly seated in their slots or sockets. It may be necessary to replace old cabling, which degrades over time and under high temperatures. Power supplies can also cause many problems, so have the output voltages checked, if possible. Even monitors can cause strange behavior on your system. Computers should be plugged into a surge-suppression power strip, because the return of power after an outage usually arrives as a fairly high surge that can permanently damage sensitive electrical components of your system.
When 9-bit memory detects a parity difference, it signals the CPU through a Non-Maskable Interrupt (NMI). Depending on where and when this happens, Windows NT determines if this is an I/O board parity error, memory bus error, etc. Windows NT can also report I/O channel parity errors from cards in slots.
Since memory errors are very serious, the system shuts down.
Eight-bit memory does not do parity checking. When a system has single-bit memory errors that go undetected, which seems to happen only on 8-bit memory, it continues to run on corrupted memory.
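The difference between 9-bit and 8-bit memory can be illustrated with a short sketch. It is not how any particular memory controller is implemented: the function names `parity_bit` and `read_ok` are hypothetical, and even parity is assumed purely for illustration (real hardware may use odd parity). The ninth bit is computed from the eight data bits when a byte is written and compared again when it is read.

```python
def parity_bit(byte: int) -> int:
    """Return the even-parity bit for an 8-bit value.

    Assumption for illustration: even parity, so the bit is 1 when the
    data byte contains an odd number of 1 bits.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return bin(byte).count("1") % 2


def read_ok(byte: int, stored_parity: int) -> bool:
    """Return True if the byte read back still matches its stored parity bit."""
    return parity_bit(byte) == stored_parity


if __name__ == "__main__":
    original = 0b10110010
    stored = parity_bit(original)      # ninth bit, written alongside the data

    one_bit_error = original ^ 0b00000100   # a single cell changes state
    two_bit_error = original ^ 0b00000110   # two cells change state

    print(read_ok(original, stored))        # unchanged data passes the check
    print(read_ok(one_bit_error, stored))   # single-bit error is detected
    print(read_ok(two_bit_error, stored))   # double-bit error slips through
```

On 9-bit memory, the failed check in the second case is what would raise the NMI. On 8-bit memory there is no stored parity bit to compare against, so the corrupted byte is simply used.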
Microsoft has been using a high quality SIMM tester to study what may be causing some of the NMI Memory Parity Errors on Windows NT. Although the results are not conclusive and the research continues, the information is important enough to include here.
Both IBM's OS/2 2.x and Windows NT experience problems that appear to be associated with system memory in some circumstances. It is frustrating to have a system that is able to run MS-DOS, Windows 3.1, or OS/2 1.x and suddenly find it can't run Windows NT due to memory problems. The first issue to clear up is that not all NMI errors are due to memory. Other boards in the system can cause this problem, and even components directly on the system motherboard can be at fault.
In addition, memory timing is quite critical to Windows NT. Speed drift in the range of 15ns can cause severe memory problems without being reported as an NMI Parity Error.
When memory is at fault, it can be for any of the following reasons:
- The memory is not functioning at the access rate required by the system board. If the system specifications call for an 80ns access rate, Windows NT will most likely fail if the memory is really accessing at a slower rate such as 90ns. Even though the chips may be marked as 80ns, some fail to meet this access rate. Quite often chips will run at a slower speed when they reach operating temperature. This produces an effect called "speed drift." The symptom is a system that runs Windows NT when first turned on, but starts having memory errors after 15 minutes or so. A high quality SIMM tester can cycle the chips through various voltage and heat cycles, so this drift is fairly easy to observe.
- The memory meets the system specifications, but the speeds differ between individual SIMM modules. The average access rate may be 70ns on one SIMM module while the next module is running at 60ns. SIMMs stamped at the factory as 70ns average access rate can actually be running as fast as 50ns. Although the SIMMs are well within the system-required access specifications, a difference of 10ns or more between them can often cause problems on some systems. If you move these SIMMs to a different system board that uses a different BIOS and chipset, that system may not have any memory problems. This is because each BIOS and chipset regulates the "refresh wait states" used for timing, and this difference often makes the variance in speed acceptable. If your system's BIOS allows you to adjust the wait states for memory refresh, doing so will often allow the system to run with SIMMs or DRAM memory chips that are running at different access rates. The downside to increasing the number of wait states is a slower system.
- The individual chips on the SIMM module are running at different access rates. Determining this requires a sensitive memory testing device that can gauge the access rate of each individual bit (chip) on the module. A difference of 10ns or more between bits has been known to cause problems. This also can be regulated somewhat by the BIOS and chipset of the system board if it allows you to lengthen the refresh wait states for memory access.
- One of the memory chips is being affected by "cell leakage." This is a true parity error and is also known as a "soft error." It occurs when the change in the state of an individual cell (a zero or one) electrically leaks into a neighboring cell, changing its state. When the memory is read back, it no longer matches the parity bit's value, and an NMI is issued to the processor signaling that a parity error has occurred. The SIMM must be replaced. If problems persist with replacement chips, it is quite possible that a voltage or heat anomaly in the socket or circuitry is damaging the chips.
- Cache memory is another possible cause. There are cases where cache memory access rates were too slow, which caused enormous problems. On most 486 computers, 15ns to 25ns is normal. You will most likely have problems if the cache is slower than 25ns. The system manufacturer can provide the specifications and locations of these chips.
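
The timing conditions in the list above can be summarized as a small check over measurements taken with a SIMM tester. This is only a sketch under stated assumptions: the `Simm` record, the `check_timing` function, and the slot labels are hypothetical, and the 10ns spread threshold is taken from the guideline above rather than from any system specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Simm:
    slot: str          # hypothetical slot label, for reporting only
    measured_ns: int   # access time measured at operating temperature


def check_timing(simms: List[Simm], required_ns: int,
                 max_spread_ns: int = 10) -> List[str]:
    """Return warnings for the timing conditions described in the list above.

    required_ns:   access rate the system board calls for (for example, 80).
    max_spread_ns: module-to-module difference treated as risky; 10ns is
                   the guideline figure from the text, not a hard limit.
    """
    warnings = []

    # Condition 1: a module slower than the board requires.
    for simm in simms:
        if simm.measured_ns > required_ns:
            warnings.append(
                f"{simm.slot}: {simm.measured_ns}ns is slower than "
                f"the required {required_ns}ns"
            )

    # Condition 2: modules that each meet the spec but differ too much.
    if len(simms) > 1:
        speeds = [simm.measured_ns for simm in simms]
        spread = max(speeds) - min(speeds)
        if spread >= max_spread_ns:
            warnings.append(
                f"spread of {spread}ns between modules "
                f"(fastest {min(speeds)}ns, slowest {max(speeds)}ns)"
            )

    return warnings


if __name__ == "__main__":
    modules = [Simm("bank0", 70), Simm("bank1", 60)]
    for line in check_timing(modules, required_ns=80):
        print(line)
```

With the 70ns and 60ns modules from the example above on a board that requires 80ns, both modules pass the first check, but the 10ns spread is flagged. The same comparison could be applied per chip on a single module if the tester reports per-bit access rates.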