INF: CE_OVERRUN Errors with Serial Communications

ID Number: Q79988

3.00

WINDOWS

Summary:

The communications driver (COMM.DRV) that is shipped with Windows 3.0

will return CE_OVERRUN errors under "stress" conditions. This article

discusses this error and the conditions that can cause it to occur.

More Information:

When using the communications functions in Windows 3.0, an application

may encounter a CE_OVERRUN error. The frequency of this error is

highly dependent on the machine configuration; however, the baud rate

is normally the most critical factor. Generally, the higher the baud

rate, the greater the probability of a CE_OVERRUN error.

The CE_OVERRUN error indicates an "overrun" of the receive buffer in

the universal asynchronous receiver transmitter (UART). The COMM

driver obtains this error by reading the line status register (LSR) of

the UART; bit 1 of this register is set when an overrun error occurs.

When the COMM driver cannot service a Received Data Available

interrupt before the next transmitted character is received, the UART

signals an overrun error because the earlier, unread character is lost

and cannot be recovered.
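
As an illustration of the status check described above, the following Python sketch decodes an LSR value using the standard 8250/16450 bit layout. It is not taken from COMM.DRV: the function name decode_lsr is hypothetical, and the value is passed in as an argument rather than read from hardware (on a real UART the LSR is at the base port address plus 5).

```python
# LSR bit masks for 8250/16450-class UARTs.
LSR_DATA_READY = 0x01     # bit 0: a received character is waiting
LSR_OVERRUN_ERROR = 0x02  # bit 1: a character was lost (overrun)
LSR_PARITY_ERROR = 0x04   # bit 2
LSR_FRAMING_ERROR = 0x08  # bit 3
LSR_BREAK = 0x10          # bit 4

_LSR_FLAGS = [
    (LSR_DATA_READY, "data ready"),
    (LSR_OVERRUN_ERROR, "overrun error"),
    (LSR_PARITY_ERROR, "parity error"),
    (LSR_FRAMING_ERROR, "framing error"),
    (LSR_BREAK, "break"),
]


def decode_lsr(value):
    """Return the names of the receive-status bits set in an LSR value.

    Hypothetical helper for illustration only.
    """
    return [name for mask, name in _LSR_FLAGS if value & mask]


def is_overrun(value):
    """True if bit 1 (overrun error) is set."""
    return bool(value & LSR_OVERRUN_ERROR)
```

For example, decode_lsr(0x03) reports both "data ready" and "overrun error", which is the state a driver would see after a character arrived and an earlier one was lost.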

The COMM driver cannot service the interrupt in time when interrupts

are disabled for an extended period of time; the resulting delay is

commonly referred to as "interrupt latency."

Some of the biggest contributors to interrupt latency are terminate-

and-stay-resident (TSR) programs such as network drivers, pop-up

utilities, and drive cache programs. TSRs are not known to cooperate

well with other interrupt-intensive applications, such as the COMM

driver.

Processing takes place in this fashion for performance reasons.

However, it does leave Windows vulnerable to ill-behaved applications.

The COMM driver is one driver that is very sensitive to ill-behaved

applications. At 9600 baud, characters can arrive at a frequency of

approximately 1000 Hz (1 character every millisecond); at 19.2K baud,

that frequency doubles. This is near the limit for an interrupt

service routine (ISR) running under Windows.
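
The character rates quoted above can be checked with a short calculation. The sketch below assumes the common framing of 1 start bit, 8 data bits, no parity, and 1 stop bit (10 bits per character); that framing and the function names are assumptions for illustration, not something the article specifies.

```python
def chars_per_second(baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Maximum character rate for a given line speed.

    Each character is framed by one start bit plus the data, parity,
    and stop bits, so the default (8N1) costs 10 bit times.
    """
    bits_per_char = 1 + data_bits + parity_bits + stop_bits
    return baud / bits_per_char


def ms_per_char(baud, **framing):
    """Time between back-to-back characters, in milliseconds."""
    return 1000.0 / chars_per_second(baud, **framing)


for baud in (2400, 9600, 19200):
    print(baud, chars_per_second(baud), round(ms_per_char(baud), 3))
```

Under these assumptions 9600 baud gives 960 characters per second (about 1.04 milliseconds per character), which matches the "approximately 1000 Hz" figure, and 19.2K baud gives 1920 characters per second.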

There is an additional problem that may cause CE_OVERRUN errors and

data loss when running in standard mode on an 80286 computer. Windows

must regularly switch the processor from protected mode into real mode

to pass data to MS-DOS, MS-DOS drivers, and TSRs. This mode switch

occurs a minimum of 18.2 times per second (to update the time-of-day

clock). Because the 80286 does not support a method to switch the

processor from protected mode into real mode, the system must reset

the processor. This "trick" requires up to 1 millisecond to complete.

In addition, during the transition, interrupts are disabled.

Therefore, a COMM interrupt may get lost during one of these

transitions, which is more of a problem on some machines than others.

The speed of the transition varies depending on the speed of the

processor, the speed of installed memory, and other factors. If

Windows is running in standard mode on an 80386 computer, this should

not be a problem because the 80386 is designed to make the protected

mode to real mode transition easier and faster.
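
The interaction between the mode switch and the character rate can be sketched with a small simulation. This is a deliberately simplified model, not a description of how Windows or COMM.DRV actually behaves: it assumes a UART with a single receive register, characters arriving back to back with 10 bits per character, interrupts blocked for exactly 1 millisecond at the start of each 1/18.2-second period, and zero time spent in the ISR itself. The function name count_overruns is hypothetical.

```python
import math


def count_overruns(baud, blocked_ms=1.0, switches_per_s=18.2,
                   bits_per_char=10, duration_ms=1000.0):
    """Count characters lost in a simplified one-register UART model.

    Interrupts are assumed blocked for `blocked_ms` at the start of
    every mode-switch period. A character is counted as overrun when
    the next character arrives before the blocked window ends.
    """
    char_ms = 1000.0 * bits_per_char / baud
    period_ms = 1000.0 / switches_per_s
    n_chars = int(duration_ms / char_ms)
    overruns = 0
    for k in range(1, n_chars):
        arrival = k * char_ms
        # Start of the mode-switch period this arrival falls in.
        window_start = math.floor(arrival / period_ms) * period_ms
        if arrival < window_start + blocked_ms:
            serviced = window_start + blocked_ms  # must wait for the switch
        else:
            serviced = arrival                    # serviced immediately
        if (k + 1) * char_ms < serviced:
            overruns += 1                         # next character came first
    return overruns


print(count_overruns(9600), count_overruns(19200))
```

In this model a 1-millisecond blocked window never causes a loss at 9600 baud, because the character period (about 1.04 milliseconds) is slightly longer than the window, but it does cause losses at 19.2K baud. Any additional latency from other sources would erase the small margin at 9600 baud.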

There are even more factors that can affect interrupt latency in the

enhanced mode of Windows. CE_OVERRUN errors can occur more often if

more than one virtual machine (VM) is running (that is, standard

MS-DOS applications are running under Windows).

The source of the problem is the enhanced mode Windows virtual machine

architecture. There is a certain amount of overhead associated with

virtualizing interrupts and device ports. However, the largest problem

in this area is interrupt latency caused by transitions between VMs.

Enhanced mode Windows performs preemptive multitasking between VMs.

Several times each second, Windows performs a "task switch" from one

VM to another. If COMMAND.COM or an MS-DOS application is running

under Windows, two VMs are present in the system. If a COMM port

interrupt occurs while the MS-DOS VM is active, the interrupt cannot

be processed because the COMM driver cannot be called until the

Windows VM becomes active again. Normally, this will be handled

quickly enough, but there are times when the switch does not occur

fast enough. Windows 3.0 has a virtual device driver to buffer

characters until the correct VM is scheduled (the VxD is called

COMBUFF.386 and is automatically installed with Windows when a machine

is capable of running in enhanced mode). COMBUFF.386 helps to correct

this problem considerably; however, it does not eliminate the problem.

The following three items are factors that will affect interrupt

latency and task switches in enhanced mode:

1. MS-DOS spends most of its time in a state that prevents Windows

from performing a task switch. Almost every MS-DOS call places

MS-DOS into this state. Therefore, task switches cannot occur

during file I/O, directory manipulation, screen I/O, getting or

setting the system time, and so on. Even running COMMAND.COM in a

Windows window incurs MS-DOS calls to blink the cursor.

Large amounts of file I/O in an MS-DOS application may cause a

great deal of time to be spent inside MS-DOS. Floppy disk file I/O

has the greatest impact. If too much time is spent in MS-DOS, a

task switch will not occur soon enough for the COMM interrupt to be

processed correctly.

2. Interrupts are not processed when interrupts are disabled. There

are various times in Windows and in MS-DOS when interrupts are

disabled. Also, Windows provides expanded memory (EMS) emulation

for banking Windows applications into and out of conventional

memory. Interrupts are disabled during the EMS task switch. These

times are generally very short, but when they occur in conjunction

with task switch latency, they can combine to cause the problem.

3. Higher-priority interrupts get processed before lower-priority

interrupts. This often is not a problem; however, it has been seen

more often when using a serial mouse. The mouse has a higher

interrupt priority than the COMM port. If the mouse is using one of

the COMM ports, communication with the mouse is relatively slow.

Thus, if the mouse is very active, mouse processing takes a high

priority and a relatively long time to complete. Therefore, a COMM

interrupt may be missed.

In all of these cases, a faster machine performs better because the

time spent in MS-DOS, or with interrupts disabled, is less, and

therefore more time is available to process the COMM interrupts.

Running Windows in standard mode on an 80386 machine eliminates the

virtualizing layer of Windows and yet retains the improved mode switch

of the 80386 processor. Eliminating unnecessary TSRs and MS-DOS

drivers may help; decreasing the number of applications and VMs

running simultaneously can only help matters.

There is no easy, general method to work around this problem. COMM.DRV

typically performs better at lower baud rates (300 to

2400 baud). As in all serial communications, the only way to guarantee

that data is not lost is to use a packet/protocol transmission scheme.

Using this type of protocol, it is possible to detect errors in

packets and, more importantly, to request retransmission of an

incorrectly received packet.
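
A minimal sketch of such a packet scheme follows. The frame layout (sequence byte, length byte, payload, 16-bit CRC), the function names, and the `link` callable that stands in for the serial line are all assumptions chosen for illustration; the article does not prescribe any particular protocol. The CRC is computed with Python's standard binascii.crc_hqx function.

```python
import binascii
import struct


def build_packet(seq, payload):
    """Frame a payload as: seq (1 byte), length (1 byte), payload, CRC-16."""
    if len(payload) > 255:
        raise ValueError("payload too long for a 1-byte length field")
    header = struct.pack("BB", seq & 0xFF, len(payload))
    crc = binascii.crc_hqx(header + payload, 0xFFFF)
    return header + payload + struct.pack(">H", crc)


def parse_packet(frame):
    """Return (seq, payload) if the frame is intact, otherwise None."""
    if len(frame) < 4:
        return None
    seq, length = struct.unpack("BB", frame[:2])
    if len(frame) != 2 + length + 2:
        return None  # a dropped byte (for example, an overrun) shows up here
    (crc,) = struct.unpack(">H", frame[-2:])
    if binascii.crc_hqx(frame[:-2], 0xFFFF) != crc:
        return None  # corrupted data
    return seq, frame[2:-2]


def send_with_retry(seq, payload, link, max_tries=3):
    """Send until the receiver side accepts the frame.

    `link` is a hypothetical callable that models the serial line: it
    takes the transmitted bytes and returns the bytes as received. A
    successful parse stands in for the receiver's acknowledgement.
    Returns the number of attempts used, or None if all attempts failed.
    """
    frame = build_packet(seq, payload)
    for attempt in range(1, max_tries + 1):
        if parse_packet(link(frame)) is not None:
            return attempt
    return None
```

A character lost to an overrun changes the frame length, so the receiver rejects the frame and the sender transmits it again. A real implementation would also need timeouts and explicit acknowledgement packets, which are omitted here.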

Another thing to keep in mind when developing a communications

application is to test with dedicated hardware. For instance, if two

computers are running Windows 3.0 in enhanced mode, and data is sent

back and forth at 19.2K baud, the system is NOT under stress. Both

machines are affected by the interrupt latency inherent in Windows's

enhanced mode. Therefore, the effective throughput is probably NOT

19.2K baud. A single character is transmitted at 19.2K baud; however,

this does not account for the delay between characters.

Instead, dedicate the "remote" machine to an MS-DOS terminal program

that can maintain an effective throughput of 19.2K baud. It is quite

likely that it will be necessary to decrease the baud rate to 9600 or

even to 2400.

Another option is to modify the COMM.DRV to support a buffered UART.

COMM.DRV source is provided with the Windows Device Development Kit,

version 3.0. The INS16550A UART, which adds 16-byte receive and

transmit FIFO buffers, is pin-for-pin compatible with the 8250 and

16450.
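
To show why a buffered UART helps, the sketch below estimates how long the driver can delay servicing a receive interrupt before a character is lost. It is a rough approximation under stated assumptions: 10 bits per character, characters arriving back to back, and a 16550A-style receive FIFO of 16 bytes whose interrupt fires at a trigger level of 1, 4, 8, or 14 characters. An unbuffered 8250 or 16450 is modeled as a FIFO of depth 1. The function name latency_budget_ms is hypothetical.

```python
def latency_budget_ms(baud, fifo_depth=1, trigger_level=1, bits_per_char=10):
    """Approximate time available to service a receive interrupt.

    The interrupt fires once `trigger_level` characters are buffered.
    The remaining FIFO slots, plus the character still being assembled
    in the shift register, can absorb further characters before an
    overrun occurs.
    """
    if not 1 <= trigger_level <= fifo_depth:
        raise ValueError("trigger_level must be between 1 and fifo_depth")
    char_ms = 1000.0 * bits_per_char / baud
    return (fifo_depth - trigger_level + 1) * char_ms


for depth, trigger in ((1, 1), (16, 14), (16, 8), (16, 1)):
    print(depth, trigger, round(latency_budget_ms(19200, depth, trigger), 3))
```

At 19.2K baud this estimate gives roughly 0.5 millisecond for an unbuffered UART, about 1.6 milliseconds with a trigger level of 14, and about 4.7 milliseconds with a trigger level of 8. A lower trigger level tolerates more latency at the cost of more interrupts.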

Additional reference words: 3.0