Ross M. Greenberg
You probably don't consider the lowly parallel port as versatile as its cousin, the serial port. It has been relegated to handling only the most mundane of tasks: printing, plotting, perhaps an analog control function now and then, sitting lonely the rest of the time. The port is powerful, but there's only so much you can do with a unidirectional port, right?
Not once you realize that the standard parallel port is actually bidirectional. In fact, if you want to, you can input or output on as many as twelve lines at once. Compare that to the serial card, which can only input or output on one line, one bit at a time. A parallel port can output a byte and a half in the time it takes a serial port to determine whether it's time to cause a singular transition.
Of course, the RS-232 serial port is the standard when it comes to communicating with modems, WANs, and so forth, and a large variety of software that supports serial communication is available. Why, then, try to use the parallel port for the sort of communication usually associated with the serial port, other than the programming challenge?
First, communication doesn't always need to be performed in ASCII. Sometimes it's as easy as turning on or off a wire to a relay, or reading the on/off condition of a sensor. Using serial I/O would be overkill in these cases: it's sometimes easier to connect the real world to the parallel port.
Second, since the parallel port is able to get multiple bits in or out simultaneously, it should be able to provide faster I/O than a serial port. Special code will be needed to take advantage of this bidirectional capability. First, however, you have to understand the hardware.
Hardware
The parallel port is one of the simplest components of your computer (see Figure 1). There are four distinct parts. The address decoding portion is activated when a specific address is on the bus. The write logic takes the data on the bus and gives it to the third part, the physical input/output portion. The fourth part makes the data from the input/output portion available to the bus when it is requested.
Four addresses enable the parallel port, together with the I/O read (IOR) or I/O write (IOW) bus lines. These lines indicate that a port input or output operation is taking place on the bus. Normal memory reads and writes do not set these pins, so the card ignores those types of operations.
The various combinations of wires allow for reading and writing the data and status lines on the DB25 connector. The characteristics of the output chips, the 374 for the data output and the 174 for the control signal output, cause the last value output to them to be destructively OR'ed with the data being read. This means that before data is read in from the corresponding 244 and 240 chips, a logical zero must be sent out the data and control ports.
The diagram in Figure 1 shows that as many as 12 outputs can be made available to the DB25 at a given time, and that all 17 active pins (the other 8 are grounded) can be used for input of one form or another. Clearly, there is a shortage of output, not input, on the parallel adapter, which is not what you were expecting!
Let's examine what happens if you attempt to connect the parallel ports of two PCs. Not all parallel adapters were created equal. When two adapters are fighting for simultaneous control of the data lines (Pins 2 through 9), often one of them will win out over the other, making the data read on those lines unreliable. The solution seems to be to wire the data output lines from one PC to the control input lines of the other, utilizing eight of the nine wires connected to Pins 1 and 10 through 17.
But deciding which wire to connect to each pin is a problem. Whereas the serial port is synchronized on a character-by-character basis with start and stop bits, the parallel port is not synchronized with anything, unless you explicitly synchronize it. Whatever those eight data lines are attached to, the data made available on them will be available at slightly different times. Also, the receiving machine must be told at some point that data is ready on those lines. That seems easy enough to do: you simply raise a signal on some line to tweak the remote machine when data is available. Since the parallel port has its own interrupt, IRQ7, using it for this purpose seems to be the logical way of doing things. You simply trigger an interrupt when data is available, and let the remote machine's interrupt service routine handle the rest.
Unfortunately, this method would need two pins: the output line that you'd tweak would be connected to the ACK pin in the other machine that causes an interrupt to occur. If you exclude the data output pins from the potential pool of pins available for outputting the data ready signal (since they're already in use for the data), that leaves us with nine pins. Eight of these will be used for input from the remote, leaving only one free pin. You're one pin short of what is needed. If only the original design allowed for the ACK control bit to be a bidirectional one!
If polled I/O is used instead of interrupt-driven I/O there isn't any problem: the full eight bits are transferred across at once, and strobing of a "data ready" pin can be accomplished through one of the bidirectional wires, such as Pin 1. Consider, though, how the data is actually transmitted: a cable allowing for bidirectional 8-bit transfer requires eight wires in each direction for the data, a common ground wire, and a means of communicating the "data ready" condition distinct from the "data received" condition. Once again, a problem can arise because one parallel printer adapter is a little stronger electrically than another. Not all Pin 1's were created equal; and there is no guarantee that a bidirectional port can be simply hooked up with a straight wire to another one. Eight-bit data transfer might be out of the question.
What's a programmer to do? The answer is, split the byte in half and send one nibble at a time. If you take a look at the two input registers, the status and input ports 3BDH & 3BEH on a monochrome display/printer adapter card (379H and 37AH on a simple parallel printer adapter), the answer starts to become obvious (see Figure 2). A read on 3BEH/37AH allows for 4 bits of input directly from the pins (Pins 1, 14, 16 and 17) and an additional bit showing the interrupt enable state: not enough to have a control bit, alas.
The status port, however, has 4 bits of data plus the ability to raise or lower a control line. Looks like the right port for the job.
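To make the register layout concrete, here is a minimal C sketch of how the five usable status bits might be split into a data nibble and a control bit. It rests on assumptions that the text does not spell out: the data nibble sits in bits 3 through 6 of the status port, the control line is bit 7, and bit 7 reads back inverted (as the BUSY input does on the standard adapter). The names split_status, status_from_pins and the ST_ macros are hypothetical, and status_from_pins only models the hardware so the sketch can be tried without a port.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets from the adapter's base address (3BCH or 378H). */
#define LPT_DATA(base)    ((base) + 0)   /* output latch            */
#define LPT_STATUS(base)  ((base) + 1)   /* 3BDH / 379H, input only */
#define LPT_CONTROL(base) ((base) + 2)   /* 3BEH / 37AH             */

/* Assumed layout of the status port for nibble transfers. */
#define ST_DATA_SHIFT  3
#define ST_DATA_MASK   0x78u   /* bits 3-6: four data bits            */
#define ST_CTRL_BIT    0x80u   /* bit 7: control line                 */
#define ST_BUSY_INVERT 0x80u   /* bit 7 is inverted by the hardware   */

typedef struct {
    uint8_t nibble;   /* 0..15 */
    int     control;  /* 0 or 1, electrical level on the control pin */
} status_fields;

/* Model only: the status byte a PC would read for five pin levels
   (bit 4 = control pin, bits 0-3 = data pins). */
static uint8_t status_from_pins(uint8_t pins)
{
    assert(pins <= 0x1F);
    return (uint8_t)(((unsigned)pins << ST_DATA_SHIFT) ^ ST_BUSY_INVERT);
}

/* Split a raw status read into nibble and control level. */
static status_fields split_status(uint8_t raw)
{
    status_fields f;
    raw ^= ST_BUSY_INVERT;                         /* undo the inversion */
    f.nibble  = (uint8_t)((raw & ST_DATA_MASK) >> ST_DATA_SHIFT);
    f.control = (raw & ST_CTRL_BIT) != 0;
    return f;
}
```

On real hardware the raw value would come from an input instruction on LPT_STATUS(base); the low three bits of that read carry no data and are simply masked off.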
Now, however, another problem arises: each byte is going to be sent in two pieces, but each piece must still be synchronized. There is no guarantee that what appears to be a simultaneous output on five lines will be received as such on the remote site. A protocol must be established that permits a machine, regardless of clock speed, to determine when the data on its data lines is legitimate. Without timing loops, it seems that transition of the control pin from high to low and low to high can be used for this purpose.
To demonstrate this technique, as well as the parallel port's capability for rapid transfer, SENDFILE.C was created (see Figure 3). All of the actual send/receive routines can be found in PAR_POLL.ASM (see Figure 4). On a 386 at 16MHz, a sustained transfer rate of over 35K bytes per second was measured. SENDFILE has successfully transferred files between AT- and MCA-style parallel ports.
This is the algorithm used in PAR_POLL.ASM:
Sender
1. Send the high nibble with the fifth (highest) bit, the control bit, set.
Receiver
1. Stay in a tight polling loop until the high bit, or control bit, on the input control port is on.
2. Acknowledge that nibble by setting the high bit appropriately.
Sender
2. Wait for the remote site to acknowledge the nibble by raising its high bit on the input control port, BASEPORT + 1. A tight polling loop is best for this.
3. Output the second nibble of the byte after turning off the high bit.
Receiver
3. Stay in a tight polling loop while waiting for the control bit to go low, indicating the second nibble is ready.
4. Acknowledge the last nibble of the byte by turning off the high bit and outputting a dummy byte to the remote system.
5. Process the character just received.
Sender
4. Wait for the remote site to acknowledge receipt of the second half of the byte by dropping the high bit read from the sender's input control port.
5. Repeat the process.
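The handshake above can be tried out without any hardware. The following C sketch is a simulation under stated assumptions, not the code in PAR_POLL.ASM: the wire structure is a hypothetical stand-in for the cable, each poll function is one pass of a polling loop, and the hardware inversion of the top status bit is ignored. It also treats all five lines as changing at the same instant, which the real cable does not guarantee.

```c
#include <assert.h>
#include <stdint.h>

#define CTRL 0x10u   /* fifth bit of the 5-bit value on the wire */

/* Hypothetical cable model: each side sees the other's last output. */
typedef struct {
    uint8_t from_sender;    /* five bits driven by the sender   */
    uint8_t from_receiver;  /* five bits driven by the receiver */
} wire;

typedef struct { uint8_t byte; int step; } sender;
typedef struct { uint8_t high; uint8_t byte; int step; } receiver;

/* One polling pass on the sending side; returns 1 once the byte is done. */
static int sender_poll(sender *s, wire *w)
{
    switch (s->step) {
    case 0:  /* Sender 1: high nibble, control bit set */
        w->from_sender = (uint8_t)(CTRL | (s->byte >> 4));
        s->step = 1;
        break;
    case 1:  /* Sender 2-3: wait for ack high, then low nibble, control clear */
        if (w->from_receiver & CTRL) {
            w->from_sender = (uint8_t)(s->byte & 0x0F);
            s->step = 2;
        }
        break;
    case 2:  /* Sender 4: wait for the ack bit to drop */
        if (!(w->from_receiver & CTRL)) {
            s->step = 0;
            return 1;
        }
        break;
    }
    return 0;
}

/* One polling pass on the receiving side; returns 1 when a byte is ready. */
static int receiver_poll(receiver *r, wire *w)
{
    if (r->step == 0) {                 /* Receiver 1-2 */
        if (w->from_sender & CTRL) {
            r->high = (uint8_t)(w->from_sender & 0x0F);
            w->from_receiver = CTRL;    /* acknowledge first nibble */
            r->step = 1;
        }
    } else if (!(w->from_sender & CTRL)) {  /* Receiver 3-5 */
        r->byte = (uint8_t)((r->high << 4) | (w->from_sender & 0x0F));
        w->from_receiver = 0;           /* acknowledge second nibble */
        r->step = 0;
        return 1;
    }
    return 0;
}

/* Run both sides in lockstep until one byte has crossed the "cable". */
static uint8_t transfer_byte(uint8_t value)
{
    wire w = {0, 0};
    sender s = {value, 0};
    receiver r = {0, 0, 0};
    int sent = 0, got = 0, passes = 0;

    while (!sent || !got) {
        assert(++passes < 16);          /* guard against a stuck handshake */
        if (!sent) sent = sender_poll(&s, &w);
        if (!got)  got  = receiver_poll(&r, &w);
    }
    return r.byte;
}
```

Because every wait is on a level of the control bit rather than on elapsed time, the same sequence should hold whatever the relative speeds of the two machines, which is the point of the protocol.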
This scheme has a number of advantages. First, because 3BDH/379H is used for both data and the control bit, the number of I/Os needed to accomplish a single byte's transfer is reduced. Only two inputs and two outputs are required of 3BDH/379H to get the two nibbles from the receiver's viewpoint, and only four I/Os are needed to get the nibbles and control information out from the sender's viewpoint.
If the control pin used is Bit 6 of 3BDH/379H (Pin 10), interrupt processing can be used. This allows a TSR to read the initial nibble regardless of other processing in the foreground and potentially emulate BIOS service 14H, which is used for serial communications. A disadvantage is that having a control bit in the middle of a nibble makes processing two nibbles into a byte slightly more difficult.
If a buffered I/O routine is needed, one that can store data as it arrives and return it only when requested, it might be implemented as a TSR. Interrupt processing is perfect as a background task for TSRs. The code must be changed substantially, however, and making it into a virtual BIOS service replacement for interrupt 14H is beyond the scope of this article.
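As an illustration of the buffering idea only, here is a minimal circular buffer in C of the kind an interrupt handler could fill and a foreground program could drain. The names ring, ring_put and ring_get and the 256-byte size are assumptions made for this sketch; they are not taken from PAR_INT.ASM.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256u   /* assumed size; a power of two so indices wrap with a mask */

typedef struct {
    volatile uint8_t  data[RING_SIZE];
    volatile unsigned head;   /* next slot to write (interrupt side)  */
    volatile unsigned tail;   /* next slot to read (foreground side)  */
} ring;

/* Store one byte as it arrives; returns 0 if the buffer is full. */
static int ring_put(ring *r, uint8_t c)
{
    unsigned next;

    assert(r);
    next = (r->head + 1u) & (RING_SIZE - 1u);
    if (next == r->tail)
        return 0;             /* full: the caller decides what to drop */
    r->data[r->head] = c;
    r->head = next;
    return 1;
}

/* Hand back the oldest byte on request; returns 0 if nothing is waiting. */
static int ring_get(ring *r, uint8_t *out)
{
    assert(r && out);
    if (r->tail == r->head)
        return 0;             /* empty */
    *out = r->data[r->tail];
    r->tail = (r->tail + 1u) & (RING_SIZE - 1u);
    return 1;
}
```

Only the interrupt side writes head and only the foreground side writes tail, so with a single producer and a single consumer neither routine needs to disable interrupts. One slot is left unused so that a full buffer can be told apart from an empty one.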
Figure 5 shows PAR_INT.ASM, an interrupt-driven parallel routine. The logic of an interrupt-driven routine differs greatly from the logic of a polled system. Interestingly enough, due to the high overhead incurred by the interrupt-driven routine, the throughput is substantially lower. Each character transmitted requires an interrupt to be generated on each system (both the sender and the receiver) and some polling must take place, as well. (A hybrid system that generates only a single interrupt on the receiving system is possible, too, but would be less fun to write.) As an example of interrupt-driven I/O, PAR_INT.ASM demonstrates simple keyboard buffering techniques. It stores data in a buffer, then sends it when told to do so simply by initiating the first nibble, and letting interrupts handle the rest. This is the logic for the interrupt routine:
Sender
(send_byte subroutine)
1. Remove a character from the transmit circular buffer.
2. Take the low nibble of the character, and insert a logic one between the third and fourth bits of the nibble, left-shifting the fourth bit into the fifth bit position. Because the hardware will reverse the topmost bit, reverse it now. Send the byte out the port. Return to the calling subroutine.
Receiver
1. Upon receiving the interrupt caused by the fourth bit being held high, get back the original low order nibble.
2. Generate an interrupt on the Sender side of the wire by outputting a high fourth bit.
Sender
3. Upon receiving the interrupt, if in the middle of a send, vector off the main interrupt routine.
(second_nibble_routine)
4. Send the final nibble of the character. This time, though, insert a zero bit in the fourth bit position to signal the remote system that it may process the second nibble.
Receiver
3. Wait for the fourth bit position to clear on the port.
4. When clear, build the final nibble for the character, build the character and then stuff it to the input circular buffer.
5. Return from interrupt.
Sender
5. If there are more characters in the transmit circular buffer, get the next one and call the send routine.
6. IRET back after clearing interrupts.
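The bit shuffling in Sender step 2 is easier to follow in code. The C sketch below shows one way the nibble might be packed around the control bit and recovered on the far side. It assumes that the sender's five low data lines arrive on status bits 3 through 7 of the receiver, with the control bit landing on bit 6 (Pin 10) and the topmost bit inverted by the receiving hardware. The function names are hypothetical, and status_seen_by_receiver merely models the cable and adapter.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a nibble for output: control bit goes between nibble bits 2 and 3,
   nibble bit 3 moves up to bit 4 and is pre-inverted for the hardware. */
static uint8_t encode_nibble(uint8_t nibble, int control)
{
    uint8_t low3, top, out;

    assert(nibble <= 0x0F);
    low3 = (uint8_t)(nibble & 0x07);
    top  = (uint8_t)((nibble & 0x08) << 1);        /* bit 3 -> bit 4 */
    out  = (uint8_t)(low3 | top | (control ? 0x08 : 0x00));
    return (uint8_t)(out ^ 0x10);                  /* reverse the topmost bit */
}

/* Model only: what the receiver's status port shows for a 5-bit output. */
static uint8_t status_seen_by_receiver(uint8_t out)
{
    return (uint8_t)((((unsigned)out & 0x1F) << 3) ^ 0x80);
}

/* Recover the nibble and the control bit from a status-port read. */
static uint8_t decode_nibble(uint8_t status, int *control)
{
    uint8_t v = (uint8_t)((status >> 3) & 0x1F);

    assert(control);
    *control = (v & 0x08) != 0;                    /* status bit 6 */
    return (uint8_t)((v & 0x07) | ((v & 0x10) >> 1));
}
```

Because the sender reverses the top bit before output, the inversion in the receiving adapter cancels it and the receiver needs no correction of its own. The price, as noted earlier, is the extra masking and shifting needed to close the gap left by the control bit.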
A simple transfer program could be grafted on to PAR_INT.ASM, but be careful! The timing on this system bodes ill for bidirectional interrupt-driven polling I/O.
Summary
The standard PC parallel port could have been designed a little more carefully. If it had been, you could have gotten true 8-bit I/O, and potentially simultaneous bidirectional I/O, at an extraordinary throughput. Ideally, the output of a byte would require two port output and one port input instructions (output the data, strobe the port, read the ACK), and the receive would require one input and one output instruction (interrupt generated on the strobe, one read of the data, one output to acknowledge the data). However, the routines presented here are not substantially more resource-intensive, requiring a few more IN and OUT instructions and a carefully constructed low-level protocol.
In my environment, where all of my serial ports are used up by modems, serial printers, and other devices, it's nice to be able to transfer files quickly through my unused parallel ports. Adding a few lines of code, you should be able to transfer files in the background on each machine, access remote devices transparently (including remote printing and printer sharing), and perhaps even use a remote machine's hard disk as a hot back-up disk.