The 8086-family instruction set has seven string instructions for fast and efficient processing of entire strings and arrays. The term “string” in “string instructions” refers to a sequence of elements, not just character strings. These instructions work directly only on arrays of bytes and words on the 8086–80286 and on arrays of bytes, words, and doublewords on the 80386 and 80486. Processing larger elements must be done indirectly with loops.
The following table gives capsule descriptions of the five instructions discussed in this section. The two remaining instructions, INS and OUTS, transfer values from and to an I/O port and are not described here.
Instruction | Description |
MOVS | Copies a string from one location to another |
STOS | Stores values from the accumulator register to a string |
CMPS | Compares values in one string with values in another |
LODS | Loads values from a string to the accumulator register |
SCAS | Scans a string for a specified value |
All of these instructions use registers in a similar way and have a similar syntax. Most are used with the repeat instruction prefixes REP, REPE (or REPZ), and REPNE (or REPNZ). REPZ is a synonym for REPE (Repeat While Equal) and REPNZ is a synonym for REPNE (Repeat While Not Equal).
This section first explains the general procedures for using all string instructions. It then illustrates each instruction with an example.
The string instructions have specific requirements for the location of strings and the use of registers. To operate on any string, follow these three steps:
Summary: All string operations follow three basic steps.
1. Set the direction flag to indicate the direction in which you want to process the string. The STD instruction sets the flag, while CLD clears it.
If the direction flag is clear, the string is processed upward (from low addresses to high addresses, which is from left to right through the string). If the direction flag is set, the string is processed downward (from high addresses to low addresses, or from right to left). Under DOS, the direction flag is normally clear if your program has not changed it.
2. Load the number of iterations for the string instruction into the CX register.
If you want to process a 100-byte string, move 100 into CX. If you wish the string instruction to terminate conditionally (for example, during a search when a match is found), load the maximum number of iterations that can be performed without an error.
3. Load the starting address of the source string into DS:SI and the starting address of the destination string into ES:DI. Some string instructions take only a destination or source, not both (see Table 5.1).
Normally, the segment address of the source string is in DS, but you can use a segment override to specify a different segment for the source operand. You cannot override the segment for the destination string, which is always addressed through ES. Therefore, you may need to change the value of ES. See Section 3.1 for information on changing segment registers.
NOTE:
Although you can use a segment override on the source operand, a segment override combined with a repeat prefix can cause problems in certain situations on all processors except the 80386/486. If an interrupt occurs during the string operation, the segment override is lost and the rest of the string operation is processed incorrectly. Segment overrides can be used safely when interrupts are turned off or with an 80386/486 processor.
You can adapt these steps to the requirements of any particular string operation. The syntax for the string instructions is:
[[prefix]] CMPS [[segmentregister:]] source, [[ES:]] destination
LODS [[segmentregister:]] source
[[prefix]] MOVS [[ES:]] destination, [[segmentregister:]] source
[[prefix]] SCAS [[ES:]] destination
[[prefix]] STOS [[ES:]] destination
Each string instruction also has forms that end in B (BYTE), W (WORD), or D (DWORD, 80386/486 only), such as MOVSB or SCASW. The suffix gives the element size; with LODS, SCAS, and STOS it also tells the assembler whether the element is in the AL, AX, or EAX register. Because the registers are implicit, these forms do not require operands.
Table 5.1 lists each string instruction with the type of repeat prefix it uses and indicates whether the instruction works on a source, a destination, or both.
Table 5.1 Requirements for String Instructions
Instruction | Repeat Prefix | Source/Destination | Register Pair |
MOVS | REP | Both | DS:SI, ES:DI |
SCAS | REPE/REPNE | Destination | ES:DI |
CMPS | REPE/REPNE | Both | DS:SI, ES:DI |
LODS | None | Source | DS:SI |
STOS | REP | Destination | ES:DI |
INS | REP | Destination | ES:DI |
OUTS | REP | Source | DS:SI |
Summary: The instruction automatically increments or decrements SI and DI.
The repeat prefix causes the instruction that follows it to repeat the number of times specified in the count register (CX) or, with REPE and REPNE, until the repeat condition no longer holds. After each iteration, the processor increments or decrements SI and DI so that they point to the next array elements, which the string instruction then works on. The direction flag determines whether SI and DI are incremented (flag clear) or decremented (flag set). The element size determines whether SI and DI change by one, two, or four bytes each time.
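The stepping rule can be modeled in a few lines of C. This is a sketch of the processor's behavior under the assumptions stated in the comments, not MASM code, and the function name `step_index` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models how a string instruction updates SI or DI after each element.
   df is the direction flag (0 = clear, 1 = set); size is the element size
   in bytes: 1 for the B forms, 2 for W, 4 for D (80386/486 only). */
uint16_t step_index(uint16_t reg, int df, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    /* Flag clear: move upward. Flag set: move downward. The cast keeps the
       result in 16 bits, so the index wraps the way a 16-bit register does. */
    return df ? (uint16_t)(reg - size) : (uint16_t)(reg + size);
}
```

For example, `step_index(0x0100, 1, 4)` yields `0x00FC`: a doubleword operation with the direction flag set moves the index down by four bytes.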
These are the conditions that determine the number of repetitions specified by a prefix.
Prefix | Description |
REP | Repeats instruction CX times |
REPE, REPZ | Repeats instruction CX times, or as long as elements are equal, whichever is fewer |
REPNE, REPNZ | Repeats instruction CX times, or as long as elements are not equal, whichever is fewer |
The prefixes apply to only one string instruction at a time. To repeat a block of instructions, use a loop construction (see Section 7.2, “Loops”).
At run time, if a string instruction is preceded by a repeat prefix, the processor takes the following steps:
1. Checks the CX register and exits if CX is 0.
2. Performs the string operation once.
3. Increases SI and/or DI if the direction flag is clear. Decreases SI and/or DI if the direction flag is set. The amount of increase or decrease is 1 for byte operations, 2 for word operations, and 4 for doubleword operations (80386/486 only).
4. Decrements CX (no flags are modified).
5. Checks the zero flag at this point if the REPE or REPNE prefix is used (for SCAS or CMPS). REPE stops if the zero flag is clear, and REPNE stops if it is set. If the repeat condition does not hold, execution proceeds to the next instruction.
6. Returns to step 1 for the next iteration.
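As a rough illustration, the loop above can be written as a C model that only counts how many times the string operation runs. The enum names and `count_iterations` are hypothetical, and the movement of SI and DI (step 3) is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the three prefix behaviors in this model. */
enum rep_prefix { PREFIX_REP, PREFIX_REPE, PREFIX_REPNE };

/* Returns how many times the string operation runs under prefix p,
   starting with count cx. zf_seq[i] is the zero flag the i-th operation
   would produce; it must hold at least cx entries. PREFIX_REP ignores it,
   as REP does with MOVS and STOS. SI/DI movement (step 3) is not modeled. */
unsigned count_iterations(enum rep_prefix p, uint16_t cx, const int *zf_seq)
{
    unsigned n = 0;
    while (cx != 0) {                  /* step 1: stop when CX is 0 */
        int zf = zf_seq[n];            /* step 2: the operation sets ZF */
        n++;
        cx--;                          /* step 4: decrement CX */
        if (p == PREFIX_REPE && !zf)   /* step 5: REPE stops on not equal */
            break;
        if (p == PREFIX_REPNE && zf)   /*         REPNE stops on equal */
            break;
    }                                  /* step 6: next iteration */
    return n;
}
```

With CX = 4 and zero-flag results 1, 1, 0, 1, the model runs REP four times, REPE three times (it stops after the first unequal pair), and REPNE once. This matches the "whichever is fewer" rule in the prefix table.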
Summary: At loop end, SI and DI point to the element immediately after the match.
When a repeated SCAS or CMPS ends, DI (and SI, for CMPS) already points past the element that ended the loop: the match for REPNE, or the mismatch for REPE. To point to that element, decrement DI or SI if the direction flag is clear, or increment it if the flag is set.
Although string instructions (except LODS) are most often used with repeat prefixes, they can also be used by themselves. In this case, each instruction processes one element and adjusts SI and/or DI as specified by the direction flag and the operand size. To repeat the action, you must set up the loop and maintain the count yourself, for example with the LOOP instruction, which decrements CX.
To use the 8086-family string instructions, apply the steps outlined in the previous section. Examples in this section illustrate each instruction.
You can also use the techniques in this section with structures and unions, since arrays and strings can be fields in structures and unions (see Section 5.2).
The MOVS instruction copies data from one area of memory to another. To move data, first load the count and the source and destination addresses into the appropriate registers. Then use REP with the MOVS instruction.
.MODEL small
.DATA
source BYTE 10 DUP ('0123456789')
destin BYTE 100 DUP (?)
.CODE
mov ax, @data ; Load same segment
mov ds, ax ; to both DS
mov es, ax ; and ES
.
.
.
cld ; Work upward
mov cx, LENGTHOF source ; Set iteration count to 100
mov si, OFFSET source ; Load address of source
mov di, OFFSET destin ; Load address of destination
rep movsb ; Move 100 bytes
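To show what `rep movsb` does with the registers, here is a C model. It assumes, as the example does, that DS and ES hold the same segment, so one array stands in for that segment. The function name `rep_movsb` and its pointer parameters are hypothetical stand-ins for SI, DI, and CX.

```c
#include <assert.h>
#include <stdint.h>

/* C model of REP MOVSB when DS and ES hold the same segment.
   seg stands in for that segment; *si, *di, and *cx play the roles of
   SI, DI, and CX and hold their final register values on return. */
void rep_movsb(uint8_t *seg, uint16_t *si, uint16_t *di, uint16_t *cx, int df)
{
    while (*cx != 0) {
        seg[*di] = seg[*si];            /* movsb: copy one byte */
        if (df) {                       /* direction flag set: work downward */
            (*si)--;
            (*di)--;
        } else {                        /* direction flag clear: work upward */
            (*si)++;
            (*di)++;
        }
        (*cx)--;
    }
}
```

With `source` at offset 0 and `destin` at offset 100, a call with CX = 100 and the direction flag clear copies all 100 bytes. Afterward CX is 0, and SI and DI point just past their strings (offsets 100 and 200).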
The STOS instruction stores a specified value in each position of a string. The string is the destination, so it must be pointed to by ES:DI. The value to store must be in the accumulator.
This example stores the character 'a' in each byte of a 100-byte string. Notice that it does this by storing 50 words rather than 100 bytes. This makes the code faster by reducing the number of iterations. To fill an odd number of bytes, you would have to adjust for the last byte.
.MODEL small, C
.DATA
destin BYTE 100 DUP (?)
ldestin EQU (LENGTHOF destin) / 2
.CODE
. ; Assume ES = DS
.
.
cld ; Work upward
mov ax, 'aa' ; Load character to fill
mov cx, ldestin ; Load length of string
mov di, OFFSET destin ; Load address of destination
rep stosw ; Store 'aa' into array
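The word-fill technique and the odd-byte adjustment can be sketched in C as follows. The function `fill_bytes` is hypothetical, and using a single trailing byte store (the equivalent of a final STOSB) for an odd length is one possible adjustment, not the only one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores the byte value ch into n bytes of dst by writing n / 2 words
   (as REP STOSW would) and, when n is odd, one trailing byte (as a final
   STOSB would). Models the upward direction only (direction flag clear). */
void fill_bytes(uint8_t *dst, size_t n, uint8_t ch)
{
    /* Like mov ax, 'aa': both halves of the word hold the same character. */
    uint16_t ax = (uint16_t)(ch << 8 | ch);
    size_t cx = n / 2;                          /* word iteration count */
    size_t di = 0;
    while (cx-- != 0) {                         /* rep stosw */
        dst[di]     = (uint8_t)(ax & 0xFF);     /* low byte at lower address */
        dst[di + 1] = (uint8_t)(ax >> 8);
        di += 2;
    }
    if (n % 2 != 0)                             /* adjust for the last byte */
        dst[di] = (uint8_t)(ax & 0xFF);
}
```

The 8086 family stores the low byte of a word at the lower address. Because both bytes of `'aa'` are identical, that byte order does not affect the result.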
The CMPS instruction compares two strings element by element. When a repeated comparison ends, SI and DI point to the element after the one where the match or mismatch occurred. If the compared values are equal, the zero flag is set. Either string can be considered the destination or the source unless a segment override is used.
This example using CMPSB assumes that the strings are in different segments. Each segment address must be loaded into the appropriate segment register: DS for the source and ES for the destination.
.MODEL large, C
.DATA
string1 BYTE "The quick brown fox jumps over the lazy dog"
.FARDATA
string2 BYTE "The quick brown dog jumps over the lazy fox"
lstring EQU LENGTHOF string2
.CODE
mov ax, @data ; Load data segment
mov ds, ax ; into DS
mov ax, @fardata ; Load far data segment
mov es, ax ; into ES
.
.
.
cld ; Work upward
mov cx, lstring ; Load length of string
mov si, OFFSET string1 ; Load offset of string1
mov di, OFFSET string2 ; Load offset of string2
repe cmpsb ; Compare
je allmatch ; Zero flag set only if all bytes matched
.
.
.
allmatch: ; Special case for all match
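A C model of REPE CMPSB shows why the zero flag, rather than CX, is the reliable test for "all bytes matched": a mismatch on the very last byte also leaves CX at 0. The struct and function names are hypothetical, and the model assumes the direction flag is clear and both strings start at offset 0.

```c
#include <assert.h>
#include <stdint.h>

/* Registers and flag after a modeled REPE CMPSB (direction flag clear). */
struct cmps_result {
    uint16_t cx;      /* count remaining when the loop stopped */
    uint16_t si, di;  /* one byte past the last pair compared */
    int zf;           /* 1 if the last pair compared was equal */
};

/* Compares src[si] with dst[di] for at most count bytes and stops
   after the first mismatch. */
struct cmps_result repe_cmpsb(const uint8_t *src, const uint8_t *dst, uint16_t count)
{
    struct cmps_result r = { count, 0, 0, 0 };
    /* With CX = 0 the hardware compares nothing and leaves ZF unchanged,
       so this model requires a nonzero count. */
    assert(count != 0);
    while (r.cx != 0) {
        r.zf = (src[r.si] == dst[r.di]);
        r.si++;
        r.di++;
        r.cx--;
        if (!r.zf)
            break;
    }
    return r;
}
```

For the two example sentences, the model stops after offset 16 ('f' versus 'd'), leaving SI = 17 and CX = 26. For "abcd" versus "abcx", CX ends at 0 even though the strings differ, while ZF correctly reports the mismatch.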
The LODS instruction loads a value from a string into the accumulator. The string is the source, addressed by DS:SI. This instruction is normally not used with a repeat prefix, since something must be done with each element before going on to the next.
The code in this example loads, processes, and displays each byte in a string of bytes.
.DATA
info BYTE 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
linfo WORD LENGTHOF info
.CODE
.
.
.
cld ; Work upward
mov cx, linfo ; Load length
mov si, OFFSET info ; Load offset of source
mov ah, 2 ; Display character function
get:
lodsb ; Get a character
add al, '0' ; Convert to ASCII
mov dl, al ; Move to DL
int 21h ; Call DOS to display character
loop get ; Repeat
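The LODSB/LOOP pattern can be mirrored in C. This sketch is not DOS code: instead of calling INT 21h function 2, it appends each converted character to a buffer. The function name `show_digits` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the LODSB/LOOP example: each pass loads info[si] into AL and
   advances SI (direction flag clear), converts the digit to ASCII, and
   "displays" it by appending it to out. out needs linfo + 1 bytes. */
size_t show_digits(const uint8_t *info, uint16_t linfo, char *out)
{
    uint16_t si = 0, cx = linfo;
    size_t n = 0;
    /* LOOP decrements CX before testing it, so entering with CX = 0
       would repeat 65,536 times; the model rejects that case. */
    assert(cx != 0);
    do {
        uint8_t al = info[si++];        /* lodsb */
        al = (uint8_t)(al + '0');       /* add al, '0' */
        out[n++] = (char)al;            /* mov dl, al / int 21h */
    } while (--cx != 0);                /* loop get */
    out[n] = '\0';
    return n;
}
```

For the bytes 0 through 9 in `info`, the model produces the ten characters "0123456789".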
The SCAS instruction scans a string for a specified value. As the loop executes, this instruction compares the value at ES:DI with the value in the accumulator. If the values are the same, the zero flag is set.
After a REPNE SCAS, the zero flag is cleared if no match was found. After a REPE SCAS, the zero flag is set if all values matched.
This example assumes that ES is not the same as DS and that the address of the string is stored in a pointer variable. The LES instruction loads the far address of the string into ES:DI.
.DATA
string BYTE "The quick brown fox jumps over the lazy dog"
pstring PBYTE string ; Far pointer to string
lstring EQU LENGTHOF string ; Length of string
.CODE
.
.
.
cld ; Work upward
mov cx, lstring ; Load length of string
les di, pstring ; Load address of string
mov al, 'z' ; Load character to find
repne scasb ; Search
jne notfound ; Zero flag clear if not found
. ; ES:DI points to character
. ; after first 'z'
.
notfound: ; Special case for not found
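A C model of REPNE SCASB shows why the zero flag is the dependable "found" test. When the match is the last byte scanned, CX also reaches 0, so a CX test alone would report "not found". The struct and function names are hypothetical, and the model assumes the direction flag is clear and the scan starts at offset 0.

```c
#include <assert.h>
#include <stdint.h>

/* Registers and flag after a modeled REPNE SCASB (direction flag clear). */
struct scas_result {
    uint16_t cx;   /* count left when the scan stopped */
    uint16_t di;   /* one past the last byte compared */
    int zf;        /* 1 if that byte equaled AL */
};

/* Scans at most count bytes of str for al and stops after the first match. */
struct scas_result repne_scasb(const uint8_t *str, uint16_t count, uint8_t al)
{
    struct scas_result r = { count, 0, 0 };
    /* With CX = 0 nothing is compared and ZF is left unchanged,
       so this model requires a nonzero count. */
    assert(count != 0);
    while (r.cx != 0) {
        r.zf = (str[r.di] == al);
        r.di++;
        r.cx--;
        if (r.zf)          /* REPNE stops on the first match */
            break;
    }
    return r;
}
```

Scanning the example sentence for 'z' stops with DI = 38, one past the 'z' at offset 37, and CX = 5. Scanning "abz" for 'z' finds the match but leaves CX at 0, which a CX-only test would misread as "not found".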