5.1.2 Declaring and Initializing Strings

A string is an array of bytes. Initializing a string like "Hello, there" allocates and initializes one byte for each character in the string. An initialized string can be no longer than 255 characters.

Summary: Strings declared with types other than BYTE must fit the memory space allocated.

For data directives other than BYTE, a string may initialize only a single element. This element must be short enough to fit into the specified size and conform to the expression word size in effect (see Section 1.2.4, “Integer Constants and Constant Expressions”), as shown in these examples:

wstr WORD "OK"

dstr DWORD "ABCD" ; Legal under EXPR32 only

As with arrays, string initializers can span multiple lines. The line must end with a comma if you want the string to continue to the next line.

str1 BYTE "This is a long string that does not ",
"fit on one line."

You can also have an array of pointers to strings. For example:

PBYTE TYPEDEF PTR BYTE

.DATA

msg1 BYTE "Operation completed successfully."

msg2 BYTE "Unknown command"

msg3 BYTE "File not found"

pmsg1 PBYTE msg1

pmsg2 PBYTE msg2

pmsg3 PBYTE msg3

errors WORD pmsg1, pmsg2, pmsg3 ; An array of pointers

; to strings

Strings must be enclosed in single (') or double (") quotation marks. To put a single quotation mark inside a string enclosed by single quotation marks, use two single quotation marks. Likewise, if you need quotation marks inside a string enclosed by double quotation marks, use two sets. These examples show the various uses of quotation marks:

char BYTE 'a'

message BYTE "That's the message." ; That's the message.

warn BYTE 'Can''t find file.' ; Can't find file.

string BYTE "This ""value"" not found." ; This "value" not found.

You can always use single quotation marks inside a string enclosed by double quotation marks, as the initialization for message shows, and vice versa.

The ? Initializer

Summary: The actual values stored when you use ? depend on the other data in your program.

You do not have to initialize all elements in an array to a value. If there is no initial value, you can initialize the array elements with the ? operator. The ? operator either is treated as a zero or causes a byte to be left unspecified in the object file. Object files contain records for initialized data. An unspecified byte left in the object file means that no records contain initialized data for that address.

The actual values stored in arrays allocated with ? depend on certain conditions. The ? initializer is treated as a zero in a DUP statement that contains initializers in addition to the ? initializer. An unspecified byte is left in the object file if the ? initializer does not appear in a DUP statement, or if the DUP statement contains only ? initializers for nested DUP statements.
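The following data-segment sketch illustrates the conditions described above. The variable names are hypothetical and chosen only for illustration, and the comments restate the rules given in this section rather than output from a particular assembler version.

```asm
        .MODEL  small
        .DATA
; All names below are hypothetical, for illustration only.
single  BYTE    ?                   ; Not in a DUP: byte left unspecified
buffer  BYTE    20 DUP (?)          ; DUP contains only ?: left unspecified
nested  BYTE    4 DUP (5 DUP (?))   ; Only ? inside nested DUPs: left unspecified
mixed   BYTE    5 DUP (1, ?)        ; ? appears with another initializer,
                                    ; so each ? is treated as zero: 1,0,1,0,...
        END
```

If a program depends on a buffer starting at zero, initializing it explicitly (for example, with 20 DUP (0)) avoids relying on these conditions.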

Length-Specified Strings

You often need to know the length of a string. To use the DOS function for writing to a file or device (Interrupt 21h, Function 40h), for example, CX must contain the length of the string before the interrupt is called, as shown in this example.

msg BYTE "This is a length-specified string"

.

.

.

mov ah, 40h

mov bx, 1

mov cx, LENGTHOF msg

mov dx, OFFSET msg

int 21h

Some high-level languages also expect strings passed to procedures to have a certain format. For example, Pascal procedures require the first byte of a string passed as a parameter to contain the length of the string. You can write this length into the first byte with

msg BYTE LENGTHOF msg - 1, "This is a Pascal string"
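As a minimal sketch of how such a string might be consumed from assembly language, the following program reads the length byte back and writes only the text that follows it. The name pstr is hypothetical, and the program assumes a small-model DOS executable built with the .STARTUP and .EXIT directives.

```asm
        .MODEL  small
        .STACK  100h
        .DATA
; pstr is a hypothetical name. The first byte holds the number of
; characters that follow it (23 for this text).
pstr    BYTE    LENGTHOF pstr - 1, "This is a Pascal string"
        .CODE
        .STARTUP
        mov     ah, 40h             ; DOS: write to file or device
        mov     bx, 1               ; Handle 1 = standard output
        mov     cl, pstr            ; Load the length byte
        xor     ch, ch              ; Zero-extend the length into CX
        mov     dx, OFFSET pstr + 1 ; Text starts after the length byte
        int     21h
        .EXIT   0
        END
```

Because the length is stored in a single byte, a string in this format can hold at most 255 characters.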

Summary: Interfacing with high-level languages requires special techniques with strings.

Other languages such as Basic have string descriptors, a kind of structure containing both the length and the address of the string. For example, this structure DESC could be used in a procedure accessed from Basic:

DESC STRUCT

len WORD ? ; Length of string1

off WORD ? ; Offset of string1

DESC ENDS

string1 BYTE "This string goes in a string descriptor"

msg DESC {LENGTHOF string1, string1}

See Section 5.2, “Structures and Unions.”
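A descriptor declared this way can also be read from assembly code through its field names. The sketch below repeats the declarations above and assumes a small-model DOS program. It writes string1 to standard output by using the descriptor fields only, which is one possible use and not a description of how Basic itself handles descriptors.

```asm
        .MODEL  small
        .STACK  100h
DESC    STRUCT
  len   WORD    ?                   ; Length of string1
  off   WORD    ?                   ; Offset of string1
DESC    ENDS
        .DATA
string1 BYTE    "This string goes in a string descriptor"
msg     DESC    {LENGTHOF string1, string1}
        .CODE
        .STARTUP
        mov     ah, 40h             ; DOS: write to file or device
        mov     bx, 1               ; Handle 1 = standard output
        mov     cx, msg.len         ; Length taken from the descriptor
        mov     dx, msg.off         ; Offset taken from the descriptor
        int     21h
        .EXIT   0
        END
```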

Null-Terminated and $-Terminated Strings

Null-terminated and $-terminated strings have a special use with DOS functions. Strings in modules shared with C need to end with a null character (0).

str1 BYTE "This string ends with a null character", 0
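Because a null-terminated string carries no length, code that receives one must scan for the terminator. The following sketch shows one common way to do this with REPNE SCASB. It assumes a small-model DOS program in which the string lies in the default data segment.

```asm
        .MODEL  small
        .STACK  100h
        .DATA
str1    BYTE    "This string ends with a null character", 0
        .CODE
        .STARTUP
        push    ds
        pop     es                  ; SCASB compares AL with ES:[DI]
        mov     di, OFFSET str1     ; Start of the string
        mov     cx, 0FFFFh          ; Maximum number of bytes to scan
        xor     al, al              ; Search for the null byte
        cld                         ; Scan forward
        repne   scasb               ; Stop after the null is found
        not     cx                  ; CX = bytes scanned, including the null
        dec     cx                  ; CX = length, excluding the null
        .EXIT   0
        END
```

When the string is defined in the same module, LENGTHOF gives the same information at assembly time without a scan, although its count includes the terminating null.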

DOS file names also require a null character at the end. This example opens a file named "MYFILE.ASM".

name1 BYTE "MYFILE.ASM", 0

.

.

.

mov ah, 3Dh ; Open file; AL selects the access mode (0 = read-only)

mov dx, OFFSET name1

int 21h

DOS function 9 requires a string to end with a dollar sign ($) so that it can recognize the end of the string to write to the screen, as shown in this example.

msg BYTE "This is a dollar-terminated string$"

.

.

.

mov ah, 09h

mov dx, OFFSET msg

int 21h

LENGTHOF, SIZEOF, and TYPE for Strings

Because the assembler treats strings simply as arrays of byte elements, the LENGTHOF and SIZEOF operators return the same values for strings as they do for arrays, as illustrated in this example. The TYPE operator returns 1, the size in bytes of each element of msg.

msg BYTE "This string extends ",
"over three ",
"lines."

lmsg EQU LENGTHOF msg ; 37 elements

smsg EQU SIZEOF msg ; 37 bytes

tmsg EQU TYPE msg ; 1 byte per element
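For comparison, the same operators applied to an array of words show where the values diverge once an element is larger than one byte. The names in this sketch are hypothetical.

```asm
        .MODEL  small
        .DATA
; Hypothetical names, for comparison with the string example.
warray  WORD    10 DUP (0)
lwarr   EQU     LENGTHOF warray     ; 10 elements
swarr   EQU     SIZEOF warray       ; 20 bytes
twarr   EQU     TYPE warray         ; 2 bytes per element
        END
```

For a string declared with BYTE, LENGTHOF and SIZEOF are always equal because each element occupies exactly one byte.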