Program Segments

The term segments refers to two distinct programming concepts: physical segments and logical segments.

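Physical segments are 64 KB blocks of memory. The Intel 8086/8088 and 80286 microprocessors have four segment registers (CS, DS, ES, and SS), which essentially serve as pointers to these blocks. (The 80386 has six segment registers, a superset of those found on the 8086/8088 and 80286: it adds FS and GS.) Each segment register can point to the bottom of a different 64 KB area of memory. In real mode, the mode in which MS-DOS runs, a segment can begin on any 16-byte (paragraph) boundary, and the processor forms a physical address by multiplying the segment register's value by 16 and adding a 16-bit offset. Thus, a program can address any location in memory by appropriate manipulation of the segment registers, but the maximum amount of memory that it can address simultaneously, without reloading a segment register, is four 64 KB segments, or 256 KB.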
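The following small C sketch makes the real-mode address arithmetic concrete for the 8086/8088. The function name physical_address and the two constants are illustrative assumptions, not part of any Microsoft tool. The sketch also assumes 8086-style wraparound at 1 MB, which an 80286 or 80386 running in real mode does not necessarily reproduce.

```c
#include <stdint.h>

/* Illustrative constants (assumptions for this sketch). */
#define SEGMENT_SIZE  0x10000UL  /* 64 KB reachable through one segment register */
#define SEG_REGS_8086 4          /* CS, DS, ES, SS */

/* Real-mode translation: a segment register holds a paragraph number
   (units of 16 bytes), so the physical address is segment * 16 + offset.
   The result is masked to 20 bits because the 8086/8088 has only 20
   address lines; addresses past 0xFFFFF wrap to the bottom of memory. */
static uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFUL;
}
```

For example, 1234h:0010h and 1235h:0000h name the same byte (physical address 12350h), because segments may start on any paragraph boundary and can overlap. With all four segment registers loaded, at most SEG_REGS_8086 × SEGMENT_SIZE = 262,144 bytes (256 KB) are addressable at once.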

As we discussed earlier in the chapter, .COM programs assume that all four segment registers always point to the same place: the bottom of the program. Because every address the program uses is therefore a 16-bit offset from that one base, a .COM program is limited to a maximum size of 64 KB. .EXE programs, on the other hand, can address many different physical segments and can reset the segment registers to point to each segment as it is needed. Consequently, the only practical limit on the size of a .EXE program is the amount of available memory. The example programs throughout the remainder of this book focus on .EXE programs.
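A short C sketch makes the .COM limit concrete by checking whether a hypothetical image fits in the single segment that CS, DS, ES, and SS share. It assumes a 256-byte program segment prefix (PSP) at offset 0, the image loaded at offset 0100h, and a caller-supplied amount of stack space. The function name com_image_fits is illustrative only.

```c
#include <stdint.h>

#define COM_SEGMENT_BYTES 0x10000UL /* one 64 KB segment shared by code, data, and stack */
#define PSP_BYTES         0x100UL   /* PSP occupies offsets 0000h-00FFh */

/* Returns nonzero if an image of image_bytes, loaded at offset 0100h,
   plus stack_bytes of stack space, fits inside the one segment that
   all four segment registers address in a .COM program. The sum is
   computed in 64 bits so that large arguments cannot overflow. */
static int com_image_fits(uint32_t image_bytes, uint32_t stack_bytes)
{
    return (uint64_t)PSP_BYTES + image_bytes + stack_bytes <= COM_SEGMENT_BYTES;
}
```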

Logical segments are the components of a program. A minimum of three logical segments must be declared in any .EXE program: a code segment, a data segment, and a stack segment. Programs with more than 64 KB of code or data have more than one code or data segment. The routines and data that are used most frequently are put into the primary code and data segments for speed, because they can typically be reached without reloading a segment register. Routines and data that are used less frequently are put into secondary code and data segments.

Segments are declared with the SEGMENT and ENDS directives in the following form:

name    SEGMENT attributes
        .
        .
        .
name    ENDS

The attributes of a segment include its align type (BYTE, WORD, PARA, or PAGE, which start the segment on a 1-, 2-, 16-, or 256-byte boundary, respectively), combine type (such as PUBLIC, PRIVATE, COMMON, or STACK), and class type (a name such as 'CODE' or 'DATA', enclosed in single quotes). The linker uses these attributes when it combines logical segments to create the physical segments of an executable program. Most of the time, you can get by with a small set of attributes used in a standard, formulaic way. If you need the full range of attributes, however, consult the detailed explanation in the MASM manual.
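The following C sketch models only the rounding step implied by a segment's align type: how far the linker must advance before it can start the next segment on the requested boundary. The enum and the align_up function are illustrative names. A real linker also takes the combine type and class into account.

```c
#include <stdint.h>

/* Boundary requested by each MASM align type, in bytes (powers of 2). */
enum align_type {
    ALIGN_BYTE = 1,    /* any address */
    ALIGN_WORD = 2,    /* even address */
    ALIGN_PARA = 16,   /* paragraph boundary */
    ALIGN_PAGE = 256   /* 256-byte page boundary */
};

/* Round addr up to the next multiple of the requested alignment.
   Works because every alignment value is a power of 2. */
static uint32_t align_up(uint32_t addr, enum align_type align)
{
    uint32_t a = (uint32_t)align;
    return (addr + a - 1) & ~(a - 1);
}
```

For example, if the next free address after a _TEXT segment is 0123h, a WORD-aligned _DATA segment starts at 0124h, whereas a PARA-aligned segment would start at 0130h.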

Programs are classified into one memory model or another based on the number of their code and data segments. The most commonly used memory model for assembly-language programs is the small model, which has one code segment and one data segment, but you can also use the medium, compact, and large models (Figure 3-9). (Two additional models exist that we will not consider further: the tiny model, in which code and data are intermixed in a single segment, as in a .COM file under MS-DOS; and the huge model, which is supported by the Microsoft C Optimizing Compiler and allows the use of data structures larger than 64 KB.)

Model      Code segments   Data segments
Small      One             One
Medium     Multiple        One
Compact    One             Multiple
Large      Multiple        Multiple

Figure 3-9. Memory models commonly used in assembly-language and C programs.
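The practical consequence of the memory model is the size of addresses. With a single code segment, CS never changes, so a call or code pointer needs only a 16-bit offset (a near reference). Once code spans several segments, each call or code pointer must carry a segment as well as an offset (a far reference, 4 bytes). Data references follow the same rule. The C sketch below encodes that relationship for the four models in Figure 3-9. The type and function names are illustrative, and the sizes are the default pointer sizes in Microsoft C, which explicit near and far declarations can override.

```c
#include <stddef.h>

enum memory_model { MODEL_SMALL, MODEL_MEDIUM, MODEL_COMPACT, MODEL_LARGE };

/* Segment counts per model: 0 = one segment, 1 = multiple segments. */
struct model_traits {
    int multiple_code_segments;
    int multiple_data_segments;
};

static const struct model_traits MODEL_TABLE[] = {
    [MODEL_SMALL]   = { 0, 0 },
    [MODEL_MEDIUM]  = { 1, 0 },
    [MODEL_COMPACT] = { 0, 1 },
    [MODEL_LARGE]   = { 1, 1 },
};

/* One segment can be addressed by a 2-byte offset alone (near);
   multiple segments require a segment:offset pair (far, 4 bytes). */
static size_t code_pointer_size(enum memory_model m)
{
    return MODEL_TABLE[m].multiple_code_segments ? 4 : 2;
}

static size_t data_pointer_size(enum memory_model m)
{
    return MODEL_TABLE[m].multiple_data_segments ? 4 : 2;
}
```

In the medium model, for example, procedures are declared FAR and return with a far RET, while data is still reached through DS with 16-bit offsets.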

For each memory model, Microsoft has established certain segment and class names that are used by all its high-level-language compilers (Figure 3-10). Because segment names are arbitrary, you may as well adopt the Microsoft conventions. Their use will make it easier for you to integrate your assembly-language routines into programs written in languages such as C, or to use routines from high-level-language libraries in your assembly-language programs.

Another important Microsoft high-level-language convention is to use the GROUP directive to name the near data segment (the segment the program expects to address with offsets from the DS register) and the stack segment as members of DGROUP (the automatic data group), a special name recognized by the linker and also by the program loaders in Microsoft Windows and Microsoft OS/2. The GROUP directive causes logical segments with different names to be combined into a single physical segment so that they can be addressed using the same segment base address. In C programs, DGROUP also contains the local heap, which is used by the C runtime library for dynamic allocation of small amounts of memory.
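Because every segment in DGROUP is addressed from the group's base rather than from its own, a reference to DGROUP data resolves to an offset measured from the start of the group. The following C sketch shows that arithmetic in rough form, with segment bases given as paragraph numbers. group_offset is a hypothetical helper, not a linker interface. The sketch also shows why a group cannot exceed 64 KB: an offset from a single base must fit in 16 bits.

```c
#include <stdint.h>

/* Compute the offset of an item relative to the base of its group.
   group_para and seg_para are paragraph numbers (physical address / 16)
   of the group's first segment and of the segment holding the item;
   off is the item's offset within its own segment.
   Returns 0 and stores the group-relative offset in *out, or -1 if the
   item lies below the group base or beyond the 64 KB reachable from it. */
static int group_offset(uint16_t group_para, uint16_t seg_para,
                        uint16_t off, uint16_t *out)
{
    if (seg_para < group_para)
        return -1;
    uint32_t rel = ((uint32_t)(seg_para - group_para) << 4) + off;
    if (rel > 0xFFFFUL)
        return -1;  /* not reachable through one segment register */
    *out = (uint16_t)rel;
    return 0;
}
```

For instance, if DGROUP starts at paragraph 1000h with _DATA, and STACK begins at paragraph 1024h, then a word at offset 0010h within STACK lies at offset 0250h relative to DGROUP. That is the offset a program uses once DS or SS holds DGROUP's base.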

Memory   Segment       Align  Combine  Class     Group
model    name          type   type     type

Small    _TEXT         WORD   PUBLIC   CODE
         _DATA         WORD   PUBLIC   DATA      DGROUP
         STACK         PARA   STACK    STACK     DGROUP

Medium   module_TEXT   WORD   PUBLIC   CODE
         .
         .
         .
         _DATA         WORD   PUBLIC   DATA      DGROUP
         STACK         PARA   STACK    STACK     DGROUP

Compact  _TEXT         WORD   PUBLIC   CODE
         data          PARA   PRIVATE  FAR_DATA
         .
         .
         .
         _DATA         WORD   PUBLIC   DATA      DGROUP
         STACK         PARA   STACK    STACK     DGROUP

Large    module_TEXT   WORD   PUBLIC   CODE
         .
         .
         .
         data          PARA   PRIVATE  FAR_DATA
         .
         .
         .
         _DATA         WORD   PUBLIC   DATA      DGROUP
         STACK         PARA   STACK    STACK     DGROUP

Figure 3-10. Segments, groups, and classes for the standard memory models as used with assembly-language programs. The Microsoft C Optimizing Compiler and other high-level-language compilers use a superset of these segments and classes.

For pure assembly-language programs that will run under MS-DOS, you can ignore DGROUP. However, if you plan to integrate assembly-language routines and programs written in high-level languages, you'll want to follow the Microsoft DGROUP convention. For example, if you are planning to link routines from a C library into an assembly-language program, you should include the line

DGROUP group _DATA,STACK

near the beginning of the program.

The final Microsoft convention of interest in creating .EXE programs is segment order. Microsoft's high-level-language compilers assume that code segments always come first, followed by far data segments, followed by the near data segment, with the stack and heap last. This order won't concern you much until you begin integrating assembly-language code with routines from high-level-language libraries, but it is easiest to adopt the convention from the start.
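The ordering convention can be expressed as a sort key. The C sketch below ranks segment descriptors in the conventional order: code first, then far data (segments outside DGROUP), then near data in DGROUP, then the stack. It keeps the original order within each rank. The struct and function names are hypothetical. The sketch illustrates the convention only; it is not the algorithm LINK uses, and it does not model the heap.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor for a logical segment awaiting placement. */
struct segment {
    const char *name;
    const char *class_name;  /* e.g. "CODE", "FAR_DATA", "DATA", "STACK" */
    int in_dgroup;           /* nonzero if a member of DGROUP */
    int input_index;         /* original position, used as a tiebreaker */
};

/* Rank under the convention: code, far data, near data, stack. */
static int segment_rank(const struct segment *s)
{
    if (strcmp(s->class_name, "CODE") == 0) return 0;
    if (!s->in_dgroup) return 1;                        /* far data */
    if (strcmp(s->class_name, "STACK") == 0) return 3;  /* stack last */
    return 2;                                           /* near data in DGROUP */
}

static int compare_segments(const void *a, const void *b)
{
    const struct segment *x = a, *y = b;
    int rx = segment_rank(x), ry = segment_rank(y);
    if (rx != ry)
        return rx < ry ? -1 : 1;
    /* Same rank: preserve input order, since qsort is not stable. */
    return (x->input_index > y->input_index) - (x->input_index < y->input_index);
}

static void order_segments(struct segment *segs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        segs[i].input_index = (int)i;
    qsort(segs, n, sizeof segs[0], compare_segments);
}
```

Given segments declared as STACK, _DATA, _TEXT, and a far data segment, in that order, order_segments rearranges them to _TEXT, the far data segment, _DATA, and STACK.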