Release Notes for Version 3.04.00 of the Alpha Assembler

The Release Notes section of the Alpha Assembler documentation contains the most up-to-date information about the assembler's features and use.

1.0  The Microsoft C++ compiler program CL.EXE must be installed in order for the Alpha assembler preprocessor to run correctly. Preprocessing can be optionally bypassed by using the /nopp command-line option.

2.0  Command syntax

ASAXP [options] <filename>

where <filename> is the name of the assembly source file. The ASAXP assembler assumes that a source filename extension of .I means that the source has already been preprocessed. The options are listed below.

3.0  Basic command-line options:

4.0  Advanced command-line options:

5.0  Options provided for compatibility with the ACC and DIGITAL™ UNIX® assembler:

6.0  Options accepted by the assembler, but silently ignored:

7.0  Differences from the DIGITAL UNIX assembler, as0/as1 (a.k.a. acc)

Directives

7.1. Unsupported instructions

The ldgp instruction is silently ignored by ASAXP.

7.2. Code generated

The code generated by ASAXP is meant to be functionally equivalent to as0/as1; however, it is not identical.

7.3. Optimizations

ASAXP makes no attempt to optimize code. The code must be optimized "by hand."

The as0/as1 assembler attempts to keep track of register contents in the emitted code. For instance:

ldiq $3, 0x7fff0001
ldiq $4, 0x7fff0002

is emitted as three instructions by as0/as1 because it realizes the second constant can be built from the first. ASAXP does not do this and will generate four instructions.
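
The instruction counts above can be understood from the usual way a 32-bit constant is built on Alpha: an ldah supplies the upper 16 bits and an lda adds a sign-extended 16-bit displacement. The Python sketch below models that split. The function name split_ldah_lda is hypothetical, the sketch is not the algorithm used by either assembler, and it ignores constants close to 0x7fff8000 whose high part does not fit in 16 signed bits.

```python
def split_ldah_lda(value):
    """Split a constant into (high, low) so that value == (high << 16) + low,
    with low in the signed 16-bit range accepted by lda."""
    low = value & 0xFFFF
    if low >= 0x8000:            # lda sign-extends its 16-bit displacement
        low -= 0x10000
    high = (value - low) >> 16   # operand for ldah
    return high, low


first = 0x7FFF0001
second = 0x7FFF0002

print(split_ldah_lda(first))     # (32767, 1): ldah + lda, two instructions
print(split_ldah_lda(second))    # (32767, 2): two more if built from scratch

# The second constant differs from the first by a value that fits in an lda
# displacement, so one lda relative to $3 is enough: three instructions total.
delta = second - first
print(-0x8000 <= delta <= 0x7FFF)  # True
```

Building each constant independently costs two instructions apiece (four in total, as ASAXP does); reusing the first register saves one.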

The as0/as1 assembler has a similar ability to do this with relocatable expressions, an ability that ASAXP does not have.

The as0/as1 assembler uses special instruction sequences to load constants that are larger than 2**32. ASAXP also tries to reduce constants. Those constants that are not reduced are placed in a constant pool.

7.4 Code scheduling (reordering)

ASAXP uses a post-generation code scheduler to reorder instructions based on the characteristics of the individual instructions in the architecture. This reordering generally results in faster performance. It does, however, affect the line number information used by the debugger. The reordering algorithms used are consistent with as0/as1.

7.5 Corrected instruction encodings

The as0/as1 assembler does not encode the excb, mb, rpcc, trapb, and wmb instructions correctly. ASAXP has the correct encodings.

7.6 jsr Instructions

The as0/as1 assembler transforms jsr instructions with symbolic operands to bsr instructions.

ASAXP generates the long form of the jsr (ldah-lda-jsr sequence using register $at) for such cases. If a bsr instruction is desired, then the code should be modified to use bsr with a symbolic operand.

7.7 Preprocessor

ASAXP uses the CL command to preprocess the source files. One resulting limitation is that the pound (#) character cannot be used to begin an end-of-line comment in #include files.

ASAXP takes extra steps so that the pound (#) character can be used to begin an end-of-line comment in the main source file (the file specified on the command line).

The C++ style comment (//) still works in header files.

7.8 Libc.lib

References to the divide and remainder instructions are converted to calls to special run-time library routines.

These routines can be found in the system library libc.lib.

7.9 Other

The as0/as1 assembler emits local BRADDR relocation records for unconditional branches to local labels. Since branches are pc-relative and the distance to the target is known if the symbol is local, ASAXP fills in the correct branch displacement itself. Relocations are issued for non-local branches as well as branches between sections.
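
As an illustration of why no relocation is needed for a local branch, the Python sketch below computes an Alpha branch displacement from two known addresses. It relies on the architectural rule that the displacement is a signed 21-bit count of instructions relative to the address following the branch. The function name branch_displacement is hypothetical and this is not ASAXP's own code.

```python
def branch_displacement(branch_addr, target_addr):
    """Return the 21-bit branch displacement field for an Alpha branch.

    The target is (branch_addr + 4) + 4 * displacement, so both addresses
    must be longword aligned relative to each other.
    """
    delta = target_addr - (branch_addr + 4)
    if delta % 4 != 0:
        raise ValueError("branch target is not instruction aligned")
    disp = delta // 4
    if not -(1 << 20) <= disp < (1 << 20):
        raise ValueError("target is out of range for a 21-bit displacement")
    return disp


print(branch_displacement(0x1000, 0x1010))  # 3  (forward branch)
print(branch_displacement(0x1010, 0x1000))  # -5 (backward branch)
```

When the target is in another section or is not local, the two addresses are not both known at assembly time, which is why a relocation is still issued in those cases.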

8.0  Features added for version 2.0 to support Windows NT 3.5

8.1. The maximum value for the .align directive is 6, not 4.

Section alignments are adjusted based upon the strictest alignment specified by a .align directive within that section. Please note that the .align directive specifies the low-order bits of the PC to be cleared, not the boundary. To align on a quadword (8 byte) boundary, use .align 3, not .align 8.
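
The relationship between the .align operand and the resulting boundary can be sketched in Python as follows. The helper name align_up is hypothetical; the upper limit of 6 is the one stated in 8.1.

```python
def align_up(pc, n):
    """Round pc up so that its n low-order bits are clear (.align n)."""
    if not 0 <= n <= 6:
        raise ValueError(".align operand must be in the range 0..6")
    boundary = 1 << n                       # .align 3 -> 8-byte boundary
    return (pc + boundary - 1) & ~(boundary - 1)


print(hex(align_up(0x1003, 3)))  # 0x1008: quadword boundary
print(hex(align_up(0x1008, 3)))  # 0x1008: already aligned
print(align_up(1, 6))            # 64: the largest supported boundary
```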

8.2. Command-line options have been updated:

/Zi — Same as /g or /g2. Emit CodeView™ debugging information.

/Zd — Same as /g1. Emit COFF line numbers.

/O0 — Turn off code scheduling optimization.

/O1 — Turn on code scheduling optimization (default).

/nologo — Suppress the logo.

/wnt3.1 — Generate object files compatible with the WNT 3.1 linker.

The following command-line options set the default exception handling flags in the run-time procedure descriptor. Please refer to the Windows NT for Alpha Calling Standard.

/QApdst — Set EXCEPTION_MODE_SILENT (default).

/QApdsg — Set EXCEPTION_MODE_SIGNAL.

/QApdsa — Set EXCEPTION_MODE_SIGNAL_ALL.

/QApdie — Set EXCEPTION_MODE_IEEE.

/QApdca — Set EXCEPTION_MODE_CALLER.

/eflag number — Set the default exception handling.

/symbols_aligned_0mod4 — Symbols are longword granular (default).

/symbols_not_aligned — No attempt is made to align symbols.

/stack_aligned_0mod8 — Stack is aligned on quadword boundary.

/stack_not_aligned — No attempt is made to align the stack.

8.3. Store byte (stb) and store word (stw) pseudo instructions

The store byte (stb) and store word (stw) pseudo instructions use one additional register, $t11. The assembler now uses the additional registers $at, $t9, $t10, and $t11. The stb and stw instructions are implemented as hardware instructions by the 21164-333 and newer processors. (See section 10.5, below.)

8.4. The nomove and noreorder assembler options

The nomove and noreorder assembler options behave identically, as documented under the .set nomove directive.

8.5. Errors and warnings

Errors and warnings are now issued to stdout so they can be optionally redirected.

8.6. Internally generated symbols

Internally generated symbols are now emitted to the COFF symbol table. The generated names have the form $$nnn. Since $$nnn names are also generated by the acc compiler, the ASAXP-generated names are guaranteed to be unique.

8.7 Identifier names can now contain the @ and ? characters.

8.8 The load and store sequences are now longword granular.

8.9 Added a new directive, .tls$, for thread local storage.

8.10 The .extern directive now contains a modifier, .extern (thread) <symbol>

9.0  Features added for version 2.02.

9.1 Procedure linking (/Gy).

The /Gy command-line option causes every procedure to be placed into a section by itself. This is compatible with VC++ function linking.

9.2 Syntax for section directives

The syntax for the section directives .text, .data, .rdata, .tls$, and .sdata has been extended. The new syntax is:

.text [identifier]
.data [identifier]
.rdata [identifier]
.tls$ [identifier]
.sdata [identifier]

The identifier parameter is optional. If specified, that identifier is used to name the section. If identifier is not specified, asaxp uses the standard section name. There is no limit to the length of a section name. The same rules that apply to other identifiers also apply to section names. Section names are maintained in a separate name space from other symbols used in the assembler.

There is a special case of the .text directive. The assembler allows a non-text directive to occur within the scope of a procedure. The use of the .text directive following the non-text directive causes the assembler to reset its context to the procedure. The Alpha Assembler does produce a warning in this case. This feature has been retained in asaxp to support code generated by the acc compiler. For example:

.text dohello # Assign the following code to the dohello section
.ent dohello
dohello:
    lda    $sp, -16($sp)
    stq    $26, 0($sp)
    .prologue
    .rdata hello_world    # Assign the following code to the hello_world
# section. Note that asaxp does produce a warning.
hello:
    .asciiz    "Hello world\n"
.text # Restore context to the dohello section
    lda    $16, hello
    bsr    $26, printf
    ldq    $26, 0($sp)
    lda    $sp, 16($sp)
    ret    $31, ($26), 1
.end

9.3 Interaction between procedure linking and using named sections.

With /Gy in effect, each procedure is placed in a separate section. That section is named .text. There may be multiple .text sections. If the programmer elects to place a procedure into a named section other than .text, that is permissible. There is one caveat: The named section specified by the programmer must be empty. If it is not empty, the procedure linking code will cause the following procedure to be placed into a new .text section.

9.4 Additional error checking on sections.

The assembler does not allow a procedure to be declared in a section other than .text. Instructions occurring in non-text sections cause the assembler to flag an error.

9.5 Symbols defined by a symbolic equate can now be exported:

.global foo
foo = <some constant>

9.6 Subtraction of two identifiers

The subtraction of two identifiers is now permitted as part of a constant expression. An optional expression can follow the symbolic difference.

The expression is limited to addition, subtraction, multiplication, and division binary operators. The normal expression evaluation rules have been altered a bit such that the symbolic difference is assumed to be contained within parentheses. The expression following the binary operator can be any legal constant expression.

foo:
.quad bar - foo * 2 # The result here is 16. 
bar:
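
The altered evaluation rule can be modeled in Python as shown below. This is only a sketch: the function symbolic_difference and the symbol addresses are hypothetical, and division is modeled as integer division, which is an assumption the text above does not confirm.

```python
import operator

# Binary operators permitted after the symbolic difference.
_OPS = {"+": operator.add, "-": operator.sub,
        "*": operator.mul, "/": operator.floordiv}


def symbolic_difference(symbols, left, right, op=None, operand=0):
    """Evaluate 'left - right [op operand]' with the difference grouped first."""
    diff = symbols[left] - symbols[right]    # treated as (left - right)
    if op is None:
        return diff
    return _OPS[op](diff, operand)


# Hypothetical addresses: foo is followed by one .quad (8 bytes), then bar.
symbols = {"foo": 0x100, "bar": 0x108}

print(symbolic_difference(symbols, "bar", "foo", "*", 2))  # 16
print(symbols["bar"] - symbols["foo"] * 2)                 # -248
```

The second line of output shows what ordinary operator precedence would give for the same addresses, which is why the implied parentheses matter.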

10.0  Features added for version 3.01.

10.1 Support for 21164 architecture

Added support for the 21164 architecture family, including support for the new word and byte instructions, architecture mask, and implementation version.

Added command-line options and directives to allow selection of new instructions as they are introduced. The default architecture is currently EV4 with EV5 scheduling.

The term <arch argument> is used to specify the architecture or tuning features (uppercase or lowercase):

EV4 — Available in all chips.

EV5 — For /arch, same as EV4. For /tune, selects scheduling for the 21164 chip architecture.

EV56 — For /arch, emit byte instructions. For /tune use 21164 scheduling.

EV6 — Same as EV56. Will reflect the 21264 chip when available.

HOST — Generate instructions or tune as appropriate for the system on which the code is assembled.

GENERIC — Generate instructions or schedule based on the most prevalent Alpha architecture. At present, this is EV5, but will change in the future.

10.2 New command-line switches for architecture selection

Added command-line options for new Alpha chip architectures. These switches are compatible with VC++ 4.0 and DIGITAL UNIX. The /QA21xxx family of options sets both the architecture and tuning parameters. The /arch, /tune, /QAarch, and /QAtune options work independently of each other.

/QA21064 — Generate and schedule instructions for the EV4 architecture.

/QA21066 — Same as /QA21064.

/QA21064A — Same as /QA21064.

/QA21066A — Same as /QA21064.

/QA21164 — Generate instructions for the EV4 architecture and schedule for the EV5 architecture (default).

/QA21164A — Generate instructions for the EV56 architecture and schedule for the EV5 architecture.

/QA21264 — Generate instructions for the EV56 architecture and schedule for the EV5 architecture (EV6 in future).

/QAarch<arch argument> — Generate instructions based on <arch argument>.

/arch <arch argument> — Generate instructions based on <arch argument>.

/QAtune<arch argument> — Schedule based on <arch argument>.

/tune <arch argument> — Schedule based on <arch argument>.

Some additional options have been added to be silently ignored for compatibility with Microsoft's masm assembler.

10.3 Two new arch directives.

These directives override the command-line options.

.arch <arch argument> — Generate instructions based on <arch argument>.

.tune <arch argument> — Schedule based on <arch argument>.

10.4 Macro substitution in the .repeat/.endr block.

The %r token may be used within an identifier. This expands to the repeat iteration number, starting at 0. For example:

.repeat 3
.globl aglob%r
.endr
This expands to:
.globl aglob0
.globl aglob1
.globl aglob2
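
The %r substitution can be sketched in Python as a simple text expansion. The function expand_repeat is a hypothetical model and does not handle nested .repeat blocks or any other behavior not described above.

```python
def expand_repeat(count, body_lines):
    """Expand a .repeat/.endr body, replacing %r with the iteration number."""
    expanded = []
    for iteration in range(count):           # iterations start at 0
        for line in body_lines:
            expanded.append(line.replace("%r", str(iteration)))
    return expanded


for line in expand_repeat(3, [".globl aglob%r"]):
    print(line)
# .globl aglob0
# .globl aglob1
# .globl aglob2
```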

10.5 Added ldbu, ldwu, stb, and stw instructions for EV56.

These instructions are implemented as pseudo instructions for EV4 and in prior releases of the assembler.

10.6 Added the sextb and sextw instructions for EV56.

These instructions sign extend byte and word respectively. These are new instructions as of ASAXP release 3.01.0. These instructions have the same syntax as the sextl instruction. They are implemented as pseudo instructions for EV4 and EV5. Note that the immediate value versions of these instructions are implemented as pseudo instructions, with the sign extension being performed at assembly time.

10.7 Added encodings for the amask and implver instructions.

These instructions test the architecture and implementation variants of the machine. These instructions are supported on all architectures.

10.8 Architecture Mask Syntax

amask $s_reg, $d_reg
amask $d_reg/$s_reg
amask val_immed, $d_reg

The source register or immediate value represents a mask of architectural extensions requested. Bits corresponding to architectural extensions that are present are cleared; reserved bits and bits corresponding to absent extensions are copied unchanged. The result is placed into the destination register. If the result is zero, all requested features are present. Software may specify a source value of all 1's to determine the complete set of architectural extensions implemented by a processor.

Bit      Feature
0        Byte/Word instructions are present
1..63    Reserved
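
The amask result described above can be modeled in Python as a bitwise operation. The names amask and BYTE_WORD are illustrative only; the second argument stands for the set of extensions the processor implements, which real hardware supplies implicitly.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
BYTE_WORD = 1 << 0           # bit 0: byte/word instructions


def amask(requested, present):
    """Clear the requested bits whose extensions are present; keep the rest."""
    return requested & ~present & MASK64


# On a processor with byte/word support, the requested bit is cleared.
print(amask(1, BYTE_WORD))            # 0: all requested features present
# On a processor without it, the bit is copied unchanged.
print(amask(1, 0))                    # 1: feature absent
# A source of all 1's leaves zeros exactly where extensions are present.
print(hex(amask(MASK64, BYTE_WORD)))  # 0xfffffffffffffffe
```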

10.9 Implementation Version Syntax:

implver $d_reg

A small integer is placed into the destination register. The integer specifies the major implementation version of the processor on which it is executed. This information can be used to make code-scheduling or tuning decisions, or it can be used to branch to different pieces of code optimized for different implementations.

Value    Implementations
0        EV4, EV45, LCA, and LCA45 chips (21064, 21064A, 21066, 21068, and 21066A)
1        EV5 and EV56 chips (21164, 21164A)
2        EV6 and derivative chips (21264, etc.)

10.10 Added EV5 code scheduling.

Code scheduling may be altered by setting the command-line option (as noted in section 10.2) or by using the .tune directive (as noted in section 10.3). Code scheduling is performed on a basic block. In asaxp, this is an extended basic block: a block of code with a single entry point and possibly multiple exit points. In addition, the .prologue directive starts a new basic block. The .tune directive may start a basic block if the tuning context is changed. Conditional branch instructions are included within an extended basic block. This enhances the scheduling in cases where the branch is not taken.

Example using a single source code block for run-time architecture determination:

.text
.tune ev5 # Select 21164 tuning. Will run ok for 21064.
.arch ev4 # Default EV4 architecture
# Prototype: char *cpystr(char *, const char *);
.globl cpystr
.ent cpystr
cpystr:
mov $16, $0 # Save return value
amask 1, $1 # Test if hardware ldbu/stb is present
beq $1, while1 # If bit 0 == 0, go to second sequence.

# Generate 2 sequences of instructions. The first instance
# generates code for the 21064 and the early 21164 chips. 
# The second sequence will cause the ldbu/stb hardware 
# instructions to be generated. These are supported by the 
# 21164-366 and later versions of the 21164 chips.
# Note that .arch EV56 directive at the end of the sequence
# will cause the assembler to switch into EV56 mode. 

 .repeat 2
while%r: # This is while0 for first instance, 
 # while1 for second
 ldbu $1, ($17) # Get source byte 
 stb $1, ($16) # Store to destination
 beq $1, done # If at EOS, return.
 addq $16, 1, $16 # Next destination
 addq $17, 1, $17 # Next source
 br while%r
 .arch ev56 # Change architecture at end
 # of repetition
 .endr
done: 
 ret # Return to the caller
 .end cpystr

10.11 Relocation operands

Relocation operands are generally useful in three situations:

10.12 Support for large tls section.

The /QAltls command-line option turns on large thread local storage for the entire module. Use the relocation operand to control references at an instruction level.

11.0  Features added for release 3.02.0.

Support for EV6 ECO 84, 87, 88, 90, 96. These new instructions require either the .arch ev6 directive or /arch ev6 command-line option. These instructions are generated as primitive instructions for all architectures. However, when these are unimplemented, the hardware will generate an OPDEC trap. Added support for .split directive.

11.1 ECO 84 - SQRT instructions.

SQRTx Fb.rx, Fc.wx

VAX Modes:

SQRTF - F_floating

SQRTG - G_floating

Includes qualifiers s, c, u (e.g., sqrtgsuc).

IEEE Modes:

SQRTS - S_floating

SQRTT - T_floating

Includes qualifiers d, m, c, s, u, i (e.g., sqrtssuic).

11.2 CTPOP/CTLZ/CTTZ

CTPOP Rb.rq, Rc.wq

CTLZ Rb.rq, Rc.wq

CTTZ Rb.rq, Rc.wq

11.3 Integer/Floating Register moves.

FTOIx Fa.rq, Rc.wq

FTOIS - s_floating to longword.

FTOIT - t_floating to quadword.

ITOFx Ra.rq, Fc.wq

ITOFS - Longword to s_floating

ITOFF - Longword to f_floating

ITOFT - Quadword to t_floating

11.4 Instructions for Graphics and Video Algorithms

PERR Ra.rq, Rb.rq, Rc.wq - Pixel Error
MINxxx Ra.rq, Rb.rq, Rc.wq
MAXxxx Ra.rq, Rb.rq, Rc.wq
MAXSB8 - Vector Signed byte Maximum
MAXSW4 - Vector Signed word Maximum
MAXUB8 - Vector Unsigned byte Maximum
MAXUW4 - Vector Unsigned word Maximum
MINSB8 - Vector Signed byte Minimum
MINSW4 - Vector Signed word Minimum
MINUB8 - Vector Unsigned byte Minimum
MINUW4 - Vector Unsigned word Minimum
UNPKBx Rb.rq, Rc.wq
UNPKBL - Unpack Bytes to Longwords
UNPKBW - Unpack Bytes to Words
PKxB Rb.rq, Rc.wq
PKLB - Pack Longwords to Bytes
PKWB - Pack Words to Bytes

11.5 Data Cache Control Instructions

WH64 (Rb.ab) - Write Hint - 64 bytes
ECB (Rb.ab) - Evict Cache Block

11.6 Added support for NTOM and other profiling tools.

This includes the addition of the .split directive. A related change is that data directives (e.g., .long) that are placed within a procedure cause the assembler to generate an additional procedure descriptor such that tools like NTOM may move code segments efficiently.

.split proc_name

The proc_name refers to the procedure name of the associated mainline procedure. The proc_name must have been the subject of a previous .ent directive. The .split directive differs from the .aent directive in that it is part of the referenced procedure, but may not be adjacent to that procedure.

12.0  Features added for 3.03.2

12.1 Support for GEM-generated .s files.

12.2 Added .tlscomm directive.

.tlscomm name, expression[, section identifier]

The .tlscomm directive causes name (unless defined elsewhere) to become a global common symbol at the head of a block of at least expression bytes of storage within the .tls$ section. The linker overlays like-named common blocks, using the expression value of the largest block as the byte size of the overlay. If section identifier is selected, it will be appended to .tls$.

12.3 Added the .comdat directive

.comdat symbol [ comdat type [ section identifier ]]

The .comdat directive declares the referenced symbol to be a comdat symbol. If specified, the comdat type is one of (case insensitive):

"IMAGE_COMDAT_SELECT_NODUPLICATES"

"IMAGE_COMDAT_SELECT_ANY"

"IMAGE_COMDAT_SELECT_SAME_SIZE"

"IMAGE_COMDAT_SELECT_EXACT_MATCH"

"IMAGE_COMDAT_SELECT_ASSOCIATIVE"

"IMAGE_COMDAT_SELECT_LARGEST"

"IMAGE_COMDAT_SELECT_NEWEST"

These correspond to the definitions of the same names in winnt.h.

The "IMAGE_COMDAT_SELECT_" may be omitted as well as "_SIZE" or "_MATCH".

If comdat type is not specified, the comdat type defaults to IMAGE_COMDAT_SELECT_EXACT_MATCH. If the comdat type is IMAGE_COMDAT_SELECT_ASSOCIATIVE, then the section identifier is required, and it must be the name of a section defined within the compilation unit.
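
The accepted spellings of the comdat type can be sketched in Python as a small normalization routine. The function normalize_comdat_type is hypothetical and only models the rules stated above: case-insensitive matching, an optional IMAGE_COMDAT_SELECT_ prefix, optional _SIZE and _MATCH suffixes, and the EXACT_MATCH default.

```python
_PREFIX = "IMAGE_COMDAT_SELECT_"
_TYPES = ["NODUPLICATES", "ANY", "SAME_SIZE", "EXACT_MATCH",
          "ASSOCIATIVE", "LARGEST", "NEWEST"]


def normalize_comdat_type(text=None):
    """Return the full winnt.h name for a comdat type as written in source."""
    if text is None:                          # comdat type omitted
        return _PREFIX + "EXACT_MATCH"
    name = text.upper()                       # names are case insensitive
    if name.startswith(_PREFIX):
        name = name[len(_PREFIX):]
    for full in _TYPES:
        short = full
        for suffix in ("_SIZE", "_MATCH"):    # suffixes that may be omitted
            if short.endswith(suffix):
                short = short[:-len(suffix)]
        if name in (full, short):
            return _PREFIX + full
    raise ValueError("unknown comdat type: " + text)


print(normalize_comdat_type("any"))    # IMAGE_COMDAT_SELECT_ANY
print(normalize_comdat_type("Same"))   # IMAGE_COMDAT_SELECT_SAME_SIZE
print(normalize_comdat_type())         # IMAGE_COMDAT_SELECT_EXACT_MATCH
```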

12.4 Added the .drectve directive.

.drectve string [, string]...

The .drectve directive places linker directives in the object file into the .drectve section. This directive may appear anywhere in the source file, and does not change the current section context. The string must be enclosed in quotation marks ("). Each string is padded by a single space as required by the linker.

.drectve "-defaultlib:libc", "-defaultlib:oldnames"
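
As a rough model of what ends up in the .drectve section for the example above, the Python sketch below joins the strings with one padding space each. The text does not say whether the space precedes or follows each string; a trailing space is assumed here, and the function name drectve_contents is hypothetical.

```python
def drectve_contents(strings):
    """Concatenate linker directives, padding each with one space (assumed trailing)."""
    return "".join(s + " " for s in strings).encode("ascii")


print(drectve_contents(["-defaultlib:libc", "-defaultlib:oldnames"]))
# b'-defaultlib:libc -defaultlib:oldnames '
```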

12.5 Added the .section directive

.section section-name[, COFF section name] [section-attribute[, section-attribute]...]

Where section-attribute is one of the IMAGE_SCN_ manifest constants that are defined in winnt.h. Either the full attribute name (e.g., IMAGE_SCN_xxx) or just its suffix (xxx) may be used.

For alignment purposes, the assembler treats sections with the CNT_CODE or MEM_EXECUTE characteristic as text-type sections, and generates nops to fill gaps. All other sections are treated as data, and the assembler fills gaps with binary zeros. The assembler uses whatever values are selected for the object file section header, and makes no attempt to warn in the event of a conflict. Section attributes are ORed together bitwise.

It is important that thread-local-storage data sections contain the section name beginning with .tls. This affects relocations. However, relocation operands can supply the correct relocations.

The COFF section name is a quoted string containing the name that will be placed into the symbol table for this section. The assembler does not look at this name. If not supplied, the section name will be the asaxp section name.

If the section directive is used without characteristics, a lookup for that name is performed, and the first section in the section table matching that name is chosen. If attributes are supplied, the lookup will match by both name and attributes. If the assembler is unable to find an appropriate match, a new section is created. Also note that there are standard section names. The assembler creates empty sections with names of .text, .rdata, .xdata, .pdata, .data, .sdata, .tls$, .debug$, .debug$, .drectve, and .bss.

.section .text
# Look up a section named ".text", and set context to that section.
.section gallagher
# Look up a section named gallagher. If found, the context is changed
# to the gallagher section. If not found, an error is generated concerning
# unknown attributes
.section .text "CNT_CODE", "MEM_EXECUTE", "MEM_WRITE"
# The assembler will search for a section containing these attributes with
# a name .text and create one if one is not found. The assembler will 
# not make any judgements even though the attributes conflict.
Attribute name: 
IMAGE_SCN_CNT_CODE
IMAGE_SCN_CNT_INITIALIZED_DATA
IMAGE_SCN_CNT_UNINITIALIZED_DATA
IMAGE_SCN_LNK_OTHER
IMAGE_SCN_LNK_INFO
IMAGE_SCN_LNK_REMOVE
IMAGE_SCN_LNK_COMDAT
IMAGE_SCN_ALIGN_1BYTES
IMAGE_SCN_ALIGN_2BYTES
IMAGE_SCN_ALIGN_4BYTES
IMAGE_SCN_ALIGN_8BYTES
IMAGE_SCN_ALIGN_16BYTES
IMAGE_SCN_ALIGN_32BYTES
IMAGE_SCN_ALIGN_64BYTES
IMAGE_SCN_ALIGN_1KBYTES
IMAGE_SCN_MEM_DISCARDABLE
IMAGE_SCN_MEM_NOT_CACHED
IMAGE_SCN_MEM_NOT_PAGED
IMAGE_SCN_MEM_SHARED
IMAGE_SCN_MEM_EXECUTE
IMAGE_SCN_MEM_READ
IMAGE_SCN_MEM_WRITE
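
How a list of attributes becomes a single characteristics value can be sketched in Python, assuming the attributes are combined with a bitwise OR. The numeric values are those defined for these constants in winnt.h; only a subset is shown, and the function names are hypothetical.

```python
# Subset of the IMAGE_SCN_ constants, keyed by suffix (values from winnt.h).
IMAGE_SCN = {
    "CNT_CODE": 0x00000020,
    "CNT_INITIALIZED_DATA": 0x00000040,
    "CNT_UNINITIALIZED_DATA": 0x00000080,
    "MEM_EXECUTE": 0x20000000,
    "MEM_READ": 0x40000000,
    "MEM_WRITE": 0x80000000,
}


def section_characteristics(*names):
    """OR together attributes given by full name or by suffix."""
    flags = 0
    for name in names:
        if name.startswith("IMAGE_SCN_"):
            name = name[len("IMAGE_SCN_"):]
        flags |= IMAGE_SCN[name]
    return flags


def is_text_type(flags):
    """Text-type sections (nop fill) have CNT_CODE or MEM_EXECUTE set."""
    return bool(flags & (IMAGE_SCN["CNT_CODE"] | IMAGE_SCN["MEM_EXECUTE"]))


flags = section_characteristics("CNT_CODE", "MEM_EXECUTE", "MEM_WRITE")
print(hex(flags))           # 0xa0000020
print(is_text_type(flags))  # True
print(is_text_type(section_characteristics("IMAGE_SCN_MEM_READ")))  # False
```

The first call mirrors the third .section example above; the sketch, like the assembler, does not object to the conflicting combination.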

13.0  Features added for 3.04.00

13.1 Support for /nosplit and /noaent command-line options.

The /nosplit option prevents the assembler from automatically generating split procedures.

The /noaent option disallows the specification of alternate entry points.

13.2 Added support for .set[no]aent and .set[no]split directives.

These directives instruct the assembler to enable or disable specific options. The assembler uses aent and split as default options.

13.3 Support /Ap64 command-line option.

This option directs the assembler to generate code using a 64-bit addressing model.

13.4 Support /Gtn command-line option and GP-relative addressing for .sdata items.

When /Gtn is specified on the command line, data items specified with a size less than or equal to n are addressed with GP-relative addressing (rather than with ldah/lda pairs). The assembler assumes that the defining instance of such an external is declared in the .sdata section in another object file.

13.5 Added optional "expected size" argument to .extern directive.

An optional "expected size" argument may now be specified with a .extern directive. It allows correct placement of the extern in .sdata by the assembler.
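
The /Gtn size rule described above amounts to a threshold test, sketched here in Python. The function uses_gp_relative is hypothetical, and treating an absent /Gt option as "never GP-relative" is an assumption made for the sketch.

```python
def uses_gp_relative(size, gt_threshold=None):
    """Decide whether an item of the given size is addressed GP-relative.

    gt_threshold is the n from /Gtn, or None when /Gt was not specified
    (assumed here to mean GP-relative addressing is not used).
    """
    if gt_threshold is None:
        return False
    return size <= gt_threshold


print(uses_gp_relative(8, 8))    # True: size equal to n qualifies
print(uses_gp_relative(16, 8))   # False: falls back to ldah/lda pairs
print(uses_gp_relative(4))       # False: /Gtn not given
```

For an external symbol, the size passed in would be the "expected size" given on the .extern directive.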

13.6 Changed allocation of .bss and .sbss sections.

The allocation of symbols within the .bss and .sbss sections has been changed so that it corresponds to the order in which .lcomm directives appear in the assembler source.

13.7 Changed naming convention used for internally generated labels.

The naming convention used for internally generated labels has been changed from ~ngenlab to ~Ln.

13.8 Added support for extension relocations.

The previous limit on the number of relocations that could be applied to an individual section (65535) has been removed due to the implementation of extension relocations.