A Comprehensive Examination of the Microsoft(R) C Version 6.0 Startup Code

Kevin J. Goodman

Kevin J. Goodman is the manager of communications applications for DCA, where he develops communications programs for a variety of operating systems.


As you might guess, C startup code does everything necessary to "start up" a C program. The startup code is part of the run-time library that initializes and terminates all programs compiled with Microsoft(R) C. It performs all processing prior to calling main( ) (the point most programmers consider the start of a C program) and all operations necessary to shut down after returning from main( ). The C linker places the startup code above and below the compiled code and attaches the combined image just after the header information in all EXE files. When you run the program, the MS-DOS(R) operating system reads the EXE header, performs segment and stack fixups, then begins executing the startup code. Figure 1 shows the relationship between your code, the startup code, the linker, and the MS-DOS loader. Unfortunately, the startup code is not fully documented. Even though Microsoft now publishes the startup code, I have noticed many programmers writing code that duplicates the startup code’s functionality: either they don’t know that the startup code listings exist or they’ve been unable to decipher them.

Figure 1 How Startup Code Is Added to all C Programs

I’ll explain the startup code used in C version 6.00AX, document its functions and variables, and describe how to take advantage of its undocumented features. I’ll also discuss the differences between the startup code in versions 5.1 and 6.00AX.

About thirty assembly language and C files make up the startup code (see Figure 2). These source files come with the C compiler starting with version 6.0. In previous versions, the startup sources were sold as part of the C Run-time Library Source. You can get a copy of the sources by saying yes to the "Copy C startup sources?" question when installing C 6.0 or C 6.00AX. You might want to pull these files into your editor and follow along as they’re discussed.

Figure 2 C 6.0 Startup Source Files

Directory of C600\SOURCE\STARTUP
File Name

CHKSTK.ASM
CHKSUM.ASM
CMACROS.INC
CRT0FP.ASM
_FILE.C
FILE2.H
FMSGHDR.ASM
HEAP.INC
MAKEFILE
MSDOS.H
MSDOS.INC
NULBODY.C
RCHKSTK.ASM
REGISTER.H
RTERR.INC
SETARGV.ASM
STARTUP.BAT
VERSION.INC
WILD.C
Directory of C600\SOURCE\STARTUP\DOS
File Name

CRT0.ASM
CRT0DAT.ASM
CRT0MSG.ASM
EXECMSG.ASM
NMSGHDR.ASM
NULBODY.LNK
STDALLOC.ASM
STDARGV.ASM
STDENVP.ASM
Directory of C600\SOURCE\DOC
File Name

STARTUP.DOC

Included with the startup code is a file, STARTUP.DOC, that details how to rebuild the startup code. Similar startup code is used for Microsoft Windows(TM)-based programs, with the addition of WINSTART.ASM. Because Microsoft does not currently distribute the source for WINSTART.ASM, this discussion pertains only to MS-DOS-based programs.

The startup code, as mentioned, is completely transparent to the programmer. In fact, it’s possible that many C programmers don’t even know that the startup code exists. They don’t have to because a neat trick ensures that the startup code is included in all C programs. Every source file that contains a main, LibMain, or WinMain function includes an external reference to the absolute variable __acrtused when compiled with version 6.00AX. In C 5.1, all compiled C functions include this reference. Figure 3 shows the assembly language output of a program compiled with the /Fa switch. The __acrtused variable is located in CRT0.ASM; the absolute reference in the object file forces the linker to include CRT0.OBJ from the run-time library file. CRT0.ASM has other absolute references that force the linker to include the rest of the startup code. This is why you get so many unresolved externals when you forget to link in a standard library. Figure 4 shows the contents of NULBODY.MAP. Notice all of the variables that are defined even though none were declared.

Figure 3 Assembly Language for NULBODY.C

CL /c /Fa nulbody.c

TITLE nulbody.c

;.8087

INCLUDELIB SLIBCE

_TEXT SEGMENT WORD PUBLIC 'CODE'

_TEXT ENDS

_DATA SEGMENT WORD PUBLIC 'DATA'

_DATA ENDS

CONST SEGMENT WORD PUBLIC 'CONST'

CONST ENDS

_BSS SEGMENT WORD PUBLIC 'BSS'

_BSS ENDS

DGROUP GROUP CONST, _BSS, _DATA

ASSUME DS: DGROUP, SS: DGROUP

EXTRN __acrtused:ABS

_TEXT SEGMENT

ASSUME CS: _TEXT

; Line 1

PUBLIC _main

_main PROC NEAR

; Line 2

ret

nop

_main ENDP

_TEXT ENDS

END

Figure 4 Output of NULBODY.MAP

Address Variable Name Type Located Description

0077:00B6 STKHQQ WORD CHKSTK.ASM Used to determine if enough stack is available
0077:01BC _edata BYTE CRT0.ASM End of data
0077:01C0 _end BYTE CRT0.ASM End of bss
0077:00A3 _environ FAR PTR CRT0DAT.ASM Pointer to environment
0077:007C _errno WORD CRT0DAT.ASM Global error code variable
0000:01CE _exit function CRT0DAT.ASM Termination function
0000:0010 _main function NULBODY.C Dummy program
0000:058E _malloc function XLIBCE.LIB Standard library routine; used by startup
0077:00B4 __aaltstkovr WORD CHKSTK.ASM Stack overflow for Pascal/FORTRAN
0077:0060 __acfinfo label CRT0DAT.ASM Points to _C_FILE_INFO string
0000:9876 Abs __acrtmsg label CRT0MSG.ASM Forces inclusion of RTE messages
0000:9876 Abs __acrtused label CRT0.ASM Forces inclusion of startup
0000:D6D6 Abs __aDBdoswp label CRT0.ASM QuickC debug info
0077:00B2 __adbgmsg WORD CRT0MSG.ASM FORTRAN $DEBUG info
0077:0044 __aexit_rtn PTR CRT0.ASM Points to error exit routine
0077:006E __aintdiv FAR PTR CRT0DAT.ASM Holder for divide-by-zero vector
0077:00BA __amblksiz WORD GROWSEG.ASM Part of standard library--brought in when malloc is used
0000:00DE __amsg_exit PTR CRT0.ASM Points to error exit routine
0000:02C4 __aNchkstk label CHKSTK.ASM Used in small and medium model for chkstk function
0077:0042 __anullsize label CHKSUM.ASM Points to end of null segment
0077:005C __aseghi WORD CRT0.ASM Used by QuickC
0077:005E __aseglo WORD CRT0.ASM Used by QuickC
0077:0046 __asizds WORD CRT0.ASM Size of DGROUP
0000:0016 __astart label CRT0.ASM True start of C program
0077:0042 __atopsp WORD CRT0.ASM Top of stack
0000:01DD __cexit function CRT0DAT.ASM Terminate function
0077:00AA __child flag CRT0DAT.ASM Used to indicate that a child process is executing
0000:02C4 __chkstk function CHKSTK.ASM Used to determine if enough stack is available; automatically called in all functions unless the /Gs switch is used at compile time
0000:0100 __cinit function CRT0DAT.ASM Called during startup
0000:00CE __cintDIV label CRT0.ASM Divide-by-zero interrupt vector points here
0000:024F __ctermsub function CRT0DAT.ASM Terminate function
0000:01E7 __c_exit function CRT0DAT.ASM Terminate function
0000:00FD __dataseg WORD CRT0.ASM Segment value--contains DGROUP
0077:0087 __doserrno WORD CRT0DAT.ASM DOS error number
0077:0084 __dosvermajor label CRT0DAT.ASM Points to _osmajor
0077:0085 __dosverminor label CRT0DAT.ASM Points to _osminor
0076:0000 __EmDataSeg PTR CRT0DAT.ASM Pointer to EMULATOR_DATA segment
0000:01D5 __exit function CRT0DAT.ASM Terminate function
0077:0072 __fac TBYTE CRT0DAT.ASM Floating point accumulator
0000:029E __FF_MSGBANNER function CRT0MSG.ASM Prints out first part of the run-time error messages
0000:0735 __findlast function GROWSEG.ASM Part of standard library functions--used to find last entry in heap descriptor table
0077:00CE __fpinit label CRT0.ASM Used by startup to determine if floating point is loaded
0000:02BE __fptrap function CRT0FP.ASM Trap for missing floating point software
0000:0658 __growseg function GROWSEG.ASM Part of standard library function
0000:06E4 __incseg function GROWSEG.ASM Part of standard library function
0077:00AD __intno BYTE CRT0DAT.ASM Interrupt number for overlay handler
0000:056A __myalloc function STDALLOC.ASM Allocates heap for wildcards and environment
0077:0089 __nfile WORD CRT0DAT.ASM Maximum number of file handles
0000:0592 __nfree function NMALLOC.ASM Near free function
0077:0048 __nheap_desc structure CRT0.ASM Near heap descriptor table (defined in HEAP.INC)
0000:05B3 __nmalloc function NMALLOC.ASM Near malloc routine
0000:050A __NMSG_TEXT function NMSGHDR.ASM Part of the RTE message routines
0000:0535 __NMSG_WRITE function NMSGHDR.ASM Routine used to print out RTE messages
0000:02DC __nullcheck function CHKSUM.ASM Checks for null pointer assignments
0077:0087 __oserr label CRT0DAT.ASM Same as doserrno
0077:008B __osfile structure CRT0DAT.ASM Holds file handles
0077:0084 __osmajor BYTE CRT0DAT.ASM OS major revision
0077:0085 __osminor BYTE CRT0DAT.ASM OS minor revision
0077:0086 __osmode BYTE CRT0DAT.ASM Flag to indicate real (0) or protected mode (nonzero)
0077:0084 __osversion label CRT0DAT.ASM Same as dosvermajor
0077:00AC __ovlflag BYTE CRT0DAT.ASM Flag to indicate if overlays are in use (0 = no)
0077:00AE __ovlvec FAR PTR CRT0DAT.ASM Pointer to original overlay handler
0077:00A5 __pgmptr PTR CRT0DAT.ASM Pointer to program name
0077:0082 __psp WORD CRT0DAT.ASM Program Segment Prefix (segment)
0077:0080 __pspadr PTR CRT0DAT.ASM Program Segment Prefix (offset)
0000:05DC __searchseg function SEARCHSEG.ASM Part of standard library routines
0000:02FE __setargv function STDARGV.ASM Processes command line
0000:048C __setenvp function STDENVP.ASM Get environment from env segment
0077:007E __umaskval WORD UMASK.ASM File permission mask (not part of startup)
0077:00C4 ___aDBexit WORD CRT0.ASM QuickC debug info
0077:00CA ___aDBptrchk WORD CRT0.ASM QuickC debug info
0077:00C2 ___aDBrterr WORD CRT0.ASM QuickC debug info
0077:00C0 ___aDBswpchk WORD CRT0.ASM QuickC debug info
0077:00BE ___aDBswpflg WORD CRT0.ASM QuickC debug info
0077:009F ___argc WORD CRT0.ASM Count of command-line arguments
0077:00A1 ___argv DWORD CRT0.ASM Command-line arguments
0077:00BC ___qczrinit WORD CRT0.ASM QuickC debug info

The Startup Process

The true beginning of a C program is located in the CRT0.ASM file. The first line of code, labeled __astart, comes after some variables are defined. The main function doesn’t get called until much later. You may have seen the __astart label if you have done any debugging in the CodeView(R) debugger in unassembled mode. After __astart the first thing the startup does is check to make sure the MS-DOS version is 2.0 or later. If it is not, the program abruptly aborts and sends you back to the MS-DOS prompt. No explanations. No messages. Nothing. Perhaps this is one way to punish users who are still on MS-DOS 1.0. If the version is 2.0 or later, the startup checks if there is enough stack space. The default stack is set to the constant STACK_SIZE, which is 2048 bytes in MS-DOS. If 2KB is not sufficient, you can change STACK_SIZE, which is in CRT0.ASM, or link with the /STACK option. If there is less stack available than requested the startup aborts, but this time with a run-time error message to the user. In small- and medium-model programs the stack is always a part of DGROUP. In large- and compact-model programs the stack can be in its own segment, if necessary. If your program requires a large amount of stack space or if you want a full 64KB for near data, you can rebuild the startup code with the /DFARSTACK option. If you do decide you need a far stack, all of your programs must operate with SS != DS.
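
The two checks described above can be summarized in a short C sketch. This is only an illustration of the decision order, not code from CRT0.ASM; the function name, the enum, and its values are hypothetical, and the real startup works on registers and DOS calls rather than parameters.

```c
#define STACK_SIZE 2048u   /* default stack size named in the article */

/* Hypothetical result codes for this sketch; not names from CRT0.ASM. */
enum precheck { PRECHECK_OK, PRECHECK_OLD_DOS, PRECHECK_NO_STACK };

/* dos_major:    major MS-DOS version number
   stack_avail:  bytes actually available for the stack
   stack_wanted: bytes requested (STACK_SIZE or the /STACK value) */
enum precheck startup_precheck(unsigned dos_major,
                               unsigned long stack_avail,
                               unsigned long stack_wanted)
{
    if (dos_major < 2)
        return PRECHECK_OLD_DOS;    /* the real startup exits silently here */
    if (stack_avail < stack_wanted)
        return PRECHECK_NO_STACK;   /* the real startup prints a run-time error */
    return PRECHECK_OK;
}
```

The version test comes first, which is why a program run under MS-DOS 1.x never gets far enough to print anything.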

The heap has been totally reorganized for version 6.0. This change was necessary to support COM programs and based pointers. All heap variables are now in the structure heap_seg_desc defined in HEAP.INC.

_heap_seg_desc struc

checksum dw ? ; Checksum area

flags dw ? ; Flags word

segsize dw ? ; Size of segment

start dw ? ; Offset of first heap entry

rover dw ? ; Rover offset

last dw ? ; Offset to end-of-heap marker

nextseg dd ? ; Far pointer to next

; heap_seg_desc

prevseg dd ? ; Far pointer to

; previous heap_seg_desc

_heap_seg_desc ends

This change in memory management may render some TSRs useless. If your TSR worked fine under C 5.1 and it won’t link or behaves erratically under C 6.00AX, you may have been using a variable that is no longer defined or has a different meaning. For example, the __abrktb variable that was a convenient way to determine the end of your program is no longer in version 6.00AX. The variable named last in the heap_seg_desc struc now serves the same purpose. In small- and medium-model programs the initial heap is carved out of the bottom of the stack. The size of the heap is 64KB minus the length of DGROUP.
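
For readers more comfortable in C than in MASM, the descriptor can be mirrored as a C structure. This is a sketch under stated assumptions: the field names come from the HEAP.INC listing, far pointers are modeled as plain 32-bit values so that the code compiles on a current compiler, and heap_span is a hypothetical helper that is not part of the run-time library.

```c
#include <stddef.h>
#include <stdint.h>

/* C mirror of _heap_seg_desc (field names taken from the listing).
   dw fields become 16-bit integers; dd far pointers are modeled as
   32-bit segment:offset values, which is an assumption of this sketch. */
struct heap_seg_desc {
    uint16_t checksum;  /* checksum area */
    uint16_t flags;     /* flags word */
    uint16_t segsize;   /* size of segment */
    uint16_t start;     /* offset of first heap entry */
    uint16_t rover;     /* rover offset */
    uint16_t last;      /* offset to end-of-heap marker */
    uint32_t nextseg;   /* far pointer to next heap_seg_desc */
    uint32_t prevseg;   /* far pointer to previous heap_seg_desc */
};

/* Hypothetical helper: bytes between the first heap entry and the
   end-of-heap marker. */
unsigned heap_span(const struct heap_seg_desc *d)
{
    return (unsigned)(d->last - d->start);
}
```

Six words followed by two doublewords give a 20-byte descriptor, with last at offset 10.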

After the stack and heap are set up, an interesting thing happens. The startup checks to see if any uninitialized global or static data is stored in the _BSS segment, and if so sets it to zero. This type of data is placed in the _BSS segment at compile time. In Figure 5 the variables b and d are placed in the _DATA segment while variables a and c, which are uninitialized, get placed in the _BSS segment. This brings up an important point. If your program relies on data being zero at run time, you can initialize it explicitly at compile time, which places it in _DATA and avoids the extra overhead of clearing it during startup, allowing your programs to load more quickly.
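
The effect of this step is visible from C: objects with static storage duration and no initializer always read as zero before main runs. The sketch below checks that for two variables modeled on Figure 5; the names fig5_a, fig5_c, all_zero, and bss_is_zeroed are invented for the example.

```c
#include <stddef.h>

char fig5_a[20];        /* uninitialized global, like a in Figure 5 */
static int fig5_c;      /* uninitialized static, like c in Figure 5 */

/* Returns 1 if every byte in buf[0..n-1] is zero, otherwise 0. */
int all_zero(const void *buf, size_t n)
{
    const unsigned char *p = (const unsigned char *)buf;
    size_t i;

    for (i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Returns 1 if both uninitialized objects were cleared before use. */
int bss_is_zeroed(void)
{
    return all_zero(fig5_a, sizeof fig5_a) && all_zero(&fig5_c, sizeof fig5_c);
}
```

Under MS-DOS it is the startup code that makes this guarantee hold, because the loader does not clear memory for the program.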

Figure 5 MAIN.C

MAIN.C

void main(void);

char a[20];

char b[20]={0};

void main(void)

{

static c;

static d=5;

}

Assembly Listing of MAIN.C

; Static Name Aliases

;

; $S104_c EQU c

; $S105_d EQU d

TITLE main.c

.8087

INCLUDELIB SLIBCE

_TEXT SEGMENT WORD PUBLIC 'CODE'

_TEXT ENDS

_DATA SEGMENT WORD PUBLIC 'DATA'

_DATA ENDS

CONST SEGMENT WORD PUBLIC 'CONST'

CONST ENDS

_BSS SEGMENT WORD PUBLIC 'BSS'

_BSS ENDS

DGROUP GROUP CONST, _BSS, _DATA

ASSUME DS: DGROUP, SS: DGROUP

PUBLIC _b

EXTRN __acrtused:ABS

EXTRN __aNchkstk:NEAR

_BSS SEGMENT

COMM NEAR _a: 1: 20

_BSS ENDS

_DATA SEGMENT

_b DB 00H

DB 19 DUP(0)

$S105_d DW 05H

_DATA ENDS

_BSS SEGMENT

$S104_c DW 01H DUP (?)

_BSS ENDS

_TEXT SEGMENT

ASSUME CS: _TEXT

; Line 1

; Line 9

PUBLIC _main

_main PROC NEAR

xor ax,ax

call __aNchkstk

; Line 13

ret

_main ENDP

_TEXT ENDS

END

After cleaning up the _BSS, the startup sets up the environment. The environment consists of an array of ASCIIZ strings and an array of pointers to them. The variable _environ points to the environment strings, each of which looks like the typical varname=string that you see when you execute the MS-DOS set command. Two null bytes in a row indicate the end of the environment (one null byte for the last environment string and one for the end of the environment array itself). In STDENVP.ASM, the setenvp function searches for these two null bytes and when it finds them copies the environment into the heap. The standard library functions putenv and getenv work only on this local copy of the environment, not the master copy that MS-DOS keeps. That’s why an environment variable you add while your program is running disappears when your program terminates. If your TSR does not use any environment variables, you may want to edit CRT0.ASM (see Figure 6) and comment out the call to setenvp. The resident size of your TSR will then be reduced by the size of the environment copy.
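
The layout of the environment block is easy to see in a small C sketch. The code below walks a block of back-to-back ASCIIZ strings that ends with an extra null byte, as described above. It is an illustration only, not a translation of STDENVP.ASM; env_scan and env_find are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Walk an environment block: ASCIIZ strings stored back to back, with an
   empty string (a second null byte) marking the end. Stores up to max
   string pointers in out[] and returns the number of strings found. */
size_t env_scan(const char *block, const char **out, size_t max)
{
    size_t n = 0;

    while (*block != '\0') {
        if (n < max)
            out[n] = block;
        n++;
        block += strlen(block) + 1;   /* skip the string and its null */
    }
    return n;
}

/* Lookup in the spirit of getenv: returns the text after "name=" or
   NULL if the variable is not present in the block. */
const char *env_find(const char *block, const char *name)
{
    size_t len = strlen(name);

    for (; *block != '\0'; block += strlen(block) + 1)
        if (strncmp(block, name, len) == 0 && block[len] == '=')
            return block + len + 1;
    return NULL;
}
```

Because the functions only read the block they are given, they also show why changes made to a private copy never reach the master environment.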

Figure 6 Partial Listing of CRT0DAT.ASM

;***

;crt0dat.asm - DOS and Windows shared startup and termination

; Copyright (c) 1985-1990, Microsoft Corporation. All rights reserved.

; Purpose: Shared startup and termination.

; NOTE: This source is included in crt0.asm for assembly purposes

; when building .COM startup. This is so the .COM startup resides

; in a single special object that can be supplied to the user.

;*******************************************************************************

_NFILE_ = 20 ; Maximum number of file handles

?DF = 1 ; tell cmacros.inc we want to define our own segments

.xlist

include version.inc

include cmacros.inc

include msdos.inc

.list

ifdef FARSTACK

ife sizeD

error <You cannot have a far stack in Small or Medium memory models.>

endif

endif

ifdef _COM_

if sizeC or sizeD

error <Must use Small memory model for .COM files.>

endif

endif ;_COM_

.

.

.

sBegin xiqcseg

globalW __qczrinit, 0 ;* QC -Zr initializer call address

sEnd xiqcseg

ifdef _COM_

sBegin EmData

labelB _EmDataLabel

sEnd EmData

sBegin EmCode

globalW _EmDataSeg,0

sEnd EmCode

else ;not _COM

EMULATOR_DATA segment para public 'FAR_DATA'

EMULATOR_DATA ends

EMULATOR_TEXT segment para public 'CODE'

public __EmDataSeg

__EmDataSeg dw EMULATOR_DATA

EMULATOR_TEXT ends

endif ;not _COM_

sBegin data

assumes ds,data

; special C environment string

labelB <PUBLIC,_acfinfo>

cfile db '_C_FILE_INFO='

cfilex db 0

cfileln = cfilex-cfile

globalD _aintdiv,0 ; divide error interrupt vector save

globalT _fac,0 ; floating accumulator

globalW errno,0 ; initial error code

globalW _umaskval,0 ; initial umask value

;=============== following must be in this order

globalW _pspadr,0 ; psp:0 (far * to PSP segment)

globalW _psp,0 ; psp:0 (paragraph #)

;=============== above must be in this order

;=============== following must be in this order

labelW <PUBLIC,_osversion>

labelB <PUBLIC,_dosvermajor>

globalB _osmajor,0

labelB <PUBLIC,_dosverminor>

globalB _osminor,0

;=============== above must be in this order

globalB _osmode,0 ; 0 = real mode

labelW <PUBLIC,_oserr>

globalW _doserrno,0 ; initial DOS error code

globalW _nfile,_NFILE_ ; maximum number of file handles

labelB <PUBLIC,_osfile>

db 3 dup (FOPEN+FTEXT) ; stdin, stdout, stderr

db 2 dup (FOPEN) ; stdaux, stdprn

db _NFILE_-5 dup (0) ; the other 15 handles

globalW __argc,0

globalDP __argv,0

globalDP environ,0 ; environment pointer

labelD <PUBLIC,_pgmptr> ; pointer to program name

dw dataOFFSET dos2nam

ifdef _COM_

dw 0 ; No relocations in tiny model

elseifdef _QC2

dw 0 ; No DGROUP references allowed

elseifdef _WINDOWS

dw 0 ; No DGROUP references allowed

else ;DEFAULT

dw DGROUP

endif

dos2nam db 0 ; dummy argv[0] for DOS 2.X

; signal related common data

globalW _child,0 ; flag used to handle signals from child process

;Overlay related data

globalB _ovlflag,0 ; Overlay flag (0 = no overlays)

globalB _intno,0 ; Overlay interrupt value (e.g., 3F)

globalD _ovlvec,0 ; Address of original overlay handler

sEnd data

page

externNP _fptrap

externP _cintDIV

externP _nullcheck

ifdef FARSTACK

endif

sBegin code

assumes cs,code

if sizeC

global proc far

endif

page

;***

;_cinit - C initialization

; This routine performs the shared DOS and Windows initialization.

; The following order of initialization must be preserved -

; 1. Integer divide interrupt vector setup

; 2. Floating point initialization

; 3. Copy ;C_FILE_INFO into _osfile

; 4. Check for devices for file handles 0 - 4

; 5. General C initializer routines

;*******************************************************************************

cProc _cinit,<PUBLIC>,<>

cBegin <nogen> ; no local frame to set up in standard libs

assumes ds,data

ifndef FARSTACK

assumes ss,data

endif

; Initialize the DGROUP portion of _pgmptr. We must do this at

; runtime since there are no load-time fixups in .COM files.

ifdef _COM_

mov word ptr [_pgmptr+2],ds ; init seg portion of _pgmptr

endif ;_COM_

; *** Increase File Handle Count ***

; (1) This code only works on DOS Version 3.3 and later.

; (2) This code is intentionally commented out; the user must enable

; this code to access more than 20 files.

; mov ah,67h ; system call number

; mov bx,_NFILE_ ; number of file handles to allow

; callos ; issue the system call

; ;check for error here, if desired (if carry set, AX equals error code)

; *** End Increase File Handle Count ***

; 1. Integer divide interrupt vector setup

mov ax,DOS_getvector shl 8 + 0

callos ; save divide error interrupt

mov word ptr [_aintdiv],bx

mov word ptr [_aintdiv+2],es

push cs

pop ds

assumes ds,nothing

mov ax,DOS_setvector shl 8 + 0

mov dx,codeOFFSET _cintDIV

callos ; set divide error interrupt

push ss

pop ds

assumes ds,data

; 2. Floating point initialization

if memS

cmp word ptr [fpmath], 0 ; Note: make sure offset _ _fpmath != 0

je nofloat_i

mov word ptr [fpmath+2], cs ; fix up these far addresses

mov word ptr [fpsignal+2], cs ; in the small model math libs

ifdef _COM_

mov [_EmDataSeg], cs

mov ax, offset DGROUP:_EmDataLabel

sub ax, offset EMULATOR_DATA:_EmDataLabel

mov cl, 4

shr ax, cl

add [_EmDataSeg], ax

endif ;_COM_

else ;not memS

mov cx,word ptr [fpmath+2]

jcxz nofloat_i

endif ;not memS

mov es,[_psp] ; psp segment

mov si,es:[DOS_ENVP] ; environment segment

ifdef FARSTACK

mov ax, word ptr [fpdata]

mov dx, word ptr [fpdata+2]

else

lds ax,[fpdata] ; get task data area

assumes ds,nothing

mov dx,ds ; into dx:ax

endif

xor bx,bx ; (si) = environment segment

call [fpmath] ; fpmath(0) - init

ifdef FARSTACK

mov ax, DGROUP

mov ds, ax

endif

jnc fpok

ifndef FARSTACK

push ss ; restore ds from ss

pop ds

endif

jmp _fptrap ; issue "Floating point not loaded"

; error and abort

fpok:

ifdef FARSTACK

mov ax, word ptr [fpsignal]

mov dx, word ptr [fpsignal+2]

else

lds ax,[fpsignal] ; get signal address

assumes ds,nothing

mov dx,ds

endif

mov bx,3

call [fpmath] ; fpmath(3) - set signal address

ifdef FARSTACK

mov ax, DGROUP

mov ds, ax ; restore DS=DGROUP

else

push ss

pop ds

assumes ds,data

endif

nofloat_i:

; 3. Copy _C_FILE_INFO= into _osfile

; fix up files inherited from parent using _C_FILE_INFO=

mov es,[_psp] ; es = PSP

mov cx,word ptr es:[DOS_envp] ; es = user’s environment

jcxz nocfi ; no environment !!!

mov es,cx

xor di,di ; start at 0

cfilp:

cmp byte ptr es:[di],0 ; check for end of environment

je nocfi ; yes - not found

mov cx,cfileln

mov si,dataOFFSET cfile

repe cmpsb ; compare for ‘_C_FILE_INFO=’

je gotcfi ; yes - now do something with it

mov cx,07FFFh ; environment max = 32K

xor ax,ax

repne scasb ; search for end of current string

jne nocfi ; no 00 !!! - assume end of env.

jmp cfilp ; keep searching

; found _C_FILE_INFO= and transfer info into _osfile

gotcfi:

push es

push ds

pop es ; es = DGROUP

pop ds ; ds = env. segment

assumes ds,nothing

assumes es,data

mov si,di ; si = startup of _osfile info

mov di,dataOFFSET _osfile ; di = _osfile block

mov cl, 4

osfile_lp:

lodsb

sub al, 'A'

jb donecfi

shl al, cl

xchg dx, ax

lodsb

sub al, 'A'

jb donecfi

or al, dl

stosb

jmp short osfile_lp

donecfi:

ifdef FARSTACK

push es

else

push ss

endif

pop ds ; ds = DGROUP

assumes ds,data

nocfi:

; 4. Check for devices for file handles 0 - 4

; Clear the FDEV bit (which might be inherited from C_FILE_INFO)

; and then call DOS to see if it really is a device or not

mov bx,4

devloop:

and _osfile[bx],not FDEV ; clear FDEV bit on principal

mov ax,DOS_ioctl shl 8 + 0 ; issue ioctl(0) to get dev info

callos

jc notdev

test dl,80h ; is it a device ?

jz notdev ; no

or _osfile[bx],FDEV ; yes - set FDEV bit

notdev:

dec bx

jns devloop

; 5. General C initializer routines

mov si,dataOFFSET xifbegin

mov di,dataOFFSET xifend

if sizeC

call initterm ; call the far initializers

else

call farinitterm ; call the far initializers

endif

mov si,dataOFFSET xibegin

mov di,dataOFFSET xiend

call initterm ; call the initializers

ret

cEnd <nogen> ; standard C libs

.

.

.

The startup then calls the function setargv in STDARGV.ASM to set up the _argc and _argv[] variables. After setargv gets the command line from the Program Segment Prefix (PSP) at offset 81H and moves it to the heap, it builds an array of pointers to ASCIIZ strings; starting at _argv[1], each pointer refers to one argument from the command line, and _argc is the number of arguments. Under MS-DOS 3.x and later, argv[0] points to the fully qualified pathname of the program being run, which the startup gets from the environment segment. Under MS-DOS version 2.x there is no fully qualified pathname in the environment segment, so a null is stored in argv[0] instead. The rest of the command line or "command tail" is found in the PSP and is terminated by a carriage return (0DH). This command tail cannot be greater than 126 bytes in length. If your program needs to process wildcards on the command line, you must link with the SETARGV.OBJ file, which causes startup to use an argument-passing module (WILD.C) that has been assembled with the WILDCARD ifdef. This object file comes with the compiler. If you linked with SETARGV.OBJ, wildcard processing will take place here. If you look at the code in WILD.C, you may be surprised that it looks like 1978 K&R C. It lacks prototypes and uses the old-style function definitions. Since Microsoft also offers a C compiler for the XENIX(R) operating system, I assume some parts of the startup are shared. In fact, XENIX isn’t the only other environment shared by the startup; the Microsoft QuickC(R) compiler and Microsoft FORTRAN also use pieces.
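
To make the command-tail handling concrete, here is a simplified C sketch that splits a tail terminated by a carriage return into arguments. It is not a translation of STDARGV.ASM: the name split_tail is hypothetical, only blanks and tabs are treated as separators, and the quoting and wildcard rules of the real code are not modeled.

```c
#include <stddef.h>

/* Split a command tail in place. The tail ends at a carriage return
   (0x0D) or a null byte. Slot 0 of argv is left for the program name,
   so parsed arguments start at argv[1]. Returns the resulting argc,
   which never exceeds max_args. */
int split_tail(char *tail, char **argv, int max_args)
{
    int argc = 1;                        /* slot 0 is the program name */
    char *p = tail;

    while (argc < max_args) {
        while (*p == ' ' || *p == '\t')  /* skip separators */
            p++;
        if (*p == '\r' || *p == '\0')    /* end of the command tail */
            break;
        argv[argc++] = p;                /* start of an argument */
        while (*p != ' ' && *p != '\t' && *p != '\r' && *p != '\0')
            p++;
        if (*p == '\r' || *p == '\0') {  /* last argument: terminate, stop */
            *p = '\0';
            break;
        }
        *p++ = '\0';                     /* terminate and keep going */
    }
    return argc;
}
```

A tail of " one  two" followed by 0DH yields an argc of 3, with argv[1] pointing to "one" and argv[2] to "two".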

Once the command line and environment are taken care of, the startup gives the programmer an opportunity to increase the program’s file handle count. Starting with MS-DOS version 3.3, programs can have more than 20 open files at a time. The default is still 20 in all C 6.00AX programs. If you want more file handles, just change the _NFILE_ constant to the number of file handles you need, then uncomment the source lines in CRT0DAT.ASM (see Figure 6).

Run-time Errors

Next, the startup routes the integer-divide-by-zero vector to the _cintDIV function, so your program won’t lock the machine if it has a divide-by-zero error. (The code that sets the vector is in CRT0DAT.ASM; the handler itself is in CRT0.ASM, as Figure 4 shows.) Instead of locking the machine, your program produces a run-time error (RTE). There are 18 RTEs (see Figure 7). These messages are processed in CRT0MSG.ASM and RTERR.INC by the _NMSG_WRITE and _NMSG_TEXT functions. Nothing is more embarrassing for a developer than to release code that causes some type of RTE. All C 6.00AX RTEs are in the format

R6<RTE number> <description>

leaving no doubt in anyone’s mind what has happened to your program. But there is a way to replace the C 6.00AX messages with your own custom handling. The code in Figure 8 displays the message "XYZ company internal error # n. Contact customer support" in place of the standard text, where n is the RTE number. The main function that follows forces a divide-by-zero error to verify that the replacement _NMSG_WRITE function works correctly.

Figure 7 Run-time Errors

0 Stack overflow
1 Null pointer assignment
2 Floating point not loaded
3 Integer divide by 0
4 Undefined
5 Not enough memory on exec
6 Bad format on exec
7 Bad environment on exec
8 Not enough space for arguments
9 Not enough space for environment
10 Abnormal program termination
11 Undefined
12 Illegal near pointer use
14 Ctrl-Break encountered (QC 1.0 only)
15 Unexpected interrupt (QC 1.0 only)
16 OS/2 RTE
17 OS/2 RTE
18 Unexpected heap error
252 \r\n
255 Run time error banner
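
The numbering in Figure 7 maps directly onto the R6 message format. The sketch below formats a message from a small table holding a few of the entries above. The table contents are copied from Figure 7, but the function and type names are hypothetical, and the three-digit zero padding (as in R6003) is an assumption about the exact layout.

```c
#include <stddef.h>
#include <stdio.h>

struct rte {
    int num;            /* RTE number from Figure 7 */
    const char *text;   /* description from Figure 7 */
};

/* A few entries copied from Figure 7. */
static const struct rte rte_table[] = {
    { 0,  "Stack overflow" },
    { 1,  "Null pointer assignment" },
    { 2,  "Floating point not loaded" },
    { 3,  "Integer divide by 0" },
    { 18, "Unexpected heap error" }
};

/* Format "R6nnn <description>" into buf. Returns 0 on success or -1 if
   the number is not in the table. */
int rte_format(int num, char *buf, size_t size)
{
    size_t i;

    for (i = 0; i < sizeof rte_table / sizeof rte_table[0]; i++) {
        if (rte_table[i].num == num) {
            snprintf(buf, size, "R6%03d %s", num, rte_table[i].text);
            return 0;
        }
    }
    return -1;
}
```

Entries marked Undefined in Figure 7 are deliberately left out of the table, so a lookup for them fails.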

Figure 8 Handling Run-time Errors Yourself

// cl /c /WX rte.c

// link /NOE rte; // Since you will be replacing an

// existing standard library function use /NOE

#include <conio.h> // declares cprintf

_pascal __NMSG_WRITE(int);

_pascal __NMSG_WRITE(int error)

{

switch(error)

{

// 252 and 255 are the constants for CRLF and the run-time

// banner C6.00AX normally prints out.

case 252:

case 255:

break;

default:

cprintf(

"XYZ company internal error # %d. Contact customer support\r\n",error);

break;

}

}

main(void)

{

_asm{

mov ax,0

div ax

ret

}

}

The next procedure for the startup is initializing for floating-point math. If the global variable _fpinit is nonzero, the startup initializes floating point. This variable will be set if any of your code uses floating-point variables or expressions. Then the startup checks to see if it is running with file handles inherited from a parent process. This is done by checking the environment strings for the _C_FILE_INFO variable. If it’s there, it’s the last variable in the environment. The startup takes the information from _C_FILE_INFO and places it into an array of bytes called _osfile. This array, defined in CRT0DAT.ASM, contains the handles for stdin, stdout, stderr, stdaux, and stdprn. It also leaves space for the other 15 available file handles (or more if you increased the _NFILE_ constant). The startup will not copy the _C_FILE_INFO variable into the copy of the environment that the program gets. _C_FILE_INFO was called ;C_FILE_INFO in C 5.1 and caused a lot of headaches for programmers. C 5.1 passed the information about open files and translation modes in binary. This wasn’t a problem if a C 5.1 program was calling another C 5.1 program because the startup code of the called program does not copy the ;C_FILE_INFO variable into the program’s copy of the environment. However, if the child process was not a C 5.1 process (such as COMMAND.COM), the ;C_FILE_INFO variable was visible in the environment. Because the open file information was binary, if a program permitted the user to shell to DOS and the user executed the set command, the ;C_FILE_INFO displayed as graphics characters, so it looked as if the environment was corrupted. This is prevented in C 6.00AX because open file information is not passed through the environment unless specifically requested. To pass open file information to a child process, the _fileinfo variable must be set to true. If it isn’t, the standard library process-creation functions (the spawn and exec families) will not place the _C_FILE_INFO variable into the copy of the environment that the child process receives.
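
The osfile_lp loop in Figure 6 shows how C 6.00AX avoids the binary problem: each byte of file information travels as two letters, high nibble first, where each letter is 'A' plus a 4-bit value. The C sketch below mirrors that loop. The name cfi_decode and the max parameter are additions for the example; the assembly version has no length limit.

```c
#include <stddef.h>

/* Decode the value of _C_FILE_INFO into per-handle flag bytes. Each
   output byte is built from two letters: ('A' + high nibble) followed by
   ('A' + low nibble). Decoding stops at the first character below 'A',
   such as the terminating null, or when max bytes have been written.
   Returns the number of bytes stored in out. */
size_t cfi_decode(const char *s, unsigned char *out, size_t max)
{
    size_t n = 0;

    while (n < max) {
        int hi = (unsigned char)s[0] - 'A';
        int lo;

        if (hi < 0)                 /* matches the first "jb donecfi" */
            break;
        lo = (unsigned char)s[1] - 'A';
        if (lo < 0)                 /* matches the second "jb donecfi" */
            break;
        out[n++] = (unsigned char)((hi << 4) | lo);
        s += 2;
    }
    return n;
}
```

Because every encoded character is a printable letter, the variable no longer looks like garbage if it does show up in the output of the set command.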

X-Segments

Anyone who has looked at a MAP file is familiar with the _TEXT, _DATA, _BSS, and STACK segments, but the rest of an EXE’s segments are probably a mystery. The DBData segment contains data for use by QuickC in debug mode. The remaining segments execute functions that need to be called outside the body of your program. Segments beginning with the letter X signify an initializer or terminator segment (see Figure 9). The initialization segments consist of pointers to near or far routines written by you or a third-party library you are linking with. The terminator segments consist of an onexit table, preterminators, and terminators or far terminators. The onexit table is filled by the onexit function, which is a standard library routine. The preterminator segment is the place for functions that should be called during normal termination of a program; that is, if no errors have taken place. The terminator segment contains functions to be executed in all situations. All of these segments are grouped into a segment beginning, segment end, and a placeholder for a pointer to a function. Many third-party libraries use these segments as a convenient way to initialize and shut down. For example, libraries created to be linked in your code might need to take over an interrupt or to check for a certain piece of hardware. It would be nice if they could initialize the hardware without calling an initialization function. The solution is to place a pointer to their initialization function in one of the X-segments and let the startup call it. The alternative to using an X-segment is the old method of sprinkling a check like the following, like salt, in every function that needs initialization.

if (!initialized)

    init();

When the program ends, the third-party library would use a terminator segment to restore the interrupt.

Figure 9 Startup Code Segments

Start Stop Length Name Class

00000H 00511H 00512H _TEXT CODE

00512H 00512H 00000H C_ETEXT ENDCODE

00520H 00561H 00042H NULL BEGDATA

00562H 0060DH 000ACH _DATA DATA

0060EH 0061BH 0000EH CDATA DATA

0061CH 0061CH 00000H XIFB DATA

0061CH 0061CH 00000H XIF DATA

0061CH 0061CH 00000H XIFE DATA

0061CH 0061CH 00000H XIB DATA

0061CH 0061CH 00000H XI DATA

0061CH 0061CH 00000H XIE DATA

0061CH 0061CH 00000H XPB DATA

0061CH 0061CH 00000H XP DATA

0061CH 0061CH 00000H XPE DATA

0061CH 0061CH 00000H XCB DATA

0061CH 0061CH 00000H XC DATA

0061CH 0061CH 00000H XCE DATA

0061CH 0061CH 00000H XCFB DATA

0061CH 0061CH 00000H XCF DATA

0061CH 0061CH 00000H XCFE DATA

0061CH 0061CH 00000H CONST CONST

0061CH 00623H 00008H HDR MSG

00624H 006F1H 000CEH MSG MSG

006F2H 006F3H 00002H PAD MSG

006F4H 006F4H 00001H EPAD MSG

006F6H 006F6H 00000H _BSS BSS

006F6H 006F6H 00000H XOB BSS

006F6H 006F6H 00000H XO BSS

006F6H 006F6H 00000H XOE BSS

00700H 00EFFH 00800H STACK STACK

Origin Group

0052:0 DGROUP

Program entry point at 0000:0016

Figure 10 gives an example of an initializer and a terminator. XI is the initializer segment, and it contains a pointer to the function init. Placing the pointer in the XI segment means the startup will find it and call init before main is called. You can have up to 32 pointers to functions in these initialization segments; the XIB (begin) and XIE (end) segments mark where the list starts and stops. The startup checks whether their addresses are equal, and if they are, there are no functions to call. If the addresses are not equal, the startup calls the function each pointer refers to, advancing from the begin address until it reaches the end address. In compact- and large-model programs, a segment for far functions called XIF will be called in place of the XI segment routine.

The terminator works the same way: the XC segment (XCF in compact and large model) holds a pointer to the function term. The init and term functions are shown in Figure 11. These functions simply use cprintf to print a message declaring their presence. Notice that there is no reference to either init or term in main. These X-segments can be used for any type of function that is prototyped as

void func(void)

However, avoid the standard library functions that use FILE*, such as fprintf, sprintf, and fopen. The FILE* may not have been initialized or, on termination, startup may have already flushed the I/O streams.
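
The begin/end walk described above can be modeled in portable C. This is a minimal sketch, not the startup’s actual assembly: an ordinary array stands in for the XI segment, the pointers begin and end stand in for the XIB and XIE addresses, and the names initfn, run_table, demo_init, and demo_calls are hypothetical.

```c
#include <stddef.h>

typedef void (*initfn)(void);   /* every X-segment entry has this shape */

int demo_calls;                 /* hypothetical counter, for illustration only */

void demo_init(void)            /* stands in for a library's init routine */
{
    demo_calls++;
}

/* Call each function pointer in [begin, end) and return how many were
   called. If begin == end the table is empty and nothing is called,
   mirroring the XIB == XIE comparison described in the text. */
int run_table(const initfn *begin, const initfn *end)
{
    int called = 0;

    while (begin != end) {
        if (*begin != NULL) {   /* skipping empty slots is an assumption */
            (*begin)();
            called++;
        }
        begin++;
    }
    return called;
}
```

Passing the same address for begin and end calls nothing, which is the empty case visible in Figure 9, where XIB, XI, and XIE all share one address.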

Figure 10 XSEGS.ASM

;

;masm xsegs;

;

;

XIB SEGMENT WORD PUBLIC 'DATA'

XIB ENDS

XI SEGMENT WORD PUBLIC 'DATA'

XI ENDS

XIE SEGMENT WORD PUBLIC 'DATA'

XIE ENDS

XCB SEGMENT WORD PUBLIC 'DATA'

XCB ENDS

XC SEGMENT WORD PUBLIC 'DATA'

XC ENDS

XCE SEGMENT WORD PUBLIC 'DATA'

XCE ENDS

EXTRN __acrtused:ABS

EXTRN _init:NEAR

EXTRN _term:NEAR

XI SEGMENT

DW _init

XI ENDS

XC SEGMENT

DW _term

XC ENDS

END

Figure 11 STARTUP.C with Init and Term Functions

// compile with cl /c startup.c

// link with link startup+xsegs;

#include <conio.h>   // declares cprintf

void main(void)

{

cprintf("We are inside main\r\n");

}

void init(void)

{

cprintf("Initializing...\r\n");

}

void term(void)

{

cprintf("Terminating....\r\n");

}

Calling Main

The initialization is now finished, so the startup gets down to the business of calling main. The following prototype

int _cdecl main(int argc, char **argv, char **envp)

is the proper function prototype according to what the startup pushes onto the stack. This is also why main is declared as _cdecl. Under the _cdecl convention the caller (here, the startup) removes the arguments from the stack, so a program that doesn’t use the command line or environment can declare itself as main(void) and still be called safely with all three arguments pushed. That is why you don’t find a prototype for main in any of the C 6.00AX header files.

During the execution of main the startup has no control. However, the programmer is free to use any of the public variables or functions that the startup provides. A program can terminate by calling exit, _cexit, _exit, or _c_exit. These functions are in CRT0DAT.ASM. You should call only one of them. If main simply returns, exit is the default.

The exit(exit_code) function calls the C run-time preterminator and terminator functions, then terminates the process with the exit_code parameter supplied by the programmer. This exit_code can be used as a return code if your program is a child process (if it was spawned), or it can be used to set the DOS ERRORLEVEL for batch file processing. The function _cexit(void) performs the same termination processing as exit but returns control to the caller when finished (that is, your program does not end). This call is useful for TSRs that need the divide-by-zero interrupt vector restored.

The _exit(exit_code) function performs a quick exit that does not call the programmer’s terminator routines or the preterminators. Like exit, it terminates the process with the exit_code supplied by the programmer. The _c_exit(void) function is the same as _exit, except that _c_exit returns control to the caller when finished.
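
The four functions differ along just two axes, which the following sketch records as a lookup table. It only restates the description above and calls none of the real routines; the struct, its field names, and find_variant are hypothetical.

```c
#include <string.h>

struct exit_variant {
    const char *name;
    int runs_terminators;   /* calls the preterminators and terminators */
    int ends_process;       /* 0 means control returns to the caller */
};

/* Behavior as described in the text, not taken from CRT0DAT.ASM itself. */
static const struct exit_variant variants[] = {
    { "exit",    1, 1 },
    { "_cexit",  1, 0 },
    { "_exit",   0, 1 },
    { "_c_exit", 0, 0 }
};

/* Look up a variant by name; returns NULL if the name is not one of the four. */
const struct exit_variant *find_variant(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof variants / sizeof variants[0]; i++)
        if (strcmp(variants[i].name, name) == 0)
            return &variants[i];
    return NULL;
}
```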

Pointer Assignment

As part of the exit processing, the startup checks to see if your program has written data through a null pointer. Using a null pointer is a common C mistake. To help detect this problem, the startup places the NULL segment first in DS in small- and medium-model programs. In CHKSUM.ASM, the function nullcheck performs a checksum on this NULL data segment. If the checksum fails, it means that some function has written through a null pointer (DS:0), and nullcheck reports the "Null pointer assignment" run-time error.

In large- and compact-model programs, the nullcheck function isn’t as effective. A null far pointer is 0:0, so data written through it lands in segment zero, which holds the MS-DOS interrupt table, rather than in the NULL segment, and will probably cause the system to hang.

Trying to squeeze that last bit of memory out of your program? Replace the nullcheck function with your own do-nothing function,

int _nullcheck(void) { return 0; }

and link it in before the standard library. Of course, do this only after you are sure that you have no null pointer assignments.
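
The idea behind the check can be sketched in portable C. This is an illustration under stated assumptions, not the code in CHKSUM.ASM: the array guard stands in for the bytes at DS:0, its size is the length of the NULL segment shown in Figure 9, and the checksum formula and the names guard_checksum and guard_check are hypothetical.

```c
#include <stddef.h>

#define GUARD_SIZE 0x42            /* length of the NULL segment in Figure 9 */

unsigned char guard[GUARD_SIZE];   /* stand-in for the bytes at DS:0 */

/* Illustrative checksum only; the real one is whatever CHKSUM.ASM computes. */
unsigned guard_checksum(void)
{
    unsigned sum = 0;
    size_t i;

    for (i = 0; i < GUARD_SIZE; i++)
        sum = sum * 31u + guard[i];
    return sum;
}

/* Returns 0 if the region still matches the expected checksum,
   nonzero if something has written into it. */
int guard_check(unsigned expected)
{
    return guard_checksum() != expected;
}
```

A program would record guard_checksum() at startup and call guard_check with that value at exit; a stray write into guard in between changes the sum and is reported.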

You may have thought that it wasn’t a big deal to call main, but there actually is a lot of work to be done both before and after main is called. I’ve demonstrated some powerful capabilities of the startup that you may not have known existed. Now that you know they are there, you may want to check the startup functions before you write any of your own. A couple of caveats: Microsoft doesn’t offer customer support for the startup code, nor is the presence of any variable or function guaranteed from version to version. Anything may be changed at any time. So if you do make changes to the startup code, you will have to carry them forward to each new version of the C compiler. But these drawbacks don’t outweigh the benefits of generating more efficient programs.

1. For ease of reading, "MS-DOS" refers to the Microsoft MS-DOS operating system. MS-DOS is a trademark that refers only to this Microsoft product and is not intended to refer to such products generally.

2. For ease of reading, "XENIX" refers to the Microsoft XENIX operating system. XENIX is a trademark that refers only to this Microsoft product and is not intended to refer to such products generally.

3. For ease of reading, "QuickC" refers to the Microsoft QuickC compiler. QuickC is a trademark that refers only to this Microsoft product and is not intended to refer to such products generally.