Kevin J. Goodman
Kevin J. Goodman is the manager of communications applications for DCA, where he develops communications programs for a variety of operating systems.
As you might guess, C startup code does everything necessary to "start up" a C program. The startup code is part of the run-time library that initializes and terminates all programs compiled with Microsoft® C. It performs all processing prior to calling main() (the point most programmers consider the start of a C program) and all operations necessary to shut down after returning from main(). The C linker places the startup code above and below the compiled code and attaches the combined image just after the header information in all EXE files. When you run the program, the MS-DOS® operating system reads the EXE header, performs segment and stack fixups, then begins executing the startup code. Figure 1 shows the relationship between your code, the startup code, the linker, and the MS-DOS loader. Unfortunately, the startup code is not fully documented. Even though Microsoft now publishes the startup code, I have noticed many programmers writing code that duplicates the startup code’s functionality: either they don’t know that the startup code listings exist or they’ve been unable to decipher them.
Figure 1 How Startup Code Is Added to all C Programs
I’ll explain the startup code used in C version 6.00AX, document its functions and variables, and describe how to take advantage of its undocumented features. I’ll also discuss the differences between the startup code in versions 5.1 and 6.00AX.
About thirty assembly language and C files make up the startup code (see Figure 2). These source files come with the C compiler starting with version 6.0. In previous versions, the startup sources were sold as part of the C Run-time Library Source. You can get a copy of the sources by saying yes to the "Copy C startup sources?" question when installing C 6.0 or C 6.00AX. You might want to pull these files into your editor and follow along as they’re discussed.
Figure 2 C 6.0 Startup Source Files
Directory of C600\SOURCE\STARTUP |
File Name |
CHKSTK.ASM |
CHKSUM.ASM |
CMACROS.INC |
CRT0FP.ASM |
_FILE.C |
FILE2.H |
FMSGHDR.ASM |
HEAP.INC |
MAKEFILE |
MSDOS.H |
MSDOS.INC |
NULBODY.C |
RCHKSTK.ASM |
REGISTER.H |
RTERR.INC |
SETARGV.ASM |
STARTUP.BAT |
VERSION.INC |
WILD.C |
Directory of C600\SOURCE\STARTUP\DOS |
File Name |
CRT0.ASM |
CRT0DAT.ASM |
CRT0MSG.ASM |
EXECMSG.ASM |
NMSGHDR.ASM |
NULBODY.LNK |
STDALLOC.ASM |
STDARGV.ASM |
STDENVP.ASM |
Directory of C600\SOURCE\DOC |
File Name |
STARTUP.DOC |
Included with the startup code is a file, STARTUP.DOC, that details how to rebuild the startup code. Similar startup code is used for Microsoft Windows™-based programs, with the addition of WINSTART.ASM. Because Microsoft does not currently distribute the source for WINSTART.ASM, this discussion pertains only to MS-DOS-based programs.
The startup code, as mentioned, is completely transparent to the programmer. In fact, it’s possible that many C programmers don’t even know that the startup code exists. They don’t have to, because a neat trick ensures that the startup code is included in all C programs. Every source file that contains a main, LibMain, or WinMain function includes an external reference to the absolute variable __acrtused when compiled with version 6.00AX. In C 5.1, all compiled C functions include this reference. Figure 3 shows the assembly language output of a program compiled with the /Fa switch. The __acrtused variable is located in CRT0.ASM; the absolute reference in the object file forces the linker to include CRT0.OBJ from the run-time library file. CRT0.ASM has other absolute references that force the linker to include the rest of the startup code. This is why you get so many unresolved externals when you forget to link in a standard library. Figure 4 shows the contents of NULBODY.MAP. Notice all of the variables that are defined even though none were declared.
Figure 3 Assembly Language for NULBODY.C
CL /c /Fa nulbody.c
TITLE nulbody.c
;.8087
INCLUDELIB SLIBCE
_TEXT SEGMENT WORD PUBLIC 'CODE'
_TEXT ENDS
_DATA SEGMENT WORD PUBLIC 'DATA'
_DATA ENDS
CONST SEGMENT WORD PUBLIC 'CONST'
CONST ENDS
_BSS SEGMENT WORD PUBLIC 'BSS'
_BSS ENDS
DGROUP GROUP CONST, _BSS, _DATA
ASSUME DS: DGROUP, SS: DGROUP
EXTRN __acrtused:ABS
_TEXT SEGMENT
ASSUME CS: _TEXT
; Line 1
PUBLIC _main
_main PROC NEAR
; Line 2
ret
nop
_main ENDP
_TEXT ENDS
END
Figure 4 Output of NULBODY.MAP
Address | Variable Name | Type | Located | Description |
0077:00B6 | STKHQQ | WORD | CHKSTK.ASM | Used to determine if enough stack is available |
0077:01BC | _edata | BYTE | CRT0.ASM | End of data |
0077:01C0 | _end | BYTE | CRT0.ASM | End of bss |
0077:00A3 | _environ | FAR PTR | CRT0DAT.ASM | Pointer to environment |
0077:007C | _errno | WORD | CRT0DAT.ASM | Global error code variable |
0000:01CE | _exit | function | CRT0DAT.ASM | Termination function |
0000:0010 | _main | function | NULBODY.C | Dummy program |
0000:058E | _malloc | function | XLIBCE.LIB | Standard library routine; used by startup |
0077:00B4 | __aaltstkovr | WORD | CHKSTK.ASM | Stack overflow for Pascal/FORTRAN |
0077:0060 | __acfinfo | label | CRT0DAT.ASM | Points to _C_FILE_INFO string |
0000:9876 | Abs __acrtmsg | label | CRT0MSG.ASM | Forces inclusion of RTE messages |
0000:9876 | Abs __acrtused | label | CRT0.ASM | Forces inclusion of startup |
0000:D6D6 | Abs __aDBdoswp | label | CRT0.ASM | QuickC debug info |
0077:00B2 | __adbgmsg | WORD | CRT0MSG.ASM | FORTRAN $DEBUG info |
0077:0044 | __aexit_rtn | PTR | CRT0.ASM | Points to error exit routine |
0077:006E | __aintdiv | FAR PTR | CRT0DAT.ASM | Holder for divide-by-zero vector |
0077:00BA | __amblksiz | WORD | GROWSEG.ASM | Part of standard library--brought in when malloc is used |
0000:00DE | __amsg_exit | PTR | CRT0.ASM | Points to error exit routine |
0000:02C4 | __aNchkstk | label | CHKSTK.ASM | Used in small and medium model for chkstk function |
0077:0042 | __anullsize | label | CHKSUM.ASM | Points to end of null segment |
0077:005C | __aseghi | WORD | CRT0.ASM | Used by QuickC |
0077:005E | __aseglo | WORD | CRT0.ASM | Used by QuickC |
0077:0046 | __asizds | WORD | CRT0.ASM | Size of DGROUP |
0000:0016 | __astart | label | CRT0.ASM | True start of C program |
0077:0042 | __atopsp | WORD | CRT0.ASM | Top of stack |
0000:01DD | __cexit | function | CRT0DAT.ASM | Terminate function |
0077:00AA | __child | flag | CRT0DAT.ASM | Used to indicate that a child process is executing |
0000:02C4 | __chkstk | function | CHKSTK.ASM | Used to determine if enough stack is available; automatically called in all functions unless the /Gs switch is used at compile time |
0000:0100 | __cinit | function | CRT0DAT.ASM | Called during startup |
0000:00CE | __cintDIV | label | CRT0.ASM | Divide-by-zero interrupt vector points here |
0000:024F | __ctermsub | function | CRT0DAT.ASM | Terminate function |
0000:01E7 | __c_exit | function | CRT0DAT.ASM | Terminate function |
0000:00FD | __dataseg | WORD | CRT0.ASM | Segment value--contains DGROUP |
0077:0087 | __doserrno | WORD | CRT0DAT.ASM | DOS error number |
0077:0084 | __dosvermajor | label | CRT0DAT.ASM | Points to _osmajor |
0077:0085 | __dosverminor | label | CRT0DAT.ASM | Points to _osminor |
0076:0000 | __EmDataSeg | PTR | CRT0DAT.ASM | Pointer to EMULATOR_DATA segment |
0000:01D5 | __exit | function | CRT0DAT.ASM | Terminate function |
0077:0072 | __fac | WORD | CRT0DAT.ASM | Floating point accumulator |
0000:029E | __FF_MSGBANNER | function | CRT0MSG.ASM | Prints out first part of the run-time error messages |
0000:0735 | __findlast | function | GROWSEG.ASM | Part of standard library functions--used to find last entry in heap descriptor table |
0077:00CE | __fpinit | label | CRT0.ASM | Used by startup to determine if floating point is loaded |
0000:02BE | __fptrap | function | CRT0FP.ASM | Trap for missing floating point software |
0000:0658 | __growseg | function | GROWSEG.ASM | Part of standard library function |
0000:06E4 | __incseg | function | GROWSEG.ASM | Part of standard library function |
0077:00AD | __intno | BYTE | CRT0DAT.ASM | Interrupt number for overlay handler |
0000:056A | __myalloc | function | STDALLOC.ASM | Allocates heap for wildcards and environment |
0077:0089 | __nfile | WORD | CRT0DAT.ASM | Maximum number of file handles |
0000:0592 | __nfree | function | NMALLOC.ASM | Near free function |
0077:0048 | __nheap_desc | structure | CRT0.ASM | Near heap descriptor table (defined in HEAP.INC) |
0000:05B3 | __nmalloc | function | NMALLOC.ASM | Near malloc routine |
0000:050A | __NMSG_TEXT | function | NMSGHDR.ASM | Part of the RTE message routines |
0000:0535 | __NMSG_WRITE | function | NMSGHDR.ASM | Routine used to print out RTE messages |
0000:02DC | __nullcheck | function | CHKSUM.ASM | Checks for null pointer assignments |
0077:0087 | __oserr | label | CRT0DAT.ASM | Same as doserrno |
0077:008B | __osfile | structure | CRT0DAT.ASM | Holds file handles |
0077:0084 | __osmajor | BYTE | CRT0DAT.ASM | OS major revision |
0077:0085 | __osminor | BYTE | CRT0DAT.ASM | OS minor revision |
0077:0086 | __osmode | BYTE | CRT0DAT.ASM | Flag to indicate real (0) or protected mode (nonzero) |
0077:0084 | __osversion | label | CRT0DAT.ASM | Same as dosvermajor |
0077:00AC | __ovlflag | BYTE | CRT0DAT.ASM | Flag to indicate if overlays are in use (0 = no) |
0077:00AE | __ovlvec | FAR PTR | CRT0DAT.ASM | Pointer to original overlay handler |
0077:00A5 | __pgmptr | PTR | CRT0DAT.ASM | Pointer to program name |
0077:0082 | __psp | WORD | CRT0DAT.ASM | Program Segment Prefix (segment) |
0077:0080 | __pspadr | PTR | CRT0DAT.ASM | Program Segment Prefix (offset) |
0000:05DC | __searchseg | function | SEARCHSEG.ASM | Part of standard library routines |
0000:02FE | __setargv | function | STDARGV.ASM | Processes command line |
0000:048C | __setenvp | function | STDENVP.ASM | Get environment from env segment |
0077:007E | __umaskval | WORD | UMASK.ASM | File permission mask (not part of startup) |
0077:00C4 | ___aDBexit | WORD | CRT0.ASM | QuickC debug info |
0077:00CA | ___aDBptrchk | WORD | CRT0.ASM | QuickC debug info |
0077:00C2 | ___aDBrterr | WORD | CRT0.ASM | QuickC debug info |
0077:00C0 | ___aDBswpchk | WORD | CRT0.ASM | QuickC debug info |
0077:00BE | ___aDBswpflg | WORD | CRT0.ASM | QuickC debug info |
0077:009F | ___argc | WORD | CRT0.ASM | Count of command-line arguments |
0077:00A1 | ___argv | DWORD | CRT0.ASM | Command-line arguments |
0077:00BC | ___qczrinit | WORD | CRT0.ASM | QuickC debug info |
The true beginning of a C program is located in the CRT0.ASM file. The first line of code, labeled __astart, comes after some variables are defined. The main function isn’t called until much later. You may have seen the __astart label if you have done any debugging in the CodeView® debugger in unassembled mode. After __astart the first thing the startup does is check to make sure the MS-DOS version is 2.0 or later. If it is not, the program abruptly aborts and sends you back to the MS-DOS prompt. No explanations. No messages. Nothing. Perhaps this is one way to punish users who are still on MS-DOS 1.0. If the version is 2.0 or later, the startup checks if there is enough stack space. The default stack is set to the constant STACK_SIZE, which is 2048 bytes in MS-DOS. If 2KB is not sufficient, you can change STACK_SIZE, which is in CRT0.ASM, or link with the /STACK option. If there is less stack available than requested, the startup aborts, but this time with a run-time error message to the user. In small- and medium-model programs the stack is always a part of DGROUP. In large- and compact-model programs the stack can be in its own segment, if necessary. If your program requires a large amount of stack space or if you want a full 64KB for near data, you can rebuild the startup code with the /DFARSTACK option. If you do decide you need a far stack, all of your programs must operate with SS != DS.
The heap has been totally reorganized for version 6.0. This change was necessary to support COM programs and based pointers. All heap variables are now in the structure heap_seg_desc defined in HEAP.INC.
_heap_seg_desc struc
checksum dw ? ; Checksum area
flags dw ? ; Flags word
segsize dw ? ; Size of segment
start dw ? ; Offset of first heap entry
rover dw ? ; Rover offset
last dw ? ; Offset to end-of-heap marker
nextseg dd ? ; Far pointer to next
; heap_seg_desc
prevseg dd ? ; Far pointer to
; previous heap_seg_desc
_heap_seg_desc ends
This change in memory management may render some TSRs useless. If your TSR worked fine under C 5.1 and it won’t link or behaves erratically under C 6.00AX, you may have been using a variable that is no longer defined or has a different meaning. For example, the __abrktb variable that was a convenient way to determine the end of your program is no longer in version 6.00AX. The variable named last in the heap_seg_desc struc now serves the same purpose. In small- and medium-model programs the initial heap is carved out of the bottom of the stack. The size of the heap is 64KB minus the length of DGROUP.
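If you think in C rather than MASM, the descriptor can be pictured roughly as the structure below. This is only a sketch: the C names, the far_ptr_t typedef (a far pointer modeled as a 32-bit integer), and the heap_end_offset helper are my own illustrative assumptions, not part of the Microsoft sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the MASM _heap_seg_desc structure.
   A 16-bit far pointer (segment:offset) is modeled as a 32-bit integer. */
typedef uint32_t far_ptr_t;

struct heap_seg_desc {
    uint16_t  checksum;  /* checksum area */
    uint16_t  flags;     /* flags word */
    uint16_t  segsize;   /* size of segment */
    uint16_t  start;     /* offset of first heap entry */
    uint16_t  rover;     /* rover offset */
    uint16_t  last;      /* offset to end-of-heap marker */
    far_ptr_t nextseg;   /* far pointer to next heap_seg_desc */
    far_ptr_t prevseg;   /* far pointer to previous heap_seg_desc */
};

/* Hypothetical helper: returns the end-of-heap offset, the value the
   article says a TSR can use to find the end of the program. */
static uint16_t heap_end_offset(const struct heap_seg_desc *d)
{
    return d->last;
}
```

Six words followed by two double words gives a 20-byte descriptor, with last at offset 10.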
After the stack and heap are set up, an interesting thing happens. The startup checks to see if any uninitialized global or static data is stored in the _BSS segment, and if so sets it to zero. This type of data is placed in the _BSS segment at compile time. In Figure 5 the variables b and d are placed in the _DATA segment while variables a and c, which are uninitialized, get placed in the _BSS segment. This brings up an important point. If your program relies on its uninitialized data being set to zero at run time, you should initialize that data at compile time instead; this avoids the extra overhead at run time and allows your programs to load more quickly.
Figure 5 MAIN.C
MAIN.C
void main(void);
char a[20];
char b[20]={0};
void main(void)
{
static c;
static d=5;
}
Assembly Listing of MAIN.C
; Static Name Aliases
;
; $S104_c EQU c
; $S105_d EQU d
TITLE main.c
.8087
INCLUDELIB SLIBCE
_TEXT SEGMENT WORD PUBLIC 'CODE'
_TEXT ENDS
_DATA SEGMENT WORD PUBLIC 'DATA'
_DATA ENDS
CONST SEGMENT WORD PUBLIC 'CONST'
CONST ENDS
_BSS SEGMENT WORD PUBLIC 'BSS'
_BSS ENDS
DGROUP GROUP CONST, _BSS, _DATA
ASSUME DS: DGROUP, SS: DGROUP
PUBLIC _b
EXTRN __acrtused:ABS
EXTRN __aNchkstk:NEAR
_BSS SEGMENT
COMM NEAR _a: 1: 20
_BSS ENDS
_DATA SEGMENT
_b DB 00H
DB 19 DUP(0)
$S105_d DW 05H
_DATA ENDS
_BSS SEGMENT
$S104_c DW 01H DUP (?)
_BSS ENDS
_TEXT SEGMENT
ASSUME CS: _TEXT
; Line 1
; Line 9
PUBLIC _main
_main PROC NEAR
xor ax,ax
call __aNchkstk
; Line 13
ret
_main ENDP
_TEXT ENDS
END
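The zeroing step described before Figure 5 amounts to a simple fill loop. Here is a minimal sketch in C, assuming the region is delimited by two addresses; the function name zero_bss and its parameters are hypothetical stand-ins for the _edata and _end labels that appear in the map file, not the code in CRT0.ASM.

```c
#include <stddef.h>

/* Zero every byte in the half-open range [begin, end).
   begin and end are hypothetical stand-ins for the _edata
   (end of data) and _end (end of bss) labels. */
static void zero_bss(unsigned char *begin, unsigned char *end)
{
    while (begin != end)
        *begin++ = 0;
}
```

If begin equals end, there is no uninitialized data and the loop does nothing.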
After cleaning up the _BSS, the startup sets up the environment. The environment consists of an array of ASCIIZ strings and an array of pointers to them. The variable _environ points to the environment strings, each of which looks like the typical varname=string that you see when you execute the MS-DOS set command. Two null bytes in a row indicate the end of the environment (one null byte for the last environment string and one for the end of the environment array itself). In STDENVP.ASM, the setenvp function searches for these two null bytes and when it finds them copies the environment into the heap. The standard library functions putenv and getenv work only on this local copy of the environment, not the master copy that MS-DOS keeps. That’s why if you add an environment variable during the scope of your program it disappears when your program terminates. If your TSR does not use any environment variables, you may want to edit CRT0.ASM (see Figure 6) and comment out the call to setenvp. The size of your current TSR will be reduced by the size of the environment when you run.
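To make the layout concrete, here is a small C sketch that walks a block of back-to-back ASCIIZ strings ended by a second null byte and builds an array of pointers to them. The function names env_count and env_index are hypothetical, and unlike the real setenvp this sketch indexes the block in place instead of copying it into the heap.

```c
#include <stddef.h>
#include <string.h>

/* Count the strings in an environment block: consecutive
   NUL-terminated strings, ended by an empty string
   (that is, two NUL bytes in a row). */
static size_t env_count(const char *block)
{
    size_t n = 0;
    while (*block != '\0') {
        block += strlen(block) + 1;   /* skip string and its NUL */
        n++;
    }
    return n;
}

/* Fill envp[] with a pointer to each string and end it with NULL.
   Stores at most max - 1 strings; returns the number stored. */
static size_t env_index(char *block, char **envp, size_t max)
{
    size_t n = 0;
    if (max == 0)
        return 0;
    while (*block != '\0' && n + 1 < max) {
        envp[n++] = block;
        block += strlen(block) + 1;
    }
    envp[n] = NULL;
    return n;
}
```

An empty environment is simply a block whose first byte is null, in which case both functions return 0.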
Figure 6 Partial Listing of CRT0DAT.ASM
;***
;crt0dat.asm - DOS and Windows shared startup and termination
; Copyright (c) 1985-1990, Microsoft Corporation. All rights reserved.
; Purpose: Shared startup and termination.
; NOTE: This source is included in crt0.asm for assembly purposes
; when building .COM startup. This is so the .COM startup resides
; in a single special object that can be supplied to the user.
;*******************************************************************************
_NFILE_ = 20 ; Maximum number of file handles
?DF = 1 ; tell cmacros.inc we want to define our own segments
.xlist
include version.inc
include cmacros.inc
include msdos.inc
.list
ifdef FARSTACK
ife sizeD
error <You cannot have a far stack in Small or Medium memory models.>
endif
endif
ifdef _COM_
if sizeC or sizeD
error <Must use Small memory model for .COM files.>
endif
endif ;_COM_
o
o
o
sBegin xiqcseg
globalW __qczrinit, 0 ;* QC -Zr initializer call address
sEnd xiqcseg
ifdef _COM_
sBegin EmData
labelB _EmDataLabel
sEnd EmData
sBegin EmCode
globalW _EmDataSeg,0
sEnd EmCode
else ;not _COM
EMULATOR_DATA segment para public 'FAR_DATA'
EMULATOR_DATA ends
EMULATOR_TEXT segment para public 'CODE'
public __EmDataSeg
__EmDataSeg dw EMULATOR_DATA
EMULATOR_TEXT ends
endif ;not _COM_
sBegin data
assumes ds,data
; special C environment string
labelB <PUBLIC,_acfinfo>
cfile db '_C_FILE_INFO='
cfilex db 0
cfileln = cfilex-cfile
globalD _aintdiv,0 ; divide error interrupt vector save
globalT _fac,0 ; floating accumulator
globalW errno,0 ; initial error code
globalW _umaskval,0 ; initial umask value
;=============== following must be in this order
globalW _pspadr,0 ; psp:0 (far * to PSP segment)
globalW _psp,0 ; psp:0 (paragraph #)
;=============== above must be in this order
;=============== following must be in this order
labelW <PUBLIC,_osversion>
labelB <PUBLIC,_dosvermajor>
globalB _osmajor,0
labelB <PUBLIC,_dosverminor>
globalB _osminor,0
;=============== above must be in this order
globalB _osmode,0 ; 0 = real mode
labelW <PUBLIC,_oserr>
globalW _doserrno,0 ; initial DOS error code
globalW _nfile,_NFILE_ ; maximum number of file handles
labelB <PUBLIC,_osfile>
db 3 dup (FOPEN+FTEXT) ; stdin, stdout, stderr
db 2 dup (FOPEN) ; stdaux, stdprn
db _NFILE_-5 dup (0) ; the other 15 handles
globalW __argc,0
globalDP __argv,0
globalDP environ,0 ; environment pointer
labelD <PUBLIC,_pgmptr> ; pointer to program name
dw dataOFFSET dos2nam
ifdef _COM_
dw 0 ; No relocations in tiny model
elseifdef _QC2
dw 0 ; No DGROUP references allowed
elseifdef _WINDOWS
dw 0 ; No DGROUP references allowed
else ;DEFAULT
dw DGROUP
endif
dos2nam db 0 ; dummy argv[0] for DOS 2.X
; signal related common data
globalW _child,0 ; flag used to handle signals from child process
;Overlay related data
globalB _ovlflag,0 ; Overlay flag (0 = no overlays)
globalB _intno,0 ; Overlay interrupt value (e.g., 3F)
globalD _ovlvec,0 ; Address of original overlay handler
sEnd data
page
externNP _fptrap
externP _cintDIV
externP _nullcheck
ifdef FARSTACK
endif
sBegin code
assumes cs,code
if sizeC
global proc far
endif
page
;***
;_cinit - C initialization
; This routine performs the shared DOS and Windows initialization.
; The following order of initialization must be preserved -
; 1. Integer divide interrupt vector setup
; 2. Floating point initialization
; 3. Copy ;C_FILE_INFO into _osfile
; 4. Check for devices for file handles 0 - 4
; 5. General C initializer routines
;*******************************************************************************
cProc _cinit,<PUBLIC>,<>
cBegin <nogen> ; no local frame to set up in standard libs
assumes ds,data
ifndef FARSTACK
assumes ss,data
endif
; Initialize the DGROUP portion of _pgmptr. We must do this at
; runtime since there are no load-time fixups in .COM files.
ifdef _COM_
mov word ptr [_pgmptr+2],ds ; init seg portion of _pgmptr
endif ;_COM_
; *** Increase File Handle Count ***
; (1) This code only works on DOS Version 3.3 and later.
; (2) This code is intentionally commented out; the user must enable
; this code to access more than 20 files.
; mov ah,67h ; system call number
; mov bx,_NFILE_ ; number of file handles to allow
; callos ; issue the system call
; ;check for error here, if desired (if carry set, AX equals error code)
; *** End Increase File Handle Count ***
; 1. Integer divide interrupt vector setup
mov ax,DOS_getvector shl 8 + 0
callos ; save divide error interrupt
mov word ptr [_aintdiv],bx
mov word ptr [_aintdiv+2],es
push cs
pop ds
assumes ds,nothing
mov ax,DOS_setvector shl 8 + 0
mov dx,codeOFFSET _cintDIV
callos ; set divide error interrupt
push ss
pop ds
assumes ds,data
; 2. Floating point initialization
if memS
cmp word ptr [fpmath], 0 ; Note: make sure offset __fpmath != 0
je nofloat_i
mov word ptr [fpmath+2], cs ; fix up these far addresses
mov word ptr [fpsignal+2], cs ; in the small model math libs
ifdef _COM_
mov [_EmDataSeg], cs
mov ax, offset DGROUP:_EmDataLabel
sub ax, offset EMULATOR_DATA:_EmDataLabel
mov cl, 4
shr ax, cl
add [_EmDataSeg], ax
endif ;_COM_
else ;not memS
mov cx,word ptr [fpmath+2]
jcxz nofloat_i
endif ;not memS
mov es,[_psp] ; psp segment
mov si,es:[DOS_ENVP] ; environment segment
ifdef FARSTACK
mov ax, word ptr [fpdata]
mov dx, word ptr [fpdata+2]
else
lds ax,[fpdata] ; get task data area
assumes ds,nothing
mov dx,ds ; into dx:ax
endif
xor bx,bx ; (si) = environment segment
call [fpmath] ; fpmath(0) - init
ifdef FARSTACK
mov ax, DGROUP
mov ds, ax
endif
jnc fpok
ifndef FARSTACK
push ss ; restore ds from ss
pop ds
endif
jmp _fptrap ; issue "Floating point not loaded"
; error and abort
fpok:
ifdef FARSTACK
mov ax, word ptr [fpsignal]
mov dx, word ptr [fpsignal+2]
else
lds ax,[fpsignal] ; get signal address
assumes ds,nothing
mov dx,ds
endif
mov bx,3
call [fpmath] ; fpmath(3) - set signal address
ifdef FARSTACK
mov ax, DGROUP
mov ds, ax ; restore DS=DGROUP
else
push ss
pop ds
assumes ds,data
endif
nofloat_i:
; 3. Copy _C_FILE_INFO= into _osfile
; fix up files inherited from parent using _C_FILE_INFO=
mov es,[_psp] ; es = PSP
mov cx,word ptr es:[DOS_envp] ; es = user's environment
jcxz nocfi ; no environment !!!
mov es,cx
xor di,di ; start at 0
cfilp:
cmp byte ptr es:[di],0 ; check for end of environment
je nocfi ; yes - not found
mov cx,cfileln
mov si,dataOFFSET cfile
repe cmpsb ; compare for '_C_FILE_INFO='
je gotcfi ; yes - now do something with it
mov cx,07FFFh ; environment max = 32K
xor ax,ax
repne scasb ; search for end of current string
jne nocfi ; no 00 !!! - assume end of env.
jmp cfilp ; keep searching
; found _C_FILE_INFO= and transfer info into _osfile
gotcfi:
push es
push ds
pop es ; es = DGROUP
pop ds ; ds = env. segment
assumes ds,nothing
assumes es,data
mov si,di ; si = startup of _osfile info
mov di,dataOFFSET _osfile ; di = _osfile block
mov cl, 4
osfile_lp:
lodsb
sub al, 'A'
jb donecfi
shl al, cl
xchg dx, ax
lodsb
sub al, 'A'
jb donecfi
or al, dl
stosb
jmp short osfile_lp
donecfi:
ifdef FARSTACK
push es
else
push ss
endif
pop ds ; ds = DGROUP
assumes ds,data
nocfi:
; 4. Check for devices for file handles 0 - 4
; Clear the FDEV bit (which might be inherited from C_FILE_INFO)
; and then call DOS to see if it really is a device or not
mov bx,4
devloop:
and _osfile[bx],not FDEV ; clear FDEV bit on principal
mov ax,DOS_ioctl shl 8 + 0 ; issue ioctl(0) to get dev info
callos
jc notdev
test dl,80h ; is it a device ?
jz notdev ; no
or _osfile[bx],FDEV ; yes - set FDEV bit
notdev:
dec bx
jns devloop
; 5. General C initializer routines
mov si,dataOFFSET xifbegin
mov di,dataOFFSET xifend
if sizeC
call initterm ; call the far initializers
else
call farinitterm ; call the far initializers
endif
mov si,dataOFFSET xibegin
mov di,dataOFFSET xiend
call initterm ; call the initializers
ret
cEnd <nogen> ; standard C libs
o
o
o
The startup then calls the function setargv in STDARGV.ASM to set up the _argc and _argv[] variables. After setargv gets the command line from the Program Segment Prefix (PSP) at offset 81H and moves it to the heap, it points the variable _argv to the array of pointers to ASCIIZ strings, each of which is an argument from the command line; _argc is the number of arguments. Under MS-DOS 3.x and later, argv[0] points to the fully qualified pathname of the program being run, which the startup gets from the environment segment. Under MS-DOS version 2.x there is no fully qualified pathname in the environment segment, so a null is stored in argv[0] instead. The rest of the command line or "command tail" is found in the PSP and is terminated by a carriage return (0DH). This command tail cannot be greater than 126 bytes in length. If your program needs to process wildcards on the command line, you must link with the SETARGV.OBJ file, which causes startup to use an argument-passing module (WILD.C) that has been assembled with the WILDCARD ifdef. This object file comes with the compiler. If you linked with SETARGV.OBJ, wildcard processing will take place here. If you look at the code in WILD.C, you may be surprised that it looks like 1978 K&R C. It lacks prototypes and uses the old-style function definitions. Since Microsoft also offers a C compiler for the XENIX® operating system, I assume some parts of the startup are shared. In fact, XENIX isn’t the only other environment shared by the startup; the Microsoft QuickC® compiler and Microsoft FORTRAN also use pieces.
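The splitting of the command tail can be sketched in a few lines of C. This is a simplified illustration, not the code in STDARGV.ASM: the function name split_tail is hypothetical, it splits the tail in place instead of copying it to the heap, and it ignores quoting and wildcards.

```c
#include <stddef.h>

/* Split a command tail into arguments, in place. The tail ends at a
   carriage return (0x0D); blanks and tabs separate arguments.
   argv[0] is left for the caller to fill with the program name.
   Returns argc, counting slot 0. */
static int split_tail(char *tail, char **argv, int max)
{
    int argc = 1;                 /* slot 0 is reserved */
    char *p = tail;

    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;                  /* skip separators */
        if (*p == '\r' || *p == '\0' || argc >= max)
            break;
        argv[argc++] = p;         /* start of an argument */
        while (*p != ' ' && *p != '\t' && *p != '\r' && *p != '\0')
            p++;
        if (*p == '\r' || *p == '\0') {
            *p = '\0';            /* terminate the last argument */
            break;
        }
        *p++ = '\0';              /* terminate and move on */
    }
    return argc;
}
```

Given the tail " one two" followed by a carriage return, the sketch reports three arguments, with argv[1] and argv[2] pointing at "one" and "two".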
Once the command line and environment are taken care of, the startup gives the programmer an opportunity to increase the program’s file handle count. Starting with MS-DOS version 3.3, programs can have more than 20 open files at a time. The default is still 20 in all C 6.00AX programs. If you want more file handles, just change the _NFILE_ constant to the number of file handles you need, then uncomment the source lines in CRT0DAT.ASM (see Figure 6).
Next, the startup routes the integer-divide-by-zero vector to the _cintDIV function in CRT0DAT.ASM, so your program won’t lock the machine if it has a divide-by-zero error. Instead of locking the machine, your program produces a run-time error (RTE). There are 18 RTEs (see Figure 7). These messages are processed in CRT0MSG.ASM and RTERR.INC by the _NMSG_WRITE and _NMSG_TEXT functions. Nothing is more embarrassing for a developer than to release code that causes some type of RTE. All C 6.00AX RTEs are in the format
R6<RTE number> <description>
leaving no doubt in anyone’s mind what has happened to your program. But there is a way to replace the C 6.00AX messages with your own custom handling. The code in Figure 8 displays the message "XYZ company internal error. Contact customer support." The main function that follows forces a divide-by-zero error to verify that the replacement _NMSG_WRITE function works correctly.
Figure 7 Run-time Errors
0 | Stack overflow |
1 | Null pointer assignment |
2 | Floating point not loaded |
3 | Integer divide by 0 |
4 | Undefined |
5 | Not enough memory on exec |
6 | Bad format on exec |
7 | Bad environment on exec |
8 | Not enough space for arguments |
9 | Not enough space for environment |
10 | Abnormal program termination |
11 | Undefined |
12 | Illegal near pointer use |
14 | Ctrl-Break encountered (QC 1.0 only) |
15 | Unexpected interrupt (QC 1.0 only) |
16 | OS/2 RTE |
17 | OS/2 RTE |
18 | Unexpected heap error |
252 | \r\n |
255 | Run time error banner |
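As an illustration of the message format, the following C sketch looks up a few of the entries from Figure 7 and formats them. The table, the function names, and especially the zero-padding of the number to three digits (giving, for example, R6003) are my assumptions; the real messages live in RTERR.INC.

```c
#include <stddef.h>
#include <stdio.h>

struct rte_entry {
    int number;
    const char *text;
};

/* A few entries copied from Figure 7. */
static const struct rte_entry rte_table[] = {
    { 0, "Stack overflow" },
    { 1, "Null pointer assignment" },
    { 2, "Floating point not loaded" },
    { 3, "Integer divide by 0" },
};

/* Return the description for an RTE number, or NULL if unknown. */
static const char *rte_text(int number)
{
    size_t i;
    for (i = 0; i < sizeof rte_table / sizeof rte_table[0]; i++)
        if (rte_table[i].number == number)
            return rte_table[i].text;
    return NULL;
}

/* Format "R6<number> <description>" into buf.
   Assumption: the number is zero-padded to three digits.
   Returns -1 for an unknown number. */
static int rte_format(char *buf, size_t size, int number)
{
    const char *text = rte_text(number);
    if (text == NULL)
        return -1;
    return snprintf(buf, size, "R6%03d %s", number, text);
}
```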
Figure 8 Handling Run-time Errors Yourself
// cl /c /WX rte.c
// link /NOE rte;    // Since you will be replacing an
//                   // existing standard library function use /NOE
#include <conio.h>   // declares cprintf

_pascal __NMSG_WRITE(int);

_pascal __NMSG_WRITE(int error)
{
    switch(error)
    {
        // 252 and 255 are the constants for CRLF and the run-time
        // banner C6.00AX normally prints out.
        case 252:
        case 255:
            break;
        default:
            cprintf(
                "XYZ company internal error # %d. Contact customer support\r\n",error);
            break;
    }
}

main(void)
{
    // Force a divide-by-zero error to trigger the replacement handler.
    _asm{
        mov ax,0
        div ax
        ret
    }
}
The startup’s next step is to initialize floating-point math. If the global variable _fpinit is nonzero, the startup initializes floating point. This variable is set if any of your code uses floating-point variables or expressions.

Then the startup checks to see if it is running with file handles inherited from a parent process. It does this by checking the environment strings for the _C_FILE_INFO variable. If it’s there, it’s the last variable in the environment. The startup takes the information from _C_FILE_INFO and places it into an array of bytes called _osfile. This array, defined in CRT0DAT.ASM, contains the handles for stdin, stdout, stderr, stdaux, and stdprn. It also leaves space for the other 15 available file handles (or more if you increased the _NFILE_ constant). The startup will not copy the _C_FILE_INFO variable into the copy of the environment that the program gets.

_C_FILE_INFO was called ;C_FILE_INFO in C 5.1 and caused a lot of headaches for programmers. C 5.1 passed the information about open files and translation modes in binary. This wasn’t a problem if a C 5.1 program was calling another C 5.1 program because the startup code of the called program does not copy the ;C_FILE_INFO variable into the program’s copy of the environment. However, if the child process was not a C 5.1 process (such as COMMAND.COM), the ;C_FILE_INFO variable was visible in the environment. Because the open file information was binary, if a program permitted the user to shell to DOS and the user executed the set command, the ;C_FILE_INFO variable displayed as graphics characters, so it looked as if the environment was corrupted. This is prevented in C 6.00AX because open file information is not passed through the environment unless specifically requested. To pass open file information to a child process, the _fileinfo variable must be set to true. If it isn’t, the standard library process-creation functions (spawn and exec) will not place the _C_FILE_INFO variable into the copy of the environment that the child process receives.
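The loop labeled osfile_lp in Figure 6 shows how the text form of this variable is turned back into bytes: each byte travels as two letters, each holding one 4-bit half offset from 'A'. Here is a rough C rendering of that loop. The function name decode_file_info and the max bound are my additions; the assembly version has no such bound.

```c
#include <stddef.h>

/* Decode pairs of letters into bytes, as the osfile_lp loop does:
   byte = ((first - 'A') << 4) | (second - 'A').
   Decoding stops at the first character below 'A' (such as the
   terminating NUL) or after max bytes. Returns the bytes written. */
static size_t decode_file_info(const char *text,
                               unsigned char *osfile, size_t max)
{
    size_t n = 0;

    while (n < max) {
        int hi, lo;

        hi = (unsigned char)text[0] - 'A';
        if (hi < 0)
            break;
        lo = (unsigned char)text[1] - 'A';
        if (lo < 0)
            break;
        osfile[n++] = (unsigned char)((hi << 4) | lo);
        text += 2;
    }
    return n;
}
```

For example, the pair "IB" decodes to 81H, because 'I' is eight letters past 'A' and 'B' is one past it.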
Anyone who has looked at a MAP file is familiar with the _TEXT, _DATA, _BSS, and STACK segments, but the rest of an EXE’s segments are probably a mystery. The DBData segment contains data for use by QuickC in debug mode. The remaining segments execute functions that need to be called outside the body of your program. Segments beginning with the letter X signify an initializer or terminator segment (see Figure 9). The initialization segments consist of calls to near or far routines written by you or a third-party library you are linking with. The terminator segments consist of an onexit table, preterminators, and terminators or far terminators. The onexit table is filled by the onexit function, which is a standard library routine. The preterminator segment is the place for functions that should be called during normal termination of a program; that is, if no errors have taken place. The terminator segment contains functions to be executed in all situations. All of these segments are grouped into a segment beginning, segment end, and a placeholder for a pointer to a function. Many third-party libraries use these segments as a convenient way to initialize and shut down. For example, libraries created to be linked in your code might need to take over an interrupt or to check for a certain piece of hardware. It would be nice if they could initialize the hardware without calling an initialization function. The solution is to place a pointer to their initialization function in one of the X-segments and let the startup call it. The alternative to using an X-segment is the old method of sprinkling a check like the following, like salt, into every function that needs initialization.
if (!initialized)
    init();
When the program ends, the third-party library would use a terminator segment to restore the interrupt.
Figure 9 Startup Code Segments
 Start   Stop    Length  Name      Class
 00000H  00511H  00512H  _TEXT     CODE
 00512H  00512H  00000H  C_ETEXT   ENDCODE
 00520H  00561H  00042H  NULL      BEGDATA
 00562H  0060DH  000ACH  _DATA     DATA
 0060EH  0061BH  0000EH  CDATA     DATA
 0061CH  0061CH  00000H  XIFB      DATA
 0061CH  0061CH  00000H  XIF       DATA
 0061CH  0061CH  00000H  XIFE      DATA
 0061CH  0061CH  00000H  XIB       DATA
 0061CH  0061CH  00000H  XI        DATA
 0061CH  0061CH  00000H  XIE       DATA
 0061CH  0061CH  00000H  XPB       DATA
 0061CH  0061CH  00000H  XP        DATA
 0061CH  0061CH  00000H  XPE       DATA
 0061CH  0061CH  00000H  XCB       DATA
 0061CH  0061CH  00000H  XC        DATA
 0061CH  0061CH  00000H  XCE       DATA
 0061CH  0061CH  00000H  XCFB      DATA
 0061CH  0061CH  00000H  XCF       DATA
 0061CH  0061CH  00000H  XCFE      DATA
 0061CH  0061CH  00000H  CONST     CONST
 0061CH  00623H  00008H  HDR       MSG
 00624H  006F1H  000CEH  MSG       MSG
 006F2H  006F3H  00002H  PAD       MSG
 006F4H  006F4H  00001H  EPAD      MSG
 006F6H  006F6H  00000H  _BSS      BSS
 006F6H  006F6H  00000H  XOB       BSS
 006F6H  006F6H  00000H  XO        BSS
 006F6H  006F6H  00000H  XOE       BSS
 00700H  00EFFH  00800H  STACK     STACK

 Origin   Group
 0052:0   DGROUP

Program entry point at 0000:0016
Figure 10 gives an example of an initializer and a terminator. XI is the initializer segment, and it contains a pointer to the function init. Because the pointer is in the XI segment, the startup finds it and calls init before main is called. You can have up to 32 pointers to functions in these initialization segments; the XIB (begin) and XIE (end) segments mark where the list starts and stops. The startup checks whether their addresses are equal, and if they are, there are no functions to call. If the addresses are not equal, the startup calls each function pointer in turn, advancing the begin address until it equals the end address. In compact- and large-model programs, the XIF segment, which holds pointers to far functions, is used in place of the XI segment. The terminator entry is a pointer to the function term, placed in the XC segment (XCF in compact and large model). The init and term functions are shown in Figure 11. These functions simply use cprintf to print a message declaring their presence. Notice that there is no reference to either init or term in main. These X-segments can be used for any type of function that is prototyped as
void func(void)
However, avoid the standard library functions that use FILE*, such as fprintf, sprintf, and fopen. The FILE* may not have been initialized or, on termination, startup may have already flushed the I/O streams.
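The begin/end walk described above can be modeled in portable C. This is only a sketch of the idea, not the code in the startup sources: an ordinary array stands in for the XI segment, and the names init_fn, run_table, init_a, init_b, and call_log are assumptions made for the example.

```c
typedef void (*init_fn)(void);   /* every entry has the form void func(void) */

int call_log[8];                 /* records which routines ran, in order */
int call_count = 0;

void init_a(void) { call_log[call_count++] = 1; }
void init_b(void) { call_log[call_count++] = 2; }

/* Stand-in for the XI segment; in a real program the linker builds it. */
init_fn table[] = { init_a, init_b };

/* Call every pointer in [begin, end) and return how many were called.
   If begin equals end, the table is empty and nothing is called. */
int run_table(init_fn *begin, init_fn *end)
{
    int called = 0;
    while (begin != end) {
        (*begin)();   /* call the routine this entry points to */
        begin++;      /* advance begin toward end */
        called++;
    }
    return called;
}
```

Calling run_table(table, table + 2) runs init_a and then init_b, while run_table(table, table) does nothing, which mirrors the equal-address test for an empty segment.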
Figure 10 XSEGS.ASM
;
;masm xsegs;
;
;
XIB SEGMENT WORD PUBLIC 'DATA'
XIB ENDS
XI SEGMENT WORD PUBLIC 'DATA'
XI ENDS
XIE SEGMENT WORD PUBLIC 'DATA'
XIE ENDS
XCB SEGMENT WORD PUBLIC 'DATA'
XCB ENDS
XC SEGMENT WORD PUBLIC 'DATA'
XC ENDS
XCE SEGMENT WORD PUBLIC 'DATA'
XCE ENDS
EXTRN __acrtused:ABS
EXTRN _init:NEAR
EXTRN _term:NEAR
XI SEGMENT
DW _init
XI ENDS
XC SEGMENT
DW _term
XC ENDS
END
Figure 11 STARTUP.C with Init and Term Functions
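// compile with cl /c startup.c
// link with link startup+xsegs;
#include <conio.h>      // declares cprintf

void main(void)
{
    cprintf("We are inside main\r\n");
}

// called by the startup through the XI segment, before main
void init(void)
{
    cprintf("Initializing...\r\n");
}

// called by the startup through the XC segment, during termination
void term(void)
{
    cprintf("Terminating....\r\n");
}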
The initialization is now finished, so the startup gets down to the business of calling main. The following prototype
int _cdecl main(int argc, char **argv, char **envp)
is the proper function prototype, given what the startup pushes onto the stack. This also explains why main is declared as _cdecl. Under the C calling convention the caller (here, the startup) removes the arguments from the stack, so a program that doesn’t use the command line or environment can declare itself as main(void) and simply ignore the parameters it is passed. That is also why you don’t find a prototype for main in any of the C 6.00AX header files.
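As a small illustration of the three parameters, the sketch below counts and prints what the startup hands over. The helper names count_vector and report_args are hypothetical; a main declared with argc, argv, and envp could pass its parameters straight through to report_args.

```c
#include <stdio.h>

/* Count the entries in a NULL-terminated vector such as argv or envp. */
int count_vector(char **vec)
{
    int n = 0;
    while (vec[n] != NULL)
        n++;
    return n;
}

/* Hypothetical helper: print the arguments and return the number of
   environment strings found in envp. */
int report_args(int argc, char **argv, char **envp)
{
    int i;
    int env_count = count_vector(envp);

    for (i = 0; i < argc; i++)
        printf("argv[%d] = %s\n", i, argv[i]);
    printf("%d environment strings\n", env_count);
    return env_count;
}
```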
During the execution of main the startup has no control. However, the programmer is free to use any of the public variables or functions that the startup provides. After main returns, a program can terminate by calling exit, _cexit, _exit, or _c_exit. These functions are in CRT0DAT.ASM. You should call only one of them; exit is the default. The exit(exit_code) function calls the C run-time preterminator and terminator functions, then terminates the process with the exit_code parameter, which is supplied by the programmer. This exit_code can be used as a return code if your program is a child process (if it was spawned), or it can be used to set the DOS ERRORLEVEL for batch file processing. The function _cexit(void) performs the same termination processing as exit but returns control to the caller when finished (that is, your program does not end). This call is useful for TSRs that require the divide-by-zero interrupt vector to be reset. The _exit(exit_code) function performs a quick exit that does not call the programmer’s terminator routines or the preterminators. Like exit, _exit terminates the process with the exit_code supplied by the programmer. The _c_exit(void) function is the same as _exit, except that _c_exit returns control to the caller when finished.
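The four functions differ along two axes, which the following sketch records as a lookup table. It only restates the description above; the structure exit_behavior and the function find_exit_behavior are names made up for this summary and are not part of the run-time library.

```c
#include <stddef.h>
#include <string.h>

struct exit_behavior {
    const char *name;
    int runs_user_terminators;  /* calls preterminators and terminators */
    int returns_to_caller;      /* 1 = returns, 0 = ends the process */
};

static const struct exit_behavior exit_table[] = {
    { "exit",    1, 0 },
    { "_cexit",  1, 1 },
    { "_exit",   0, 0 },
    { "_c_exit", 0, 1 }
};

/* Look up one of the four names; returns NULL for anything else. */
const struct exit_behavior *find_exit_behavior(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof exit_table / sizeof exit_table[0]; i++)
        if (strcmp(exit_table[i].name, name) == 0)
            return &exit_table[i];
    return NULL;
}
```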
As part of the exit processing, the startup checks to see if your program has assigned data via a null pointer. Using a null pointer is a common C mistake. To help detect this problem, the startup places the NULL segment first in DS in small- and medium-model programs. In CHKSUM.ASM, the function nullcheck performs a checksum on this NULL data segment. If the checksum fails, it means that some function has written through a null pointer (DS:0), and nullcheck aborts with the "Null pointer assignment" run-time error. In large- and compact-model programs, the nullcheck function isn’t as effective. If a far pointer is null (0:0), any data written into the zero segment, which holds the MS-DOS interrupt table, will probably cause the system to hang. Trying to squeeze that last bit of memory out of your program? Replace the nullcheck function with your own empty function,
_nullcheck(void){}
and link it in before the standard library. Of course, do this only after you are sure that you have no null pointer assignments.
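The checksum idea can be sketched in portable C. This is a model of the technique only: the real routine lives in CHKSUM.ASM and its algorithm is not reproduced here. The guard array, its size and contents, the additive checksum, and the names guard_init and guard_check are all assumptions made for the example.

```c
#include <stddef.h>

#define GUARD_SIZE 16   /* hypothetical size of the guarded region */

/* Stand-in for the NULL segment that sits at the start of DS. */
unsigned char guard[GUARD_SIZE] = "do not modify";

static unsigned saved_sum;   /* checksum recorded at startup */

/* Simple additive checksum over the guarded bytes (an assumption). */
static unsigned guard_sum(void)
{
    unsigned sum = 0;
    size_t i;
    for (i = 0; i < GUARD_SIZE; i++)
        sum += guard[i];
    return sum;
}

/* Record the checksum before the program proper runs. */
void guard_init(void)
{
    saved_sum = guard_sum();
}

/* At exit: returns 1 if the guarded bytes changed, 0 if intact. */
int guard_check(void)
{
    return guard_sum() != saved_sum;
}
```

A stray write that changes any single guarded byte changes the sum, so guard_check reports it at exit time, long after the faulty assignment took place.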
You may have thought that it wasn’t a big deal to call main, but there actually is a lot of work to be done both before and after main is called. I’ve demonstrated some powerful capabilities of the startup that you may not have known existed. Now that you know they are there, you may want to check the startup functions before you write any of your own. A couple of caveats: Microsoft doesn’t offer customer support for the startup code, nor is the presence of any variable or function guaranteed from version to version. Anything may be changed at any time. So if you do make changes to the startup code, you will have to carry them forward to each new version of the C compiler. But that cost doesn’t outweigh the benefits of generating more efficient programs.
¹ For ease of reading, "MS-DOS" refers to the Microsoft MS-DOS operating system. MS-DOS is a trademark that refers only to this Microsoft product and is not intended to refer to such products generally.
² For ease of reading, "XENIX" refers to the Microsoft XENIX operating system. XENIX is a trademark that refers only to this Microsoft product and is not intended to refer to such products generally.
³ For ease of reading, "QuickC" refers to the Microsoft QuickC compiler. QuickC is a trademark that refers only to this Microsoft product and is not intended to refer to such products generally.