INF: Difference Between Arrays and Pointers in C

ID Number: Q44463

5.10 6.00 6.00a 6.00ax 7.00 | 5.10 6.00 6.00a 6.00ax

MS-DOS | OS/2

Summary:

The following is a sample of a common mistake where array and pointer

declarations are confused:

A program is divided into several modules. In one module, declare

an array with the following declaration:

signed char buffer[100];

In another module, access the variable with one of the following:

extern signed char *buffer; /* FAILS */

extern signed char buffer[]; /* WORKS */

CodeView reveals that the program is using the wrong address

for the array in the first case. The second case works correctly.
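The working arrangement can be sketched in a single file, with the
declaration and the definition standing in for the two modules. This is
a minimal sketch, not the original program: read_buffer and
store_and_read are hypothetical helper names, and the bound of 100 is
taken from the definition shown above.

```c
#include <assert.h>
#include <stddef.h>

/* What the "other" module sees: an array declaration, not a pointer. */
extern signed char buffer[];

/* Hypothetical helper standing in for code in the other module. */
signed char read_buffer(size_t index)
{
    assert(index < 100);   /* the size is known here only by convention */
    return buffer[index];
}

/* The single definition, as it would appear in the defining module. */
signed char buffer[100];

/* Hypothetical helper: stores a value, then reads it back through
   the code that only saw the extern array declaration. */
signed char store_and_read(size_t index, signed char value)
{
    buffer[index] = value;
    return read_buffer(index);
}
```

Both functions refer to the same 100 bytes because the extern
declaration and the definition agree that buffer is an array.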

More Information:

The following declarations are NOT the same:

char *pc;

char ac[20];

The first declaration sets aside memory for a pointer; the second sets

aside memory for 20 characters.
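The difference in the storage set aside can be checked with sizeof. The
following is a minimal sketch; the function names are hypothetical, and
the pointer size depends on the compiler and memory model.

```c
#include <assert.h>
#include <stddef.h>

char *pc;      /* storage for one pointer; it points nowhere useful yet */
char ac[20];   /* storage for 20 characters */

/* Size of the pointer object itself (2 or 4 under MS-DOS models). */
size_t pointer_bytes(void) { return sizeof pc; }

/* Size of the whole array: 20 characters. */
size_t array_bytes(void) { return sizeof ac; }

/* Hypothetical self-check tying the two sizes to the declarations. */
void check_sizes(void)
{
    assert(pointer_bytes() == sizeof(char *));
    assert(array_bytes() == 20);
}
```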

A picture of pc and ac in memory might appear as follows:

    pc  +--------+
        |  ???   |
        +--------+

    ac  +-----+-----+-----+-----+     +-----+
        |  ?  |  ?  |  ?  |  ?  | ... |  ?  |
        +-----+-----+-----+-----+     +-----+

The same is true for the following:

extern char *pc;

extern char ac[];

Thus, to access the array in ac in another module, the correct

declaration is as follows:

extern char ac[];

In your case, the correct declaration is the following:

extern signed char buffer[];

Of the two extern declarations of pc and ac shown above, the first
says that there is a pointer to char called pc (which is 2 or 4 bytes)
defined in some other module; the second says that there is an actual
array of characters called ac.

The addressing for pc[3] and ac[3] is done differently. There is one
similarity: in most expressions, the name "ac" is converted to a
constant pointer to char whose value is &ac[0]. The similarity ends
there, however.
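The one similarity can be shown in a few lines. This is a minimal
sketch with hypothetical function names; the initial contents "abcd"
are an assumption chosen to match the pictures in this article.

```c
#include <assert.h>

char ac[20] = "abcd";

/* Returns 1 if the array name, used as a value, equals &ac[0]. */
int name_is_first_element_address(void)
{
    char *p = ac;   /* ac is converted to a pointer to ac[0] */
    return p == &ac[0];
}

/* Returns 1 if the *(ac+n) form and the ac[n] form agree. */
int star_form_matches_index_form(void)
{
    return *ac == ac[0] && *(ac + 1) == ac[1] && *(ac + 3) == ac[3];
}

/* Note: "ac = p;" would not compile; an array name is not assignable. */
```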

To evaluate pc[3], we first load the value of the pointer pc from

memory, then we add 3. Finally, we load the character that is stored

at this location (pc + 3) into a register. The MASM code might appear

as follows (assuming small-memory model):

    MOV  BX, pc            ; move *CONTENTS* of pc into BX
                           ; BX contains 1234
    MOV  AL, [BX + 3]      ; move byte at pc + 3 (1237) into AL
                           ; ==> AL contains 'd'

A picture might appear as follows, provided that pc is properly set to
point to an array at location 1234 and that the array contains "abcd"
as its first four characters:

address:   1000               1234  1235  1236  1237
    pc  +--------+--->>>>>------v-----v-----v-----v--+
        |  1234  |    *pc    |  a  |  b  |  c  |  d  |  ...
        +--------+           +-----+-----+-----+-----+
                              pc[0] pc[1] pc[2] pc[3]
                               *pc *(pc+1)  etc.

Note: Using pc without properly initializing it (a simple way to
initialize it is "pc = malloc(4);" or "pc = ac;") causes the program
to access memory that it did not intend to access, which produces the
strange behavior.
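Giving pc a valid target before indexing it might look like the
following. This is a minimal sketch: copy_text and fourth_char are
hypothetical helpers, and the string "abcd" is chosen to match the
picture above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of text, or NULL on failure. */
char *copy_text(const char *text)
{
    size_t len = strlen(text) + 1;   /* include the terminating '\0' */
    char *pc = malloc(len);          /* pc now points at real storage */
    if (pc != NULL)
        memcpy(pc, text, len);
    return pc;
}

/* Reads pc[3] only after pc has been set to point somewhere valid.
   Returns '\0' if the allocation failed. */
char fourth_char(void)
{
    char *pc = copy_text("abcd");
    char c = '\0';
    if (pc != NULL) {
        c = pc[3];
        free(pc);
    }
    return c;
}
```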

Because the address of ac is a constant, it can be built into the MOV
instruction itself, eliminating the need for two MOVs. The MASM code
might appear as follows:

    MOV  AL, [offset ac + 3]   ; move byte at ac + 3 into AL
                               ; offset ac is 1100, so move
                               ; byte at 1103 into AL
                               ; ==> AL contains 'd'

The picture might appear as follows:

address:  1100  1101  1102  1103        1119
    ac  +-----+-----+-----+-----+     +-----+
        |  a  |  b  |  c  |  d  | ... | \0  |
        +-----+-----+-----+-----+     +-----+
         ac[0] ac[1] ac[2] ac[3]       ac[19]
          *ac *(ac+1)  etc.

Note: If you first initialize pc to point to ac (by saying "pc =
ac;"), then the two expressions pc[3] and ac[3] yield exactly the same
character. (This change can be shown in the picture by changing pc so
it contains the address of ac, which is 1100.) However, the
instructions used to fetch that character are different.
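The equivalence after "pc = ac;" can be checked directly. This is a
minimal sketch with hypothetical function names; the contents "abcd"
are assumed to match the pictures above.

```c
#include <assert.h>

char ac[20] = "abcd";
char *pc = ac;            /* pc now holds the address of ac[0] */

/* Returns 1 if pc[3] and ac[3] name the very same byte. */
int same_element(void) { return &pc[3] == &ac[3]; }

/* Pointer-type addressing: load pc, add 3, then load the byte. */
char via_pointer(void) { return pc[3]; }

/* Array-type addressing: the address ac + 3 is a constant. */
char via_array(void) { return ac[3]; }
```

Both accessors return the same character, even though a compiler will
typically generate different instructions for each.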

Note: If you declare ac as follows in another module, the compiler
will generate code to do pointer-type addressing rather than
array-type addressing:

    extern char *ac; /* WRONG! */

The compiler will use the first few bytes of the array as an address
(rather than as characters) and access the memory at that location,
which is why the problems result.
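The effect of the wrong declaration can be imitated safely by copying
the first bytes of an array into an integer wide enough to hold an
address, without ever dereferencing the result. This is a minimal
sketch: bytes_as_address is a hypothetical helper, and it assumes that
uintptr_t is available and that the caller supplies at least
sizeof(uintptr_t) bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the value that would be used as an
   address if the first bytes of the array were treated as a pointer.
   It only copies bytes; it never dereferences the result.
   Assumption: array_start refers to at least sizeof(uintptr_t) bytes. */
uintptr_t bytes_as_address(const char *array_start)
{
    uintptr_t address;
    memcpy(&address, array_start, sizeof address);
    return address;
}
```

For an array that begins with "abcd", the value returned is built from
the character codes of those letters, not from the location of the
array, which is why pointer-type addressing reads the wrong memory.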

Additional reference words: 5.00 5.10 6.00 6.00a 6.00ax 7.00