You've seen that the far function prolog inserted by the compiler is modified by Windows if the function is exported. When far functions are in memory, they can have one of three possible prologs. If the prolog starts off like this:
MOV AX, DS
NOP
then the far function is called only from within the same program and is not called by Windows. If the prolog starts off like this:
NOP
NOP
NOP
then the far function has been exported. It is not called from within the program but instead is called only by Windows. A program's window procedure starts off in this way.
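What's the purpose of the extra NOP that shows up in both prologs? It pads each of them to three bytes, which is exactly the size of a MOV instruction that loads AX with a 16-bit immediate value. The extra NOP disappears in the third form of the prolog: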
MOV AX, xxxx
You'll find this form at the beginning of many Windows functions in the Windows library modules (USER, KERNEL, and GDI) and drivers (MOUSE, KEYBOARD, SYSTEM, and so forth). Unlike Windows programs, Windows libraries cannot have multiple instances, so they do not need instance thunks. The far function itself can contain code that sets AX equal to the segment address of the library's data segment (the value xxxx). The rest of the prolog then saves DS and sets DS equal to AX.
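To make the three forms concrete, here is a minimal sketch in Python (not something Windows itself contains) that looks at the first three bytes of a far function and reports which prolog it has. The function names and the labels "internal", "exported", and "library" are made up for this illustration. The byte values are the standard 8086 encodings: 8Ch D8h for MOV AX, DS, 90h for NOP, and B8h followed by a 16-bit value for MOV AX, xxxx.

```python
# Standard 8086 opcode bytes for the instructions in the three prologs.
NOP = 0x90
MOV_AX_DS = bytes([0x8C, 0xD8])   # MOV AX, DS (two bytes)
MOV_AX_IMM16 = 0xB8               # MOV AX, xxxx (opcode + 16-bit immediate)


def classify_prolog(code):
    """Name the prolog form matched by the first three bytes of a far function.

    The returned labels are hypothetical names used only in this sketch.
    """
    head = bytes(code[:3])
    if len(head) < 3:
        return "unknown"
    if head[:2] == MOV_AX_DS and head[2] == NOP:
        return "internal"   # MOV AX, DS / NOP: called only within the program
    if head == bytes([NOP, NOP, NOP]):
        return "exported"   # NOP / NOP / NOP: called only by Windows
    if head[0] == MOV_AX_IMM16:
        return "library"    # MOV AX, xxxx: a library function
    return "unknown"


def library_data_segment(code):
    """Return the xxxx value stored in a MOV AX, xxxx prolog."""
    if classify_prolog(code) != "library":
        raise ValueError("not a MOV AX, xxxx prolog")
    # The immediate operand is stored low byte first.
    return int.from_bytes(bytes(code[1:3]), "little")
```

Passing the bytes B8h 34h 12h to library_data_segment yields 1234h, because the 8086 stores the low byte of the immediate value first. All three forms occupy the same three bytes, which is why one of them can be overwritten by another in place.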
When Windows moves a library's data segment in memory, it must update the segment address (the xxxx value) in the prologs of the library functions that use that data segment. And when your program calls a Windows function, the address it calls is either the address of the function's reload thunk (if the function is in a moveable segment) or the function itself (if the function is in a fixed segment).
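The prolog fix-up just described can be pictured with a small Python simulation. This is only a sketch under stated assumptions. The code segment is modeled as a bytearray, and the list of function entry offsets is a hypothetical stand-in for whatever bookkeeping Windows really uses, which isn't described here. The name patch_library_prologs is likewise invented for the example.

```python
MOV_AX_IMM16 = 0xB8   # opcode byte of MOV AX, xxxx


def patch_library_prologs(code, entry_offsets, new_segment):
    """Store new_segment as the xxxx value of each MOV AX, xxxx prolog.

    code          -- bytearray standing in for a library code segment
    entry_offsets -- hypothetical list of offsets where far functions begin
    new_segment   -- new segment address of the library's data segment

    Returns the number of prologs that were rewritten.
    """
    if not 0 <= new_segment <= 0xFFFF:
        raise ValueError("segment address must fit in 16 bits")
    patched = 0
    for offset in entry_offsets:
        if code[offset] != MOV_AX_IMM16:
            continue   # some other prolog form; leave it untouched
        # Overwrite only the two immediate bytes, low byte first.
        code[offset + 1:offset + 3] = new_segment.to_bytes(2, "little")
        patched += 1
    return patched
```

Only the two operand bytes change. The opcode byte and the rest of the function stay where they are, so the function's address (and any reload thunk that refers to it) is unaffected by the move of the data segment.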
Note that Windows library functions use their own data segments but continue to use the stack of the caller. If the library function were to use its own stack, it would need to copy function parameters from the caller's stack. Windows switches the stack only when switching between tasks; calling a library function does not constitute a task switch.