Portability means that Windows NT runs on both CISC and RISC processors. CISC computers include those with Intel 486 or higher processors. RISC computers include those with MIPS R4000, Digital Alpha AXP, or PowerPC processors.
Scalability means that Windows NT 4.0 takes full advantage of symmetric multiprocessing hardware. It allows the Microkernel to execute on any processor and allows any processor to run any thread. It does this, in part, by allowing the Microkernel to preempt lower-priority threads and by requiring all code, even in the Executive, to be reentrant.
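To illustrate what "reentrant" means here, the following is a minimal Python sketch. The functions are hypothetical and are not Windows NT code. A routine that keeps intermediate results in one shared, module-level buffer is not reentrant: a second caller on another processor, or a thread that preempts the first, can overwrite the buffer mid-call. A routine that keeps all of its state in locals and arguments can safely be entered by any number of threads at once.

```python
import threading

# NOT reentrant: intermediate results live in one shared, module-level buffer.
_shared_buffer = []

def scale_points_shared(points, factor):
    _shared_buffer.clear()
    for x, y in points:
        _shared_buffer.append((x * factor, y * factor))
    # If another thread entered this function meanwhile, the buffer may
    # contain (or be missing) that caller's data.
    return list(_shared_buffer)

# Reentrant: every intermediate value is local to this call.
def scale_points_reentrant(points, factor):
    return [(x * factor, y * factor) for x, y in points]

def run_concurrently(func, jobs):
    """Run func(points, factor) for each job on its own thread; collect results in order."""
    results = [None] * len(jobs)

    def worker(i, points, factor):
        results[i] = func(points, factor)

    threads = [threading.Thread(target=worker, args=(i, p, f))
               for i, (p, f) in enumerate(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    jobs = [([(1, 2), (3, 4)], 2), ([(5, 6)], 10)]
    print(run_concurrently(scale_points_reentrant, jobs))
    # [[(2, 4), (6, 8)], [(50, 60)]]
```

Only the reentrant version gives a deterministic result under concurrency; the shared-buffer version is correct only when calls happen not to overlap.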
Incorporating the Win32 graphics functions into the Executive significantly improves the scalability of Windows NT because graphics calls no longer involve context switches. Context switching requires systemwide spinlocks, which limit the efficiency gained from multiple processors. The simple kernel-mode thread transitions that graphics calls now involve do not require spinlocks.
In addition, although the Window Manager and GDI now run in kernel mode, their threads are all still scheduled and preemptible, and all of their code is reentrant. In fact, the only code that is not preemptible is Microkernel code, which, technically, does not run on a thread.
Also, GDI threads still run asynchronously; that is, they do not wait for the threads of other applications and do not require application threads to wait for them. On multiprocessor computers, multiple threads can run in the Window Manager and GDI simultaneously. GDI synchronizes with applications only when they need access to the same device. This allows GDI threads, and the threads of the applications that call GDI functions, to run on any available processor.
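The rule that GDI synchronizes with applications only when they need the same device can be modeled as per-device locking. The sketch below is a rough analogy under stated assumptions: `DeviceTable`, `draw`, and `app` are hypothetical names, not Win32 or GDI APIs, and Python threads stand in for NT threads. Callers drawing to different devices never contend for a lock; callers drawing to the same device are serialized.

```python
import threading

class DeviceTable:
    """Hypothetical model: one lock per device, created on first use."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the lock table itself
        self._locks = {}                 # device name -> threading.Lock
        self.log = {}                    # device name -> list of drawing ops

    def lock_for(self, device):
        with self._guard:
            if device not in self._locks:
                self._locks[device] = threading.Lock()
                self.log[device] = []
            return self._locks[device]

    def draw(self, device, op):
        # Only threads that target the *same* device wait for each other.
        with self.lock_for(device):
            self.log[device].append(op)

def app(table, device, name, n_ops):
    """Hypothetical application thread issuing n_ops drawing calls."""
    for i in range(n_ops):
        table.draw(device, f"{name}:{i}")

if __name__ == "__main__":
    table = DeviceTable()
    threads = [
        threading.Thread(target=app, args=(table, "display", "A", 100)),
        threading.Thread(target=app, args=(table, "printer", "B", 100)),
        threading.Thread(target=app, args=(table, "display", "C", 100)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print({dev: len(ops) for dev, ops in table.log.items()})
```

Applications A and C contend for the "display" lock, while application B, which targets a different device, never waits on either of them.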
Windows NT 4.0 applications now use a single thread to get graphics services from the Executive. Previous versions required two paired threads: one in the application and one in the Win32 subsystem. Ironically, Windows NT 4.0 loses no parallelism by abandoning the paired threads, because the paired model never actually ran them in parallel. Even on symmetric multiprocessing systems, almost all calls to Win32 run synchronously: each thread waits for the other to finish before proceeding. Even so, the single-threaded model runs more efficiently on multiprocessor systems than the paired-thread model because it eliminates costly context switches. It also requires less memory, because graphics services run in the context of a single thread instead of needing a separate stack for each thread of the pair.
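The contrast between the two models can be sketched in Python as a rough analogy only; it is not how Win32 or the Win32 subsystem is actually implemented. In the hypothetical `PairedServer` model, each graphics call is handed through a queue to a dedicated server thread and the client blocks until the reply arrives, so the two threads strictly alternate and every call costs two thread handoffs. In the single-thread model, the caller's own thread simply runs the service. `fill_rect` is a made-up stand-in for a graphics function.

```python
import queue
import threading

def fill_rect(width, height):
    """Hypothetical stand-in for a graphics service; returns pixels 'drawn'."""
    return width * height

class PairedServer:
    """Paired-thread model: a dedicated server thread executes every request."""

    def __init__(self):
        self.requests = queue.Queue()
        self.replies = queue.Queue()
        self.thread = threading.Thread(target=self._serve, daemon=True)
        self.thread.start()

    def _serve(self):
        while True:
            req = self.requests.get()
            if req is None:              # shutdown sentinel
                break
            func, args = req
            self.replies.put(func(*args))

    def call(self, func, *args):
        # Hand off the request, then block until the server replies:
        # client and server never do useful work at the same time.
        self.requests.put((func, args))
        return self.replies.get()

    def close(self):
        self.requests.put(None)
        self.thread.join()

def single_thread_call(func, *args):
    # Single-thread model: no handoff, the calling thread runs the service.
    return func(*args)

if __name__ == "__main__":
    server = PairedServer()
    paired = [server.call(fill_rect, w, 10) for w in range(1, 6)]
    server.close()
    direct = [single_thread_call(fill_rect, w, 10) for w in range(1, 6)]
    print(paired == direct, direct)  # True [10, 20, 30, 40, 50]
```

Both models produce the same results, which mirrors the point above: the paired threads add no parallelism, only handoff overhead and an extra thread stack. Timing the two loops with `time.perf_counter` would typically show the paired model as slower per call, though the exact ratio depends on the interpreter and operating system.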