If you decide to use the file system to access your file data, get a reasonable chunk of data at a time. If you are processing a file sequentially, get 4K or 8K at a time to reduce the number of calls you have to make to the file system. There is no point in crossing the boundary between user mode and privileged mode, and going through a slew of protection and security checks, more often than necessary. Of course, if you access small amounts of data at random locations, you are probably better off not reading or writing large numbers of bytes you don't need. In that case, try mapping the file into memory instead.
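As a rough illustration of the sequential case, the sketch below counts the lines in a file while reading a fixed-size chunk per call. It is written in portable C with the standard library rather than the Win32 ReadFile call, and the function name count_lines_chunked and its parameters are hypothetical, chosen only for this example. Library buffering is switched off so that the chunk size you pass is the size actually requested on each read; how that maps onto system calls depends on the C runtime.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: counts newline characters in a file, asking for
   chunk_size bytes per read. If calls is non-NULL, it receives the number
   of reads that returned data. Returns the line count, or -1 on error. */
long count_lines_chunked(const char *path, size_t chunk_size, long *calls)
{
    assert(chunk_size > 0);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Disable stdio's own buffer so each fread asks for chunk_size bytes. */
    setvbuf(f, NULL, _IONBF, 0);

    unsigned char *buf = malloc(chunk_size);
    if (buf == NULL) {
        fclose(f);
        return -1;
    }

    long lines = 0;
    long n_calls = 0;
    size_t got;
    while ((got = fread(buf, 1, chunk_size, f)) > 0) {
        n_calls++;
        for (size_t i = 0; i < got; i++) {
            if (buf[i] == '\n')
                lines++;
        }
    }

    int failed = ferror(f);
    free(buf);
    fclose(f);

    if (calls != NULL)
        *calls = n_calls;
    return failed ? -1 : lines;
}
```

With a 20,000-byte file, a chunk size of 8192 needs three reads, while a chunk size of 64 needs 313 to do the same work. That difference in trips through the file system is what the advice above is about.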
Think about using multiple threads to improve your performance on multiprocessor computers. Just because you have a desktop application does not mean you cannot take advantage of multiple threads. First, you can use multiple threads as a technique to get back to the user quickly when the user has requested a task that takes a little time. Second, the day is not far off when we will see multiple processors on the desktop.
If you are working on a server application, you certainly want to use multiple threads, because multiprocessor servers will soon be commonplace.
In MS-DOS systems, there was a limit on the number of files the system could have open at one time. This led to a coding style of opening and closing files frequently. Because of the additional protection and security in Windows NT, the action of opening a file uses more resources, and we don't encourage this coding style. Open files and leave them open for access. There is no fixed limit on how many can be open at one time. The only constraint is the size of nonpaged pool, which cannot be allowed to grow so large that it takes all of physical memory. But we are talking many thousands of files before this is a consideration. So don't be afraid to leave your files open.
In 16-bit Windows there was a distinction between memory obtained using LocalAlloc and memory obtained using GlobalAlloc. Windows NT supports both allocation calls to make porting to Windows NT easier, but for 32-bit applications they execute the identical underlying code. The memory allocated is local to your process, and will be freed by the system when your process dies. You cannot share it with another process; that's what CreateFileMapping is for. The one place where this is not true is when the memory is flagged as GMEM_DDESHARE, which Windows NT handles differently. Only applications using dynamic data exchange (DDE) or the clipboard will specify this flag. For 16-bit applications the calls appear to work as they did on 16-bit Windows, because these all execute in the NTVDM process.
If you're looking for the acme of performance on short bursts of activity, use the Real-Time Priority class. It's most useful for an application which is processing data in real time or doing time-sensitive communication with an external device. Your application must run in short bursts and not keep the processor for very long before waiting for the device to deliver more data. This is because you will be preempting all activity on the system, including the work of Windows NT system processes.
Another useful facility for development of real-time applications is the VirtualLock call. This permits you to identify a small number of pages to retain in memory so you will not have to wait for pages to come in from the disk when attempting to respond to a real-time device. You should implement a design that minimizes the amount of code that executes in the Real-Time Priority class with locked pages. You can use Event objects and shared named memory to exchange information with processes running at normal priority and thus minimize the real-time code.
One way to improve your performance when storing and retrieving data from the Configuration Registry is to use the new data type REG_MULTI_SZ. This data type permits you to store a set of data values under the name of a single value by concatenating the strings into a single "multistring." A multistring has multiple individual strings, each terminated by TEXT('\0'), with the last one followed by an additional TEXT('\0'). One call to the registry will retrieve all the strings. This is very efficient, especially if the value is accessed remotely. Performance Monitor counter names and Explain text are stored in two giant REG_MULTI_SZ multistrings. Performance Monitor retrieves them all with just two RPC calls to the remote registry during remote monitoring.
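To make the multistring layout concrete, here is a minimal sketch of walking one once it has been retrieved. It uses plain char strings in portable C so it can be tried anywhere; in a real application the buffer would come from a registry query and would normally hold TCHAR data. The function names multi_sz_count and multi_sz_get are hypothetical, not part of any Windows NT API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the number of strings in a multistring,
   that is, a sequence of NUL-terminated strings ended by an extra NUL. */
size_t multi_sz_count(const char *ms)
{
    size_t n = 0;
    while (*ms != '\0') {
        n++;
        ms += strlen(ms) + 1;   /* skip this string and its terminator */
    }
    return n;
}

/* Hypothetical helper: returns a pointer to the string at position index
   (counting from zero), or NULL if the multistring has fewer strings. */
const char *multi_sz_get(const char *ms, size_t index)
{
    while (*ms != '\0') {
        if (index == 0)
            return ms;
        index--;
        ms += strlen(ms) + 1;
    }
    return NULL;
}
```

For example, the literal "Processor\0Memory\0Disk\0" (the compiler supplies the final terminator) holds three strings, and multi_sz_get on it with index 1 returns "Memory". Note that the format cannot represent an empty string in the middle of the list, because an empty string looks exactly like the end marker.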
This touches on another point. Internally, Windows NT uses Unicode™. (Unicode is a 16-bit character-coding standard which includes symbols for all international languages.) When an application passes ASCII text strings to the system (to be stored in the Configuration Registry for example), they are translated to Unicode right off the bat. They must be translated in the reverse direction if the application is coded to deal with ASCII. So the obvious right thing to do, at least from a performance viewpoint, is to write the application to work with Unicode. This will avoid some unnecessary overhead and make the application easier to port to foreign languages, especially in the Far East. So if you want those trips to the Far East to work on the Asian versions of your application, use Unicode.
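To give a feel for the extra work involved, the sketch below widens a 7-bit ASCII string into 16-bit units. It is only an illustration of the kind of per-call conversion described above, not the routine Windows NT itself uses, and the name ascii_to_utf16 is hypothetical. It assumes pure 7-bit ASCII input, for which each character maps to the 16-bit value with the same number.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies a 7-bit ASCII string into a buffer of 16-bit
   units, adding a terminating zero. Returns the number of units written,
   not counting the terminator, or (size_t)-1 if the buffer is too small
   or the input contains a byte outside 7-bit ASCII. */
size_t ascii_to_utf16(const char *src, uint16_t *dst, size_t dst_units)
{
    size_t i = 0;
    for (; src[i] != '\0'; i++) {
        unsigned char c = (unsigned char)src[i];
        /* Need room for this unit plus the terminator that follows. */
        if (c > 0x7F || i + 1 >= dst_units)
            return (size_t)-1;
        dst[i] = (uint16_t)c;
    }
    if (dst_units == 0)
        return (size_t)-1;
    dst[i] = 0;
    return i;
}
```

Every string handed to the system in ASCII form costs a pass like this on the way in, and a matching pass on the way out. An application that keeps its strings in Unicode from the start skips both.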