INFO: Design Issues When Using IOCP in a Winsock Server
ID: Q192800
The information in this article applies to:
- Microsoft Win32 Software Development Kit (SDK), used with:
   - Microsoft Windows NT 4.0
   - Microsoft Windows 2000
SUMMARY
This article assumes you already understand the I/O model of the Windows NT
I/O Completion Port (IOCP) and are familiar with the related APIs. If you
want to learn IOCP, please see Advanced Windows (3rd edition) by Jeffrey
Richter, Chapter 15, "Device I/O," for an excellent discussion of IOCP
implementation and the APIs you need to use it.
An IOCP provides a model for developing very high performance and very
scalable server programs. Direct IOCP support was added to Winsock2 and is
fully implemented on the Windows NT platform. However, IOCP is the hardest
to understand and implement among all Windows NT I/O models. To help you
design a better socket server using IOCP, a number of tips are provided in
this article.
MORE INFORMATION
TIP 1: Use Winsock2 IOCP-capable functions, such as WSASend and WSARecv,
over Win32 file I/O functions, such as WriteFile and ReadFile.
Socket handles from Microsoft-based protocol providers are IFS handles, so
you can use Win32 file I/O calls with the handle. However, the interactions
between the provider and the file system involve many kernel/user mode
transitions, thread context switches, and parameter marshaling that result
in a significant performance penalty. You should use only Winsock2
IOCP-capable functions with IOCP.
The additional parameter marshaling and mode transitions in ReadFile and
WriteFile only occur if the provider does not have the XP1_IFS_HANDLES bit
set in dwServiceFlags1 of its WSAPROTOCOL_INFO structure.
NOTE: These providers have an unavoidable additional mode transition, even
in the case of WSASend and WSARecv, although ReadFile and WriteFile will
have more of them.
TIP 2: Choose the number of concurrent worker threads allowed and the total
number of worker threads to spawn.
The number of worker threads and the number of concurrent threads that the
IOCP uses are not the same thing. For example, you can decide to have a
maximum of 2 concurrent threads used by the IOCP and a pool of 10 worker
threads. You should have a pool of worker threads greater than or equal to
the number of concurrent threads used by the IOCP so that a worker thread
handling a dequeued completion packet can call one of the Win32 "wait"
functions without delaying the handling of other queued I/O packets.
If there are completion packets waiting to be dequeued, the system will
wake up another worker thread. Eventually, the first thread satisfies its
wait and can run again. When this happens, the number of threads that can
run is higher than the concurrency allowed on the IOCP (that is,
NumberOfConcurrentThreads). However, when the next worker thread calls
GetQueuedCompletionStatus and enters a wait state, the system does not wake
it up. In other words, the system tries to keep the number of running
worker threads at the concurrency you requested.
Typically, you only need one concurrent worker thread per CPU for IOCP. To
do this, pass 0 for NumberOfConcurrentThreads in the
CreateIoCompletionPort call when you first create the IOCP.
TIP 3: Associate a posted I/O operation with a dequeued completion packet.
GetQueuedCompletionStatus returns a completion key and a pointer to the
overlapped structure for the I/O when dequeuing a completion packet. You
should use these two values to return per-handle and per-I/O-operation
information, respectively. You can use your socket handle as the completion
key when you register the socket with the IOCP to provide per-handle
information. To provide per-I/O-operation information, "extend" the
overlapped structure to contain your application-specific I/O-state
information. Also, make sure you provide a unique overlapped structure for
each overlapped I/O. When an I/O completes, the same pointer to the
overlapped I/O structure is returned.
TIP 4: I/O completion packet queuing behavior.
The order in which I/O completion packets are queued in the IOCP is not
necessarily the same order in which the Winsock2 I/O calls were made.
Additionally, if a Winsock2 I/O call returns SUCCESS or IO_PENDING, it is
guaranteed that a completion packet will be queued to the IOCP when the I/O
completes, regardless of whether the socket handle is closed. After you
close a socket handle, future calls to WSASend, WSASendTo, WSARecv, or
WSARecvFrom will fail with a return code other than SUCCESS or IO_PENDING,
and a failed call does not generate a completion packet. The status of the
completion packet retrieved by GetQueuedCompletionStatus for I/O previously
posted could indicate a failure in this case.
If you delete the IOCP itself, no more I/O can be posted to the IOCP
because the IOCP handle itself is invalid. However, the system's underlying
IOCP kernel structures do not go away until all successfully posted I/Os
are completed.
TIP 5: IOCP cleanup.
The most important thing to remember when performing IOCP cleanup is the
same as when using overlapped I/O: do not free an overlapped structure if
the I/O for it has not yet completed. The HasOverlappedIoCompleted macro
allows you to detect if an I/O has completed from its overlapped structure.
There are typically two scenarios for shutting down a server. In the first
scenario, you do not care about the completion status of outstanding I/Os
and you just want to shut down as fast as you can. In the second scenario,
you want to shut down the server, but you do need to know the completion
status of each outstanding I/O.
In the first scenario, you can call PostQueuedCompletionStatus (N times,
where N is the number of worker threads) to post a special completion
packet that informs each worker thread to exit immediately. You can then
close all socket handles, free their associated overlapped structures, and
close the completion port. Again, make sure you use
HasOverlappedIoCompleted to check the completion status of an overlapped
structure before you free it. If a socket is closed, all outstanding I/O on
the socket completes quickly.
In the second scenario, you can delay exiting worker threads so that all
completion packets can be properly dequeued. You can start by closing all
socket handles and the IOCP. However, you need to maintain a count of the
number of outstanding I/Os so that your worker thread can know when it is
safe to exit the thread. The performance penalty of having a global I/O
counter protected with a critical section for an IOCP server is not as bad
as might be expected because the active worker thread does not switch out
if there are more completion packets waiting in the queue.
Additional query words:
IOCP overlapped
Keywords : kbnetwork kbAPI kbNTOS400 kbWinOS2000 kbSDKPlatform kbWinsock kbGrpNet
Version : WINDOWS:
Platform : WINDOWS
Issue type : kbinfo