File System Concepts

This section describes key file system concepts that are referred to throughout this document.

Canonicalized Paths

All pathnames that the IFS manager passes to the FSDs are in Unicode. A canonicalized pathname has a specific format that is described by the ParsedPath structure. It consists of a word giving the total length of the pathname (including itself, but not including the NUL character), a word giving the offset of the last path element relative to the start of the pathname, and a sequence of PathElement structures. Each PathElement structure comprises a word giving the length of the path element (including itself) and a string of Unicode characters that make up the name of the path element. The pathname is always absolute and describes a path from the root of the volume. In a conventional path string, by contrast, each path element is separated by a path separator such as '\' or '/'.

Passing canonicalized paths to the FSDs eliminates the need for each FSD to have its own parsing code. FSDs always receive syntactically validated paths, and the canonicalized structure makes it easy to traverse the path to perform operations. In addition to the canonicalized path, the IFS manager also passes in flags that give more information about the kind of pathname, such as whether it contains wildcards or long name components. These are described below.
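As an illustration of why the canonicalized layout is easy to traverse, the following is a minimal sketch that walks such a path when it is viewed as an array of 16-bit words. It assumes that lengths and offsets are counted in bytes and that the total length covers both header words; the function names are hypothetical and are not part of the IFS interface. The real definitions of ParsedPath and PathElement should be taken from the DDK headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (sketch only, lengths in bytes):
 *   word 0      total length of the path, including the header words
 *   word 1      offset of the last path element from the start of the path
 *   word 2...   path elements, each one a length word (including itself)
 *               followed by its Unicode characters (no separators, no NUL)
 */

/* Number of Unicode characters in the element whose length word is at 'elem'. */
size_t path_element_chars(const uint16_t *elem)
{
    return (size_t)(elem[0] - sizeof(uint16_t)) / sizeof(uint16_t);
}

/* Counts the path elements; a path with no elements is the root. */
size_t count_path_elements(const uint16_t *path)
{
    size_t total_words = path[0] / sizeof(uint16_t);
    size_t pos = 2;                     /* skip the two header words */
    size_t count = 0;

    while (pos < total_words) {
        size_t elem_words = path[pos] / sizeof(uint16_t);
        if (elem_words == 0)
            break;                      /* guard against a malformed length */
        pos += elem_words;
        count++;
    }
    return count;
}

/* Returns the length word of the last element, using the offset in word 1. */
const uint16_t *last_path_element(const uint16_t *path)
{
    return path + path[1] / sizeof(uint16_t);
}
```

Under these assumptions, the path \FOO\BAR.TXT occupies 28 bytes: 4 bytes of header, an 8-byte element for FOO and a 16-byte element for BAR.TXT, with the last element at offset 12.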

FILE_FLAG_LONG_PATH	This pathname has long path components. If only this flag is set, the last path component is not a longname. On short name apis, it is permissible for the path components to have long names. This is because an app could issue a shortname call in a long current directory and we want this operation to work. If the app tries to change to a long name directory or create a long name using a short api, the IFS manager will truncate that name as per DOS rules. FSDs can have different searching semantics based on whether the pathname has long components or not and this flag can be used to trigger this behaviour.
FILE_FLAG_IS_LFN	The final element of this pathname is a long name. FSDs can use this knowledge to do things differently. It may be necessary to store long names in a different format than short names. This flag can be used to trigger off such behaviour without the FSD needing to scan the entire pathname to figure out if it is trying to use a long name final element. Also, the fact that the final path element is a long name implies that the FSD should use long name semantics for matching.
FILE_FLAG_WILDCARDS	The pathname passed in contains wildcards. Wildcards are valid only in the final path component and a path with wildcards elsewhere on the path will be errored out by the IFS manager and not passed down to FSDs. The wildcards could be either '?' characters or '*' characters. On the short name apis, only the '?' character is valid as a wildcard. For long name apis, both wildcard characters are valid. FSDs can use this flag to figure out whether they need to do wildcard matching during their filename search.
FILE_FLAG_HAS_STAR	The pathname passed in contains the '*' character as a wildcard. FILE_FLAG_WILDCARDS is always set whenever this flag is set. This flag is given to provide additional information to FSDs about how to perform meta-matching. Meta-matching semantics are much simpler for the '?' character than for the '*' character and FSDs can use this fact to optimize meta-matching. However, it is recommended that all FSDs use the IFSMgr_MetaMatch service for doing meta-matching. This ensures a consistent interface to the user so that all filesystems behave identically with respect to wildcard operations.
FILE_FLAG_HAS_DOT	The pathname passed in contains a '.' character. The IFS manager strips off trailing dots. Leading dots are preserved if there are other characters in the path element that are not dots. There is only one exception when the IFS manager allows a trailing dot. This is when there are wildcard characters present in the path component. This is to allow users to use matching semantics such as '*.', which DOS allows even though it does not strictly conform to the true regular expression matching semantics that the long name matching semantics use. Even in this case, multiple trailing dots are stripped by the IFS manager, leaving only a single trailing dot.
FILE_FLAG_KEEP_CASE	The FSD should preserve the case of the name passed in when it stores it on disk. If this flag is not set, the FSD can ignore the case of the name. This flag is set for all LFN apis and cleared for shortname apis. This flag is also overloaded with another meaning. If this flag is set, then the FSDs should use LFN semantics. Otherwise, they should use short name semantics.

To determine whether LFN or shortname semantics need to be used on a given api, FSDs should look at FILE_FLAG_KEEP_CASE and FILE_FLAG_IS_LFN. If either or both of these flags are set, then the FSD should use LFN semantics. The FSD should not hardcode the usage of LFN semantics just because it is handling a new LFN-based api, such as the LFN-style FindFirst. There are certain cases, especially for requests that come through the server, in which an LFN-style api needs short name semantics. This is true for the LFN-style FindFirst when it is issued by the server. The server does this because it may have a longname directory shared, which an application at the client end need not be aware of, and the application can get to files in it by using shortname apis.
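The rule above can be sketched as a single flag test. The numeric flag values below are placeholders chosen only so that the example compiles on its own; the real values must come from the DDK header. The function name use_lfn_semantics is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for illustration; use the DDK definitions in real code. */
#define FILE_FLAG_KEEP_CASE 0x10000000u
#define FILE_FLAG_IS_LFN    0x04000000u

/* LFN semantics apply if either flag is set, regardless of which api
 * (shortname or LFN-style) delivered the request. */
bool use_lfn_semantics(uint32_t path_flags)
{
    return (path_flags & (FILE_FLAG_KEEP_CASE | FILE_FLAG_IS_LFN)) != 0;
}
```

Keying the decision on the flags rather than on the api being serviced is what lets an LFN-style FindFirst issued by the server fall back to short name semantics.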

Meta Character Matching Semantics

There are two kinds of meta-character matching semantics in operation: one for the shortname apis and one for the longname apis. Both of these are described below.

The meta-character matching on shortname apis is very simple. No '*' characters are allowed; '?' is the only wildcard character allowed. Any '*' characters passed in by the user are converted to '?' characters by the IFS manager before the name is passed to the FSDs. The names are also fixed length, i.e. the standard 8.3 name format. The algorithm used is the DOS algorithm, which just superimposes two 8.3 strings, with a '?' character standing for any character.
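The superimposing step might look like the sketch below. It assumes that both names have already been expanded to an 11-character, space-padded 8.3 field (8 name characters followed by 3 extension characters, no dot) and uses plain char instead of Unicode for brevity. It illustrates the idea only; FSDs should still call IFSMgr_MetaMatch.

```c
#include <stdbool.h>
#include <stddef.h>

#define SHORT_NAME_LEN 11   /* 8 name characters + 3 extension characters */

/* Compares the two fixed-length fields position by position.
 * A '?' in the pattern stands for any character, including padding. */
bool short_name_match(const char *pattern, const char *name)
{
    for (size_t i = 0; i < SHORT_NAME_LEN; i++) {
        if (pattern[i] != '?' && pattern[i] != name[i])
            return false;
    }
    return true;
}
```

With this layout, a user pattern of *.TXT would reach the matcher as "????????TXT" after the IFS manager has converted the '*' to '?' characters.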

The meta-character matching on longname apis is more complex. It provides regular expression matching. The '*' character stands for 0 or more characters, and the '?' character stands for exactly 1 character. The dot is not a special character on the longname apis; it is just another character that is part of the name. Users can thus do proper regular expression matching such as "a*.b*.c*". There are some special cases where true regular expression matching has been abandoned in favour of compatibility. For example, '*.*' strictly speaking means all filenames that have a dot in them. However, because DOS users use this to mean all files, the longname matching semantics have been changed to treat '*.*' as equivalent to '*' and match all files. '*.' is another exception to the rule. In DOS, '*.' means all files without an extension, which violates regular expression matching. However, this has been implemented for compatibility reasons.
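The longname rules can be illustrated with a small matcher. This is a sketch, not the IFSMgr_MetaMatch implementation: it works on plain char strings, compares case-sensitively, and applies the two compatibility cases only when the whole pattern is exactly '*.*' or '*.'. How the real service treats case and other pattern shapes is not covered here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Plain wildcard matching: '*' is zero or more characters, '?' is exactly
 * one character, and '.' is an ordinary character. */
static bool wildcard_match(const char *pat, const char *name)
{
    const char *star = NULL;    /* most recent '*' seen in the pattern */
    const char *retry = NULL;   /* where in the name that '*' started  */

    while (*name != '\0') {
        if (*pat == '*') {
            star = pat++;       /* first try matching zero characters  */
            retry = name;
        } else if (*pat == '?' || *pat == *name) {
            pat++;
            name++;
        } else if (star != NULL) {
            pat = star + 1;     /* backtrack: let '*' absorb one more  */
            name = ++retry;
        } else {
            return false;
        }
    }
    while (*pat == '*')
        pat++;                  /* trailing stars match the empty rest */
    return *pat == '\0';
}

/* Adds the two DOS compatibility cases described in the text. */
bool long_name_match(const char *pat, const char *name)
{
    if (strcmp(pat, "*.*") == 0)
        return true;                        /* treated like "*": all files */
    if (strcmp(pat, "*.") == 0)
        return strchr(name, '.') == NULL;   /* names without an extension */
    return wildcard_match(pat, name);
}
```

For instance, "a*.b*.c*" matches "alpha.beta.c", while "*." matches "README" but not "a.txt".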

FSDs should use the IFSMgr_MetaMatch services to do all meta-character matching. This service provides both shortname and longname matching semantics and also encapsulates all the special cases that have been discussed above, so that FSDs do not need to take care of them. This also ensures that if we need to take care of more special cases in future, FSDs will not need to change, all changes can be made in the IFS manager and all FSDs will automatically work.

Must Match Attributes

Certain new apis take must match attributes as parameters. Must match attributes are different from the normal search attributes that are passed in. The attribute parameter on these apis consists of two sets of attributes: a search attribute, and another attribute called the must match attribute, which is an additional filter on the attributes retrieved from the media. The basic formula for this is as follows:

(((MustMatchAttr & MediaAttr) ^ MustMatchAttr) & FILE_ATTRIBUTE_MUSTMATCH) == 0
where FILE_ATTRIBUTE_MUSTMATCH = 0x3F.

This formula matches only those entries whose media attributes have every bit of MustMatchAttr set (within the FILE_ATTRIBUTE_MUSTMATCH mask). For example, if the user passes in attributes of 0x1016, 0x16 is the search attribute and 0x10 is the must match attribute. The search operation thus finds all directories and files. However, the must match attribute filters this further, so only directories are considered matches and are returned on the api. Notice that attribute bits are matched, not the entire attribute value, i.e. an entry with the attribute 0x14 (a system directory) will also be matched when the attributes passed in are 0x1016. This also means that there is no way to specify finding files/directories without a certain attribute set; you can only match for entries with a certain attribute set.
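The formula and the 0x1016 example can be checked with a short sketch. The helper names are hypothetical; the split of the combined value into a low-byte search attribute and a high-byte must match attribute follows the example in the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define FILE_ATTRIBUTE_MUSTMATCH 0x3Fu

/* Low byte of the combined value: the search attribute. */
uint8_t search_attr_of(uint16_t attrs)
{
    return (uint8_t)(attrs & 0xFFu);
}

/* High byte of the combined value: the must match attribute. */
uint8_t must_match_attr_of(uint16_t attrs)
{
    return (uint8_t)(attrs >> 8);
}

/* Direct transcription of the formula. (M & A) ^ M leaves exactly the
 * must match bits that are missing from the media attributes, so the
 * result is zero only when none are missing. */
bool passes_must_match(uint8_t must_match_attr, uint8_t media_attr)
{
    return (((must_match_attr & media_attr) ^ must_match_attr)
            & FILE_ATTRIBUTE_MUSTMATCH) == 0;
}
```

With attributes of 0x1016, the must match attribute is 0x10, so entries with media attributes 0x10 or 0x14 pass the filter, while a plain file with attribute 0x20 does not.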

Swap File Handling

FSDs that handle media on which the swap file for Chicago resides need to treat it specially. The swap file is different from normal files and needs special handling for the system to keep working. The special considerations needed for the swap file are listed below:

1 All data structures that are used for the swap file need to be locked down. All codepaths that can be hit during swap file io should also be locked down. In other words, no paging can occur while an FSD is processing reads or writes to the swapfile.

2 It is advisable not to cache the swap file data, though this is not a must. This is because the swapfile data would just crowd out other, more relevant data in the cache. The important thing to keep in mind is that no memory allocations can be made while doing swap file io, nor can any paging happen. This severely restricts any cache: it cannot be dynamic and it has to be locked down. For these reasons, it makes much more sense to write the swap file data directly to the disk without going through a cache. The IFS manager provides a special flag called R0_SWAPPER_CALL on FS_OpenFile and FS_ReadWrite to inform FSDs that this is swap file io. In addition, special flags are passed in to prevent read-aheads and write-behinds on the swap file. These are all described in section 8.5 under the respective functions.

3 The memory manager does swap file io only at 4K boundaries and in multiples of 4K. Since swap file transfers are guaranteed to be aligned, FSDs can optimize the swap file codepath without having to take care of any partial transfers etc. In addition, the transfer addresses that the swap file passes down are all locked, so the FSDs can optimize further by not trying to lock the user pages in case of the swap file.

4 The FSD should be reentrant with respect to io on the swap file. For example, the FSD may take semaphores around code that allocates clusters to prevent any other writers from coming in. However, reads and writes to the swap file need to be permitted at all times. This is also safe because such writes to the swap file will not grow the file. Besides, only the memory manager can access the swap file; no one else in the system is allowed to access it.

5 There is one catch to (4) above. The memory manager can issue a call to grow or shrink the swap file. Obviously, the restrictions on memory allocation and paging do not apply in this case. The memory manager guarantees that size changes of the swap file will occur at a time when it is safe to do so and this operation can block. The other thing that the FSD needs to be aware of in this case is that, while the swap file is being grown or shrunk, there can be another write to the swap file. However, this io is guaranteed to be in the region within the old size of the swap file and not in the region that is being changed. Given this condition, it is fine to be reentered.
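As a rough sketch of how an FSD's read/write entry point might separate swap file io from normal io, the fragment below selects a code path from the R0_SWAPPER_CALL flag and checks the 4K guarantee from item (3). The flag value is a placeholder and the function and type names are hypothetical; the actual FS_ReadWrite parameters are described in section 8.5.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder value for illustration; use the DDK definition in real code. */
#define R0_SWAPPER_CALL 0x1000u
#define SWAP_PAGE_SIZE  4096u

typedef enum { IO_PATH_NORMAL, IO_PATH_SWAP } IoPath;

/* The swap path must be locked down and must not allocate memory, page,
 * or wait on locks held by ordinary writers. */
IoPath select_io_path(uint32_t options)
{
    return (options & R0_SWAPPER_CALL) ? IO_PATH_SWAP : IO_PATH_NORMAL;
}

/* Swap transfers start on a 4K boundary and cover whole 4K pages, so the
 * swap path never has to deal with partial transfers. */
bool is_page_aligned_transfer(uint32_t offset, uint32_t length)
{
    return length != 0
        && offset % SWAP_PAGE_SIZE == 0
        && length % SWAP_PAGE_SIZE == 0;
}
```

A debug build could assert is_page_aligned_transfer on the swap path to catch any violation of the memory manager's guarantee, while the retail path simply relies on it.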

Memory Mapped File Handling

Memory mapped files in Chicago also need special handling by FSDs. A memory mapped file is basically an extension of the swap file. The memory mapped file now provides virtual memory for the system. This means that most of the restrictions that apply to the swap file also apply to memory mapped files. Thus, conditions (1), (3) and (4) listed in Swap File Handling for swap files apply directly to memory mapped files.

There are a few important differences, however:

1 Unlike condition (2), it is fine to cache data for memory mapped files. In fact, it would be preferable to cache them. While this may sound contradictory, many memory mapped files get loaded again and again by different processes, in which case it makes sense to cache the file even though it is eventually going to become memory mapped.

2 A memory mapped file cannot grow or shrink. Once a memory mapping is created to a file, the size of the mapping cannot change. It is conceivable that another process could open the same file and change its size, but this would cause an error during the memory mapping and, in itself, cannot affect system integrity.

3 A file is not opened memory mapped. Instead, it is opened as a normal file and then a memory mapping is created for it. This means that the FSD needs to transform the status of a file from normal to a special memory mapped file. For this purpose, the IFS manager calls the FSD at the time the memory mapping is created so that the FSD can mark the file as memory mapped and lock everything down (if not already locked). The IFS manager signals this by calling the FSD on a zero-length read with a special flag, R0_MM_READ_WRITE. This interface is described in section 8.5.23.
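The transition described in item (3) might be handled along the lines of the sketch below. The FileState structure and the function name are hypothetical, the flag value is a placeholder, and setting locked_down merely stands in for locking the relevant data structures and code paths.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder value for illustration; use the DDK definition in real code. */
#define R0_MM_READ_WRITE 0x8000u

/* Hypothetical per-file state kept by an FSD. */
typedef struct {
    bool memory_mapped;   /* file has had a memory mapping created */
    bool locked_down;     /* structures and code paths are locked  */
} FileState;

/* Returns true if the read request was the mapping notification (a
 * zero-length read carrying R0_MM_READ_WRITE) and has been consumed. */
bool handle_mapping_notification(FileState *file, uint32_t length,
                                 uint32_t options)
{
    if ((options & R0_MM_READ_WRITE) == 0 || length != 0)
        return false;     /* not the notification; process it as usual */

    file->memory_mapped = true;
    if (!file->locked_down)
        file->locked_down = true;   /* lock everything down once */
    return true;
}
```

A read entry point would call this first and fall through to its regular transfer code whenever it returns false.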