5.  Support facilities and other interactions

      Several support facilities are used by the current UNIX filesystem and require generalization for use by other filesystem types. For filesystem implementations to be portable, these modified support facilities should also have a uniform interface and behave consistently across target systems. A prominent example is the filesystem buffer cache. The buffer cache in a standard (System V or 4.3BSD) UNIX system contains physical disk blocks with no reference to the files containing them. This works well for the local filesystem, but has obvious problems for remote filesystems. Sun has modified the buffer cache routines to describe buffers by vnode rather than by device. For remote files, the vnode used is that of the file, and the block numbers are virtual data blocks. For local filesystems, a vnode for the block device is used for cache reference, and the block numbers are filesystem physical blocks. Use of per-file cache description does not easily accommodate caching of indirect blocks, inode blocks, superblocks or cylinder group blocks. However, the vnode describing the block device for the cache is one created internally, rather than the vnode for the device looked up when mounting, and it is located by searching a private list of vnodes rather than by holding it in the mount structure. Although the Sun modification makes it possible to use the buffer cache for data blocks of remote files, a better generalization of the buffer cache is needed.
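The change can be illustrated with a minimal sketch: the cache is hashed on the (vnode, block number) pair rather than on (device, physical block). The structure layouts and routine names below (incore, binshash) are loosely modeled on the 4.3BSD buffer cache but are illustrative only, not actual kernel source.

```c
#include <stddef.h>

#define BUFHASH_SIZE 64

/* toy vnode: in a kernel this would carry the full per-file state */
struct vnode {
    int v_flag;
};

struct buf {
    struct vnode *b_vp;        /* vnode of the file (or block device) */
    long          b_blkno;     /* logical (or physical) block number */
    struct buf   *b_hash;      /* hash-chain link */
    char         *b_data;      /* cached block contents */
};

static struct buf *bufhash[BUFHASH_SIZE];

/* mix the vnode pointer and block number into a bucket index */
static unsigned
bhash(struct vnode *vp, long blkno)
{
    return ((unsigned)(size_t)vp ^ (unsigned)blkno) % BUFHASH_SIZE;
}

/* Enter a buffer into the cache. */
void
binshash(struct buf *bp)
{
    unsigned h = bhash(bp->b_vp, bp->b_blkno);

    bp->b_hash = bufhash[h];
    bufhash[h] = bp;
}

/* Look up a cached block by (vnode, block); NULL on a miss. */
struct buf *
incore(struct vnode *vp, long blkno)
{
    struct buf *bp;

    for (bp = bufhash[bhash(vp, blkno)]; bp != NULL; bp = bp->b_hash)
        if (bp->b_vp == vp && bp->b_blkno == blkno)
            return bp;
    return NULL;
}
```

Because the key is the file's own vnode, a remote file's blocks can be cached without any notion of a device or physical block number; the cost, as noted above, is that blocks with no owning file (indirect blocks, superblocks) must be hung off an internally created block-device vnode.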

      The RFS filesystem used by AT&T does not currently cache data blocks on client systems; thus, the buffer cache is probably unmodified. The form of the buffer cache in ULTRIX is unknown to us.

      Another subsystem that has a large interaction with the filesystem is the virtual memory system. The virtual memory system must read data from the filesystem to satisfy fill-on-demand page faults. For efficiency, this read call is arranged to place the data directly into the physical pages assigned to the process (a ``raw'' read) to avoid copying the data. Although the read operation normally bypasses the filesystem buffer cache, consistency must be maintained by checking the buffer cache and copying or flushing modified data not yet stored on disk. The 4.2BSD virtual memory system, like that of Sun and ULTRIX, maintains its own cache of reusable text pages. This creates additional complications. As the virtual memory systems are redesigned, these problems should be resolved by reading through the buffer cache, then mapping the cached data into the user address space. If either the buffer cache copy or the process pages were changed while the other reference remained, the data would have to be copied (``copy-on-write'').
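The consistency check described above can be sketched as follows: before a raw read bypasses the cache, any delayed-write buffer covering the requested blocks must be written back so the read does not pick up stale on-disk data. The structures and the bwrite stand-in are hypothetical simplifications for illustration, not kernel source.

```c
#include <stddef.h>

#define B_DELWRI 0x1               /* buffer holds modified data not yet on disk */

struct buf {
    long        b_blkno;           /* physical block number */
    int         b_flags;
    struct buf *b_next;            /* link in the cache's buffer list */
};

static int writes_started;         /* counts flushes, for illustration */

/* stand-in for a real write-back: here it just marks the buffer clean */
static void
bwrite(struct buf *bp)
{
    bp->b_flags &= ~B_DELWRI;
    writes_started++;
}

/*
 * Before reading blocks [first, last] directly into process pages,
 * flush any delayed-write buffers in that range so the raw read
 * sees up-to-date data.
 */
void
raw_read_check(struct buf *cache, long first, long last)
{
    struct buf *bp;

    for (bp = cache; bp != NULL; bp = bp->b_next)
        if ((bp->b_flags & B_DELWRI) &&
            bp->b_blkno >= first && bp->b_blkno <= last)
            bwrite(bp);
}
```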

      In the meantime, the current virtual memory systems must be used with the new filesystem framework. Both the Sun and AT&T filesystem interfaces provide entry points to the filesystem for optimization of the virtual memory system by performing logical-to-physical block number translation when setting up a fill-on-demand image for a process. The VFS provides a vnode operation analogous to the bmap function of the UNIX filesystem. Given a vnode and logical block number, it returns a vnode and block number which may be read to obtain the data. If the filesystem is local, it returns the private vnode for the block device and the physical block number. As the bmap operations are all performed at one time, during process startup, any indirect blocks for the file will remain in the cache once they have been read. In addition, the interface provides a strategy entry that may be used for ``raw'' reads from a filesystem device, reading data blocks into an address space without copying. This entry uses a buffer header (buf structure) to describe the I/O operation instead of a uio structure. The buffer-style interface is the same as that used by disk drivers internally. This difference allows the current uio primitives to be avoided, as they copy all data to/from the current user process address space. Instead, for local filesystems these operations could be done internally with the standard raw disk read routines, which use a uio interface to describe the target address space but transfer the data without copying. When loading from a remote filesystem, the data will be received in a network buffer. If network buffers are suitably aligned, the data may be mapped into the process address space by a page swap without copying. In either case, it should be possible to use the standard filesystem read entry from the virtual memory system.
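The translation that a bmap-style operation performs can be sketched for the direct and single-indirect cases. The toy inode layout and block counts below are illustrative assumptions (a real Berkeley inode has 12 direct pointers and three levels of indirection), and read_indirect stands in for a cached filesystem block read.

```c
#define NDADDR 12                  /* direct block pointers per inode */
#define NINDIR 4                   /* pointers per indirect block (toy size) */

/* toy on-disk inode: direct pointers plus one single-indirect pointer */
struct dinode {
    long di_db[NDADDR];            /* physical blocks of the first NDADDR blocks */
    long di_ib;                    /* physical block of the indirect block */
};

/* toy "disk": the one indirect block's contents */
static long indir_block[NINDIR] = { 100, 101, 102, 103 };

/* stand-in for reading a filesystem block through the cache */
static long *
read_indirect(long blkno)
{
    (void)blkno;                   /* a real kernel would fetch this block */
    return indir_block;
}

/*
 * Translate a logical file block to a physical filesystem block,
 * as a bmap routine does; returns -1 beyond the single-indirect range.
 */
long
bmap(struct dinode *ip, long lblkno)
{
    if (lblkno < NDADDR)
        return ip->di_db[lblkno];
    lblkno -= NDADDR;
    if (lblkno >= NINDIR)
        return -1;                 /* double indirection: not handled here */
    return read_indirect(ip->di_ib)[lblkno];
}
```

Because the indirect block is fetched through the cache, repeating the translation for every block of a file at startup touches the indirect block once on disk and thereafter finds it cached, as the text notes.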

      Other issues that must be considered in devising a portable filesystem implementation include kernel memory allocation; the implicit use of user-structure global context, which may create problems with reentrancy; the style of the system call interface; and the conventions for synchronization (sleep/wakeup, handling of interrupted system calls, semaphores).