From owner-freebsd-hackers Mon Feb 5 00:24:36 1996 Return-Path: owner-hackers Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id AAA11045 for hackers-outgoing; Mon, 5 Feb 1996 00:24:36 -0800 (PST) Received: from labinfo.iet.unipi.it (labinfo.iet.unipi.it [131.114.9.5]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id AAA11033 for ; Mon, 5 Feb 1996 00:24:13 -0800 (PST) Received: from localhost (luigi@localhost) by labinfo.iet.unipi.it (8.6.5/8.6.5) id JAA20270 for hackers@freebsd.org; Mon, 5 Feb 1996 09:24:40 +0100 From: Luigi Rizzo Message-Id: <199602050824.JAA20270@labinfo.iet.unipi.it> Subject: Some thoughts on FAT filesystems To: hackers@freebsd.org Date: Mon, 5 Feb 1996 09:24:40 +0100 (MET) X-Mailer: ELM [version 2.4 PL23] Content-Type: text Sender: owner-hackers@freebsd.org Precedence: bulk Since some people is now looking at the msdosfs, it is nice that the discussion on FAT has progressed a little bit. I'd like to contribute some more thoughts. In the following I assume that there are no concurrent accesses to the FAT partition. I believe that the performance of msdosfs can benefit from some caching of the FAT area. The FAT16 is by definition <= 128KB in size, which is not much if we want to keep it in memory. Avoiding synchronous writes to the FAT can also give some advantages, at the price of the ease of recovery in case of power failures. One performance problem with FAT filesystems is that the copies of the FAT all reside at the beginning of the partition, thus potentially far away from the data. Updating a file, and then the FAT, might require long seeks. FFS solves this problem by defining cyclinder groups, so that inodes and data blocks are not too far away. For a FAT filesystem, the only viable solution to this is to avoid synchronous writes to the FAT. This poses a recovery problem which can be solved in various ways, essentially by adding redundant information to the filesystem. In designing a high performance FAT one might try to use the same layout policies of the FFS, i.e divide the FAT partition into a number of logical "cylinder groups" (CG). Clusters in the CG are physically contiguous, and even the corresponding entries in the FAT are contiguous. Assuming a cluster size of 4K (which is nice because it has the same size as a page), then a 4K portion of the FAT suffices for 8MB of data. For efficiency, when the FS loads this FAT portion in memory (or at the first write) it could also build a "free list" of clusters in the CG, so as to speed up allocations. Also for efficiency, writes to the FAT should not be done too frequently, because of their cost. When creating new files, one should privilege locality within the CG, much like it is done in the FFS. Of course, when the CG fills up, or the disk is heavily fragmented (read: has been written by DOS) one might have to use blocks in different CGs. CACHING and WRITES: The caching policy might not be much different from that of FFS: try to keep in memory the FAT copies first, do not immediately go to the disk for data blocks as they might have a short lifetime, etc. RECOVERY: using delayed writes to the FAT might lead to bad inconsistencies to the filesystem in case of power failures. One possible approach could be the following: 1) create a "special" file in the root directory, with fixed size and structure. This file uses exactly one cluster per CG, say at the beginning of the CG, where the portion of the FAT for that CG is saved. Writing to this cluster is cheaper than writing to the FAT because of physical contiguity. Also, there is no risk of corruption of this file because its structure does not change, hence modifications to its content do not require changing the FAT or the directory entry (unless one wants to update the modification times; but this is not very important). 2) When the FS is updated, modifications to the FAT are written to this local FAT first, and then, at a convenient time, dumped to the ordinary FAT. 3) Recovery after a crash: first check for the existence of the "special" file, then look for inconsistencies between the regular FAT and this special file, and try to reconstruct things. Not many new ideas, as you can see, just trying to see if known technology can be reused. Luigi