Date: Sat, 13 Dec 1997 19:18:36 +1030
From: Mike Smith <mike@smith.net.au>
To: bgingery@gtcs.com
Cc: hackers@FreeBSD.ORG
Subject: Re: blocksize on devfs entries (and related)
Message-ID: <199712130848.TAA01888@word.smith.net.au>
In-Reply-To: Your message of "Tue, 09 Dec 1997 15:09:42 PDT." <199712092209.PAA07923@home.gtcs.com>
I haven't noticed any commentary on this, Brian, so I thought I should
raise a few points that you appear to have missed.

> Theoretically, the physical layout of the device should be stored
> whether or not there's any filesystem on it.

This is a fundamentally flawed approach, and I am glad that Julian's
new SLICE model (at this stage) completely ignores any incidental
parametric information associated with an extent.

> To me some answers to these ...
>
> 1. physical block/sector size needs to be stored by DEVICE
>    this may or may not match the logical blocksize of any
>    filesystem resident on the device.  Optimal transfer blocksize
>    for each of read and write ALSO need to be stored.

Physical blocksize vs. logical blocksize is a problematic issue.  On
one hand, there is the desire to maintain simplicity by mandating a
single blocksize across all boundaries and forcing translation at the
lowest practical level.  The downside of this is dealing with legal
logical block operations that turn into partial-block operations at
the lowest level.

One approach with plenty of historical precedent is to use a blocksize
"sufficiently large" that it is a multiple of the likely device
blocksizes, and make that the 'uniform standard'.  Another is to
cascade blocksizes upwards, where the blocksize at a given point in
the tree is the lowest common multiple of those of all points below.
This obviously requires some extra smarts in each layer that consumes
multiple lower layers.

> 2. physical layout (sect/track, tracks/cyl) also needs to
>    be stored for any DASD.  Also any OTHER known info which
>    may be used to optimize the filesystem building process for
>    the device, such as rotational speed, seek timing ..  If
>    this is not stored with driver info in the devfs, then
>    some pointer or common reference point should be made to
>    the "file entry" that contains the info.

Physical layout is a joke, and has been for many years.  This
suggestion costs you a lot of credibility.
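For concreteness, the blocksize-cascading idea described above can be
sketched in a few lines of C.  This is purely illustrative: none of
these names come from the SLICE code or any real kernel interface.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: cascading blocksizes upward through a layered
 * extent tree.  A layer that consumes several lower layers advertises
 * the lowest common multiple (LCM) of its children's blocksizes, so
 * that any legal operation on the upper layer maps to whole-block
 * operations on every lower layer.
 */

static unsigned gcd(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

static unsigned lcm(unsigned a, unsigned b)
{
    return a / gcd(a, b) * b;
}

/* Blocksize a consuming layer should advertise, given its children. */
unsigned cascade_blocksize(const unsigned *child_bsize, size_t nchildren)
{
    unsigned bs = 1;

    for (size_t i = 0; i < nchildren; i++)
        bs = lcm(bs, child_bsize[i]);
    return bs;
}
```

So a concatenation of a 512-byte-sector device and a 2048-byte-sector
device would advertise 2048, while 512 and 768 would cascade to 1536 --
which is exactly where the "extra smarts" in each layer come in.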
Qualitative parametric information may be useful, e.g. "this disk is
slow", presuming that a set of usefully general metrics can be
established.  Unfortunately, obtaining measurements like these can be
slow, and the results are often nondeterministic.

> 3. If at the controller level it is possible to concatenate
>    or RAID join devices, that information needs to be stored
>    for the device.  If this is intrinsic to the device driver
>    or the physical device - no matter.

This is not useful.  An upper layer should not care whether the extent
it is consuming is a concatenation of extents.  This is an issue for
management tools, which should have an out-of-band technique for
recovering structure information.

> 6. When a device is opened ro, if the underlying hardware has
>    ANY indication that it's a ro open, then if it is later upgraded
>    there should at least be a hook for it to be notified that it
>    has been upgraded.  Current state (ro/rw) should be available
>    to user processes without "testing it by opening a write file"
>    to a filesystem (or even raw device).

The RO->RW upgrade notification is a contentious issue, but one that
definitely needs thinking through.  How would you suggest it be
handled?  Should the standard be to reopen the device, to pass a
special ioctl, or to add a new device entrypoint?

> Other thoughts.  Especially WRT possible experimental work, and
> emulators, it will be QUITE convenient to have everything that can
> be used to optimize the construction of a filesystem (of any of many
> many kinds) or slice-out and construct a filesystem.  As wine, dosemu
> and bochs (to just name three) expand the emulations supporting other
> OSs, being free with filesystems for those OSs, other than purely
> "native" becomes all the more important.

I can't actually parse this; I'm not sure if you're actually trying to
say anything at all.
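To make the RO->RW question above concrete, here is one hypothetical
shape the "new device entrypoint" answer could take: the device keeps
its current mode, user processes can query it directly instead of
probing with a write open, and consumers register a hook that fires on
upgrade.  Every name here is invented for illustration; nothing below
is an existing FreeBSD interface.

```c
#include <stddef.h>

/* Illustrative only -- not a real FreeBSD device interface. */

enum dev_mode { DEV_RO, DEV_RW };

typedef void (*upgrade_hook)(void *arg);

struct hyp_device {
    enum dev_mode mode;
    upgrade_hook  hook;      /* notified on RO->RW transition */
    void         *hook_arg;
};

/*
 * Current state, queryable without "testing it by opening a
 * write file".
 */
enum dev_mode dev_get_mode(const struct hyp_device *d)
{
    return d->mode;
}

void dev_set_upgrade_hook(struct hyp_device *d, upgrade_hook h, void *arg)
{
    d->hook = h;
    d->hook_arg = arg;
}

/* Returns 0 on success, -1 if the device is already writable. */
int dev_upgrade_rw(struct hyp_device *d)
{
    if (d->mode == DEV_RW)
        return -1;
    d->mode = DEV_RW;
    if (d->hook != NULL)
        d->hook(d->hook_arg);
    return 0;
}
```

The same state machine could equally sit behind a reopen or a special
ioctl; the callback is just the variant that lets an already-open
consumer find out without polling.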
> SoftPC/SoftWindows and Bochs both create internally what amounts to a
> FAT filesystem within a file - a vnode filesystem, but not using
> system provisions for it.  That pretty well eliminates "device" access
> to the filesystem and (e.g.) doing a mount_msdos on 'em for other
> processing and data exchange, without adapting the emulator's code
> to *parallel* what we've already got with FreeBSD.

Incorrect.  It is relatively straightforward to create a vnode disk,
slice it, build a FAT filesystem in one slice, and then pass that
slice to your favorite PC emulator.

> Yet, why deny these the optimization information which will allow
> them to map (within the constraints of their architecture) a new
> filesystem for best throughput, if it's actually available.

Because any "optimisation information" that you could pass them would
be wrong.  Any optimisation attempting to operate based on incorrect
parameters can only be a pessimisation.

> Now let me raise some additional questions --
>
> Should a DASD be mappable ONLY with horizontal slices?
> With what we're all doing today, it seems that taking a certain
> number of cylinders for slices is best - but other access methods
> may find an underlying physical structure more convenient if
> a slice specifies a range of heads and cylinders that do NOT
> presume that all heads/cylinders from starting to ending according
> to physical layout are part of the same slice.  It may be quite
> convenient to have a cluster of heads across physical devices
> forming a logical device or slice, without fully dedicating those
> physical devices to that use.

This is a nonsense question in the context of ZBR and "logical
extent" devices (e.g. SCSI, ATAPI, most ATA devices).

> And, I'll mention again, DISK formats are not the only
> random-access mass-storage formats on the horizon!
> I'm guessing
> that for speed of inclusion into product lines, all will emulate
> a disk drive - but that may not be the most efficient way of using
> them (in fact, probably not).  They also can be expected to have
> "direct access" methods according to their physical architecture,
> with some form of tree-access the MOST efficient!

In most cases, the internal architecture of the device will be
optimised for two basic operations: retrieval of large contiguous
extents, and read/write of small, randomly scattered regions.  Data
access patterns are unlikely to change radically, particularly given
the momentum that modern systems have.  I'll let you work out what
the two above are, and why they are so common.  But trust me, they
are.

> Finally - one of the most powerful potentials of the devfs is
> handling non-DASD devices!  The connecting or turning-on of a device
> (nic/fax/printer/external-modem/scanner/parallel-to-parallel
> connection to another PC, even industrial controls of some kind) SHOULD
> cause it to "arrive".  If its turn-on generates a signal that can be
> caught by a minimal driver, that may trigger a load of a full driver
> (arrival event) and its inclusion in the devfs listings.  Similarly,
> killing such a device might trigger an immediate or delayed unloading
> of the same driver, and removal from the devfs.

This is trivially obvious, and forms the basic argument for the use
of DEVFS.  You fail to draw the parallel between system startup and
the conceptual "massive arrival of devices", which is still the major
argument for such a system.

mike