From owner-freebsd-hackers Sun Sep 14 14:44:46 1997
Return-Path:
Received: (from root@localhost)
	by hub.freebsd.org (8.8.7/8.8.7) id OAA02069
	for hackers-outgoing; Sun, 14 Sep 1997 14:44:46 -0700 (PDT)
Received: from usr09.primenet.com (tlambert@usr09.primenet.com [206.165.6.209])
	by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id OAA02064
	for ; Sun, 14 Sep 1997 14:44:39 -0700 (PDT)
Received: (from tlambert@localhost)
	by usr09.primenet.com (8.8.5/8.8.5) id OAA22143;
	Sun, 14 Sep 1997 14:44:34 -0700 (MST)
From: Terry Lambert
Message-Id: <199709142144.OAA22143@usr09.primenet.com>
Subject: Re: Do *you* have problems with floppies?
To: joerg_wunsch@uriah.heep.sax.de
Date: Sun, 14 Sep 1997 21:44:33 +0000 (GMT)
Cc: hackers@FreeBSD.ORG
In-Reply-To: <19970914142654.GG28248@uriah.heep.sax.de> from "J Wunsch" at Sep 14, 97 02:26:54 pm
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Sender: owner-freebsd-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> > Rewriting the track is intrinsically more reliable, because it
> > preserves the inter-sector gaps with less hysteresis.  The tradeoff
> > is in read-before-write.
>
> There's no good option in the NE765 to write an entire track.  You can
> do a multi-sector write, but the FDC still disassembles this into
> single write operations, with a read-before-write to find the
> respective sector ID fields.  The only operation that writes an entire
> track without first reading the ID fields is FORMAT TRACK.

The point is that the timing between requests is under the control of
the floppy controller, which can DMA what it wants out of the track
buffer.  I think that this is inherently more reliable than moving all
over the track with user-driven, sector-at-a-time requests.

In a sector-at-a-time scheme that is user driven instead of controller
driven, there are potentially large harmonics because of the code
path.  Without some serious test equipment (maybe I can borrow TEAC's?
8-)), I can only theorize about whether or not these harmonics will
result in destructive and constructive interference... but I find it
highly probable that they will, unless the code happens to be exactly
tuned to what the controller itself would do.  This is only possible
if you fully give over control of the CPU to the write process, like
the BIOS does, at which point it's effectively locking the harmonics
to the controller harmonics in a delayed PLL.  Which means no
destructive interference.

> > The reason that this is more reliable is the rate at which write
> > requests can be handled.  Ideally, they will be chained in a single
> > write command.
>
> But still, it's only a matter of whether the driver requests several
> WRITE SECTOR commands, or whether the FDC splits the multisector
> command into single WRITE SECTOR operations.

Yes.  With the exception that they are phase-locked by the controller,
and we hope the controller is smart about these things.  Or at least
smarter than us (which isn't hard, because it doesn't have to deal
with load issues).

> As long as the inter-sector gap is large enough for the interrupt
> code to set up the next transfer (which it is, even on a 386/sx-16),
> you don't lose anything.

Agreed.  But this way you interrupt per track instead of per sector;
there's much less chance of being delayed by another ISR.  This may in
fact be the root cause of the problems, given that the problem seems
to go up under SCSI load.

Actually, I'm a bit fearful: what happens to a motherboard DMA, as in
the floppy transfers, during an interrupt?  During a controller
initiated bus master DMA?  It may be that it's necessary to mask
interrupts during the transfer.  That would *really* suck.  8-(.

> I agree that this loss could indeed be handled by a track buffer, that
> does a read ahead of the sectors that are passing the head before the
> desired sector arrives, and could hand out the data out of this buffer
> if they are requested later on, which is likely.
> The problematic thing with this is that there's no means in the
> NE765 to say ``READ ANY SECTOR'', so you have to specify a
> ``READ ID'' first, losing this sector's worth of data, in order to
> know which sector to read next.

Actually... 0x42 READ TRACK does not check the sector number stored in
the ID field.  This could be a curse as well as a blessing; I don't
know how it could deal with interleaved data.

The 0xE6 READ NORMAL DATA can do multiple sector reads; unlike READ
TRACK, it does check the sector IDs.  But again, it's phase locked
under the control of the floppy controller, which may be all that's
needed.

> > > Why?  The inter-sector gaps of floppies are large enough to give the
> > > CPUs that are in use these days time to set up the next transfers.
>
> > Because of the need to synchronize, of course.  Relative seeks are
> > not very reliable (see "The Undocumented PC" for details).
>
> Why are they not very reliable?  All the seeks are relative.

Because of track drift in the head positioning mechanism.  It's like a
Calcomp plotter, that only has relative coordinates (only they
"register" -- resynchronize -- a lot more frequently).

> Van Gilluwe's chapter about floppies made me quickly aware that he's
> not very experienced in this field either, so take his statements
> with the necessary grain of salt.  How else could he still write
> nonsense about ``head loading'', even though the last drives that did
> an on-demand head loading were the good ol' 8-inch drives?  (Still
> true in the second edition, i verified this in a bookstore.)

OK, he's certainly not the authority that one would want; but on that
particular point, I agree with his argument.  The head load/unload
crap, and some of the commands he claims you'd never use, are BS, but
you can ignore that and still get some useful data out of him.  8-|.

> If the application wasn't quick enough to deliver more data, the track
> buffer wouldn't gain much either.
> You could only fall back to a sector-by-a-time mode then, or
> artificially defer the actual write operation (and bogusly report a
> ``good'' status to the caller), to collect more data in the meantime.

Exactly.  You defer the writes.  Essentially, you are doing nothing
more major than write gathering.  You flush the deferral on a time
limit, or on a track change.  Since you read before write, and you
write only tracks, and you have two buffers, then if you write to a
sector in a track which is no longer deferred, the track is still in
"cache" in the buffer (which was "marked clean" after the deferral
expiration).  So there's no need to do another read-before-write.

> Iff the application was quick enough to deliver more data, the track
> buffer doesn't gain you anything as well.  The application could still
> issue a large write(2) syscall (e.g. 18 KB), which you split into
> single-sector transfers.

Or write with a multitrack option.  I have to admit that I kludged my
track-at-a-time test code.  I didn't do any of the deferral work;
instead, I simulated it in user space, reading and writing *only* 18k
buffers, which the read/write code treated as a multitrack run of
pre-write-gathered sectors.

> Nothing's lost.  You can do many not-so-simple, nitty-gritty things
> inside a floppy driver, but you should keep the old sentence in mind
> ``Never try to optimize something before you've profiled it.''  Track
> buffers belong into this class of non-optimizations.  The only
> optimization i see is the above mentioned use of a track buffer to do
> read-ahead of unwanted but available sectors after a seek operation,
> in the hope that somebody is interested in the gathered data later on
> in the game.

It's not intended as an optimization, really -- it's intended as a
workaround for timing issues which I believe are causing problems in
the single sector at a time case.
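
[Illustration: the deferral scheme described above (gather sector
writes into one of two track buffers, flush on a time limit or on a
track change, keep the flushed track cached and "marked clean") might
look roughly like the user-space sketch below.  This is not the test
code mentioned in this posting.  Every name and constant in it
(fd_write, fd_tick, tb_get, tb_flush, FLUSH_AGE, the in-memory media
array standing in for the disk) is a hypothetical chosen for the
example, and the real controller programming is left out entirely.]

```c
#include <assert.h>
#include <string.h>

/* Hypothetical geometry and limits, for illustration only. */
#define SECSIZE   512
#define NSECT     18            /* sectors per track on a 1.44MB floppy */
#define NTRACKS   160           /* 80 cylinders x 2 heads */
#define NBUF      2             /* the two track buffers */
#define FLUSH_AGE 2             /* deferral time limit, in ticks */

static unsigned char media[NTRACKS][NSECT][SECSIZE]; /* stand-in for the disk */
static int track_reads, track_writes;                /* whole-track I/O counts */

struct trackbuf {
    int  track;                 /* cached track number, -1 if empty */
    int  dirty;                 /* nonzero while writes are deferred */
    long stamp;                 /* tick at which the deferral started */
    long used;                  /* tick of last use, for replacement */
    unsigned char data[NSECT][SECSIZE];
};

static struct trackbuf bufs[NBUF] = {
    { .track = -1, .used = -1 }, { .track = -1, .used = -1 }
};

/* Write the gathered track out in one go; the data stays cached, clean. */
static void tb_flush(struct trackbuf *tb)
{
    if (tb->track >= 0 && tb->dirty) {
        memcpy(media[tb->track], tb->data, sizeof tb->data);
        track_writes++;
    }
    tb->dirty = 0;
}

/* Return the buffer for `track`; a miss costs one whole-track read. */
static struct trackbuf *tb_get(int track, long now)
{
    struct trackbuf *hit = NULL, *victim = &bufs[0];
    int i;

    assert(track >= 0 && track < NTRACKS);
    for (i = 0; i < NBUF; i++) {
        struct trackbuf *tb = &bufs[i];
        if (tb->track == track) {
            hit = tb;
        } else {
            tb_flush(tb);       /* track change: flush the other track */
            if (tb->used < victim->used)
                victim = tb;    /* least recently used; empties go first */
        }
    }
    if (hit == NULL) {          /* miss: this is the read-before-write */
        hit = victim;
        memcpy(hit->data, media[track], sizeof hit->data);
        track_reads++;
        hit->track = track;
    }
    hit->used = now;
    return hit;
}

/* Deferred sector write: lands in the track buffer, not on the media. */
static void fd_write(int track, int sect, const void *src, long now)
{
    struct trackbuf *tb = tb_get(track, now);

    assert(sect >= 0 && sect < NSECT);
    memcpy(tb->data[sect], src, SECSIZE);
    if (!tb->dirty) {
        tb->dirty = 1;
        tb->stamp = now;        /* the deferral window opens here */
    }
}

/* Periodic timer hook: flush deferrals that have reached the time limit. */
static void fd_tick(long now)
{
    int i;

    for (i = 0; i < NBUF; i++)
        if (bufs[i].dirty && now - bufs[i].stamp >= FLUSH_AGE)
            tb_flush(&bufs[i]);
}
```

[In this sketch, two fd_write() calls to the same track cost one track
read and no media write until fd_tick() sees the time limit expire.  A
later write to that track then hits the clean cached copy, so no
second read-before-write is needed.  Whether this changes anything
about the timing problem under discussion is exactly what the sketch
cannot show.]
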
I'm not really interested in speed, so much as I'm interested in
eliminating potential harmonic effects from the equation.  Even if
they aren't the problem, then at least we would *know* they weren't
the problem instead of waving our hands.  8-(.

> I have no doubts that it is possible to use a track buffer (Linux
> does, and IMHO NetBSD does, at least they do multi-sector transfers).
> Anyway, before i accept it as something useful, you have to prove
> first that it's really improving something more than your ego. ;-)

Well, like I said, the code is not a fait accompli; it's kludged test
code.  But it solves the harmonic issues which I think might be
causing the problems.  I don't claim to *know* that they are, so it's
not an issue of ego: I'm willing to be proven wrong.  8-).

Just to clear this up: I *never* take comments on code or ideas as
attacks on me personally.  If the code or idea is good, it will stand
without me, and if it's not, it'll fall without me.

> > > If msdosfs is too stupid to cache the FAT, that's nothing a device
> > > driver should fix.  There's the entire buffer cache in between.
> >
> > I disagree; there should be a two track cache intrinsic to the floppy
> > driver.  The "other" track will always contain the FAT during any
> > sequential access, because the sequential access requires a traversal
> > of the FAT chain.
>
> Why the heck should the driver deal with FATs, i-node regions, or
> block pointers?  It is a matter of filesystem implementations to take
> care of caching their data.  It's a matter of drivers to make these
> data available, without any consideration about what data this might
> be.  If a track buffer helps improve some filesystem's performance,
> this only shows that the filesystem implementation has been poorly
> designed.

A two track buffer is topologically equivalent to cache memory on a
SCSI controller.  Floppies are horrendously slow things.
I was describing the *effect* two track buffers would have on a FAT --
the device driver *would* fix the issue.  It's still up to the msdosfs
to cache all of the FAT -- I think it should -- but that's independent
of the fact that there would be a general win for msdosfs from track
buffering.  It doesn't matter that it's not the responsibility of the
driver for it to have a positive effect.  8-).

> > The MSDOSFS should cache the FAT before this is invoked, in any case,
> > because of the concept of long FAT chains, which may overrun a track
> > buffer (see the paper referenced in the previous posting).
>
> What is wrong with caching these metadata in the buffer cache?  UFS
> has way more (and way more scattered) metadata, and it has properly
> shown that storing these data in the buffer cache improves
> performance.

Storing the entire FAT as data with a different locality than the user
data stored in FAT FS files was shown to be a win in the CMU/Usenix
paper I referenced.  Might as well accept empirical data when it's
offered.  8-).

> > SVR3 actually had a working FT driver in the kernel, and it used a
> > double buffer so that it could rewrite during resynchronization
>
> This merely sounds like the above idea of making some use of the
> sectors that are currently passing by.  In the case of optimizing this
> for writes, you have the additional problem that you need to reorder
> the device queue all the time.  (There's no disksort() in FreeBSD.)
> This is needed in order to be able to correctly report the success
> status back to the caller for each sector.  For raw device IO, this is
> impossible, since only one transfer is queued by physio(9) at a time.

The FT buffer would have to be a different size; this is unlikely to
win as shared code.  Actually, you might want to contact Vadim
Antinov; he did the BSDI driver before they had financial problems
with the USL lawsuit.
I believe he works at Sprint now; I don't know how anxious he would be
to get back into the bowels of FT drivers, though.

> (Hmm, the driver could try to be smarter if this transfer is more than
> one sector's worth of data.)

Yes.  This is my kludged test case.

> For filesystem operation, it can indeed be a win.

Yes.  But you're right that it's not a good enough reason to do it.
My reasoning was to take the timing issue out of the scheduler and
interrupt processing in the OS, and give it over to the floppy
controller in the hopes that it would resolve the problems people are
seeing.  That it would be a speed win for most normal usage is just a
sidebar I felt was worth mentioning, not my rationale for doing the
dirty deed.  8-).

					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.