From owner-freebsd-hackers  Sun Sep 14 14:44:46 1997
Return-Path: <owner-freebsd-hackers>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.7/8.8.7) id OAA02069
          for hackers-outgoing; Sun, 14 Sep 1997 14:44:46 -0700 (PDT)
Received: from usr09.primenet.com (tlambert@usr09.primenet.com [206.165.6.209])
          by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id OAA02064
          for <hackers@FreeBSD.ORG>; Sun, 14 Sep 1997 14:44:39 -0700 (PDT)
Received: (from tlambert@localhost)
	by usr09.primenet.com (8.8.5/8.8.5) id OAA22143;
	Sun, 14 Sep 1997 14:44:34 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199709142144.OAA22143@usr09.primenet.com>
Subject: Re: Do *you* have problems with floppies?
To: joerg_wunsch@uriah.heep.sax.de
Date: Sun, 14 Sep 1997 21:44:33 +0000 (GMT)
Cc: hackers@FreeBSD.ORG
In-Reply-To: <19970914142654.GG28248@uriah.heep.sax.de> from "J Wunsch" at Sep 14, 97 02:26:54 pm
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Sender: owner-freebsd-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> > Rewriting the track is intrinsically more reliable, because it
> > preserves the inter-sector gaps with less hysteresis.  The tradeoff
> > is in read-before-write.
> 
> There's no good option in the NE765 to write an entire track.  You can
> do a multi-sector write, but the FDC still disassembles this into
> single write operations, with a read-before-write to find the
> respective sector ID fields.  The only operation that writes an entire
> track without first reading the ID fields is FORMAT TRACK.

The point is that the timing between requests is under the control
of the floppy controller, which can DMA what it wants out of the track
buffer.  I think that this is inherently more reliable than moving
all over the track with user-driven, sector-at-a-time requests.  When
the requests are user driven instead of controller driven, there are
potentially large harmonics because of the code path.  Without some
serious test equipment (maybe I can borrow TEAC's? 8-)), I can only
theorize about whether or not these harmonics will result in
destructive and constructive interference... but I find it highly
probable that they will, unless the code happens to be exactly tuned
to what the controller itself would do.  This is only possible if you
fully give over control of the CPU to the write process, like the BIOS
does, at which point you're effectively locking the code's harmonics
to the controller's harmonics in a delayed PLL.  Which means no
destructive interference.


> > The reason that this is more reliable is the rate at which write
> > requests can be handled.  Ideally, they will be chained in a single
> > write command.
> 
> But still, it's only a matter of whether the driver requests several
> WRITE SECTOR commands, or whether the FDC splits the multisector
> command into single WRITE SECTOR operations.

Yes.  With the exception that they are phase-locked by the controller,
and we hope the controller is smart about these things.  Or at least
smarter than us (which isn't hard, because it doesn't have to deal
with load issues).

> As long as the inter-
> sector gap is large enough for the interrupt code to setup the next
> transfer (which it is, even on a 386/sx-16), you don't lose anything.

Agreed.  But this way you interrupt per track instead of per sector;
there's much less chance of being delayed by another ISR.  This may
in fact be the root cause of the problems, given that the problem
seems to go up under SCSI load.
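
For scale, here is a back-of-the-envelope sketch of the numbers behind the
per-sector versus per-track argument.  The 300 rpm spindle speed, the
500 kbit/s data rate, and the 18 x 512-byte sector layout are assumed
values for a 1.44 MB 3.5" disk, not measurements from this thread, and
the function names are made up for illustration.

```c
#include <stdint.h>

#define RPM            300u     /* assumed: 3.5" HD drive spindle speed */
#define DATA_RATE_BPS  500000u  /* assumed: 500 kbit/s MFM data rate */
#define SECTORS_TRACK  18u      /* assumed: 1.44 MB format */
#define SECTOR_SIZE    512u

/* Time for one full revolution, in microseconds. */
static uint32_t rev_usec(void)
{
    return 60u * 1000000u / RPM;
}

/* Raw bytes that pass under the head in one revolution. */
static uint32_t raw_bytes_per_rev(void)
{
    return (uint32_t)((uint64_t)(DATA_RATE_BPS / 8u) * rev_usec() / 1000000u);
}

/* Time each sector slot occupies, in microseconds (rounded down). */
static uint32_t sector_slot_usec(void)
{
    return rev_usec() / SECTORS_TRACK;
}

/*
 * Average non-data bytes per sector slot: ID field, sync bytes, and gaps
 * together.  This is only an upper bound on the inter-sector gap itself.
 */
static uint32_t overhead_bytes_per_sector(void)
{
    return (raw_bytes_per_rev() - SECTORS_TRACK * SECTOR_SIZE) / SECTORS_TRACK;
}
```

Under these assumptions a revolution takes 200 ms and each sector slot
about 11 ms, so a per-sector driver takes 18 interrupts per track where
a per-track driver takes one, and an ISR that is held off past its slot
costs a full extra revolution.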

Actually, I'm a bit fearful: what happens to a motherboard DMA, as
in the floppy transfers, during an interrupt?  During a controller
initiated bus master DMA?  It may be that it's necessary to mask
interrupts during the transfer.  That would *really* suck.  8-(.

> I agree that this loss could indeed be handled by a track buffer, that
> does a read ahead of the sectors that are passing the head before the
> desired sector arrives, and could hand out the data out of this buffer
> if they are requested later on, which is likely.  The problematic
> thing with this is that there's no means in the NE765 to say ``READ
> ANY SECTOR'', so you have to specify a ``READ ID'' first, losing this
> sector's worth of data, in order to know which sector to read next.

Actually... 0x42 READ TRACK does not check the sector number stored in
the ID field.  This could be a curse as well as a blessing; I don't
know how it could deal with interleaved data.

The 0xE6 READ NORMAL DATA can do multiple sector reads; unlike the
READ TRACK, it does check the sector IDs.  But again, it's phase locked
under the control of the floppy controller, which may be all that's
needed.
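
For reference, a small sketch of how those two command bytes decompose
into a base opcode plus modifier bits in the 765 command set.  The
helper and constant names are hypothetical and are not taken from any
existing driver.

```c
#include <assert.h>
#include <stdint.h>

/* Base opcodes: the low five bits of the first command byte. */
enum fdc_opcode {
    FDC_READ_TRACK = 0x02,
    FDC_WRITE_DATA = 0x05,
    FDC_READ_DATA  = 0x06
};

/* Modifier bits: the top three bits of the first command byte. */
#define FDC_MT  0x80u  /* multi-track: continue from head 0 onto head 1 */
#define FDC_MFM 0x40u  /* MFM (double-density) encoding */
#define FDC_SK  0x20u  /* skip sectors carrying a deleted-data mark */

/* Combine a base opcode with modifier bits into the command byte. */
static uint8_t fdc_command(enum fdc_opcode op, uint8_t flags)
{
    assert((op & 0xE0) == 0);     /* opcode must fit in the low 5 bits */
    assert((flags & 0x1F) == 0);  /* flags must sit in the top 3 bits */
    return (uint8_t)(op | flags);
}
```

With these definitions, fdc_command(FDC_READ_TRACK, FDC_MFM) gives 0x42
and fdc_command(FDC_READ_DATA, FDC_MT | FDC_MFM | FDC_SK) gives 0xE6.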


> > > Why?  The inter-sector gaps of floppies are large enough to give the
> > > CPUs that are in use these days time to setup the next transfers. 
> 
> > Because of the need to synchronize, of course.  Relative seeks are
> > not very reliable (see "The Undocumented PC" for details).
> 
> Why are they not very reliable?  All the seeks are relative.

Because of track drift in the head positioning mechanism.  It's like
a Calcomp plotter, that only has relative coordinates (only they
"register" -- resynchronize -- a lot more frequently).

> Van
> Gilluwe's chapter about floppies made me quickly aware that he's not
> very experienced in this field either, so take his statements with the
> necessary grain of salt.  How else could he still write nonsense about
> ``head loading'', even though the last drives that did an on-demand
> head loading were the good ol' 8-inch drives?  (Still true in the
> second edition, i verified this in a bookstore.)

OK, he's certainly not the authority that one would want; but on
that particular point, I agree with his argument.  The head load/unload
crap, and some of the commands he claims you'd never use are BS, but
you can ignore that and still get some useful data out of him.  8-|.


> If the application wasn't quick enough to deliver more data, the track
> buffer wouldn't gain much either.  You could only fall back to a
> sector-at-a-time mode then, or artificially defer the actual write
> operation (and bogusly report a ``good'' status to the caller), to
> collect more data in the meantime.

Exactly.  You defer the writes.  Essentially, you are doing nothing
more major than write gathering.  You flush the deferral on a time
limit, or on a track change.

Since you read before write, and you write only tracks, and you have
two buffers, then if you write to a sector in a track which is no
longer deferred, the track is still in "cache" in the buffer (which
was "marked clean" after the deferral expiration).  So there's no
need to do another read-before-write.
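
A minimal user-space sketch of that deferral scheme, under stated
assumptions: two whole-track buffers, a flush on a time limit or on a
track change, and counters standing in for the real track reads and
writes.  All names, the 0-based sector index, and the FLUSH_TICKS limit
are hypothetical; this is not the kludged test code itself.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE   512
#define SECTORS_TRACK 18
#define FLUSH_TICKS   100u  /* assumed deferral limit, arbitrary ticks */

struct track_buf {
    bool     valid;     /* holds a complete image of `track` */
    bool     dirty;     /* has deferred writes not yet on the media */
    int      track;
    uint32_t deadline;  /* tick by which deferred writes get flushed */
    uint8_t  data[SECTORS_TRACK][SECTOR_SIZE];
};

struct track_cache {
    struct track_buf buf[2];
    int last;    /* index of the most recently used buffer */
    int reads;   /* whole-track reads issued (stand-in for hardware) */
    int writes;  /* whole-track writes issued (stand-in for hardware) */
};

static void tc_init(struct track_cache *c)
{
    memset(c, 0, sizeof(*c));
}

/* Write a dirty buffer as one track; it stays valid, now marked clean. */
static void tc_flush(struct track_cache *c, struct track_buf *b)
{
    if (b->valid && b->dirty) {
        c->writes++;
        b->dirty = false;
    }
}

/* Time-limit flush: call periodically with the current tick. */
static void tc_tick(struct track_cache *c, uint32_t now)
{
    for (int i = 0; i < 2; i++)
        if (c->buf[i].dirty && now >= c->buf[i].deadline)
            tc_flush(c, &c->buf[i]);
}

/* Gather one sector write; the media is touched only on miss or flush. */
static void tc_write(struct track_cache *c, uint32_t now, int track,
                     int sector, const uint8_t *src)
{
    int hit = -1;
    for (int i = 0; i < 2; i++)
        if (c->buf[i].valid && c->buf[i].track == track)
            hit = i;

    /* Track change: flush what the previous track still has deferred. */
    if (c->buf[c->last].track != track)
        tc_flush(c, &c->buf[c->last]);

    if (hit < 0) {
        hit = 1 - c->last;  /* never dirty: flushed when we left it */
        c->reads++;         /* read-before-write of the whole track */
        memset(c->buf[hit].data, 0, sizeof(c->buf[hit].data));
        c->buf[hit].track = track;
        c->buf[hit].valid = true;
        c->buf[hit].dirty = false;
    }

    struct track_buf *b = &c->buf[hit];
    memcpy(b->data[sector], src, SECTOR_SIZE);
    if (!b->dirty) {        /* first deferred write starts the timer */
        b->dirty = true;
        b->deadline = now + FLUSH_TICKS;
    }
    c->last = hit;
}
```

In this sketch, a write to a track whose deferral has already expired
hits the clean buffer and costs no additional track read, which is the
case described above.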


> Iff the application was quick enough to deliver more data, the track
> buffer doesn't gain you anything as well.  The application could still
> issue a large write(2) syscall (e.g. 18 KB), which you split into
> single-sector transfers.

Or write with a multitrack option.  I have to admit that I kludged
my track-at-a-time test code.  I didn't do any of the deferral work;
instead, I simulated it in user space, reading and writing *only*
18k buffers, which the read/write code treated as a multitrack
run of pre-write-gathered sectors.
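
To make the 18k figure concrete, here is a sketch of how a byte offset
maps onto cylinder, head, and sector.  It assumes the 1.44 MB geometry
(18 sectors of 512 bytes per track, two heads), which the thread does
not state explicitly; the struct and function names are hypothetical.

```c
#include <stdint.h>

#define SECTOR_SIZE   512u
#define SECTORS_TRACK 18u   /* assumed: 1.44 MB 3.5" format */
#define HEADS         2u

struct chs {
    unsigned cyl;
    unsigned head;
    unsigned sector;  /* sector IDs on the media are 1-based */
};

/* Translate a byte offset on the raw device into cylinder/head/sector. */
static struct chs offset_to_chs(uint32_t byte_offset)
{
    uint32_t lba = byte_offset / SECTOR_SIZE;
    struct chs p;

    p.cyl    = lba / (SECTORS_TRACK * HEADS);
    p.head   = (lba / SECTORS_TRACK) % HEADS;
    p.sector = lba % SECTORS_TRACK + 1;
    return p;
}

/* Bytes covered by one multitrack transfer: both sides of a cylinder. */
static uint32_t cylinder_bytes(void)
{
    return SECTOR_SIZE * SECTORS_TRACK * HEADS;
}
```

Under that geometry, cylinder_bytes() is 18432, so an 18k buffer that
starts on a cylinder boundary covers head 0 and head 1 of a single
cylinder, which is what a multitrack transfer spans.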

> Nothing's lost.  You can do many
> not-so-simple, nitty-gritty things inside a floppy driver, but you
> should keep the old sentence in mind ``Never try to optimize something
> before you've profiled it.''  Track buffers belong in this class of
> non-optimizations.  The only optimization i see is the above mentioned
> use of a track buffer to do read-ahead of unwanted but available
> sectors after a seek operation, in the hope that somebody is
> interested in the gathered data later on in the game.

It's not intended as an optimization, really -- it's intended as a
workaround for timing issues which I believe are causing problems in
the single sector at a time case.  I'm not really interested in
speed, so much as I'm interested in eliminating potential harmonic
effects from the equation.  Even if they aren't the problem, then
at least we would *know* they weren't the problem instead of waving
our hands.  8-(.


> I have no doubts that it is possible to use a track buffer (Linux
> does, and IMHO NetBSD does, at least they do multi-sector transfers).
> Anyway, before i accept it as something useful, you have to prove
> first that it's really improving something more than your ego. ;-)

Well, like I said, the code is not a fait accompli; it's kludged
test code.  But it solves the harmonic issues which I think might
be causing the problems.  I don't claim to *know* that they are, so
it's not an issue of ego: I'm willing to be proven wrong.  8-).

Just to clear this up: I *never* take comments on code or ideas
as attacks on me personally.  If the code or idea is good, it will
stand without me, and if it's not, it'll fall without me.


> > > If msdosfs is too stupid to cache the FAT, that's nothing a device
> > > driver should fix.  There's the entire buffer cache in between.
> > 
> > I disagree; there should be a two track cache intrinsic to the floppy
> > driver.  The "other" track will always contain the FAT during any
> > sequential access, because the sequential access requires a traversal
> > of the FAT chain.
> 
> What the heck should the driver deal with FATs, i-node regions, or
> block pointers?  It is a matter of filesystem implementations to take
> care of caching their data.  It's a matter of drivers to make these
> data available, without any consideration about what data this might
> be.  If a track buffer helps improving some filesystem performance,
> this only shows that the filesystem implementation has been poorly
> designed.

A two track buffer is topologically equivalent to cache memory on a
SCSI controller.  Floppies are horrendously slow things.  I was
describing the *effect* two track buffers would have on a FAT -- the
device driver *would* fix the issue.

It's still up to the msdosfs to cache all of the FAT -- I think it
should -- but that's independent of the fact that there would be a
general win for msdosfs from track buffering.  It doesn't matter
that it's not the responsibility of the driver for it to have a
positive effect.  8-).


> > The MSDOSFS should cache the FAT before this is invoked, in any case,
> > because of the concept of long FAT chains, which may overrun a track
> > buffer (see the paper referenced in the previous posting).
> 
> What is wrong with caching these metadata in the buffer cache?  UFS
> has way more (and way more scattered) metadata, and it has properly
> shown that storing these data in the buffer cache improves
> performance.

Storing the entire FAT as data with a different locality than the user
data stored in FAT FS files was shown to be a win in the CMU/Usenix
paper I referenced.  Might as well accept empirical data when it's
offered.  8-).


> > SVR3 actually had a working FT driver in the kernel, and it used a
> > double buffer so that it could rewrite during resynchronization 
> 
> This merely sounds like the above idea of making some use of the
> sectors that are currently passing by.  In the case of optimizing this
> for writes, you have the additional problem that you need to reorder
> the device queue all the time.  (There's no disksort() in FreeBSD.)
> This is needed in order to be able to correctly report the success
> status back to the caller for each sector.  For raw device IO, this is
> impossible, since only one transfer is queued by physio(9) at a time.

The FT buffer would have to be a different size; this is unlikely to
win as shared code.  Actually, you might want to contact Vadim Antonov;
he did the BSDI driver before they had financial problems with the
USL lawsuit.  I believe he works at Sprint now; I don't know how
anxious he would be to get back into the bowels of FT drivers, though.

> (Hmm, the driver could try to be smarter if this transfer is more than
> one sector's worth of data.)

Yes.  This is my kludged test case.


> For filesystem operation, it can indeed be a win.

Yes.  But you're right that it's not a good enough reason to do it.
My reasoning was to take the timing issue out of the scheduler and
interrupt processing in the OS, and give them over to the floppy
controller in the hopes that it would resolve the problems people
are seeing.  That it would be a speed win for most normal usage
is just a sidebar I felt was worth mentioning, not my rationale for
doing the dirty deed.  8-).


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.