Date:      Sun, 26 Jul 1998 09:50:49 -0500
From:      Karl Denninger  <karl@mcs.net>
To:        Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Cc:        Dan Swartzendruber <dswartz@druber.com>, current@FreeBSD.ORG
Subject:   Re: MMAP problems
Message-ID:  <19980726095049.51700@mcs.net>
In-Reply-To: <199807260252.WAA05646@khavrinen.lcs.mit.edu>; from Garrett Wollman on Sat, Jul 25, 1998 at 10:52:34PM -0400
References:  <19980725155148.43084@mcs.net> <3.0.5.32.19980725172640.00944ac0@mail.kersur.net> <19980725163243.36509@mcs.net> <199807260252.WAA05646@khavrinen.lcs.mit.edu>

On Sat, Jul 25, 1998 at 10:52:34PM -0400, Garrett Wollman wrote:
> <<On Sat, 25 Jul 1998 16:32:43 -0500, Karl Denninger  <karl@mcs.net> said:
> 
> > I can verify that CAM is not related to this; it happens with NON-CAM 
> > kernels as well.
> 
> I've been seeing it for several months.
> 
> I believe it to be a coherency problem.  The relevant operations here
> are:
> 
> 1) A diablo server process appends to a spool file using explicit
> I/O.  (Note that the file is not opened in O_APPEND mode.)

Yep.  Do you have any guess as to whether opening the file with O_APPEND
would be legit, and whether it would fix this?  I don't THINK the server
process ever "backs up", so this *should* be ok, but I don't want to make
that change without a "better than a guess" shot at it.
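
For reference, here is a minimal sketch of what O_APPEND changes on the
writer side. It is not Diablo's code: the file name and the helper
`append_article` are hypothetical, and the sketch only shows the documented
POSIX behaviour (every write lands at end-of-file, even after a seek). It
says nothing about whether the corruption would go away.

```python
import os
import tempfile


def append_article(fd: int, article: bytes) -> None:
    """Write one article, looping in case os.write() is partial."""
    view = memoryview(article)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def demo() -> bytes:
    # Hypothetical spool file in a scratch directory.
    path = os.path.join(tempfile.mkdtemp(), "spool.demo")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        append_article(fd, b"article one\n")
        # "Backing up" the file offset does not move where the next
        # write lands: with O_APPEND the kernel repositions to EOF.
        os.lseek(fd, 0, os.SEEK_SET)
        append_article(fd, b"article two\n")
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read()
```

If the server process really never "backs up", the two modes should produce
the same file contents, which is why the change looks safe on paper.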

> 2) A boatload of dnewslink processes simultaneously mmap the pages of
> the spool file containing the article in question, suck the article
> out of it, and blast it over to the remote feed.

Yep.  That's the basic model.

Diablo beats the shit out of MMAP and I/O; the code is very clever in trying
to avoid unnecessary I/O...

> Here's my particular guess...  I think this happens when the dnewslink
> processes are reading another, short, article in the last page of the
> file, while a diablo server is writing a new article.  Somewhere,
> there is a race condition in which the kernel has copied the new data
> into the buffer, but blocks before it updates the valid length; this
> then allows one of the mmaps to succeed, and since that part of the
> buffer is marked invalid, it gets zeroed.  Then the diablo process
> resumes, and marks the end of the buffer valid, although the data it
> was writing has just gotten clobbered.

Hmmm.... why would dnewslink not mmap the file read-only, though (and
wouldn't this solve the problem)?
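
To make the question concrete, this is roughly what a read-only mapping of
just the pages holding one article looks like. It is an illustrative sketch,
not dnewslink's code: `read_article` and `demo` are hypothetical names, and
whether a read-only mapping would avoid the race is exactly the open
question.

```python
import mmap
import os
import tempfile


def read_article(path: str, offset: int, length: int) -> bytes:
    """Copy `length` bytes at `offset` out of a read-only mapping."""
    if length <= 0:
        return b""
    # mmap offsets must be a multiple of the allocation granularity,
    # so map from the start of the page containing the article.
    gran = mmap.ALLOCATIONGRANULARITY
    start = offset - (offset % gran)
    delta = offset - start
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, delta + length,
                       access=mmap.ACCESS_READ, offset=start) as m:
            return m[delta:delta + length]
    finally:
        os.close(fd)


def demo() -> bytes:
    # Hypothetical spool file: filler, then one short article that
    # starts a little past the first mapping boundary.
    path = os.path.join(tempfile.mkdtemp(), "spool.demo")
    filler = b"x" * (mmap.ALLOCATIONGRANULARITY + 10)
    article = b"hello article"
    with open(path, "wb") as f:
        f.write(filler + article)
    return read_article(path, len(filler), len(article))
```

The reader never asks for write access here, so any modification of the
mapped pages would have to come from the writer's side.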

> It looks, from an inspection of the relevant code in ufs_readwrite.c
> and ffs_balloc.c, that this cannot happen, because the data are always
> copied in last.  It does appear that there are potential windows, if
> ffs_balloc() blocks, where other processes might see invalid data in
> the file through mmap as a result of vnode_pager_setsize() having
> already been run, but it does not appear such garbage could possibly
> persist and be written back to disk, and I certainly see it directly
> on the disk, not just in memory.
> 
> -GAWollman

Yep.  After about 6 hours of poring over the code last night (literally and
figuratively :-) this is what I think is going on as well.

And I can confirm that the trash IS being written to disk; it's definitely
there on stable storage when you go look for it later.

The data which gets written is usually a block of zeros, but it may not be;
it can also be random trash.  It's also not always one block (it could be
more than one), but it IS always, at least from what I'm seeing here, a
multiple of 512 bytes (the disk block size).
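
For anyone who wants to look for the same damage, a rough scanner for the
zeroed case might look like the sketch below. It assumes the bad regions
start on 512-byte boundaries, which is only suggested (not established) by
the observation that their size is a multiple of 512, and it will not find
the random-trash variant. The names `find_zero_runs` and `demo` are
hypothetical.

```python
import os
import tempfile

BLOCK = 512  # disk block size reported above


def find_zero_runs(path: str, block: int = BLOCK) -> list:
    """Return (offset, nblocks) for each run of all-zero full blocks."""
    runs = []
    run_start = None
    offset = 0
    zero = bytes(block)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            if chunk == zero:  # a short trailing chunk never matches
                if run_start is None:
                    run_start = offset
            elif run_start is not None:
                runs.append((run_start, (offset - run_start) // block))
                run_start = None
            offset += len(chunk)
    if run_start is not None:
        runs.append((run_start, (offset - run_start) // block))
    return runs


def demo() -> list:
    # Hypothetical spool file: good block, two zeroed blocks, good block.
    path = os.path.join(tempfile.mkdtemp(), "spool.demo")
    with open(path, "wb") as f:
        f.write(b"a" * 512 + bytes(1024) + b"b" * 512 + bytes(100))
    return find_zero_runs(path)
```

A hit is only a candidate: a legitimate article could in principle contain
a block of NUL bytes, so each reported offset still needs to be checked by
hand.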

-- 
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly / All Lines K56Flex/DOV
			     | NEW! Corporate ISDN Prices dropped by up to 50%!
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost
