Date:      Mon, 17 Aug 1998 05:37:37 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        nate@mt.sri.com (Nate Williams)
Cc:        dg@root.com, tlambert@primenet.com, current@FreeBSD.ORG, karl@mcs.net
Subject:   Re: Better VM patches (was Tentative fix for VM bug)
Message-ID:  <199808170537.WAA19870@usr09.primenet.com>
In-Reply-To: <199808170217.UAA04040@mt.sri.com> from "Nate Williams" at Aug 16, 98 08:17:30 pm

> >    I have a suggestion. Let's not throw out random guesses about what may or
> > may not be a problem. Let's actually understand the issue thoroughly, come up
> > with a fix, and then tell people all about it.
> 
> Actually, I'm with Terry here.  I think throwing out random guesses is a
> *much* better solution than what's occurred so far.  At least this way
> folks have a clue about what *might* be going on, and some of the
> 'random guesses' may trigger someone's mind.

Actually this is too adversarial.

There is a real problem with vnode_pager_alloc; it should *NOT*
set the size it records for the backing file to something other
than the actual size of the backing file.

I think I cleared up the misunderstanding caused by my inability
to communicate *why* this was a problem in my initial post.


My "wild guess" that fits the most problems is that there is a page
that is multiply referenced (or an object; a page makes more sense
to me because of the symptoms I've seen).  This is a read-cache bug
(which is why I initially asked that someone with the SIG-11 or the
zeroed-page bugs compile their kernel NO_SWAPPING).


> The lack of progress on these bugs from the kernel hackers until Terry
> makes up an 'educated guess' seems to be a good motivator. :) ;)

I think the problems are more severe than are generally thought, but
are very infrequent.

I'm pretty sure that, until my last post, I had given David the
impression that the file corruption I was seeing was partial page
corruption of a file that ended before a page boundary.  In fact,
I was seeing corruption beginning on a page boundary, and extending
for 4k (or the end of the file, whichever came first).
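To make that extent concrete, here is a minimal sketch of the
arithmetic, assuming a 4096-byte page.  The function name and its
parameters are made up for illustration; nothing here comes from
the kernel sources.

```python
PAGE_SIZE = 4096  # assumed: the "4k" page size mentioned above


def damaged_range(page_index, file_size, page_size=PAGE_SIZE):
    """Return (start, end) byte offsets of the damaged region, end exclusive.

    The region begins on a page boundary and runs for one page or to
    the end of the file, whichever comes first.
    """
    start = page_index * page_size
    if start >= file_size:
        raise ValueError("page starts at or past the end of the file")
    end = min(start + page_size, file_size)
    return start, end
```

For a 10000-byte file, page 1 covers a full 4k, while page 2 is cut
short by the end of the file.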

I don't think anyone has been very good at communicating these bugs,
or their severity ("What idiot would extend a file that has been
mmap'ed without redoing the mapping?", etc.).
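The "extend without redoing the mapping" case can be shown from
userland.  This is only a sketch using Python's mmap module on a
temporary file; the helper name and the sizes are invented for the
example, and it shows the ordinary behaviour (an old mapping keeps
its old length), not the kernel bug under discussion.

```python
import mmap
import os
import tempfile


def extend_without_remap(initial=4096, extra=100):
    """Map a file, grow it, then compare the old mapping with a fresh one."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"a" * initial)
        # Map exactly the bytes that exist right now.
        old_map = mmap.mmap(fd, initial, access=mmap.ACCESS_READ)
        # The descriptor's offset is at end of file, so this write extends it.
        os.write(fd, b"b" * extra)
        file_size = os.fstat(fd).st_size
        old_len = len(old_map)  # still the length given at mmap time
        old_map.close()
        # Redo the mapping; length 0 means "map the whole file".
        new_map = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        new_len = len(new_map)
        first_new_byte = new_map[initial:initial + 1]
        new_map.close()
        return old_len, file_size, new_len, first_new_byte
    finally:
        os.close(fd)
        os.unlink(path)
```

The old mapping never sees the added bytes; only the redone mapping
covers them.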

David's patches for the NFS problem were well thought out.  I don't
think he needed me poking him to find them.  8-).

The reason I did the backup-one patch at all was that I was looking for
a panacea; a multiply referenced page, however it has occurred, is about
the only thing that can explain my problem, other than bad hardware
(which I refuse to believe, since "it worked before").

While treading down the mmap path after the backup-one failed to perturb
Karl's bug or result in a "freeing free page" panic, I found the mmap
backing object end-of-file problem.  This actually doesn't help me; I
am still hunting my normally-accessed-file corrupted by contents of
mmap'ed-file-from-different-process bug, and I may still be looking for
the "pages zeroed at random" problem.  No one who has this problem has
enabled DIAGNOSTIC with the new patch to see if the insert is stomping
things, so I can't tell if John fixing the bogus-invalid-during-cleanup
bug was all that was necessary for that.  8-(.


Anyway, after all that, I am actually very happy to be using the
-current list as something other than an overflow from -ports or
-questions or -I-didn't-read-the-FAQ.  So shoot me.  8-).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
