From owner-freebsd-current  Fri May 31 14:01:14 1996
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id OAA14223 for current-outgoing; Fri, 31 May 1996 14:01:14 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id OAA14213; Fri, 31 May 1996 14:01:09 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id NAA18657; Fri, 31 May 1996 13:58:07 -0700
From: Terry Lambert
Message-Id: <199605312058.NAA18657@phaeton.artisoft.com>
Subject: Re: Latest VM fixes are holding up
To: sysseh@devetir.qld.gov.au (Stephen Hocking)
Date: Fri, 31 May 1996 13:58:07 -0700 (MST)
Cc: dyson@freebsd.org, current@freebsd.org
In-Reply-To: <199605310615.GAA04339@netfl15a.devetir.qld.gov.au> from "Stephen Hocking" at May 31, 96 04:15:14 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> They're holding up fairly well here too; the only crashes I'm seeing
> (which are entirely reproducible on one machine by doing "make hierarchy")
> are of the "Panic: cleaned vnode isn't" type.  I'm seeing a couple of them
> a day.  If only I had the space to create & install a completely debuggable
> kernel, I'd do it.  Now, when is Terry's FS stuff going in?

My FS stuff doesn't fix this particular bug.

This is a synchronization bug in vclean(), and is endemic to the way
VOP_LOCK has to work for vclean() to work.  It is unrelated to the FS
code changes, except that the VOP_LOCK crap is also technically a
layering violation (and, in general, it means trouble for the UNIONFS
and NULLFS).

It's going to be a serious bugger for FS reentrancy, for system call
reentrancy for SMP and kernel preemption, and for RT scheduling and
kernel threading.

Here's the "free vnode isn't" patch:

] Hi there ppl,
]
] The new server here (P6-200/256MB RAM) crashed at least 2 times today
] with this panic message, so I've applied the patch Terry Lambert
] proposed here ~1-1.5 months ago:
]
] (file vfs_subr.c):
]
] > My guess from the -current code is:
] >
] > It's probably most correctly fixed by changing:
] >
] >	vp == NULL)
] > To:
] >	vp == NULL ||		/* list empty */
] >	vp->v_usecount)		/* queue wrapped */
] >
] > Or something similar using one of the circular queue macros.  Then
] > remove the stupid:
] >
] >	if (vp->v_usecount)
] >		panic("free vnode isn't");
]
] The system has worked for an hour now, and I wonder whether this thing
] is kosher and whether we have some new knowledge on this problem?
]
] Rashid

Technically, a work-around for this was integrated, but I think it
depended heavily on the VM operation ordering (which has since changed);
in other words, the work-around addressed the symptom, not the problem.
Like a hernia, shoving it back in will only make it pop out somewhere
else later.  This means it's possible that the fix is the same, even
though the problem appears to have moved.

This fix is a kludge; the real fix is to not disassociate the vnode from
the underlying FS at all: instead, allocate the vnode in the FS's in-core
inode structure.  For stacking layers with no in-core data besides the
vnode, this is easy.  For others it's hard, because without bringing back
the bmap call per reference, it's hard to get from a vnode/offset to a
device/offset in the page cache.  There are some code hints for doing
this in the Vahalia book.
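To make the embedding idea concrete, here is a minimal sketch, in
kernel-style C, of what allocating the vnode inside the FS's in-core
inode could look like.  The structure layout and the ITOV()/VTOI()
definitions below are assumptions for illustration only, not the
contents of any real inode.h:

	/*
	 * Sketch only: the vnode is embedded in the in-core inode rather
	 * than taken from the global vnode pool and pointed at via v_data.
	 * Field and macro definitions here are hypothetical.
	 */
	struct inode {
		struct vnode	i_vnode;	/* vnode lives inside the inode */
		ino_t		i_number;	/* on-disk inode number */
		/* ... remaining FS-specific in-core fields ... */
	};

	/* the vnode must stay the first member for the cast to be valid */
	#define	ITOV(ip)	(&(ip)->i_vnode)
	#define	VTOI(vp)	((struct inode *)(vp))

The point of the sketch is that the vnode's identity is never
disassociated from the FS's in-core data, which is exactly the
separation objected to above.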
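For reference, here is a hedged sketch of how the quoted one-line change
would sit in the getnewvnode() freelist check in vfs_subr.c.  The
surrounding code is paraphrased from memory of the 4.4BSD-style
allocator and is not a drop-in diff:

	/*
	 * Paraphrased getnewvnode() logic with the quoted change applied:
	 * treat a wrapped free queue the same as an empty one and allocate
	 * a fresh vnode instead of panicking.  Not a verbatim copy.
	 */
	if (numvnodes < desiredvnodes ||
	    (vp = vnode_free_list.tqh_first) == NULL ||	/* list empty */
	    vp->v_usecount) {				/* queue wrapped */
		vp = (struct vnode *)malloc((u_long)sizeof *vp,
		    M_VNODE, M_WAITOK);
		bzero((char *)vp, sizeof *vp);
		numvnodes++;
	} else {
		/* safe to recycle the head of the free list */
		TAILQ_REMOVE(&vnode_free_list, vp, v_freelist);
		if (vp->v_type != VBAD)
			vgone(vp);	/* shed the old identity */
		/*
		 * ... and the now-redundant sanity check is gone:
		 *	if (vp->v_usecount)
		 *		panic("free vnode isn't");
		 */
	}

Either way, the caller never gets handed a vnode whose v_usecount is
nonzero, which is all the removed panic() was trying to assert.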
Some of Jeffrey Hsu's recent Lite2 integration work might fix this, but
there is (last I heard before I went out of town on business) a bug in
the Lite2 code that is not going to be easy to trace.

Try the kludge; if you can go through the code and understand why it's
necessary (but a bad solution to the problem), you can probably think
about dealing with the real issue, and maybe come up with your own patch.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.