Date:      Thu, 7 Nov 1996 20:43:45 -0500 (EST)
From:      Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
To:        ponds!lambert.org!terry, ponds!uunet.uu.net!ponds!ponds!rivers
Cc:        ponds!uunet.uu.net!ponds!Artisoft.COM!ponds!freebsd.org!dyson, ponds!uunet.uu.net!ponds!Artisoft.COM!ponds!freefall.cdrom.com!freebsd-hackers, ponds!uunet.uu.net!ponds!Artisoft.COM!ponds!lakes.water.net!rivers, ponds!uunet.uu.net!ponds!Artisoft.COM!ponds!lambert.org!terry
Subject:   Re: More info on the daily panics...
Message-ID:  <199611080143.UAA03593@lakes.water.net>

> 
> >  Well - the jury is "in" - the patch didn't affect my problem.
> > 
> >  This morning, at 7:06am - I got a reboot:
> > 
> > 	panic: ffs_valloc: dup alloc
> > 
> >  I seem to recall you mentioning this was an added solution to
> > some previous changes... could those be required to make this
> > a better fix?  Recall, I'm using 2.1.5-STABLE as of Oct. 17th,
> > not 2.2.
> > 
> >  Any more avenues to explore?
> 
> The patch only prevents a condition from occurring instead of panic'ing
> when it does occur.  I.e., it leaves one less error condition that can't
> be handled needing to be checked, and then "not handled" (panic).

 Ok, then we can deduce that my panic was not from this condition, right?
(Since this condition can no longer occur, and I still get the panic...)

> 
> The previous changes didn't affect that specific area of the code, they
> were a usage workaround designed not to tickle the sensitive race.  They
> were a kludge fix, since the code should be robust in isolation.  The
> "doctor: don't do that" answer isn't applicable when the interfaces are
> (supposedly) treated as if they were black boxes.
> 
> 
> What FS's do you have mounted?  This is very important.  Not all FS's
> verify the generation count on the vnode like they should, and other
> nasty bits which may cause your problem as well.  Very likely, if David
> is correct about the proximal cause, it is an FS-specific problem.  That
> it errors out not in the call tree for the FS is a result of the panic
> check being in the wrong place (list reference instead of list insertion).

 Hmmm... it seems to error-out in the ffs code (ffs_valloc) - isn't that
specific to ffs? 

 Anyway, to answer your question; I have ufs and nfs file systems mounted,
my /etc/fstab:

   /dev/wd0s1b                     none            swap    sw 0 0
   /dev/wd0a                       /               ufs     rw 1 1
   /dev/wd0s1e                     /usr            ufs     rw 1 1
   proc                            /proc           procfs  rw 0 0
   lakes:/disk1            /disk1  nfs rw 0 0
   lakes:/disk1/usr        /disk1/usr      nfs rw 0 0
   lakes:/usr/X11R6        /usr/X11R6      nfs rw 0 0

nothing else is mounted...




> 
> 
> Please see my recent posting.  If David is right about your particular
> problem, then putting the check in the vrele() like I suggested there
> will cause a panic at the point of corruption rather than the point
> of use of corrupted data.

 Ok - that's done...

> 
> This would have the effect of isolating the exact line of code causing
> your problem, which in the multiple vrele() case is probably an error
> path that is not frequently used.
> 
> It would also make the panic condition repeat reliably, for what that's
> worth (probably a lot, at this point).

 Yes - it would be helpful - particularly since I have to wait 'n'
days for this to occur, with 'n' usually less than 3...  The problem
almost always occurs at 3:00 AM, but sometimes it occurs at 1:13pm...
[I couldn't relate any particular item started by cron(1) that caused
this...]
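
 To make the vrele() idea above concrete, here is a minimal user-space
sketch of a release routine that checks the reference count at the moment
of the release rather than at a later use.  Everything in it is an
assumption for illustration: `struct model_vnode`, `model_vref()` and
`model_vrele()` are hypothetical names, not the kernel's, and the sketch
returns -1 where a kernel check would presumably call panic().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; NOT the kernel's struct vnode. */
struct model_vnode {
    int usecount;               /* number of active references */
};

/* Take one reference on the model vnode. */
void model_vref(struct model_vnode *vp)
{
    assert(vp != NULL);
    vp->usecount++;
}

/*
 * Drop one reference.  Returns 0 on success, or -1 if the count was
 * already zero (or negative), i.e. this call is an extra release.
 * A kernel version of this check would panic here instead, so the
 * stack trace would point at the caller doing the extra release.
 */
int model_vrele(struct model_vnode *vp)
{
    assert(vp != NULL);
    if (vp->usecount <= 0)
        return -1;              /* over-release caught at the source */
    vp->usecount--;
    return 0;
}
```

 With one model_vref() followed by two model_vrele() calls, the first
release returns 0 and the second returns -1, which is the "multiple
vrele()" case being discussed.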

> 
> 
> The vnode->inode->vnode and the inode->vnode->inode integrity checks,
> like the "ffs_valloc: dup alloc" check, are post-event checks.  You
> will see the corrupt data before you attempt to use it, and panic
> earlier than you would have, but the proximal cause of the corruption
> would still not be identifiable from the stack trace.  8-(.

 Well - at least we would have incontrovertible evidence of the corruption...
If I don't trip over the vrele() check, and still get the panic;
we can look elsewhere...
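
 For reference, the vnode->inode->vnode and inode->vnode->inode checks
mentioned above amount to following a pointer out and back and confirming
you land where you started.  A rough user-space sketch follows, with
hypothetical structure and function names (`chk_vnode`, `chk_inode`,
`vnode_links_ok()`, `inode_links_ok()`); the real kernel structures and
field names differ.

```c
#include <stddef.h>

struct chk_inode;

/* Hypothetical model vnode holding a pointer to its inode. */
struct chk_vnode {
    struct chk_inode *ip;
};

/* Hypothetical model inode holding a back pointer to its vnode. */
struct chk_inode {
    struct chk_vnode *vp;
};

/* vnode -> inode -> vnode: returns 1 if the round trip ends at vp. */
int vnode_links_ok(const struct chk_vnode *vp)
{
    return vp != NULL && vp->ip != NULL && vp->ip->vp == vp;
}

/* inode -> vnode -> inode: returns 1 if the round trip ends at ip. */
int inode_links_ok(const struct chk_inode *ip)
{
    return ip != NULL && ip->vp != NULL && ip->vp->ip == ip;
}
```

 As noted in the quoted text, a check like this only shows that the links
are already inconsistent; it does not say which earlier code path broke
them.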

 
 Just let me add, up front, that I appreciate everyone's effort on this!

	- Dave R. -


