Date:      Tue, 24 Jun 1997 09:47:56 -0700 (PDT)
From:      Jaye Mathisen  <mrcpu@cdsnet.net>
To:        Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
Cc:        ponds!lambert.org!terry@cdsnet.net, ponds!sdf.com!tom@cdsnet.net, hackers@freebsd.org
Subject:   Re: panics/file system corruption - was Re: OpenBSD
Message-ID:  <Pine.NEB.3.95.970624094321.26020A-100000@mail.cdsnet.net>
In-Reply-To: <199706241205.IAA04547@lakes.water.net>

John Dyson mentioned in some private email to me that he found some more
vnode locking problems in the code, and was rewriting the sections in
question for a 3.0-SMP release.  I think it ended up being some conditions
that weren't being tested, or somesuch.  It is at least a bit of a relief
to me that, regardless of whether this fixes my specific problem, some
more bugs were fixed. :)

The code would need to be backported to 2.x.

So I haven't been able to test his fixes, although I can include the 3.0
section of code if somebody knowledgeable about the internals wants to
take a crack at it.

On Tue, 24 Jun 1997, Thomas David Rivers wrote:

> 
> I just tripped over this; thought I would try to take a stab
> at an answer....
> 
> > On Fri, 20 Jun 1997, Terry Lambert wrote:
> > 
> > > > > Anybody running FreeBSD given it a shot just to see?  I have been
> > > > > thinking about it to see if it fixes my UFS problems that are seemingly
> > > > > unrepairable.
> > > > 
> > > >   UFS problem?
> > > 
> > > He's talking about his "free xxx isn't" race condition errors.
> > 
> >   What exactly is it about this condition that makes it occur on some
> > machines?  I don't see it on a 16GB and an 8GB news spool here.  No
> > corruption problems either (although it was not clear to me whether the
> > corruption is just a result of the panic, or just another effect of this
> > problem).
> > 
> 
> 
>   I'm not sure what the problem is - but it seems to be timing related.
> 
>   I can readily reproduce the newfs-doesn't-write-zeros problem on two
>  different 386 machines, one with IDE, one with SCSI.  Jaye appears to
>  have the problem on his news machine.  I definitely have it on my
>  news machine.  
> 
>   I thought it might be related to the number of elements off of the
>  vnode free list - but when doing a newfs (during a clean install); that 
>  number seems to be fixed at around 1.  
> 
>   I also thought it may simply be a problem with writing blocks around
>  a multiple of the cluster size; but that doesn't seem to be the case,
>  as I have followed the write()s in newfs all the way to the SCSI driver.
> 
>   Here's what I currently believe - somewhere, a buffer is being lost.
>  The loss of the buffer is timing dependent, because judicious printf()s
>  in the kernel alter the timing (and the stack) and cause things to work 
>  correctly. [Of course, since the stack is being altered, it could also
>  be a stack corruption problem...]
> 
>   That is how, I believe, only an unlucky few have had the pleasure
>  of this problem...
> 
>   I have reproduced this on a dedicated machine now.  If you (or anyone
>  else) would like access to that machine to try and solve it - just
>  let me know!  If you let me know, we can set up a time where you can
>  get to it from the net...
> 
> 	- Dave Rivers -
> 



