Date:      Tue, 24 Jun 1997 08:05:20 -0400 (EDT)
From:      Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
To:        ponds!lambert.org!terry, ponds!sdf.com!tom
Cc:        ponds!FreeBSD.ORG!hackers, ponds!cdsnet.net!mrcpu
Subject:   panics/file system corruption - was Re: OpenBSD
Message-ID:  <199706241205.IAA04547@lakes.water.net>

I just tripped over this; thought I would try to take a stab
at an answer....

> On Fri, 20 Jun 1997, Terry Lambert wrote:
> 
> > > > Anybody running FreeBSD given it a shot just to see?  I  have been
> > > > thinking about it to see if it fixes my UFS problems that are seemingly
> > > > unrepairable.
> > > 
> > >   UFS problem?
> > 
> > He's talking about his "free xxx isn't" race condition errors.
> 
>   What exactly is it about this condition that makes it occur on some
> machines?  I don't see it on a 16GB and a 8GB news spool here.  No
> corruption problems either (although it was not clear to me, whether the
> corruption is just a result of the panic, or just another effect of this
> problem).
> 
> > 					Terry Lambert
> > 					terry@lambert.org
> > ---
> > Any opinions in this posting are my own and not those of my present
> > or previous employers.
> > 
> > 
> 
> Tom
> 


  I'm not sure what the problem is - but it seems to be timing related.

  I can readily reproduce the newfs-doesn't-write-zeros problem on two
 different 386 machines, one with IDE, one with SCSI.  Jaye appears to
 have the problem on his news machine.  I definitely have it on my
 news machine.  
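
  For anyone who wants to check whether their machine shows the same
 symptom, here is a minimal sketch (not the test I actually ran) that
 scans a file or raw device and reports which blocks are not all zeros.
 The function name, the 8192-byte block size and the start/length
 arguments are assumptions for illustration; point it at whatever
 region you expect newfs to have zeroed.

```python
def find_nonzero_blocks(path, block_size=8192, start=0, length=None):
    """Return byte offsets of blocks in the scanned range that are not all zero.

    path       -- file or raw device to read (hypothetical; pick your own)
    block_size -- scan granularity in bytes (8192 is an assumed default)
    start      -- byte offset at which to begin scanning
    length     -- number of bytes to scan, or None to read to end of file
    """
    bad_offsets = []
    zero_block = bytes(block_size)          # reference block of zeros
    with open(path, "rb") as f:
        f.seek(start)
        offset = start
        remaining = length
        while remaining is None or remaining > 0:
            want = block_size if remaining is None else min(block_size, remaining)
            chunk = f.read(want)
            if not chunk:                   # end of file or device
                break
            # Compare against zeros of the same length (last chunk may be short).
            if chunk != zero_block[:len(chunk)]:
                bad_offsets.append(offset)
            offset += len(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return bad_offsets
```

  An empty list means every block in the range read back as zeros; any
 offsets returned are the blocks worth a closer look.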

  I thought it might be related to the number of elements on the
 vnode free list - but when doing a newfs (during a clean install), that
 number seems to be fixed at around 1.  

  I also thought it may simply be a problem with writing blocks around
 a multiple of the cluster size; but that doesn't seem to be the case,
 as I have followed the write()s in newfs all the way to the SCSI driver.

  Here's what I currently believe - somewhere, a buffer is being lost.
 The loss of the buffer is timing dependent, because judicious printf()s
 in the kernel alter the timing (and the stack) and cause things to work
 correctly. [Of course, since the stack is being altered, it could also
 be a stack corruption problem...]

  That, I believe, is why only an unlucky few have had the pleasure
 of this problem...

  I have reproduced this on a dedicated machine now.  If you (or anyone
 else) would like access to that machine to try and solve it - just
 let me know!  We can then set up a time when you can get to it from
 the net...

	- Dave Rivers -