Date: Thu, 4 Jan 2001 02:39:20 +0200 (IST)
From: Roman Shterenzon <roman@xpert.com>
To: Greg Lehey <grog@lemis.com>
Cc: Daniel Lang <dl@leo.org>, Andy Newman <andy@silverbrook.com.au>, <freebsd-stable@freebsd.org>
Subject: Re: kern/21148: multiple crashes while using vinum
Message-ID: <Pine.LNX.4.30.0101040234380.21369-100000@jamus.xpert.com>
In-Reply-To: <20010104105428.D4336@wantadilla.lemis.com>
On Thu, 4 Jan 2001, Greg Lehey wrote:

...snip...

> > The reason is, that _some code_ writes into unallocated memory, in
> > my case overwriting a data-structure of an ata-request with a few
> > zero bytes, causing the panic. The stack trace allows me to trace
> > the problem back to this point, but not further. I later experienced
> > a similar problem on a scsi-only system.
>
> Yes, this looks very much like the other issues. But you must
> understand that there's nothing I can do without further information.
>
> > The reason, why I filed this pr under 'vinum' is, that it only
> > occurred on boxes using vinum, and perfectly reproducible via simple
> > operations like a 'find /vinum/file/system -print' on a larger and
> > moderately filled vinum-filesystem. Perfectly reproducible means:
> > each night, periodic daily caused the panic (traceable to the find
> > call in /etc/security, finding files with setuid bits).
> >
> > As far as I know, the only way to trace this writing into
> > unallocated/otherallocated memory resp. buffer overrun
> > would be to set a watchpoint to the overwritten data-structure
> > within the kernel-debugger.
>
> The trouble with that is that this only happens when the system is
> very active, and there are thousands of potential buffer headers which
> could be trashed. I do have a trace facility within Vinum, but even
> with that it's difficult to figure out what's going on.

I don't agree about "very active". My system in question was calm during
the test; I just ran 'find /raid -print' and it crashed.

> > My stack-traces showed that this memory region stays the same on the
> > same machine with the same kernel (although I can't tell how
> > reliable this is).
>
> If you mean that the same part of the buffer header gets smashed every
> time, yes, this is reliably reproducible (well, in other words, when
> it happens (at random), it happens in the same place every time). It

That is correct. Both Daniel and I had the crash occurring at exactly
the same place.

> may mean that Vinum is doing it, but as far as I can tell it's always
> 6 words being zeroed out, and I don't do that anywhere in Vinum. The
> other possibility, which I consider most likely, is that the data
> structures accidentally get freed and used by some other driver (or,
> possibly, that some other driver freed them first and then continued
> using them). This would explain the observed correlation with the fxp
> driver.

Do you think that the latter is more probable? I had an fxp card there.

> > b) I still believe, that there is a problem somewhere in the
> > vinum code (probably within raid5 routines, since a mirror
> > setup worked fine).
>
> Correct. I have no doubt about it. But some bugs are difficult to
> find, and I need help.

Hmm... isn't that part of the code in question shared by both raid1 and
raid5?

--Roman Shterenzon, UNIX System Administrator and Consultant
[ Xpert UNIX Systems Ltd., Herzlia, Israel. Tel: +972-9-9522361 ]

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message