Date: Thu, 4 Jan 2001 10:54:28 +1030
From: Greg Lehey
To: Daniel Lang
Cc: Andy Newman, Roman Shterenzon, freebsd-gnats-submit@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: kern/21148: multiple crashes while using vinum
Message-ID: <20010104105428.D4336@wantadilla.lemis.com>
References: <200101012239.f01MdiH40906@freefall.freebsd.org> <20010103145232.B10169@atrbg11.informatik.tu-muenchen.de>
In-Reply-To: <20010103145232.B10169@atrbg11.informatik.tu-muenchen.de>; from dl@leo.org on Wed, Jan 03, 2001 at 02:52:35PM +0000
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF

On Wednesday, 3 January 2001 at 14:52:35 +0000, Daniel Lang wrote:
> Dear Greg, Andy, Roman,
>
> grog@FreeBSD.org wrote on Mon, Jan 01, 2001 at 11:41:19PM +0000:
>> Synopsis: multiple crashes while using vinum
> [..]
>> State-Changed-Why:
>> No feedback from submitter.
>>
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=21148
>
> Well, I've sent you stack-traces, with (and alas as well without)
> debugging symbols, I am perfectly aware of your instruction page
> about debugging vinum, and not an ignorant moron, who complains
> without reading.
> Unfortunately you don't seem to trust me
> or other people in this matter.

As my closing message says, the reason I closed the PR was:

>> No feedback from submitter.

I sent you a message on 10 September 2000 asking for additional information. I received none. There's no reason to get all upset now, or to make claims about my intentions. This was just a dead PR, and you've made it clear, both before and now, that you have no intention of following up on it. This is not a question of "ignorant morons" or "trust".

> The reason is, that _some code_ writes into unallocated memory, in
> my case overwriting a data-structure of an ata-request with a few
> zero bytes, causing the panic. The stack trace allows me to trace
> the problem back to this point, but not further. I later experienced
> a similar problem on a scsi-only system.

Yes, this looks very much like the other issues. But you must understand that there's nothing I can do without further information.

> The reason, why I filed this pr under 'vinum' is, that it only
> occurred on boxes using vinum, and perfectly reproducible via simple
> operations like a 'find /vinum/file/system -print' on a larger and
> moderately filled vinum-filesystem. Perfectly reproducible means:
> each night, periodic daily caused the panic (traceable to the find
> call in /etc/security, finding files with setuid bits).
>
> As far as I know, the only way to trace this writing into
> unallocated/otherallocated memory resp. buffer overrun
> would be to set a watchpoint to the overwritten data-structure
> within the kernel-debugger.

The trouble with that is that this only happens when the system is very active, and there are thousands of potential buffer headers which could be trashed. I do have a trace facility within Vinum, but even with that it's difficult to figure out what's going on.

> My stack-traces showed that this memory region stays the same on the
> same machine with the same kernel (although I can't tell how
> reliable this is).
If you mean that the same part of the buffer header gets smashed every time, yes, this is reliably reproducible; in other words, when it happens (at random), it happens in the same place every time. It may mean that Vinum is doing it, but as far as I can tell it's always 6 words being zeroed out, and I don't do that anywhere in Vinum. The other possibility, which I consider most likely, is that the data structures accidentally get freed and used by some other driver (or, possibly, that some other driver freed them first and then continued using them). This would explain the observed correlation with the fxp driver.

> My experiences with kernel code and kernel-debugging with
> ddb are very limited. So is my time (I know this applies
> to anyone). Therefore I ceased spending time to set up
> remote-gdb sessions and sending you stack traces trying to be
> helpful, since you obviously didn't seem to be interested.
>
> I further decided not to use vinum any more. We spent some
> cash on a few hardware RAIDs, and the boxes run smooth now,
> since.
>
> I am just writing this to state:
> a) I did respond to your requests, trying to be as helpful as
> I could.

Well, I sent you a message on 10 September 2000, asking for additional information. You didn't send it to me.

> You could blame me for not knowing or willing to learn how to
> set up a ddb/gdb session using watchpoints and waiting for the
> next crash in an environment that should be productive (and now
> is).

No, I wouldn't do that.

> b) I still believe, that there is a problem somewhere in the
> vinum code (probably within raid5 routines, since a mirror
> setup worked fine).

Correct. I have no doubt about it. But some bugs are difficult to find, and I need help.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers