From owner-freebsd-bugs Wed Dec 6 18:44:44 2000
From owner-freebsd-bugs@FreeBSD.ORG Wed Dec 6 18:44:41 2000
Return-Path:
Delivered-To: freebsd-bugs@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
	by hub.freebsd.org (Postfix) with ESMTP id 3630A37B400
	for ; Wed, 6 Dec 2000 18:44:16 -0800 (PST)
Received: (from grog@localhost)
	by wantadilla.lemis.com (8.11.0/8.9.3) id eB72cW729192;
	Thu, 7 Dec 2000 13:08:32 +1030 (CST)
	(envelope-from grog)
Date: Thu, 7 Dec 2000 13:08:32 +1030
From: Greg Lehey
To: Josef Karthauser
Cc: Carlos Amengual , Soren Schmidt , freebsd-bugs@FreeBSD.ORG
Subject: Re: kern/22086: DMA errors during intensive disk activity on vinum volume
Message-ID: <20001207130832.U27667@wantadilla.lemis.com>
References: <200011290755.IAA75175@freebsd.dk> <010d01c05a10$a6901190$0400000a@hin> <20001206123009.F79990@bsdi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.2.5i
In-Reply-To: <20001206123009.F79990@bsdi.com>; from joe@tao.org.uk on Wed, Dec 06, 2000 at 12:30:09PM +0000
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF
Sender: grog@wantadilla.lemis.com
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Wednesday,  6 December 2000 at 12:30:09 +0000, Josef Karthauser wrote:
> On Wed, Nov 29, 2000 at 03:28:26PM +0100, Carlos Amengual wrote:
>> On miércoles 29 de noviembre de 2000 8:55, "Soren Schmidt" wrote:
>>> It seems Greg Lehey wrote:
>>>>>
>>>>> This is an old problem with vinum, somehow vinum manage to trash
>>>>> the ad_request struct between where the ata drives makes it in
>>>>> ad_strategy and when it actually gets used later..
>>>>> I thought this was fixed at least in -current long ago, but
>>>>> apparently this was either not backported to -stable or the fix
>>>>> was not good enough.
>>>>
>>>> No, the problem went into hiding.  It was you who thought it had gone
>>>> away.
>>>
>>> Yes, at a time, but back then we found out that fxp just accelerated
>>> the problem, it did not create it.
>>>
>>>> 1.  This config does not include an fxp device, which you had in the
>>>>     past blamed for the problems:
>>>>
>>>>> Now, I'm not sure if its vinum or the fxp driver that has a
>>>>> problem, but since others have seen strange problems semilar to
>>>>> this on other type of systems, I'd say it points very much at
>>>>> fxp...  The latest fix you put into -current is needed though...
>>>
>>> See above, I provided you with dumps and what not, but it newer
>>> got past that. Since you removed your maintainer bit on vinum it
>>> all went into a halt. In the meantime I have decommisioned all
>>> the vinum boxes I had out there, fortunately all had either
>>> promise or highpoint controllers so they now run ATA RAID's
>>> happily without any problems, so vinum is no longer an issue for
>>> me at least...
>>>
>>>> 2.  Robin seems to be able to work through the source code, so this
>>>>     time we may be able to catch it.
>>>
>>> More power to you :)
>> No lack of consideration for Greg's work intended, but as RAID is essentially
>> meant to be used in critical-mission servers, people isn't likely to be
>> willing to play & debug something on which their businesses will depend,
>> especially when hardware controllers (which apparently work) are available.
>
> I sent Greg a large amount of debug that it took me a week collection
> with respect to this problem.
>
> Did you ever get anywhere with this Greg?  I never got a response from
> you!

Sorry, I've been swamped.  I've finally taken a look at it.

The first reason why I didn't reply earlier was that you attached the
info (most of which I didn't ask for and didn't need) as a .gz
attachment, so I couldn't just read it out of my mail reader.  Looking
through now, I find that you end up in remote serial debug in
unlockrange(), though it's not clear how you got there.  But it's
clear that this isn't the same problem: the buffer header you looked
at is perfectly OK.  It's also not the same problem that you described
in your mail message of 31 October, where bp->b_iodone had been zeroed
out, along (probably) with some adjoining fields.

IIRC you no longer have access to the machine in question, but if you
do, I should now be in a position to continue debugging.  Otherwise,
the good news is that I will be getting more equipment Real Soon Now,
and should be in a better position to reproduce this problem.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers