From owner-freebsd-scsi@FreeBSD.ORG Fri Jul 5 08:30:01 2013
Date: Fri, 5 Jul 2013 08:30:01 GMT
Message-Id: <201307050830.r658U1X4013653@freefall.freebsd.org>
To: freebsd-scsi@FreeBSD.org
From: Markus Gebert
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)

The following reply was made to PR kern/179932; it has been noted by GNATS.

From: Markus Gebert
To: bug-followup@FreeBSD.org, Philipp Mächler, "sean_bruno@yahoo.com"
Cc:
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Date: Fri, 5 Jul 2013 10:19:58 +0200

Hey Sean

I'm glad to hear you're getting the same controller as ours to test. In the meantime, the ciss changes backported from head seem to help a lot on the G8 blades with the p220 controllers, so it's quite likely that the G8 problem is already fixed in head. Of course, we can't be sure yet, but it might still be better to focus on the G7 with the p410 and storage blade, where the issue has occurred even with ciss from head. So it's good you're getting a p410.

We discussed your test scenario. ZFS is known to go nuts and generate a lot of IO once a zpool gets quite full, so is your goal just to maximise IO to reproduce the problem more reliably? Or is there a specific reason why you want us to fill a zpool?

Our problem is that half of the G7 blades are in production, so filling the zpool is not an option there. The other half is where the first half replicates all its data to, so they're kind of a hot standby and we're more flexible about running tests there, but we still have to keep the replication running, which makes filling the pool impossible as well.

The day before yesterday we installed the patched kernel, with ciss from head and CISS_DEBUG defined, on all these standby systems. We run zpool scrubs non-stop on all of them to generate IO, and as they are replication targets, they also receive some amount of write IO. That way, we hope to get a system to stall more often, so we can make quicker progress debugging the G7 problem. If you think more write IO would help, we can look into using iozone, but as stated before, we won't be able to do things like filling the zpool.

Also, once a G7 blade stalls, is there any information apart from alltrace and the DDB ciss debug print that you want us to pull out of the system?
When reading through the ciss driver source, I noticed that the DDB print may only output information about the first controller. Since the storage blade contains a second p410, do you think it would be worth altering the debug function to print information about every ciss controller in the system? I've put a rough sketch of what I have in mind below.

Markus
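Here's a rough, untested sketch of the kind of change I mean. It would live in sys/dev/ciss/ciss.c next to the existing DDB command, so struct ciss_softc and the driver's per-adapter print routine ciss_print_adapter() are in scope. It assumes the per-unit softcs are reachable through the driver's devclass (standard newbus behaviour), and the command name ciss_prt_all is just made up:

    #include <sys/param.h>
    #include <sys/bus.h>
    #include <ddb/ddb.h>

    /*
     * Print adapter information for every attached ciss(4) instance,
     * instead of only the single cached softc the current command uses.
     */
    DB_COMMAND(ciss_prt_all, db_ciss_prt_all)
    {
            devclass_t dc;
            struct ciss_softc *sc;
            int unit, maxunit;

            dc = devclass_find("ciss");
            if (dc == NULL) {
                    db_printf("no ciss devclass\n");
                    return;
            }
            maxunit = devclass_get_maxunit(dc);
            for (unit = 0; unit < maxunit; unit++) {
                    sc = devclass_get_softc(dc, unit);
                    if (sc == NULL)
                            continue;
                    db_printf("ciss%d:\n", unit);
                    ciss_print_adapter(sc);
            }
    }

From DDB this would then be run as "ciss_prt_all" and should cover the second p410 in the storage blade as well.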