Date:      Fri, 5 Jul 2013 08:30:01 GMT
From:      Markus Gebert <markus.gebert@hostpoint.ch>
To:        freebsd-scsi@FreeBSD.org
Subject:   Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Message-ID:  <201307050830.r658U1X4013653@freefall.freebsd.org>

The following reply was made to PR kern/179932; it has been noted by GNATS.

From: Markus Gebert <markus.gebert@hostpoint.ch>
To: bug-followup@FreeBSD.org,
 Philipp Mächler <philipp.maechler@hostpoint.ch>,
 "sean_bruno@yahoo.com" <sean_bruno@yahoo.com>
Cc:  
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Date: Fri, 5 Jul 2013 10:19:58 +0200

 Hey Sean
 
 I'm glad to hear you're getting the same controller as ours to test. In
 the meantime, the ciss changes backported from head seem to help a lot
 on the G8 blades with the p220 controllers, so it's quite likely that
 the G8 problem is already fixed in head. Of course, we can't be sure
 yet, but it might still be better to focus on the G7 with the p410 and
 storage blade, where the issue has occurred even with ciss from head.
 So it's good you're getting a p410.
 
 We discussed your test scenario. ZFS is known to go nuts and generate a
 lot of IO once a zpool gets quite full, so is your goal just to
 maximise IO to reproduce the problem more reliably? Or is there a
 specific reason why you want us to fill a zpool?
 
 Our problem is that half of the G7 blades are in production, so filling
 the zpool is not an option there. The first half replicates all of its
 data to the second half, so those machines are a kind of hot standby
 and we're more flexible doing tests there, but we still have to keep
 the replication running, which makes filling the pool impossible as
 well.
 
 The day before yesterday we installed the patched kernel, with ciss
 from head and CISS_DEBUG defined, on all of these standby systems.
 We're running zpool scrubs non-stop on all of them to generate IO, and
 as they are replication targets, they also receive some amount of write
 IO. That way, we hope to get a system to stall more often, so we can
 make faster progress debugging the G7 problem. If you think that more
 write IO would help, we can look into using iozone, but as stated
 before, we won't be able to do things like filling the zpool.
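 
 For reference, our "non-stop scrub" is essentially a small watchdog
 loop like the sketch below. This is only an illustration of the setup
 described above, not a script from our systems; the pool name "tank"
 and the 60-second poll interval are placeholders.
 
 #!/bin/sh
 # Sketch: keep a scrub running on a pool to generate continuous read IO.
 # POOL is a placeholder name; adjust the poll interval to taste.
 POOL=tank
 while :; do
     # zpool scrub returns immediately, so poll the status output and
     # kick off a new scrub whenever the previous one has finished
     if ! zpool status "$POOL" | grep -q "scrub in progress"; then
         zpool scrub "$POOL"
     fi
     sleep 60
 done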
 
 Also, once a G7 blade stalls, is there any information, apart from the
 alltrace and the DDB ciss debug print, that you want us to pull out of
 the system?
 
 When reading through the ciss driver source I noticed that the DDB
 print may only output information about the first controller. Since the
 storage blade contains a second p410, do you think it would be worth
 altering the debug function to print information about every ciss
 controller in the system?
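 
 Something along these lines is what I had in mind; a rough sketch only,
 based on my reading of the driver. The DB_COMMAND/devclass calls are
 the standard FreeBSD kernel KPI, but the exact ciss symbol names
 (ciss_devclass, ciss_print_adapter) are assumptions on my part:
 
 /*
  * Sketch: walk every attached ciss instance instead of only unit 0.
  */
 DB_COMMAND(ciss_prt, db_ciss_prt)
 {
 	struct ciss_softc *sc;
 	int i, maxunit;
 
 	/* highest unit number ever assigned in the ciss devclass */
 	maxunit = devclass_get_maxunit(ciss_devclass);
 	for (i = 0; i < maxunit; i++) {
 		/* NULL for units that are detached or were never attached */
 		sc = devclass_get_softc(ciss_devclass, i);
 		if (sc != NULL)
 			ciss_print_adapter(sc);
 	}
 }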
 
 
 Markus
 


