Date: Fri, 5 Jul 2013 08:30:01 GMT
From: Markus Gebert <markus.gebert@hostpoint.ch>
To: freebsd-scsi@FreeBSD.org
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Message-ID: <201307050830.r658U1X4013653@freefall.freebsd.org>
The following reply was made to PR kern/179932; it has been noted by GNATS.

From: Markus Gebert <markus.gebert@hostpoint.ch>
To: bug-followup@FreeBSD.org, Philipp Mächler <philipp.maechler@hostpoint.ch>, "sean_bruno@yahoo.com" <sean_bruno@yahoo.com>
Cc:
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Date: Fri, 5 Jul 2013 10:19:58 +0200

Hey Sean

I'm glad to hear you're getting the same controller as ours to test. In the meantime, the backported ciss changes from head seem to help a lot on the G8 blades with the p220 controllers. It's quite likely that the G8 problem is already fixed in head. Of course, we can't be sure yet, but it might still be better to focus on the G7 with p410 and storage blade, where the issue has occurred even with ciss from head. So it's good you're getting a p410.

We discussed your test scenario. ZFS is known to go nuts and do a lot of IO once a zpool gets quite full, so is your goal just to maximise IO to reproduce the problem more reliably? Or is there a specific reason why you want us to fill a zpool?

Our problem is that half of the G7 blades are in production, so filling the zpool is not an option there. The second half is where the first half replicates all its data to, so they're kind of hot standby and we're more flexible doing tests there, but we still have to keep the replication running, which makes filling the pool impossible as well.

The day before yesterday we installed the patched kernel that has ciss from head and CISS_DEBUG defined on all these standby systems. We run zpool scrubs non-stop on all of them to generate IO, and as they are replication targets, they also receive some amount of write IO. That way, we hope to get a system to stall more often, so we can progress more quickly in debugging the G7 problem. If you think that more write IO would help, we can look into using iozone, but as stated before, we won't be able to do things like filling the zpool.

Also, once a G7 blade stalls, is there any information apart from alltrace and the DDB ciss debug print that you want us to pull out of the system?

When reading through the ciss driver source I noticed that the DDB print may only output information about the first controller. Since the storage blade contains a second p410, do you think it'd be worth altering the debug function to print out information about every ciss controller in the system?

Markus