Date:      Fri, 28 Oct 2011 14:14:28 +1100
From:      Jan Mikkelsen <janm@transactionware.com>
To:        Vincent Hoffman <vince@unsane.co.uk>
Cc:        FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>, Jeremy Chadwick <freebsd@jdc.parodius.com>
Subject:   Re: mfi timeouts
Message-ID:  <992755CA-6479-4B9A-A3D5-DD5C1871089A@transactionware.com>
In-Reply-To: <4EA9EBB5.2090004@unsane.co.uk>
References:  <4EA9E0C3.5080306@unsane.co.uk> <20111027230452.GA22060@icarus.home.lan> <4EA9EBB5.2090004@unsane.co.uk>

Hi,

There is a patch linked to from this PR, which seems very similar:

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/140416

http://lists.freebsd.org/pipermail/freebsd-scsi/2011-March/004839.html

The behaviour you describe, where running mfiutil clears the stall, is also
consistent with it.

I'm about to deploy mfi controllers in a similar configuration, so I'd be
very curious to know whether the patch fixes the problem for you.
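If it helps when comparing behaviour before and after the patch, here is a minimal sketch in Python that summarises the timeout messages. The script and its names (TIMEOUT_RE, longest_timeouts) are my own, not part of FreeBSD or mfiutil, and it assumes the kernel messages look exactly like the ones quoted below. For each command address it reports the longest "TIMEOUT AFTER" value seen, which is a lower bound on how long that command was outstanding.

```python
import re
import sys

# Matches messages such as
#   mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
# anywhere in a line, so a leading syslog timestamp and hostname are ignored.
# Assumption: the message text is exactly as quoted in this thread.
TIMEOUT_RE = re.compile(
    r"(?P<dev>mfi\d+): COMMAND (?P<addr>0x[0-9a-fA-F]+) "
    r"TIMEOUT AFTER (?P<secs>\d+) SECONDS"
)


def longest_timeouts(lines):
    """Map (device, command address) -> longest 'TIMEOUT AFTER' value seen.

    Keys keep the order in which each command first appeared.
    """
    worst = {}
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue  # not an mfi timeout message
        key = (match.group("dev"), match.group("addr"))
        secs = int(match.group("secs"))
        worst[key] = max(worst.get(key, 0), secs)
    return worst


def main(paths):
    # Summarise each log file named on the command line.
    for path in paths:
        with open(path) as log:
            for (dev, addr), secs in longest_timeouts(log).items():
                print(f"{path}: {dev} command {addr} "
                      f"outstanding for at least {secs} s")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as, for example, python3 mfi_timeouts.py /var/log/messages (the file name is arbitrary). The driver may reuse a command slot, so two separate stalls on the same address would be merged into one entry; that is a deliberate simplification.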

Regards,

Jan Mikkelsen


On 28/10/2011, at 10:39 AM, Vincent Hoffman wrote:

> On 28/10/2011 00:04, Jeremy Chadwick wrote:
>> On Thu, Oct 27, 2011 at 11:52:51PM +0100, Vincent Hoffman wrote:
>>> I've recently installed a new NAS at work which uses a rebranded LSI
>>> MegaRAID SAS controller:
>>> [root@banshee ~]# mfiutil show adapter
>>> mfi0 Adapter:
>>>    Product Name: Supermicro SMC2108
>>>   Serial Number:
>>>        Firmware: 12.12.0-0047
>>>     RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
>>>  Battery Backup: present
>>>           NVRAM: 32K
>>>  Onboard Memory: 512M
>>>  Minimum Stripe: 8k
>>>  Maximum Stripe: 1M
>>>
>>> I'm running 8-STABLE as of 2011-10-23 (for ZFS v28, as it's got 26 3TB drives).
>>>
>>> I'm seeing a lot of messages like
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 90 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 120 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 150 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 180 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 210 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 240 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 271 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 301 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 331 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 361 SECONDS
>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 391 SECONDS
>>> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 55 SECONDS
>>> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 85 SECONDS
>>>
>>> At the same time I'm seeing I/O stall on the array connected to the mfi
>>> adapter. This can continue for 20 minutes or so before resuming, seemingly
>>> at random (although there's a little more on this later on).
>>>
>>> From pciconf -lv:
>>> mfi0@pci0:5:0:0:        class=0x010400 card=0x070015d9 chip=0x00791000 rev=0x04 hdr=0x00
>>>    vendor     = 'LSI Logic (Was: Symbios Logic, NCR)'
>>>    class      = mass storage
>>>    subclass   = RAID
>>>
>>> From dmesg:
>>> mfi0: <LSI MegaSAS Gen2> port 0xe000-0xe0ff mem 0xfbd9c000-0xfbd9ffff,0xfbdc0000-0xfbdfffff irq 32 at device 0.0 on pci5
>>> mfi0: Megaraid SAS driver Ver 3.00
>>> mfi0: 12330 (372962922s/0x0020/info) - Shutdown command received from host
>>> mfi0: 12331 (boot + 4s/0x0020/info) - Firmware initialization started (PCI ID 0079/1000/0700/15d9)
>>> mfi0: 12332 (boot + 4s/0x0020/info) - Firmware version 2.120.53-1235
>>> mfi0: 12333 (boot + 7s/0x0008/info) - Battery Present
>>> mfi0: 12334 (boot + 7s/0x0020/info) - Package version 12.12.0-0047
>>> mfi0: 12335 (boot + 7s/0x0020/info) - Board Revision
>>>
>>> I found this thread with a bit of googling, but it doesn't end too well:
>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html
>>> Was this ever taken further?
>>>
>>> One thing I have noticed is that the stall (and the timeout messages)
>>> seems to go away if I query the card using mfiutil. I currently have a
>>> cron job doing this every 2 minutes to see whether this is a coincidence.
>>>
>>>
>>> Any suggestions are welcome, and I'm happy to provide more info if I can,
>>> but I don't have a duplicate machine to do much debugging on. I'm happy to
>>> try patches though.
>>>
>>> Is this worth filing a PR?
>> Can you please provide uname -a output?  The version of FreeBSD you're
>> using matters greatly here.
>>
> Sure
> FreeBSD banshee.foobar.net 8.2-STABLE FreeBSD 8.2-STABLE #2: Wed Oct 26 16:14:09 BST 2011     toor@banshee.foobar.net:/usr/obj/usr/src/sys/BANSHEE  amd64
> [root@banshee /usr/src]# svn info
> Path: .
> Working Copy Root Path: /usr/src
> URL: http://svn.freebsd.org/base/stable/8
> Repository Root: http://svn.freebsd.org/base
> Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
> Revision: 226708
> Node Kind: directory
> Schedule: normal
> Last Changed Author: brueffer
> Last Changed Rev: 226671
> Last Changed Date: 2011-10-23 19:37:57 +0100 (Sun, 23 Oct 2011)
>
>
> It's looking like the mfiutil query stopping the stall is not a coincidence:
> the last two stalls lasted less than the 2-minute interval I set the cron
> job to run at, much shorter than previously.
> The cron job is simply:
>   /usr/sbin/mfiutil show volumes | grep -v OPTIMAL
> so I at least get an email if the volume breaks ;)
> Oct 28 00:01:06 banshee mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER 59 SECONDS
> Oct 28 00:01:36 banshee mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER 89 SECONDS
> Oct 28 00:13:09 banshee mfi0: COMMAND 0xffffff8000b205c8 TIMEOUT AFTER 50 SECONDS
> Oct 28 00:13:39 banshee mfi0: COMMAND 0xffffff8000b205c8 TIMEOUT AFTER 80 SECONDS
>
> I'm guessing the query must kick something on the card.
>
> Vince
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"



