Date:      Tue, 8 Nov 2011 14:50:34 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-stable@freebsd.org
Cc:        Jan Mikkelsen <janm@transactionware.com>, Jeremy Chadwick <freebsd@jdc.parodius.com>, Vincent Hoffman <vince@unsane.co.uk>
Subject:   Re: mfi timeouts
Message-ID:  <201111081450.34686.jhb@freebsd.org>
In-Reply-To: <4EB1BA7A.2000307@unsane.co.uk>
References:  <4EA9E0C3.5080306@unsane.co.uk> <992755CA-6479-4B9A-A3D5-DD5C1871089A@transactionware.com> <4EB1BA7A.2000307@unsane.co.uk>

On Wednesday, November 02, 2011 5:47:38 pm Vincent Hoffman wrote:
> On 28/10/2011 04:14, Jan Mikkelsen wrote:
> > Hi,
> >
> > There is a patch linked to from this PR, which seems very similar:
> >
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/140416
> >
> > http://lists.freebsd.org/pipermail/freebsd-scsi/2011-March/004839.html
> >
> > The problem is also consistent with running mfiutil clearing the problem.
> >
> > I'm about to deploy mfi controllers in a similar configuration, so I'd be
> > very curious about whether the patch fixes the problem for you.
> The patch you linked to seems to have removed the stalls, although I
> have only had it running for a day. I'll post if it stalls again though.
> 
> I did manage to scrounge the use of a Dell r410 with a
> LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)
> Badged as Dell PERC H700 Adapter
> 
> to test out the patch I originally found but had the same issue as this post
> 
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html
> 
> 
> I couldn't get the Dell to stall in the first place either, though, so it
> could be a specific firmware version that has the issue.
> 
> Anyway thanks for the pointers.

Hmm, did you try the patch I had posted from that earlier thread?  It had
two changes in it, one was similar to the patch in the PR, the second added
MSI-X support.  I've since tweaked it to make the MSI-X support off by
default but possible to enable via loader.conf.  Would you be willing to
try the updated patch at www.freebsd.org/~jhb/patches/mfi.patch?

> Vince
> 
> >
> > Regards,
> >
> > Jan Mikkelsen
> >
> >
> > On 28/10/2011, at 10:39 AM, Vincent Hoffman wrote:
> >
> >> On 28/10/2011 00:04, Jeremy Chadwick wrote:
> >>> On Thu, Oct 27, 2011 at 11:52:51PM +0100, Vincent Hoffman wrote:
> >>>>    I've recently installed a new NAS at work which uses a rebranded LSI
> >>>> megaraid sas
> >>>> [root@banshee ~]# mfiutil show adapter
> >>>> mfi0 Adapter:
> >>>>    Product Name: Supermicro SMC2108
> >>>>   Serial Number:
> >>>>        Firmware: 12.12.0-0047
> >>>>     RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
> >>>>  Battery Backup: present
> >>>>           NVRAM: 32K
> >>>>  Onboard Memory: 512M
> >>>>  Minimum Stripe: 8k
> >>>>  Maximum Stripe: 1M
> >>>>
> >>>> I'm running 8-STABLE as of 2011-10-23 (for ZFS v28, as it's got 26 3TB
> >>>> drives)
> >>>>
> >>>> I'm seeing a lot of messages like
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 90 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 120 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 150 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 180 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 210 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 240 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 271 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 301 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 331 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 361 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 391 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 55 SECONDS
> >>>> mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 85 SECONDS
> >>>>
> >>>> At which time I'm seeing IO stall on the array connected to the mfi
> >>>> adapter, this can continue for
> >>>> 20 minutes or so resuming randomly (or so it seems although a little
> >>>> more on this later on)
> >>>>
> >>>> From pciconf -lv
> >>>> mfi0@pci0:5:0:0:        class=0x010400 card=0x070015d9 chip=0x00791000
> >>>> rev=0x04 hdr=0x00
> >>>>    vendor     = 'LSI Logic (Was: Symbios Logic, NCR)'
> >>>>    class      = mass storage
> >>>>    subclass   = RAID
> >>>>
> >>>> From dmesg
> >>>> mfi0: <LSI MegaSAS Gen2> port 0xe000-0xe0ff mem
> >>>> 0xfbd9c000-0xfbd9ffff,0xfbdc0000-0xfbdfffff irq 32 at device 0.0 on pci5
> >>>> mfi0: Megaraid SAS driver Ver 3.00
> >>>> mfi0: 12330 (372962922s/0x0020/info) - Shutdown command received from host
> >>>> mfi0: 12331 (boot + 4s/0x0020/info) - Firmware initialization started
> >>>> (PCI ID 0079/1000/0700/15d9)
> >>>> mfi0: 12332 (boot + 4s/0x0020/info) - Firmware version 2.120.53-1235
> >>>> mfi0: 12333 (boot + 7s/0x0008/info) - Battery Present
> >>>> mfi0: 12334 (boot + 7s/0x0020/info) - Package version 12.12.0-0047
> >>>> mfi0: 12335 (boot + 7s/0x0020/info) - Board Revision
> >>>>
> >>>> I have found this thread from a bit of googling, but it doesn't end too
> >>>> well.
> >>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063821.html
> >>>> Was this ever taken further?
> >>>>
> >>>> One thing I have noticed is that the stall (and timeout messages) seem
> >>>> to go away if I query the card using mfiutil, I currently have a cron
> >>>> doing this every 2 minutes to see if this has been coincidence or not.
> >>>>
> >>>>
> >>>> Any suggestions welcome, and I'm happy to provide more info if I can, but
> >>>> I don't have a duplicate to do too much debugging on. I'm happy to try
> >>>> patches though.
> >>>>
> >>>> Is this worth filing a PR?
> >>> Can you please provide uname -a output?  The version of FreeBSD you're
> >>> using matters greatly here.
> >>>
> >> Sure
> >> FreeBSD banshee.foobar.net 8.2-STABLE FreeBSD 8.2-STABLE #2: Wed Oct 26
> >> 16:14:09 BST 2011    
> >> toor@banshee.foobar.net:/usr/obj/usr/src/sys/BANSHEE  amd64
> >> [root@banshee /usr/src]# svn info
> >> Path: .
> >> Working Copy Root Path: /usr/src
> >> URL: http://svn.freebsd.org/base/stable/8
> >> Repository Root: http://svn.freebsd.org/base
> >> Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
> >> Revision: 226708
> >> Node Kind: directory
> >> Schedule: normal
> >> Last Changed Author: brueffer
> >> Last Changed Rev: 226671
> >> Last Changed Date: 2011-10-23 19:37:57 +0100 (Sun, 23 Oct 2011)
> >>
> >>
> >> It's looking like the mfiutil query stopping the stall is not a coincidence:
> >> the last 2 have lasted less than the 2-minute interval that I set the cron
> >> job to run at, much less than previously.
> >> The cron is a simple /usr/sbin/mfiutil show volumes | grep -v OPTIMAL
> >> so I at least get an email if the volume breaks ;)
> >> Oct 28 00:01:06 banshee mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER
> >> 59 SECONDS
> >> Oct 28 00:01:36 banshee mfi0: COMMAND 0xffffff8000b22d18 TIMEOUT AFTER
> >> 89 SECONDS
> >> Oct 28 00:13:09 banshee mfi0: COMMAND 0xffffff8000b205c8 TIMEOUT AFTER
> >> 50 SECONDS
> >> Oct 28 00:13:39 banshee mfi0: COMMAND 0xffffff8000b205c8 TIMEOUT AFTER
> >> 80 SECONDS
> >>
> >> I'm guessing this must kick something on the card.
> >>
> >> Vince
> >>
> >> _______________________________________________
> >> freebsd-stable@freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> >> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

-- 
John Baldwin
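
For anyone comparing stall lengths before and after trying the patch, the
timeout messages quoted above can be summarised per command address. This is
only a rough sketch: the function name longest_timeouts and the sample text
are made up for illustration, and the regular expression assumes the message
format is exactly as it appears in the quoted logs.

```python
import re
from collections import OrderedDict

# Matches messages such as:
#   mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
TIMEOUT_RE = re.compile(
    r"(mfi\d+): COMMAND (0x[0-9a-f]+) TIMEOUT AFTER\s+(\d+) SECONDS"
)


def longest_timeouts(log_text):
    """Map (controller, command address) to the longest timeout reported."""
    # Collapse whitespace in case a message was wrapped across two lines.
    flat = " ".join(log_text.split())
    longest = OrderedDict()
    for dev, cmd, secs in TIMEOUT_RE.findall(flat):
        key = (dev, cmd)
        longest[key] = max(longest.get(key, 0), int(secs))
    return longest


# Sample lines copied from the report quoted above.
sample = """\
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 60 SECONDS
mfi0: COMMAND 0xffffff8000b216c8 TIMEOUT AFTER 391 SECONDS
mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 55 SECONDS
mfi0: COMMAND 0xffffff8000b21b08 TIMEOUT AFTER 85 SECONDS
"""

for (dev, cmd), secs in longest_timeouts(sample).items():
    print(f"{dev} {cmd}: stalled at least {secs}s")
```

Feeding it the contents of the system log instead of the sample would give
one line per stuck command, which makes it easier to see whether stalls get
shorter once the patch is applied.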


