Date:      Mon, 14 Feb 2011 09:27:11 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-drivers@freebsd.org
Subject:   Re: MFI Driver Behavior
Message-ID:  <201102140927.11654.jhb@freebsd.org>
In-Reply-To: <4D55BBDC.7000604@soe.ucsc.edu>
References:  <4D55BBDC.7000604@soe.ucsc.edu>

On Friday, February 11, 2011 5:44:44 pm Erich Weiler wrote:
> Hi All - I've posted on the forums but no one seems to have any ideas... 
>   I have an odd lock up issue with a Perc H800 controller under 
> 8.2-PRERELEASE.
> 
> We have a FreeBSD server running:
> 
> Code:
> 
> FreeBSD 8.2-PRERELEASE (GENERIC) #0: Thu Dec 16 14:59:46 PST 2010
> 
> It's a Dell R610. It has two MD1200 disk arrays on it, SAS chained 
> together. The controller that manages them is a Perc H800, with the 
> latest firmware available.
> 
> I have the disks exported JBOD from the controller. And, the disks are 
> roped into a ZFS filesystem, which is exported via NFS to the local net.
> 
> Everything works well most of the time, but every once in a while (like 
> once every few days), the filesystem completely hangs and we see these 
> errors on the console:
> 
> mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61793 SECONDS
> mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61823 SECONDS
> mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61853 SECONDS
> mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61923 SECONDS
> 
> (this is after the filesystem has been hung for a day)
> 
> etc... When I start poking around with mfiutil, it shows everything is 
> OK, the disks are all OK, the volumes are good, the event logs show no 
> errors. The "Patrol" feature is disabled. The battery is fine.
> 
> After running "mfiutil show volumes", the lockup magically frees itself. 
> But, I don't want them to happen in the first place, and I certainly 
> don't want to have to manually run a "mfiutil show volumes" or whatever 
> to unlock it every time. Has anyone seen this before?
> 
> I've actually tried another H800 controller we had on the shelf as well, 
> just to rule out a hardware problem with the first one, but we see the 
> same behavior on both controllers.
> 
> "zpool status" also shows the disks as all OK, and a "zpool scrub" turns 
> up no problems.
> 
> Any insight much appreciated!!  Since multiple controllers exhibit the 
> same behavior, I was thinking it's falling more into a driver issue at 
> this point.  I hope I'm right!  I emailed the author of the MFI driver 
> for FreeBSD, but have not heard anything back, so I was hoping someone 
> here would have an idea of where I could turn next.
> 
> If even there was a way I could determine what the "0xffffff80009b5870" 
> MFI command is, that would be a big help, so I would have a better idea 
> of where to continue my investigations.

That value is just a pointer to the driver's command structure for the command
that timed out, so it is probably not that useful on its own.  The best person
to ask about this is probably Scott Long (scottl@FreeBSD.org).  The fact that
'show volumes' unsticks the controller sounds quite odd.  Are you using MSI?
If so, have you tried disabling MSI?
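
For a test boot with MSI turned off, a minimal sketch follows.  It assumes the
system-wide PCI loader tunables hw.pci.enable_msi and hw.pci.enable_msix, which
affect every PCI device and not just mfi0, so treat it as a diagnostic step
rather than a permanent setting.  Whether the mfi(4) driver in your release
also has a driver-specific knob is something to check in its man page.

```
# /boot/loader.conf
# Assumption: these global tunables are used because no mfi-specific
# knob is confirmed here.  They disable MSI and MSI-X for all PCI
# devices, so mfi0 should fall back to a legacy (INTx) interrupt.
hw.pci.enable_msi="0"
hw.pci.enable_msix="0"
```

After rebooting, "vmstat -i" shows which interrupt mfi0 is attached to, which
should confirm whether the change took effect.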

-- 
John Baldwin
