From: Erich Weiler
To: freebsd-drivers@freebsd.org
Date: Fri, 11 Feb 2011 14:44:44 -0800
Subject: MFI Driver Behavior

Hi All -

I've posted on the forums, but no one seems to have any ideas...

I have an odd lockup issue with a PERC H800 controller under 8.2-PRERELEASE. We have a FreeBSD server running:

    FreeBSD 8.2-PRERELEASE (GENERIC) #0: Thu Dec 16 14:59:46 PST 2010

It's a Dell R610 with two MD1200 disk arrays attached, SAS-chained together. The controller that manages them is a PERC H800 with the latest available firmware. The disks are exported from the controller as JBOD and roped into a ZFS filesystem, which is exported via NFS to the local network.
Everything works well most of the time, but every once in a while (about once every few days) the filesystem hangs completely and we see these errors on the console:

    mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61793 SECONDS
    mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61823 SECONDS
    mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61853 SECONDS
    mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61923 SECONDS
    etc...

(This is after the filesystem has been hung for a day.)

When I start poking around with mfiutil, everything shows as OK: the disks are all OK, the volumes are good, and the event logs show no errors. The "Patrol" feature is disabled, and the battery is fine. After I run "mfiutil show volumes", the lockup magically clears itself. But I don't want these hangs to happen in the first place, and I certainly don't want to have to run "mfiutil show volumes" (or anything else) by hand to unlock it every time.

Has anyone seen this before? I've also tried another H800 controller we had on the shelf, just to rule out a hardware problem with the first one, but we see the same behavior on both. "zpool status" shows the disks as all OK, and a "zpool scrub" turns up no problems.

Any insight much appreciated!! Since multiple controllers exhibit the same behavior, I'm thinking this is more likely a driver issue at this point. I hope I'm right! I emailed the author of the FreeBSD MFI driver but haven't heard back, so I was hoping someone here would have an idea of where I could turn next.

Even if there were just a way to determine what the "0xffffff80009b5870" MFI command is, that would be a big help, since it would give me a better idea of where to continue my investigation.

-erich
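The console messages quoted in the post all name the same command pointer with a steadily growing count, which suggests a single stuck command rather than many failing ones. As a minimal sketch (not from the original post, and assuming the count is how long that command has been outstanding, as the message wording suggests), a small Python helper can group such lines by device and command pointer and report how long each has been pending. The function names `summarize_timeouts` and `describe` are hypothetical; nothing here inspects the driver itself.

```python
import re
from collections import defaultdict

# Matches mfi console lines such as:
# "mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61793 SECONDS"
TIMEOUT_RE = re.compile(
    r"(?P<dev>mfi\d+): COMMAND (?P<cmd>0x[0-9a-fA-F]+) "
    r"TIMEOUT AFTER (?P<secs>\d+) SECONDS"
)


def summarize_timeouts(log_text):
    """Group timeout reports by (device, command pointer).

    Returns a dict mapping (dev, cmd) to the sorted list of reported
    counts in seconds. Assumption: each count is the age of that
    outstanding command, as the message text implies.
    """
    ages = defaultdict(list)
    for m in TIMEOUT_RE.finditer(log_text):
        ages[(m.group("dev"), m.group("cmd"))].append(int(m.group("secs")))
    return {key: sorted(vals) for key, vals in ages.items()}


def describe(summary):
    """One line per command: report count, largest age in hours, gaps between reports."""
    lines = []
    for (dev, cmd), secs in sorted(summary.items()):
        hours = secs[-1] / 3600
        gaps = [b - a for a, b in zip(secs, secs[1:])]
        lines.append(
            f"{dev} {cmd}: {len(secs)} reports, "
            f"outstanding ~{hours:.1f} h, gaps {gaps} s"
        )
    return lines


if __name__ == "__main__":
    sample = """\
mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61793 SECONDS
mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61823 SECONDS
mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61853 SECONDS
mfi0: COMMAND 0xffffff80009b5870 TIMEOUT AFTER 61923 SECONDS
"""
    for line in describe(summarize_timeouts(sample)):
        print(line)
```

On the four quoted lines this reports one command on mfi0, outstanding for roughly 17.2 hours (61923 / 3600), with gaps of 30, 30 and 70 seconds between reports. Identifying what that command actually is would still require looking inside the driver state, which this log-side sketch cannot do.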