From: Peter Maloney
Date: Wed, 02 Nov 2011 09:43:51 +0100
To: Jason Wolfe
Cc: freebsd-scsi@freebsd.org
Subject: Re: mps/LSI SAS2008 controller crashes when smartctl is run with upped disk tags
Message-ID: <4EB102C7.8080401@brockmann-consult.de>

On 11/01/2011 09:32 PM, Jason Wolfe wrote:
> On Mon, Oct 31, 2011 at 12:17 PM, Peter Maloney
> wrote:
>
> > Dear Jason,
> >
> > I get a similar problem on a system with an LSI 9211-8i with 20 SATA
> > disks attached (2 SSDs and 18 spinning disks). My system doesn't
> > hang, panic, or reset though. I just lose access to one disk, which
> > is then considered FAULTED in my zpool status (with the ZFS file
> > system). If I physically remove the FAULTED disk and run
> > "gpart recover da0", I get a panic. Otherwise, the system keeps
> > running in a degraded state. When I reboot and resilver, some data
> > is found damaged and repaired, not just refreshed with the latest
> > state. The server has 1 HBA and 2 backplanes, and I have the 2
> > mirrored root disks on different backplanes. Maybe that is why mine
> > runs degraded and yours hangs.
> >
> > This has happened twice so far (in around a month or two), and both
> > times it was one of the mirrored root disks (SSDs) that faulted.
> >
> > My tags are set to 255. I will try reproducing it as you said, and
> > then, if it fails, rebooting and trying again with tags set to 2 as
> > you suggested.
> >
> > And *thank you very much for this information*. This is the last
> > outstanding issue with this server. I hope this workaround helps.
> >
> > # camcontrol tags /dev/da0
> > (pass0:mps0:0:7:0): device openings: 255
>
> Peter,
>
> This happens 'randomly' for you, or do you have some automated process
> running smartctl that trips the drives up occasionally?

It appears to be completely random, but it could be something specific
going on that I just didn't think of. I don't know how to trigger it.

I once wrote a script that looped over the disks a single time with
smartctl (which I installed from ports) and recorded the device ID, size
of the disks, etc. But it didn't cause a crash, and I didn't try looping
it constantly to crash it.

The system uses "zfs send" to send the whole pool to another machine. It
uses rsync to back up some servers onto it. It serves a bunch of data
over NFS and also has Samba running, though that is not in use.
The primary user of the NFS shares is VMware ESXi, which has a terrible
problem with synchronous writes; that might put a heavier load on the
system.

> The way I'm getting around it currently is to just move
> /usr/local/sbin/smartctl elsewhere, and replace it with a wrapper
> that simply drops the tags to 1, executes the smartctl at its new
> location with the options passed, then moves the tags back to whatever
> you prefer. There will obviously be a small detriment here, but it
> should be fairly quick and hopefully not even noticeable in your case.

In my reading, I found that people think that reducing the I/O queues
(via kernel parameters) for ZFS actually improves performance (moving
the queue to the OS, I guess), so if the tags setting is similar, I
would not expect too much of a drop. Luckily, this system of mine is
not a performance machine, just a huge file server. So if it is slower
but more stable that way, I will leave tags set to 2 forever.

> If smartctl is not triggering these events for you, any idea what is?

I have no real clue, but my guess is that some NFS shares are using the
ZIL (ZFS log device) a lot. Since that device is horribly inefficient
(scoring around 1500 IOPS during ZIL use, on a disk that scores 50-140k
in other tests), it overloads the I/O system and triggers the failure,
purely from load rather than from something particular like smartctl.

So for now, I have disabled my ZIL to see if it still crashes.

Also on my list of things to try:
- change to the IT firmware instead of IR, since ZFS prefers to have no
  RAID in there at all
- change the tags to 2
- try the LSI driver for the 9210-8i
  http://www.lsi.com/products/storagec...AS9210-8i.aspx

Here is my forum thread about it:
http://forums.freebsd.org/showthread.php?t=26656

Are you using ZFS? Is your root volume in hardware RAID or software
RAID? I am curious because you say your systems hang, and mine just runs
degraded.
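
For reference, here is a minimal sketch of the kind of wrapper Jason
describes above. It is not his actual script. The location of the
relocated binary (REAL_SMARTCTL), the variable and function names, and
the rule of treating the last /dev/... argument as the target disk are
all assumptions made for illustration; the restore value of 255 is
simply what my disks currently report.

```shell
#!/bin/sh
# Hypothetical smartctl wrapper: lower the tag count, run the real
# smartctl, then restore the tag count. All names and paths below are
# assumptions for illustration, not taken from an existing script.

REAL_SMARTCTL="${REAL_SMARTCTL:-/usr/local/libexec/smartctl.real}"  # assumed location
CAMCONTROL="${CAMCONTROL:-camcontrol}"
LOW_TAGS="${LOW_TAGS:-1}"      # tag count while smartctl runs
HIGH_TAGS="${HIGH_TAGS:-255}"  # tag count to restore afterwards

# Print the last argument that looks like a device node (/dev/...),
# or an empty line if there is none.
find_device() {
    dev=""
    for arg in "$@"; do
        case "$arg" in
            /dev/*) dev="$arg" ;;
        esac
    done
    printf '%s\n' "$dev"
}

# Run the real smartctl with reduced tags and return its exit status.
smartctl_wrapper() {
    dev=$(find_device "$@")
    if [ -z "$dev" ]; then
        # No device argument (e.g. --version): just pass through.
        "$REAL_SMARTCTL" "$@"
        return $?
    fi
    "$CAMCONTROL" tags "$dev" -N "$LOW_TAGS" >/dev/null
    rc=0
    "$REAL_SMARTCTL" "$@" || rc=$?
    # Always restore the tags, even if smartctl reported an error.
    "$CAMCONTROL" tags "$dev" -N "$HIGH_TAGS" >/dev/null
    return "$rc"
}

# Only act as the wrapper when installed under the name "smartctl".
if [ "${0##*/}" = "smartctl" ]; then
    smartctl_wrapper "$@"
fi
```

Installed as /usr/local/sbin/smartctl, a call such as
"smartctl -a /dev/da0" would then run with the tags lowered for the
duration of the query. The sketch does not restore the tags if the
wrapper itself is killed part-way through, so a trap would be worth
adding before relying on it. Setting LOW_TAGS=2 in the environment would
match the value I plan to test.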
>
> Jason

Peter

-- 
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------