Subject: Re: Cannot replace broken hard drive with LSI HBA
From: Graham Allan
To: Karli Sjöberg, "freebsd-fs@freebsd.org"
Date: Mon, 28 Sep 2015 10:06:54 -0500
Message-ID: <5609578E.1050606@physics.umn.edu>
In-Reply-To: <1443447383.5271.66.camel@data-b104.adm.slu.se>

I have seen this and keep experiencing it. I posted a question about it a while back, but I don't think there was much response.

https://lists.freebsd.org/pipermail/freebsd-fs/2014-July/019715.html

My original question was with 9.1, and at the time we discovered that if you ran the LSI utility "sas2ircu", for example simply "sas2ircu 0 DISPLAY", it would seem to hang for a while, then issue a bus reset, and the replaced drives would be detected. Now that I also see the same issue on 9.3, running sas2ircu in this situation usually seems to cause a panic, so it's not exactly progress.

https://lists.freebsd.org/pipermail/freebsd-scsi/2015-August/006794.html

I am using Dell servers, generally R710 and R720, with LSI 9207-8e controllers, Supermicro JBOD chassis, and mostly WD drives. I got the above problems using firmware 16 (probably) with both 9.1 and 9.3.

Regarding your experience with firmware 20, I believe it is "known bad", though some seem to disagree. Certainly, when building my recent-ish large 9.3 servers, I specifically tested it and got consistent data corruption. There is now a newer release of firmware 20, "20.00.04.00", which seems to be fixed - see this thread:

https://lists.freebsd.org/pipermail/freebsd-scsi/2015-August/006793.html

This is kind of painful, as the new firmware was posted by LSI with no comment or release notes, yet if you follow all the references there are hints that it was known internally to be problematic. It's bad that selecting HBA firmware for FreeBSD has degenerated into a "black art", but that seems to be where it is right now.

I don't know that there are any other viable choices for SAS HBAs besides LSI - I've never heard of any.

Your bugzilla link is interesting. We are also using WD drives and Supermicro enclosures, so there is a lot in common. I wonder if these changes are in 10.2-RELEASE?

Graham

On 9/28/2015 8:36 AM, Karli Sjöberg wrote:
> Hey all!
>
> I'm just giving a shout out here to see if anyone else has had similar
> experiences working with LSI/Avago HBAs in FreeBSD.
>
> For some time now, about a year or so, we've had several times where
> hard drives have dropped out: you pull the drive out, pop a new one
> back in, but it never shows up in the OS. When it is inserted, nothing
> prints in the logs, and physically, it just blinks for half a second,
> then nothing. The entire server then needs to be rebooted to get the
> drive back.
>
> As for the hardware, we have several SuperMicro servers, an HP, and an
> old SUN server that all have this problem. It's happened with both old
> and new drives from different manufacturers and of different sizes.
> The only thing in common has been the LSI/Avago HBA.
>
> The software is FreeBSD-10.1-STABLE as per this[*] bug, very close to
> 10.2-RELEASE, with mps driver version 20, and the firmware has been
> flashed to 19. We also tried firmware version 20, but ZFS went nuts,
> displaying checksum errors on just about every disk in the pool.
>
> It's gotten to the point where I'm fed up and have to ask if someone
> else could think of a fix, since neither software nor firmware upgrades
> seem to make a difference. Or could someone suggest another HBA instead?
>
> Thanks in advance!
>
> /K
>
> [*]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>