From owner-freebsd-scsi@FreeBSD.ORG Mon Nov 4 11:06:56 2013
Date: Mon, 4 Nov 2013 11:06:55 GMT
Message-Id: <201311041106.rA4B6t2B048511@freefall.freebsd.org>
From: FreeBSD bugmaster
To: freebsd-scsi@FreeBSD.org
Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/179932  scsi       [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP
o kern/178795  scsi       [mps] MSI for mps driver doesn't work under vmware
o kern/165982  scsi       [mpt] mpt instability, drive resets, and losses on Fre
o kern/165740  scsi       [cam] SCSI code must drain callbacks before free
f kern/162256  scsi       [mpt] QUEUE FULL EVENT and 'mpt_cam_event: 0x0'
o docs/151336  scsi       Missing documentation of scsi_ and ata_ functions in c
o kern/148083  scsi       [aac] Strange device reporting
o kern/144648  scsi       [aac] Strange values of speed and bus width in dmesg
o kern/142351  scsi       [mpt] LSILogic driver performance problems
o kern/134488  scsi       [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/130621  scsi       [mpt] tranfer rate is inscrutable slow when use lsi213
f kern/129602  scsi       [ahd] ahd(4) gets confused and wedges SCSI bus
f kern/123674  scsi       [ahc] ahc driver dumping
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc

14 problems total.

From owner-freebsd-scsi@FreeBSD.ORG Mon Nov 4 17:51:24 2013
Date: Mon, 4 Nov 2013 09:51:15 -0800
Subject: Advice on supporting 9.x / 10.x CAM driver
From: Chuck Tuffli
To: freebsd-scsi

There was a small-ish change in CCB flags and buffer mapping between
9-stable and 10 that, at first glance, prevents a 9.x driver from
compiling on a 10.x system. All of which is fine, as this is a major
release.

What I'm curious about is whether others have come up with a strategy
to support their drivers on both 9.x and 10.x. If so, how are you
managing this? Different branches under a VCS? #ifdef macros? Some
sort of compatibility shim? TIA!

--chuck

From owner-freebsd-scsi@FreeBSD.ORG Wed Nov 6 16:12:32 2013
Subject: Re: Advice on supporting 9.x / 10.x CAM driver
From: Sean Bruno
Reply-To: sbruno@freebsd.org
To: Chuck Tuffli
Cc: freebsd-scsi
Date: Wed, 06 Nov 2013 08:12:20 -0800
Message-ID: <1383754340.52387.9.camel@localhost>

On Mon, 2013-11-04 at 09:51 -0800, Chuck Tuffli wrote:
>
> There was a small-ish change in CCB flags and buffer mapping between
> 9-stable and 10 that, at first glance, prevents a 9.x driver from
> compiling on a 10.x system. All of which is fine, as this is a major
> release.
>
> What I'm curious about is whether others have come up with a strategy
> to support their drivers on both 9.x and 10.x. If so, how are you
> managing this? Different branches under a VCS? #ifdef macros? Some
> sort of compatibility shim? TIA!

Mostly, what I've seen is checks for the FreeBSD version to determine
which code is executed.

Sean

From owner-freebsd-scsi@FreeBSD.ORG Wed Nov 6 17:01:58 2013
Message-ID: <527A7603.7090303@greatbaysoftware.com>
Date: Wed, 06 Nov
 2013 12:01:55 -0500
From: Charles Owens
Organization: Great Bay Software
To: Mark Johnston
Subject: Re: adding BBU relearn support to mfiutil
References: <20130304033836.GA33631@oddish> <1365196956.17311.13.camel@localhost> <20130406000809.GA96223@raichu>
In-Reply-To: <20130406000809.GA96223@raichu>
Cc: freebsd-scsi@freebsd.org, Steve McCoy

Hi, we've been playing with this patch in the context of 8.4-RELEASE-p4
(we extracted r250483 and r250497 from stable/8 and applied them to
releng/8.4). I'm seeing some results that make me question whether or
not caching is really working correctly after a BBU relearn operation
has completed -- or maybe whether or not the new BBU patch is talking to
the LSI controller properly.

Our test system had a BBU in the failed state (relearn needed). We used
the "start learn" command and it seemed to go well, but strangely, now
that the process seems to have completed, and several days later, the
status is still LEARN_CYCLE_REQUESTED (as seen with "mfiutil show
battery"). This may be entirely normal -- maybe it says that because the
autolearn feature is now enabled?

The output of the "cache" status command is also a bit strange. Here is
the raw output of these status commands:

# mfiutil cache mfid0
mfi0 volume mfid0 cache settings:
    I/O caching: disabled
    write caching: write-back
    write cache with bad BBU: disabled
    read ahead: adaptive
    drive write cache: enabled
Cache disabled due to dead battery or ongoing battery relearn

# ./mfiutil show battery
mfi0: Battery State:
    Manufacture Date: 3/18/2010
    Serial Number: 77
    Manufacturer: LS1111001A
    Model: 3598501
    Chemistry: LION
    Design Capacity: 1215 mAh
    Full Charge Capacity: 65262 mAh
    Current Capacity: 61543 mAh
    Charge Cycles: 120
    Current Charge: 94%
    Design Voltage: 3700 mV
    Current Voltage: 4081 mV
    Temperature: 23 C
    Autolearn period: 30 days
    Next learn time: Tue Nov 26 20:06:40 2013
    Learn delay interval: 0 hours
    Autolearn mode: enabled
    Status: LEARN_CYCLE_REQUESTED

/Why does cache status now say "Cache disabled due to dead battery or
ongoing battery relearn"/? Shouldn't this no longer be the case since
I've run the "learn" operation? Does this indicate that the I/O caching
is really disabled?

I'd appreciate any and all assistance. Here's a bit of other info that
might be of interest:

# mfiutil show adapter
mfi0 Adapter:
    Product Name: Integrated Intel(R) RAID Controller SROMBSASMP2
    Serial Number:
    Firmware: 11.0.1-0036
    RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
    Battery Backup: present
    NVRAM: 32K
    Onboard Memory: 512M
    Minimum Stripe: 8k
    Maximum Stripe: 1M

# mfiutil show drives
mfi0 Physical Drives:
    1 ( 136G) ONLINE SAS E1:S0
    2 ( 136G) ONLINE SAS E1:S1
    3 ( 136G) ONLINE SAS E1:S4
    4 ( 136G) ONLINE SAS E1:S2
    5 ( 136G) HOT SPARE SAS E1:S3

The storage volume is 4-drives, RAID10. System has 16GB RAM, dual Xeon
E5530 CPUs, on an Intel S5520UR motherboard.

Thanks!

Charles Owens
Great Bay Software

On Fri Apr 5 20:08:09 2013, Mark Johnston wrote:
>
> On Fri, Apr 05, 2013 at 02:22:36PM -0700, Sean Bruno wrote:
>>
>> On Sun, 2013-03-03 at 22:38 -0500, Mark Johnston wrote:
>>>
>>> Hi Everyone,
>>>
>>> I recently needed to add a couple of features to mfiutil related to BBU
>>> relearning. I've pasted a patch below which
>>>
>>> 1. adds extra fields to the output of "mfiutil show battery" showing BBU
>>> properties. This is essentially the output of
>>>
>>> # MegaCli -AdpBbuInfo -GetBbuProperties -aLL
>>>
>>> and consists of info about battery learning: the learn period, the
>>> time at which the controller will start the next relearn, and the BBU
>>> mode (which indicates whether the battery supports transparent
>>> relearning).
>>>
>>> 2. adds a couple of subcommands under "mfiutil bbu" which let users set
>>> the BBU properties which can be set by MegaCli.
>>>
>>> 3. adds a command "mfiutil start learn" which immediately kicks off a
>>> battery relearn.
>>>
>>> These changes grew out of concern about the fact that the controller
>>> write cache is set to write-through mode during a relearn period (which
>>> usually lasts for several hours). This ended up causing some mysterious
>>> and intermittent performance issues, so I needed a way of getting more
>>> info about what was going on (using MegaCli isn't really an option for
>>> several reasons). Some BBUs support transparent relearning, which
>>> basically means that the controller write cache doesn't get turned off
>>> during a relearn. However, LSI's default config doesn't enable it, and
>>> now mfiutil can be used to do that (through "mfiutil bbu bbu-mode").
>>>
>>> I was hoping someone would be able to review the patch. If anyone's able
>>> and willing to test it, I'd very much appreciate feedback from that.
>>>
>>> Thanks!
>>> -Mark
>>
>>
>> Just to document for the record. Finally got around to testing this
>> today with Mark providing updates. Looks good overall with a couple of
>> nits that he is handling at the moment (man page and variable name
>> collision).
>
>
> The updated patch is here:
> http://people.freebsd.org/~markj/patches/20130405-mfi-bbu.diff
>
> I'll commit it in a few days if there aren't any problems.
>
> Thanks,
> -Mark

From owner-freebsd-scsi@FreeBSD.ORG Wed Nov 6 22:04:14 2013
X-Received: by 10.50.29.4 with SMTP id f4mr22183266igh.11.1383775452566; Wed, 06
 Nov 2013 14:04:12 -0800 (PST)
Date: Wed, 6 Nov 2013 18:03:57 -0500
From: Mark Johnston
To: Charles Owens
Subject: Re: adding BBU relearn support to mfiutil
Message-ID: <20131106230356.GA86666@charmander.sandvine.com>
References: <20130304033836.GA33631@oddish> <1365196956.17311.13.camel@localhost> <20130406000809.GA96223@raichu> <527A7603.7090303@greatbaysoftware.com>
In-Reply-To: <527A7603.7090303@greatbaysoftware.com>
Cc: freebsd-scsi@freebsd.org, Steve McCoy

On Wed, Nov 06, 2013 at 12:01:55PM -0500, Charles Owens wrote:
> Hi, we've been playing with this patch in the context of 8.4-RELEASE-p4
> (we extracted r250483 and r250497 from stable/8 and applied them to
> releng/8.4). I'm seeing some results that make me question whether or
> not caching is really working correctly after a BBU relearn operation
> has completed -- or maybe whether or not the new BBU patch is talking to
> the LSI controller properly.
>
> Our test system had a BBU in the failed state (relearn needed). We used
> the "start learn" command and it seemed to go well, but strangely, now
> that the process seems to have completed, and several days later, the
> status is still LEARN_CYCLE_REQUESTED (as seen with "mfiutil show
> battery"). This may be entirely normal -- maybe it says that because the
> autolearn feature is now enabled?

I suspect that the status is bogus and that the battery is in fact dead.
There seem to be a few firmware bugs in the BBU status reporting, at
least with iBBU07. In your output below, I see:

    Design Capacity: 1215 mAh
    Full Charge Capacity: 65262 mAh
    Current Capacity: 61543 mAh

which clearly isn't right. I've seen this problem before as well: over
time, the full charge capacity decreases, and eventually it seems to
wrap around to 65535. MegaCli (LSI's binary RAID management tool) reports
exactly the same thing, so it's a problem with the controller firmware.
If you look at MegaCli output you get things like "Absolute charge: 6000%".
So I suspect that the status is incorrect as well; when I've run into
this problem, I still see "status: normal".

>
> The output of the "cache" status command is also a bit strange. Here is
> the raw output of these status commands:
>
> # mfiutil cache mfid0
> mfi0 volume mfid0 cache settings:
>     I/O caching: disabled
>     write caching: write-back
>     write cache with bad BBU: disabled
>     read ahead: adaptive
>     drive write cache: enabled
> Cache disabled due to dead battery or ongoing battery relearn
>
>
> # ./mfiutil show battery
> mfi0: Battery State:
>     Manufacture Date: 3/18/2010
>     Serial Number: 77
>     Manufacturer: LS1111001A
>     Model: 3598501
>     Chemistry: LION
>     Design Capacity: 1215 mAh
>     Full Charge Capacity: 65262 mAh
>     Current Capacity: 61543 mAh
>     Charge Cycles: 120
>     Current Charge: 94%
>     Design Voltage: 3700 mV
>     Current Voltage: 4081 mV
>     Temperature: 23 C
>     Autolearn period: 30 days
>     Next learn time: Tue Nov 26 20:06:40 2013
>     Learn delay interval: 0 hours
>     Autolearn mode: enabled
>     Status: LEARN_CYCLE_REQUESTED
>
>
> /Why does cache status now say "Cache disabled due to dead battery or
> ongoing battery relearn"/? Shouldn't this no longer be the case since
> I've run the "learn" operation? Does this indicate that the I/O caching
> is really disabled?

I believe so. You can try changing the write caching policy to write-back
with bad BBU and see if that re-enables the cache. If it does, that's
more evidence that the BBU is dead and needs to be replaced.

>
> I'd appreciate any and all assistance. Here's a bit of other info that
> might be of interest:
>
> # mfiutil show adapter
> mfi0 Adapter:
>     Product Name: Integrated Intel(R) RAID Controller SROMBSASMP2
>     Serial Number:
>     Firmware: 11.0.1-0036
>     RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
>     Battery Backup: present
>     NVRAM: 32K
>     Onboard Memory: 512M
>     Minimum Stripe: 8k
>     Maximum Stripe: 1M
>
> # mfiutil show drives
> mfi0 Physical Drives:
>     1 ( 136G) ONLINE SAS E1:S0
>     2 ( 136G) ONLINE SAS E1:S1
>     3 ( 136G) ONLINE SAS E1:S4
>     4 ( 136G) ONLINE SAS E1:S2
>     5 ( 136G) HOT SPARE SAS E1:S3
>
> The storage volume is 4-drives, RAID10. System has 16GB RAM, dual Xeon
> E5530 CPUs, on an Intel S5520UR motherboard.

It might be useful to check the output of "mfiutil show events -c info".

From owner-freebsd-scsi@FreeBSD.ORG Thu Nov 7 02:02:15 2013
Received: from opiate.eait.uq.edu.au (a82-4.nat.uq.edu.au.
[130.102.82.4]) by mx.google.com with ESMTPSA id qf7sm1951397pac.14.2013.11.06.18.02.10 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 06 Nov 2013 18:02:12 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 7.0 \(1816\)) Subject: Re: adding BBU relearn support to mfiutil From: David Gwynne In-Reply-To: <20131106230356.GA86666@charmander.sandvine.com> Date: Thu, 7 Nov 2013 12:02:07 +1000 Content-Transfer-Encoding: quoted-printable Message-Id: <7351EE9D-4250-450F-9D1F-57E12102B6B2@gwynne.id.au> References: <20130304033836.GA33631@oddish> <1365196956.17311.13.camel@localhost> <20130406000809.GA96223@raichu> <527A7603.7090303@greatbaysoftware.com> <20131106230356.GA86666@charmander.sandvine.com> To: Mark Johnston X-Mailer: Apple Mail (2.1816) Cc: Steve McCoy , freebsd-scsi@freebsd.org X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Nov 2013 02:02:16 -0000 On 7 Nov 2013, at 9:03 am, Mark Johnston wrote: > On Wed, Nov 06, 2013 at 12:01:55PM -0500, Charles Owens wrote: >> Hi, we've been playing with this patch in the context of = 8.4-RELEASE-p4=20 >> (we extracted r250483 and r250497 from stable/8 and applied to=20 >> releng/8.4). I'm seeing some results that make me question whether = or=20 >> not caching is really working correctly after a BBU relearn operation=20= >> has completed -- or maybe whether or not the new BBU patch is talking = to=20 >> LSI controller properly. >>=20 >> Our test system had a BBU in the failed state (relearn needed). We = used=20 >> the "start learn command" and it seemed to go well, but strangely, = when=20 >> process is seems to have completed, and now several days later, = status=20 >> is still LEARN_CYCLE_REQUESTED (as seen with "mfiutil show battery"). 
= =20 >> This may be entirely normal -- maybe it says that because the = autolearn=20 >> feature is now enabled? >=20 > I suspect that the status is bogus and that the battery is in fact = dead. > There seem to be a few firmware bugs in the BBU status reporting, at > least with iBBU07. In your output below, I see: >=20 > Design Capacity: 1215 mAh > Full Charge Capacity: 65262 mAh > Current Capacity: 61543 mAh >=20 > which clearly isn't right. I've seen this problem before as well: over > time, the full charge capacity decreases, and eventually it seems to > wrap around to 65535. MegaCli (LSI's binary RAID management tool) = reports > exactly the same thing, so it's a problem with the controller = firmware. > If you look at MegaCli output you get things like "Absolute charge: = 6000%". > So I suspect that the status is incorrect as well; when I've run into > this problem, I still see "status: normal". >=20 ive been staring at bbus on dell perc5s and perc6s recently after we had = a bunch of bbus get too old. i havent seen the full charge or current capacity values wrap, but what = i did figure out is that the write cache wont be enabled if the SOH flag = is set in whats reported by the BBU STATE response. the SOH flag seems = to either be based on whether the firmware thinks the battery will last = a reasonable amount of time (like 72h or something), or whether the bbu = full capacity is above 30% of its design capacity. either way, the reality is that batteries degrade and need to be = replaced. the nearly four year old battery that has gone through 120 = learn cycles in your output below is what i call a good candidate for = replacement. later megaraid firmwares (well, firmwares on later megaraids) have more = status bits that clearly indicate whether the firmware wants you to = replace the battery. it takes an annoying amount of interpretation on = the older ones. dlg >>=20 >> The "cache" status command also suggests also is a bit strange. 
Here = is=20 >> the raw output of these status commands: >>=20 >> # mfiutil cache mfid0 >> mfi0 volume mfid0 cache settings: >> I/O caching: disabled >> write caching: write-back >> write cache with bad BBU: disabled >> read ahead: adaptive >> drive write cache: enabled >> Cache disabled due to dead battery or ongoing battery relearn >>=20 >>=20 >> # ./mfiutil show battery >> mfi0: Battery State: >> Manufacture Date: 3/18/2010 >> Serial Number: 77 >> Manufacturer: LS1111001A >> Model: 3598501 >> Chemistry: LION >> Design Capacity: 1215 mAh >> Full Charge Capacity: 65262 mAh >> Current Capacity: 61543 mAh >> Charge Cycles: 120 >> Current Charge: 94% >> Design Voltage: 3700 mV >> Current Voltage: 4081 mV >> Temperature: 23 C >> Autolearn period: 30 days >> Next learn time: Tue Nov 26 20:06:40 2013 >> Learn delay interval: 0 hours >> Autolearn mode: enabled >> Status: LEARN_CYCLE_REQUESTED >>=20 >>=20 >> /Why does cache status now say "Cache disabled due to dead battery = or=20 >> ongoing battery relearn"/? Shouldn't this no longer be the case = since=20 >> I've run the "learn" operation? Does this indicate that the I/O = caching=20 >> is really disabled? >=20 > I believe so. You can try changing the write caching policy to = write-back > with bad BBU and see if that re-enables the cache. If it does, that's > more evidence that the BBU is dead and needs to be replaced. >=20 >>=20 >> I'd appreciate any and all assistance. 
Here's a bit of other info that >> might be of interest: >> >> # mfiutil show adapter >> mfi0 Adapter: >> Product Name: Integrated Intel(R) RAID Controller SROMBSASMP2 >> Serial Number: >> Firmware: 11.0.1-0036 >> RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50 >> Battery Backup: present >> NVRAM: 32K >> Onboard Memory: 512M >> Minimum Stripe: 8k >> Maximum Stripe: 1M >> >> # mfiutil show drives >> mfi0 Physical Drives: >> 1 ( 136G) ONLINE SAS E1:S0 >> 2 ( 136G) ONLINE SAS E1:S1 >> 3 ( 136G) ONLINE SAS E1:S4 >> 4 ( 136G) ONLINE SAS E1:S2 >> 5 ( 136G) HOT SPARE SAS E1:S3 >> >> The storage volume is 4-drives, RAID10. System has 16GB RAM, dual Xeon >> E5530 CPUs, on an Intel S5520UR motherboard. > > It might be useful to check the output of "mfiutil show events -c info". > >> >> Thanks! >> >> Charles Owens >> Great Bay Software >> >> >> >> On Fri Apr 5 20:08:09 2013, Mark Johnston wrote: >>> >>> On Fri, Apr 05, 2013 at 02:22:36PM -0700, Sean Bruno wrote: >>>> >>>> On Sun, 2013-03-03 at 22:38 -0500, Mark Johnston wrote: >>>>> >>>>> Hi Everyone, >>>>> >>>>> I recently needed to add a couple of features to mfiutil related to BBU >>>>> relearning. I've pasted a patch below which >>>>> >>>>> 1. adds extra fields to the output of "mfiutil show battery" showing BBU >>>>> properties. This is essentially the output of >>>>> >>>>> # MegaCli -AdpBbuInfo -GetBbuProperties -aLL >>>>> >>>>> and consists of info about battery learning: the learn period, the >>>>> time at which the controller will start the next relearn, and the BBU >>>>> mode (which indicates whether the battery supports transparent >>>>> relearning). >>>>> >>>>> 2. adds a couple of subcommands under "mfiutil bbu" which lets users set >>>>> the BBU properties which can be set by MegaCli. >>>>> >>>>> 3. adds a command "mfiutil start learn" which immediately kicks off a >>>>> battery relearn.
>>>>> >>>>> These changes grew out of concern about the fact that the controller >>>>> write cache is set to write-through mode during a relearn period (which >>>>> usually lasts for several hours). This ended up causing some mysterious >>>>> and intermittent performance issues, so I needed a way of getting more >>>>> info about what was going on (using MegaCli isn't really an option for >>>>> several reasons). Some BBUs support transparent relearning, which >>>>> basically means that the controller write cache doesn't get turned off >>>>> during a relearn. However, LSI's default config doesn't enable it, and >>>>> now mfiutil can be used to do that (through "mfiutil bbu bbu-mode"). >>>>> >>>>> I was hoping someone would be able to review the patch. If anyone's able >>>>> and willing to test it, I'd very much appreciate feedback from that. >>>>> >>>>> Thanks! >>>>> -Mark >>>> >>>> >>>> Just to document for the record. Finally got around to testing this >>>> today with Mark providing updates. Looks good overall with a couple of >>>> nits that he is handling at the moment (man page and variable name >>>> collision). >>> >>> >>> The updated patch is here: >>> http://people.freebsd.org/~markj/patches/20130405-mfi-bbu.diff >>> >>> I'll commit it in a few days if there aren't any problems.
>>> >>> Thanks, >>> -Mark >>> _______________________________________________ >>> freebsd-scsi@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi >>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" >>> >>> >>> > _______________________________________________ > freebsd-scsi@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-scsi > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"

From owner-freebsd-scsi@FreeBSD.ORG Thu Nov 7 17:57:59 2013 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 7F5C7D26; Thu, 7 Nov 2013 17:57:59 +0000 (UTC) (envelope-from cowens@greatbaysoftware.com) Received: from p3plsmtpa11-07.prod.phx3.secureserver.net (p3plsmtpa11-07.prod.phx3.secureserver.net [68.178.252.108]) by mx1.freebsd.org (Postfix) with ESMTP id B6C7B21CA; Thu, 7 Nov 2013 17:57:58 +0000 (UTC) Received: from jack.bspruce.com ([174.62.183.95]) by p3plsmtpa11-07.prod.phx3.secureserver.net with id mhwG1m00S23uTxa01hwHq1; Thu, 07 Nov 2013 10:56:18 -0700 Message-ID: <527BD440.8010701@greatbaysoftware.com> Date: Thu, 07 Nov 2013 12:56:16 -0500 From: Charles Owens Organization: Great Bay Software User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: Mark Johnston Subject: Re: adding BBU relearn support to mfiutil References: <20130304033836.GA33631@oddish> <1365196956.17311.13.camel@localhost> <20130406000809.GA96223@raichu> <527A7603.7090303@greatbaysoftware.com> <20131106230356.GA86666@charmander.sandvine.com> In-Reply-To: <20131106230356.GA86666@charmander.sandvine.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By:
Mailman/MimeDel 2.1.14 Cc: Jason Damron , freebsd-scsi@freebsd.org, Steve McCoy X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Nov 2013 17:57:59 -0000 On 11/6/13 6:03 PM, Mark Johnston wrote: > On Wed, Nov 06, 2013 at 12:01:55PM -0500, Charles Owens wrote: >> Hi, we've been playing with this patch in the context of 8.4-RELEASE-p4 >> (we extracted r250483 and r250497 from stable/8 and applied to >> releng/8.4). I'm seeing some results that make me question whether or >> not caching is really working correctly after a BBU relearn operation >> has completed -- or maybe whether or not the new BBU patch is talking to >> LSI controller properly. >> >> Our test system had a BBU in the failed state (relearn needed). We used >> the "start learn command" and it seemed to go well, but strangely, when >> process is seems to have completed, and now several days later, status >> is still LEARN_CYCLE_REQUESTED (as seen with "mfiutil show battery"). >> This may be entirely normal -- maybe it says that because the autolearn >> feature is now enabled? > I suspect that the status is bogus and that the battery is in fact dead. > There seem to be a few firmware bugs in the BBU status reporting, at > least with iBBU07. In your output below, I see: > > Design Capacity: 1215 mAh > Full Charge Capacity: 65262 mAh > Current Capacity: 61543 mAh > > which clearly isn't right. I've seen this problem before as well: over > time, the full charge capacity decreases, and eventually it seems to > wrap around to 65535. MegaCli (LSI's binary RAID management tool) reports > exactly the same thing, so it's a problem with the controller firmware. > If you look at MegaCli output you get things like "Absolute charge: 6000%". > So I suspect that the status is incorrect as well; when I've run into > this problem, I still see "status: normal". 
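
The wraparound described in the quoted paragraph above can be checked mechanically. What follows is only a sketch under stated assumptions: the function names and the 1.5x slack factor are made up for illustration, and reading the value as a signed 16-bit number is a guess about what the firmware might be doing, not something confirmed anywhere in this thread.

```python
UINT16_MAX = 0xFFFF  # 65535, the value the reading reportedly wraps toward


def capacity_looks_wrapped(design_mah: int, full_charge_mah: int,
                           slack: float = 1.5) -> bool:
    """Heuristic (assumption): a full-charge capacity far above the design
    capacity is implausible for an aging battery and hints at a wrapped counter."""
    return full_charge_mah > design_mah * slack


def as_signed16(value: int) -> int:
    """Reinterpret an unsigned 16-bit reading as a signed 16-bit integer.
    Whether the controller really stores the value this way is unknown."""
    value &= UINT16_MAX
    return value - 0x10000 if value >= 0x8000 else value


if __name__ == "__main__":
    design, full, current = 1215, 65262, 61543  # values from "mfiutil show battery"
    print("looks wrapped:", capacity_looks_wrapped(design, full))
    print("full charge as signed 16-bit:", as_signed16(full))
    print("current capacity as signed 16-bit:", as_signed16(current))
```

A check like this only flags suspicious readings; it says nothing about whether the battery itself is healthy.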
> >> The "cache" status command also suggests also is a bit strange. Here is >> the raw output of these status commands: >> >> # mfiutil cache mfid0 >> mfi0 volume mfid0 cache settings: >> I/O caching: disabled >> write caching: write-back >> write cache with bad BBU: disabled >> read ahead: adaptive >> drive write cache: enabled >> Cache disabled due to dead battery or ongoing battery relearn >> >> >> # ./mfiutil show battery >> mfi0: Battery State: >> Manufacture Date: 3/18/2010 >> Serial Number: 77 >> Manufacturer: LS1111001A >> Model: 3598501 >> Chemistry: LION >> Design Capacity: 1215 mAh >> Full Charge Capacity: 65262 mAh >> Current Capacity: 61543 mAh >> Charge Cycles: 120 >> Current Charge: 94% >> Design Voltage: 3700 mV >> Current Voltage: 4081 mV >> Temperature: 23 C >> Autolearn period: 30 days >> Next learn time: Tue Nov 26 20:06:40 2013 >> Learn delay interval: 0 hours >> Autolearn mode: enabled >> Status: LEARN_CYCLE_REQUESTED >> >> >> /Why does cache status now say "Cache disabled due to dead battery or >> ongoing battery relearn"/? Shouldn't this no longer be the case since >> I've run the "learn" operation? Does this indicate that the I/O caching >> is really disabled? > I believe so. You can try changing the write caching policy to write-back > with bad BBU and see if that re-enables the cache. If it does, that's > more evidence that the BBU is dead and needs to be replaced. > >> I'd appreciate any and all assistance. 
Here's a bit of other info that >> might be of interest: >> >> # mfiutil show adapter >> mfi0 Adapter: >> Product Name: Integrated Intel(R) RAID Controller SROMBSASMP2 >> Serial Number: >> Firmware: 11.0.1-0036 >> RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50 >> Battery Backup: present >> NVRAM: 32K >> Onboard Memory: 512M >> Minimum Stripe: 8k >> Maximum Stripe: 1M >> >> # mfiutil show drives >> mfi0 Physical Drives: >> 1 ( 136G) ONLINE SAS E1:S0 >> 2 ( 136G) ONLINE SAS E1:S1 >> 3 ( 136G) ONLINE SAS E1:S4 >> 4 ( 136G) ONLINE SAS E1:S2 >> 5 ( 136G) HOT SPARE SAS E1:S3 >> >> The storage volume is 4-drives, RAID10. System has 16GB RAM, dual Xeon >> E5530 CPUs, on an Intel S5520UR motherboard. > It might be useful to check the output of "mfiutil show events -c info". > >

This is good info, thank you.

The "show events" command tells us when the battery was first detected as "failed":

49336 (Sun Mar 3 21:53:40 UTC 2013/BATTERY/info) - Battery charge complete
49340 (boot + 4s/BATTERY/info) - Battery Present
49341 (boot + 4s/BATTERY/FATAL) - Battery has failed and cannot support data retention. Please replace the battery
49365 (boot + 45s/BATTERY/WARN) - BBU disabled; changing WB virtual disks to WT
49367 (Mon Mar 4 05:13:09 UTC 2013/BATTERY/info) - Battery temperature is normal

So, given this strong indication that the BBU is really dead, and that I'd still like to test the effects of write-caching, I used this command:

mfiutil cache mfid0 bad-bbu-write-cache enable

Now the "cache disabled" message is gone:

# mfiutil cache mfid0
mfi0 volume mfid0 cache settings:
I/O caching: writes
write caching: write-back
write cache with bad BBU: enabled
read ahead: adaptive
drive write cache: enabled

The obvious interpretation is that write-caching is now operational (in the preferred write-back mode). Strangely, though, my performance tests (with both pgbench and bonnie) still showed no meaningful effect from having the cache operational!
I toggled between caching / no-caching with these commands:

# mfiutil cache mfid0 writes
Setting write cache policy to write-back

# mfiutil cache mfid0 disable
Disabling caching of I/O writes

Again, no difference in performance was seen.

On a whim, I also tried write-through mode, and to my surprise, bonnie showed significantly reduced performance! (consistent over multiple samples) This is really confusing. To me it suggests that there's some kind of disconnect between caching-status as seen with mfiutil and caching-status in reality. Chief exhibits being that write-caching appears to have still been happening even:

* after the "cache mfid0 disable" command was issued, and
* earlier, before the "cache mfid0 bad-bbu-write-cache enable" command was issued (when "mfiutil cache mfid0" still showed "Cache disabled due to dead battery or ongoing battery relearn").

** If this is the case then it suggests that the system before today was in a dangerous state... actively doing write-back caching with a bad BBU (despite what mfiutil claimed about the cache being disabled)! **

Your thoughts? Is there any other way to explain this?

Here is the data from bonnie:

***** write-through caching (2 samples)

# bonnie -s 2000
File './Bonnie.1351', size: 2097152000
...
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         2000 61515 21.3 46388 4.3 57432 16.0 247823 99.9 1629696 100.0 55687.0 212.4

Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         2000 60001 20.7 51828 4.9 51666 13.9 247501 100.0 1657454 100.0 53136.4 251.0

***** write-back caching (2 samples)

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         2000 128564 44.6 90065 8.7 245325 47.8 248492 100.0 1558747 99.7 61967.5 179.1

Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         2000 184059 64.0 141360 13.8 129801 22.2 246222 99.2 1556723 100.0 51728.4 159.7

(and, again... same performance is seen after issuing "cache disable" command)

Thanks much,

Charles Owens
Great Bay Software

From owner-freebsd-scsi@FreeBSD.ORG Thu Nov 7 18:44:23 2013 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 8107A96E; Thu, 7 Nov 2013 18:44:23 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: from mail-ie0-x22f.google.com (mail-ie0-x22f.google.com [IPv6:2607:f8b0:4001:c03::22f]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 4547124C9; Thu, 7 Nov 2013 18:44:23 +0000 (UTC) Received: by mail-ie0-f175.google.com with SMTP id aq17so1516323iec.34 for ; Thu, 07 Nov 2013 10:44:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version
:content-type:content-disposition:in-reply-to:user-agent; bh=kAqNfY34OM951STru8z00NcmhGV1gGWPIDYURuAw8Do=; b=XC9Hek3r30pG0Pp3bmZDyOmegzPj5HqYnw9rISD8wrdcXgFgbs+PBDDF22Sd3uwKuQ IvVSttQ3u8NmmDK3QTlQyuAuyRz7Gs70c96U+GAjBgmuveY78IPCV4aglUIznbJa7Rd/ Fsbli232/PqwblgCo09/6isePX9VeSL0tMV8vtnwxQ0+01aV2Qh44EurmiI3qKxPYm/X jrAMibyiDntMaH5s+srqJPcawd49R6WHDLglkJIilAeL/DTyQDF2xuye+ZeLdZ+6TZrV pWNxE8e5dr3yyjQwWHP9AqFx/7v71DXgUH/c2yu3ms6wauWG/scG5Ybo1C48yfGhp0Xj Uscg== X-Received: by 10.50.30.42 with SMTP id p10mr3049533igh.5.1383849862471; Thu, 07 Nov 2013 10:44:22 -0800 (PST) Received: from charmander.sandvine.com ([64.7.137.182]) by mx.google.com with ESMTPSA id j16sm21446021igf.6.2013.11.07.10.44.20 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 07 Nov 2013 10:44:21 -0800 (PST) Sender: Mark Johnston Date: Thu, 7 Nov 2013 14:44:03 -0500 From: Mark Johnston To: Charles Owens Subject: Re: adding BBU relearn support to mfiutil Message-ID: <20131107194402.GA1695@charmander.sandvine.com> References: <20130304033836.GA33631@oddish> <1365196956.17311.13.camel@localhost> <20130406000809.GA96223@raichu> <527A7603.7090303@greatbaysoftware.com> <20131106230356.GA86666@charmander.sandvine.com> <527BD440.8010701@greatbaysoftware.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <527BD440.8010701@greatbaysoftware.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Jason Damron , freebsd-scsi@freebsd.org, Steve McCoy X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Nov 2013 18:44:23 -0000 On Thu, Nov 07, 2013 at 12:56:16PM -0500, Charles Owens wrote: > On 11/6/13 6:03 PM, Mark Johnston wrote: > > On Wed, Nov 06, 2013 at 12:01:55PM -0500, Charles Owens wrote: > >> Hi, we've been playing with this patch in the context of 8.4-RELEASE-p4 > >> (we extracted r250483 and 
r250497 from stable/8 and applied to > >> releng/8.4). I'm seeing some results that make me question whether or > >> not caching is really working correctly after a BBU relearn operation > >> has completed -- or maybe whether or not the new BBU patch is talking to > >> LSI controller properly. > >> > >> Our test system had a BBU in the failed state (relearn needed). We used > >> the "start learn command" and it seemed to go well, but strangely, when > >> process is seems to have completed, and now several days later, status > >> is still LEARN_CYCLE_REQUESTED (as seen with "mfiutil show battery"). > >> This may be entirely normal -- maybe it says that because the autolearn > >> feature is now enabled? > > I suspect that the status is bogus and that the battery is in fact dead. > > There seem to be a few firmware bugs in the BBU status reporting, at > > least with iBBU07. In your output below, I see: > > > > Design Capacity: 1215 mAh > > Full Charge Capacity: 65262 mAh > > Current Capacity: 61543 mAh > > > > which clearly isn't right. I've seen this problem before as well: over > > time, the full charge capacity decreases, and eventually it seems to > > wrap around to 65535. MegaCli (LSI's binary RAID management tool) reports > > exactly the same thing, so it's a problem with the controller firmware. > > If you look at MegaCli output you get things like "Absolute charge: 6000%". > > So I suspect that the status is incorrect as well; when I've run into > > this problem, I still see "status: normal". > > > >> The "cache" status command also suggests also is a bit strange. 
Here is > >> the raw output of these status commands: > >> > >> # mfiutil cache mfid0 > >> mfi0 volume mfid0 cache settings: > >> I/O caching: disabled > >> write caching: write-back > >> write cache with bad BBU: disabled > >> read ahead: adaptive > >> drive write cache: enabled > >> Cache disabled due to dead battery or ongoing battery relearn > >> > >> > >> # ./mfiutil show battery > >> mfi0: Battery State: > >> Manufacture Date: 3/18/2010 > >> Serial Number: 77 > >> Manufacturer: LS1111001A > >> Model: 3598501 > >> Chemistry: LION > >> Design Capacity: 1215 mAh > >> Full Charge Capacity: 65262 mAh > >> Current Capacity: 61543 mAh > >> Charge Cycles: 120 > >> Current Charge: 94% > >> Design Voltage: 3700 mV > >> Current Voltage: 4081 mV > >> Temperature: 23 C > >> Autolearn period: 30 days > >> Next learn time: Tue Nov 26 20:06:40 2013 > >> Learn delay interval: 0 hours > >> Autolearn mode: enabled > >> Status: LEARN_CYCLE_REQUESTED > >> > >> > >> /Why does cache status now say "Cache disabled due to dead battery or > >> ongoing battery relearn"/? Shouldn't this no longer be the case since > >> I've run the "learn" operation? Does this indicate that the I/O caching > >> is really disabled? > > I believe so. You can try changing the write caching policy to write-back > > with bad BBU and see if that re-enables the cache. If it does, that's > > more evidence that the BBU is dead and needs to be replaced. > > > >> I'd appreciate any and all assistance. 
Here's a bit of other info that > >> might be of interest: > >> > >> # mfiutil show adapter > >> mfi0 Adapter: > >> Product Name: Integrated Intel(R) RAID Controller SROMBSASMP2 > >> Serial Number: > >> Firmware: 11.0.1-0036 > >> RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50 > >> Battery Backup: present > >> NVRAM: 32K > >> Onboard Memory: 512M > >> Minimum Stripe: 8k > >> Maximum Stripe: 1M > >> > >> # mfiutil show drives > >> mfi0 Physical Drives: > >> 1 ( 136G) ONLINE SAS E1:S0 > >> 2 ( 136G) ONLINE SAS E1:S1 > >> 3 ( 136G) ONLINE SAS E1:S4 > >> 4 ( 136G) ONLINE SAS E1:S2 > >> 5 ( 136G) HOT SPARE SAS E1:S3 > >> > >> The storage volume is 4-drives, RAID10. System has 16GB RAM, dual Xeon > >> E5530 CPUs, on an Intel S5520UR motherboard. > > It might be useful to check the output of "mfiutil show events -c info". > > > > > > This is good info, thank you. > > The "show events" command tells us when the battery first was detected > as "failed": > > 49336 (Sun Mar 3 21:53:40 UTC 2013/BATTERY/info) - Battery charge complete > 49340 (boot + 4s/BATTERY/info) - Battery Present > 49341 (boot + 4s/BATTERY/FATAL) - Battery has failed and cannot support data retention. Please replace the battery > 49365 (boot + 45s/BATTERY/WARN) - BBU disabled; changing WB virtual disks to WT > 49367 (Mon Mar 4 05:13:09 UTC 2013/BATTERY/info) - Battery temperature is normal > > > > So, given this strong indication that the BBU is really dead, and that > I'd still like to test the effects of write-caching, I used this > command: mfiutil cache mfid0 bad-bbu-write-cache enable > > Now the "cached disabled" messages is gone: > > # mfiutil cache mfid0 > mfi0 volume mfid0 cache settings: > I/O caching: writes > write caching: write-back > write cache with bad BBU: enabled > read ahead: adaptive > drive write cache: enabled > > > The obvious interpretation is that write-caching is now operational (in > the preferred write-back mode). 
Strangely, though, my performance tests > (with both pgbench and bonnie) still showed no meaningful effect from > having the cache operational! I toggled between caching / no-caching > with these commands: > > # mfiutil cache mfid0 writes > Setting write cache policy to write-back > > # mfiutil cache mfid0 disable > Disabling caching of I/O writes > > > Again, no difference in performance was seen. > > On a whim, I also tried write-through mode, and to my surprise, bonnie > showed significantly reduced performance! (consistent over multiple > samples) This is really confusing. To me it suggests that there's some > kind of disconnect between caching-status as seen with mfiutil and > caching-status in reality. Chief exhibits being that write-caching > appears to have still been happening even: > > * after the "cache mfid0 disable" command was issued, and > * earlier, before the "cache mfid0 bad-bbu-write-cache enable" command > was issued (when "mfiutil cache mfid0" still showed "Cache disabled > due to dead battery or ongoing battery relearn"). > > ** If this is the case then it suggests that the system before today was > in a dangerous state... actively doing write-back caching with a bad BBU > (despite what mfiutil claimed about the cache being disabled)! ** Yup. That's rather frightening. :( > > Your thoughts? Is there any other way to explain this? Nothing that comes to mind. The reason I did some work to improve LSI BBU reporting was because we were noticing intermittent performance problems that turned out to be caused by the controller flipping to write-through mode during BBU relearn cycles. However, I've never bothered verifying that the cache is actually in write-through mode when the battery is dead. I think there's a machine in my lab which shows similar problems, so I will try to take a look at it soon, do some write perf testing and see what MegaCli reports. It'll take me a few days at least to get to this though. 
I'm not sure how this might be fixed in the case that it turns out to be another firmware bug. -Mark > > > Here is the data from bonnie: > > ***** write-through caching (2 samples) > > # bonnie -s 2000 > File './Bonnie.1351', size: 2097152000 > ... > -------Sequential Output-------- ---Sequential Input-- --Random-- > -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- > Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU > 2000 61515 21.3 46388 4.3 57432 16.0 247823 99.9 1629696 100.0 55687.0 212.4 > > Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU > 2000 60001 20.7 51828 4.9 51666 13.9 247501 100.0 1657454 100.0 53136.4 251.0 > > ***** write-back caching (2 samples) > > -------Sequential Output-------- ---Sequential Input-- --Random-- > -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- > Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU > 2000 128564 44.6 90065 8.7 245325 47.8 248492 100.0 1558747 99.7 61967.5 179.1 > > Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU > 2000 184059 64.0 141360 13.8 129801 22.2 246222 99.2 1556723 100.0 51728.4 159.7 > > (and, again... 
same performance is seen after issuing "cache disable" > command) > > > Thanks much, > > Charles Owens > Great Bay Software > From owner-freebsd-scsi@FreeBSD.ORG Thu Nov 7 21:09:56 2013 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 2ACC8F56; Thu, 7 Nov 2013 21:09:56 +0000 (UTC) (envelope-from cowens@greatbaysoftware.com) Received: from p3plsmtpa11-01.prod.phx3.secureserver.net (p3plsmtpa11-01.prod.phx3.secureserver.net [68.178.252.102]) by mx1.freebsd.org (Postfix) with ESMTP id EBE7B2E49; Thu, 7 Nov 2013 21:09:55 +0000 (UTC) Received: from jack.bspruce.com ([174.62.183.95]) by p3plsmtpa11-01.prod.phx3.secureserver.net with id ml8E1m00D23uTxa01l8FVr; Thu, 07 Nov 2013 14:08:17 -0700 Message-ID: <527C013E.3070607@greatbaysoftware.com> Date: Thu, 07 Nov 2013 16:08:14 -0500 From: Charles Owens Organization: Great Bay Software User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: Mark Johnston Subject: Re: adding BBU relearn support to mfiutil References: <20130304033836.GA33631@oddish> <1365196956.17311.13.camel@localhost> <20130406000809.GA96223@raichu> <527A7603.7090303@greatbaysoftware.com> <20131106230356.GA86666@charmander.sandvine.com> <527BD440.8010701@greatbaysoftware.com> <20131107194402.GA1695@charmander.sandvine.com> In-Reply-To: <20131107194402.GA1695@charmander.sandvine.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Jason Damron , freebsd-scsi@freebsd.org, Steve McCoy X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Nov 2013 21:09:56 -0000 On 11/7/13 2:44 PM, 
Mark Johnston wrote: > On Thu, Nov 07, 2013 at 12:56:16PM -0500, Charles Owens wrote: >> On 11/6/13 6:03 PM, Mark Johnston wrote: >>> On Wed, Nov 06, 2013 at 12:01:55PM -0500, Charles Owens wrote: >>>> Hi, we've been playing with this patch in the context of 8.4-RELEASE-p4 >>>> (we extracted r250483 and r250497 from stable/8 and applied to >>>> releng/8.4). I'm seeing some results that make me question whether or >>>> not caching is really working correctly after a BBU relearn operation >>>> has completed -- or maybe whether or not the new BBU patch is talking to >>>> LSI controller properly. >>>> >>>> Our test system had a BBU in the failed state (relearn needed). We used >>>> the "start learn command" and it seemed to go well, but strangely, when >>>> process is seems to have completed, and now several days later, status >>>> is still LEARN_CYCLE_REQUESTED (as seen with "mfiutil show battery"). >>>> This may be entirely normal -- maybe it says that because the autolearn >>>> feature is now enabled? >>> I suspect that the status is bogus and that the battery is in fact dead. >>> There seem to be a few firmware bugs in the BBU status reporting, at >>> least with iBBU07. In your output below, I see: >>> >>> Design Capacity: 1215 mAh >>> Full Charge Capacity: 65262 mAh >>> Current Capacity: 61543 mAh >>> >>> which clearly isn't right. I've seen this problem before as well: over >>> time, the full charge capacity decreases, and eventually it seems to >>> wrap around to 65535. MegaCli (LSI's binary RAID management tool) reports >>> exactly the same thing, so it's a problem with the controller firmware. >>> If you look at MegaCli output you get things like "Absolute charge: 6000%". >>> So I suspect that the status is incorrect as well; when I've run into >>> this problem, I still see "status: normal". >>> >>>> The "cache" status command also suggests also is a bit strange. 
Here is >>>> the raw output of these status commands: >>>> >>>> # mfiutil cache mfid0 >>>> mfi0 volume mfid0 cache settings: >>>> I/O caching: disabled >>>> write caching: write-back >>>> write cache with bad BBU: disabled >>>> read ahead: adaptive >>>> drive write cache: enabled >>>> Cache disabled due to dead battery or ongoing battery relearn >>>> >>>> >>>> # ./mfiutil show battery >>>> mfi0: Battery State: >>>> Manufacture Date: 3/18/2010 >>>> Serial Number: 77 >>>> Manufacturer: LS1111001A >>>> Model: 3598501 >>>> Chemistry: LION >>>> Design Capacity: 1215 mAh >>>> Full Charge Capacity: 65262 mAh >>>> Current Capacity: 61543 mAh >>>> Charge Cycles: 120 >>>> Current Charge: 94% >>>> Design Voltage: 3700 mV >>>> Current Voltage: 4081 mV >>>> Temperature: 23 C >>>> Autolearn period: 30 days >>>> Next learn time: Tue Nov 26 20:06:40 2013 >>>> Learn delay interval: 0 hours >>>> Autolearn mode: enabled >>>> Status: LEARN_CYCLE_REQUESTED >>>> >>>> >>>> /Why does cache status now say "Cache disabled due to dead battery or >>>> ongoing battery relearn"/? Shouldn't this no longer be the case since >>>> I've run the "learn" operation? Does this indicate that the I/O caching >>>> is really disabled? >>> I believe so. You can try changing the write caching policy to write-back >>> with bad BBU and see if that re-enables the cache. If it does, that's >>> more evidence that the BBU is dead and needs to be replaced. >>> >>>> I'd appreciate any and all assistance. 
Here's a bit of other info that >>>> might be of interest: >>>> >>>> # mfiutil show adapter >>>> mfi0 Adapter: >>>> Product Name: Integrated Intel(R) RAID Controller SROMBSASMP2 >>>> Serial Number: >>>> Firmware: 11.0.1-0036 >>>> RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50 >>>> Battery Backup: present >>>> NVRAM: 32K >>>> Onboard Memory: 512M >>>> Minimum Stripe: 8k >>>> Maximum Stripe: 1M >>>> >>>> # mfiutil show drives >>>> mfi0 Physical Drives: >>>> 1 ( 136G) ONLINE SAS E1:S0 >>>> 2 ( 136G) ONLINE SAS E1:S1 >>>> 3 ( 136G) ONLINE SAS E1:S4 >>>> 4 ( 136G) ONLINE SAS E1:S2 >>>> 5 ( 136G) HOT SPARE SAS E1:S3 >>>> >>>> The storage volume is 4-drives, RAID10. System has 16GB RAM, dual Xeon >>>> E5530 CPUs, on an Intel S5520UR motherboard. >>> It might be useful to check the output of "mfiutil show events -c info". >>> >>> >> This is good info, thank you. >> >> The "show events" command tells us when the battery first was detected >> as "failed": >> >> 49336 (Sun Mar 3 21:53:40 UTC 2013/BATTERY/info) - Battery charge complete >> 49340 (boot + 4s/BATTERY/info) - Battery Present >> 49341 (boot + 4s/BATTERY/FATAL) - Battery has failed and cannot support data retention. Please replace the battery >> 49365 (boot + 45s/BATTERY/WARN) - BBU disabled; changing WB virtual disks to WT >> 49367 (Mon Mar 4 05:13:09 UTC 2013/BATTERY/info) - Battery temperature is normal >> >> >> >> So, given this strong indication that the BBU is really dead, and that >> I'd still like to test the effects of write-caching, I used this >> command: mfiutil cache mfid0 bad-bbu-write-cache enable >> >> Now the "cached disabled" messages is gone: >> >> # mfiutil cache mfid0 >> mfi0 volume mfid0 cache settings: >> I/O caching: writes >> write caching: write-back >> write cache with bad BBU: enabled >> read ahead: adaptive >> drive write cache: enabled >> >> >> The obvious interpretation is that write-caching is now operational (in >> the preferred write-back mode). 
Strangely, though, my performance tests >> (with both pgbench and bonnie) still showed no meaningful effect from >> having the cache operational! I toggled between caching / no-caching >> with these commands: >> >> # mfiutil cache mfid0 writes >> Setting write cache policy to write-back >> >> # mfiutil cache mfid0 disable >> Disabling caching of I/O writes >> >> >> Again, no difference in performance was seen. >> >> On a whim, I also tried write-through mode, and to my surprise, bonnie >> showed significantly reduced performance! (consistent over multiple >> samples) This is really confusing. To me it suggests that there's some >> kind of disconnect between caching-status as seen with mfiutil and >> caching-status in reality. Chief exhibits being that write-caching >> appears to have still been happening even: >> >> * after the "cache mfid0 disable" command was issued, and >> * earlier, before the "cache mfid0 bad-bbu-write-cache enable" command >> was issued (when "mfiutil cache mfid0" still showed "Cache disabled >> due to dead battery or ongoing battery relearn"). >> >> ** If this is the case then it suggests that the system before today was >> in a dangerous state... actively doing write-back caching with a bad BBU >> (despite what mfiutil claimed about the cache being disabled)! ** > Yup. That's rather frightening. :( > >> Your thoughts? Is there any other way to explain this? > Nothing that comes to mind. The reason I did some work to improve LSI BBU > reporting was because we were noticing intermittent performance problems > that turned out to be caused by the controller flipping to write-through > mode during BBU relearn cycles. > > However, I've never bothered verifying that the cache is actually in > write-through mode when the battery is dead. I think there's a machine > in my lab which shows similar problems, so I will try to take a look at > it soon, do some write perf testing and see what MegaCli reports. 
> It'll take me a few days at least to get to this though.
>
> I'm not sure how this might be fixed in the case that it turns out to be
> another firmware bug.
>
> -Mark

After some reflection... I think part of the story is that before now I've been primarily relying on pgbench (since the vast majority of our application's I/O workload is with the DB). For whatever reason the randomized I/O that pgbench generates doesn't seem to benefit much from the LSI controller's write-caching -- whereas the effects of caching are readily visible with bonnie.

The controller may have indeed properly switched from WB-to-WT as the BBU failed (and as reported in the controller events log)... but my focus on pgbench results led me to conclude that it hadn't. Even with what I reported above I was still bouncing back and forth between pgbench and bonnie... and I think the largely-steady pgbench results were muddying things.

If I now toggle the "bad-bbu-write-cache" setting between enable and disable I'm seeing (with bonnie) the expected behavior. With it set to "disable", I correctly see poor performance, showing that it has dropped back to WT mode. Setting it "enable" gives good performance (WB). This is repeatable.

Upshot is that I'm now suspecting that what I said before was a false alarm (happily). Of course, I do have a failed BBU, which is not so good.

In a couple weeks I should have access to another box with a failed BBU, so I'll have a chance to repeat the testing with bonnie.
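
For anyone repeating this check on another box, here is a minimal sketch for picking battery warnings out of saved "mfiutil show events" output. It is not part of mfiutil: the function names are made up for illustration, and the line layout (sequence number, then timestamp/class/severity in parentheses, then the message) is only assumed from the event lines quoted earlier in this thread.

```python
import re

# Layout assumed from the excerpts quoted in this thread:
#   <seq> (<timestamp>/<class>/<severity>) - <message>
EVENT_RE = re.compile(r"^\s*(\d+) \((.+?)/(\w+)/(\w+)\) - (.*)$")


def parse_events(text):
    """Return one dict per line that matches the assumed event layout."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line)
        if not m:
            continue  # skip blank lines and anything in another format
        seq, when, cls, severity, message = m.groups()
        events.append({
            "seq": int(seq),
            "when": when,
            "class": cls,
            "severity": severity.upper(),
            "message": message,
        })
    return events


def battery_problems(events):
    """Keep only BATTERY events logged at WARN or FATAL severity."""
    return [e for e in events
            if e["class"] == "BATTERY" and e["severity"] in ("WARN", "FATAL")]


# Sample input copied from the events quoted in this thread.
SAMPLE = """\
49336 (Sun Mar 3 21:53:40 UTC 2013/BATTERY/info) - Battery charge complete
49340 (boot + 4s/BATTERY/info) - Battery Present
49341 (boot + 4s/BATTERY/FATAL) - Battery has failed and cannot support data retention. Please replace the battery
49365 (boot + 45s/BATTERY/WARN) - BBU disabled; changing WB virtual disks to WT
49367 (Mon Mar 4 05:13:09 UTC 2013/BATTERY/info) - Battery temperature is normal
"""

if __name__ == "__main__":
    for e in battery_problems(parse_events(SAMPLE)):
        print(e["seq"], e["severity"], e["message"])
```

The "BBU disabled; changing WB virtual disks to WT" line is the one that records the controller's own switch to write-through, so it is worth looking for before trusting any benchmark comparison.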
-- Charles

From owner-freebsd-scsi@FreeBSD.ORG Thu Nov 7 21:20:49 2013
Message-ID: <527C03C8.5030609@greatbaysoftware.com>
Date: Thu, 07 Nov 2013 16:19:04 -0500
From: Charles Owens
Organization: Great Bay Software
To: freebsd-scsi@freebsd.org
Subject: mfiutil for LSI MegaSAS 1064R?

Hi,

I was surprised to discover that mfiutil doesn't seem to know how to talk to this controller, e.g.

# mfiutil show volumes
mfiutil: Failed to get volume list: Inappropriate ioctl for device

All subcommands give the same result. This was discovered with FreeBSD 8.3.

I suspect I'm stuck, yes? I could use MegaCli, though that's not ideal. Are there any plans to add support for this card (which is admittedly pretty old)?

Thanks

Charles Owens
Great Bay Software, Inc.

From owner-freebsd-scsi@FreeBSD.ORG Thu Nov 7 22:27:00 2013
Message-ID: <8AFECB3414D14ED9947956FBAD9A9214@multiplay.co.uk>
From: "Steven Hartland"
To: "Charles Owens" ,
References: <527C03C8.5030609@greatbaysoftware.com>
Subject: Re: mfiutil for LSI MegaSAS 1064R?
Date: Thu, 7 Nov 2013 22:26:56 -0000

If MegaCli works, mfiutil in theory should.

Questions that spring to mind are:
1. What driver are you using? Might be a silly question, but good to confirm.
2. Are you sure your kernel and world are in sync?

regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.

From owner-freebsd-scsi@FreeBSD.ORG Fri Nov 8 04:42:22 2013
Message-ID: <527C6B4C.4@greatbaysoftware.com>
Date: Thu, 07 Nov 2013 23:40:44 -0500
From: Charles Owens
Organization: Great Bay Software
To: Steven Hartland , freebsd-scsi@freebsd.org
Subject: Re: mfiutil for LSI MegaSAS 1064R?
References: <527C03C8.5030609@greatbaysoftware.com> <8AFECB3414D14ED9947956FBAD9A9214@multiplay.co.uk>
In-Reply-To: <8AFECB3414D14ED9947956FBAD9A9214@multiplay.co.uk>

Yes... that's exactly it -- kernel and mfiutil not in sync. False alarm -- now back to sanity.

Thank you,
Charles

On 11/7/13 5:26 PM, Steven Hartland wrote:
> If MegaCli works mfiutil in theory should.