From owner-freebsd-scsi@FreeBSD.ORG Tue Jan 17 02:02:19 2012
Date: Tue, 17 Jan 2012 02:02:18 +0000
From: John
To: "Desai, Kashyap", "Kenneth D. Merry"
Cc: freebsd-scsi@freebsd.org
Message-ID: <20120117020218.GA59053@FreeBSD.org>
References: <20120114051618.GA41288@FreeBSD.org> <20120114232245.GA57880@nargothrond.kdm.org>
Subject: Re: mps driver chain_alloc_fail / performance ?

----- Desai, Kashyap's Original Message -----
> Which driver version is this ? In our 09.00.00.00 Driver (which is in
> pipeline to be committed) has 2048 chain buffer counter.

I'm not sure how to answer your question directly. We're using the driver
that comes with FreeBSD, not a driver directly from LSI. If we can get a
copy of your 9.0 driver, we can try testing against it.

> And our Test team has verified it with almost 150+ Drives.

Currently, we have 8 shelves with 25 drives per shelf, dual-attached and
configured with geom multipath in Active/Active mode. Ignoring the SSDs and
OS disks on the internal card, we see 400 da devices on mps1 & mps2.

For the record, the shelves are:

ses0 at mps1 bus 0 scbus7 target 0 lun 0
ses0: Fixed Enclosure Services SCSI-5 device
ses0: 600.000MB/s transfers
ses0: Command Queueing enabled
ses0: SCSI-3 SES Device

> As suggested by Ken, Can you try increasing MPS_CHAIN_FRAMES to 4096 OR 2048

Absolutely. The current value is 2048. We are currently running with this
patch to increase the value and output a single alerting message:

--- sys/dev/mps/mpsvar.h.orig	2012-01-15 19:28:51.000000000 -0500
+++ sys/dev/mps/mpsvar.h	2012-01-15 20:14:07.000000000 -0500
@@ -34,7 +34,7 @@
 #define MPS_REQ_FRAMES		1024
 #define MPS_EVT_REPLY_FRAMES	32
 #define MPS_REPLY_FRAMES	MPS_REQ_FRAMES
-#define MPS_CHAIN_FRAMES	2048
+#define MPS_CHAIN_FRAMES	4096
 #define MPS_SENSE_LEN		SSD_FULL_SIZE
 #define MPS_MSI_COUNT		1
 #define MPS_SGE64_SIZE		12
@@ -242,8 +242,11 @@
 		sc->chain_free--;
 		if (sc->chain_free < sc->chain_free_lowwater)
 			sc->chain_free_lowwater = sc->chain_free;
-	} else
+	} else {
 		sc->chain_alloc_fail++;
+		if (sc->chain_alloc_fail == 1)
+			device_printf(sc->mps_dev, "Insufficient chain_list buffers.\n");
+	}
 	return (chain);
 }

If the logic for outputting the message is appropriate, I think it would be
nice to get it committed.
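As an aside, a minimal user-space sketch for keeping an eye on those
counters while a test runs could look like the following. It assumes the
hw.mps.<unit> sysctl names shown further down in this message and that the
counters are exported as plain integers; it is illustrative only, not part
of the patch:

/*
 * Poll the mps(4) unit 2 chain counters from user space.
 * Assumes the hw.mps.2.* names below and integer-sized values;
 * adjust the unit number and types for your setup.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
	const char *names[] = {
		"hw.mps.2.chain_free",
		"hw.mps.2.chain_free_lowwater",
		"hw.mps.2.chain_alloc_fail",
	};
	size_t i, len;
	int val;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		len = sizeof(val);
		if (sysctlbyname(names[i], &val, &len, NULL, 0) == -1) {
			perror(names[i]);
			continue;
		}
		printf("%s: %d\n", names[i], val);
	}
	return (0);
}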
> ~ Kashyap
>
> Kenneth D. Merry said:
> >
> > The firmware on those boards is a little old. You might consider
> > upgrading.

We updated the FW this morning and we're now showing:

mps0: port 0x5000-0x50ff mem 0xf5ff0000-0xf5ff3fff,0xf5f80000-0xf5fbffff irq 30 at device 0.0 on pci13
mps0: Firmware: 12.00.00.00
mps0: IOCCapabilities: 1285c
mps1: port 0x7000-0x70ff mem 0xfbef0000-0xfbef3fff,0xfbe80000-0xfbebffff irq 48 at device 0.0 on pci33
mps1: Firmware: 12.00.00.00
mps1: IOCCapabilities: 1285c
mps2: port 0x6000-0x60ff mem 0xfbcf0000-0xfbcf3fff,0xfbc80000-0xfbcbffff irq 56 at device 0.0 on pci27
mps2: Firmware: 12.00.00.00
mps2: IOCCapabilities: 1285c

We last updated around November of last year.

> > > # camcontrol inquiry da10
> > > pass21: Fixed Direct Access SCSI-5 device
> > > pass21: Serial Number 6XR14KYV0000B148LDKM
> > > pass21: 600.000MB/s transfers, Command Queueing Enabled
> >
> > That's a lot of drives! I've only run up to 60 drives.

See above. In general, I'm relatively pleased with how the system responds
with all these drives.

> > > When running the system under load, I see the following reported:
> > >
> > > hw.mps.2.allow_multiple_tm_cmds: 0
> > > hw.mps.2.io_cmds_active: 0
> > > hw.mps.2.io_cmds_highwater: 1019
> > > hw.mps.2.chain_free: 2048
> > > hw.mps.2.chain_free_lowwater: 0
> > > hw.mps.2.chain_alloc_fail: 13307   <---- ??

The current test case run is showing:

hw.mps.2.debug_level: 0
hw.mps.2.allow_multiple_tm_cmds: 0
hw.mps.2.io_cmds_active: 109
hw.mps.2.io_cmds_highwater: 1019
hw.mps.2.chain_free: 4042
hw.mps.2.chain_free_lowwater: 3597
hw.mps.2.chain_alloc_fail: 0

It may be a few hours before it progresses to the point where it ran low
last time.

> > Bump MPS_CHAIN_FRAMES to something larger. You can try 4096 and see
> > what happens.

Agreed. Let me know if you think there is anything we should add to the
patch above.

> > > A few layers up, it seems like it would be nice if the buffer
> > > exhaustion was reported outside of debug being enabled... at least
> > > maybe the first time.
> >
> > It used to report being out of chain frames every time it happened,
> > which wound up being too much. You're right, doing it once might be good.

Thanks, that's how I tried to put the patch together.

> > Once you bump up the number of chain frames to the point where you aren't
> > running out, I doubt the driver will be the big bottleneck. It'll probably
> > be other things higher up the stack.

Question: what "should" the layer of code above the mps driver do if the
driver returns ENOBUFS? I'm wondering if it might explain some incorrect
results.

> > What sort of ZFS topology did you try?
> >
> > I know for raidz2, and perhaps for raidz, ZFS is faster if your number
> > of data disks is a power of 2.
> >
> > If you want raidz2 protection, try creating arrays in groups of 10, so
> > you wind up having 8 data disks.

The fastest we've seen is with a pool made of mirrors, though this uses up
the most space. It also caused the most alloc fails (and leads to my
question about ENOBUFS).

Thank you both for your help. Any comments are always welcome! If I haven't
answered a question, or otherwise said something that doesn't make sense,
let me know.

Thanks,
John
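P.S. On the ENOBUFS question above: I am not claiming this is what the mps
driver actually does in that path, but my understanding of the usual CAM
pattern for a temporary resource shortage in a SIM driver is roughly the
sketch below: freeze the SIM queue, hand the CCB back marked
CAM_REQUEUE_REQ so CAM retries it later, and release the queue once
resources are available again. The function names and the sim/ccb plumbing
here are placeholders, not mps code:

#include <sys/param.h>
#include <sys/systm.h>
#include <cam/cam.h>
#include <cam/cam_ccb.h>
#include <cam/cam_sim.h>
#include <cam/cam_xpt_sim.h>

/*
 * Called when the driver cannot get a buffer for this request:
 * stop CAM from sending more I/O and ask it to requeue this CCB.
 */
static void
example_out_of_resources(struct cam_sim *sim, union ccb *ccb)
{
	xpt_freeze_simq(sim, 1);		/* hold further I/O from CAM */
	ccb->ccb_h.status = CAM_REQUEUE_REQ;	/* retry this CCB later */
	xpt_done(ccb);
}

/*
 * Called once buffers have been returned: drop the freeze count and
 * let the queued I/O run again.
 */
static void
example_resources_available(struct cam_sim *sim)
{
	xpt_release_simq(sim, 1);
}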