From owner-freebsd-scsi@FreeBSD.ORG Tue Jan 17 02:02:19 2012
Date: Tue, 17 Jan 2012 02:02:18 +0000
From: John
To: "Desai, Kashyap", "Kenneth D. Merry"
Cc: freebsd-scsi@freebsd.org
Message-ID: <20120117020218.GA59053@FreeBSD.org>
References: <20120114051618.GA41288@FreeBSD.org> <20120114232245.GA57880@nargothrond.kdm.org>
Subject: Re: mps driver chain_alloc_fail / performance ?

----- Desai, Kashyap's Original Message -----
> Which driver version is this ? In our 09.00.00.00 Driver (which is in
> pipeline to be committed) has 2048 chain buffer counter.

I'm not sure how to answer your question directly. We're using the driver
that comes with FreeBSD, not a driver directly from LSI. If we can get a
copy of your 9.0 driver, we can try testing against it.

> And our Test team has verified it with almost 150+ Drives.

Currently, we have 8 shelves with 25 drives per shelf, dual-attached and
configured with geom multipath in Active/Active mode. Ignoring the SSDs and
OS disks on the internal card, we see 400 da devices on mps1 & mps2.

For the record, the shelves are:

ses0 at mps1 bus 0 scbus7 target 0 lun 0
ses0: Fixed Enclosure Services SCSI-5 device
ses0: 600.000MB/s transfers
ses0: Command Queueing enabled
ses0: SCSI-3 SES Device

> As suggested by Ken, Can you try increasing MPS_CHAIN_FRAMES to 4096 OR 2048

Absolutely. The current value is 2048. We are currently running with this
patch to increase the value and output a single alerting message:

--- sys/dev/mps/mpsvar.h.orig	2012-01-15 19:28:51.000000000 -0500
+++ sys/dev/mps/mpsvar.h	2012-01-15 20:14:07.000000000 -0500
@@ -34,7 +34,7 @@
 #define MPS_REQ_FRAMES		1024
 #define MPS_EVT_REPLY_FRAMES	32
 #define MPS_REPLY_FRAMES	MPS_REQ_FRAMES
-#define MPS_CHAIN_FRAMES	2048
+#define MPS_CHAIN_FRAMES	4096
 #define MPS_SENSE_LEN		SSD_FULL_SIZE
 #define MPS_MSI_COUNT		1
 #define MPS_SGE64_SIZE		12
@@ -242,8 +242,11 @@
 		sc->chain_free--;
 		if (sc->chain_free < sc->chain_free_lowwater)
 			sc->chain_free_lowwater = sc->chain_free;
-	} else
+	} else {
 		sc->chain_alloc_fail++;
+		if (sc->chain_alloc_fail == 1)
+			device_printf(sc->mps_dev, "Insufficient chain_list buffers.\n");
+	}
 	return (chain);
 }

If the logic for outputting the message is appropriate, I think it would be
nice to get it committed.
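As an aside, a minimal user-space sketch for keeping an eye on those
counters while a test runs could look like the following. It assumes the
hw.mps.<unit> sysctl names shown further down in this message and that the
counters are exported as plain integers; it is illustrative only, not part
of the patch:

/*
 * Poll the mps(4) unit 2 chain counters from user space.
 * Assumes the hw.mps.2.* names below and integer-sized values;
 * adjust the unit number and types for your setup.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
	const char *names[] = {
		"hw.mps.2.chain_free",
		"hw.mps.2.chain_free_lowwater",
		"hw.mps.2.chain_alloc_fail",
	};
	size_t i, len;
	int val;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		len = sizeof(val);
		if (sysctlbyname(names[i], &val, &len, NULL, 0) == -1) {
			perror(names[i]);
			continue;
		}
		printf("%s: %d\n", names[i], val);
	}
	return (0);
}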
> ~ Kashyap
>
> Kenneth D. Merry said:
> >
> > The firmware on those boards is a little old. You might consider
> > upgrading.

We updated the FW this morning and we're now showing:

mps0: port 0x5000-0x50ff mem 0xf5ff0000-0xf5ff3fff,0xf5f80000-0xf5fbffff irq 30 at device 0.0 on pci13
mps0: Firmware: 12.00.00.00
mps0: IOCCapabilities: 1285c
mps1: port 0x7000-0x70ff mem 0xfbef0000-0xfbef3fff,0xfbe80000-0xfbebffff irq 48 at device 0.0 on pci33
mps1: Firmware: 12.00.00.00
mps1: IOCCapabilities: 1285c
mps2: port 0x6000-0x60ff mem 0xfbcf0000-0xfbcf3fff,0xfbc80000-0xfbcbffff irq 56 at device 0.0 on pci27
mps2: Firmware: 12.00.00.00
mps2: IOCCapabilities: 1285c

We last updated around November of last year.

> > > # camcontrol inquiry da10
> > > pass21: Fixed Direct Access SCSI-5 device
> > > pass21: Serial Number 6XR14KYV0000B148LDKM
> > > pass21: 600.000MB/s transfers, Command Queueing Enabled
> >
> > That's a lot of drives! I've only run up to 60 drives.

See above. In general, I'm relatively pleased with how the system responds
with all these drives.

> > > When running the system under load, I see the following reported:
> > >
> > > hw.mps.2.allow_multiple_tm_cmds: 0
> > > hw.mps.2.io_cmds_active: 0
> > > hw.mps.2.io_cmds_highwater: 1019
> > > hw.mps.2.chain_free: 2048
> > > hw.mps.2.chain_free_lowwater: 0
> > > hw.mps.2.chain_alloc_fail: 13307   <---- ??

The current test case run is showing:

hw.mps.2.debug_level: 0
hw.mps.2.allow_multiple_tm_cmds: 0
hw.mps.2.io_cmds_active: 109
hw.mps.2.io_cmds_highwater: 1019
hw.mps.2.chain_free: 4042
hw.mps.2.chain_free_lowwater: 3597
hw.mps.2.chain_alloc_fail: 0

It may be a few hours before it progresses to the point where it ran low
last time.

> > Bump MPS_CHAIN_FRAMES to something larger. You can try 4096 and see
> > what happens.

Agreed. Let me know if you think there is anything we should add to the
patch above.

> > > A few layers up, it seems like it would be nice if the buffer
> > > exhaustion was reported outside of debug being enabled... at least
> > > maybe the first time.
> >
> > It used to report being out of chain frames every time it happened,
> > which wound up being too much. You're right, doing it once might be good.

Thanks, that's how I tried to put the patch together.

> > Once you bump up the number of chain frames to the point where you aren't
> > running out, I doubt the driver will be the big bottleneck. It'll probably
> > be other things higher up the stack.

Question: what "should" the layer of code above the mps driver do if the
driver returns ENOBUFS? I'm wondering if it might explain some incorrect
results.

> > What sort of ZFS topology did you try?
> >
> > I know for raidz2, and perhaps for raidz, ZFS is faster if your number
> > of data disks is a power of 2.
> >
> > If you want raidz2 protection, try creating arrays in groups of 10, so
> > you wind up having 8 data disks.

The fastest we've seen is with a pool made of mirrors, though this uses up
the most space. It also caused the most alloc fails (and leads to my
question about ENOBUFS).

Thank you both for your help. Any comments are always welcome! If I haven't
answered a question, or otherwise said something that doesn't make sense,
let me know.

Thanks,
John
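P.S. On the ENOBUFS question above: I am not claiming this is what the mps
driver actually does in that path, but my understanding of the usual CAM
pattern for a temporary resource shortage in a SIM driver is roughly the
sketch below: freeze the SIM queue, hand the CCB back marked
CAM_REQUEUE_REQ so CAM retries it later, and release the queue once
resources are available again. The function names and the sim/ccb plumbing
here are placeholders, not mps code:

#include <sys/param.h>
#include <sys/systm.h>
#include <cam/cam.h>
#include <cam/cam_ccb.h>
#include <cam/cam_sim.h>
#include <cam/cam_xpt_sim.h>

/*
 * Called when the driver cannot get a buffer for this request:
 * stop CAM from sending more I/O and ask it to requeue this CCB.
 */
static void
example_out_of_resources(struct cam_sim *sim, union ccb *ccb)
{
	xpt_freeze_simq(sim, 1);		/* hold further I/O from CAM */
	ccb->ccb_h.status = CAM_REQUEUE_REQ;	/* retry this CCB later */
	xpt_done(ccb);
}

/*
 * Called once buffers have been returned: drop the freeze count and
 * let the queued I/O run again.
 */
static void
example_resources_available(struct cam_sim *sim)
{
	xpt_release_simq(sim, 1);
}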