From owner-freebsd-current@FreeBSD.ORG  Mon Mar 22 16:27:19 2010
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 93FC1106566B;
	Mon, 22 Mar 2010 16:27:19 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.freebsd.org (Postfix) with ESMTP id 5A26B8FC0C;
	Mon, 22 Mar 2010 16:27:19 +0000 (UTC)
Received: from phobos.samsco.home (phobos.samsco.home [192.168.254.11])
	(authenticated bits=0)
	by pooker.samsco.org (8.14.3/8.14.3) with ESMTP id o2MGRFPX050285;
	Mon, 22 Mar 2010 10:27:15 -0600 (MDT)
	(envelope-from scottl@samsco.org)
Mime-Version: 1.0 (Apple Message framework v1077)
Content-Type: text/plain; charset=us-ascii
From: Scott Long <scottl@samsco.org>
In-Reply-To: <3c0b01821003220852r61ca0ae3o95bea1c23ddc34d9@mail.gmail.com>
Date: Mon, 22 Mar 2010 10:27:15 -0600
Content-Transfer-Encoding: quoted-printable
Message-Id: <50456989-F196-4907-A170-85806A73D25F@samsco.org>
References: <4BA4E7A9.3070502@FreeBSD.org> <4BA6517C.3050509@FreeBSD.org>
	<20100322124018.7430f45e@ernst.jennejohn.org>
	<201003220839.12907.jhb@freebsd.org>
	<3c0b01821003220852r61ca0ae3o95bea1c23ddc34d9@mail.gmail.com>
To: Alexander Sack <pisymbol@gmail.com>
X-Mailer: Apple Mail (2.1077)
Cc: FreeBSD-Current <freebsd-current@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: Increasing MAXPHYS

On Mar 22, 2010, at 9:52 AM, Alexander Sack wrote:
> On Mon, Mar 22, 2010 at 8:39 AM, John Baldwin <jhb@freebsd.org> wrote:
>> On Monday 22 March 2010 7:40:18 am Gary Jennejohn wrote:
>>> On Sun, 21 Mar 2010 19:03:56 +0200
>>> Alexander Motin <mav@FreeBSD.org> wrote:
>>>
>>>> Scott Long wrote:
>>>>> Are there non-CAM drivers that look at MAXPHYS, or that silently assume that
>>>>> MAXPHYS will never be more than 128k?
>>>>
>>>> That is a question.
>>>>
>>>
>>> I only did a quick&dirty grep looking for MAXPHYS in /sys.
>>>
>>> Some drivers redefine MAXPHYS to be 512KiB.  Some use their own local
>>> MAXPHYS which is usually 128KiB.
>>>
>>> Some look at MAXPHYS to figure out other things; the details escape me.
>>>
>>> There's one driver which actually uses 100*MAXPHYS for something, but I
>>> didn't check the details.
>>>
>>> Lots of them were non-CAM drivers AFAICT.
>>
>> The problem is the drivers that _don't_ reference MAXPHYS.  The driver author
>> at the time "knew" that MAXPHYS was 128k, so he did the MAXPHYS-dependent
>> calculation and just put the result in the driver (e.g. only supporting up to
>> 32 segments (32 4k pages == 128k) in a bus dma tag as a magic number to
>> bus_dma_tag_create() w/o documenting that the '32' was derived from 128k or
>> what the actual hardware limit on nsegments is).  These cannot be found by a
>> simple grep, they require manually inspecting each driver.
>
> 100% awesome comment.  On another kernel, I myself was guilty of this
> crime (I did have a nice comment though above the def).
>
> This has been a great thread since our application really needs some
> of the optimizations that are being thrown around here.  We have found
> in real live performance testing that we are almost always either
> controller bound (i.e. adding more disks to spread IOPs has little to
> no effect in large array configurations on throughput, we suspect that
> is hitting the RAID controller's firmware limitations) or tps bound,
> i.e. I never thought going from 128k -> 256k per transaction would
> have a dramatic effect on throughput (but I never verified).
>
> Back to HBAs,  AFAIK, every modern iteration of the most popular HBAs
> can easily do way more than a 128k scatter/gather I/O.  Do you guys
> know of any *modern* (circa within the last 3-4 years) that can not do
> more than 128k at a shot?

I/O larger than 64K is broken in MPT at the moment.  The hardware can do
it, and the driver thinks it can do it, but it fails.  AAC hardware
traditionally cannot, though the firmware may have improved in the past
few years.  There are other low-performance devices that can't do more
than 64K or 128K, but none come to mind at the moment.  Still, it
shouldn't be a universal assumption that all hardware can do big I/Os.

Another consideration is that some hardware can do big I/Os, but not
very efficiently.  Not all DMA engines are created equal, and moving to
compound commands and excessively long S/G lists can be a pessimization.
For example, MFI hardware does a hinted prefetch on the segment list,
but once you exceed a certain limit, that prefetch no longer works and
the firmware has to take the slow path to execute the I/O.  I haven't
quantified this penalty yet, but it's something that should be
thought about.

>
> In other words, I've always thought the limit was kernel imposed and
> not what the memory controller on the card can do (I certainly never
> got the impression talking with some of the IHVs over the years that
> they were designing their hardware for a 128k limit - I sure hope
> not!).

You'd be surprised at the engineering compromises and handicaps that are
committed at IHVs because of misguided marketers.

Scott