Date:      Mon, 22 Mar 2010 11:52:44 -0400
From:      Alexander Sack <pisymbol@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        FreeBSD-Current <freebsd-current@freebsd.org>, freebsd-arch@freebsd.org
Subject:   Re: Increasing MAXPHYS
Message-ID:  <3c0b01821003220852r61ca0ae3o95bea1c23ddc34d9@mail.gmail.com>
In-Reply-To: <201003220839.12907.jhb@freebsd.org>
References:  <4BA4E7A9.3070502@FreeBSD.org> <4BA6517C.3050509@FreeBSD.org> <20100322124018.7430f45e@ernst.jennejohn.org> <201003220839.12907.jhb@freebsd.org>

On Mon, Mar 22, 2010 at 8:39 AM, John Baldwin <jhb@freebsd.org> wrote:
> On Monday 22 March 2010 7:40:18 am Gary Jennejohn wrote:
>> On Sun, 21 Mar 2010 19:03:56 +0200
>> Alexander Motin <mav@FreeBSD.org> wrote:
>>
>> > Scott Long wrote:
>> > > Are there non-CAM drivers that look at MAXPHYS, or that silently assume
>> > > that MAXPHYS will never be more than 128k?
>> >
>> > That is a question.
>> >
>>
>> I only did a quick&dirty grep looking for MAXPHYS in /sys.
>>
>> Some drivers redefine MAXPHYS to be 512KiB.  Some use their own local
>> MAXPHYS which is usually 128KiB.
>>
>> Some look at MAXPHYS to figure out other things; the details escape me.
>>
>> There's one driver which actually uses 100*MAXPHYS for something, but I
>> didn't check the details.
>>
>> Lots of them were non-CAM drivers AFAICT.
>
> The problem is the drivers that _don't_ reference MAXPHYS.  The driver author
> at the time "knew" that MAXPHYS was 128k, so he did the MAXPHYS-dependent
> calculation and just put the result in the driver (e.g. only supporting up to
> 32 segments (32 4k pages == 128k) in a bus dma tag as a magic number to
> bus_dma_tag_create() w/o documenting that the '32' was derived from 128k or
> what the actual hardware limit on nsegments is).  These cannot be found by a
> simple grep, they require manually inspecting each driver.

100% awesome comment.  On another kernel, I myself was guilty of this
crime (I did have a nice comment though above the def).
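
To make the "magic 32" problem concrete, here is a minimal sketch in plain
userland C, not taken from any real driver.  Every SKETCH_* and sketch_*
name, and the hw_max_segs parameter, is a hypothetical placeholder; the only
figures from the thread are 128k, 4k pages and 32 segments.  It contrasts the
undocumented constant with a count derived from the transfer limit and
clamped to an assumed hardware limit:

```c
/*
 * Sketch only: the SKETCH_* and sketch_* names are hypothetical and are
 * not taken from any real FreeBSD driver or header.
 */
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE   4096u            /* assumed 4k pages */
#define SKETCH_MAXPHYS     (128u * 1024u)   /* the historical 128k value */

/* The undocumented style: nothing says where 32 came from. */
#define SKETCH_NSEGS_MAGIC 32u

/* Pages needed to cover a page-aligned transfer of `bytes` bytes. */
static unsigned
sketch_pages_for(size_t bytes, size_t page_size)
{
	assert(page_size != 0);
	return (unsigned)((bytes + page_size - 1) / page_size);
}

/*
 * The documented style: derive the segment count from the transfer limit
 * and clamp it to what the hardware can do.  hw_max_segs is a hypothetical
 * value that would come from the controller's documentation.
 */
static unsigned
sketch_nsegs(size_t maxphys, size_t page_size, unsigned hw_max_segs)
{
	unsigned want = sketch_pages_for(maxphys, page_size);

	return (want < hw_max_segs ? want : hw_max_segs);
}
```

With a 128k limit and 4k pages the derived count equals the magic 32, which
is exactly why the hardcoded value goes unnoticed until the limit is raised.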

This has been a great thread since our application really needs some
of the optimizations that are being thrown around here.  We have found
in real-life performance testing that we are almost always either
controller bound (i.e. adding more disks to spread IOPs has little to
no effect on throughput in large array configurations; we suspect that
is the RAID controller's firmware limitations) or tps bound (I never
thought going from 128k -> 256k per transaction would have a dramatic
effect on throughput, but I never verified that).
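
As a back-of-the-envelope illustration of why transaction size matters when
a system is tps bound: if the transaction rate is pinned, throughput is just
rate times transfer size.  The sketch below assumes an idealized fixed rate;
the function name and any rate plugged into it are hypothetical, not
measurements from our testing:

```c
#include <stdint.h>

/*
 * Idealized model (an assumption, not a measurement): bytes per second
 * moved when `tps` transactions complete each second and every
 * transaction carries `xfer_bytes` bytes.
 */
static uint64_t
sketch_throughput(uint64_t tps, uint64_t xfer_bytes)
{
	return (tps * xfer_bytes);
}
```

Under that model, a hypothetical 1000 tps ceiling gives 131072000 bytes/s at
128k per transaction and 262144000 bytes/s at 256k.  Real hardware will not
necessarily hold the same rate at the larger size, so treat this as an upper
bound on the gain.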

Back to HBAs: AFAIK, every modern iteration of the most popular HBAs
can easily do way more than a 128k scatter/gather I/O.  Do you guys
know of any *modern* HBAs (from within the last 3-4 years) that cannot
do more than 128k at a shot?

In other words, I've always thought the limit was kernel imposed and
not what the memory controller on the card can do (I certainly never
got the impression talking with some of the IHVs over the years that
they were designing their hardware for a 128k limit - I sure hope
not!).

-aps


