From: Scott Long
Date: Mon, 22 Mar 2010 10:27:15 -0600
To: Alexander Sack
Cc: FreeBSD-Current, freebsd-arch@freebsd.org
Subject: Re: Increasing MAXPHYS
Message-Id: <50456989-F196-4907-A170-85806A73D25F@samsco.org>

On Mar 22, 2010, at 9:52 AM, Alexander Sack wrote:
> On Mon, Mar 22, 2010 at 8:39 AM, John Baldwin wrote:
>> On Monday 22 March 2010 7:40:18 am Gary Jennejohn wrote:
>>> On Sun, 21 Mar 2010 19:03:56 +0200
>>> Alexander Motin wrote:
>>>
>>>> Scott Long wrote:
>>>>> Are there non-CAM drivers that look at MAXPHYS, or that silently assume that
>>>>> MAXPHYS will never be more than 128k?
>>>>
>>>> That is a question.
>>>>
>>>
>>> I only did a quick&dirty grep looking for MAXPHYS in /sys.
>>>
>>> Some drivers redefine MAXPHYS to be 512KiB. Some use their own local
>>> MAXPHYS which is usually 128KiB.
>>>
>>> Some look at MAXPHYS to figure out other things; the details escape me.
>>>
>>> There's one driver which actually uses 100*MAXPHYS for something, but I
>>> didn't check the details.
>>>
>>> Lots of them were non-CAM drivers AFAICT.
>>
>> The problem is the drivers that _don't_ reference MAXPHYS. The driver author
>> at the time "knew" that MAXPHYS was 128k, so he did the MAXPHYS-dependent
>> calculation and just put the result in the driver (e.g. only supporting up to
>> 32 segments (32 4k pages == 128k) in a bus dma tag as a magic number to
>> bus_dma_tag_create() w/o documenting that the '32' was derived from 128k or
>> what the actual hardware limit on nsegments is). These cannot be found by a
>> simple grep, they require manually inspecting each driver.
>
> 100% awesome comment. On another kernel, I myself was guilty of this
> crime (I did have a nice comment though above the def).
>
> This has been a great thread since our application really needs some
> of the optimizations that are being thrown around here. We have found
> in real live performance testing that we are almost always either
> controller bound (i.e. adding more disks to spread IOPs has little to
> no effect in large array configurations on throughput, we suspect that
> is hitting the RAID controller's firmware limitations) or tps bound,
> i.e. I never thought going from 128k -> 256k per transaction would
> have a dramatic effect on throughput (but I never verified).
>
> Back to HBAs, AFAIK, every modern iteration of the most popular HBAs
> can easily do way more than a 128k scatter/gather I/O. Do you guys
> know of any *modern* (circa within the last 3-4 years) that can not do
> more than 128k at a shot?

I/O larger than 64K is broken in MPT at the moment. The hardware can do
it, the driver thinks it can do it, but it fails. AAC hardware
traditionally cannot, but maybe the firmware has been improved in the
past few years. I know that there are other low-performance devices that
can't do more than 64 or 128K, but none are coming to mind at the
moment. Still, it shouldn't be a universal assumption that all hardware
can do big I/Os.

Another consideration is that some hardware can do big I/Os, but not
very efficiently. Not all DMA engines are created equal, and moving to
compound commands and excessively long S/G lists can be a pessimization.
For example, MFI hardware does a hinted prefetch on the segment list,
but once you exceed a certain limit, that prefetch doesn't work anymore
and the firmware has to take the slow path to execute the I/O. I
haven't quantified this penalty yet, but it's something that should be
thought about.

>
> In other words, I've always thought the limit was kernel imposed and
> not what the memory controller on the card can do (I certainly never
> got the impression talking with some of the IHVs over the years that
> they were designing their hardware for a 128k limit - I sure hope
> not!).

You'd be surprised at the engineering compromises and handicaps that are
committed at IHVs because of misguided marketers.

Scott
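
To make John's point about the undocumented '32' concrete, here is a
minimal userland sketch of the arithmetic, not code from any real
driver. It derives the segment count from the maximum I/O size and the
page size instead of hard-coding the result, and clamps it to a hardware
limit. Every name below (EX_MAXPHYS, EX_PAGE_SIZE, EX_HW_MAX_SEGS,
ex_segs_for_io, ex_nsegments) is hypothetical, and the hardware limit of
64 is an assumed value chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's MAXPHYS and PAGE_SIZE. */
#define EX_MAXPHYS     (128 * 1024)
#define EX_PAGE_SIZE   4096

/* Assumed S/G entry limit of an imaginary controller. */
#define EX_HW_MAX_SEGS 64

/*
 * Number of page-sized segments needed to cover max_io bytes, rounding
 * up. This follows the "32 4k pages == 128k" arithmetic from the thread;
 * a buffer that does not start on a page boundary can touch one more
 * page than this.
 */
static size_t
ex_segs_for_io(size_t max_io, size_t page_size)
{
	assert(page_size > 0);
	return (max_io + page_size - 1) / page_size;
}

/*
 * Segment count a driver could pass on as its nsegments value: what the
 * I/O size requires, but never more than the hardware supports.
 */
static size_t
ex_nsegments(size_t max_io, size_t page_size, size_t hw_max_segs)
{
	size_t want = ex_segs_for_io(max_io, page_size);

	return want < hw_max_segs ? want : hw_max_segs;
}
```

With the values above, ex_nsegments(EX_MAXPHYS, EX_PAGE_SIZE,
EX_HW_MAX_SEGS) gives the familiar 32, and the result follows the I/O
size if it is raised, up to the hardware limit. Keeping the derivation
in the code also makes it visible to a grep for MAXPHYS.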