From: Alexander Sack
To: John Baldwin
Cc: FreeBSD-Current, gary.jennejohn@freenet.de, freebsd-arch@freebsd.org
Date: Mon, 22 Mar 2010 11:52:44 -0400
Subject: Re: Increasing MAXPHYS
Message-ID: <3c0b01821003220852r61ca0ae3o95bea1c23ddc34d9@mail.gmail.com>
In-Reply-To: <201003220839.12907.jhb@freebsd.org>

On Mon, Mar 22, 2010 at 8:39 AM, John Baldwin wrote:
> On Monday 22 March 2010 7:40:18 am Gary Jennejohn wrote:
>> On Sun, 21 Mar 2010 19:03:56 +0200
>> Alexander Motin wrote:
>>
>> > Scott Long wrote:
>> > > Are there non-CAM drivers that look at MAXPHYS, or that silently
>> > > assume that MAXPHYS will never be more than 128k?
>> >
>> > That is a question.
>> >
>>
>> I only did a quick & dirty grep looking for MAXPHYS in /sys.
>>
>> Some drivers redefine MAXPHYS to be 512KiB.  Some use their own local
>> MAXPHYS, which is usually 128KiB.
>>
>> Some look at MAXPHYS to figure out other things; the details escape me.
>>
>> There's one driver which actually uses 100*MAXPHYS for something, but I
>> didn't check the details.
>>
>> Lots of them were non-CAM drivers AFAICT.
>
> The problem is the drivers that _don't_ reference MAXPHYS.  The driver
> author at the time "knew" that MAXPHYS was 128k, so he did the
> MAXPHYS-dependent calculation and just put the result in the driver (e.g.
> only supporting up to 32 segments (32 4k pages == 128k) in a bus dma tag
> as a magic number to bus_dma_tag_create() w/o documenting that the '32'
> was derived from 128k or what the actual hardware limit on nsegments is).
> These cannot be found by a simple grep; they require manually inspecting
> each driver.

100% awesome comment.  On another kernel I myself was guilty of this crime
(though I did have a nice comment above the definition).

This has been a great thread, since our application really needs some of the
optimizations being thrown around here.  In real-life performance testing we
have found that we are almost always either controller bound (adding more
disks to spread IOPs across has little to no effect on throughput in large
array configurations; we suspect we are hitting the RAID controller's
firmware limits) or tps bound; i.e., I never thought going from 128k to 256k
per transaction would have a dramatic effect on throughput (but I never
verified it).

Back to HBAs: AFAIK, every modern iteration of the most popular HBAs can
easily do way more than a 128k scatter/gather I/O.  Do you guys know of any
*modern* HBAs (from within the last 3-4 years) that cannot do more than 128k
in a shot?  In other words, I've always thought the limit was kernel-imposed
and not a limit of the memory controller on the card (I certainly never got
the impression, talking with some of the IHVs over the years, that they were
designing their hardware around a 128k limit - I sure hope not!).

-aps
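P.S. For anyone following along, here is roughly the kind of tag setup John
is describing.  This is only a sketch, not lifted from any real driver: the
softc field (sc->data_dmat), the HW_MAX_SEGS constant, and the usual attach
locals (dev, error) are made-up stand-ins, and a real driver would clamp to
whatever its hardware's S/G list limit actually is.

    /*
     * Hard-coded variant: 32 segments because 32 * 4k pages == 128k ==
     * MAXPHYS "at the time".  Nothing records where the 32 came from, so
     * raising MAXPHYS later buys this driver nothing.
     */
    error = bus_dma_tag_create(
        bus_get_dma_tag(dev),       /* parent */
        1, 0,                       /* alignment, boundary */
        BUS_SPACE_MAXADDR,          /* lowaddr */
        BUS_SPACE_MAXADDR,          /* highaddr */
        NULL, NULL,                 /* filter, filterarg */
        MAXPHYS,                    /* maxsize */
        32,                         /* nsegments -- the magic number */
        MAXPHYS,                    /* maxsegsz */
        0,                          /* flags */
        NULL, NULL,                 /* lockfunc, lockfuncarg */
        &sc->data_dmat);

    /*
     * MAXPHYS-aware variant: derive the segment count and clamp it to the
     * hardware's documented S/G limit (HW_MAX_SEGS here).  The +1 covers a
     * maximal transfer that starts in the middle of a page.
     */
    error = bus_dma_tag_create(
        bus_get_dma_tag(dev), 1, 0,
        BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR,
        NULL, NULL,
        MAXPHYS,                                     /* maxsize */
        MIN((MAXPHYS / PAGE_SIZE) + 1, HW_MAX_SEGS), /* nsegments */
        MAXPHYS,                                     /* maxsegsz */
        0, NULL, NULL,
        &sc->data_dmat);

Only the second form turns up in a grep for MAXPHYS; the first one you only
find by reading the driver, which is exactly John's point.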