From: Scott Long
Date: Sun, 21 Mar 2010 10:13:01 -0600
To: Alexander Motin
Cc: freebsd-current@freebsd.org, Ivan Voras, freebsd-arch@freebsd.org
Subject: Re: Increasing MAXPHYS

On Mar 21, 2010, at 8:05 AM, Alexander Motin wrote:
> Ivan Voras wrote:
>> Julian Elischer wrote:
>>> You can get better throughput by using TSC for timing because the geom
>>> and devstat code does a bit of timing. Geom can be told to turn off
>>> its timing but devstat can't. The 170 ktps is with TSC as timer,
>>> and geom timing turned off.
>>
>> I see. I just ran randomio on a gzero device and with 10 userland
>> threads (this is a slow 2xquad machine) I get g_up and g_down saturated
>> fast with ~120 ktps. Randomio uses gettimeofday() for measurements.
>
> I've just got 140 ktps from two real Intel X25-M SSDs on an ICH10R AHCI
> controller and a single Core2Quad CPU. So at least in synthetic tests it
> is potentially reachable even with commodity hardware, though it
> completely saturated the quad-core CPU.
>
>> Hmm, it looks like it could be easy to spawn more g_* threads (and,
>> barring specific class behaviour, it has a fair chance of working out of
>> the box), but the incoming queue will also need to be broken up for
>> greater effect.
>
> According to the notes, it looks like there is a good chance of races, as
> some places expect only one up and one down thread.

I agree that more threads just create many more race complications. Even
if they didn't, the storage driver is a serialization point; it doesn't
matter if you have a dozen g_* threads if only one of them can be in the
top half of the driver at a time. No amount of fine-grained locking is
going to help this. I'd like to go in the opposite direction.
The queue-dispatch-queue model of GEOM is elegant and easy to extend, but
it is very wasteful for the simple case, where "simple" means one or two
partition transforms (mbr, bsdlabel) and/or a simple stripe/mirror
transform. None of these needs a dedicated dispatch context in order to
operate. What I'd like to explore is compiling the GEOM stack at creation
time into a linear array of operations that happen without a g_down/g_up
context switch. As providers and consumers taste each other and build a
stack, that stack gets compiled into a graph, and that graph gets executed
directly from the calling context, both from the dev_strategy() side on
the top and the bio_done() side on the bottom. GEOM classes that need a
detached context can mark themselves as such; doing so will prevent a
graph from being created, and the current dispatch model will be retained.

I expect that this will reduce I/O latency by a great margin, directly
addressing the performance problem that FusionIO makes an example of. I'd
also like to explore having the g_bio model not require a malloc at every
stage in the stack/graph; even though going through UMA is fairly fast, it
still represents overhead that can be eliminated, and it represents an
out-of-memory failure case that can be prevented.

I might try to work on this over the summer. It's really a research
project in my head at this point, but I'm hopeful that it'll show results.

Scott
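
[Editorial sketch] To make the compiled-graph idea a bit more concrete, here
is a rough, hedged C sketch of what a "compiled" stack might look like. All
of the names here (struct xbio, compiled_stage, compiled_graph_start, the
offsets) are hypothetical and are not part of the existing geom(4) code; a
real implementation would also need the completion (bio_done()) path, error
handling, and locking that this sketch omits.

/*
 * Hypothetical sketch only -- none of these names exist in geom(4).
 * The idea: at taste/attach time, flatten a stack of simple transforms
 * (mbr, bsdlabel, stripe offset, ...) into a small preallocated array
 * that is walked entirely in the caller's context, so the common case
 * needs no g_down/g_up queue hop and no per-stage allocation.
 */
#include <stddef.h>
#include <stdint.h>

struct xbio {                           /* stand-in for struct bio */
    uint64_t    offset;                 /* byte offset into the provider */
    size_t      length;                 /* transfer length */
    void        *data;                  /* caller-supplied buffer */
};

/* One compiled stage: adjusts the request in place (e.g. adds an offset). */
typedef void compiled_stage_fn(struct xbio *bp, void *arg);

struct compiled_stage {
    compiled_stage_fn   *fn;
    void                *arg;
};

struct compiled_graph {
    int                     nstages;
    struct compiled_stage   stages[8];  /* fixed storage: no malloc per I/O */
    void                    (*driver_start)(struct xbio *bp); /* bottom */
};

/* Example transform: a partition layer is just an offset shift. */
static void
offset_stage(struct xbio *bp, void *arg)
{
    bp->offset += *(const uint64_t *)arg;
}

/* Example "compile" step: an mbr slice with a bsdlabel partition inside. */
static uint64_t slice_off = 63ULL * 512;      /* hypothetical slice start */
static uint64_t part_off  = 16384ULL * 512;   /* hypothetical partition start */

static void
compile_example(struct compiled_graph *gp, void (*driver_start)(struct xbio *))
{
    gp->nstages = 2;
    gp->stages[0] = (struct compiled_stage){ offset_stage, &slice_off };
    gp->stages[1] = (struct compiled_stage){ offset_stage, &part_off };
    gp->driver_start = driver_start;
}

/*
 * Dispatch: run every stage inline and hand the request to the driver,
 * all from the calling thread (the dev_strategy() side).
 */
static void
compiled_graph_start(struct compiled_graph *gp, struct xbio *bp)
{
    for (int i = 0; i < gp->nstages; i++)
        gp->stages[i].fn(bp, gp->stages[i].arg);
    gp->driver_start(bp);
}

Completion would presumably walk the same preallocated array back up from
the driver's done path; because the compiled graph owns its stage storage,
the per-stage g_bio allocation, and its out-of-memory failure mode, could be
avoided for the common case, matching the goals described above.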