From: Andrew Bates
To: Adrian Chadd
Cc: "freebsd-hackers@freebsd.org", Jeff Roberson
Subject: Re: Working on NUMA support
Date: Tue, 29 Jul 2014 14:51:19 -0700
Hey Adrian,

Yes, there has been progress on this - although admittedly not as much as we'd like at this point. Regarding what you're talking about, I believe we have the layout for CPU affinity/locality. I need to go through and clean up a good half-dozen branches of code. Myself a mere mortal standing on the shoulders of giants in a room of titans, I have to merge my changes with Jeff's pertinent branch to get this closer to usable.

From my experience and research, in terms of access/response time:

1. localized DMA < all remote
2. (localized DMA + spillover remote) >= all remote

As ugly as it may be, I think I said that right. There have been a few changes since that original email, but yes, what we're working to address is the userland <---> kernelspace interface.

On Sat, Jul 26, 2014 at 1:11 PM, Adrian Chadd wrote:
> Hi all!
>
> Has there been any further progress on this?
>
> I've been working on making the receive side scaling support usable by
> mere mortals and I've reached a point where I'm going to need this
> awareness in the 10ge/40ge drivers for the hardware I have access to.
>
> I'm right now more interested in the kernel driver/allocator side of
> things, so:
>
> * when bringing up a NIC, figure out what are the "most local" CPUs to
> run on;
> * for each NIC queue, figure out what the "most local" bus resources
> are for NIC resources like descriptors and packet memory (eg mbufs);
> * for each NIC queue, figure out what the "most local" resources are
> for local driver structures that the NIC doesn't touch (eg per-queue
> state);
> * for each RSS bucket, figure out what the "most local" resources are
> for things like packet memory (mbufs), tcp/udp/inp control structures,
> etc.
>
> I had a chat with jhb yesterday and he reminded me that y'all at
> Isilon have been looking into this.
>
> He described a few interesting cases from the kernel side to me.
>
> * On architectures with external IO controllers, the path cost from an
> IO device to multiple CPUs may be (almost) equivalent, so there's not
> a huge penalty to allocate things on the wrong CPU. I think it'll be
> nice to get CPU-local affinity where possible so we can parallelise
> DRAM access fully, but we can play with this and see.
> * On architectures with CPU-integrated IO controllers, there's a large
> penalty for doing inter-CPU IO,
> * .. but there's not such a huge penalty for doing inter-CPU memory access.
>
> Given that, we may find that we should always put the IO resources
> local to the CPU the device is attached to, even if we decide to run
> some / all of the IO for the device on another CPU. I.e., any RAM that
> the IO device is doing data or descriptor DMA into should be local to
> that device. John said that in his experience it seemed the penalty for
> a non-local CPU touching memory was much less than device DMA crossing
> QPI.
>
> So the tricky bit is figuring that out and expressing it all in a way
> that allows us to do memory allocation and CPU binding in a more aware
> way. The other half of this tricky thing is to allow it to be easily
> overridden by a curious developer or system administrator who wants
> to experiment with different policies.
>
> Now, I'm very specifically only addressing the low-level kernel IO /
> memory allocation requirements here. There are other things to worry
> about up in userland; I think you're trying to address that in your
> KPI descriptions.
>
> Thoughts?
>
>
> -a

-- 
V/Respectfully,
Andrew M Bates