From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 24 00:16:47 2013
Date: Mon, 23 Sep 2013 17:16:44 -0700
Subject: Re: About Transparent Superpages and Non-transparent superpages
From: Adrian Chadd <adrian.chadd@gmail.com>
To: Sebastian Kuzminsky
Cc: Patrick Dung, freebsd-hackers@freebsd.org, ivoras@freebsd.org
List-Id: Technical Discussions relating to FreeBSD

On 23 September 2013 14:30, Sebastian Kuzminsky wrote:

> On Sep 23, 2013, at 15:24 , Adrian Chadd wrote:
>
> > On 20 September 2013 08:20, Sebastian Kuzminsky wrote:
> >
> > > It's transparent for the kernel: all of UMA and
> > > kmem_malloc()/kmem_free() is backed by 1 gig superpages.
> >
> > .. not entirely true, as I've found out at work. :(
>
> Can you expand on this, Adrian?
>
> Did you compile & boot the github branch I pointed to, and run into a
> situation where kmem_malloc() returned memory not backed by 1 gig pages,
> on hardware that supports it?
>
> I haven't done that yet, sorry.

So the direct map is backed by 1GB pages, except where it can't be:

* the first 1GB - because of the memory hole(s)
* the 4th GB - because of the PCI IO hole(s)
* the end of RAM - because of memory remapping (done so you don't lose
  hundreds of megabytes of RAM behind said memory/IO/ROM holes), the end
  of RAM isn't on a 1GB boundary

So those regions get mapped with smaller pages.

I'm still tinkering with this; I'd like to hack things up to (a) get all
the VM structures into the last gig of aligned RAM, so they fall inside a
single 1GB direct-mapped page, and (b) prefer that 1GB page for kernel
allocations, so things like mbufs, vm_page entries, etc. all end up coming
from the same 1GB direct-map page.

I _think_ I have an idea of what to do - I'll create a couple of 1GB-sized
freelists over the last two 1GB direct-mapped regions at the end of RAM,
then hack up the vm_phys allocator to prefer allocating from those.
The VM structures are a bit more annoying - they get allocated from the
top of RAM early during boot, so unless your machine's last region of RAM
falls exactly on a 1GB boundary, they'll be backed by 4KB/2MB pages. I
tested this by setting hw.physmem to force things to round to a 1GB
boundary, and it helped for a while. Unfortunately, because everything
else gets allocated from random places in physical memory, I'm thrashing
the TLB: there are only four 1GB TLB slots on a Sandy Bridge Xeon, and
with 64GB of RAM I'm seeing a 10-12% miss load when serving lots of
traffic from SSD (all of it hitting mbuf and VM structure allocations.)

So, if I can get back that ~10% of CPU cycles spent walking page tables,
I'll be happy.

Note: I'm a newbie here in the physical mapping code. :-)

-adrian
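P.S. For anyone wanting to repeat the hw.physmem experiment: it's a
loader tunable, so a loader.conf fragment along these lines does it (the
value here is just an example for a 64GB box - pick a 1GB-aligned amount
that fits your machine):

```
# /boot/loader.conf -- example only: clamp usable RAM to a 1GB-aligned value
hw.physmem="63G"
```

After a reboot, `sysctl hw.physmem` should report the clamped size (in
bytes).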