From: Alan Cox
Reply-To: alc@freebsd.org
To: Chris Torek, neel@freebsd.org
Cc: freebsd-current
Date: Mon, 15 Jul 2013 12:41:39 -0700
Subject: Re: expanding past 1 TB on amd64
In-Reply-To: <201306190832.r5J8WZFE082135@elf.torek.net>

On Wed, Jun 19, 2013 at 1:32 AM, Chris Torek wrote:

> In src/sys/amd64/include/vmparam.h is this handy map:
>
> * 0x0000000000000000 - 0x00007fffffffffff user map
> * 0x0000800000000000 - 0xffff7fffffffffff does not exist (hole)
> * 0xffff800000000000 - 0xffff804020100fff recursive page table (512GB slot)
> * 0xffff804020101000 - 0xfffffdffffffffff unused
> * 0xfffffe0000000000 - 0xfffffeffffffffff 1TB direct map
> * 0xffffff0000000000 - 0xffffff7fffffffff unused
> * 0xffffff8000000000 - 0xffffffffffffffff 512GB kernel map
>
> showing that the system can deal with at most 1 TB of address space
> (because of the direct map), using at most half of that for kernel
> memory (less, really, due to the inevitable VM fragmentation).
>
> New boards are coming soonish that will have the ability to go
> past that (24 DIMMs of 64 GB each = 1.5 TB).  Or, if some crazy
> people :-) might want to use most of a 768 GB board (24 DIMMs of
> 32 GB each, possible today although the price is kind of
> staggering) as wired-down kernel memory, the 512 GB VM area is
> already a problem.
>
> I have not wrapped my head around the amd64 pmap code but figured
> I'd ask: what might need to change to support larger spaces?
> Obviously NKPML4E in amd64/include/pmap.h, for the kernel start
> address; and NDMPML4E for the direct map.  It looks like this
> would adjust KERNBASE and the direct map appropriately.  But would
> that suffice, or have I missed something?
>
> For that matter, if these are changed to make space for future
> expansion, what would be a good expansion size?  Perhaps multiply
> the sizes by 16?  (If memory doubles roughly every 18 months,
> that should give room for at least 5 years.)

Chris, Neel,

The actual data that I've seen shows that DIMMs are doubling in size
at about half that pace, about every three years.  For example, see
http://users.ece.cmu.edu/~omutlu/pub/mutlu_memory-scaling_imw13_invited-talk.pdf,
slide #8.  So, I think that a factor of 16 is a lot more than we'll
need in the next five years.  I would suggest configuring the kernel
virtual address space for 4 TB.

Once you go beyond 512 GB, 4 TB is the next "plateau" in terms of
address translation cost.  At 4 TB, all of the PML4 entries for the
kernel virtual address space will reside in the same L2 cache line,
so a page table walk on a TLB miss for an instruction fetch will
effectively prefetch the PML4 entry for the kernel heap, and vice
versa.

Also, I don't know if this is immediately relevant to the patch, but
the reason that the direct map is currently twice the size of the
kernel virtual address space is that the largest machine (in terms
of physical memory) that we were running on a couple of years ago
had a sparse physical address space.  Specifically, we needed a
direct map spanning 1 TB in order to support 256 GB of RAM on that
machine.  This may, for example, become an issue if you try to
autosize the direct map based upon the amount of DRAM.

Alan
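
A minimal sketch of the PML4 slot arithmetic behind the NKPML4E /
NDMPML4E question above.  NKPML4E and NDMPML4E are the real pmap.h
constants; the program itself is illustrative, with values matching
the map quoted at the top:

#include <stdint.h>
#include <stdio.h>

/* Each PML4 entry maps 9 + 9 + 9 + 12 = 39 bits of address: 512 GB. */
#define PML4SHIFT 39
#define PML4_SPAN (1ULL << PML4SHIFT)

#define NKPML4E  1  /* PML4 slots for the kernel map: 512 GB */
#define NDMPML4E 2  /* PML4 slots for the direct map: 1 TB */

int
main(void)
{
        /* The kernel map occupies the top NKPML4E slots of the PML4,
         * so its base sits NKPML4E * 512 GB below the top of the
         * address space: 0xffffff8000000000 for NKPML4E = 1. */
        uint64_t kva_base = 0ULL - (uint64_t)NKPML4E * PML4_SPAN;

        printf("kernel map: %llu GB at %#llx\n",
            (unsigned long long)NKPML4E * (PML4_SPAN >> 30),
            (unsigned long long)kva_base);
        printf("direct map: %llu GB\n",
            (unsigned long long)NDMPML4E * (PML4_SPAN >> 30));
        return (0);
}

Growing the kernel map to the 4 TB suggested above would then be
NKPML4E = 8.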
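
The cache-line arithmetic behind that "plateau" can be checked at
compile time.  A back-of-the-envelope sketch, assuming the usual
amd64 figures of 8-byte page-table entries and 64-byte cache lines:

/* One PML4 entry maps 512 GB and is itself 8 bytes. */
#define PML4_SPAN  (1ULL << 39)
#define PML4E_SIZE 8
#define CACHE_LINE 64

/* A 4 TB kernel map needs 4 TB / 512 GB = 8 PML4 entries, and
 * 8 entries * 8 bytes = 64 bytes: exactly one cache line, so a
 * page-table walk that misses on any one of them pulls in all
 * eight. */
#define KVA_SIZE (4ULL << 40)
_Static_assert(KVA_SIZE / PML4_SPAN * PML4E_SIZE == CACHE_LINE,
    "4 TB of KVA fills exactly one cache line of PML4 entries");

/* The next doubling, 8 TB, would spill into a second line. */
_Static_assert(2 * KVA_SIZE / PML4_SPAN * PML4E_SIZE == 2 * CACHE_LINE,
    "8 TB of KVA needs a second cache line");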
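
The sparse-memory caveat in the last paragraph, sketched in code.
The segment layout here is hypothetical (the message above says only
that 256 GB of RAM needed a 1 TB direct map), but it shows why
autosizing has to key off the highest physical address rather than
the DRAM total:

#include <stdint.h>

#define PML4_SPAN (1ULL << 39)  /* 512 GB per direct-map slot */

/* Hypothetical sparse layout as (start, end) pairs, in the style of
 * the amd64 physmap[] array: 256 GB of RAM in two 128 GB banks, the
 * second one starting at 512 GB. */
uint64_t physmap[] = {
        0x0000000000ULL, 0x2000000000ULL,  /* 128 GB at 0 */
        0x8000000000ULL, 0xa000000000ULL,  /* 128 GB at 512 GB */
};
#define NSEGS (sizeof(physmap) / sizeof(physmap[0]) / 2)

/* Wrong: total DRAM is 256 GB, which rounds up to one 512 GB slot. */
uint64_t
dmap_slots_by_total(void)
{
        uint64_t total = 0;

        for (unsigned i = 0; i < NSEGS; i++)
                total += physmap[2 * i + 1] - physmap[2 * i];
        return ((total + PML4_SPAN - 1) / PML4_SPAN);   /* = 1 */
}

/* Right: the direct map must reach the highest physical address,
 * 640 GB here, so two slots (1 TB) are needed -- matching the 256 GB
 * machine described above that needed a 1 TB direct map. */
uint64_t
dmap_slots_by_highest(void)
{
        return ((physmap[2 * NSEGS - 1] + PML4_SPAN - 1) / PML4_SPAN); /* = 2 */
}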