Date:      Tue, 5 Feb 2013 09:12:41 -0800
From:      Neel Natu <neelnatu@gmail.com>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        alc@freebsd.org, davide@freebsd.org, hackers@freebsd.org, avg@freebsd.org, rank1seeker@gmail.com
Subject:   Re: dynamically calculating NKPT [was: Re: huge ktr buffer]
Message-ID:  <CAFgRE9GMeY4dVAzqUsHz2emo82dVODBDw2xYJMcPmxxTm6Rx=g@mail.gmail.com>
In-Reply-To: <20130205151413.GL2522@kib.kiev.ua>
References:  <CAFgRE9F4JMutV9jJ_m7_9va67xiX4YXMT%2BRm6rUoDPMPymsg4w@mail.gmail.com> <20130205151413.GL2522@kib.kiev.ua>

Hi Konstantin,

On Tue, Feb 5, 2013 at 7:14 AM, Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Mon, Feb 04, 2013 at 03:05:15PM -0800, Neel Natu wrote:
>> Hi,
>>
>> I have a patch to dynamically calculate NKPT for amd64 kernels. This
>> should fix the various issues that people pointed out in the email
>> thread.
>>
>> Please review and let me know if there are any objections to committing this.
>>
>> Also, thanks to Alan (alc@) for reviewing and providing feedback on
>> the initial version of the patch.
>>
>> Patch (also available at http://people.freebsd.org/~neel/patches/nkpt_diff.txt):
>>
>> Index: sys/amd64/include/pmap.h
>> ===================================================================
>> --- sys/amd64/include/pmap.h  (revision 246277)
>> +++ sys/amd64/include/pmap.h  (working copy)
>> @@ -113,13 +113,7 @@
>>       ((unsigned long)(l2) << PDRSHIFT) | \
>>       ((unsigned long)(l1) << PAGE_SHIFT))
>>
>> -/* Initial number of kernel page tables. */
>> -#ifndef NKPT
>> -#define      NKPT            32
>> -#endif
>> -
>>  #define NKPML4E              1               /* number of kernel PML4 slots */
>> -#define NKPDPE               howmany(NKPT, NPDEPG)/* number of kernel PDP slots */
>>
>>  #define      NUPML4E         (NPML4EPG/2)    /* number of userland PML4 pages */
>>  #define      NUPDPE          (NUPML4E*NPDPEPG)/* number of userland PDP pages */
>> @@ -181,6 +175,7 @@
>>  #define      PML4map         ((pd_entry_t *)(addr_PML4map))
>>  #define      PML4pml4e       ((pd_entry_t *)(addr_PML4pml4e))
>>
>> +extern int nkpt;             /* Initial number of kernel page tables */
>>  extern u_int64_t KPDPphys;   /* physical address of kernel level 3 */
>>  extern u_int64_t KPML4phys;  /* physical address of kernel level 4 */
>>
>> Index: sys/amd64/amd64/minidump_machdep.c
>> ===================================================================
>> --- sys/amd64/amd64/minidump_machdep.c        (revision 246277)
>> +++ sys/amd64/amd64/minidump_machdep.c        (working copy)
>> @@ -232,7 +232,7 @@
>>       /* Walk page table pages, set bits in vm_page_dump */
>>       pmapsize = 0;
>>       pdp = (uint64_t *)PHYS_TO_DMAP(KPDPphys);
>> -     for (va = VM_MIN_KERNEL_ADDRESS; va < MAX(KERNBASE + NKPT * NBPDR,
>> +     for (va = VM_MIN_KERNEL_ADDRESS; va < MAX(KERNBASE + nkpt * NBPDR,
>>           kernel_vm_end); ) {
>>               /*
>>                * We always write a page, even if it is zero. Each
>> @@ -364,7 +364,7 @@
>>       /* Dump kernel page directory pages */
>>       bzero(fakepd, sizeof(fakepd));
>>       pdp = (uint64_t *)PHYS_TO_DMAP(KPDPphys);
>> -     for (va = VM_MIN_KERNEL_ADDRESS; va < MAX(KERNBASE + NKPT * NBPDR,
>> +     for (va = VM_MIN_KERNEL_ADDRESS; va < MAX(KERNBASE + nkpt * NBPDR,
>>           kernel_vm_end); va += NBPDP) {
>>               i = (va >> PDPSHIFT) & ((1ul << NPDPEPGSHIFT) - 1);
>>
>> Index: sys/amd64/amd64/pmap.c
>> ===================================================================
>> --- sys/amd64/amd64/pmap.c    (revision 246277)
>> +++ sys/amd64/amd64/pmap.c    (working copy)
>> @@ -202,6 +202,10 @@
>>  vm_offset_t virtual_avail;   /* VA of first avail page (after kernel bss) */
>>  vm_offset_t virtual_end;     /* VA of last avail page (end of kernel AS) */
>>
>> +int nkpt;
>> +SYSCTL_INT(_machdep, OID_AUTO, nkpt, CTLFLAG_RD, &nkpt, 0,
>> +    "Number of kernel page table pages allocated on bootup");
>> +
>>  static int ndmpdp;
>>  static vm_paddr_t dmaplimit;
>>  vm_offset_t kernel_vm_end = VM_MIN_KERNEL_ADDRESS;
>> @@ -495,17 +499,42 @@
>>
>>  CTASSERT(powerof2(NDMPML4E));
>>
>> +/* number of kernel PDP slots */
>> +#define      NKPDPE(ptpgs)           howmany((ptpgs), NPDEPG)
>> +
>>  static void
>> +nkpt_init(vm_paddr_t addr)
>> +{
>> +     int pt_pages;
>> +
>> +#ifdef NKPT
>> +     pt_pages = NKPT;
>> +#else
>> +     pt_pages = howmany(addr, 1 << PDRSHIFT);
>> +     pt_pages += NKPDPE(pt_pages);
>> +
>> +     /*
>> +      * Add some slop beyond the bare minimum required for bootstrapping
>> +      * the kernel.
>> +      *
>> +      * This is quite important when allocating KVA for kernel modules.
>> +      * The modules are required to be linked in the negative 2GB of
>> +      * the address space.  If we run out of KVA in this region then
>> +      * pmap_growkernel() will need to allocate page table pages to map
>> +      * the entire 512GB of KVA space which is an unnecessary tax on
>> +      * physical memory.
>> +      */
>> +     pt_pages += 4;          /* 8MB additional slop for kernel modules */
> 8MB might be too low. I just checked one of my machines with a fully
> modularized kernel; it takes slightly more than 6MB to load 50 modules.
> I think that 16MB would be safer, but it probably needs to be scaled
> down based on the available physical memory. An amd64 kernel can still
> be booted on a 128MB machine.

Sounds fine. I can bump it up to 8 pages.

Also, regarding your comment about scaling this number based on
available memory: I wonder whether it makes sense to optimize for
16KB of additional space.

I would much rather work with you and Alan to fix pmap_growkernel() so
we don't need to care about this slack in the first place :-)

best
Neel


