Date:      Wed, 22 Aug 2001 00:29:28 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Mitsuru IWASAKI <iwasaki@jp.FreeBSD.org>
Cc:        peter@wemm.org, arch@FreeBSD.ORG, audit@FreeBSD.ORG, kumabu@t3.rim.or.jp
Subject:   Re: CFR: Timing to enable CR4.PGE bit
Message-ID:  <3B835F58.68534CCE@mindspring.com>
References:  <20010809035801V.iwasaki@jp.FreeBSD.org> <20010817072149.0BCD63811@overcee.netplex.com.au> <20010822020634P.iwasaki@jp.FreeBSD.org>

Mitsuru IWASAKI wrote:
> > This part is fine.
> 
> OK, I'll commit this one first.

What does setting PGE early do for you?

I use PGE to avoid TLB shootdown on a number of memory regions
shared between user and kernel space (including zero system
call time functions), but setting it early seems wrong.

Specifically, the conceptual idea is to make a VM that looks
exactly like real memory, with the smallest relocation code
chunk possible, so that as much as possible can be done in C
code, and there's as little strangeness as possible (e.g. the
evil that is machdep.c, and the "magic" numbers in pmap.h
that have to match exactly the magic address at which the
kernel gets linked, and have to be offset exactly by the SMP
pages and other "off by one" hidden values).


> > However:
> >
> > > Also I have another thing to be confirmed.  Should we utilize TLB by
> > > enabling PGE bit at very later stage?  I think it would be more
> > > efficient to cache page entries with G flag in multi-user environment,
> > > not in kernel bootstrap.  If we enable PGE bit in locore.s, TLB could
> > > be occupied by entries which is referenced by initialization code
> > > (yes, most of them are executed only once).
> > > # but I could be wrong...

PGE might be useful for shared libraries.  It's set on the
kernel itself, which means that trapping to kernel mode does
not end up costing unnecessary overhead.  It's kind of ugly
when the kernel is mapped with a 4M page, since that loses the
page table page for the 4k pages (yuck), and it's not nice for
the case where the kernel gets larger than 4M.

From a practical point of view, the hassle of having to clear
and set a bit in CR4 to cause the TLB shootdown to occur
is not really worth setting the PGE bit so early that you do
not have most of the PTEs set up.


> > The G bit does not "lock" the TLB entries in.  All it does is stop
> > unnecessary flushes when %cr3 is changed.  If entries are not used
> > for a short while, they will be recycled when the TLB slot is needed
> > for something else soon enough.  ie: this should not be a problem.

It also stops necessary ones, unless you bounce it off, hit
CR3, and bounce it back on... that's the strange code around
the 4M page enable code.
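
To make the "bounce it off, hit CR3, bounce it back on" sequence
concrete, here is a minimal user-space sketch of it.  This is a toy
model, not kernel code: struct cpu_model, tlb_insert(), load_cr3(),
flush_all_including_global() and the 8-slot TLB are all made up for
illustration.  Only the bit positions (CR4.PGE is bit 7, PG_G is
bit 8 of a PTE) are the real ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CR4_PGE   0x00000080u  /* CR4 bit 7: page global enable */
#define PG_G      0x00000100u  /* PTE bit 8: global page */
#define TLB_SLOTS 8            /* arbitrary size, for the model only */

/* Hypothetical model of one TLB entry. */
struct tlb_entry {
    bool     valid;
    uint32_t vpage;  /* virtual page number */
    uint32_t flags;  /* PTE flag bits; only PG_G is examined */
};

/* Hypothetical model of the CPU state we care about. */
struct cpu_model {
    uint32_t         cr4;
    struct tlb_entry tlb[TLB_SLOTS];
};

static void
tlb_insert(struct cpu_model *cpu, size_t slot, uint32_t vpage, uint32_t flags)
{
    assert(slot < TLB_SLOTS);
    cpu->tlb[slot].valid = true;
    cpu->tlb[slot].vpage = vpage;
    cpu->tlb[slot].flags = flags;
}

/*
 * Model of a %cr3 reload: every entry is dropped, except entries
 * marked PG_G while CR4.PGE is set.
 */
static void
load_cr3(struct cpu_model *cpu)
{
    for (size_t i = 0; i < TLB_SLOTS; i++) {
        bool global = (cpu->cr4 & CR4_PGE) != 0 &&
            (cpu->tlb[i].flags & PG_G) != 0;
        if (!global)
            cpu->tlb[i].valid = false;
    }
}

/* PGE off, reload %cr3, restore CR4: global entries go away too. */
static void
flush_all_including_global(struct cpu_model *cpu)
{
    uint32_t saved = cpu->cr4;

    cpu->cr4 = saved & ~CR4_PGE;
    load_cr3(cpu);
    cpu->cr4 = saved;
}

/* Count the entries that are still valid. */
static size_t
tlb_valid_count(const struct cpu_model *cpu)
{
    size_t n = 0;

    for (size_t i = 0; i < TLB_SLOTS; i++)
        if (cpu->tlb[i].valid)
            n++;
    return n;
}
```

In this model, a plain load_cr3() with PGE set keeps the PG_G
entries and drops the rest, while flush_all_including_global()
leaves nothing valid.  Real hardware has details the model ignores.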

> My point is that users need higher system performance in multi-user
> environment rather than in kernel bootstrap.  Also PGE bit has effects
> in multi-user environment where %cr3 is changed frequently.
> I think enabling PGE in early stage of kernel bootstrap won't give us
> performance advantages, entries which is used in bootstrap will remain
> in the TLB as Intel's document says;
> ----
> 3.7. TRANSLATION LOOKASIDE BUFFERS (TLBS)
> [snip]
> When the processor loads a page-directory or page-table entry for a
> global page into a TLB, the entry will remain in the TLB indefinitely.
> The only way to deterministically invalidate global page entries is to
> clear the PGE flag and then invalidate the TLBs or to use the INVLPG
> instruction to invalidate individual page-directory or page-table
> entries in the TLBs.
> ----

The INVLPG doesn't work exactly like you think it should, with
PGE on, on more recent processors, unfortunately.


> According to i386/locore.s, it seems that PTEs for kernel text, data,
> bss and symbols have PG_G bit, I worry that it is enough many to fill
> TLB slot out...

The kernel is in a 4M page in most cases, so it's not an
issue in most cases.  It's really very important that you
not have to flush in the case of a kernel entry (interrupt,
system call, etc.), since it _will_ make a protection domain
crossing significantly more expensive.

Also, note that the 4M pages are in a separate 8-entry conflict
domain, and aren't in the same 16-entry data or 16-entry
instruction TLBs, on every processor where they are supported,
so the kernel is not competing with user space code anyway.

NB: 4M pages only make sense in certain specific limited
situations... using up 4M chunks of KVA space is generally a
bad idea, unless the objects you are using them for are really
4M or larger in size.  This is particularly true on 4G machines,
where you really don't have any sparseness to burn on unused
pages, and can't afford to use the remainder space without the
same mapping you used for the rest of it (e.g. for libc.so, a
copy-on-write page that is also executable, unless you split
the code and data across the page boundary).
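
A rough way to see the KVA cost: round the object up to the
mapping size and look at what is left over.  The helpers below
(kva_round_up(), kva_slack()) are hypothetical names for this
sketch, not anything in the tree, and the arithmetic assumes a
nonzero page size.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_4K (UINT64_C(4) * 1024)
#define PAGE_4M (UINT64_C(4) * 1024 * 1024)

/* Round size up to the next multiple of page (page must be nonzero). */
static uint64_t
kva_round_up(uint64_t size, uint64_t page)
{
    assert(page != 0);
    return (size + page - 1) / page * page;
}

/* Address space consumed by the mapping but not used by the object. */
static uint64_t
kva_slack(uint64_t size, uint64_t page)
{
    return kva_round_up(size, page) - size;
}
```

For a 1M object, kva_slack(1024 * 1024, PAGE_4M) comes to 3M of
address space that can only be reused under the same mapping,
where kva_slack(1024 * 1024, PAGE_4K) is zero.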

-- Terry
