Date:      Wed, 9 Jun 2010 03:01:12 +0530
From:      "Jayachandran C." <c.jayachandran@gmail.com>
To:        Alan Cox <alc@cs.rice.edu>
Cc:        Kostik Belousov <kostikbel@gmail.com>, "Jayachandran C." <jchandra@freebsd.org>, mips@freebsd.org
Subject:   Re: svn commit: r208589 - head/sys/mips/mips
Message-ID:  <AANLkTinzIUOykgwtHlJ2vDwYS9as3ha_BYiy_qRd5h2Q@mail.gmail.com>
In-Reply-To: <4C0DE424.9080601@cs.rice.edu>
References:  <AANLkTimIa3jmBPMhWIOcY6DenGpZ2ZYmqwDTWspVx0-u@mail.gmail.com> <AANLkTil2gE1niUWCHnsTlQvibhxBh7QYwD0TTWo0rj5c@mail.gmail.com> <AANLkTinA2D5iTDGPbflHVzLyAZW-ZewjJkUWWL8FVskr@mail.gmail.com> <4C07E07B.9060802@cs.rice.edu> <AANLkTimjyPc_AXKP1yaJaF1BN7CAGBeNikVzcp9OCb4P@mail.gmail.com> <4C09345F.9040300@cs.rice.edu> <AANLkTinmFOZY3OlaoKStxlNIRBt2G2I4ILkQ1P0CjozG@mail.gmail.com> <4C0D2BEA.6060103@cs.rice.edu> <AANLkTikZxx_30H9geHvZYkYd0sE-wiuZljEd0PAi14ca@mail.gmail.com> <4C0D3F40.2070101@cs.rice.edu> <20100607202844.GU83316@deviant.kiev.zoral.com.ua> <4C0D64B7.7060604@cs.rice.edu> <AANLkTilBxdXxXrWC1cAT0wX9ubmFrvaAdk4feG6PwDYQ@mail.gmail.com> <4C0DE424.9080601@cs.rice.edu>

On Tue, Jun 8, 2010 at 12:03 PM, Alan Cox <alc@cs.rice.edu> wrote:
> C. Jayachandran wrote:
>>
>> On Tue, Jun 8, 2010 at 2:59 AM, Alan Cox <alc@cs.rice.edu> wrote:
>>
>>>
>>> On 6/7/2010 3:28 PM, Kostik Belousov wrote:
>>>
>>>>
>>>> Selecting a random message in the thread to ask my question.
>>>> Is the issue that page table pages should be allocated from a specific
>>>> physical region of the memory? If yes, doesn't i386 PAE have a similar
>>>> issue with the page directory pointer table? I see a KASSERT in the i386
>>>> pmap that verifies that the allocated table is below 4G, but I do not
>>>> understand how UMA ensures the constraint (I suspect that it does not).
>>>>
>>>>
>>>
>>> For i386 PAE, the UMA backend allocator uses kmem_alloc_contig() to ensure
>>> that the memory is below 4G.  The crucial difference between i386 PAE and
>>> MIPS is that for i386 PAE only the top-level table needs to be below a
>>> specific address threshold.  Moreover, this level is allocated in a place,
>>> pmap_pinit(), where we are allowed to sleep.
>>>
>>
>> Yes. I saw the PAE top level page table code and thought I could use
>> that mechanism for allocating MIPS page table pages in the direct
>> mapped memory. The other reference I used was the
>> pmap_alloc_zeroed_contig_pages() function in sun4v/sun4v/pmap.c, which
>> uses vm_phys_alloc_contig() and VM_WAIT.
>
> That's unfortunate.  :-(  Since sun4v is essentially dead code, I've never
> spent much time thinking about its pmap implementation.  I'll mechanically
> apply changes to it, but that's about it.  I wouldn't recommend using it as
> a reference.
>
>> ...  I had also thought of
>> using VM_FREEPOOL_DIRECT, which seemed to be for a similar purpose,
>> but I could not find any usage in the kernel.
>>
>>
>
> VM_FREEPOOL_DIRECT is used by at least amd64 and ia64 for page table pages
> and small kernel memory allocations.  Unlike mips, these machines don't have
> MMU support for a direct map.  Their direct maps are just a range of
> mappings in the regular (kernel) page table.  So, unlike mips, accesses
> through their direct map may still miss in the TLB and require a page table
> walk.  VM_FREEPOOL_* is a way to increase the physical locality (or
> clustering) of page allocations, so that, for example, page table page
> accesses by the pmap on amd64 are less likely to miss in the TLB.  However,
> it doesn't place a hard restriction on the range of physical addresses that
> will be used, which you need for mips.
>
> The impact of this clustering can be significant.  For example, on amd64 we
> use 2MB page mappings to implement the direct map.  However, old Opterons
> only had 8 data TLB entries for 2MB page mappings.  For a uniprocessor
> kernel running on such an Opteron, I measured an 18% reduction in system
> time during a buildworld with the introduction of VM_FREEPOOL_DIRECT.  (See
> the commit logs for vm/vm_phys.c and the comment that precedes the
> VM_NFREEORDER definition on amd64.)
>
> Until such time as superpage support is ported to mips from the amd64/i386
> pmaps, I don't think there is a point in having more than one VM_FREEPOOL_*
> on mips.  And then, the point would be to reduce fragmentation of the
> physical memory that could be caused by small allocations, such as page
> table pages.

Thanks for the detailed explanation.

Also, after looking at the code again, I think vm_phys_alloc_contig()
can be optimized not to look into segments which lie outside the area
of interest. The patch is:

Index: sys/vm/vm_phys.c
===================================================================
--- sys/vm/vm_phys.c    (revision 208890)
+++ sys/vm/vm_phys.c    (working copy)
@@ -595,7 +595,7 @@
        vm_object_t m_object;
        vm_paddr_t pa, pa_last, size;
        vm_page_t deferred_vdrop_list, m, m_ret;
-       int flind, i, oind, order, pind;
+       int segind, i, oind, order, pind;

        size = npages << PAGE_SHIFT;
        KASSERT(size != 0,
@@ -611,21 +611,20 @@
 #if VM_NRESERVLEVEL > 0
 retry:
 #endif
-       for (flind = 0; flind < vm_nfreelists; flind++) {
+       for (segind = 0; segind < vm_phys_nsegs; segind++) {
+               /*
+                * A free list may contain physical pages
+                * from one or more segments.
+                */
+               seg = &vm_phys_segs[segind];
+               if (seg->start > high || low >= seg->end)
+                       continue;
+
                for (oind = min(order, VM_NFREEORDER - 1); oind < VM_NFREEORDER; oind++) {
                        for (pind = 0; pind < VM_NFREEPOOL; pind++) {
-                               fl = vm_phys_free_queues[flind][pind];
+                               fl = (*seg->free_queues)[pind];
                                TAILQ_FOREACH(m_ret, &fl[oind].pl, pageq) {
                                        /*
-                                        * A free list may contain physical pages
-                                        * from one or more segments.
-                                        */
-                                       seg = &vm_phys_segs[m_ret->segind];
-                                       if (seg->start > high ||
-                                           low >= seg->end)
-                                               continue;
-
-                                       /*
                                         * Is the size of this allocation request
                                         * larger than the largest block size?
                                         */

-----
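The segment-window test that the patch hoists out of the inner TAILQ loop
can be checked in isolation. The struct and helper below are simplified
stand-ins of my own (not the real vm_phys_seg), just to show the overlap
check:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t vm_paddr_t;

/* Simplified stand-in for struct vm_phys_seg: covers [start, end). */
struct seg {
	vm_paddr_t start;
	vm_paddr_t end;
};

/*
 * Mirrors the patch's skip test: the segment is only worth searching
 * if it overlaps the caller's [low, high] physical address window.
 */
static int
seg_usable(const struct seg *seg, vm_paddr_t low, vm_paddr_t high)
{

	if (seg->start > high || low >= seg->end)
		return (0);
	return (1);
}
```

With this check done once per segment, the TAILQ_FOREACH body no longer
has to look up vm_phys_segs[m_ret->segind] for every candidate page.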

With this change, along with the vmparam.h changes for HIGHMEM, I think
we should be able to use vm_phys_alloc_contig() for page table pages (or
have I again missed something fundamental?).
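
For the page table page path itself, what I have in mind is roughly the
sketch below, in the style of sun4v's pmap_alloc_zeroed_contig_pages().
This is kernel-only pseudocode: the helper name, the KSEG0 bound, and
the retry policy are my assumptions, not committed code.

```
/*
 * Sketch only: allocate one zeroed page table page whose physical
 * address is reachable through KSEG0 (the MIPS direct map).
 * MIPS_KSEG0_LARGEST_PHYS as the upper bound and the VM_WAIT retry
 * loop are assumptions.
 */
static vm_page_t
pmap_alloc_pt_page(void)
{
	vm_page_t m;

	for (;;) {
		/* One page, physically below the direct-map limit. */
		m = vm_phys_alloc_contig(1, 0, MIPS_KSEG0_LARGEST_PHYS,
		    PAGE_SIZE, 0);
		if (m != NULL)
			break;
		VM_WAIT;	/* sleep until the pagedaemon frees memory */
	}
	if ((m->flags & PG_ZERO) == 0)
		pmap_zero_page(m);
	return (m);
}
```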

JC.


