Date:      Mon, 7 Jun 2010 23:28:44 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Alan Cox <alc@cs.rice.edu>
Cc:        "Jayachandran C." <jchandra@freebsd.org>, mips@freebsd.org
Subject:   Re: svn commit: r208589 - head/sys/mips/mips
Message-ID:  <20100607202844.GU83316@deviant.kiev.zoral.com.ua>
In-Reply-To: <4C0D3F40.2070101@cs.rice.edu>
References:  <AANLkTimIa3jmBPMhWIOcY6DenGpZ2ZYmqwDTWspVx0-u@mail.gmail.com> <AANLkTil2gE1niUWCHnsTlQvibhxBh7QYwD0TTWo0rj5c@mail.gmail.com> <AANLkTinA2D5iTDGPbflHVzLyAZW-ZewjJkUWWL8FVskr@mail.gmail.com> <4C07E07B.9060802@cs.rice.edu> <AANLkTimjyPc_AXKP1yaJaF1BN7CAGBeNikVzcp9OCb4P@mail.gmail.com> <4C09345F.9040300@cs.rice.edu> <AANLkTinmFOZY3OlaoKStxlNIRBt2G2I4ILkQ1P0CjozG@mail.gmail.com> <4C0D2BEA.6060103@cs.rice.edu> <AANLkTikZxx_30H9geHvZYkYd0sE-wiuZljEd0PAi14ca@mail.gmail.com> <4C0D3F40.2070101@cs.rice.edu>



On Mon, Jun 07, 2010 at 01:49:36PM -0500, Alan Cox wrote:
> C. Jayachandran wrote:
> >On Mon, Jun 7, 2010 at 10:57 PM, Alan Cox <alc@cs.rice.edu> wrote:
> >
> >>C. Jayachandran wrote:
> >>
> >>>On Fri, Jun 4, 2010 at 10:44 PM, Alan Cox <alc@cs.rice.edu> wrote:
> >>>
> >>>
> >>>>C. Jayachandran wrote:
> >>>>
> >>>>
> >>>>>On Thu, Jun 3, 2010 at 10:33 PM, Alan Cox <alc@cs.rice.edu> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>[snip]
> >>>>
> >>>>
> >>>>>>Add a static counter to pmap_ptpgzone_allocf().  When the counter
> >>>>>>reaches
> >>>>>>some not-too-small prime number, don't call vm_phys_alloc_contig(),
> >>>>>>instead
> >>>>>>set "m" to NULL and reset the counter to zero.  Setting "m" to NULL
> >>>>>>should
> >>>>>>then trigger the vm_contig_grow_cache() call.  Make sense?
> >>>>>>
> >>>>>>
> >>>>>>
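
A rough sketch of the counter idea proposed above (the function name and
the vm_phys/vm_contig calls are taken from this thread; the body, the
UMA glue, and the choice of 13 as the prime are assumptions):

---
static void *
pmap_ptpgzone_allocf(uma_zone_t zone, int bytes, u_int8_t *flags, int wait)
{
	static int count;	/* periodically force the failure path */
	vm_page_t m;

	*flags = UMA_SLAB_PRIV;
	if (++count == 13) {	/* 13: some not-too-small prime */
		count = 0;
		m = NULL;	/* pretend vm_phys_alloc_contig() failed */
	} else
		m = vm_phys_alloc_contig(1, 0, MIPS_KSEG0_LARGEST_PHYS,
		    PAGE_SIZE, 0);
	if (m == NULL) {
		/* A NULL here is what triggers growing the cache. */
		vm_contig_grow_cache(3, 0, MIPS_KSEG0_LARGEST_PHYS);
		return (NULL);
	}
	return ((void *)MIPS_PHYS_TO_KSEG0(VM_PAGE_TO_PHYS(m)));
}
----
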
> >>>>>Is this to move the vm_contig_grow_cache() out of the
> >>>>>pmap_ptpgzone_allocf() and into the function calling uma_zalloc()?  I
> >>>>>am wondering why the prime number is needed.
> >>>>>
> >>>>>Another implementation I had thought about was to have a kernel
> >>>>>process maintain a pool of direct-mapped pages for MIPS.  It would
> >>>>>be woken up if the pool went below a low-water mark - any comments
> >>>>>on this?
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>Here is how the page table page allocation should be done.  It's not
> >>>>much harder than the vm_contig_grow_cache() change.
> >>>>
> >>>>1. Look at vm/vm_phys.c, in particular, vm_phys_init().  Create a new
> >>>>freelist, like VM_FREELIST_ISADMA, for the direct-mapped memory on
> >>>>MIPS.  Perhaps, call it VM_FREELIST_LOWMEM.  The controlling macros
> >>>>should be defined in mips/include/vmparam.h.
> >>>>
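
For concreteness, the vmparam.h side of step 1 might look roughly like
this (the names besides VM_NFREELIST/VM_FREELIST_DEFAULT, and the wiring
into vm_phys_init(), are assumptions modeled on the existing
VM_FREELIST_ISADMA handling; the patch further down ends up reusing
VM_FREELIST_HIGHMEM instead):

---
/* mips/include/vmparam.h (illustrative sketch) */
#define	VM_NFREELIST		2
#define	VM_FREELIST_DEFAULT	0	/* memory above the KSEG0 limit */
#define	VM_FREELIST_LOWMEM	1	/* direct-mapped (KSEG0) memory */
/* Pages below this boundary would be placed on the LOWMEM freelist. */
#define	VM_LOWMEM_BOUNDARY	MIPS_KSEG0_LARGEST_PHYS
----
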
> >>>>2. Trivially refactor vm_phys_alloc_pages().  Specifically, take the
> >>>>body of the outer for loop and make it a new function,
> >>>>vm_phys_alloc_freelist(), that takes the variable "flind" as a
> >>>>parameter.
> >>>>
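
After the step 2 refactoring, vm_phys_alloc_pages() essentially becomes
a loop over a per-freelist helper; a sketch (the helper is named
vm_phys_alloc_freelist_pages() here to match the patch further down, and
vm_nfreelists is the existing counter in vm/vm_phys.c):

---
vm_page_t
vm_phys_alloc_pages(int pool, int order)
{
	vm_page_t m;
	int flind;

	/* The former outer loop body now lives in the helper. */
	for (flind = 0; flind < vm_nfreelists; flind++) {
		m = vm_phys_alloc_freelist_pages(flind, pool, order);
		if (m != NULL)
			return (m);
	}
	return (NULL);
}
----
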
> >>>>3. Eliminate the UMA zone for page table pages.  In place of
> >>>>vm_phys_alloc_contig(), call your new function with VM_FREELIST_LOWMEM.
> >>>>(The vm_contig_grow_cache() remains unchanged.)  Go back to calling
> >>>>vm_page_free() to release page table pages.
> >>>>
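
The release side of step 3 is then the ordinary unmanaged-page pattern;
a sketch with a hypothetical helper name (the page queues lock is
assumed held by the caller):

---
static void
pmap_release_pte_page(vm_page_t m)
{

	mtx_assert(&vm_page_queue_mtx, MA_OWNED);
	m->wire_count--;
	atomic_subtract_int(&cnt.v_wire_count, 1);
	/* The page was allocated zeroed/unmanaged; hand it straight back. */
	vm_page_free_zero(m);
}
----
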
> >>>>For the record, the changes that you made to start using a zone for
> >>>>page table page allocation introduced at least one possible race
> >>>>condition between pmap_remove_pages() and pmap_remove_all().
> >>>>Specifically, by dropping and reacquiring the page queues and pmap
> >>>>lock when you free a page table page, you allow a pmap_remove_all()
> >>>>call while pmap_remove_pages() is running to leave the variable "npv"
> >>>>in pmap_remove_pages() pointing at a freed pv entry.
> >>>>
> >>>My first attempt is attached; it comes up multiuser but crashes if I
> >>>stress it a bit (panic: Bad link elm 0xc0078f00 prev->next != elm).
> >>>Will look at this tomorrow, and see if I can find the cause.
> >>>
> >>>
> >>>
> >>Where exactly is this occurring?
> >>
> >>
> >>>In the meantime, it may be worth looking at the patch to see if this
> >>>is in line with the approach you had suggested.  I have tried to use
> >>>VM_FREELIST_HIGHMEM, which is already there, instead of adding
> >>>VM_FREELIST_LOWMEM.
> >>>
> >>>
> >>>
> >>Basically, yes.  At first, I was taken aback by defining
> >>VM_FREELIST_DEFAULT as other than zero, but it shouldn't break anything.
> >>
> >
> >I can change this to add VM_FREELIST_LOWMEM. This works okay so far,
> >even though I haven't really checked the code to see if it works when
> >there is no memory in HIGHMEM.
> >
> >
>
> I think that it will.  It will just waste a little time looking at empty
> queues.
>
> >>The one issue that I see with the patch is that you'll need to release
> >>and reacquire the pmap and page queues locks around the call to
> >>vm_contig_grow_cache().  However, if you release these locks, you need
> >>to restart the pmap_allocpte(), i.e., perform the "goto retry;",
> >>because while you slept another processor/thread might have already
> >>allocated and installed the page table page that you are trying to
> >>allocate into the page table.
> >>
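
The retry protocol being described would look roughly like this in the
caller (a sketch: pmap_pte_lookup() is a hypothetical stand-in for
re-checking the page table, and pmap_alloc_pte_page() from the patch is
assumed here to return NULL rather than sleeping itself):

---
static vm_page_t
pmap_allocpte_retry(pmap_t pmap, unsigned int index)
{
	vm_page_t m;
	int tries;

	tries = 0;
retry:
	/* Re-check: another thread may have installed the page already. */
	if ((m = pmap_pte_lookup(pmap, index)) != NULL)
		return (m);
	m = pmap_alloc_pte_page(pmap, index, M_NOWAIT, NULL);
	if (m == NULL) {
		/* Drop the locks before sleeping in vm_contig_grow_cache(). */
		PMAP_UNLOCK(pmap);
		vm_page_unlock_queues();
		vm_contig_grow_cache(tries++, 0, MIPS_KSEG0_LARGEST_PHYS);
		vm_page_lock_queues();
		PMAP_LOCK(pmap);
		goto retry;
	}
	return (m);
}
----
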
> >>For what it's worth, this same race condition exists in the current
> >>code in subversion using a UMA zone, but not in the code that predates
> >>use of a UMA zone.
> >>
> >
> >I have been testing with an updated version of the patch - my -j128
> >buildworld works on this.  The sys/vm side changes are simple enough,
> >but the page table page allocation part is still not complete.  The
> >version I have now is (with gmail whitespace damage):
> >
> >---
> >static vm_page_t
> >pmap_alloc_pte_page(pmap_t pmap, unsigned int index, int wait,
> >    vm_offset_t *vap)
> >{
> >        vm_paddr_t paddr;
> >        vm_page_t m;
> >        int tries, flags;
> >
> >        tries = 0;
> >retry:
> >        mtx_lock(&vm_page_queue_free_mtx);
> >        m = vm_phys_alloc_freelist_pages(VM_FREELIST_DEFAULT,
> >            VM_FREEPOOL_DEFAULT, 0);
> >        if (m == NULL) {
> >                mtx_unlock(&vm_page_queue_free_mtx);
> >                if (tries < ((wait & M_NOWAIT) != 0 ? 1 : 3)) {
> >                        vm_contig_grow_cache(tries, 0,
> >                            MIPS_KSEG0_LARGEST_PHYS);
> >                        tries++;
> >                        goto retry;
> >                } else {
> >                        printf("[%d] %s wait %x, fail!\n", tries,
> >                            __func__, wait);
> >                        return (NULL);
> >                }
> >        }
> >
> >        /* Account for the page before its flags are rewritten below. */
> >        if ((m->flags & PG_CACHED) != 0)
> >                printf("Not yet! handle cached page %p flags 0x%x\n",
> >                    m, m->flags);
> >        else
> >                cnt.v_free_count--;
> >
> >        m->pindex = index;
> >        m->valid = VM_PAGE_BITS_ALL;
> >        atomic_add_int(&cnt.v_wire_count, 1);
> >        m->wire_count = 1;
> >        if (m->flags & PG_ZERO)
> >                vm_page_zero_count--;
> >        else
> >                pmap_zero_page(m);
> >        flags = PG_ZERO | PG_UNMANAGED;
> >        m->flags = flags;
> >        m->oflags = 0;
> >        m->act_count = 0;
> >        mtx_unlock(&vm_page_queue_free_mtx);
> >        return (m);
> >}
> >
> >
> >----
> >
> >I tried to match the vm_page_alloc() code here.  I'm not sure whether
> >I have to handle the PG_CACHED case, and whether it is okay to change
> >the VM counters here.
> >
> >
>
> You do.  In the end, I suspect that this function will be added to the
> machine-independent layer.
>=20
> >Also, I will change the code to return NULL all the way and retry
> >allocation (similar to the old code which called VM_WAIT and then
> >returned NULL).
> >
> >Probably I should get the whole code in line with the old code,
> >replacing the old vm_page_alloc() call with the function above without
> >retry, and the VM_WAIT with a new function which does the
> >vm_contig_grow_cache().
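
Such a VM_WAIT replacement could be a one-line wrapper; a sketch with a
hypothetical name:

---
/*
 * Stand-in for VM_WAIT when allocating direct-mapped pages: rather
 * than waiting for free memory anywhere, push pages below the KSEG0
 * limit into the cache/free queues.
 */
static void
pmap_grow_direct_page_cache(void)
{

	vm_contig_grow_cache(3, 0, MIPS_KSEG0_LARGEST_PHYS);
}
----
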
> >
> >
>
> You may find it easier if you move the call to vm_contig_grow_cache() to
> pmap_allocpte().

Selecting a random message in the thread to ask my question.
Is the issue that page table pages must be allocated from a specific
physical region of memory?  If so, doesn't i386 PAE have a similar
issue with the page directory pointer table?  I see a KASSERT in the
i386 pmap that verifies that the allocated table is below 4G, but I do
not understand how uma ensures the constraint (I suspect that it does
not).
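
For reference, uma can be made to enforce such a constraint by giving
the zone a custom page allocator via uma_zone_set_allocf().  From memory
of i386/i386/pmap.c (so treat the details, including the malloc type, as
approximate), the PAE pdpt zone does roughly:

---
#ifdef PAE
static void *
pmap_pdpt_allocf(uma_zone_t zone, int bytes, u_int8_t *flags, int wait)
{

	/* Keep the page directory pointer table physically below 4G. */
	*flags = UMA_SLAB_PRIV;
	return (contigmalloc(bytes, M_DEVBUF, wait, 0x0ULL,
	    0xffffffffULL, 1, 0));
}
/* ... and in pmap_init(): uma_zone_set_allocf(pdptzone, pmap_pdpt_allocf); */
#endif
----
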
