Date:      Tue, 30 Sep 2014 14:37:01 +0200
From:      Svatopluk Kraus <onwahe@gmail.com>
To:        Alan Cox <alc@rice.edu>
Cc:        alc@freebsd.org, FreeBSD Arch <freebsd-arch@freebsd.org>
Subject:   Re: vm_page_array and VM_PHYSSEG_SPARSE
Message-ID:  <CAFHCsPWGS9pn4z1cNDr5a1y-ovvHa+_z0ieBqvSszKAZ6qdAkw@mail.gmail.com>
In-Reply-To: <5428AF3B.1030906@rice.edu>
References:  <CAFHCsPWkq09_RRDz7fy3UgsRFv8ZbNKdAH2Ft0x6aVSwLPi6BQ@mail.gmail.com> <CAJUyCcPXBuLu0nvaCqpg8NJ6KzAX9BA1Rt+ooD+3pzq+FV++TQ@mail.gmail.com> <CAFHCsPWq9WqeFnx1a+StfSxj=jwcE9GPyVsoyh0+azr3HmM6vQ@mail.gmail.com> <5428AF3B.1030906@rice.edu>

On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc@rice.edu> wrote:

>   On 09/27/2014 03:51, Svatopluk Kraus wrote:
>
>
> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox@gmail.com> wrote:
>
>>
>>
>>  On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Michal and I are finishing the new ARM pmap-v6 code. There is one
>>> problem we've dealt with somehow, but now we would like to handle it
>>> better. It's about physical pages which are allocated before the vm
>>> subsystem is initialized. While these pages can later be found in
>>> vm_page_array when the VM_PHYSSEG_DENSE memory model is used, that is
>>> not true for the VM_PHYSSEG_SPARSE memory model. And the ARM world
>>> uses the VM_PHYSSEG_SPARSE model.
>>>
>>> It really would be nice to utilize vm_page_array for such preallocated
>>> physical pages even when the VM_PHYSSEG_SPARSE memory model is used.
>>> Things would be much easier then. In our case, it's about pages which
>>> are used for level 2 page tables. In the VM_PHYSSEG_SPARSE model, we
>>> have two sets of such pages: the first are preallocated and the second
>>> are allocated after the vm subsystem has been initialized. We must deal
>>> with each set differently, so the code is more complex and so is the
>>> debugging.
>>>
>>> Thus we need some way to say that a part of physical memory should be
>>> included in vm_page_array, but that the pages from that region should
>>> not be put on the free list during initialization. We think that such
>>> a possibility could be useful in general. There could be a need for
>>> some physical space which:
>>>
>>> (1) is needed only during boot and can later be freed and given to the
>>> vm subsystem,
>>>
>>> (2) is needed for something else, for which the vm_page_array code
>>> could be used without some kind of duplication of it.
>>>
>>> There is already some code in vm_page.c which deals with blacklisted
>>> pages. So the easiest way to deal with the presented situation is to
>>> add some callback to this part of the code which will be able to
>>> exclude either a whole phys_avail[i], phys_avail[i+1] region or single
>>> pages. As the biggest phys_avail region is used for vm subsystem
>>> allocations, some more coding is needed there. (However, blacklisted
>>> pages are not dealt with in that part of the region.)
>>>
>>> We would like to know if there is any objection:
>>>
>>> (1) to dealing with the presented problem,
>>> (2) to dealing with the problem in the presented way.
>>> Any help is very much appreciated. Thanks
>>>
>>>
>>
>> As an experiment, try modifying vm_phys.c to use dump_avail instead of
>> phys_avail when sizing vm_page_array.  On amd64, where the same problem
>> exists, this allowed me to use VM_PHYSSEG_SPARSE.  Right now, this is
>> probably my preferred solution.  The catch being that not all architectures
>> implement dump_avail, but my recollection is that arm does.
>>
>
> Frankly, I would prefer this too, but there is one big open question:
>
> What is dump_avail for?
>
>
>
> dump_avail[] is solving a similar problem in the minidump code, hence, the
> prefix "dump_" in its name.  In other words, the minidump code couldn't use
> phys_avail[] either because it didn't describe the full range of physical
> addresses that might be included in a minidump, so dump_avail[] was created.
>
> There is already precedent for what I'm suggesting.  dump_avail[] is
> already (ab)used outside of the minidump code on x86 to solve this same
> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
>
>
>  Using it for vm_page_array initialization and segmentation means that
> phys_avail must be a subset of it. And this must be stated and be
> visible enough. Maybe it should even be checked in code. I like the idea
> of thinking about dump_avail as something that describes all memory in a
> system, but that's not how dump_avail is defined in the archs now.
>
>
>
> When you say "it's not how dump_avail is defined in archs now", I'm not
> sure whether you're talking about the code or the comments.  In terms of
> code, dump_avail[] is a superset of phys_avail[], and I'm not aware of any
> code that would have to change.  In terms of comments, I did a grep looking
> for comments defining what dump_avail[] is, because I couldn't remember
> any.  I found one ... on arm.  So, I don't think it's an onerous task
> changing the definition of dump_avail[].  :-)
>
> Already, as things stand today with dump_avail[] being used outside of the
> minidump code, one could reasonably argue that it should be renamed to
> something like phys_exists[].
>
>
>
> I will experiment with it on Monday then. However, it's not only about
> how memory segments are created in vm_phys.c, but also about how the
> vm_page_array size is computed in vm_page.c.
>
>
>
> Yes, and there is also a place in vm_reserv.c that needs to change.   I've
> attached the patch that I developed and tested a long time ago.  It still
> applies cleanly and runs ok on amd64.
>
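
For reference, my understanding of the suggested change - sizing
vm_page_array from dump_avail[] instead of phys_avail[] - is roughly the
following in vm_page_startup() (only my sketch, not the attached patch),
plus the analogous change where vm_phys.c creates segments and
vm_reserv.c sizes its array:

    #ifdef VM_PHYSSEG_SPARSE
            /* Size vm_page_array from the superset array. */
            page_range = 0;
            for (i = 0; dump_avail[i + 1] != 0; i += 2)
                    page_range += atop(dump_avail[i + 1] - dump_avail[i]);
    #endif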


I've thought about it once again, considering what I learned yesterday
from all the archs in the tree.

(1) On arm, two extra arrays are defined - hwregions and exregions -
which already describe all memory on the platform. These regions can be
filled from the FDT.

=> There is no need to describe all memory on the platform for the MI
kernel's sake, but there is a need to describe memory which

(a) should be kept in vm_page_array, and
(b) is available to the vm subsystem.

For (a) phys_held[] or phys_known[] or phys_filed[] or phys_managed[] or ...
For (b) phys_avail[].
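
Just to illustrate the split (the name for (a) is only one of the
candidates above, and the addresses are made up):

    /* (a) every region that must get vm_page structs, including the
     *     preallocated L2 page table pages */
    vm_paddr_t phys_managed[] = {
            0x00100000, 0x10000000,
            0, 0
    };
    /* (b) the subset that is put on the free lists; the first 1 MB is
     *     kept for the preallocated page tables */
    vm_paddr_t phys_avail[] = {
            0x00200000, 0x10000000,
            0, 0
    };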

(2) On mips, dump_avail[] is defined from phys_avail[] and is the same.

=> One step is renaming the misused dump_avail[], for example to
phys_known[]. But why not take another step? phys_known[] must be a
superset of phys_avail[] and is thus the primary array. So phys_avail[]
should be derived from phys_known[].
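
And if phys_avail[] is derived from phys_known[], the subset property
can also be checked cheaply at startup, as was mentioned earlier. A
sketch (the helper name is made up):

    static void
    phys_avail_check(vm_paddr_t *avail, vm_paddr_t *known)
    {
            int i, j;

            /* Every avail region must be covered by some known region. */
            for (i = 0; avail[i + 1] != 0; i += 2) {
                    for (j = 0; known[j + 1] != 0; j += 2)
                            if (avail[i] >= known[j] &&
                                avail[i + 1] <= known[j + 1])
                                    break;
                    KASSERT(known[j + 1] != 0,
                        ("phys_avail[%d] not covered by phys_known[]", i));
            }
    }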

(3) On sparc64 and powerpc, dump_avail[] is not defined.

=> Why force all archs to use both phys_known[] and phys_avail[]? If a
pointer to phys_avail[] is defined and is not NULL, the array it points
to will be used; otherwise, phys_known[] will be used. The pointer can be
defined in MI code and initialized to NULL, so archs which do not need
both arrays will not have to do anything.
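
I imagine something like this in MI code (all names are hypothetical,
nothing like this exists today):

    /* Stays NULL unless the arch needs both arrays. */
    vm_paddr_t *phys_avail_p = NULL;

    /* MI consumers then pick the array describing the free pages: */
    vm_paddr_t *array;

    array = (phys_avail_p != NULL) ? phys_avail_p : phys_known;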

vm_reserv_init() and vm_phys_init() are called from vm_page_startup().
These three functions are the only ones in MI code involved in this. The
first two take no arguments now. If this is changed so that a pointer to
the array which should be processed is passed to them, then the only real
work remains in vm_page_startup().
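
So the change would be roughly this (the array parameter is the new,
hypothetical part; both functions take no arguments today):

    void vm_phys_init(vm_paddr_t *avail);
    void vm_reserv_init(vm_paddr_t *avail);

    /* In vm_page_startup(): */
    array = (phys_avail_p != NULL) ? phys_avail_p : phys_known;
    vm_phys_init(array);
    vm_reserv_init(array);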

What do you think about this?
Svata


