Date:      Mon, 27 Oct 2014 23:42:43 +0100
From:      Svatopluk Kraus <onwahe@gmail.com>
To:        Alan Cox <alc@rice.edu>
Cc:        alc@freebsd.org, FreeBSD Arch <freebsd-arch@freebsd.org>
Subject:   Re: vm_page_array and VM_PHYSSEG_SPARSE
Message-ID:  <CAFHCsPX_ukk+_8Lrnj7svnhb4Mz+GViOwtaKk_r6S_Fo7gfHGg@mail.gmail.com>
In-Reply-To: <544E7376.6040002@rice.edu>
References:  <CAFHCsPWkq09_RRDz7fy3UgsRFv8ZbNKdAH2Ft0x6aVSwLPi6BQ@mail.gmail.com> <CAJUyCcPXBuLu0nvaCqpg8NJ6KzAX9BA1Rt+ooD+3pzq+FV++TQ@mail.gmail.com> <CAFHCsPWq9WqeFnx1a+StfSxj=jwcE9GPyVsoyh0+azr3HmM6vQ@mail.gmail.com> <5428AF3B.1030906@rice.edu> <CAFHCsPWxF0G+bqBYgxH=WtV+St_UTWZj+Y2-PHfoYSLjC_Qpig@mail.gmail.com> <54497DC1.5070506@rice.edu> <CAFHCsPVj3PGbkSmkKsd2bGvmh3+dZLABi=AR7jQ4qJ8CigE=8Q@mail.gmail.com> <544DED4C.3010501@rice.edu> <CAFHCsPV1H6XsOoDFitQFgJOP6u+giEM=N--_7Q9uoWbYnAaeYQ@mail.gmail.com> <544E7376.6040002@rice.edu>

On Mon, Oct 27, 2014 at 5:31 PM, Alan Cox <alc@rice.edu> wrote:
>
> On 10/27/2014 08:22, Svatopluk Kraus wrote:
>
>
> On Mon, Oct 27, 2014 at 7:59 AM, Alan Cox <alc@rice.edu> wrote:
>>
>> On 10/24/2014 06:33, Svatopluk Kraus wrote:
>>
>>
>> On Fri, Oct 24, 2014 at 12:14 AM, Alan Cox <alc@rice.edu> wrote:
>>>
>>> On 10/08/2014 10:38, Svatopluk Kraus wrote:
>>> > On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc@rice.edu> wrote:
>>> >
>>> >>   On 09/27/2014 03:51, Svatopluk Kraus wrote:
>>> >>
>>> >>
>>> >> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox@gmail.com>
>>> >> wrote:
>>> >>
>>> >>>
>>> >>>  On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe@gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> Michal and I are finishing the new ARM pmap-v6 code. There is
>>> >>>> one problem we've dealt with somehow, but now we would like to
>>> >>>> do it better. It's about physical pages which are allocated
>>> >>>> before the vm subsystem is initialized. While such pages can
>>> >>>> later be found in vm_page_array when the VM_PHYSSEG_DENSE
>>> >>>> memory model is used, that's not true for the VM_PHYSSEG_SPARSE
>>> >>>> memory model. And the ARM world uses the VM_PHYSSEG_SPARSE
>>> >>>> model.
>>> >>>>
>>> >>>> It really would be nice to utilize vm_page_array for such
>>> >>>> preallocated physical pages even when the VM_PHYSSEG_SPARSE
>>> >>>> memory model is used. Things could be much easier then. In our
>>> >>>> case, it's about pages which are used for level 2 page tables.
>>> >>>> In the VM_PHYSSEG_SPARSE model, we have two sets of such pages.
>>> >>>> The first ones are preallocated and the second ones are
>>> >>>> allocated after the vm subsystem is initialized. We must deal
>>> >>>> with each set differently, so the code is more complex and so
>>> >>>> is debugging.
>>> >>>>
>>> >>>> Thus we need some way to say that a part of physical memory
>>> >>>> should be included in vm_page_array, but that the pages from
>>> >>>> that region should not be put on the free list during
>>> >>>> initialization. We think that such a possibility could be
>>> >>>> useful in general. There could be a need for some physical
>>> >>>> space which:
>>> >>>>
>>> >>>> (1) is needed only during boot, and later on can be freed and
>>> >>>> given to the vm subsystem,
>>> >>>>
>>> >>>> (2) is needed for something else, and the vm_page_array code
>>> >>>> could be used without some kind of duplication of it.
>>> >>>>
>>> >>>> There is already some code which deals with blacklisted pages
>>> >>>> in the vm_page.c file. So the easiest way to deal with the
>>> >>>> presented situation is to add some callback to this part of the
>>> >>>> code which will be able to exclude either a whole
>>> >>>> [phys_avail[i], phys_avail[i+1]] region or single pages. As the
>>> >>>> biggest phys_avail region is used for vm subsystem allocations,
>>> >>>> some more coding is needed. (However, blacklisted pages are not
>>> >>>> dealt with on that part of the region.)
>>> >>>>
>>> >>>> We would like to know if there is any objection:
>>> >>>>
>>> >>>> (1) to dealing with the presented problem,
>>> >>>> (2) to dealing with it in the presented way.
>>> >>>>
>>> >>>> Any help is very much appreciated. Thanks.
>>> >>>>
>>> >>>>
>>> >>> As an experiment, try modifying vm_phys.c to use dump_avail
>>> >>> instead of phys_avail when sizing vm_page_array.  On amd64,
>>> >>> where the same problem exists, this allowed me to use
>>> >>> VM_PHYSSEG_SPARSE.  Right now, this is probably my preferred
>>> >>> solution.  The catch being that not all architectures implement
>>> >>> dump_avail, but my recollection is that arm does.
>>> >>>
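
For concreteness, the sizing change suggested above amounts to
something like the following fragment of vm_page_startup() (a sketch
only; the variable names page_range and i follow vm_page.c, and the
exact loop in a committed version may differ):

#ifdef VM_PHYSSEG_SPARSE
	/*
	 * Size vm_page_array from dump_avail[] rather than phys_avail[]
	 * so that pages allocated before the vm subsystem is up still
	 * get vm_page structures.
	 */
	page_range = 0;
	for (i = 0; dump_avail[i + 1] != 0; i += 2)
		page_range += atop(dump_avail[i + 1] - dump_avail[i]);
#endif
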
>>> >> Frankly, I would prefer this too, but there is one big open question:
>>> >>
>>> >> What is dump_avail for?
>>> >>
>>> >>
>>> >>
>>> >> dump_avail[] is solving a similar problem in the minidump code,
>>> >> hence, the prefix "dump_" in its name.  In other words, the
>>> >> minidump code couldn't use phys_avail[] either because it didn't
>>> >> describe the full range of physical addresses that might be
>>> >> included in a minidump, so dump_avail[] was created.
>>> >>
>>> >> There is already precedent for what I'm suggesting.  dump_avail[]
>>> >> is already (ab)used outside of the minidump code on x86 to solve
>>> >> this same problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
>>> >>
>>> >>
>>> >> Using it for vm_page_array initialization and segmentation means
>>> >> that phys_avail must be a subset of it. And this must be stated
>>> >> and be visible enough. Maybe it should even be checked in code. I
>>> >> like the idea of thinking about dump_avail as something that
>>> >> describes all memory in a system, but that's not how dump_avail
>>> >> is defined in the archs now.
>>> >>
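
The "checked in code" part could be a simple boot-time assertion along
these lines (a sketch; a real check might also have to handle a
phys_avail[] range that spans two adjacent dump_avail[] entries):

	/*
	 * Assert that every phys_avail[] range lies within some
	 * dump_avail[] range.
	 */
	for (i = 0; phys_avail[i + 1] != 0; i += 2) {
		for (j = 0; dump_avail[j + 1] != 0; j += 2)
			if (phys_avail[i] >= dump_avail[j] &&
			    phys_avail[i + 1] <= dump_avail[j + 1])
				break;
		KASSERT(dump_avail[j + 1] != 0,
		    ("phys_avail[%d] range not covered by dump_avail[]",
		    i));
	}
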
>>> >>
>>> >>
>>> >> When you say "it's not how dump_avail is defined in archs now",
>>> >> I'm not sure whether you're talking about the code or the
>>> >> comments.  In terms of code, dump_avail[] is a superset of
>>> >> phys_avail[], and I'm not aware of any code that would have to
>>> >> change.  In terms of comments, I did a grep looking for comments
>>> >> defining what dump_avail[] is, because I couldn't remember any.
>>> >> I found one ... on arm.  So, I don't think it's an onerous task
>>> >> changing the definition of dump_avail[].  :-)
>>> >>
>>> >> Already, as things stand today with dump_avail[] being used
>>> >> outside of the minidump code, one could reasonably argue that it
>>> >> should be renamed to something like phys_exists[].
>>> >>
>>> >>
>>> >>
>>> >> I will experiment with it on Monday then. However, it's not only
>>> >> about how memory segments are created in vm_phys.c; it's about
>>> >> how the vm_page_array size is computed in vm_page.c too.
>>> >>
>>> >>
>>> >>
>>> >> Yes, and there is also a place in vm_reserv.c that needs to
>>> >> change.  I've attached the patch that I developed and tested a
>>> >> long time ago.  It still applies cleanly and runs ok on amd64.
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>> > Well, I've created and tested a minimalistic patch which - I hope -
>>> > is committable. It runs fine on a Pandaboard (ARMv6) and solves the
>>> > presented problem. I would really appreciate it if this could be
>>> > committed. Thanks.
>>>
>>>
>>> Sorry for the slow reply.  I've just been swamped with work lately.  I
>>> finally had some time to look at this in the last day or so.
>>>
>>> The first thing that I propose to do is commit the attached patch.  This
>>> patch changes pmap_init() on amd64, armv6, and i386 so that it no longer
>>> consults phys_avail[] to determine the end of memory.  Instead, it calls
>>> a new function provided by vm_phys.c to obtain the same information from
>>> vm_phys_segs[].
>>>
>>> With this change, the new variable phys_managed in your patch wouldn't
>>> need to be a global.  It could be a local variable in vm_page_startup()
>>> that we pass as a parameter to vm_phys_init() and vm_reserv_init().
>>>
>>> More generally, the long-term vision that I have is that we would stop
>>> using phys_avail[] after vm_page_startup() had completed.  It would only
>>> be used during initialization.  After that we would use vm_phys_segs[]
>>> and functions provided by vm_phys.c.
>>
>>
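
If I read the suggestion right, the interface would take roughly this
shape (a sketch of the prototypes only, not the final patch):

	/*
	 * phys_managed is a zero-terminated array of address pairs in
	 * the same format as phys_avail[]; it stays local to
	 * vm_page_startup().
	 */
	void vm_phys_init(vm_paddr_t *phys_managed);
	void vm_reserv_init(vm_paddr_t *phys_managed);
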
>> I understand. The patch and the long-term vision are fine for me. I
>> just wasn't bold enough to pass phys_managed as a parameter to
>> vm_phys_init() and vm_reserv_init(). However, I certainly was
>> thinking about it. While reading the comment above vm_phys_get_end():
>> do we care whether the last usable address is 0xFFFFFFFF?
>>
>>
>>
>> To date, this hasn't been a problem.  However, handling 0xFFFFFFFF
>> is easy.  So, the final version of the patch that I committed this
>> weekend does so.
>>
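
For reference, the wraparound can be sidestepped by reporting the last
usable address instead of "one past the end" (a sketch only; the
committed vm_phys_get_end() may do it differently):

	/*
	 * Return the highest usable physical address.  Returning
	 * seg->end directly would wrap to 0 when the last usable byte
	 * sits at 0xFFFFFFFF on a 32-bit vm_paddr_t.
	 */
	vm_paddr_t
	vm_phys_get_end(void)
	{

		return (vm_phys_segs[vm_phys_nsegs - 1].end - 1);
	}
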
>> Can you please try the attached patch?  It replaces phys_avail[]
>> with vm_phys_segs[] in arm's busdma.
>
>
>
> It works fine on the ARMv6 Pandaboard. I have no objection to
> committing it. However, it's only a 1:1 replacement.
>
>
>
> Right now, yes.  However, once your patch is committed, it won't be
> 1:1 anymore, because vm_phys_segs[] will be populated based on
> dump_avail[] rather than phys_avail[].
>
> My interpretation of the affected code is that using the ranges
> defined by dump_avail[] is actually closer to what this code intended.
>


True in both cases. As you said, it's closer.


>
> In fact, I still keep the following pattern in my head:
>
> present memory in system <=> all RAM and whatsoever
> nobounce memory <=> addressable by DMA
>
>
>
> In general, I don't see how this can be an attribute of the memory,
> because it's going to depend on the device.  In other words, a given
> physical address may require bouncing for some device but not all
> devices.
>


True again. I was thinking about it as some property common to all DMA
devices on a platform. If it's not that, but rather a test for present
RAM, then dump_avail[] is closer. However, again, does dump_avail[]
represent all present RAM?



>
>
> memory managed by the vm subsystem <=> i.e., kept in vm_page_array
> memory available to the vm subsystem <=> can be allocated
>
> So, it's no problem to use phys_avail[], i.e. vm_phys_segs[], but it
> could be too limiting in some scenarios. I would like to see something
> different in exclusion_bounce_check() in the future. Something that
> reflects a NOBOUNCE property and not a NOALLOC one like now.
>
>
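
To make the distinction concrete, the test I have in mind is roughly
the following (entirely hypothetical: neither EXFLAG_NOBOUNCE nor the
phys_exclude[]/phys_nexcludes pair exists today; see the physmem
sketch near the end of this mail for the assumed region format):

	/* Bounce only for regions explicitly flagged as NOBOUNCE. */
	static bool
	addr_needs_bounce(vm_paddr_t pa)
	{
		int i;

		for (i = 0; i < phys_nexcludes; i++) {
			if ((phys_exclude[i].exflags & EXFLAG_NOBOUNCE) &&
			    pa >= phys_exclude[i].addr &&
			    pa < phys_exclude[i].addr + phys_exclude[i].size)
				return (true);
		}
		return (false);
	}
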
>>
>>
>>
>>
>> Do you think that the rest of my patch, considering the changes due
>> to your patch, is ok?
>>
>>
>>
>>
>> Basically, yes.  I do, however, think that
>>
>> +#if defined(__arm__)
>> +       phys_managed = dump_avail;
>> +#else
>> +       phys_managed = phys_avail;
>> +#endif
>>
>> should also be conditioned on VM_PHYSSEG_SPARSE.
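
Agreed; with that extra condition the hunk becomes (a sketch):

#if defined(__arm__) && defined(VM_PHYSSEG_SPARSE)
	phys_managed = dump_avail;
#else
	phys_managed = phys_avail;
#endif
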
>
>
>
>
> So I've prepared a new patch. phys_managed[] is passed to
> vm_phys_init() and vm_reserv_init() as a parameter, and a small
> optimization is made in vm_page_startup(). I added the
> VM_PHYSSEG_SPARSE condition to the place you mentioned. Anyhow, I
> still think that this is only a temporary hack. In general,
> phys_managed[] should always be distinguished from phys_avail[].
>
>
>>
>>
>>>
>>> >
>>> > BTW, while I was inspecting all the archs, I think that maybe
>>> > it's time to do what was done for busdma not long ago. There is a
>>> > lot of similar code across archs which deals with physical memory
>>> > and could be generalized and put into kern/subr_physmem.c for
>>> > common use. All work with physical memory could be simplified to
>>> > two arrays of regions:
>>> >
>>> > phys_present[] ... describes all present physical memory regions
>>> > phys_exclude[] ... describes various exclusions from phys_present[]
>>> >
>>> > Each excluded region will be labeled with flags to say what kind
>>> > of exclusion it is. Flags like NODUMP, NOALLOC, NOMANAGE,
>>> > NOBOUNCE, and NOMEMRW could be combined. This idea is taken from
>>> > sys/arm/arm/physmem.c.
>>> >
>>> > All the other arrays, like phys_managed[], phys_avail[], and
>>> > dump_avail[], would be created from phys_present[] and
>>> > phys_exclude[]. This way the bootstrap code in the archs could be
>>> > simplified and unified. For example, dealing with either
>>> > hw.physmem or a page with PA 0x00000000 could be transparent.
>>> >
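
As a rough illustration of the data structures I mean (a sketch only;
the names and flags are tentative, modeled on sys/arm/arm/physmem.c):

	/* Why a region is excluded; flags can be combined. */
	#define	EXFLAG_NODUMP	0x01	/* omit from kernel dumps */
	#define	EXFLAG_NOALLOC	0x02	/* keep off the free lists */
	#define	EXFLAG_NOMANAGE	0x04	/* no entry in vm_page_array */
	#define	EXFLAG_NOBOUNCE	0x08	/* not addressable by DMA */
	#define	EXFLAG_NOMEMRW	0x10	/* no access via /dev/mem */

	struct physmem_region {
		vm_paddr_t	addr;
		vm_size_t	size;
		uint32_t	exflags;	/* exclusions only */
	};

	static struct physmem_region phys_present[PHYS_REGION_MAX];
	static struct physmem_region phys_exclude[PHYS_REGION_MAX];

phys_avail[], dump_avail[], and phys_managed[] would then be derived by
subtracting the appropriately flagged exclusions from phys_present[],
much as arm_physmem_exclude_region() does on arm today.
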
>>> > I'm prepared to volunteer if the idea is ripe. However, some
>>> > mentoring will be needed.
>>>
>>>
>>> I've never really looked at arm/arm/physmem.c before.  Let me do that
>>> before I comment on this.
>>>
>> No problem. This could be a long-term aim. However, I hope the
>> VM_PHYSSEG_SPARSE problem can be dealt with in MI code in the
>> meantime. In any case, thanks for your help.
>>
>>
>>
>>
>
>


