Date:      Thu, 23 Oct 2014 17:14:25 -0500
From:      Alan Cox <alc@rice.edu>
To:        Svatopluk Kraus <onwahe@gmail.com>
Cc:        alc@freebsd.org, FreeBSD Arch <freebsd-arch@freebsd.org>
Subject:   Re: vm_page_array and VM_PHYSSEG_SPARSE
Message-ID:  <54497DC1.5070506@rice.edu>
In-Reply-To: <CAFHCsPWxF0G+bqBYgxH=WtV+St_UTWZj+Y2-PHfoYSLjC_Qpig@mail.gmail.com>
References:  <CAFHCsPWkq09_RRDz7fy3UgsRFv8ZbNKdAH2Ft0x6aVSwLPi6BQ@mail.gmail.com>
	<CAJUyCcPXBuLu0nvaCqpg8NJ6KzAX9BA1Rt+ooD+3pzq+FV++TQ@mail.gmail.com>
	<CAFHCsPWq9WqeFnx1a+StfSxj=jwcE9GPyVsoyh0+azr3HmM6vQ@mail.gmail.com>
	<5428AF3B.1030906@rice.edu>
	<CAFHCsPWxF0G+bqBYgxH=WtV+St_UTWZj+Y2-PHfoYSLjC_Qpig@mail.gmail.com>


On 10/08/2014 10:38, Svatopluk Kraus wrote:
> On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc@rice.edu> wrote:
>
>>   On 09/27/2014 03:51, Svatopluk Kraus wrote:
>>
>>
>> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox@gmail.com> wrote:
>>
>>>
>>>  On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Michal and I are finishing the new ARM pmap-v6 code. There is one
>>>> problem we've dealt with somehow, but now we would like to handle it
>>>> better. It's about physical pages which are allocated before the vm
>>>> subsystem is initialized. While these pages can later be found in
>>>> vm_page_array when the VM_PHYSSEG_DENSE memory model is used, that is
>>>> not true for the VM_PHYSSEG_SPARSE memory model. And the ARM world
>>>> uses the VM_PHYSSEG_SPARSE model.
>>>>
>>>> It really would be nice to utilize vm_page_array for such preallocated
>>>> physical pages even when the VM_PHYSSEG_SPARSE memory model is used.
>>>> Things could be much easier then. In our case, it's about pages which
>>>> are used for level 2 page tables. In the VM_PHYSSEG_SPARSE model, we
>>>> have two sets of such pages: the first are preallocated and the second
>>>> are allocated after the vm subsystem is initialized. We must deal with
>>>> each set differently, so the code is more complex, and so is the
>>>> debugging.
>>>>
>>>> Thus we need some method of saying that some part of physical memory
>>>> should be included in vm_page_array, but that the pages from that
>>>> region should not be put on the free list during initialization. We
>>>> think such a possibility could be useful in general. There could be a
>>>> need for some physical space which:
>>>>
>>>> (1) is needed only during boot, and later on can be freed and handed
>>>> over to the vm subsystem,
>>>>
>>>> (2) is needed for something else, where the vm_page_array code could
>>>> be reused without some kind of duplication.
>>>>
>>>> There is already some code in the vm_page.c file which deals with
>>>> blacklisted pages. So the easiest way to deal with the presented
>>>> situation is to add some callback to this part of the code which would
>>>> be able to exclude either a whole phys_avail[i], phys_avail[i+1]
>>>> region or single pages. As the biggest phys_avail region is used for
>>>> vm subsystem allocations, some more coding is needed there. (However,
>>>> blacklisted pages are not dealt with in that part of the region.)
>>>>
>>>> We would like to know if there is any objection:
>>>>
>>>> (1) to dealing with the presented problem,
>>>> (2) to dealing with the problem in the presented way.
>>>>
>>>> Any help is much appreciated. Thanks.
>>>>
>>>>
>>> As an experiment, try modifying vm_phys.c to use dump_avail instead of
>>> phys_avail when sizing vm_page_array.  On amd64, where the same problem
>>> exists, this allowed me to use VM_PHYSSEG_SPARSE.  Right now, this is
>>> probably my preferred solution.  The catch being that not all architectures
>>> implement dump_avail, but my recollection is that arm does.
>>>
>> Frankly, I would prefer this too, but there is one big open question:
>>
>> What is dump_avail for?
>>
>>
>>
>> dump_avail[] is solving a similar problem in the minidump code, hence, the
>> prefix "dump_" in its name.  In other words, the minidump code couldn't use
>> phys_avail[] either because it didn't describe the full range of physical
>> addresses that might be included in a minidump, so dump_avail[] was created.
>>
>> There is already precedent for what I'm suggesting.  dump_avail[] is
>> already (ab)used outside of the minidump code on x86 to solve this same
>> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
>>
>>
>>  Using it for vm_page_array initialization and segmentation means that
>> phys_avail must be a subset of it. And this must be stated and made
>> visible enough. Maybe it should even be checked in code. I like the idea
>> of thinking about dump_avail as something that describes all memory in a
>> system, but that's not how dump_avail is defined in the archs now.
>>
>>
>>
>> When you say "it's not how dump_avail is defined in archs now", I'm not
>> sure whether you're talking about the code or the comments.  In terms of
>> code, dump_avail[] is a superset of phys_avail[], and I'm not aware of any
>> code that would have to change.  In terms of comments, I did a grep looking
>> for comments defining what dump_avail[] is, because I couldn't remember
>> any.  I found one ... on arm.  So, I don't think it's an onerous task
>> changing the definition of dump_avail[].  :-)
>>
>> Already, as things stand today with dump_avail[] being used outside of the
>> minidump code, one could reasonably argue that it should be renamed to
>> something like phys_exists[].
>>
>>
>>
>> I will experiment with it on Monday, then. However, it's not only about
>> how memory segments are created in vm_phys.c; it's also about how the
>> vm_page_array size is computed in vm_page.c.
>>
>>
>>
>> Yes, and there is also a place in vm_reserv.c that needs to change.   I've
>> attached the patch that I developed and tested a long time ago.  It still
>> applies cleanly and runs ok on amd64.
>>
>>
>>
>
>
> Well, I've created and tested a minimalistic patch which - I hope - is
> committable. It runs ok on a Pandaboard (ARMv6) and solves the presented
> problem. I would really appreciate it if this could be committed. Thanks.


Sorry for the slow reply.  I've just been swamped with work lately.  I
finally had some time to look at this in the last day or so.

The first thing that I propose to do is commit the attached patch.  This
patch changes pmap_init() on amd64, armv6, and i386 so that it no longer
consults phys_avail[] to determine the end of memory.  Instead, it calls
a new function provided by vm_phys.c to obtain the same information from
vm_phys_segs[].

With this change, the new variable phys_managed in your patch wouldn't
need to be a global.  It could be a local variable in vm_page_startup()
that we pass as a parameter to vm_phys_init() and vm_reserv_init().

More generally, the long-term vision that I have is that we would stop
using phys_avail[] after vm_page_startup() had completed.  It would only
be used during initialization.  After that we would use vm_phys_segs[]
and functions provided by vm_phys.c.

 
>
> BTW, while I was inspecting all the archs, I think that maybe it's time
> to do what was done for busdma not long ago. There is much similar code
> across the archs which deals with physical memory and could be
> generalized and put into kern/subr_physmem.c for common use. All work
> with physical memory could be reduced to two arrays of regions:
>
> phys_present[] ... describes all present physical memory regions
> phys_exclude[] ... describes various exclusions from phys_present[]
>
> Each excluded region would be labeled by flags saying what kind of
> exclusion it is. Flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, and
> NOMEMRW could be combined. This idea is taken from sys/arm/arm/physmem.c.
>
> All the other arrays like phys_managed[], phys_avail[], and dump_avail[]
> would be created from phys_present[] and phys_exclude[]. This way the
> bootstrap code in the archs could be simplified and unified. For
> example, dealing with either hw.physmem or the page at PA 0x00000000
> could be transparent.
>
> I'm prepared to volunteer if the idea is ripe. However, I will be
> looking for a mentor.


I've never really looked at arm/arm/physmem.c before.  Let me do that
before I comment on this.



[Attachment: vm_phys_get_end1.patch]

Index: amd64/amd64/pmap.c
===================================================================
--- amd64/amd64/pmap.c	(revision 273550)
+++ amd64/amd64/pmap.c	(working copy)
@@ -130,6 +130,7 @@ __FBSDID("$FreeBSD$");
 #include <vm/vm_extern.h>
 #include <vm/vm_pageout.h>
 #include <vm/vm_pager.h>
+#include <vm/vm_phys.h>
 #include <vm/vm_radix.h>
 #include <vm/vm_reserv.h>
 #include <vm/uma.h>
@@ -1060,8 +1061,7 @@ pmap_init(void)
 	/*
 	 * Calculate the size of the pv head table for superpages.
 	 */
-	for (i = 0; phys_avail[i + 1]; i += 2);
-	pv_npg = round_2mpage(phys_avail[(i - 2) + 1]) / NBPDR;
+	pv_npg = round_2mpage(vm_phys_get_end()) / NBPDR;
 
 	/*
 	 * Allocate memory for the pv head table for superpages.
Index: arm/arm/pmap-v6.c
===================================================================
--- arm/arm/pmap-v6.c	(revision 273550)
+++ arm/arm/pmap-v6.c	(working copy)
@@ -172,6 +172,7 @@ __FBSDID("$FreeBSD$");
 #include <vm/vm_map.h>
 #include <vm/vm_page.h>
 #include <vm/vm_pageout.h>
+#include <vm/vm_phys.h>
 #include <vm/vm_extern.h>
 #include <vm/vm_reserv.h>
 
@@ -1343,8 +1344,7 @@ pmap_init(void)
 	/*
 	 * Calculate the size of the pv head table for superpages.
 	 */
-	for (i = 0; phys_avail[i + 1]; i += 2);
-	pv_npg = round_1mpage(phys_avail[(i - 2) + 1]) / NBPDR;
+	pv_npg = round_1mpage(vm_phys_get_end()) / NBPDR;
 
 	/*
 	 * Allocate memory for the pv head table for superpages.
Index: i386/i386/pmap.c
===================================================================
--- i386/i386/pmap.c	(revision 273550)
+++ i386/i386/pmap.c	(working copy)
@@ -133,6 +133,7 @@ __FBSDID("$FreeBSD$");
 #include <vm/vm_extern.h>
 #include <vm/vm_pageout.h>
 #include <vm/vm_pager.h>
+#include <vm/vm_phys.h>
 #include <vm/vm_radix.h>
 #include <vm/vm_reserv.h>
 #include <vm/uma.h>
@@ -779,8 +780,7 @@ pmap_init(void)
 	/*
 	 * Calculate the size of the pv head table for superpages.
 	 */
-	for (i = 0; phys_avail[i + 1]; i += 2);
-	pv_npg = round_4mpage(phys_avail[(i - 2) + 1]) / NBPDR;
+	pv_npg = round_4mpage(vm_phys_get_end()) / NBPDR;
 
 	/*
 	 * Allocate memory for the pv head table for superpages.
Index: vm/vm_phys.c
===================================================================
--- vm/vm_phys.c	(revision 273550)
+++ vm/vm_phys.c	(working copy)
@@ -856,6 +856,17 @@ vm_phys_free_contig(vm_page_t m, u_long npages)
 }
 
 /*
+ * Return the physical address of the end of memory, that is, the address of
+ * the last useable byte of RAM plus one.
+ */
+vm_paddr_t
+vm_phys_get_end(void)
+{
+
+	return (vm_phys_segs[vm_phys_nsegs - 1].end);
+}
+
+/*
  * Set the pool for a contiguous, power of two-sized set of physical pages.
  */
 void
Index: vm/vm_phys.h
===================================================================
--- vm/vm_phys.h	(revision 273550)
+++ vm/vm_phys.h	(working copy)
@@ -80,6 +80,7 @@ void vm_phys_fictitious_unreg_range(vm_paddr_t sta
 vm_page_t vm_phys_fictitious_to_vm_page(vm_paddr_t pa);
 void vm_phys_free_contig(vm_page_t m, u_long npages);
 void vm_phys_free_pages(vm_page_t m, int order);
+vm_paddr_t vm_phys_get_end(void);
 void vm_phys_init(void);
 vm_page_t vm_phys_paddr_to_vm_page(vm_paddr_t pa);
 void vm_phys_set_pool(int pool, vm_page_t m, int order);


