From owner-freebsd-arch@FreeBSD.ORG Mon Sep 29 01:25:05 2014
Message-ID: <5428AF3B.1030906@rice.edu>
Date: Sun, 28 Sep 2014 20:00:43 -0500
From: Alan Cox
To: Svatopluk Kraus, alc@freebsd.org
Cc: FreeBSD Arch
Subject: Re: vm_page_array and VM_PHYSSEG_SPARSE
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------------030707070306010907080705"
List-Id: Discussion related to FreeBSD architecture

This is a multi-part message in MIME format.
--------------030707070306010907080705
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 09/27/2014 03:51, Svatopluk Kraus wrote:
>
> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox wrote:
>
> > On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus wrote:
> >
> > > Hi,
> > >
> > > I and Michal are finishing new ARM pmap-v6 code. There is one
> > > problem we've dealt with somehow, but now we would like to do it
> > > better. It's about physical pages which are allocated before vm
> > > subsystem is initialized. While later on these pages could be
> > > found in vm_page_array when VM_PHYSSEG_DENSE memory model is
> > > used, it's not true for VM_PHYSSEG_SPARSE memory model. And ARM
> > > world uses VM_PHYSSEG_SPARSE model.
> > >
> > > It really would be nice to utilize vm_page_array for such
> > > preallocated physical pages even when VM_PHYSSEG_SPARSE memory
> > > model is used. Things could be much easier then. In our case,
> > > it's about pages which are used for level 2 page tables. In
> > > VM_PHYSSEG_SPARSE model, we have two sets of such pages. First
> > > ones are preallocated and second ones are allocated after vm
> > > subsystem was inited. We must deal with each set differently.
> > > So code is more complex and so is debugging.
> > >
> > > Thus we need some method how to say that some part of physical
> > > memory should be included in vm_page_array, but the pages from
> > > that region should not be put to free list during
> > > initialization. We think that such possibility could be utilized
> > > in general. There could be a need for some physical space which:
> > >
> > > (1) is needed only during boot and later on it can be freed and
> > >     put to vm subsystem,
> > >
> > > (2) is needed for something else and vm_page_array code could be
> > >     used without some kind of its duplication.
> > >
> > > There is already some code which deals with blacklisted pages in
> > > vm_page.c file. So the easiest way how to deal with presented
> > > situation is to add some callback to this part of code which
> > > will be able to either exclude whole phys_avail[i],
> > > phys_avail[i+1] region or single pages. As the biggest
> > > phys_avail region is used for vm subsystem allocations, there
> > > should be some more coding. (However, blacklisted pages are not
> > > dealt with on that part of region.)
> > >
> > > We would like to know if there is any objection:
> > >
> > > (1) to deal with presented problem,
> > > (2) to deal with the problem presented way.
> > >
> > > Some help is very appreciated. Thanks
> >
> > As an experiment, try modifying vm_phys.c to use dump_avail
> > instead of phys_avail when sizing vm_page_array. On amd64, where
> > the same problem exists, this allowed me to use
> > VM_PHYSSEG_SPARSE. Right now, this is probably my preferred
> > solution. The catch being that not all architectures implement
> > dump_avail, but my recollection is that arm does.
>
> Frankly, I would prefer this too, but there is one big open question:
>
> What is dump_avail for?

dump_avail[] is solving a similar problem in the minidump code, hence,
the prefix "dump_" in its name. In other words, the minidump code
couldn't use phys_avail[] either because it didn't describe the full
range of physical addresses that might be included in a minidump, so
dump_avail[] was created.
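
To make the sizing issue concrete, here is a minimal userland sketch (not
kernel code) that counts pages in a zero-terminated list of [start, end)
pairs, the layout that phys_avail[] and dump_avail[] share. The two memory
maps, the 4 KB page size, and every ex_ name are assumptions invented for
illustration; only the loop shape follows the kernel's existing loops.

```c
#include <assert.h>
#include <stdint.h>

#define	EX_PAGE_SHIFT	12		/* assumption: 4 KB pages */

typedef uint64_t ex_paddr_t;		/* stand-in for vm_paddr_t */

/* Hypothetical map of all RAM that exists, as dump_avail[] would list it. */
static const ex_paddr_t ex_dump_avail[] = {
	0x00001000, 0x0009f000,
	0x00100000, 0x40000000,
	0, 0
};

/* Hypothetical phys_avail[]: 3 MB at 0x100000 were taken in early boot. */
static const ex_paddr_t ex_phys_avail[] = {
	0x00001000, 0x0009f000,
	0x00400000, 0x40000000,
	0, 0
};

/* Sum the page counts of every [start, end) pair up to the terminator. */
static uint64_t
ex_count_pages(const ex_paddr_t *avail)
{
	uint64_t npages = 0;

	for (int i = 0; avail[i + 1] != 0; i += 2) {
		assert(avail[i] <= avail[i + 1]);	/* well-formed range */
		npages += (avail[i + 1] - avail[i]) >> EX_PAGE_SHIFT;
	}
	return (npages);
}
```

With these made-up maps, sizing from ex_phys_avail gives 768 fewer entries
than sizing from ex_dump_avail. Those 768 pages are exactly the ones that
were allocated before the vm subsystem came up, so they would have no
vm_page in a sparse vm_page_array sized from phys_avail[].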

There is already precedent for what I'm suggesting. dump_avail[] is
already (ab)used outside of the minidump code on x86 to solve this same
problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.

> Using it for vm_page_array initialization and segmentation means that
> phys_avail must be a subset of it. And this must be stated and be
> visible enough. Maybe it should be even checked in code. I like the
> idea of thinking about dump_avail as something that describes all
> memory in a system, but it's not how dump_avail is defined in archs
> now.

When you say "it's not how dump_avail is defined in archs now", I'm not
sure whether you're talking about the code or the comments. In terms of
code, dump_avail[] is a superset of phys_avail[], and I'm not aware of
any code that would have to change. In terms of comments, I did a grep
looking for comments defining what dump_avail[] is, because I couldn't
remember any. I found one ... on arm. So, I don't think it's an onerous
task changing the definition of dump_avail[]. :-)

Already, as things stand today with dump_avail[] being used outside of
the minidump code, one could reasonably argue that it should be renamed
to something like phys_exists[].

> I will experiment with it on Monday then. However, it's not only about
> how memory segments are created in vm_phys.c, but it's about how
> vm_page_array size is computed in vm_page.c too.

Yes, and there is also a place in vm_reserv.c that needs to change.
I've attached the patch that I developed and tested a long time ago.
It still applies cleanly and runs ok on amd64.
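
The "maybe it should be even checked in code" idea could look roughly like
the following userland sketch. It is not part of the attached patch, and
the ex_ names and sample maps are hypothetical. It also assumes that each
phys_avail[] range lies inside a single dump_avail[] range; a range that
spans two adjacent dump_avail[] entries would be rejected by this simple
version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ex_paddr_t;		/* stand-in for vm_paddr_t */

/* Hypothetical maps: zero-terminated lists of [start, end) pairs. */
static const ex_paddr_t ex_dump_avail[] = {
	0x00001000, 0x0009f000,
	0x00100000, 0x40000000,
	0, 0
};
static const ex_paddr_t ex_phys_avail[] = {
	0x00001000, 0x0009f000,
	0x00400000, 0x40000000,
	0, 0
};

/* True if [start, end) lies entirely inside one range of "outer". */
static bool
ex_range_covered(const ex_paddr_t *outer, ex_paddr_t start, ex_paddr_t end)
{
	assert(start <= end);
	for (int i = 0; outer[i + 1] != 0; i += 2) {
		if (outer[i] <= start && end <= outer[i + 1])
			return (true);
	}
	return (false);
}

/* True if every range of "inner" is covered by some range of "outer". */
static bool
ex_is_subset(const ex_paddr_t *inner, const ex_paddr_t *outer)
{
	for (int i = 0; inner[i + 1] != 0; i += 2) {
		if (!ex_range_covered(outer, inner[i], inner[i + 1]))
			return (false);
	}
	return (true);
}
```

In a kernel, a check of this kind would more plausibly be a KASSERT run
once during startup, before vm_page_array is sized; where exactly to put
it is left open here.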

--------------030707070306010907080705
Content-Type: text/plain; charset=ISO-8859-15;
 name="dump_avail_sparse.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="dump_avail_sparse.patch"

Index: vm/vm_phys.c
===================================================================
--- vm/vm_phys.c	(revision 220102)
+++ vm/vm_phys.c	(working copy)
@@ -289,38 +289,38 @@ vm_phys_init(void)
 	int ndomains, j;
 #endif
 
-	for (i = 0; phys_avail[i + 1] != 0; i += 2) {
+	for (i = 0; dump_avail[i + 1] != 0; i += 2) {
 #ifdef	VM_FREELIST_ISADMA
-		if (phys_avail[i] < 16777216) {
-			if (phys_avail[i + 1] > 16777216) {
-				vm_phys_create_seg(phys_avail[i], 16777216,
+		if (dump_avail[i] < 16777216) {
+			if (dump_avail[i + 1] > 16777216) {
+				vm_phys_create_seg(dump_avail[i], 16777216,
 				    VM_FREELIST_ISADMA);
-				vm_phys_create_seg(16777216, phys_avail[i + 1],
+				vm_phys_create_seg(16777216, dump_avail[i + 1],
 				    VM_FREELIST_DEFAULT);
 			} else {
-				vm_phys_create_seg(phys_avail[i],
-				    phys_avail[i + 1], VM_FREELIST_ISADMA);
+				vm_phys_create_seg(dump_avail[i],
+				    dump_avail[i + 1], VM_FREELIST_ISADMA);
 			}
 			if (VM_FREELIST_ISADMA >= vm_nfreelists)
 				vm_nfreelists = VM_FREELIST_ISADMA + 1;
 		} else
 #endif
 #ifdef	VM_FREELIST_HIGHMEM
-		if (phys_avail[i + 1] > VM_HIGHMEM_ADDRESS) {
-			if (phys_avail[i] < VM_HIGHMEM_ADDRESS) {
-				vm_phys_create_seg(phys_avail[i],
+		if (dump_avail[i + 1] > VM_HIGHMEM_ADDRESS) {
+			if (dump_avail[i] < VM_HIGHMEM_ADDRESS) {
+				vm_phys_create_seg(dump_avail[i],
 				    VM_HIGHMEM_ADDRESS, VM_FREELIST_DEFAULT);
 				vm_phys_create_seg(VM_HIGHMEM_ADDRESS,
-				    phys_avail[i + 1], VM_FREELIST_HIGHMEM);
+				    dump_avail[i + 1], VM_FREELIST_HIGHMEM);
 			} else {
-				vm_phys_create_seg(phys_avail[i],
-				    phys_avail[i + 1], VM_FREELIST_HIGHMEM);
+				vm_phys_create_seg(dump_avail[i],
+				    dump_avail[i + 1], VM_FREELIST_HIGHMEM);
 			}
 			if (VM_FREELIST_HIGHMEM >= vm_nfreelists)
 				vm_nfreelists = VM_FREELIST_HIGHMEM + 1;
 		} else
 #endif
-		vm_phys_create_seg(phys_avail[i], phys_avail[i + 1],
+		vm_phys_create_seg(dump_avail[i], dump_avail[i + 1],
 		    VM_FREELIST_DEFAULT);
 	}
 	for (flind = 0; flind < vm_nfreelists; flind++) {
Index: vm/vm_reserv.c
===================================================================
--- vm/vm_reserv.c	(revision 220102)
+++ vm/vm_reserv.c	(working copy)
@@ -487,9 +487,9 @@ vm_reserv_init(void)
 	 * Initialize the reservation array.  Specifically, initialize the
 	 * "pages" field for every element that has an underlying superpage.
 	 */
-	for (i = 0; phys_avail[i + 1] != 0; i += 2) {
-		paddr = roundup2(phys_avail[i], VM_LEVEL_0_SIZE);
-		while (paddr + VM_LEVEL_0_SIZE <= phys_avail[i + 1]) {
+	for (i = 0; dump_avail[i + 1] != 0; i += 2) {
+		paddr = roundup2(dump_avail[i], VM_LEVEL_0_SIZE);
+		while (paddr + VM_LEVEL_0_SIZE <= dump_avail[i + 1]) {
 			vm_reserv_array[paddr >> VM_LEVEL_0_SHIFT].pages =
 			    PHYS_TO_VM_PAGE(paddr);
 			paddr += VM_LEVEL_0_SIZE;
Index: vm/vm_page.c
===================================================================
--- vm/vm_page.c	(revision 220102)
+++ vm/vm_page.c	(working copy)
@@ -389,8 +389,8 @@ vm_page_startup(vm_offset_t vaddr)
 	first_page = low_water / PAGE_SIZE;
 #ifdef VM_PHYSSEG_SPARSE
 	page_range = 0;
-	for (i = 0; phys_avail[i + 1] != 0; i += 2)
-		page_range += atop(phys_avail[i + 1] - phys_avail[i]);
+	for (i = 0; dump_avail[i + 1] != 0; i += 2)
+		page_range += atop(dump_avail[i + 1] - dump_avail[i]);
 #elif defined(VM_PHYSSEG_DENSE)
 	page_range = high_water / PAGE_SIZE - first_page;
 #else
Index: amd64/include/vmparam.h
===================================================================
--- amd64/include/vmparam.h	(revision 220102)
+++ amd64/include/vmparam.h	(working copy)
@@ -79,7 +79,7 @@
 /*
  * The physical address space is densely populated.
  */
-#define	VM_PHYSSEG_DENSE
+#define	VM_PHYSSEG_SPARSE
 
 /*
  * The number of PHYSSEG entries must be one greater than the number

--------------030707070306010907080705--