Date:      Mon, 7 Feb 2011 18:11:03 +0300
From:      Sergey Kandaurov <pluknet@gmail.com>
To:        alc@freebsd.org
Cc:        freebsd-hackers@freebsd.org, Konstantin Belousov <kib@freebsd.org>
Subject:   Re: [rfc] allow to boot with >= 256GB physmem
Message-ID:  <AANLkTin9UNV6SfGvV0jnxOX7WzzKzXNRiL+N+x0NCOEE@mail.gmail.com>
In-Reply-To: <AANLkTi=7EwtJLVJQNM5pbtSXz_iBYS55pfjuaG7JWcUY@mail.gmail.com>
References:  <AANLkTikt5=2L0rHyGbsjvG8eV6Ve4JkRM_pcyNiAsPu8@mail.gmail.com> <201101211244.13830.jhb@freebsd.org> <AANLkTinWBkd7BuO40DhuRNgKx=5dyEUP9wMesMV_zx2J@mail.gmail.com> <AANLkTi=7EwtJLVJQNM5pbtSXz_iBYS55pfjuaG7JWcUY@mail.gmail.com>


On 22 January 2011 00:43, Alan Cox <alan.l.cox@gmail.com> wrote:
> On Fri, Jan 21, 2011 at 2:58 PM, Alan Cox <alan.l.cox@gmail.com> wrote:
>>
>> On Fri, Jan 21, 2011 at 11:44 AM, John Baldwin <jhb@freebsd.org> wrote:
>>>
>>> On Friday, January 21, 2011 11:09:10 am Sergey Kandaurov wrote:
>>> > Hello.
>>> >
>>> > Some time ago I ran into a problem booting with 400GB of physmem.
>>> > The problem is that the vm.max_proc_mmap type overflows with
>>> > such a high value, and that results in a broken mmap() syscall.
>>> > max_proc_mmap is a signed int, roughly calculated in
>>> > vmmapentry_rsrc_init() as a quotient of the u_long vm_kmem_size:
>>> > vm_kmem_size / sizeof(struct vm_map_entry) / 100.
>>> >
>>> > Although at the time it was introduced in svn r57263 the value
>>> > was quite low (e.g. the related commit log says:
>>> > "The value defaults to around 9000 for a 128MB machine."),
>>> > the problem is now observed on amd64, where after r212784 the
>>> > KVA space is effectively bound to the physical memory size.
>>> >
>>> > With INT_MAX being 0x7fffffff and sizeof(struct vm_map_entry)
>>> > being 120, slightly less than 256GB is enough to
>>> > reproduce the problem.
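
To make the overflow concrete, here is a standalone userland sketch of
the same arithmetic (an illustration only, not the kernel code; 400GB
is the physmem figure mentioned above, 120 the quoted entry size):

#include <stdio.h>

int
main(void)
{
	unsigned long vm_kmem_size = 400UL << 30;	/* ~400GB of KVA */
	int max_proc_mmap;

	/*
	 * The u_long quotient, 3579139413, no longer fits in a signed
	 * int; on a two's-complement machine the assignment wraps to a
	 * negative value.
	 */
	max_proc_mmap = vm_kmem_size / 120;
	max_proc_mmap /= 100;
	printf("max_proc_mmap = %d\n", max_proc_mmap);	/* -7158278 */
	return (0);
}

With a negative limit, the "nentries >= max_proc_mmap * vm_refcnt"
check in vm_mmap.c is always true, so every mmap() fails with ENOMEM.
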
>>> >
>>> > I rewrote vmmapentry_rsrc_init() to set a limit for max_proc_mmap
>>> > that is large enough, while protecting against integer overflow.
>>> > As it's also possible to tune this value at run time, I also added
>>> > a simple anti-foot-shooting constraint to its sysctl handler.
>>> > I'm not sure, though, whether the second part is worth committing.
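
For the archives, in case the URL below rots: such a run-time guard is
the usual sysctl(9) handler idiom sketched here. This only illustrates
the pattern, it is not the literal diff; the handler name is made up.

static int
sysctl_max_proc_mmap(SYSCTL_HANDLER_ARGS)
{
	int error, val;

	val = max_proc_mmap;
	error = sysctl_handle_int(oidp, &val, 0, req);
	if (error != 0 || req->newptr == NULL)
		return (error);
	/* Anti-foot-shooting: refuse values that would break mmap(). */
	if (val <= 0)
		return (EINVAL);
	max_proc_mmap = val;
	return (0);
}
SYSCTL_PROC(_vm, OID_AUTO, max_proc_mmap, CTLTYPE_INT | CTLFLAG_RW,
    NULL, 0, sysctl_max_proc_mmap, "I",
    "Maximum number of memory-mapped files per process");
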
>>> >
>>> > As this patch may cause some bikeshedding,
>>> > I'd like to hear your comments before I commit it.
>>> >
>>> > http://plukky.net/~pluknet/patches/max_proc_mmap.diff
>>>
>>> Is there any reason we can't just make this variable and sysctl a long?
>>>
>>
>> Or just delete it.
>>
>> 1. Contrary to what the commit message says, this sysctl does not
>> effectively limit the number of vm map entries.  It only limits the
>> number that are created by one system call, mmap().  Other system
>> calls create vm map entries just as easily, for example, mprotect(),
>> madvise(), mlock(), and minherit().  Basically, anything that alters
>> the properties of a mapping.  Thus, in 2000, after this sysctl was
>> added, the same resource-exhaustion-induced crash could have been
>> reproduced by trivially changing the program in PR/16573 to do an
>> mprotect() or two.
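
A concrete illustration of the splitting Alan describes (a sketch; the
entry counts are what vm_map clipping does to a private anonymous
mapping):

#include <sys/mman.h>
#include <unistd.h>

int
main(void)
{
	long ps = sysconf(_SC_PAGESIZE);
	/* One mmap() call, one vm_map_entry. */
	char *p = mmap(NULL, 3 * ps, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);

	if (p == MAP_FAILED)
		return (1);
	/*
	 * Changing the middle page's protection clips the single entry
	 * into three (rw/r/rw) without another mmap() call, so the
	 * max_proc_mmap check never sees these entries being created.
	 */
	mprotect(p + ps, ps, PROT_READ);
	return (0);
}
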
>>
>> In a nutshell, if you want to really limit the number of vm map
>> entries that a process can allocate, the implementation is a bit
>> more involved than what was done for this sysctl.
>>
>> 2. UMA implements M_WAITOK, whereas the old zone allocator in 2000 did
>> not.  Moreover, vm map entries for user maps are allocated with
>> M_WAITOK.  So, the exact crash reported in PR/16573 couldn't happen
>> any longer.
>>
>
> Actually, I take back part of what I said here.  The old zone
> allocator did implement something like M_WAITOK, and that appears to
> have been used for user maps.  However, the crash described in
> PR/16573 was actually on the allocation of a vm map entry within the
> *kernel* address space for a process U area.  This type of allocation
> did not use the old zone allocator's equivalent to M_WAITOK.  However,
> we no longer have U areas, so the exact crash scenario is clearly no
> longer possible.  Interestingly, the sysctl in question has no direct
> effect on the allocation of kernel vm map entries.
>
> So, I remain skeptical that this sysctl is preventing any resource
> exhaustion based panics in the current kernel.  Again, I would be
> thrilled to see one or more people do some testing, such as rerunning
> the program from PR/16573.
>
>
>> 3. We now have the "vmemoryuse" resource limit.  When this sysctl was
>> defined, we didn't.  Limiting the virtual memory indirectly but
>> effectively limits the number of vm map entries that a process can
>> allocate.
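
("vmemoryuse" corresponds to RLIMIT_VMEM at the C level on FreeBSD; a
minimal sketch of applying it programmatically, size picked
arbitrarily:)

#include <sys/types.h>
#include <sys/resource.h>

/*
 * Cap the process's total virtual memory at 512MB.  Once the cap is
 * hit, further address-space growth fails, which indirectly bounds
 * the number of vm map entries the process can make the kernel
 * allocate.
 */
static int
cap_vmemoryuse(void)
{
	struct rlimit rl;

	rl.rlim_cur = rl.rlim_max = 512UL * 1024 * 1024;
	return (setrlimit(RLIMIT_VMEM, &rl));
}
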
>>
>> In summary, I would do a little due diligence, for example, run the
>> program from PR/16573 with the limit disabled.  If you can't
>> reproduce the crash, in other words, nothing contradicts point #2
>> above, then I would just delete this sysctl.
>>

I tried the test from PR/16573 running as root. Unmodified, it just
quickly hits the kern.maxproc limit. So I added signal(SIGCHLD, SIG_IGN);
to avoid creating zombie processes at all and give it more of a workload.
With this change it also survived. The submitter reported that it crashed
with 10000 iterations; even after increasing the limit up to 1000000 I
still couldn't get it to crash.

* The testing was done with the max_proc_mmap part commented out.
The change (attached) effectively reverts r57263.
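
For reference, the rough shape of the test as I ran it (a
reconstruction from the PR's description plus the SIGCHLD change, not
the PR's literal code; 1000000 is the iteration count mentioned above):

#include <sys/mman.h>
#include <signal.h>
#include <unistd.h>

int
main(void)
{
	int i;

	/* The modification: auto-reap children so zombies don't pile
	 * up and stall the test on the kern.maxproc limit. */
	signal(SIGCHLD, SIG_IGN);
	for (i = 0; i < 1000000; i++) {
		if (fork() == 0) {
			/* Each child burns vm map entries until mmap()
			 * fails, then exits. */
			while (mmap(NULL, getpagesize(),
			    PROT_READ | PROT_WRITE,
			    MAP_ANON | MAP_PRIVATE, -1, 0) != MAP_FAILED)
				;
			_exit(0);
		}
	}
	return (0);
}
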

-- 
wbr,
pluknet

[Attachment: vm_mmap_maxprocmmap.diff]

Index: /sys/vm/vm_mmap.c
===================================================================
--- /sys/vm/vm_mmap.c	(revision 218026)
+++ /sys/vm/vm_mmap.c	(working copy)
@@ -48,7 +48,6 @@
 
 #include <sys/param.h>
 #include <sys/systm.h>
-#include <sys/kernel.h>
 #include <sys/lock.h>
 #include <sys/mutex.h>
 #include <sys/sysproto.h>
@@ -66,7 +65,6 @@
 #include <sys/stat.h>
 #include <sys/sysent.h>
 #include <sys/vmmeter.h>
-#include <sys/sysctl.h>
 
 #include <security/mac/mac_framework.h>
 
@@ -80,7 +78,6 @@
 #include <vm/vm_pageout.h>
 #include <vm/vm_extern.h>
 #include <vm/vm_page.h>
-#include <vm/vm_kern.h>
 
 #ifdef HWPMC_HOOKS
 #include <sys/pmckern.h>
@@ -92,30 +89,6 @@
 };
 #endif
 
-static int max_proc_mmap;
-SYSCTL_INT(_vm, OID_AUTO, max_proc_mmap, CTLFLAG_RW, &max_proc_mmap, 0,
-    "Maximum number of memory-mapped files per process");
-
-/*
- * Set the maximum number of vm_map_entry structures per process.  Roughly
- * speaking vm_map_entry structures are tiny, so allowing them to eat 1/100
- * of our KVM malloc space still results in generous limits.  We want a
- * default that is good enough to prevent the kernel running out of resources
- * if attacked from compromised user account but generous enough such that
- * multi-threaded processes are not unduly inconvenienced.
- */
-static void vmmapentry_rsrc_init(void *);
-SYSINIT(vmmersrc, SI_SUB_KVM_RSRC, SI_ORDER_FIRST, vmmapentry_rsrc_init,
-    NULL);
-
-static void
-vmmapentry_rsrc_init(dummy)
-        void *dummy;
-{
-    max_proc_mmap = vm_kmem_size / sizeof(struct vm_map_entry);
-    max_proc_mmap /= 100;
-}
-
 static int vm_mmap_vnode(struct thread *, vm_size_t, vm_prot_t, vm_prot_t *,
     int *, struct vnode *, vm_ooffset_t *, vm_object_t *);
 static int vm_mmap_cdev(struct thread *, vm_size_t, vm_prot_t, vm_prot_t *,
@@ -377,18 +350,6 @@
 		handle_type = OBJT_VNODE;
 	}
 map:
-
-	/*
-	 * Do not allow more then a certain number of vm_map_entry structures
-	 * per process.  Scale with the number of rforks sharing the map
-	 * to make the limit reasonable for threads.
-	 */
-	if (max_proc_mmap &&
-	    vms->vm_map.nentries >= max_proc_mmap * vms->vm_refcnt) {
-		error = ENOMEM;
-		goto done;
-	}
-
 	td->td_fpop = fp;
 	error = vm_mmap(&vms->vm_map, &addr, size, prot, maxprot,
 	    flags, handle_type, handle, pos);


