Date: Mon, 7 Feb 2011 18:11:03 +0300
From: Sergey Kandaurov <pluknet@gmail.com>
To: alc@freebsd.org
Cc: freebsd-hackers@freebsd.org, Konstantin Belousov <kib@freebsd.org>
Subject: Re: [rfc] allow to boot with >= 256GB physmem
Message-ID: <AANLkTin9UNV6SfGvV0jnxOX7WzzKzXNRiL%2BN%2Bx0NCOEE@mail.gmail.com>
In-Reply-To: <AANLkTi=7EwtJLVJQNM5pbtSXz_iBYS55pfjuaG7JWcUY@mail.gmail.com>
References: <AANLkTikt5=2L0rHyGbsjvG8eV6Ve4JkRM_pcyNiAsPu8@mail.gmail.com> <201101211244.13830.jhb@freebsd.org> <AANLkTinWBkd7BuO40DhuRNgKx=5dyEUP9wMesMV_zx2J@mail.gmail.com> <AANLkTi=7EwtJLVJQNM5pbtSXz_iBYS55pfjuaG7JWcUY@mail.gmail.com>
On 22 January 2011 00:43, Alan Cox <alan.l.cox@gmail.com> wrote:
> On Fri, Jan 21, 2011 at 2:58 PM, Alan Cox <alan.l.cox@gmail.com> wrote:
>>
>> On Fri, Jan 21, 2011 at 11:44 AM, John Baldwin <jhb@freebsd.org> wrote:
>>>
>>> On Friday, January 21, 2011 11:09:10 am Sergey Kandaurov wrote:
>>> > Hello.
>>> >
>>> > Some time ago I faced a problem booting with 400GB physmem.
>>> > The problem is that the vm.max_proc_mmap type overflows with
>>> > such a high value, and that results in a broken mmap() syscall.
>>> > The max_proc_mmap value is a signed int and is roughly calculated
>>> > in vmmapentry_rsrc_init() as a quotient of the u_long vm_kmem_size:
>>> > vm_kmem_size / sizeof(struct vm_map_entry) / 100.
>>> >
>>> > Although at the time it was introduced in svn r57263 the value
>>> > was quite low (e.g. the related commit log says:
>>> > "The value defaults to around 9000 for a 128MB machine."),
>>> > the problem is observed on amd64, where KVA space after
>>> > r212784 is effectively bound only by the physical memory size.
>>> >
>>> > With INT_MAX being 0x7fffffff and sizeof(struct vm_map_entry)
>>> > being 120, slightly less than 256GB is enough to reproduce
>>> > the problem.
>>> >
>>> > I rewrote vmmapentry_rsrc_init() to set a limit for max_proc_mmap
>>> > large enough just to protect from integer overflow.
>>> > As it's also possible to tune this value at run time, I also added a
>>> > simple anti-shoot constraint to its sysctl handler.
>>> > I'm not sure, though, whether the second part is worth committing.
>>> >
>>> > As this patch may cause some bikeshedding,
>>> > I'd like to hear your comments before I commit it.
>>> >
>>> > http://plukky.net/~pluknet/patches/max_proc_mmap.diff
>>>
>>> Is there any reason we can't just make this variable and sysctl a long?
>>>
>>
>> Or just delete it.
>>
>> 1. 
Contrary to what the commit message says, this sysctl does not
>> effectively limit the number of vm map entries.  It only limits the number
>> that are created by one system call, mmap().  Other system calls create vm
>> map entries just as easily, for example, mprotect(), madvise(), mlock(), and
>> minherit().  Basically, anything that alters the properties of a mapping.
>> Thus, in 2000, after this sysctl was added, the same resource exhaustion
>> induced crash could have been reproduced by trivially changing the program
>> in PR/16573 to do an mprotect() or two.
>>
>> In a nutshell, if you want to really limit the number of vm map entries
>> that a process can allocate, the implementation is a bit more involved than
>> what was done for this sysctl.
>>
>> 2. UMA implements M_WAITOK, whereas the old zone allocator in 2000 did
>> not.  Moreover, vm map entries for user maps are allocated with M_WAITOK.
>> So, the exact crash reported in PR/16573 couldn't happen any longer.
>>
>
> Actually, I take back part of what I said here.  The old zone allocator did
> implement something like M_WAITOK, and that appears to have been used for
> user maps.  However, the crash described in PR/16573 was actually on the
> allocation of a vm map entry within the *kernel* address space for a process
> U area.  This type of allocation did not use the old zone allocator's
> equivalent of M_WAITOK.  However, we no longer have U areas, so the exact
> crash scenario is clearly no longer possible.  Interestingly, the sysctl in
> question has no direct effect on the allocation of kernel vm map entries.
>
> So, I remain skeptical that this sysctl is preventing any resource
> exhaustion based panics in the current kernel.  Again, I would be thrilled
> to see one or more people do some testing, such as rerunning the program
> from PR/16573.
>
>
>> 3. 
We now have the "vmemoryuse" resource limit.  When this sysctl was
>> defined, we didn't.  Limiting the virtual memory indirectly but effectively
>> limits the number of vm map entries that a process can allocate.
>>
>> In summary, I would do a little due diligence, for example, run the
>> program from PR/16573 with the limit disabled.  If you can't reproduce the
>> crash, in other words, nothing contradicts point #2 above, then I would just
>> delete this sysctl.
>>

I tried the test from PR/16573 running as root. Unmodified, it just quickly
hits the kern.maxproc limit. So, I added signal(SIGCHLD, SIG_IGN); to avoid
creating zombie processes at all and give it more workload. With this change
it also survived. The submitter reported that it crashed after 10000
iterations. After increasing the limit up to 1000000 I still couldn't get it
to crash.

* The testing was done with the max_proc_mmap part commented out.

The change effectively reverts r57263.

-- 
wbr,
pluknet

[Attachment: vm_mmap_maxprocmmap.diff, decoded from base64]

Index: /sys/vm/vm_mmap.c
===================================================================
--- /sys/vm/vm_mmap.c	(revision 218026)
+++ /sys/vm/vm_mmap.c	(working copy)
@@ -48,7 +48,6 @@
 
 #include <sys/param.h>
 #include <sys/systm.h>
-#include <sys/kernel.h>
 #include <sys/lock.h>
 #include <sys/mutex.h>
 #include <sys/sysproto.h>
@@ -66,7 +65,6 @@
 #include <sys/stat.h>
 #include <sys/sysent.h>
 #include <sys/vmmeter.h>
-#include <sys/sysctl.h>
 
 #include <security/mac/mac_framework.h>
 
@@ -80,7 +78,6 @@
 #include <vm/vm_pageout.h>
 #include <vm/vm_extern.h>
 #include <vm/vm_page.h>
-#include <vm/vm_kern.h>
 
 #ifdef HWPMC_HOOKS
 #include <sys/pmckern.h>
@@ -92,30 +89,6 @@
 };
 #endif
 
-static int max_proc_mmap;
-SYSCTL_INT(_vm, OID_AUTO, max_proc_mmap, CTLFLAG_RW, &max_proc_mmap, 0,
-    "Maximum number of memory-mapped files per process");
-
-/*
- * Set the maximum number of vm_map_entry structures per process.  Roughly
- * speaking vm_map_entry structures are tiny, so allowing them to eat 1/100
- * of our KVM malloc space still results in generous limits.  We want a
- * default that is good enough to prevent the kernel running out of resources
- * if attacked from compromised user account but generous enough such that
- * multi-threaded processes are not unduly inconvenienced.
- */
-static void vmmapentry_rsrc_init(void *);
-SYSINIT(vmmersrc, SI_SUB_KVM_RSRC, SI_ORDER_FIRST, vmmapentry_rsrc_init,
-    NULL);
-
-static void
-vmmapentry_rsrc_init(dummy)
-        void *dummy;
-{
-    max_proc_mmap = vm_kmem_size / sizeof(struct vm_map_entry);
-    max_proc_mmap /= 100;
-}
-
 static int vm_mmap_vnode(struct thread *, vm_size_t, vm_prot_t, vm_prot_t *,
     int *, struct vnode *, vm_ooffset_t *, vm_object_t *);
 static int vm_mmap_cdev(struct thread *, vm_size_t, vm_prot_t, vm_prot_t *,
@@ -377,18 +350,6 @@
 		handle_type = OBJT_VNODE;
 	}
 map:
-
-	/*
-	 * Do not allow more then a certain number of vm_map_entry structures
-	 * per process.  Scale with the number of rforks sharing the map
-	 * to make the limit reasonable for threads.
-	 */
-	if (max_proc_mmap &&
-	    vms->vm_map.nentries >= max_proc_mmap * vms->vm_refcnt) {
-		error = ENOMEM;
-		goto done;
-	}
-
 	td->td_fpop = fp;
 	error = vm_mmap(&vms->vm_map, &addr, size, prot, maxprot,
 	    flags, handle_type, handle, pos);