Date: Sat, 26 Jul 2003 11:35:26 +0100
From: Matthew Whelan
To: Chuck Swiger
cc: freebsd-stable@freebsd.org
Subject: Re: malloc does not return null when out of memory
In-Reply-To: <3F20692E.2060107@mac.com>
References: <20030724155926.7305F231C11@smithers.nildram.co.uk> <3F20692E.2060107@mac.com>
Message-Id: <20030725212248.81E9.MUTTLEY@gotadsl.co.uk>
List-Id: Production branch of FreeBSD source code

On Thu, 24 Jul 2003 19:18:06 -0400
Chuck Swiger wrote:

> Muttley wrote:
> > Yes, I thought briefly about something like this.
> >
> > Then I thought 'there's a race condition'.
>
> Where? The FreeBSD implementation is wrapped in a THREAD_LOCK()...?

Good point, well made

> > Then I realised that other processes might not link against this malloc.
>
> Perhaps.

Statically linked binaries, for example. Maybe Linux ones too?

> > Then I realised the race condition doesn't even matter; processes will
> > still be killed, as the kernel doesn't care that you're still in
> > malloc() when the overcommitted memory is touched, it just knows you've
> > touched it and there's no actual memory there. This will result in far
> > more processes being killed. I believe that's a bad thing.
>
> Someone stated that it was a problem that malloc() returned pointers to
> virtual address space that had been mapped but not allocated. This patch does
> not guarantee that malloc() will return, but, if malloc() does returns a
> pointer, using the memory being pointed to will refer to memory that is
> allocated.

Their main problem was that when memory ran out, processes got killed.
The fact the process gets killed earlier doesn't alter the fact that it
was killed.

> As Barny Wolff said:
> > Won't this merely die in malloc, not return 0?
>
> True. This isn't a perfect solution, but given the choice between:
>
> 1) malloc(LOTS) returning a pointer, and then sometime later the program dies
> with a bus error when using that memory because no more VM is available, or
>
> 2) malloc(LOTS) causing an immediate failure in malloc(),
>
> ...choice #2 appears to be significantly better.
>
> Figuring out what went wrong from a coredump or backtrace for #2 when the
> signal happens in malloc() should be obvious; determining why the program
> crashed in the middle of referencing memory in some large buffer is
> potentially misleading.

If you re-read the original post in this thread, you will see that this
appeared in the poster's syslog:

Jul 23 01:37:57 m0n0wall /kernel: pid 80 (racoon), uid 0, was killed: out of swap space

Finding out why the process died was never the big worry. Besides, see
below...

> Programs which take care to preallocate regions of memory they need before
> they start doing a transaction or some other operation that needs to be atomic
> would also prefer #2; the patch I proposed could have a beneficial impact on
> data integrity for such programs.
>

Except that the process which cops the bullet in the head is the largest
runnable non-system process. Check /usr/src/sys/vm/vm_pageout.c near the
end of vm_pageout_scan(). Neither data integrity nor debugging from
cores is helped by the patch.

> --
>
> People who encounter programs crashing in malloc() are likely going to
> continue to complain about malloc() not returning NULL when the system is out
> of memory.
>
> If malloc() is referencing memory before returning the pointer, means that the
> system is going to reserve VM resources with temporal locality towards memory
> _allocation_ rather than memory _reference_. Having the program crash at
> memory allocation time rather than usage helps identify when and where this
> problem actually happens more clearly, if only by a little bit.

See above, and above that.

> I'm not sure whether allocating memory sooner that way will make it more
> likely that brk()/sbrk() or mmap() will return ENOMEM to the libc malloc()
> implementation, but if it does not help, perhaps that means something and
> we've identified the location of problem more precisely.

Other posts suggest these calls won't ever return ENOMEM based on total
system usage, as the kernel doesn't even track it. Even if they went for
something in the style of Brent's description of vswap, surely this
patch would effectively prevent ENOMEM, because it forces the overcommit
to zero by killing something as soon as a process requests memory beyond
the total physical+swap.

I don't see how you can change the undesired behaviour in userland.

Cheers,

Matt

-- 
Matthew Whelan
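
For anyone reading this in the archive, the "touch the memory before
returning it" idea under discussion can be sketched roughly as below.
This is not the patch from the thread, which is not reproduced in this
message; malloc_touched() is a hypothetical name, and the 4096-byte
fallback page size is an assumption made only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical wrapper (not the patch discussed in this thread):
 * allocate with malloc() and then write one byte into every page, so
 * that any shortage of backing store shows up here rather than at
 * some later, unrelated access.
 */
static void *malloc_touched(size_t size)
{
    unsigned char *p = malloc(size);
    if (p == NULL)
        return NULL;            /* address-space or rlimit failure */

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096;            /* assumed fallback, illustration only */

    /* volatile keeps the compiler from discarding the stores */
    volatile unsigned char *vp = p;
    for (size_t off = 0; off < size; off += (size_t)page)
        vp[off] = 0;
    if (size > 0)
        vp[size - 1] = 0;       /* make sure the last page is touched */

    return p;
}
```

Note that the wrapper can only return NULL when malloc() itself does. If
the system runs short while the loop is writing, nothing in userland is
told about it, which is the objection raised above: the kernel picks a
process to kill, and it need not be the one calling malloc_touched().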