From: Robert Watson
Date: Fri, 4 Jan 2008 00:26:31 +0000 (GMT)
To: Dag-Erling Smørgrav
Cc: freebsd-current@freebsd.org, Jason Evans, Poul-Henning Kamp
Subject: Re: sbrk(2) broken

On Thu, 3 Jan 2008, Dag-Erling Smørgrav wrote:

> Jason Evans writes:
>> [sbrk is broken]
>
> The real question is why we would revert perfectly good code (jemalloc) from
> using a modern interface to using one that has been obsolete for twenty
> years, and marked as such in the man page for seven years.
>
> If rwatson@ wants malloc() to respect resource limits, he can bloody well
> fix mmap().  Until he does, the datasize limit is a joke anyway, as anyone
> can circumvent it by either using mmap() instead of malloc() or setting
> _malloc_options before calling malloc().

The issue here was that there were a number of reports that out-of-control
applications were toasting systems that weren't getting toasted under 6.x.  I
experienced this on my web server, but the ports build cluster has been
running into it for months.  The symptom is that a single application exhausts
swap, causing all sorts of things to break (tm), killing off other large
processes, etc.  To be clear, in the new world order, instead of getting NULL
back from malloc(3), SIGKILL is delivered to large processes.

When I e-mailed Jason Evans and Alan Cox about it, I suggested that we
actually teach malloc(3) to enforce an allocation limit itself by querying a
limit once at process startup, and then using its own accounting to decide
when to start failing requests.  As an alternative model that would require
some more infrastructural changes, I suggested a new mmap() flag that hinted
to the kernel that the page should count against a swap/anonymous memory
limit, but that we should avoid more serious changes at the last minute before
a release.  Alan suggested the model Jason ended up implementing as a
lower-risk way to restore the 6.x resource limits non-disruptively.  As it
turned out, this proved much more complicated than expected.

The right answer is presumably to introduce a new LIMIT_SWAP, which limits the
allocation of anonymous memory by processes, and size it to something like 90%
of swap space by default.  Since that won't be happening before 7.0, I believe
the consensus is to simply not MFC the changes for 7 and proceed with the
release.  However, having a resource limit on swap use in order to prevent the
above scenario is actually quite important: SIGKILL of arbitrary processes is
not a good way to deal with one runaway process, and the virtual memory size
limit, while also useful, prevents you from limiting the allocation of swap
without also limiting memory mapping.  So it wouldn't help, for example, to
limit swap used by a web cache that memory-mapped cache files.

Robert N M Watson
Computer Laboratory
University of Cambridge
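
For illustration only, the malloc(3)-side accounting idea mentioned above (query a limit once, then fail requests based on the allocator's own bookkeeping) might look roughly like the sketch below.  This is not jemalloc code and not what was committed.  The names limited_malloc(), limited_free() and limited_set_limit() are hypothetical, the choice of RLIMIT_DATA as the limit to query is an assumption (the message does not say which limit), and the sketch is not thread-safe.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical wrapper, for illustration only; not part of any real malloc. */

/* Header placed in front of every block so that free can debit the account. */
typedef union {
    size_t      size;   /* total bytes charged, header included */
    max_align_t align;  /* keeps the user pointer suitably aligned */
} alloc_header;

static size_t alloc_limit;   /* byte budget, read once */
static size_t alloc_used;    /* bytes currently charged */
static int    limit_loaded;

/* Override the budget by hand (useful for testing the accounting). */
void limited_set_limit(size_t bytes)
{
    alloc_limit = bytes;
    limit_loaded = 1;
}

/* Query the limit once. Assumption: RLIMIT_DATA is the limit to honour. */
static void load_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_DATA, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY &&
        (uintmax_t)rl.rlim_cur < SIZE_MAX)
        alloc_limit = (size_t)rl.rlim_cur;
    else
        alloc_limit = SIZE_MAX;      /* no usable limit: never refuse */
    limit_loaded = 1;
}

void *limited_malloc(size_t size)
{
    alloc_header *h;
    size_t total;

    if (!limit_loaded)
        load_limit();

    /* Refuse on arithmetic overflow or when the budget would be exceeded. */
    if (size > SIZE_MAX - sizeof(*h)) {
        errno = ENOMEM;
        return NULL;
    }
    total = size + sizeof(*h);
    if (alloc_used > alloc_limit || total > alloc_limit - alloc_used) {
        errno = ENOMEM;
        return NULL;
    }

    h = malloc(total);
    if (h == NULL)
        return NULL;
    h->size = total;
    alloc_used += total;
    return h + 1;                    /* user memory starts after the header */
}

void limited_free(void *p)
{
    alloc_header *h;

    if (p == NULL)
        return;
    h = (alloc_header *)p - 1;
    alloc_used -= h->size;           /* credit the bytes back */
    free(h);
}
```

The point of the sketch is only that the caller sees NULL once the budget is used up, which is the 6.x-style behaviour described above.  It does nothing about memory obtained by calling mmap() directly, which is why a kernel-side limit on anonymous memory is still the better long-term answer.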