From: Bruce Evans
Date: Sat, 24 Dec 2011 17:16:33 +1100 (EST)
To: Alexander Best
Cc: freebsd-current@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: [rfc] removing -mpreferred-stack-boundary=2 flag for i386?
Message-ID: <20111224160050.T1141@besplex.bde.org>
In-Reply-To: <20111223235642.GA37495@freebsd.org>

On Fri, 23 Dec 2011, Alexander Best wrote:

> is -mpreferred-stack-boundary=2 really necessary for i386 builds any longer?
> i built GENERIC (including modules) with and without that flag. the results
> are:

The same as it has always been.  It avoids some bloat.

> 1654496 bytes with the flag set
> vs.
> 1654952 bytes with the flag unset

I don't believe this.  GENERIC is enormously bloated, so it has size
more like 16MB than 1.6MB.
Even a savings of 4K instead of 456 bytes is hard to
believe.  I get a savings of 9K (text) in a 5MB kernel.  Changing the
default target arch from i386 to pentium-undocumented has reduced the
text space savings a little, since the default for passing args is now
to preallocate stack space for them and store to this, instead of to
push them; this preallocation results in more functions needing to
allocate some stack space explicitly, and when some is allocated
explicitly, the text space cost for this doesn't depend on the size of
the allocation.

Anyway, the savings are mostly from avoiding cache misses from sparse
allocation on stacks.

Also, FreeBSD-i386 hasn't been programmed to support aligned stacks:
- KSTACK_PAGES on i386 is 2, while on amd64 it is 4.  Using more stack
  might push something over the edge
- not much care is taken to align the initial stack or to keep the stack
  aligned in calls from asm code.  E.g., any alignment for mi_startup()
  (and thus proc0?) is accidental.  This may result in perfect alignment
  or perfect misalignment.  Hopefully, more care is taken with thread
  startup.  For gcc, the alignment is done bogusly in main() in userland,
  but there is no main() in the kernel.  The alignment doesn't matter
  much (provided the perfect misalignment is still to a multiple of 4),
  but when it matters, the random misalignment that results from not
  trying to do it at all is better than perfect misalignment from getting
  it wrong.  With 4-byte alignment, the only cases that it helps are with
  64-bit variables.

> the gcc(1) man page states the following:
>
> "
> This extra alignment does consume extra stack space, and generally
> increases code size. Code that is sensitive to stack space usage,
> such as embedded systems and operating system kernels, may want to
> reduce the preferred alignment to -mpreferred-stack-boundary=2.
> "
>
> the comment in sys/conf/kern.mk however sorta suggests that the default
> alignment of 4 bytes might improve performance.
The default stack alignment is 16 bytes, which unimproves performance.

clang handles stack alignment correctly (only does it when it is needed)
so it doesn't need a -mpreferred-stack-boundary option and doesn't always
break without alignment in main().  Well, at least it used to, IIRC.
Testing it now shows that it does the necessary andl of the stack pointer
for __aligned(32), but for __aligned(16) it now assumes that the stack is
aligned by the caller.  So it now needs -mpreferred-stack-boundary=2, but
doesn't have it.  OTOH, clang doesn't do the andl in main() like gcc does
(unless you put a dummy __aligned(32) there), but requires crt to pass
an aligned stack.

Bruce
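
For readers following the arithmetic in this thread: the argument to
-mpreferred-stack-boundary is an exponent, so =2 means 4-byte alignment
and gcc's default of 4 means 16 bytes.  Below is a minimal sketch of
what the "andl" realignment does to a stack pointer and how much stack a
small frame might occupy under each setting.  The helper names
(boundary_bytes, align_sp, align_waste, frame_bytes) are hypothetical,
and the frame model (4-byte return address plus 4-byte saved %ebp plus
locals, rounded up to the boundary) is a simplifying assumption, not
what any particular compiler version is guaranteed to emit.

```c
#include <assert.h>
#include <stdint.h>

/* -mpreferred-stack-boundary=N requests 2^N-byte stack alignment. */
uint32_t boundary_bytes(unsigned n)
{
	assert(n < 32);
	return (uint32_t)1 << n;
}

/*
 * Round a stack pointer down to the boundary, as
 * "andl $-align, %esp" would.  The i386 stack grows downward,
 * so rounding down only ever reserves more space.
 */
uint32_t align_sp(uint32_t sp, unsigned n)
{
	return sp & ~(boundary_bytes(n) - 1);
}

/* Bytes of stack lost as padding by realigning sp. */
uint32_t align_waste(uint32_t sp, unsigned n)
{
	return sp - align_sp(sp, n);
}

/*
 * Assumed frame model (illustration only): return address and
 * saved %ebp (4 bytes each) plus the locals, rounded up so that
 * the stack stays on the boundary across the call.
 */
uint32_t frame_bytes(uint32_t locals, unsigned n)
{
	uint32_t a = boundary_bytes(n);

	return (8 + locals + a - 1) & ~(a - 1);
}
```

Under these assumptions, a stack pointer of 0xbfbfe7dc is already fine
for N=2 (align_waste gives 0) but loses 12 bytes for N=4, and a function
with 12 bytes of locals takes 20 bytes per call for N=2 versus 32 for
N=4.  That is the kind of sparse stack allocation discussed above; how
much it matters in a real kernel depends on call depth and on what the
compiler actually emits.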