From: Peter Wemm
To: Alfred Perlstein
Cc: "arch@freebsd.org", Adrian Chadd, Rui Paulo, Alfred Perlstein
Date: Wed, 26 Dec 2012 21:32:45 -0800
Subject: Re: UPDATE Re: making use of userland dtrace on FreeBSD

On Wed, Dec 26, 2012 at 8:41 PM, Alfred Perlstein wrote:
> On 12/26/12 8:21 PM, Peter Wemm wrote:
>>
>> On Wed, Dec 26, 2012 at 8:00 PM, Alfred Perlstein wrote:
>>
>>> What would be the drawbacks? I don't want to hurt freebsd for heavy
>>> performance, but I think this functionality should work out of the box
>>> for most people.
>>
>> The drawbacks are mostly performance related. It defeats certain
>> hardware optimizations for call/return on leaf functions. It'll
>> mostly affect things like math, crypto, compression and multimedia
>> libraries (that's ffmpeg, bzip2/gzip/libarchive, openssl, etc.), but we
>> generally don't seem to care about that sort of performance anyway, so
>> what's one more loss?
>
> Can you clarify some? If it was somewhat easy to re-add
> -fomit-frame-pointer to critical libraries like this, then that would be OK?

No, you can't add MD (machine-dependent) flags like this. The way to do it is to follow the pattern of knobs like PIC and WARNS, where defaults can be overridden on a per-directory basis while still respecting system-wide user overrides.

Remember, -fno-omit-frame-pointer is the default on i386 (except at high -O levels with gcc; I don't know where clang, the default compiler, draws the line). Other platforms don't even have frame pointers. You can't just scatter that switch around the place.

> To be honest, I'm not sure if you're serious about "generally don't seem to
> care" or just feel defeated on the issue and we should care.
We took quite a performance beating on amd64 because we didn't use the tuned-by-perl assembler code in openssl, for example. That flows through to benchmarks such as Apache throughput with mod_ssl, or stunnel(1) throughput.

My drive-by comment about not seeming to care any more refers to the fact that people (except for Bruce) generally don't actually measure the performance impact of their changes any more. The last time measuring was widespread was when Kris Kennaway was constantly abusing machines and reporting the effects as measured by ministat(1).

If somebody were to say "this change makes world take 15% longer to compile but has no meaningful effect on things like bzip2, openssl throughput, etc." and posted the actual ministat output to back it up, there wouldn't be any question about performance at all. It'd only be "is 15% more build time worth ubiquitous dtrace?" And that's a far easier question to answer.

A hand-wave leads to bikesheds. Actual numbers are bikeshed repellent.

I myself have killed patches that turned out to be premature optimizations because they didn't actually make things faster. For example, I never committed the lazy TLB shootdown code for amd64 because it made things slower on the hardware of the day: Opteron silicon had *hardware* address space tags on its TLB, so the lazy shootdown code only added synchronization work, which was pure overhead. With the patches, buildworld was around 2% slower.

Another example was the mtxpool code that caused cache line thrashing. If we had cared about performance, it would never have gone in. Sure, it compiled and worked, but the costs weren't quantified until much later, when we realized how much trouble it caused beyond a certain usage level.

What's 2%? It multiplies out: 2% here, 1% there, 3% over there, 0.5% somewhere else, and before you know it there's a pretty big overall hit.

>> Of course it wouldn't be required with dwarf unwinding awareness, but
>> we don't have that.
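To make "dwarf unwinding awareness" concrete, here is a minimal userland sketch. It assumes a GCC- or Clang-compatible toolchain whose <unwind.h> exposes the Itanium C++ ABI unwind interface (_Unwind_Backtrace, _Unwind_GetIP), as libgcc_s and LLVM libunwind do, and a target that emits .eh_frame unwind tables by default, as amd64 toolchains do. The unwinder computes each caller's frame from the DWARF call frame information rather than from saved frame pointers, so the walk should still work in code built with -fomit-frame-pointer. The names trace_state, capture_stack, leaf, middle, trace_from_nested_calls and print_trace are hypothetical, and this is not a kernel unwinder for kdb/ddb/stack(9).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <unwind.h> /* GCC/Clang header for the Itanium-ABI unwinder (libgcc_s or LLVM libunwind) */

#define MAX_FRAMES 64

/* Hypothetical container for this sketch; not a FreeBSD or libunwind type. */
struct trace_state {
	uintptr_t pcs[MAX_FRAMES];
	int count;
};

/*
 * Called once per frame. The unwinder finds each caller by interpreting the
 * DWARF call frame information in .eh_frame, not by following a chain of
 * saved frame pointers, so this also works for -fomit-frame-pointer code.
 */
static _Unwind_Reason_Code
record_frame(struct _Unwind_Context *ctx, void *arg)
{
	struct trace_state *st = arg;
	uintptr_t pc = (uintptr_t)_Unwind_GetIP(ctx);

	if (pc == 0 || st->count >= MAX_FRAMES)
		return (_URC_END_OF_STACK);	/* stop the walk */
	st->pcs[st->count++] = pc;
	return (_URC_NO_REASON);		/* keep unwinding */
}

static __attribute__((noinline)) int
capture_stack(struct trace_state *st)
{
	st->count = 0;
	(void)_Unwind_Backtrace(record_frame, st);
	return (st->count);
}

/* A store after each call keeps the compiler from turning it into a tail call. */
static volatile int sink;

static __attribute__((noinline)) int
leaf(struct trace_state *st)
{
	int n = capture_stack(st);

	sink = n;
	return (n);
}

static __attribute__((noinline)) int
middle(struct trace_state *st)
{
	int n = leaf(st);

	sink = n;
	return (n);
}

/* Demo entry point: walks the stack from three calls deep. */
__attribute__((noinline)) int
trace_from_nested_calls(struct trace_state *st)
{
	int n = middle(st);

	sink = n;
	return (n);
}

/* Prints the recorded return addresses, innermost frame first. */
void
print_trace(const struct trace_state *st)
{
	for (int i = 0; i < st->count; i++)
		printf("#%d pc=0x%" PRIxPTR "\n", i, st->pcs[i]);
}
```

Calling trace_from_nested_calls() and then print_trace() from main() should list at least four program counters (inside capture_stack, leaf, middle and trace_from_nested_calls), even in a -O2 -fomit-frame-pointer build; addr2line(1) can map them back to source lines.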
>>
>> We have -fno-omit-frame-pointer on the amd64 kernel whenever debugging
>> is compiled in because there's no unwinder for doing stack traces. We
>> need a dwarf2+ unwinder and somebody to instrument the call frame
>> state through the remaining assembler code.
>>
> How much work is that exactly? I've only been a gdb user, not a hacker.

gdb has a stack unwinder. kdb/ddb/stack(9) do not. There's well-established GPL code to do it, as well as libunwind and its variants. Basically, what this code has to do is run the dwarf2+ call frame state machine to find all the call/return frames, instead of assuming the compiler maintained a frame pointer chain. Heck, even glibc has a dwarf2 unwinder built in as part of its exception processing system.

I'm not entirely sure how much more work src/lib/libelf and src/lib/libdwarf need. It looks like they have just enough implemented to support ctfconvert and friends, and there's no unwinder in them.

--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete themselves upon execution." -- Robert Sewell
bitcoin:188ZjyYLFJiEheQZw4UtU27e2FMLmuRBUE