From owner-freebsd-toolchain@FreeBSD.ORG Sun Sep 15 09:33:18 2013
Message-ID: <52357D49.5020907@bsdforen.de>
Date: Sun, 15 Sep 2013 11:26:33 +0200
From: Dominic Fandrey
To: freebsd-toolchain@freebsd.org
Subject: Profiling with clang

Recently I've been using profiling with c++ -pg a lot. I'm developing
a simulation and have been able to more than double the performance,
just by focusing my attention on the top functions listed in the
profile. Inlining them, optimising them or finding ways to call them
less often.

Even though I use clang as my compiler, for profiling I have to refer
to the old gcc42. Is there any work on making profiling work with
clang?

On a side note, clang and gcc47 from ports produce equally fast
binaries (there is literally no difference outside of the error margin).

For both clang and gcc47 -O3 binaries are not faster than -O2 binaries.
They used to be slower, when the code was less polished.

--
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

From owner-freebsd-toolchain@FreeBSD.ORG Sun Sep 15 09:54:17 2013
Date: Sun, 15 Sep 2013 11:44:34 +0200
From: Roman Divacky
To: Dominic Fandrey
Cc: freebsd-toolchain@freebsd.org
Subject: Re: Profiling with clang
Message-ID: <20130915094434.GA15535@freebsd.org>
References: <52357D49.5020907@bsdforen.de>
In-Reply-To: <52357D49.5020907@bsdforen.de>

clang -pg should work just fine... what problems are you seeing?

On Sun, Sep 15, 2013 at 11:26:33AM +0200, Dominic Fandrey wrote:
> Recently I've been using profiling with c++ -pg a lot.
> I'm developing a simulation and have been able to more than double
> the performance, just by focusing my attention on the top functions
> listed in the profile. Inlining them, optimising them or finding ways
> to call them less often.
>
> Even though I use clang as my compiler, for profiling I have to refer
> to the old gcc42. Is there any work on making profiling work with
> clang?
>
> On a side note, clang and gcc47 from ports produce equally fast
> binaries (there is literally no difference outside of the error margin).
>
> For both clang and gcc47 -O3 binaries are not faster than -O2 binaries.
> They used to be slower, when the code was less polished.

From owner-freebsd-toolchain@FreeBSD.ORG Sun Sep 15 10:27:13 2013
Message-ID: <52358B7B.3060703@bsdforen.de>
Date: Sun, 15 Sep 2013 12:27:07 +0200
From: Dominic Fandrey
To: Roman Divacky
Cc: freebsd-toolchain@freebsd.org
Subject: Re: Profiling with clang
References: <52357D49.5020907@bsdforen.de> <20130915094434.GA15535@freebsd.org>
In-Reply-To: <20130915094434.GA15535@freebsd.org>

On 15/09/2013 11:44, Roman Divacky wrote:
> clang -pg should work just fine... what problems are you seeing?

Oh, you're right! I could have sworn there used to be linker problems!

Sorry for the noise.
From owner-freebsd-toolchain@FreeBSD.ORG Wed Sep 18 21:13:46 2013
Subject: Re: i386 clang optimisation problem with stack alignment
From: Dimitry Andric
Date: Wed, 18 Sep 2013 23:13:26 +0200
To: Tijl Coosemans
Cc: toolchain@FreeBSD.org
Message-Id: <9893CCE3-C7EF-4B52-B32E-8F1A0CE022C8@FreeBSD.org>
In-Reply-To: <20130910183456.175162f7@kalimero.tijl.coosemans.org>
References: <20130910181601.2e89af87@kalimero.tijl.coosemans.org> <20130910183456.175162f7@kalimero.tijl.coosemans.org>

On Sep 10, 2013, at 18:34, Tijl Coosemans wrote:
> On Tue, 10 Sep 2013 18:16:01 +0200 Tijl Coosemans wrote:
>> I've attached a small test program extracted from multimedia/gstreamer-ffmpeg
>> (libavcodec/h264_cabac.c:ff_h264_init_cabac_states(H264Context *h)).
>>
>> When you compile and run it like this on FreeBSD/i386, it results in a
>> SIGBUS:
>>
>> % cc -o paddd paddd.c -O3 -msse2 -fPIE -fomit-frame-pointer
>> % ./paddd
>> Bus error
>>
>> The reason is this instruction where %esp isn't 16-byte aligned:
>> paddd (%esp), %xmm7

Hmm, as far as I can see, the problem is related to position independent
code, in combination with omitting the frame pointer:

$ cc -o paddd paddd.c -O3 -msse2 -fomit-frame-pointer
$ ./paddd
$

$ cc -o paddd paddd.c -O3 -msse2 -fPIE -fomit-frame-pointer
$ ./paddd
Bus error (core dumped)
$

$ cc -o paddd paddd.c -O3 -msse2 -fPIE -fno-omit-frame-pointer
$ ./paddd
$

>> Is this an upstream bug or is this because of local changes (to make the
>> stack 4 byte aligned by default or something)?

The 4 byte alignment on i386 changes are from upstream, but we initiated
them after a bit of discussion (see
http://llvm.org/viewvc/llvm-project?view=revision&revision=167632 ).

Note the problem only occurs at -O3, which enables the vectorizer, so
there might be an issue with it in combination with position independent
code generation and omitting frame pointers.

If you check what clang passes to its cc1 stage with your original
command line, it gives:

"/usr/bin/cc" -cc1 -triple i386-unknown-freebsd10.0 -emit-obj -disable-free -main-file-name paddd.c -mrelocation-model pic -pic-level 2 -pie-level 2 -masm-verbose -mconstructor-aliases -target-cpu i486 -target-feature +sse2 -v -resource-dir /usr/bin/../lib/clang/3.3 -O3 -fdebug-compilation-dir /home/dim/bugs/paddd -ferror-limit 19 -fmessage-length 130 -mstackrealign -fobjc-runtime=gnustep -fobjc-default-synthesize-properties -fdiagnostics-show-option -fcolor-diagnostics -backend-option -vectorize-loops -o /tmp/paddd-zdRbKM.o -x c paddd.c

So it does pass -mstackrealign, but for some reason it isn't always
effective. For the -fPIE -fomit-frame-pointer case, the prolog for
init_states() becomes:

init_states:                            # @init_states
# BB#0:                                 # %vector.ph
        pushl   %ebp
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        subl    $28, %esp
        calll   .L0$pb
.L0$pb:
        popl    %edx

If you remove -fPIE, the data is directly accessed via its (properly 16
byte aligned) symbol, so there is no alignment problem:

        paddd   .LCPI0_0, %xmm7

but the stack is not realigned in the prolog either:

init_states:                            # @init_states
# BB#0:                                 # %vector.ph
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        movd    16(%esp), %xmm0
        ...

Then, if you use -fPIE, but add -fno-omit-frame-pointer:

init_states:                            # @init_states
# BB#0:                                 # %vector.ph
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        andl    $-16, %esp
        subl    $48, %esp
        calll   .L0$pb
.L0$pb:
        popl    %edx
.Ltmp0:

I.e., here the stack is properly realigned, and the function works fine.

In any case: yes, I think this is a bug, and we should report it
upstream. This is a very nice test case to do so.

-Dimitry

From owner-freebsd-toolchain@FreeBSD.ORG Thu Sep 19 17:56:29 2013
Date: Thu, 19 Sep 2013 19:56:15 +0200
From: Tijl Coosemans
To: Dimitry Andric
Cc: toolchain@FreeBSD.org
Subject: Re: i386 clang optimisation problem with stack alignment
Message-ID: <20130919195615.5040b4cb@kalimero.tijl.coosemans.org>
In-Reply-To: <9893CCE3-C7EF-4B52-B32E-8F1A0CE022C8@FreeBSD.org>
References: <20130910181601.2e89af87@kalimero.tijl.coosemans.org> <20130910183456.175162f7@kalimero.tijl.coosemans.org> <9893CCE3-C7EF-4B52-B32E-8F1A0CE022C8@FreeBSD.org>

On Wed, 18 Sep 2013 23:13:26 +0200 Dimitry Andric wrote:
> On Sep 10, 2013, at 18:34, Tijl Coosemans wrote:
>> On Tue, 10 Sep 2013 18:16:01 +0200 Tijl Coosemans wrote:
>>> I've attached a small test program extracted from multimedia/gstreamer-ffmpeg
>>> (libavcodec/h264_cabac.c:ff_h264_init_cabac_states(H264Context *h)).
>>>
>>> When you compile and run it like this on FreeBSD/i386, it results in a
>>> SIGBUS:
>>>
>>> % cc -o paddd paddd.c -O3 -msse2 -fPIE -fomit-frame-pointer
>>> % ./paddd
>>> Bus error
>>>
>>> The reason is this instruction where %esp isn't 16-byte aligned:
>>> paddd (%esp), %xmm7
>
> Hmm, as far as I can see, the problem is related to position independent
> code, in combination with omitting the frame pointer:
>
> $ cc -o paddd paddd.c -O3 -msse2 -fomit-frame-pointer
> $ ./paddd
> $
>
> $ cc -o paddd paddd.c -O3 -msse2 -fPIE -fomit-frame-pointer
> $ ./paddd
> Bus error (core dumped)
> $
>
> $ cc -o paddd paddd.c -O3 -msse2 -fPIE -fno-omit-frame-pointer
> $ ./paddd
> $

Omitting -fPIE frees up a register and that changes the generated code
too much to trigger the bug so I'm not sure it has anything to do with
it. -fomit-frame-pointer may be part of the problem though.
Without a frame pointer that holds the old value of %esp, the stack
cannot be realigned, because the old value could not be restored
afterwards. It seems clang/LLVM knows this at least partly, because with
-fomit-frame-pointer it doesn't realign the stack and uses movdqu to
store a value at (%esp) (instead of movdqa in the
-fno-omit-frame-pointer case).

Either clang/LLVM shouldn't use instructions like paddd in this case or
it should override -fomit-frame-pointer and use a frame pointer whenever
the stack needs realigning.

I added a comment to http://llvm.org/bugs/show_bug.cgi?id=12250 which
seems like the same bug (but on Solaris).