Date:      Mon, 24 Jan 2011 17:46:51 +0100
From:      Roman Divacky <rdivacky@freebsd.org>
To:        Hans Ottevanger <hansot@iae.nl>
Cc:        freebsd-toolchain@freebsd.org
Subject:   Re: How to build an executable with profiling?
Message-ID:  <20110124164651.GA8672@freebsd.org>
In-Reply-To: <4D3C461C.6000701@iae.nl>
References:  <20110117184411.GA54556@troutmask.apl.washington.edu> <20110118143205.GA34216@freebsd.org> <20110118160252.GA6506@troutmask.apl.washington.edu> <20110120185449.GA92860@freebsd.org> <4D39B75D.6010407@iae.nl> <20110121192751.GA94113@freebsd.org> <4D3C461C.6000701@iae.nl>

On Sun, Jan 23, 2011 at 04:15:40PM +0100, Hans Ottevanger wrote:
> On 01/21/11 20:27, Roman Divacky wrote:
> >>>This patch does three things:
> >>>
> >>>1) emits "call .mcount" at the beginning of every function body
> >>>
> >>
> >>The differences on i386 between profiled and non-profiled code are not
> >>as obvious as with gcc (using diff on assembly output), but on first
> >>inspection it looks correct.
> >
> >cool :)
> >
> >>>2) changes the driver to link in gcrt1.o instead of crt1.o
> >>>
> >>>3) changes all -lfoo to -lfoo_p except when the foo ends with _s in
> >>>    the linker invocation
> >>>
> >>
> >>Maybe it is wise to follow the gcc implementation here.
> >
> >ok, makes sense
> >
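
To illustrate points (2) and (3) as they are described above, here is a minimal sketch of the link-line rewriting. It is not the code from the patch: the function name `profile_link_args` and the list-of-strings argument format are assumptions made for illustration, and it follows the rule as originally stated (every -lfoo becomes -lfoo_p unless foo ends in _s) rather than the later gcc-matching rewrite.

```python
import os


def profile_link_args(args):
    """Rewrite a linker argument list for profiling (hypothetical helper).

    - crt1.o is replaced by gcrt1.o (point 2)
    - -lfoo becomes -lfoo_p unless foo ends with _s (point 3)
    All other arguments are passed through unchanged.
    """
    out = []
    for arg in args:
        if os.path.basename(arg) == "crt1.o":
            # Keep the directory, swap only the startup object's file name.
            out.append(os.path.join(os.path.dirname(arg), "gcrt1.o"))
        elif arg.startswith("-l") and len(arg) > 2 and not arg.endswith("_s"):
            out.append(arg + "_p")
        else:
            out.append(arg)
    return out


if __name__ == "__main__":
    print(profile_link_args(["/usr/lib/crt1.o", "test.o", "-lm", "-lgcc_s", "-lc"]))
```

With this rule, -lm and -lc pick up the _p suffix while -lgcc_s is left alone, which is the exception mentioned in (3).
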
> >>>I am not sure that I did the right thing, especially in (3). Anyway,
> >>>the patch works for me (ie. produces a.out.gmon that seems to contain
> >>>meaningful data).
> >>>
> >>>I would appreciate it if you guys could test and review this, and let
> >>>me know if it is correct.
> >>>
> >>
> >>On both my systems (i386 and amd64) something goes severely wrong when
> >>linking several objects (all compiled with -pg, this is amd64):
> >>
> >>Perhaps the invocation of the linker still needs some work (or I must
> >>redo my installation) but anyhow it looks like a good job. Thanks!
> >
> >I rewrote the library-rewriting part to match gcc as closely as possible.
> >I also think that I solved your ld problem.
> >
> >
> >please revert the old patch and test the new one:
> >
> >         http://lev.vlakno.cz/~rdivacky/clang-gprof.patch
> >
> >I believe this one is ok (works for me just fine), please test and report
> >back so I can start integrating this upstream.
> >
> 
> I performed a few quick tests on both i386 and amd64.
> 
> The problems I had with the invocation of ld appear to be solved. The 
> behavior with respect to libraries is now identical to gcc as far as I can see.
> 
> The results from gprof also look very promising. For my test program on 
> amd64 the gprof output when using clang is
> 
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  42.5       4.22     4.22        0  100.00%           _mcount [5]
>  22.0       6.41     2.18 14700000     0.00     0.00  f_timint [6]
>  12.4       7.64     1.23 21900000     0.00     0.00  exp [10]
>   8.4       8.48     0.84 22000000     0.00     0.00  vmol [9]
>   5.4       9.02     0.54  6300000     0.00     0.00  f_angle [11]
>   3.8       9.40     0.38        0  100.00%           .mcount (52)
>   1.9       9.59     0.19  1000000     0.00     0.01  qk21 [4]
>   1.9       9.78     0.19  1000000     0.00     0.00  pow [12]
>   0.4       9.82     0.04   200000     0.00     0.03  qags [3]
>   0.4       9.86     0.04   100000     0.00     0.00  zero [14]
>   0.3       9.89     0.03   100000     0.00     0.00  qext [16]
>   0.2       9.91     0.02   800000     0.00     0.00  f_apsis [15]
>   0.1       9.91     0.01  2500000     0.00     0.00  fmax [17]
>   0.1       9.92     0.01   100000     0.00     0.00  apsis [13]
>   0.0       9.92     0.00  1000000     0.00     0.00  fmin [18]
>   0.0       9.93     0.00   100000     0.00     0.03  timint [7]
>   0.0       9.93     0.00   700000     0.00     0.00  tol_apsis [19]
>   0.0       9.94     0.00   200000     0.00     0.00  sort [20]
>   0.0       9.94     0.00        1     1.85  5334.52  main [1]
>   0.0       9.94     0.00   100000     0.00     0.03  angle [8]
> ...
> 
> while using gcc yields
> 
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  44.3       4.23     4.23        0  100.00%           _mcount [5]
>  18.5       6.00     1.76 14700000     0.00     0.00  f_timint [6]
>  13.5       7.28     1.28 21900000     0.00     0.00  exp [10]
>   9.0       8.14     0.86 22000000     0.00     0.00  vmol [9]
>   5.5       8.66     0.52  6300000     0.00     0.00  f_angle [11]
>   4.0       9.04     0.38        0  100.00%           .mcount (52)
>   2.0       9.24     0.19  1000000     0.00     0.00  pow [12]
>   2.0       9.43     0.19  1000000     0.00     0.00  qk21 [4]
>   0.3       9.45     0.03   100000     0.00     0.00  zero [14]
>   0.3       9.48     0.03   200000     0.00     0.02  qags [3]
>   0.2       9.50     0.02   100000     0.00     0.00  qext [16]
>   0.2       9.52     0.02   800000     0.00     0.00  f_apsis [15]
>   0.1       9.53     0.00  2500000     0.00     0.00  fmax [17]
>   0.0       9.53     0.00   700000     0.00     0.00  tol_apsis [18]
>   0.0       9.53     0.00   200000     0.00     0.00  sort [19]
>   0.0       9.54     0.00   100000     0.00     0.00  apsis [13]
>   0.0       9.54     0.00        1     2.21  4927.66  main [1]
>   0.0       9.54     0.00  1000000     0.00     0.00  fmin [20]
>   0.0       9.54     0.00   100000     0.00     0.02  timint [7]
>   0.0       9.54     0.00   100000     0.00     0.02  angle [8]
> ...
> 
> To me this looks quite similar 8-)
 
awesome! :)

> I also tested the interaction of -pg with other options and there I 
> found an issue with -fomit-frame-pointer. Here gcc bails out, as it 
> probably should:
> 
> gcc -pg -O2 -Wall -fomit-frame-pointer -c test.c
> gcc: -pg and -fomit-frame-pointer are incompatible
> 
> while clang continues and silently generates an executable that 
> immediately terminates with a segmentation violation when started.
 
will fix today
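
The check gcc performs here can be sketched as follows. This is only an illustration of the behaviour described above, not the actual clang fix: the function name `check_profiling_flags` is hypothetical, and the sketch assumes the flags arrive as a plain list of strings and ignores any later -fno-omit-frame-pointer override.

```python
def check_profiling_flags(args):
    """Reject -pg combined with -fomit-frame-pointer (hypothetical helper).

    Returns the argument list unchanged when the combination is fine,
    and raises ValueError with the same wording as gcc's diagnostic otherwise.
    """
    if "-pg" in args and "-fomit-frame-pointer" in args:
        raise ValueError("-pg and -fomit-frame-pointer are incompatible")
    return args


if __name__ == "__main__":
    try:
        check_profiling_flags(["-pg", "-O2", "-Wall", "-fomit-frame-pointer", "-c", "test.c"])
    except ValueError as err:
        print("error:", err)
```

Failing early in the driver avoids producing an executable that crashes at startup.
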

> Another minor, unrelated issue I found is that this version of clang on 
> i386 generates SSE2 instructions by default, while gcc and clang in 
> -CURRENT generate the "classical" i387 instructions.

we default to i486 in -CURRENT while upstream defaults to pentium4, so
this is expected.

thank you for your great testing and help! I am gonna push it upstream
now so we'll get it with the next clang/llvm update in -current

roman


