Date:      Sat, 24 May 2003 19:52:06 +0200
From:      Shill <devnull@example.com>
To:        freebsd-questions@freebsd.org
Subject:   ELF .data section variables and RWX bits
Message-ID:  <3ECFB146.4000700@example.com>

I wrote a small program stub to measure execution time on an Athlon 4:

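_start:
  rdtsc                 ; EDX:EAX = time-stamp counter
  mov ebp, eax          ; keep the low 32 bits of the start time
  xor eax, eax
  cpuid                 ; serialize before the timed code
  ; BEGIN TIMED CODE

  ; END TIMED CODE
  xor eax, eax
  cpuid                 ; serialize: timed code has completed
  rdtsc                 ; EAX = low 32 bits of the end time
  sub ebp, eax          ; ebp = start - end
  xor eax, eax
  cpuid
  neg ebp               ; ebp = end - start (mod 2^32)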

Note: CPUID is used only as a serializing instruction.

Let n be the number of cycles required to execute the code
between the two RDTSC instructions. At the end of the stub,
ebp is equal to n modulo 2^32.

The stub alone, with nothing between the BEGIN and END markers, consistently takes 104 cycles.

So far, so good.

I wanted to time the latency of a store, so I declared a single
variable within the .data section:

SECTION .data

X: dd 0x12345678

I timed three different programs:
P1) mov ebx, [X]		; load, i.e. read-only
P2) mov dword [X], 0xaabbccdd	; store, i.e. write-only
P3) add dword [X], byte 0x4C	; load/execute/store, i.e. read + write

P1 requires 170 cycles.
P2 requires 12,000 cycles on average (min 10,000, max 46,000).
P3 requires 22,500 cycles on average (min 14,500, max 72,000).

NASM gives the ELF .data section the following properties:
  progbits (i.e. explicit contents, unlike the .bss section)
  alloc (i.e. load into memory when the program is run)
  noexec (i.e. clear the allow execute bit)
  write (i.e. set the allow write bit)
  align=4 (i.e. start the section at an address that is a multiple of 4)

A cache miss might explain why P1 requires 170 cycles, but as far as I
know it does not explain P2 or P3.

My guess is that the first time X is written to, an exception occurs
(perhaps a TLB miss) and the operating system (FreeBSD in my case) is
required to update something, somewhere.

Could it be that FreeBSD does not set the write bit for the page where X
is stored until X is *actually* written to? But then why would P3 take
much longer than P2?

As you can see, I am very confused. I would be very grateful if an ELF
and/or ld guru could step in and share some insight.

Shill