Date:      Mon, 23 Nov 2015 13:35:11 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Mark Johnston <markj@FreeBSD.org>
Cc:        freebsd-arch@FreeBSD.org
Subject:   Re: zero-cost SDT probes
Message-ID:  <20151123113511.GX58629@kib.kiev.ua>
In-Reply-To: <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com>
References:  <20151122024542.GA44664@wkstn-mjohnston.west.isilon.com>

On Sat, Nov 21, 2015 at 06:45:42PM -0800, Mark Johnston wrote:
> Hi,
> 
> For the past while I've been experimenting with various ways to
> implement "zero-cost" SDT DTrace probes. Basically, at the moment an SDT
> probe site expands to this:
> 
> if (func_ptr != NULL)
> 	func_ptr(<probe args>);
> 
> When the probe is enabled, func_ptr is set to dtrace_probe(); otherwise
> it's NULL. With zero-cost probes, the SDT_PROBE macros expand to
> 
> func(<probe args>);
> 
> When the kernel is running, each probe site has been overwritten with
> NOPs. When a probe is enabled, one of the NOPs is overwritten with a
> breakpoint, and the handler uses the PC to figure out which probe fired.
> This approach has the benefit of incurring less overhead when the probe
> is not enabled; it's more complicated to implement though, which is why
> this hasn't already been done.
> 
> I have a working implementation of this for amd64 and i386[1]. Before
> adding support for the other arches, I'd like to get some idea as to
> whether the approach described below is sound and acceptable.
> 
> The main difficulty is in figuring out where the probe sites actually
> are once the kernel is running. In my patch, a probe site is a call to
> an externally-defined function which is defined in an
> automatically-generated C file. At link time, we first perform a partial
> link of all the kernel's object files. Then, a script uses the relocations
> against the still-undefined probe functions to generate
> 1) stub functions for the probes, so that the kernel can actually be
>    linked, and
> 2) a linker set containing the offsets of each probe site relative to
>    the beginning of the text section.
> The result is linked with the partially-linked kernel to generate the
> final kernel file.
> 
> During boot, we iterate over the linker set, using the offsets plus the
> address of btext to overwrite probe sites with NOPs. SDT probes in kernel
> modules are handled differently (and more simply): the kernel linker just
> has special handling for relocations against symbols named __dtrace_sdt_*;
> this is how illumos/Solaris implements all of this.
> 
> My uncertainty revolves around the use of relocations in the
> partially-linked kernel to determine the address of probe sites in the
> running kernel. With the GNU ld in base, this happens to work because
> the final link doesn't modify the text section. Is this something I can
> rely upon? Will this assumption be false with the advent of lld and LTO?
> Are there other, cleaner ways to implement what I described above?

Instead, you could consider using a cheap instruction which is
conditionally converted into a trap. E.g., you could allocate a global
page in KVA and, for normal operation, keep it mapped and backed by a
scratch page. The probe would then be a volatile read from that page.

When probes are activated, the page is unmapped, which converts the read
into a page fault. This is similar to the write barriers implemented
in some garbage collectors.
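
A rough userland analogue of the idea, as a sketch only: it uses
mmap()/mprotect() and a SIGSEGV handler in place of a KVA page and the
kernel trap handler. All names (probe_page, probe_init, probe_enable,
probe_site, probe_fault) are made up for illustration, and the handler
simply restores the mapping so the faulting read can be restarted,
which makes the probe one-shot. A real trap handler would have to
resume past the read instead.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANON under strict -std= on glibc */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names; this only mimics the scheme in user space. */
static volatile char *probe_page;	/* the page every probe site reads */
static size_t probe_page_size;
static volatile sig_atomic_t probe_hits;

/* Fault handler: recognize a probe by the faulting address. */
static void
probe_fault(int sig, siginfo_t *si, void *ctx)
{
	uintptr_t addr = (uintptr_t)si->si_addr;
	uintptr_t base = (uintptr_t)probe_page;

	(void)sig;
	(void)ctx;
	if (addr < base || addr >= base + probe_page_size)
		_exit(1);		/* a genuine fault, not a probe */
	probe_hits++;
	/*
	 * Assumption for the demo: re-map the page so that the faulting
	 * read restarts successfully. This disables the probe again.
	 */
	mprotect((void *)probe_page, probe_page_size, PROT_READ);
}

/* Map the page readable (probes disabled) and install the handler. */
static int
probe_init(void)
{
	struct sigaction sa;
	void *p;

	probe_page_size = (size_t)sysconf(_SC_PAGESIZE);
	p = mmap(NULL, probe_page_size, PROT_READ, MAP_PRIVATE | MAP_ANON,
	    -1, 0);
	if (p == MAP_FAILED)
		return (-1);
	probe_page = p;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = probe_fault;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
	    sigaction(SIGBUS, &sa, NULL) != 0)
		return (-1);
	return (0);
}

/* Enable probes: revoke access so the next read faults. */
static int
probe_enable(void)
{
	return (mprotect((void *)probe_page, probe_page_size, PROT_NONE));
}

/* A probe site: a single volatile read, cheap while the page is mapped. */
static inline void
probe_site(void)
{
	char c = *probe_page;

	(void)c;
}
```

With this sketch, probe_site() is a plain load until probe_enable()
is called; the first probe_site() after that takes the fault and bumps
probe_hits.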

There are two issues with this scheme:
- The cost of a probe firing is relatively large, even if the low-level
trap handler is further modified to recognize probes by the special
address accessed.
- The arguments passed to the probes have to be put into some predefined
place, e.g. somewhere in *curthread, since the trap handler cannot fetch
them using the ABI conventions.
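
To illustrate the second point, a minimal sketch of staging the
arguments in a per-thread area before the trapping read. A C11
_Thread_local variable stands in for a field in *curthread here, and
struct probe_args, td_probe, probe_stage() and probe_fetch_arg() are
hypothetical names, not existing kernel interfaces.

```c
#include <stdatomic.h>
#include <stdint.h>

#define	PROBE_MAXARGS	5	/* arbitrary limit for the sketch */

/* Hypothetical per-thread staging area for probe arguments. */
struct probe_args {
	uint32_t	id;			/* which probe site */
	uintptr_t	arg[PROBE_MAXARGS];
};

/* Stand-in for a member of *curthread. */
static _Thread_local struct probe_args td_probe;

/* Probe site side: store the arguments before the trapping read. */
static inline void
probe_stage(uint32_t id, uintptr_t a0, uintptr_t a1)
{
	td_probe.id = id;
	td_probe.arg[0] = a0;
	td_probe.arg[1] = a1;
	/* Keep the compiler from moving the stores past the read. */
	atomic_signal_fence(memory_order_seq_cst);
}

/* Handler side: fetch argument i from the staging area. */
static inline uintptr_t
probe_fetch_arg(int i)
{
	return (td_probe.arg[i]);
}
```

Note that in this sketch the stores in probe_stage() are executed
whether or not the probe is enabled, so they add to the cost of the
normal path.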

As I mentioned above, this scheme is used by several language runtime
implementations, but there GC pauses are rare, and a slightly larger
cost of stopping the mutator is justified even by a negligible cost
reduction for the normal flow. I am not sure that this approach is worth
the complications and overhead for probes.


