Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 17 Apr 2018 04:16:35 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Brooks Davis <brooks@freebsd.org>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: Do fuswintr/suswintr make sense?
Message-ID:  <20180417031820.A6479@besplex.bde.org>
In-Reply-To: <20180416161012.GB44509@spindle.one-eyed-alien.net>
References:  <20180416161012.GB44509@spindle.one-eyed-alien.net>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 16 Apr 2018, Brooks Davis wrote:

> The fuswintr() and suswintr() are intended to be safe in interrupt
> context.  They are used in the profiling code and if they fail the code
> falls back to triggering a trap with appropriate fields in struct
> thread.  This is fine as such, but amd64, arm, i386, and powerpc have
> implementations that always fail.  arm64, mips, riscv, and sparc64 all
> add code to the trap handler to detect that this particular code has
> faulted and return to the handler before doing and processing that might
> result in a sleep.  This optimization came from 4.4BSD.

Not having it for i386 also came from 4.4BSD.  NetBSD fixed it for i386
in 1994 or earlier (locore.s 1.41)

> Does this optimization actually make sense in 2017, particularly
> given that we're not taking advantage of it on x86 (and worse, our
> implementations of return (-1) aren't inlined so they have cache
> impacts)?

Profiling is even more in need of optimizations in 2017.  But not this
one, at least on i386.  [fs]uswintr() might be worth using if they
looked like [fs]uword16() and the latter and were efficient.
   (I don't see any reason why they can't be essentially the same.  If
   the user memory is not mapped then they will cause the same page
   fault, and they just have to fail instead of trying to handle the
   page fault.)
But [fs]uword16() are now extremely inefficient on i386.  The user and
kernel memory are in a different address spaces.  The functions are
implemented using a trampoline that has to map user memory, and this
is even slower than having a trampoline.  But not as slow as switching
the whole memory map on every crossing of the user-kernel boundary
including for profiling interrupts.

Since accessing 1 word at a time is too slow on i386 no matter how it is
done. the correct optimization looks more like a fancier ast() than
fsuswintr(): use a kernel buffer with a few hundred addresses instead of
only 1, and tell the application about the addresses in a single operation.
Do a full switch back to user context and run a trampoline to add 1 to
many addresses there, since the addresses are expected to be on many
different pages.  (The buffer now is { td_profil_addr; td_profil_ticks; }.
It can hold several ticks but only at 1 address.  Multiple ticks occur
mainly when the system can't keep up; then the address is clobbered but
the ticks count is charged to a new address.)

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180417031820.A6479>