Date:      Mon, 20 May 2013 03:04:27 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Jilles Tjoelker <jilles@stack.nl>
Cc:        freebsd-bugs@FreeBSD.org
Subject:   Re: bin/178664: truss(1) may kill process
Message-ID:  <20130520020810.I1934@besplex.bde.org>
In-Reply-To: <201305191210.r4JCA1hm090229@freefall.freebsd.org>
References:  <201305191210.r4JCA1hm090229@freefall.freebsd.org>

On Sun, 19 May 2013, Jilles Tjoelker wrote:

> The following reply was made to PR bin/178664; it has been noted by GNATS.
>
> From: Jilles Tjoelker <jilles@stack.nl>
> To: bug-followup@FreeBSD.org, kwiat3k@panic.pl
> Cc:
> Subject: Re: bin/178664: truss(1) may kill process
> Date: Sun, 19 May 2013 14:09:32 +0200
>
> In PR bin/178664, you wrote:
> > [attaching truss(1) may terminate sleep(1) early]
>
> What actually happens is that the nanosleep(2) system call fails with
> [EINTR] immediately when the debugger (ptrace(2)) attaches. You can
> verify this using ktrace(1).

That is, by running ktrace on the sleep(1) process.  ptrace() could see the
EINTR in a register, as it does for gdb, but I think truss doesn't report
this detail, so ktrace on the truss process doesn't show it either.

> This is really a longstanding known bug, although I don't know where it
> is documented. It is longstanding because it is very hard to fix. The
> kernel wants threads to return to the kernel-userspace boundary when a
> debugger attaches, and this causes the state of the in-progress system
> call to be lost. The effect is much like a signal with SA_RESTART set.
>
> If you care about sleep(1) itself, that is easy to fix. It already
> continues the sleep when nanosleep(2) was interrupted by SIGINFO; this
> can be extended to any [EINTR] error.

sleep(1) is specified to sleep for at least as long as the specified
number of seconds.  It is broken since it "knows" that EINTR can't
happen.

sleep(3) is specified to sleep for at least as long as the specified
number of seconds unless a signal is delivered to the thread and its
action is to invoke a signal-catching function or terminate the process.
Since there is no real signal here, there is no possibility of catching
it, and the unreal signal doesn't terminate the process either (it just
causes nanosleep(2) to return early).  Thus sleep(3) is broken too.

Similarly for nanosleep(2).  Not similarly for clock_nanosleep(2), since
it is just missing in FreeBSD.

> A workaround is to use ktrace(1) instead of truss(1) or strace(1) from
> ports. ktrace(1) generally affects the traced program much less.

Old versions of truss don't have the bug.  This seems to be because they
don't use ptrace (they use procfs and ioctl).

However, all versions of gdb have the bug:
- old (FreeBSD-~5.2) versions of gdb and/or ptrace(2) have the bug in a
   worse form.  There, even ktrace on the sleep(1) process doesn't see the
   EINTR (nanosleep() returns 0 with no error).
- current versions of gdb and/or ptrace make nanosleep() return -1/EINTR,
   but to handle this problem using gdb you would prefer gdb to trap the
   signal before it causes the EINTR.  But unreal signals are especially
   hard to trap.

Bruce


