From owner-freebsd-threads@FreeBSD.ORG  Fri Aug 13 12:51:15 2010
Return-Path: <owner-freebsd-threads@FreeBSD.ORG>
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 614E6106564A
	for <freebsd-threads@freebsd.org>; Fri, 13 Aug 2010 12:51:15 +0000 (UTC)
	(envelope-from jilles@stack.nl)
Received: from mx1.stack.nl (relay04.stack.nl [IPv6:2001:610:1108:5010::107])
	by mx1.freebsd.org (Postfix) with ESMTP id C1D818FC08
	for <freebsd-threads@freebsd.org>; Fri, 13 Aug 2010 12:51:14 +0000 (UTC)
Received: from turtle.stack.nl (turtle.stack.nl [IPv6:2001:610:1108:5010::132])
	by mx1.stack.nl (Postfix) with ESMTP id 9FD341DD650;
	Fri, 13 Aug 2010 14:51:13 +0200 (CEST)
Received: by turtle.stack.nl (Postfix, from userid 1677)
	id 93892172B1; Fri, 13 Aug 2010 14:51:13 +0200 (CEST)
Date: Fri, 13 Aug 2010 14:51:13 +0200
From: Jilles Tjoelker <jilles@stack.nl>
To: Kostik Belousov <kostikbel@gmail.com>
Message-ID: <20100813125113.GA73073@stack.nl>
References: <20100811204758.GQ2396@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100811204758.GQ2396@deviant.kiev.zoral.com.ua>
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: freebsd-threads@freebsd.org
Subject: Re: PTHREAD_CANCEL_DEFERRED
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Threading on FreeBSD <freebsd-threads.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>, 
	<mailto:freebsd-threads-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-threads>
List-Post: <mailto:freebsd-threads@freebsd.org>
List-Help: <mailto:freebsd-threads-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 13 Aug 2010 12:51:15 -0000

On Wed, Aug 11, 2010 at 11:47:58PM +0300, Kostik Belousov wrote:
> Let consider the thread in the state where the cancelation is enabled
> and cancelation mode set to the PTHREAD_CANCEL_DEFERRED.

> SUSv4 says the following:
> Whenever a thread has cancelability enabled and a cancellation request
> has been made with that thread as the target, and the thread then
> calls any function that is a cancellation point (such as
> pthread_testcancel() or read()), the cancellation request shall be
> acted upon before the function returns. If a thread has cancelability
> enabled and a cancellation request is made with the thread as a target
> while the thread is suspended at a cancellation point, the thread
> shall be awakened and the cancellation request shall be acted upon.

> Take the close(2) as example, and assume that the cancel is enabled
> for the thread in deferred mode. libthr wrapper around the syscall
> executes this:

>      	curthread->cancel_point++;
> 	testcancel(curthread);
> 	__sys_close(fd);
> 	curthread->cancel_point--;

> The pthread_cancel() sends the SIGCANCEL signal to the
> thread. SIGCANCEL handler calls pthread_exit() if thread has the
> cancel_point greater then 0.

> I think this is not right. For instance, thread can be canceled due to
> SIGCANCEL delivery at the point after the return into the usermode
> from close(), but before cancel_point is decremented. IMO, the cited
> statements from the SUSv4 prohibit such behaviour. We should cancel
> either before closing fd, or during the sleep in close(), but then the
> syscall should be aborted ("as if signal is delivered"), and again fd
> should not be closed.

This reasoning is entirely valid for syscalls like open() and wait4()
without WNOWAIT, but close() is a bit different. Regardless of what the
underlying fo_close function returned, it continues cleaning up the file
descriptor (even if the error is EINTR or ERESTART). It seems that
ERESTART could cause another close for the same fd number after the
signal handler returns, which can have disastrous consequences in
multi-threaded applications (data sent to wrong files). Similarly,
applications that think they have to retry close() when it returns EINTR
may cause problems. It would be safest to avoid EINTR or ERESTART errors
from close() entirely, either changing it to another errno or pretending
the close succeeded.

POSIX leaves the state of a file descriptor after close() failed with
EINTR or EIO unspecified. So the only thing a portable application can
do is forget about the file descriptor and hope the implementation freed
it; otherwise, if the implementation frees the file descriptor, the
application may act on a new file opened by another thread.

An example of a type of file that can block on close is a tty. However,
the tty layer ignores any error that could result from this and always
succeeds a close.

It seems that the kernel code in general would become more complicated
by allowing close() to be restarted. Perhaps it is better to change the
libthr wrapper to decrement cancel_point before calling __sys_close().
If a cancel had come in before close(), it will be acted upon like
before, and the cleanup handlers will be called with fd still open. If a
cancel comes in during a blocking close(), it will not be acted upon yet
but the close will abort with EINTR (or just pretend to succeed if it is
a tty); the cancel will be acted upon at the next cancellation point. As
usual with things that rely on EINTR there is a race window, between the
decrement and entering close(), where a cancel may not be acted upon
before close() finishes its thing.

> Actually, besides not being compliant, I think that the current
> behaviour is not useful, since in deferred case, we cannot know
> whether the action by the call that is cancelation point was performed
> or not.

> This is not a rant, I probably will fix the issue if it is agreed
> upon. Opinions ?

I think this is indeed a problem.

In fact, I have a similar issue in sh(1). sh catches SIGINT, with a
signal handler that either sets a flag or longjmp()s to an exception
handler, similar to libthr's cancellation handling. System call restart
is disabled for SIGINT. It would be best to use the longjmp() approach
for blocking operations, as this avoids the race conditions associated
with EINTR, but this is not possible if the blocking operation also
changes state such as when opening a fifo. Therefore, a SIGINT that
comes in just before the blocking open is tried is not processed until
the open is done. Admittedly, because sh is single-threaded, I could
predict the fd that open() would return and (void)close it after an
interruption, but that is very ugly.

What I have seen from glibc's implementation is that it is similar to
ours, except that it avoids sending a signal when the signal handler
would not pthread_exit() right away.

An idea for a fix: add a new bit of state for each signal which means
that the signal interrupts sleep(9) functions with PCATCH, but is not
delivered while the thread is executing on a CPU. Actually delivering
the signal in this state would require another bit to track if the
signal has interrupted a sleep, which would be checked at return from
the syscall. Alternatively, userland would receive EINTR on all blocking
syscalls until it either disables the new mode or accepts the signal
using sigwait() or similar. The latter is simplest on the kernel but may
cause a user program to spin waiting for the EINTR to go away. The EINTR
could be replaced by a different error. In either case,
pthread_testcancel() would use a sigtimedwait() for SIGCANCEL with a
zero-valued timespec, invoking cancellation processing if it succeeds.

Hmm, there is also a slight problem here with syscalls that are
cancellation points, but do not block, such as close() on a regular
file (weird NFS intr stuff excluded). Preserving the semantics requires
an additional syscall to check for cancellation, which is unfortunate
because cancellation is not used much.

Another (ugly) fix is to use some sort of timer to interrupt blocking
operations if processing had apparently not yet started due to the race
condition. This means that functions like open() and wait4() are wrapped
more like

	curthread->cancel_point++;
	testcancel(curthread);
	curthread->cancel_point--;
	ret = __sys_wait4(pid, status, options, rusage);
	if (ret == -1 && errno == EINTR) {
		curthread->cancel_point++;
		testcancel(curthread);
		curthread->cancel_point--;
	}

The signal handler for SIGCANCEL enables a timer that resends SIGCANCEL
to the thread after some time. It looks like the facility underlying
timer_create() provides such timers.

-- 
Jilles Tjoelker