From owner-freebsd-threads@FreeBSD.ORG Fri Aug 13 12:51:15 2010 Return-Path: Delivered-To: freebsd-threads@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 614E6106564A for ; Fri, 13 Aug 2010 12:51:15 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mx1.stack.nl (relay04.stack.nl [IPv6:2001:610:1108:5010::107]) by mx1.freebsd.org (Postfix) with ESMTP id C1D818FC08 for ; Fri, 13 Aug 2010 12:51:14 +0000 (UTC) Received: from turtle.stack.nl (turtle.stack.nl [IPv6:2001:610:1108:5010::132]) by mx1.stack.nl (Postfix) with ESMTP id 9FD341DD650; Fri, 13 Aug 2010 14:51:13 +0200 (CEST) Received: by turtle.stack.nl (Postfix, from userid 1677) id 93892172B1; Fri, 13 Aug 2010 14:51:13 +0200 (CEST) Date: Fri, 13 Aug 2010 14:51:13 +0200 From: Jilles Tjoelker To: Kostik Belousov Message-ID: <20100813125113.GA73073@stack.nl> References: <20100811204758.GQ2396@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100811204758.GQ2396@deviant.kiev.zoral.com.ua> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-threads@freebsd.org Subject: Re: PTHREAD_CANCEL_DEFERRED X-BeenThere: freebsd-threads@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Threading on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Aug 2010 12:51:15 -0000 On Wed, Aug 11, 2010 at 11:47:58PM +0300, Kostik Belousov wrote: > Let consider the thread in the state where the cancelation is enabled > and cancelation mode set to the PTHREAD_CANCEL_DEFERRED. > SUSv4 says the following: > Whenever a thread has cancelability enabled and a cancellation request > has been made with that thread as the target, and the thread then > calls any function that is a cancellation point (such as > pthread_testcancel() or read()), the cancellation request shall be > acted upon before the function returns. If a thread has cancelability > enabled and a cancellation request is made with the thread as a target > while the thread is suspended at a cancellation point, the thread > shall be awakened and the cancellation request shall be acted upon. > Take the close(2) as example, and assume that the cancel is enabled > for the thread in deferred mode. libthr wrapper around the syscall > executes this: > curthread->cancel_point++; > testcancel(curthread); > __sys_close(fd); > curthread->cancel_point--; > The pthread_cancel() sends the SIGCANCEL signal to the > thread. SIGCANCEL handler calls pthread_exit() if thread has the > cancel_point greater then 0. > I think this is not right. For instance, thread can be canceled due to > SIGCANCEL delivery at the point after the return into the usermode > from close(), but before cancel_point is decremented. IMO, the cited > statements from the SUSv4 prohibit such behaviour. We should cancel > either before closing fd, or during the sleep in close(), but then the > syscall should be aborted ("as if signal is delivered"), and again fd > should not be closed. This reasoning is entirely valid for syscalls like open() and wait4() without WNOWAIT, but close() is a bit different. Regardless of what the underlying fo_close function returned, it continues cleaning up the file descriptor (even if the error is EINTR or ERESTART). It seems that ERESTART could cause another close for the same fd number after the signal handler returns, which can have disastrous consequences in multi-threaded applications (data sent to wrong files). Similarly, applications that think they have to retry close() when it returns EINTR may cause problems. It would be safest to avoid EINTR or ERESTART errors from close() entirely, either changing it to another errno or pretending the close succeeded. POSIX leaves the state of a file descriptor after close() failed with EINTR or EIO unspecified. So the only thing a portable application can do is forget about the file descriptor and hope the implementation freed it; otherwise, if the implementation frees the file descriptor, the application may act on a new file opened by another thread. An example of a type of file that can block on close is a tty. However, the tty layer ignores any error that could result from this and always succeeds a close. It seems that the kernel code in general would become more complicated by allowing close() to be restarted. Perhaps it is better to change the libthr wrapper to decrement cancel_point before calling __sys_close(). If a cancel had come in before close(), it will be acted upon like before, and the cleanup handlers will be called with fd still open. If a cancel comes in during a blocking close(), it will not be acted upon yet but the close will abort with EINTR (or just pretend to succeed if it is a tty); the cancel will be acted upon at the next cancellation point. As usual with things that rely on EINTR there is a race window, between the decrement and entering close(), where a cancel may not be acted upon before close() finishes its thing. > Actually, besides not being compliant, I think that the current > behaviour is not useful, since in deferred case, we cannot know > whether the action by the call that is cancelation point was performed > or not. > This is not a rant, I probably will fix the issue if it is agreed > upon. Opinions ? I think this is indeed a problem. In fact, I have a similar issue in sh(1). sh catches SIGINT, with a signal handler that either sets a flag or longjmp()s to an exception handler, similar to libthr's cancellation handling. System call restart is disabled for SIGINT. It would be best to use the longjmp() approach for blocking operations, as this avoids the race conditions associated with EINTR, but this is not possible if the blocking operation also changes state such as when opening a fifo. Therefore, a SIGINT that comes in just before the blocking open is tried is not processed until the open is done. Admittedly, because sh is single-threaded, I could predict the fd that open() would return and (void)close it after an interruption, but that is very ugly. What I have seen from glibc's implementation is that it is similar to ours, except that it avoids sending a signal when the signal handler would not pthread_exit() right away. An idea for a fix: add a new bit of state for each signal which means that the signal interrupts sleep(9) functions with PCATCH, but is not delivered while the thread is executing on a CPU. Actually delivering the signal in this state would require another bit to track if the signal has interrupted a sleep, which would be checked at return from the syscall. Alternatively, userland would receive EINTR on all blocking syscalls until it either disables the new mode or accepts the signal using sigwait() or similar. The latter is simplest on the kernel but may cause a user program to spin waiting for the EINTR to go away. The EINTR could be replaced by a different error. In either case, pthread_testcancel() would use a sigtimedwait() for SIGCANCEL with a zero-valued timespec, invoking cancellation processing if it succeeds. Hmm, there is also a slight problem here with syscalls that are cancellation points, but do not block, such as close() on a regular file (weird NFS intr stuff excluded). Preserving the semantics requires an additional syscall to check for cancellation, which is unfortunate because cancellation is not used much. Another (ugly) fix is to use some sort of timer to interrupt blocking operations if processing had apparently not yet started due to the race condition. This means that functions like open() and wait4() are wrapped more like curthread->cancel_point++; testcancel(curthread); curthread->cancel_point--; ret = __sys_wait4(pid, status, options, rusage); if (ret == -1 && errno == EINTR) { curthread->cancel_point++; testcancel(curthread); curthread->cancel_point--; } The signal handler for SIGCANCEL enables a timer that resends SIGCANCEL to the thread after some time. It looks like the facility underlying timer_create() provides such timers. -- Jilles Tjoelker