Date:      Tue, 24 Sep 2013 22:39:51 +0200
From:      Jilles Tjoelker <jilles@stack.nl>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Tijl Coosemans <tijl@coosemans.org>, Russ Cox <rsc@swtch.com>, freebsd-current@FreeBSD.org
Subject:   Re: restarting SYSCALL system call on amd64 loses arguments
Message-ID:  <20130924203951.GB12607@stack.nl>
In-Reply-To: <20130924192909.GO41229@kib.kiev.ua>
References:  <20130923222613.548860a3@kalimero.tijl.coosemans.org> <20130923213730.GX41229@kib.kiev.ua> <20130924191949.GA12607@stack.nl> <20130924192909.GO41229@kib.kiev.ua>

On Tue, Sep 24, 2013 at 10:29:09PM +0300, Konstantin Belousov wrote:
> On Tue, Sep 24, 2013 at 09:19:49PM +0200, Jilles Tjoelker wrote:
> > On Tue, Sep 24, 2013 at 12:37:30AM +0300, Konstantin Belousov wrote:
> > > On Mon, Sep 23, 2013 at 10:26:13PM +0200, Tijl Coosemans wrote:
> > > > Has anyone taken a look at this PR yet?

> > > > http://www.freebsd.org/cgi/query-pr.cgi?pr=182161

> > > This looks like a valid bug, but probably not a valid testcase.

> > > Let me elaborate.  When a signal is delivered, return from the signal
> > > handler is performed by sigreturn(2), which reloads the whole
> > > register file when crossing the kernel->user boundary because
> > > sys_sigreturn(9) sets the PCB_FULL_IRET flag.  As a result, the whole
> > > trap frame as of syscall entry is restored, and the ERESTART return
> > > is not exercised.

> > > I was not able to reproduce the issue with the supplied test program
> > > on HEAD.  I suspect that the program actually exposed a bug in
> > > signal delivery in threaded processes, which I introduced in 9.1
> > > and fixed in r251047 & r251365.

> > The ERESTART return happens if there is no signal or no longer a signal.
> > The latter is how the bug in the PR occurs: a SIGCHLD delivery via
> > handler in one thread races with a SIGCHLD acceptance in wait4() in
> > another thread. Note wait4() returning a value in the other thread in
> > the fourth line of the kdump output in the PR.

> > For some reason, I can reproduce this easily on my local quad-core
> > r255729 stable/9 system but not on ref9-amd64.freebsd.org or
> > ref10-amd64.freebsd.org.

> > I can also reproduce the bug on my local system by racing signal
> > delivery via handler with acceptance in sigtimedwait().

> So could you please check r255844 on your machine?

With r255844 (patch applied to a stable/9 kernel) I can no longer
reproduce it. The test programs run fine for minutes.
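
For readers following along, the sigtimedwait() variant of the race
mentioned above can be sketched roughly as follows. This is not the test
program from the PR or the one used here; the names run_race, reader and
waiter, and the choice of SIGUSR1 and a pipe read() as the restartable
system call, are assumptions made for illustration. One thread sits in
read() with the signal unblocked and a SA_RESTART handler installed,
while a second thread accepts the same signal with sigtimedwait(). If a
restarted read() ever lost its arguments, it would be expected to fail
with an errno other than EINTR.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static int pipefd[2];
static atomic_int stop;

/* Empty handler: it only exists so that delivery interrupts read(). */
static void on_usr1(int sig) { (void)sig; }

/* SIGUSR1 unblocked here; blocks in a restartable read() on the pipe. */
static void *reader(void *arg)
{
    int *bad_errno = arg;
    sigset_t set;
    char c;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);
    for (;;) {
        ssize_t n = read(pipefd[0], &c, 1);
        if (n >= 0)
            break;                  /* stop byte or EOF */
        if (errno != EINTR) {       /* unexpected failure of read() */
            *bad_errno = errno;
            break;
        }
    }
    return NULL;
}

/* SIGUSR1 stays blocked here; accepts it synchronously instead. */
static void *waiter(void *arg)
{
    sigset_t set;
    struct timespec ts = { 0, 1000000 };    /* 1 ms, to notice stop */

    (void)arg;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    while (!atomic_load(&stop))
        (void)sigtimedwait(&set, NULL, &ts);
    return NULL;
}

/* Returns 0 if read() never failed unexpectedly, else an errno value. */
int run_race(int iterations)
{
    pthread_t rd, wt;
    struct sigaction sa;
    sigset_t set;
    int bad_errno = 0, rc;
    ssize_t w;

    assert(iterations > 0);
    /* Block first so both new threads inherit a blocked SIGUSR1. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr1;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    if (pipe(pipefd) != 0)
        return errno;
    atomic_store(&stop, 0);
    if ((rc = pthread_create(&rd, NULL, reader, &bad_errno)) != 0)
        return rc;
    if ((rc = pthread_create(&wt, NULL, waiter, NULL)) != 0)
        return rc;

    /* Process-directed signals: either thread may end up taking one. */
    for (int i = 0; i < iterations; i++)
        kill(getpid(), SIGUSR1);

    atomic_store(&stop, 1);
    w = write(pipefd[1], "x", 1);   /* lets reader() leave read() */
    (void)w;
    pthread_join(rd, NULL);
    pthread_join(wt, NULL);
    close(pipefd[0]);
    close(pipefd[1]);
    return bad_errno;
}
```

Calling run_race() with a large iteration count from main() and building
with cc -pthread would exercise the race. A result of 0 only means that
no failure was observed in that run; as noted above, whether the race
triggers at all appears to depend on the machine.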

-- 
Jilles Tjoelker


