Date:      Mon, 25 Nov 2013 11:41:12 +0200
From:      Mikolaj Golub <to.my.trociny@gmail.com>
To:        Pawel Jakub Dawidek <pjd@FreeBSD.org>
Cc:        freebsd-stable@freebsd.org, Pete French <petefrench@ingresso.co.uk>
Subject:   Re: Hast locking up under 9.2
Message-ID:  <20131125094111.GA22396@gmail.com>
In-Reply-To: <20131125083223.GE1398@garage.freebsd.pl>
References:  <20131121203711.GA3736@gmail.com> <E1Vjokn-000OuU-1Y@dilbert.ingresso.co.uk> <20131123215950.GA17292@gmail.com> <20131125083223.GE1398@garage.freebsd.pl>

On Mon, Nov 25, 2013 at 09:32:23AM +0100, Pawel Jakub Dawidek wrote:
> On Sat, Nov 23, 2013 at 11:59:51PM +0200, Mikolaj Golub wrote:
> > On Fri, Nov 22, 2013 at 11:18:29AM +0000, Pete French wrote:
> > 
> > > "Assertion failed: (!hio->hio_done), function write_complete, file
> > >  /usr/src/sbin/hastd/primary.c, line 1130."
> > 
> > It looks like write_complete usage (which should be called once per
> > write request) for memsync is racy.
> > 
> > Consider the following scenario:
> > 
> > 1) remote_recv_thread: memsync ack received, refcount -> 2;
> > 2) local_send_thread: local write completed, refcount -> 1, entering
> >    write_complete()
> > 3) remote_recv_thread: memsync fin received, refcount -> 0, move hio
> >    to done queue, ggate_send_thread gets the hio, checks for
> > >    !hio->hio_done and (if local_send_thread is still in
> > >    write_complete()) enters write_complete()
> 
> I don't see how is that possible. The write_complete() function is
> called only when hio_countdown goes from 2 to 1 and because this is
> atomic operation it can only happen in one thread. Can you elaborate on
> how calling write_complete() concurrently for the same request is
> possible?

Yes, hio_countdown protects write_complete() from being called
concurrently by the "component" threads. But it may also be called by
ggate_send_thread():

	if (!hio->hio_done)
		write_complete(res, hio);

So if write_complete() has already started executing in
local_send_thread(), and memsync fin is received at that moment, the
request is handed over to ggate_send_thread(). Because hio_done is set
only on exit from write_complete(), ggate_send_thread() still sees
!hio->hio_done and re-enters write_complete() while the call in
local_send_thread() is still in progress.
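
To make the interleaving concrete, here is a minimal single-threaded
replay of it. This is only a sketch, not the hastd code: struct hio is
reduced to a few fields, write_complete() is split into hypothetical
wc_enter()/wc_exit() halves so the preemption point can be placed by
hand, in_progress/max_concurrent are illustration-only counters, and
the initial countdown of 3 is an assumption inferred from the scenario
quoted above (ack takes it to 2).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for hastd's struct hio. */
struct hio {
	int	hio_countdown;	/* references still held by component threads */
	bool	hio_done;	/* set only when write_complete() exits */
	int	in_progress;	/* illustration only: active write_complete() calls */
	int	max_concurrent;	/* illustration only: peak of in_progress */
};

/* First half of write_complete(): the call has started. */
static void
wc_enter(struct hio *hio)
{
	hio->in_progress++;
	if (hio->in_progress > hio->max_concurrent)
		hio->max_concurrent = hio->in_progress;
}

/* Second half of write_complete(): hio_done is set on the way out. */
static void
wc_exit(struct hio *hio)
{
	hio->hio_done = true;
	hio->in_progress--;
}

/* Replays the racy interleaving; returns the peak number of overlapping calls. */
static int
replay_race(void)
{
	struct hio hio = { .hio_countdown = 3 };

	/* 1) remote_recv_thread: memsync ack received, refcount -> 2. */
	hio.hio_countdown--;

	/* 2) local_send_thread: local write completed, refcount -> 1,
	 *    reference already released, now entering write_complete(). */
	hio.hio_countdown--;
	wc_enter(&hio);

	/* 3) remote_recv_thread: memsync fin received, refcount -> 0,
	 *    so the hio is moved to the done queue. */
	hio.hio_countdown--;
	assert(hio.hio_countdown == 0);

	/* ggate_send_thread: hio_done is still false, so it enters as well. */
	if (!hio.hio_done)
		wc_enter(&hio);

	/* Both calls eventually return. */
	wc_exit(&hio);
	wc_exit(&hio);
	return (hio.max_concurrent);
}
```

In this replay replay_race() reports two overlapping calls, which is
the situation the assertion in write_complete() trips on.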

That is the reason for statement (3) in my patch: write_complete() in
component threads is called only before releasing hio_countdown.
Otherwise you are not protected from it running simultaneously in
ggate_send_thread, or even from the hio being moved to the free list
before write_complete has finished in local_send_thread. And so
hio_countdown can't be used for detecting the current memsync state.
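
The same interleaving with the ordering changed can be replayed as
below. Again this is a sketch under assumptions, not the patch itself:
the memsync_acked flag is a hypothetical way of tracking memsync state
outside hio_countdown, ordered_enter()/ordered_exit() are hypothetical
halves of write_complete(), and only the case where the ack arrives
before the local write completes is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down request; not hastd's struct hio. */
struct hio_ordered {
	int	hio_countdown;	/* references still held by component threads */
	bool	hio_done;	/* set only when write_complete() exits */
	bool	memsync_acked;	/* hypothetical: memsync state kept separately */
	int	in_progress;	/* illustration only */
	int	max_concurrent;	/* illustration only */
};

static void
ordered_enter(struct hio_ordered *hio)
{
	hio->in_progress++;
	if (hio->in_progress > hio->max_concurrent)
		hio->max_concurrent = hio->in_progress;
}

static void
ordered_exit(struct hio_ordered *hio)
{
	hio->hio_done = true;
	hio->in_progress--;
}

/* Replays the interleaving with write_complete() run before the release. */
static int
replay_ordered(void)
{
	struct hio_ordered hio = { .hio_countdown = 3 };
	bool on_done_queue = false;

	/* 1) remote_recv_thread: memsync ack; record it, then release. */
	hio.memsync_acked = true;
	hio.hio_countdown--;			/* 3 -> 2 */

	/* 2) local_send_thread: local write completed. It keeps its
	 *    reference and enters write_complete() because ack was seen. */
	if (hio.memsync_acked)
		ordered_enter(&hio);

	/* 3) remote_recv_thread: memsync fin arrives mid-call. The
	 *    countdown only reaches 1, so the hio stays where it is. */
	if (--hio.hio_countdown == 0)
		on_done_queue = true;
	assert(!on_done_queue);

	/* local_send_thread: write_complete() returns, then the release. */
	ordered_exit(&hio);
	if (--hio.hio_countdown == 0)
		on_done_queue = true;
	assert(on_done_queue);

	/* ggate_send_thread: hio_done is already set, so no second call. */
	if (!hio.hio_done)
		ordered_enter(&hio);

	return (hio.max_concurrent);
}
```

Here the reference held by local_send_thread keeps the request off the
done queue until write_complete() has returned, so replay_ordered()
never sees more than one call in progress.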

-- 
Mikolaj Golub


