Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 29 May 2013 10:16:45 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Alfred Perlstein <bright@mu.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: please review, patch for lost camisr
Message-ID:  <20130529071645.GR3047@kib.kiev.ua>
In-Reply-To: <51A592BC.1050100@mu.org>
References:  <51A44B0C.8010908@ixsystems.com> <201305281204.14146.jhb@freebsd.org> <51A505A5.7030105@ixsystems.com> <201305281613.32414.jhb@freebsd.org> <51A514F5.9050405@ixsystems.com> <20130529050835.GP3047@kib.kiev.ua> <51A592BC.1050100@mu.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--Kp2K9UuanVOCcgkO
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, May 28, 2013 at 10:31:40PM -0700, Alfred Perlstein wrote:
> On 5/28/13 10:08 PM, Konstantin Belousov wrote:
> > On Tue, May 28, 2013 at 01:35:01PM -0700, Alfred Perlstein wrote:
> >> [[  moved to hackers@ from private mail. ]]
> >>
> >> On 5/28/13 1:13 PM, John Baldwin wrote:
> >>> On Tuesday, May 28, 2013 3:29:41 pm Alfred Perlstein wrote:
> >>>> On 5/28/13 9:04 AM, John Baldwin wrote:
> >>>>> On Tuesday, May 28, 2013 2:13:32 am Alfred Perlstein wrote:
> >>>>>> Hey folks,
> >>>>>>
> >>>>>> I had a talk with Nathan Whitehorn about the camisr issue.  The is=
sue we
> >>>>>> are seeing we mostly know, but to summarize, we are losing the cam=
isr
> >>>>>> signal and the camisr is not being run.
> >>>>>>
> >>>>>> I gave him a summary of what we have been seeing and pointed him t=
o the
> >>>>>> code I am concerned about here:
> >>>>>> http://pastebin.com/tLKr7mCV  (this is inside of kern_intr.c).
> >>>>>>
> >>>>>> What I think that is happening is that the setting of it_need to 0
> >>>>>> inside of sys/kern/kern_intr.c:ithread_loop() is not being schedul=
ed
> >>>>>> correctly and it is being delayed until AFTER the call to
> >>>>>> ithread_execute_handlers() right below the atomic_store_rel_int().
> >>>>> This seems highly unlikely, to the extent that if this were true al=
l our
> >>>>> locking primitives would be broken.  The store_rel is actually a re=
lease
> >>>>> barrier which acts like more a read/write fence.  No memory accesse=
s (read or
> >>>>> write) from before the release can be scheduled after the associate=
d store,
> >>>>> either by the compiler or CPU.  That is what Konstantin is referrin=
g to in his
> >>>>> commit when he says "release" semantics".
> >>>> Yes, that makes sense, however does it specify that the writes *must*
> >>>> occur at that *point*?  If it only enforces ordering then we may have
> >>>> some issue, specifically because the setting of it to '1' inside of
> >>>> intr_event_schedule_thread has no barrier other than the acq semanti=
cs
> >>>> of the thread lock.  I am wondering what is forcing out the '1' ther=
e.
> >>> Nothing ever forces writes.  You would have to invalidate the cache t=
o do that
> >>> and that is horribly expensive.  It is always only about ordering and=
 knowing
> >>> that if you can complete another operation on the same "cookie" varia=
ble with
> >>> acquire semantics that earlier writes will be visible.
> >> By cookie, you mean a specific memory address, basically a lock? This =
is
> >> starting to reinforce my suspicions as the setting of it_need is done
> >> with release semantics, however the acq on the other CPU is done on the
> >> thread lock.  Maybe that is irrelevant.  We will find out shortly.
> >>
> >>>> See below as I think we have proof that this is somehow happening.
> >>> Having ih_need of 1 and it_need of 0 is certainly busted.  The simple=
st fix
> >>> is probably to stop using atomics on it_need and just grab the thread=
 lock
> >>> in the main ithread loop and hold it while checking and clearing it_n=
eed.
> >>>
> >> OK, we have some code that will either prove this, or perturb the memo=
ry
> >> ordering enough to make the bug go away, or prove this assertion wrong.
> >>
> >> We will update on our findings hopefully in the next few days.
> > IMO the read of it_need in the 'while (ithd->it_need)' should
> > have acquire semantic, otherwise the future reads in the
> > ithread_execute_handlers(), in particular, of the ih_need, could pass
> > the read of it_need and cause the situation you reported.  I do not
> > see any acquire barrier between a condition in the while() statement
> > and the read of ih_need in the execute_handlers().
> >
> > It is probably true that the issue you see was caused by r236456, in the
> > sense that implicitely locked xchgl instruction on x86 has a full barri=
er
> > semantic.  As result, the store_rel() was actually an acquire too, maki=
ng
> > this reordering impossible.  I argue that this is not a bug in r236456,
> > but the issue in the kern_intr.c.
> If I remember the code correctly that would probably explain why we see=
=20
> it only on 9.1 system.
> >
> > On the other hand, the John' suggestion to move the manipulations of
> > it_need under the lock is probably the best anyway.
> >
> I was wondering if it would be lower latency to maintain it_need,=20
> however to keep another variable it_needlocked under the thread lock. =20
> This would result in potential superfluous interrupts, however under=20
> load you would allow the ithread to loop without taking the thread lock=
=20
> some number of times.
>=20
> I am not really sure if this is really worth the optimization=20
> (especially since it can result in superfluous interrupts) however it=20
> may reduce latency and that might be important.
>=20
> Is there some people that I can pass the patch onto for help with=20
> performance once we confirm that this is the actual bug?   We can do=20
> internal testing, but I am worried about regressing performance of any=20
> form of IO for the kernel.
>=20
> I'll show the patch soon.
>=20
> Thank you for the information.  This is promising.

Well, if you and I are right, the minimal patch should be

diff --git a/sys/kern/kern_intr.c b/sys/kern/kern_intr.c
index 8d63c9b..7c21015 100644
--- a/sys/kern/kern_intr.c
+++ b/sys/kern/kern_intr.c
@@ -1349,7 +1349,7 @@ ithread_loop(void *arg)
 		 * we are running, it will set it_need to note that we
 		 * should make another pass.
 		 */
-		while (ithd->it_need) {
+		while (atomic_load_acq_int(&ithd->it_need)) {
 			/*
 			 * This might need a full read and write barrier
 			 * to make sure that this write posts before any


--Kp2K9UuanVOCcgkO
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (FreeBSD)

iQIcBAEBAgAGBQJRpatcAAoJEJDCuSvBvK1BB84P/0yJ0u/G3Ddiw2Psk4UmIW1W
oDFF3Z3qDW25qQxyH7ER/35EfayVdugBvR4BwNt//V//nNlEXcS56d+qAG1Fxgd/
cCV/QSxBjb7sLka93FX8Ku5wvwKbeGbunBOG8DSVDL34+MJy+tyriw9LudYmegVf
4oT/TjtsJDAiXfl+vS1EhEoQy49QCQwThc2CA5DalrydF3xJc5RJRitR8f9yfjf5
fhPPm24g0wHYbG2ctUGl1oK7xnJUVVQT0MlEfMoB4F/tamXIeS/hP09dLJuWQK3j
wqzxD6KndKtLi5m/JSbpuxdE0nWpdewpLe66VEDW7bcxf98qZ/qzp3bjImYBE6dC
SYgMtgTwZfBpV7LuyPlR7lQo2Rc0XKKdJwCtdF0MSSBy8kzLK+WQsiCLJjJth1y/
k2mm+SndZgNKXjsYjUITxm1YZx6obVNiyiwUwoRilwizkMpdFyyO9FvGRBywDGQz
bQOjptIEvOzcUVHSFgSWon7+6xbaV7JEDe1ja8ApeLki3Cm699HZia4ZmlC5dNxP
OOfHp0CY9lCIdK/ib3I6yPY2g3Q8RWJRtBbhkU2ChxXzB4WiuAyumgWjWmYhpX3t
/PdWUXk3IGdmvp9w+vt8HzIqkl7RIGh3T3f3YKWypRhjB/00wmY1UKIGISzTUQc1
5jvmi4ZCGmTwuFhEbp5X
=/X81
-----END PGP SIGNATURE-----

--Kp2K9UuanVOCcgkO--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130529071645.GR3047>