Date:      Thu, 26 Aug 2010 14:20:31 -0700
From:      mdf@FreeBSD.org
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-current@freebsd.org, Jeff Roberson <jroberson@jroberson.net>
Subject:   Re: sched_pin() bug in SCHED_ULE
Message-ID:  <AANLkTi=7cLfH3h3jMVhbEzNsuiPod3Zx=PiazK7Dn4%2BA@mail.gmail.com>
In-Reply-To: <201008261649.20543.jhb@freebsd.org>
References:  <AANLkTin6V9pc3d7ifyOmR2V-H5-g_AQFji-LbZzWfAKM@mail.gmail.com> <201008261649.20543.jhb@freebsd.org>

On Thu, Aug 26, 2010 at 1:49 PM, John Baldwin <jhb@freebsd.org> wrote:
> On Thursday, August 26, 2010 4:03:38 pm mdf@freebsd.org wrote:
>> Back at the beginning of August I posted about issues with sched_pin()
>> and witness_warn():
>>
>> http://lists.freebsd.org/pipermail/freebsd-hackers/2010-August/032553.html
>>
>> After a lot of debugging I think I've basically found the issue.  I
>> found this bug on stable/7, but looking at the commit log for CURRENT
>> I don't see any reason the bug doesn't exist there.  This analysis is
>> specific to amd64/i386 but the problem is likely to exist on most
>> arches.
>>
>> The basic problem is that both tdq_move() and sched_setcpu() can
>> manipulate the ts->ts_cpu variable of another process, which may be
>> running.  This means that a process running on CPU N may be set to run
>> on CPU M at the next context switch.
>>
>> Then, that process may call sched_pin(), then grab a PCPU variable.
>> An IPI_PREEMPT can come in, causing the thread to call mi_switch() in
>> ipi_bitmap_handler().  sched_switch() will then notice that ts->ts_cpu
>> is not PCPU_GET(cpuid), and call sched_switch_migrate(), migrating the
>> thread to the intended CPU.  Thus after sched_pin() and potentially
>> before using any PCPU variable the process can get migrated to another
>> CPU, and now it is using a PCPU variable for the wrong processor.
>>
>> Given that ts_cpu can be set by other threads, it doesn't seem worth
>> checking at sched_pin() time whether ts_cpu is not the same as td_oncpu,
>> because to make the check would require taking the thread_lock.
>> Instead, I propose adding a check for !THREAD_CAN_MIGRATE() before
>> calling sched_switch_migrate().  However, sched_pin() is also used by
>> sched_bind(9) to force the thread to stay on the intended cpu.  I
>> would handle this by adding a TSF_BINDING state that is different from
>> TSF_BOUND.
>>
>> The thing that has me wondering whether this is all correct is that I
>> don't see any checks in tdq_move() or sched_setcpu() for either
>> TSF_BOUND or THREAD_CAN_MIGRATE().  Perhaps that logic is managed in
>> the various calling functions; in that case I would propose adding
>> asserts that the thread is THREAD_CAN_MIGRATE() unless it's marked
>> TSF_BINDING.
>>
>> Does this analysis seem correct?
>
> Calling code does check THREAD_CAN_MIGRATE(), but the problem is that it is
> not safe to call that on anything but curthread as td_pinned is not locked.
> It is assumed that only curthread would ever check td_pinned, not other
> threads.  I suspect what is happening is that another CPU is seeing a stale
> value of td_pinned.  You could fix this by grabbing the thread lock in
> sched_pin()/unpin() for ULE, but that is a bit expensive perhaps.

I tried making sched_pin() a real function which used
intr_disable/intr_restore around saving off td->td_oncpu,
td->td_lastcpu and ts->ts_cpu, and the stack at the time of call.  In
sched_switch when I saw an unexpected migration I printed all that
out.  I found that on my boxes, at sched_pin() time ts_cpu was already
different from td->td_oncpu, so the specific problem I was having was
that while another thread can change ts_cpu it has no way to force
that thread to immediately migrate.

I believe the below patch fixes the issue, though I'm open to other methods:


Index: kern/sched_ule.c
===================================================================
--- kern/sched_ule.c	(.../head/src/sys)	(revision 158279)
+++ kern/sched_ule.c	(.../branches/BR_BUG_67957/src/sys)	(revision 158279)
@@ -112,6 +112,7 @@
 /* flags kept in ts_flags */
 #define	TSF_BOUND	0x0001		/* Thread can not migrate. */
 #define	TSF_XFERABLE	0x0002		/* Thread was added as transferable. */
+#define	TSF_BINDING	0x0004		/* Thread is being bound. */

 static struct td_sched td_sched0;

@@ -1859,6 +1858,7 @@
 	struct mtx *mtx;
 	int srqflag;
 	int cpuid;
+	int do_switch;

 	THREAD_LOCK_ASSERT(td, MA_OWNED);

@@ -1888,10 +1888,21 @@
 		srqflag = (flags & SW_PREEMPT) ?
 		    SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
 		    SRQ_OURSELF|SRQ_YIELDING;
-		if (ts->ts_cpu == cpuid)
+		/*
+		 * Allow the switch to another processor as requested
+		 * only if the thread can migrate or we are in the
+		 * middle of binding for sched_bind(9).  This keeps
+		 * sched_pin() quick, since other threads can
+		 * manipulate ts_cpu.
+		 */
+		do_switch = (ts->ts_cpu != cpuid);
+		if (!THREAD_CAN_MIGRATE(td) &&
+		    (ts->ts_flags & TSF_BINDING) == 0)
+			do_switch = 0;
+		if (do_switch)
+			mtx = sched_switch_migrate(tdq, td, srqflag);
+		else
 			tdq_add(tdq, td, srqflag);
-		else
-			mtx = sched_switch_migrate(tdq, td, srqflag);
 	} else {
 		/* This thread must be going to sleep. */
 		TDQ_LOCK(tdq);
@@ -1938,12 +1949,25 @@
 		 */
 		cpuid = PCPU_GET(cpuid);
 		tdq = TDQ_CPU(cpuid);
+		KASSERT(THREAD_CAN_MIGRATE(td) ||
+		    (ts->ts_flags & TSF_BINDING) != 0 ||
+		     cpuid == td->td_lastcpu,
+		    ("Non-migratable thread %p migrated from %d to %d",
+		     td, td->td_lastcpu, cpuid));
 #ifdef	HWPMC_HOOKS
 		if (PMC_PROC_IS_USING_PMCS(td->td_proc))
 			PMC_SWITCH_CONTEXT(td, PMC_FN_CSW_IN);
 #endif
 	} else
 		thread_unblock_switch(td, mtx);
+
+	if ((ts->ts_flags & TSF_BINDING) != 0) {
+		ts->ts_flags &= ~TSF_BINDING;
+		ts->ts_flags |= TSF_BOUND;
+	}
+	KASSERT((ts->ts_flags & TSF_BOUND) == 0 || ts->ts_cpu == cpuid,
+	    ("Bound thread %p on %d not %d", td, cpuid, ts->ts_cpu));
+
 	/*
 	 * Assert that all went well and return.
 	 */
@@ -2607,15 +2631,21 @@
 	ts = td->td_sched;
 	if (ts->ts_flags & TSF_BOUND)
 		sched_unbind(td);
-	ts->ts_flags |= TSF_BOUND;
+	KASSERT(THREAD_CAN_MIGRATE(td),
+	    ("%s called on non-migratable thread %p", __func__, td));
 #ifdef SMP
+	ts->ts_flags |= TSF_BINDING;
 	sched_pin();
 	if (PCPU_GET(cpuid) == cpu)
 		return;
 	ts->ts_cpu = cpu;
 	/* When we return from mi_switch we'll be on the correct cpu. */
 	mi_switch(SW_VOL, NULL);
+#else
+	ts->ts_flags |= TSF_BOUND;
 #endif
+	KASSERT((ts->ts_flags & TSF_BOUND) != 0 && cpu == PCPU_GET(cpuid),
+	    ("Should be bound to %d", cpu));
 }

 /*
@@ -2640,7 +2670,7 @@
 sched_is_bound(struct thread *td)
 {
 	THREAD_LOCK_ASSERT(td, MA_OWNED);
-	return (td->td_sched->ts_flags & TSF_BOUND);
+	return ((td->td_sched->ts_flags & (TSF_BOUND | TSF_BINDING)) != 0);
 }

 /*


