From owner-freebsd-current@FreeBSD.ORG Thu Aug 26 21:20:33 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5805B1065694; Thu, 26 Aug 2010 21:20:33 +0000 (UTC) (envelope-from mdf356@gmail.com) Received: from mail-ew0-f54.google.com (mail-ew0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id AB1C98FC16; Thu, 26 Aug 2010 21:20:32 +0000 (UTC) Received: by ewy4 with SMTP id 4so1783283ewy.13 for ; Thu, 26 Aug 2010 14:20:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=jCmhhprGU6cL33UZWALWlLU593s1UI6h5PJcH/H76PQ=; b=UQl/s3VeLj3Vy0c9TbSkIQSh21QGtNq6KmZ5s230oh5Rf1qORKTtBPY8aIdfvQhLyn DV2MVR/II8LJKuHWwGotP+IxEHUKHpkuDNRhcE3GkG3Mc1IdBXgA8UtP4yJdgCsQxTiT 9euGizqpK9IOcFw25eAsNhQS/pZyKJ5geV+TQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=eU4wjSjnEeTAC+Osq7GDXP2LX8d1B19s04tX5pdlY1YoW1rs11oLTctXFJKwXCfVVC HIZlkADewWnlDa+rQ5yMtuG3JCozjyl9b6poKIQb6BoJqSdhRPAxS/wylUB8vLGh6mWe siBrHtItU87YqUCTqwNUeoTlpyIvWHd4gzJYo= MIME-Version: 1.0 Received: by 10.213.28.20 with SMTP id k20mr84684ebc.73.1282857631407; Thu, 26 Aug 2010 14:20:31 -0700 (PDT) Sender: mdf356@gmail.com Received: by 10.213.20.144 with HTTP; Thu, 26 Aug 2010 14:20:31 -0700 (PDT) In-Reply-To: <201008261649.20543.jhb@freebsd.org> References: <201008261649.20543.jhb@freebsd.org> Date: Thu, 26 Aug 2010 14:20:31 -0700 X-Google-Sender-Auth: 1ApI06c0i1UhCXHueyrKY3X_ayo Message-ID: From: mdf@FreeBSD.org To: John Baldwin Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-current@freebsd.org, Jeff Roberson Subject: Re: sched_pin() bug in SCHED_ULE X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Aug 2010 21:20:33 -0000 On Thu, Aug 26, 2010 at 1:49 PM, John Baldwin wrote: > On Thursday, August 26, 2010 4:03:38 pm mdf@freebsd.org wrote: >> Back at the beginning of August I posted about issues with sched_pin() >> and witness_warn(): >> >> http://lists.freebsd.org/pipermail/freebsd-hackers/2010-August/032553.ht= ml >> >> After a lot of debugging I think I've basically found the issue. =A0I >> found this bug on stable/7, but looking at the commit log for CURRENT >> I don't see any reason the bug doesn't exist there. =A0This analysis is >> specific to amd64/i386 but the problem is likely to exist on most >> arches. >> >> The basic problem is that in both tdq_move() and sched_setcpu() can >> manipulate the ts->ts_cpu variable of another process, which may be >> running. =A0This means that a process running on CPU N may be set to run >> on CPU M the next context switch. >> >> Then, that process may call sched_pin(), then grab a PCPU variable. >> An IPI_PREEMPT can come in, causing the thread to call mi_switch() in >> ipi_bitmap_handler(). =A0sched_switch() will then notice that ts->ts_cpu >> is not PCPU_GET(cpuid), and call sched_switch_migrate(), migrating the >> thread to the intended CPU. =A0Thus after sched_pin() and potentially >> before using any PCPU variable the process can get migrated to another >> CPU, and now it is using a PCPU variable for the wrong processor. >> >> Given that ts_cpu can be set by other threads, it doesn't seem worth >> checking at sched_pin time whether ts_cpu is not the same as td_oncpu, >> because to make the check would require taking the thread_lock. >> Instead, I propose adding a check for !THREAD_CAN_MIGRATE() before >> calling sched_switch_migrate(). =A0However, sched_pin() is also used by >> sched_bind(9) to force the thread to stay on the intended cpu. =A0I >> would handle this by adding a TSF_BINDING state that is different from >> TSF_BOUND. >> >> The thing that has me wondering whether this is all correct is that I >> don't see any checks in tdq_move() or sched_setcpu() for either >> TSF_BOUND or THREAD_CAN_MIGRATE(). =A0Perhaps that logic is managed in >> the various calling functions; in that case I would propose adding >> asserts that the thread is THREAD_CAN_MIGRATE() unless it's marked >> TSF_BINDING. >> >> Does this analysis seem correct? > > Calling code does check THREAD_CAN_MIGRATE(), but the problem is that it = is > not safe to call that on anything but curthread as td_pinned is not locke= d. > It is assumed that only curthread would ever check td_pinned, not other > threads. =A0I suspect what is happening is that another CPU is seeing a s= tale > value of td_pinned. =A0You could fix this by grabbing the thread lock in > sched_pin()/unpin() for ULE, but that is a bit expensive perhaps. I tried making sched_pin() a real function which used intr_disable/intr_restore around saving off td->td_oncpu, td->td_lastcpu and ts->ts_cpu, and the stack at the time of call. In sched_switch when I saw an unexpected migration I printed all that out. I found that on my boxes, at sched_pin() time ts_cpu was already different from td->td_oncpu, so the specific problem I was having was that while another thread can change ts_cpu it has no way to force that thread to immediately migrate. I believe the below patch fixes the issue, though I'm open to other methods= : Index: kern/sched_ule.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- kern/sched_ule.c (.../head/src/sys) (revision 158279) +++ kern/sched_ule.c (.../branches/BR_BUG_67957/src/sys) (revision 158279) @@ -112,6 +112,7 @@ /* flags kept in ts_flags */ #define TSF_BOUND 0x0001 /* Thread can not migrate. */ #define TSF_XFERABLE 0x0002 /* Thread was added as transferable. */ +#define TSF_BINDING 0x0004 /* Thread is being bound. */ static struct td_sched td_sched0; @@ -1859,6 +1858,7 @@ struct mtx *mtx; int srqflag; int cpuid; + int do_switch; THREAD_LOCK_ASSERT(td, MA_OWNED); @@ -1888,10 +1888,21 @@ srqflag =3D (flags & SW_PREEMPT) ? SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED : SRQ_OURSELF|SRQ_YIELDING; - if (ts->ts_cpu =3D=3D cpuid) + /* + * Allow the switch to another processor as requested + * only if the thread can migrate or we are in the + * middle of binding for sched_bind(9). This keeps + * sched_pin() quick, since other threads can + * manipulate ts_cpu. + */ + do_switch =3D (ts->ts_cpu !=3D cpuid); + if (!THREAD_CAN_MIGRATE(td) && + (ts->ts_flags & TSF_BINDING) =3D=3D 0) + do_switch =3D 0; + if (do_switch) + mtx =3D sched_switch_migrate(tdq, td, srqflag); + else tdq_add(tdq, td, srqflag); - else - mtx =3D sched_switch_migrate(tdq, td, srqflag); } else { /* This thread must be going to sleep. */ TDQ_LOCK(tdq); @@ -1938,12 +1949,25 @@ */ cpuid =3D PCPU_GET(cpuid); tdq =3D TDQ_CPU(cpuid); + KASSERT(THREAD_CAN_MIGRATE(td) || + (ts->ts_flags & TSF_BINDING) !=3D 0 || + cpuid =3D=3D td->td_lastcpu, + ("Non-migratable thread %p migrated from %d to %d", + td, td->td_lastcpu, cpuid)); #ifdef HWPMC_HOOKS if (PMC_PROC_IS_USING_PMCS(td->td_proc)) PMC_SWITCH_CONTEXT(td, PMC_FN_CSW_IN); #endif } else thread_unblock_switch(td, mtx); + + if ((ts->ts_flags & TSF_BINDING) !=3D 0) { + ts->ts_flags &=3D ~TSF_BINDING; + ts->ts_flags |=3D TSF_BOUND; + } + KASSERT((ts->ts_flags & TSF_BOUND) =3D=3D 0 || ts->ts_cpu =3D=3D cpuid, + ("Bound thread %p on %d not %d", td, cpuid, ts->ts_cpu)); + /* * Assert that all went well and return. */ @@ -2607,15 +2631,21 @@ ts =3D td->td_sched; if (ts->ts_flags & TSF_BOUND) sched_unbind(td); - ts->ts_flags |=3D TSF_BOUND; + KASSERT(THREAD_CAN_MIGRATE(td), + ("%s called on non-migratable thread %p", __func__, td)); #ifdef SMP + ts->ts_flags |=3D TSF_BINDING; sched_pin(); if (PCPU_GET(cpuid) =3D=3D cpu) return; ts->ts_cpu =3D cpu; /* When we return from mi_switch we'll be on the correct cpu. */ mi_switch(SW_VOL, NULL); +#else + ts->ts_flags |=3D TSF_BOUND; #endif + KASSERT((ts->ts_flags & TSF_BOUND) !=3D 0 && cpu =3D=3D PCPU_GET(cpuid), + ("Should be bound to %d", cpu)); } /* @@ -2640,7 +2670,7 @@ sched_is_bound(struct thread *td) { THREAD_LOCK_ASSERT(td, MA_OWNED); - return (td->td_sched->ts_flags & TSF_BOUND); + return ((td->td_sched->ts_flags & (TSF_BOUND | TSF_BINDING)) !=3D 0); } /*