Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 28 Jan 2010 11:16:55 +0100
From:      Attilio Rao <attilio@freebsd.org>
To:        Marius Strobl <marius@alchemy.franken.de>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, Rob Farmer <rfarmer@predatorlabs.net>, src-committers@freebsd.org
Subject:   Re: svn commit: r202889 - head/sys/kern
Message-ID:  <3bbf2fe11001280216p705ed94ev61abc4be654f8cc1@mail.gmail.com>
In-Reply-To: <20100127215904.GF40779@alchemy.franken.de>
References:  <201001231554.o0NFsMbx049837@svn.freebsd.org> <b025ceb71001252225r56d4b0c8qe4c6affe338e6f9f@mail.gmail.com> <3bbf2fe11001252310r408a6be4j9bc42618394b3e3d@mail.gmail.com> <20100127215904.GF40779@alchemy.franken.de>

next in thread | previous in thread | raw e-mail | index | archive | help
2010/1/27 Marius Strobl <marius@alchemy.franken.de>:
> On Tue, Jan 26, 2010 at 08:10:25AM +0100, Attilio Rao wrote:
>> 2010/1/26 Rob Farmer <rfarmer@predatorlabs.net>:
>> > On Sat, Jan 23, 2010 at 7:54 AM, Attilio Rao <attilio@freebsd.org> wro=
te:
>> >> Author: attilio
>> >> Date: Sat Jan 23 15:54:21 2010
>> >> New Revision: 202889
>> >> URL: http://svn.freebsd.org/changeset/base/202889
>> >>
>> >> Log:
>> >> =C2=A0- Fix a race in sched_switch() of sched_4bsd.
>> >> =C2=A0 =C2=A0In the case of the thread being on a sleepqueue or a tur=
nstile, the
>> >> =C2=A0 =C2=A0sched_lock was acquired (without the aid of the td_lock =
interface) and
>> >> =C2=A0 =C2=A0the td_lock was dropped. This was going to break locking=
 rules on other
>> >> =C2=A0 =C2=A0threads willing to access to the thread (via the td_lock=
 interface) and
>> >> =C2=A0 =C2=A0modify his flags (allowed as long as the container lock =
was different
>> >> =C2=A0 =C2=A0by the one used in sched_switch).
>> >> =C2=A0 =C2=A0In order to prevent this situation, while sched_lock is =
acquired there
>> >> =C2=A0 =C2=A0the td_lock gets blocked. [0]
>> >> =C2=A0- Merge the ULE's internal function thread_block_switch() into =
the global
>> >> =C2=A0 =C2=A0thread_lock_block() and make the former semantic as the =
default for
>> >> =C2=A0 =C2=A0thread_lock_block(). This means that thread_lock_block()=
 will not
>> >> =C2=A0 =C2=A0disable interrupts when called (and consequently thread_=
unlock_block()
>> >> =C2=A0 =C2=A0will not re-enabled them when called). This should be do=
ne manually
>> >> =C2=A0 =C2=A0when necessary.
>> >> =C2=A0 =C2=A0Note, however, that ULE's thread_unblock_switch() is not=
 reaped
>> >> =C2=A0 =C2=A0because it does reflect a difference in semantic due in =
ULE (the
>> >> =C2=A0 =C2=A0td_lock may not be necessarilly still blocked_lock when =
calling this).
>> >> =C2=A0 =C2=A0While asymmetric, it does describe a remarkable differen=
ce in semantic
>> >> =C2=A0 =C2=A0that is good to keep in mind.
>> >>
>> >> =C2=A0[0] Reported by: =C2=A0 =C2=A0 =C2=A0Kohji Okuno
>> >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0<okuno dot kohji at jp dot panasonic dot com>
>> >> =C2=A0Tested by: =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Giovanni Tr=
ematerra
>> >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0<giovanni dot trematerra at gmail dot com>
>> >> =C2=A0MFC: =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A02 weeks
>> >>
>> >> Modified:
>> >> =C2=A0head/sys/kern/kern_mutex.c
>> >> =C2=A0head/sys/kern/sched_4bsd.c
>> >> =C2=A0head/sys/kern/sched_ule.c
>> >
>> > Hi,
>> >
>> > This commit seems to be causing me a kernel panic on sparc64 - details
>> > are in PR 143215. Could you take a look before MFCing this?
>>
>> I think that the bug may be in cpu_switch() where the mutex parameter
>> for sched_4bsd is not handled correctly.
>> Does sparc64 support ULE? I don't think it does and I think that it
>> simply ignores the third argument of cpu_switch() which is vital now
>> for for sched_4bsd too (what needs to happen is to take the passed
>> mutex and to set the TD_LOCK of old thread to be the third argument).
>> Unluckilly, I can't do that in sparc64 asm right now, but it should
>> not be too difficult to cope with it.
>>
>
> The following patch adds handling of the mutex parameter to the
> sparc64 cpu_switch():
> http://people.freebsd.org/~marius/sparc64_cpu_switch_mtx.diff
> This patch works fine with r202888. With r202889 it allows the
> machine to boot again, however putting some load on the machine
> causes it to issue a reset without a chance to debug. I've also
> tried with some variations like duplicating the old cpu_switch()
> for cpu_throw() so the altered cpu_switch() doesn't need to
> distinguish between the to cases and only assigning old->td_lock
> right before return but nothing made a difference. Given that
> this leaves little room for a bug in the cpu_switch() changes I
> suspect r202889 also breaks additional assumptions. For example
> the sparc64 pmap code used sched_lock, does that need to change
> to be td_lock now maybe? Is there anything else that comes to
> your mind in this regard?

Sorry for being lame with sparc64 assembly (so that I can't make much
more productive help here), but the required patch, sched_4bsd only,
should simply save the extra-argument of cpu_switch() (and cpu_throw()
is not involved, so I'm not sure what is changing there) and move in
TD_LOCK(%oldthreadreg) when it is safe to do (just after the oldthread
switched out completely). It doesn't even require a memory barrier.
This patch seems a bit too big and I wonder what else it does (or I'm
entirely wrong and that's just what I asked here), maybe adding the
ULE support as well?

Said that, all the code, including MD parts should always use
td_lock() and not doing explicit acquisitions/drops of sched_lock, if
they want to support ULE (but probabilly even if they do not want),
unless there is a big compelling reason (that I expect to be justified
in comments at least).
I'm not sure how to debug a possible reset or I'm not aware of further
broken asserts, at least for tier-1 architectures.

Thanks,
Attilio


--=20
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3bbf2fe11001280216p705ed94ev61abc4be654f8cc1>