Date:      Wed, 14 Dec 2011 02:36:29 +0200
From:      Ivan Klymenko <fidaj@ukr.net>
To:        mdf@FreeBSD.org
Cc:        Doug Barton <dougb@freebsd.org>, freebsd-stable@freebsd.org, Jilles Tjoelker <jilles@stack.nl>, "O. Hartmann" <ohartman@mail.zedat.fu-berlin.de>, Current FreeBSD <freebsd-current@freebsd.org>, freebsd-performance@freebsd.org
Subject:   Re: SCHED_ULE should not be the default
Message-ID:  <20111214023629.3ae8c928@nonamehost.>
In-Reply-To: <CAMBSHm89SkzGVgk9kNwBQoR62pXKjhJ+qXJK0qwC20r9p+u-bw@mail.gmail.com>
References:  <4EE1EAFE.3070408@m5p.com> <4EE22421.9060707@gmail.com> <4EE6060D.5060201@mail.zedat.fu-berlin.de> <4EE69C5A.3090005@FreeBSD.org> <20111213104048.40f3e3de@nonamehost.> <20111213230441.GB42285@stack.nl> <4ee7e2d3.0a3c640a.4617.4a33SMTPIN_ADDED@mx.google.com> <CAMBSHm89SkzGVgk9kNwBQoR62pXKjhJ+qXJK0qwC20r9p+u-bw@mail.gmail.com>

On Tue, 13 Dec 2011 16:01:56 -0800,
mdf@FreeBSD.org wrote:

> On Tue, Dec 13, 2011 at 3:39 PM, Ivan Klymenko <fidaj@ukr.net> wrote:
> > On Wed, 14 Dec 2011 00:04:42 +0100,
> > Jilles Tjoelker <jilles@stack.nl> wrote:
> >
> >> On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote:
> >> > If the algorithm ULE does not contain problems - it means the
> >> > problem has Core2Duo, or in a piece of code that uses the ULE
> >> > scheduler. I already wrote in a mailing list that specifically in
> >> > my case (Core2Duo) partially helps the following patch:
> >> > --- sched_ule.c.orig	2011-11-24 18:11:48.000000000 +0200
> >> > +++ sched_ule.c	2011-12-10 22:47:08.000000000 +0200
> >> > @@ -794,7 +794,8 @@
> >> >  	 * 1.5 * balance_interval.
> >> >  	 */
> >> >  	balance_ticks = max(balance_interval / 2, 1);
> >> > -	balance_ticks += random() % balance_interval;
> >> > +//	balance_ticks += random() % balance_interval;
> >> > +	balance_ticks += ((int)random()) % balance_interval;
> >> >  	if (smp_started == 0 || rebalance == 0)
> >> >  		return;
> >> >  	tdq = TDQ_SELF();
> >>
> >> This avoids a 64-bit division on 64-bit platforms but seems to
> >> have no effect otherwise. Because this function is not called very
> >> often, the change seems unlikely to help.
> >
> > Yes, this section does not apply to this problem :)
> > I just posted the latest patch, which I am using now...
> >
> >>
> >> > @@ -2118,13 +2119,21 @@
> >> >  	struct td_sched *ts;
> >> >
> >> >  	THREAD_LOCK_ASSERT(td, MA_OWNED);
> >> > +	if (td->td_pri_class & PRI_FIFO_BIT)
> >> > +		return;
> >> > +	ts = td->td_sched;
> >> > +	/*
> >> > +	 * We used up one time slice.
> >> > +	 */
> >> > +	if (--ts->ts_slice > 0)
> >> > +		return;
> >>
> >> This skips most of the periodic functionality (long term load
> >> balancer, saving switch count (?), insert index (?), interactivity
> >> score update for long running thread) if the thread is not going to
> >> be rescheduled right now.
> >>
> >> It looks wrong but it is a data point if it helps your workload.
> >
> > Yes, I did it to postpone for as long as possible the execution of
> > the code in this section:
> > ...
> > #ifdef SMP
> > 	/*
> > 	 * We run the long term load balancer infrequently on the
> > 	 * first cpu.
> > 	 */
> > 	if (balance_tdq == tdq) {
> > 		if (balance_ticks && --balance_ticks == 0)
> > 			sched_balance();
> > 	}
> > #endif
> > ...
> >
> >>
> >> >  	tdq = TDQ_SELF();
> >> >  #ifdef SMP
> >> >  	/*
> >> >  	 * We run the long term load balancer infrequently on the
> >> >  	 * first cpu.
> >> >  	 */
> >> > -	if (balance_tdq == tdq) {
> >> > -		if (balance_ticks && --balance_ticks == 0)
> >> > +	if (balance_ticks && --balance_ticks == 0) {
> >> > +		if (balance_tdq == tdq)
> >> >  			sched_balance();
> >> >  	}
> >> >  #endif
> >>
> >> The main effect of this appears to be to disable the long term load
> >> balancer completely after some time. At some point, a CPU other
> >> than the first CPU (which uses balance_tdq) will set balance_ticks
> >> = 0, and sched_balance() will never be called again.
> >>
> >
> > That is, I did it for the same reason as above in the text...
> >
> >> It also introduces a hypothetical race condition because the
> >> access to balance_ticks is no longer restricted to one CPU under a
> >> spinlock.
> >>
> >> If the long term load balancer may be causing trouble, try setting
> >> kern.sched.balance_interval to a higher value with unpatched code.
> >
> > I checked that in the first place - but it did not help fix the
> > situation...
> >
> > I have the impression that rebalancing malfunctions...
> > It seems that a thread is handed back to the same core that is
> > already loaded, and so on... Perhaps this is a consequence of an
> > incorrect detection of the CPU topology?
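
If topology misdetection is the suspicion, both the balancer interval and the topology that ULE actually detected can be inspected from userland; the interval value below is only an example, and these sysctls assume a FreeBSD kernel running SCHED_ULE:

```sh
# Read and raise the long term balancer interval (in ticks):
sysctl kern.sched.balance_interval
sysctl kern.sched.balance_interval=1024

# Dump the CPU topology as ULE detected it, to rule out misdetection:
sysctl kern.sched.topology_spec
```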
> >
> >>
> >> > @@ -2144,9 +2153,6 @@
> >> >  		if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx]))
> >> >  			tdq->tdq_ridx = tdq->tdq_idx;
> >> >  	}
> >> > -	ts = td->td_sched;
> >> > -	if (td->td_pri_class & PRI_FIFO_BIT)
> >> > -		return;
> >> >  	if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) {
> >> >  		/*
> >> >  		 * We used a tick; charge it to the thread so
> >> > @@ -2157,11 +2163,6 @@
> >> >  		sched_priority(td);
> >> >  	}
> >> >  	/*
> >> > -	 * We used up one time slice.
> >> > -	 */
> >> > -	if (--ts->ts_slice > 0)
> >> > -		return;
> >> > -	/*
> >> >  	 * We're out of time, force a requeue at userret().
> >> >  	 */
> >> >  	ts->ts_slice = sched_slice;
> >>
> >> > and I also stopped using options FULL_PREEMPTION
> >> > But no one has replied to my letter saying whether my patch helps
> >> > or not in the case of Core2Duo...
> >> > There is a suspicion that the problems stem from the sections of
> >> > code associated with SMP...
> >> > Maybe I am wrong about something, but I want to help in solving
> >> > this problem ...
>
>
> Has anyone experiencing problems tried to set sysctl
> kern.sched.steal_thresh=1 ?
>

In my case, the variable kern.sched.steal_thresh already has the value 1.
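
For anyone who wants to compare, the value can be read and changed at runtime, no rebuild or reboot needed:

```sh
sysctl kern.sched.steal_thresh		# show the current threshold
sysctl kern.sched.steal_thresh=1	# set it to 1, as suggested
```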

> I don't remember what our specific problem at $WORK was, perhaps it
> was just interrupt threads not getting serviced fast enough, but we've
> hard-coded this to 1 and removed the code that sets it in
> sched_initticks().  The same effect should be had by setting the
> sysctl after a box is up.
>
> Thanks,
> matthew


