Date:      Mon, 21 Aug 2017 20:25:07 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 221029] AMD Ryzen: strange compilation failures using poudriere or plain buildkernel/buildworld
Message-ID:  <bug-221029-8-facnJTVfYe@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-221029-8@https.bugs.freebsd.org/bugzilla/>
References:  <bug-221029-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029

--- Comment #87 from Don Lewis <truckman@FreeBSD.org> ---
I set affinity back to its default value of 1 and got another clean 1700 port
poudriere run.  It's curious that the only issues I've had when steal_idle=0
and balance=0 happened when I set affinity=1000.  This is the opposite of what
I would expect.

I would expect migrations controlled by the steal_idle and balance knobs to
have similar issues.  In either case, the thread that gets migrated is one
that was preempted by an interrupt.  Before it could be resumed, the scheduler
noticed that the thread had exhausted its run time quantum, moved it to the
back of the run queue for that CPU, and resumed the thread at the front of the
run queue instead.  The only difference between steal_idle and balance is the
event that actually causes the thread to migrate.  When these threads restart,
they basically just execute the kernel code to restore their state before
dropping back into user mode where they were preempted.  For some reason,
threads that have exhausted their time quantum seem to resume properly on the
same CPU that they were previously running on, but sometimes go wonky if they
resume on some other CPU.

The migrations controlled by the affinity knob are different.  In those cases,
the thread has voluntarily put itself to sleep, either because it blocked in a
syscall, or perhaps because it trapped on a page fault and then went to sleep
in the kernel while the missing page was brought in.  When these threads get a
wakeup event, they execute the remaining part of the syscall or the page fault
handler before returning to user mode.  It doesn't seem to matter what CPU
these threads restart on.

As a test, I set balance=1 and reduced balance_interval from its default of
127 to 10 so that balance events would happen a lot more frequently, to try to
make up for steal_idle being disabled.  I had three port build failures.  The
first was a guile segfault when building finance/gnucash.  The second was a
unit test failure in editors/openoffice-devel.  The third was a build runaway
in devel/doxygen.

The steal_idle code in sched_ule is topology-aware, so it looks like it should
be easy to hack the code to only allow migrations between SMT threads sharing
the same core, or cores in the same CCX.
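
To make the idea concrete, here is a rough standalone sketch of the kind of
restriction I have in mind.  It is not sched_ule code.  The CPU numbering
(SMT siblings adjacent, four cores per CCX, 16 logical CPUs), the enum, and
the function names are all assumptions made up for illustration; a real change
would consult the scheduler's own topology information instead of dividing
CPU ids.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical layout for an 8-core/16-thread part: logical CPUs 2n and
 * 2n+1 share a core, and each group of four cores forms one CCX.  This is
 * an assumption for illustration, not something read from the kernel.
 */
enum {
    THREADS_PER_CORE = 2,
    CORES_PER_CCX = 4,
    NCPU = 16
};

/* How far a stolen thread is allowed to travel (hypothetical policy knob). */
enum steal_scope {
    STEAL_ANY,        /* current behaviour: any CPU may steal */
    STEAL_SAME_CCX,   /* only CPUs in the same CCX */
    STEAL_SAME_CORE   /* only the SMT sibling on the same core */
};

/* Map a logical CPU id to its core index under the assumed numbering. */
static int
cpu_core(int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    return (cpu / THREADS_PER_CORE);
}

/* Map a logical CPU id to its CCX index under the assumed numbering. */
static int
cpu_ccx(int cpu)
{
    return (cpu_core(cpu) / CORES_PER_CCX);
}

/* Return true if a thread may migrate from CPU 'from' to CPU 'to'. */
static bool
steal_allowed(int from, int to, enum steal_scope scope)
{
    switch (scope) {
    case STEAL_SAME_CORE:
        return (cpu_core(from) == cpu_core(to));
    case STEAL_SAME_CCX:
        return (cpu_ccx(from) == cpu_ccx(to));
    case STEAL_ANY:
    default:
        return (true);
    }
}
```

With this layout, steal_allowed(0, 1, STEAL_SAME_CORE) is true, while
steal_allowed(7, 8, STEAL_SAME_CCX) is false because CPU 7 sits in the first
CCX and CPU 8 in the second.  Running the same poudriere workload under each
scope would show whether the failures depend on crossing a core or a CCX
boundary.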

-- 
You are receiving this mail because:
You are the assignee for the bug.


