Date:      Sat, 26 Aug 2017 10:50:16 -0700 (PDT)
From:      Don Lewis <truckman@FreeBSD.org>
To:        avg@FreeBSD.org
Cc:        freebsd-arch@FreeBSD.org
Subject:   Re: ULE steal_idle questions
Message-ID:  <201708261750.v7QHoG2c053745@gw.catspoiler.org>
In-Reply-To: <201708251824.v7PIOA6q048321@gw.catspoiler.org>

On 25 Aug, To: avg@FreeBSD.org wrote:
> On 24 Aug, To: avg@FreeBSD.org wrote:
>> Aside from the Ryzen problem, I think the steal_idle code should be
>> re-written so that it doesn't block interrupts for so long.  In its
>> current state, interrupt latency increases with the number of cores and
>> the complexity of the topology.
>> 
>> What I'm thinking is that we should set a flag at the start of the
>> search for a thread to steal.  If we are preempted by another, higher
>> priority thread, that thread will clear the flag.  Next we start the
>> loop to search up the hierarchy.  Once we find a candidate CPU:
>> 
>> 		steal = TDQ_CPU(cpu);
>> 		CPU_CLR(cpu, &mask);
>> 		tdq_lock_pair(tdq, steal);
>> 		if (tdq->tdq_load != 0) {
>> 			/* A thread appeared on our own queue; go run it. */
>> 			goto out;
>> 		}
>> 		if (flag was cleared) {
>> 			/* We were preempted, so the load data gathered
>> 			   before the preemption may be stale; restart. */
>> 			tdq_unlock_pair(tdq, steal);
>> 			goto restart;
>> 		}
>> 		if (steal->tdq_load < thresh || steal->tdq_transferable == 0 ||
>> 		    tdq_move(steal, tdq) == 0) {
>> 			/* Nothing stealable after all; keep searching. */
>> 			tdq_unlock_pair(tdq, steal);
>> 			continue;
>> 		}
>> 	out:
>> 		TDQ_UNLOCK(steal);
>> 		clear flag;
>> 		mi_switch(SW_VOL | SWT_IDLE, NULL);
>> 		thread_unlock(curthread);
>> 		return (0);
>> 
>> And we also have to clear the flag if we did not find a thread to steal.
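>> 
>> Putting that together, the surrounding search might be shaped roughly
>> like this (a sketch only: tdq_stealing is a made-up flag field, and
>> cg, mask, and steal_thresh are from the existing tdq_idled() code):
>> 
>> 	restart:
>> 		tdq->tdq_stealing = 1;	/* cleared by whatever preempts us */
>> 		for (cg = tdq->tdq_cg; cg != NULL; cg = cg->cg_parent) {
>> 			cpu = sched_highest(cg, mask, steal_thresh);
>> 			if (cpu == -1)
>> 				continue;
>> 			... candidate handling from above ...
>> 		}
>> 		tdq->tdq_stealing = 0;	/* nothing found; don't leave it set */
>> 		return (1);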
> 
> I've implemented something like this and added a bunch of counters to it
> to get a better understanding of its behavior.  Instead of adding a flag
> to detect preemption, I used the same switchcnt test as is used by
> sched_idletd(); a sketch of that test follows the counters below.
> These are the results of a ~9 hour poudriere run:
> 
> kern.sched.steal.none: 9971668   # no threads were stolen
> kern.sched.steal.fail: 23709     # unable to steal from cpu=sched_highest()
> kern.sched.steal.level2: 191839  # somewhere on this chip
> kern.sched.steal.level1: 557659  # a core on this CCX
> kern.sched.steal.level0: 4555426 # the other SMT thread on this core
> kern.sched.steal.restart: 404    # preemption detected so restart the search
> kern.sched.steal.call: 15276638  # of times tdq_idled() called
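> 
> For reference, the preemption check is the same switchcnt snapshot that
> sched_idletd() already uses; a minimal sketch, using the existing
> tdq_switchcnt/tdq_oldswitchcnt fields (the restart label is from the
> pseudocode earlier in the thread):
> 
> 	switchcnt = tdq->tdq_switchcnt + tdq->tdq_oldswitchcnt;
> 	... search the topology for a CPU to steal from ...
> 	if (switchcnt != tdq->tdq_switchcnt + tdq->tdq_oldswitchcnt)
> 		goto restart;	/* we were preempted; the data is stale */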
> 
> There are a few surprises in these numbers.
> 
> One is the number of failed moves.  I don't know if the load on the
> source CPU fell below thresh, tdq_transferable went to zero, or if
> tdq_move() failed.  I also wonder if the failures are evenly distributed
> across CPUs.  It is possible that these failures are concentrated on CPU
> 0, which handles most interrupts.  If interrupts don't affect switchcnt,
> then the data collected by sched_highest() could be a bit stale and we
> would not know it.

Most of the above failed moves were due to either tdq_load dropping
below the threshold or tdq_transferable going to zero.  These are evenly
distributed across the CPUs we want to steal from.  I did not bin the
results by which CPU this code was running on.  Actual failures of
tdq_move() are bursty and not evenly distributed across CPUs.

I've created this review for my changes:
https://reviews.freebsd.org/D12130
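
For what it's worth, the kern.sched.steal.* numbers above are plain
event counters.  A minimal sketch of how counters like these could be
exported through sysctl(9); the names and the unsigned long type are
assumptions, not necessarily what the patch in the review does:

	static unsigned long steal_none;

	SYSCTL_NODE(_kern_sched, OID_AUTO, steal, CTLFLAG_RD, 0,
	    "tdq_idled() steal statistics");
	SYSCTL_ULONG(_kern_sched_steal, OID_AUTO, none, CTLFLAG_RD,
	    &steal_none, 0, "searches that found nothing to steal");

with a matching steal_none++ at the appropriate spot in tdq_idled().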


