Date:      Sat, 21 Apr 2018 23:30:55 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, "George Mitchell" <george+freebsd@m5p.com>, Peter <pmc@citylink.dinoex.sub.org>
Subject:   Re: SCHED_ULE makes 256Mbyte i386 unusable
Message-ID:  <YQBPR0101MB10421529BB346952BCE7F20EDD8B0@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <20180421201128.GO6887@kib.kiev.ua>
References:  <YQBPR0101MB1042F252A539E8D55EB44585DD8B0@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>, <20180421201128.GO6887@kib.kiev.ua>

Konstantin Belousov wrote:
>On Sat, Apr 21, 2018 at 07:21:58PM +0000, Rick Macklem wrote:
>> I decided to start a new thread on current related to SCHED_ULE, since I see
>> more than just performance degradation, and on a recent current kernel.
>> (I cc'd a couple of the people discussing performance problems in freebsd-stable
>>  recently under a subject line of "Re: kern.sched.quantum: Creepy, sadistic scheduler".)
>>
>> When testing a pNFS server on a single core i386 with 256Mbytes using a Dec. 2017
>> current/head kernel, I would see about a 30% performance degradation (elapsed
>> run time for a kernel build over NFSv4.1) when the server kernel was built with
>> options SCHED_ULE
>> instead of
>> options SCHED_4BSD
>>
>> Now, with a kernel from a couple of days ago, the
>> options SCHED_ULE
>> kernel becomes unusable shortly after starting testing.
>> I have seen two variants of this:
>> - Became essentially hung. All I could do was ping the machine from the network.
>> - Reported "vm_thread_new: kstack allocation failed",
>>   and then any attempt to do anything gets "No more processes".
>This is strange.  It usually means that you get KVA either exhausted or
>severely fragmented.
Yes. I reduced the number of nfsd threads from 256->32 and the SCHED_ULE
kernel is working OK now. I haven't done enough testing to compare performance yet.
Maybe I'll post again when I have some numbers.
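
(For anyone who wants to try the same tweak: the nfsd thread count is set
with nfsd's -n flag. A minimal sketch of how I have it set via /etc/rc.conf;
the 32 is just the value I happened to pick, not a recommendation:

  nfs_server_enable="YES"
  nfs_server_flags="-u -t -n 32"

A change to nfs_server_flags takes effect after "service nfsd restart".)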

>Enter ddb, it should be operational since pings are being answered.  Try to see
>where the threads are stuck.
I didn't do this, since reducing the number of kernel threads seems to have fixed
the problem. For the pNFS server, the nfsd threads will spawn additional kernel
threads to do proxies to the mirrored DS servers.
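
(If it happens again I'll try that. For anyone else who hits this, the usual
ddb(4) starting points are something like:

  db> ps              <- thread states and wait channels
  db> show allchains  <- lock chains for threads blocked on locks
  db> trace <pid>     <- stack backtrace for one process

The exact command names and output vary a bit between versions, so treat the
above as a sketch.)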

>> with the only difference being a kernel built with
>> options SCHED_4BSD
>> everything works and performs the same as the Dec 2017 kernel.
>>
>> I can try rolling back through the revisions, but it would be nice if someone
>> could suggest where to start, because it takes a couple of hours to build a
>> kernel on this system.
>>
>> So, something has made things worse for a head/current kernel this winter, rick
>
>There are at least two potentially relevant changes.
>
>First is r326758 Dec 11 which bumped KSTACK_PAGES on i386 to 4.
I've been running this machine with KSTACK_PAGES=4 for some time, so no change.

>Second is r332489 Apr 13, which introduced 4/4G KVA/UVA split.
Could this change have resulted in the system being able to allocate fewer
kernel threads/stacks for some reason?

>Consequences of the first one are obvious, it is much harder to find
>the place to map the stack.  Second change, on the other hand, provides
>almost full 4G for KVA and should have mostly compensated for the negative
>effects of the first.
>
>And, I cannot see how changing the scheduler would fix or even affect that
>behaviour.
My hunch is that the system was running near its limit for kernel threads/stacks.
Then, somehow, the timing under SCHED_ULE resulted in the nfsd trying to reach
a higher peak number of threads and hitting the limit.
SCHED_4BSD happened to produce timing such that it stayed just below the
limit and worked.
I can think of a couple of things that might affect this (a way to watch for
it is sketched after this list):
1 - If SCHED_ULE doesn't do the termination of kernel threads as quickly, then
      they wouldn't terminate and release their resources before more new ones
      are spawned.
2 - If SCHED_ULE handles the nfsd threads in a more "bursty" way, then the burst
      could try to spawn more mirror DS worker threads at about the same time.

Anyhow, thanks for the help, rick


