Date:      Tue, 24 Oct 2000 14:31:54 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        seth@pengar.com (Seth Leigh)
Cc:        tlambert@primenet.com (Terry Lambert), jasone@canonware.com (Jason Evans), smp@FreeBSD.ORG
Subject:   Re: SA project (was Re: SMP project status)
Message-ID:  <200010241431.HAA28242@usr05.primenet.com>
In-Reply-To: <3.0.6.32.20001024094101.00c4d798@hobbiton.shire.net> from "Seth Leigh" at Oct 24, 2000 09:41:01 AM

> >A scheduler activation, on the other hand, can reactivate in the
> >specific process (picking another thread from the thread group),
> >so long as there is quantum remaining, and thus actually deliver
> >on the promise of reduced context switch overhead.  A side benefit
> >is that cache coherency is also maintained, and there is a reduced
> >interprocessor arbitration overhead.
> 
> This may seem petty, but if we always use the whole quantum, won't this
> have the effect of driving down the priority of any multi-threaded
> application with respect to single-threaded apps?

No.  If you want to build an application that competes unfairly
with other processes for system resources, the correct approach
is to use multiple processes in order to get multiple quanta, OR
you can define a new scheduler class that implements fair share
scheduling or some other scheduling algorithm that gives your
program an unfair advantage in being selected for quanta.
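
To make the multiple-process route concrete, here is a minimal
sketch (purely illustrative, and the choice of four workers is
arbitrary): each forked child is a separate process as far as the
scheduler is concerned, so each one is handed its own quantum.

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void
    do_work(int id)
    {
            /* stands in for the CPU-bound work one thread would do */
            printf("worker %d running as pid %d\n", id, (int)getpid());
            exit(0);
    }

    int
    main(void)
    {
            int i, nworkers = 4;            /* arbitrary, for illustration */

            for (i = 0; i < nworkers; i++) {
                    pid_t pid = fork();

                    if (pid == -1) {
                            perror("fork");
                            exit(1);
                    }
                    if (pid == 0)
                            do_work(i);     /* child: gets its own quantum */
            }
            while (wait(NULL) > 0)
                    ;                       /* parent reaps the workers */
            return (0);
    }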

Realize that the benefit of not paying the context switch overhead
is a reduction in overall system overhead.

Realize also that you have a hidden assumption here, which is not
necessarily true: that you will always have threads that are ready
to run, and are not all blocked pending I/O or other kernel
operations.  Unless you run one thread in a spin loop, this will
most likely never really be the case.

Consider how you would fix the context and cache thrashing problem
on a Linux or an SVR4-derived system: you could preferentially
choose a thread in your thread group, when making your scheduling
decision.  But this leads to starvation of other processes, should
you make a coding error and go into a spin loop (or simply have a
lot of work to do in user space which is CPU bound, such as rendering
images).
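
Purely for illustration (this is not code from Linux, SVR4, or any
real kernel, and the structure fields are made up), a pick-next
routine with that preference might look like the sketch below; the
starvation hazard is easy to see:

    #include <stddef.h>

    struct proc;

    struct thread {
            struct proc     *td_proc;       /* owning process */
            struct thread   *td_next;       /* next runnable thread */
    };

    /*
     * Prefer any runnable thread belonging to the process that just
     * ran (warm cache, no cross-process switch); otherwise fall back
     * to plain FIFO.  As long as the current process keeps a thread
     * runnable, nothing else on the queue ever gets picked.
     */
    struct thread *
    pick_next(struct thread *runq, struct proc *curproc)
    {
            struct thread *td;

            for (td = runq; td != NULL; td = td->td_next)
                    if (td->td_proc == curproc)
                            return (td);    /* sibling wins every time */

            return (runq);                  /* head of the queue */
    }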

Alternatively, you might implement round-robin scheduling or some other
scheduling policy, and group threads in a single process next to each
other in the runnable queue.  But if you have even a moderate number
of threads, then you will damage interactive response, perhaps to a
considerable degree.
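
A sketch of that grouping policy, reusing the toy struct thread
from the sketch above (again, invented for illustration): the
enqueue places a newly runnable thread right behind the last
thread from the same process, so the group runs back to back, and
every other process waits that many more quanta before it runs.

    static void
    enqueue_grouped(struct thread **runq, struct thread *newtd)
    {
            struct thread **tdp;
            struct thread **after = NULL;   /* after the last sibling */

            for (tdp = runq; *tdp != NULL; tdp = &(*tdp)->td_next)
                    if ((*tdp)->td_proc == newtd->td_proc)
                            after = &(*tdp)->td_next;

            if (after == NULL)
                    after = tdp;            /* no sibling: append at tail */

            newtd->td_next = *after;
            *after = newtd;
    }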

Effectively, you are left with a very hard problem.

The size of the quantum was chosen such that interactive response
would not be damaged, even if a process used the entirety of its
allotted CPU doing something compute intensive.

Even if you still balked at the "unfairness" of being unable to
have your one program compete with sendmail or inetd as if it were
16,000 processes (for some definition of "unfairness" 8-)), you
could choose to weight a process's share by the number of its
system calls currently blocked, so that the weighting tracks the
average amount of time your threads remain blocked.

I personally wouldn't do this.  If I were worried about my threaded
application, I would either use rtprio to force the issue, or I
would "manufacture" my server load: for example, it's very rare
to see an Oracle server doing anything other than simply running
Oracle.
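
For reference, the rtprio(2) route on FreeBSD looks roughly like
this (you can get the same effect from the shell with the
rtprio(1) command, e.g. "rtprio 0 ./myserver" for some server
binary); it needs root, and a realtime process stuck in a spin
loop can wedge the machine, so it really is forcing the issue:

    #include <sys/types.h>
    #include <sys/rtprio.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
            struct rtprio rtp;

            rtp.type = RTP_PRIO_REALTIME;
            rtp.prio = 0;           /* most favored realtime priority */

            /* pid 0 means "this process" */
            if (rtprio(RTP_SET, 0, &rtp) == -1) {
                    perror("rtprio");
                    exit(1);
            }

            /* ... start the threaded server here ... */
            return (0);
    }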


> You will pardon me if I ask dumb questions.  After dabbling and reading
> about it for a long time, I have finally started working on my first major
> multi-threaded application, and so I am thinking a lot about them but I am
> not necessarily a guru.  Additionally, I aspire someday to being a kernel
> guy, so I want to learn how these things work.

You'd do well to study schedulers and context switch overhead,
and cache-busting.  The scheduling algorithms are actually much
more complicated than they first appear, and it's not obvious at
first glance that kernel threads, as an implementation, interact
badly with schedulers, or where the system overhead really lives.
Sun has a number of good papers on threading that I would recommend
looking up on their web site.

Really, threading tends to make some types of programming easier,
but isn't terribly useful, unless you are trying to achieve SMP
scaling.  Even then, many OSs do it wrong.  There's a long-standing
fiction that SMP systems start failing to scale at 4
processors, that they reach a point of diminishing returns at a
relatively small scale.  This isn't really true: mostly it's an
implementation failure when you see a limit that small.

John Sokol actually gave a nice presentation on using finite
state automatons in place of threading in the "AfterBurner"
web server product, and backed it up with some nice numbers; but
his solution, while incredibly capable on low end hardware, would
not scale any better on SMP.  I don't think he ever hit the point
of being CPU bound, but if he did, the only thing he'd be able to
do would be to throw bigger iron at it.
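
For flavor, here is the shape of that approach as I would sketch
it (my own illustration, not John's actual AfterBurner code): one
process, one select() loop, and a small state word per connection
instead of a thread per connection.  By construction it runs on a
single CPU, which is exactly why the only upgrade path is bigger
iron.

    #include <sys/select.h>
    #include <unistd.h>

    #define MAXCONN 1024

    enum state { ST_FREE, ST_READ_REQUEST, ST_WRITE_REPLY };

    struct conn {
            enum state      state;
            int             fd;
    };

    static struct conn conns[MAXCONN];

    void
    event_loop(void)
    {
            fd_set rfds, wfds;
            int i, maxfd, n;

            /* a real loop would also watch the listening socket here */
            for (;;) {
                    FD_ZERO(&rfds);
                    FD_ZERO(&wfds);
                    maxfd = -1;
                    for (i = 0; i < MAXCONN; i++) {
                            if (conns[i].state == ST_READ_REQUEST)
                                    FD_SET(conns[i].fd, &rfds);
                            else if (conns[i].state == ST_WRITE_REPLY)
                                    FD_SET(conns[i].fd, &wfds);
                            else
                                    continue;
                            if (conns[i].fd > maxfd)
                                    maxfd = conns[i].fd;
                    }
                    n = select(maxfd + 1, &rfds, &wfds, NULL, NULL);
                    if (n == -1)
                            continue;
                    for (i = 0; i < MAXCONN; i++) {
                            if (conns[i].state == ST_READ_REQUEST &&
                                FD_ISSET(conns[i].fd, &rfds)) {
                                    /* read; when the request is complete: */
                                    conns[i].state = ST_WRITE_REPLY;
                            } else if (conns[i].state == ST_WRITE_REPLY &&
                                FD_ISSET(conns[i].fd, &wfds)) {
                                    /* write; when the reply is done: */
                                    close(conns[i].fd);
                                    conns[i].state = ST_FREE;
                            }
                    }
            }
    }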


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

