Date:      Thu, 9 Aug 2001 11:57:42 +0930
From:      Greg Lehey <grog@FreeBSD.org>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        void <float@firedrake.org>, freebsd-hackers@freebsd.org
Subject:   Re: Allocate a page at interrupt time
Message-ID:  <20010809115742.E73579@wantadilla.lemis.com>
In-Reply-To: <3B70E9DB.B16F409C@mindspring.com>; from tlambert2@mindspring.com on Wed, Aug 08, 2001 at 12:27:23AM -0700
References:  <200108070739.f777dmi08218@mass.dis.org> <3B6FB0AE.8D40EF5D@mindspring.com> <20010807221509.A24999@firedrake.org> <3B70E9DB.B16F409C@mindspring.com>

On Wednesday,  8 August 2001 at  0:27:23 -0700, Terry Lambert wrote:
> void wrote:
>>> Can you name one SMP OS implementation that uses an
>>> "interrupt threads" approach that doesn't hit a scaling
>>> wall at 4 (or fewer) CPUs, due to heavier weight thread
>>> context switch overhead?
>>
>> Solaris, if I remember my Vahalia book correctly (isn't that a favorite
>> of yours?).
>
> As usual, IMO...
>
> Yes, I like the Vahalia book; I did technical review of
> it for Prentice Hall before its publication.
>
> Solaris hits the wall a little later, but it still hits the
> wall.

Every SMP system experiences performance degradation at some point.
The question is one of extent.

> On Intel hardware, it has historically hit it at the same 4 CPUs
> where everyone else tends to hit it, for the same reasons; 

This is a very broad statement.  You contradict it further down.

> as of Solaris 2.6, they have adopted the hybrid per CPU pool model
> recommended in Vahalia (Chapter 12).
>
> While I'm at it, I suppose I should recommend reading the
> definitive Solaris internals book, to date:
>
> 	Solaris Internals, Core Kernel Architecture
> 	Jim Mauro, Richard McDougall
> 	Prentice Hall
> 	ISBN: 0-13-022496-0

Yes, I have this book.  It looks very good, but I haven't found time
to read it.

> Solaris claims to scale to 64 processors while maintaining SMP,
> rather than real or virtual NUMA.  It's been my own experience that
> this scaling claim is not entirely accurate, if what you are doing
> is a lot of kernel processing.

I think that depends on how you interpret the claim.  It can only mean
that adding a 64th processor can still be of benefit.

> On the other hand, if you are running a lot of non-intersecting user
> space code (e.g. JVMs or CGIs), it's not as bad (and realize that
> FreeBSD is not that bad in the same situation, either: it's just not
> as common in practice as it is in theory).

You're just describing a fact of life about UNIX SMP support.

> It should be noted that Solaris Interrupt threads are only
> used for interrupts of priority 10 and below: higher priority
> interrupts are _NOT_ handled by threads (interrupts at a
> priority level from 11 to 15).  10 is the clock interrupt.

FreeBSD also has provision for not using interrupt threads for
everything.  It's clearly too early to decide which interrupts should
be left as traditional interrupts, and we've done some shifting back
and forth to get things to work.  Note that the priority numbers are
noise.  In this statement, they're just a convenient way to
distinguish between threaded and non-threaded interrupts.

> It should also be noted that Solaris maintains a per processor pool
> of interrupt threads for each of the lower priority interrupts, with
> a global thread that is used for handling of the clock interrupt.
> This is _very_ different than taking an interrupt thread, and
> rescheduling it on an arbitrary CPU, and as others have pointed out,
> the hardware used to do the scheduling is very different.

I think somebody else has pointed out that we're very conscious of CPU
affinity.

> In the 32 processor Sequent boxes, the actual system bus was
> different, and directly supported message passing.

Was this better or worse?

> There is also specific hardware support for handling interrupts
> via threads, which is really not applicable to x86 or even the
> Alpha architectures on which FreeBSD currently runs, nor to the
> IA64 architecture (port in progress).  In particular, there is
> a single system wide table, introduced with the UltraSPARC, that
> doesn't need to be locked to support interrupt handling.
>
> Also, the Sun system is still an IPL system, using level based
> blocking, rather than masking, and these threads can find
> themselves blocked on a mutex or condition variable for a
> relatively long time; if this happens, it resumes the previous
> thread _but does not drop its IPL below that of the suspended
> thread_, which is basically the Dijkstra Banker's Algorithm
> method of avoiding priority inversion on interrupts (i.e. ugly).

So you're saying we're doing it better?
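
For readers unfamiliar with the distinction between level-based
blocking and masking, the following C fragment is a minimal sketch of
the two delivery tests.  It is not Solaris or FreeBSD code: the names
ipl_deliverable and mask_deliverable, and the numbering of interrupt
sources as 0..31, are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Level-based (IPL) blocking: raising the current level to N blocks
 * every interrupt at level N and below in a single step.
 */
static int
ipl_deliverable(unsigned cur_ipl, unsigned level)
{
	return (level > cur_ipl);
}

/*
 * Masking: each source has its own bit, so any subset of sources can
 * be blocked independently.  Hypothetical sources are numbered 0..31.
 */
static int
mask_deliverable(uint32_t blocked, unsigned src)
{
	assert(src < 32);
	return ((blocked & (UINT32_C(1) << src)) == 0);
}
```

With the current level at 7, ipl_deliverable(7, 5) is false: the
level 5 source is blocked as a side effect.  With masking, blocking
only bit 7 leaves source 5 deliverable.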

> Finally, the Sun system "borrows" the context of the interrupted
> process (thread) for interrupt handling (the LWP).  This is very
> similar to the technique employed with kernel vs. user space thread
> associations within the Windows kernels (this was one of the steps I
> was referring to when I said that NT had dealt with a number of
> scaling issues before it needed to, so that they would not turn into
> problems on 8-way and higher systems).

This is also the method we're planning to use, as I'm sure you're
aware from previous messages on the -smp list.

> Personally, I think that the Sun system is extremely susceptible to
> receiver livelock (Network interrupts are at 7, and disk interrupts
> are at 5, which means that so long as you are getting pounded with
> network interrupts for e.g. NFS read or write requests, you're not
> going to service the disk interrupts that will let you dispose of
> the traffic, nor will you run the user space code for things like
> CGIs or Apache servers trying to service a heavy load of requests
> for content).

Can you point to a system which does better, and explain why?
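
The starvation argument in the quoted paragraph can be illustrated
with a toy model.  The sketch below assumes strict priority dispatch
with one unit of work serviced per step, highest pending level first.
The function simulate, the struct sim_result and the 16-level layout
are hypothetical; this is not a model of the real Solaris dispatcher.

```c
#include <assert.h>

#define NLEVELS 16	/* assumed number of priority levels, 0..15 */

struct sim_result {
	unsigned serviced[NLEVELS];	/* units serviced per level */
};

/*
 * arrivals[l] units of work arrive at level l on every step.  Each
 * step services exactly one unit, taken from the highest level that
 * has work pending.
 */
static struct sim_result
simulate(const unsigned arrivals[NLEVELS], unsigned steps)
{
	struct sim_result r = { { 0 } };
	unsigned pending[NLEVELS] = { 0 };
	unsigned t;
	int l;

	for (t = 0; t < steps; t++) {
		for (l = 0; l < NLEVELS; l++)
			pending[l] += arrivals[l];
		for (l = NLEVELS - 1; l >= 0; l--) {
			if (pending[l] > 0) {
				pending[l]--;
				r.serviced[l]++;
				break;	/* one unit per step */
			}
		}
	}
	return (r);
}
```

If one unit arrives per step at both level 7 and level 5, level 7
consumes every step and level 5 is never serviced.  Once the level 7
arrivals stop, level 5 is serviced on every step.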

> I'm also not terrifically impressed with their callout mechanism,
> when applied to networking, which has a preponderance of fixed,
> known interval timers, but FreeBSD's isn't really any better, when
> it comes to huge numbers of network connections, since it will end
> up hashing 2/4/6/8/... into the same bucket, unordered, which means
> traversing a large list of timers which are not going to end up
> expiring (callout wheels are not a good thing to mix with fixed
> interval timers of relatively long durations, like the 2MSL timers
> that live in the networking code, or most especially the TIME_WAIT
> timers).

I haven't looked at this issue.  How does it differ from the original
System V implementation?  Are you implying that we should use a
different hash algorithm?
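
To make the concern about bucket traversal concrete, here is a
minimal sketch of a callout wheel.  It assumes a power-of-two wheel
indexed by the low bits of the expiry tick, with unordered insertion
at the head of each bucket.  All names (struct callout as laid out
here, callout_insert, callout_tick, wheel_demo, WHEEL_SIZE) are
illustrative and not taken from any kernel source.

```c
#include <assert.h>
#include <stddef.h>

#define WHEEL_SIZE 256			/* assumed; must be a power of two */
#define WHEEL_MASK (WHEEL_SIZE - 1)

struct callout {
	unsigned long	expire;		/* absolute tick at which to fire */
	struct callout *next;
};

static struct callout *wheel[WHEEL_SIZE];

/* Insert unordered at the head of the bucket chosen by expiry tick. */
static void
callout_insert(struct callout *c, unsigned long expire)
{
	assert(c != NULL);
	c->expire = expire;
	c->next = wheel[expire & WHEEL_MASK];
	wheel[expire & WHEEL_MASK] = c;
}

/*
 * Process one tick: walk the whole bucket, unlinking entries that
 * expire now.  Returns the number of entries visited and stores the
 * number that actually fired in *fired.
 */
static unsigned
callout_tick(unsigned long now, unsigned *fired)
{
	struct callout **pp = &wheel[now & WHEEL_MASK];
	unsigned visited = 0;

	*fired = 0;
	while (*pp != NULL) {
		struct callout *c = *pp;

		visited++;
		if (c->expire == now) {
			*pp = c->next;	/* unlink the expired entry */
			(*fired)++;
		} else {
			pp = &c->next;
		}
	}
	return (visited);
}

/*
 * Queue n long timers whose expiry ticks are multiples of WHEEL_SIZE,
 * so they all share bucket 0, then process the first of those ticks.
 */
static unsigned
wheel_demo(struct callout *timers, unsigned n, unsigned *fired)
{
	unsigned i;

	for (i = 0; i < WHEEL_SIZE; i++)
		wheel[i] = NULL;
	for (i = 0; i < n; i++)
		callout_insert(&timers[i],
		    (unsigned long)(i + 1) * WHEEL_SIZE);
	return (callout_tick(WHEEL_SIZE, fired));
}
```

Calling wheel_demo with 1000 timers visits all 1000 entries in the
bucket but fires only one of them, which is the cost being described:
long fixed-interval timers that collide in a bucket are rescanned on
every pass without expiring.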

Greg
--
See complete headers for address and phone numbers




