Date:      Mon, 22 Sep 1997 12:12:59 -0600 (MDT)
From:      Nate Williams <nate@mt.sri.com>
To:        "Justin T. Gibbs" <gibbs@plutotech.com>
Cc:        Nate Williams <nate@mt.sri.com>, Bruce Evans <bde@zeta.org.au>, current@freebsd.org
Subject:   Re: cvs commit: src/sys/conf files src/sys/dev/vx if_vx.c if_vxreg.h src/sys/i386/apm apm.c src/sys/i386/conf GENERIC files.i386 src/sys/i386/eisa 3c5x9.c aha1742.c aic7770.c bt74x.c eisaconf.c eisaconf.h if_fea.c if_vx_eisa.c src/sys/i386/i386 autoconf.c ... 
Message-ID:  <199709221812.MAA01622@rocky.mt.sri.com>
In-Reply-To: <199709221755.LAA25129@pluto.plutotech.com>
References:  <199709221630.KAA01072@rocky.mt.sri.com> <199709221755.LAA25129@pluto.plutotech.com>

> >Except that processing a tick now runs O(n), where n is the # of
> >elements in the callout wheel (which gets pretty big when you have 630
> >callouts as in your example above, since the size of the callout wheel
> >is fixed at boot), so I still don't see a big win.  I'm not saying that
> >it's any worse/better than the existing scheme; in fact, I don't see a
> >win.
> 
> Let's be more precise.
> 	n = number of active callouts
> 	h = length of a hash chain in a call wheel bucket
> 	x = number of callouts expiring on a given tick.
> 
> Old implementation:
> 	timeout = O(n)
> 	untimeout = O(n)
> 	softclock = O(x) ~= O(1)
> 
> New implementation:
> 	timeout = O(1)
> 	untimeout = O(1)
> 	softclock = O(h)
> 
> Now, depending on how you size your callwheel, h << n.  With a MAXUSERS
> of 32, the callwheel is 256 entries big, if I recall correctly.

What is it for a 'typical' workstation, i.e. the configuration shipped
in GENERIC with 'maxusers 10'?
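
For reference, here is a rough sketch of the call wheel mechanics behind
those numbers (all names are made up for illustration; this is not the
actual sys/kern code):

	/*
	 * Call wheel sketch; illustrative only.
	 */
	#include <stddef.h>

	#define CALLWHEEL_SIZE 256		/* must be a power of two */
	#define CALLWHEEL_MASK (CALLWHEEL_SIZE - 1)

	struct callout {
		struct callout	*next;		/* hash chain link */
		int		expire_tick;	/* absolute expiry tick */
		void		(*func)(void *);
		void		*arg;
	};

	static struct callout *callwheel[CALLWHEEL_SIZE];
	static int curtick;

	/* timeout: O(1), just prepend to the bucket for the expiry tick. */
	void
	timeout_sketch(struct callout *c, void (*func)(void *), void *arg,
	    int ticks)
	{
		int bucket;

		c->func = func;
		c->arg = arg;
		c->expire_tick = curtick + ticks;
		bucket = c->expire_tick & CALLWHEEL_MASK;
		c->next = callwheel[bucket];
		callwheel[bucket] = c;
	}

	/* softclock: O(h), walk one bucket, fire only entries due now. */
	void
	softclock_sketch(void)
	{
		struct callout **cp, *c;

		curtick++;
		cp = &callwheel[curtick & CALLWHEEL_MASK];
		while ((c = *cp) != NULL) {
			if (c->expire_tick == curtick) {
				*cp = c->next;	/* unlink and fire */
				(*c->func)(c->arg);
			} else
				cp = &c->next;	/* not due yet; skip */
		}
	}

The chain is singly-linked here only to keep the sketch short; with a
doubly-linked chain an entry can be unlinked without a search, which is
where the O(1) untimeout above would come from.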

> But running time isn't the only thing to consider.  As I mentioned
> before, untimeout/timeout are often called from an interrupt context
> and the old algorithm caused an indeterminate delay in this scenario,
> potentially causing problems for that device and any that share the
> same interrupt.

'softclock()' is also called with some interrupts masked, isn't it?

> You also have to consider that timeout/untimeout calls occur at 
> indeterminate rates, but softclock runs at a fixed rate meaning that
> the amount of work it performs scales better than if that work were
> pushed off into either timeout or untimeout.

True, but if its 'worst-case' time happens often enough, we're
penalizing the system *a lot* more than during timeout/untimeout, which
happen much less frequently.

The 'gotcha' is that I don't know if this is a 'normal' case, since the
paper didn't test normal cases, but instead did lots of
timeout/untimeouts in a user-land process, which stacks the test data in
favor of the new implementation.

> >Finally, you'll notice that I'm not proposing that we rip out the new
> >solution (since I have nothing better to propose, although the original
> >solution could be sped up by adding the setcallout()/unsetcallout()
> >additions w/out the call wheel).
> 
> That only fixes untimeout.

True, but w/out any real numbers to show which one causes the biggest
latency problems, it's hard to know *where* the bottleneck is.  (Or, if
there is a bottleneck at all...)

> >I just want to understand *why* it's
> >making things better for the average user, since the average user
> >doesn't have > 10 drives running tags.  Penalizing the 'extreme' user
> >doesn't seem to be a big win in my book.
> 
> The typical user will have perhaps 30 callouts outstanding.  In this
> scenario, both algorithms work fairly well.  Although PHK's BB profiling
> did show that the new implementation was faster, he didn't perceive
> a difference in performance.  Since he was using IDE devices, he had
> even fewer timeouts active than if he were using SCSI disks.
> 
> The main difference between the two approaches is that one algorithm 
> scales better than the other.

Does it?  If softclock() is now a bottleneck more often than
timeout/untimeout were previously, it scales worse.  It would be nice to
see some real data that tells us how often each operation is called, and
how long each one takes.

> >In summary, all of the operations are called 'often'.
> >
> >1) clock ticks
> >2) registering timeouts
> >3) de-registering timeouts
> >
> >I would like to see some #'s that show us the # of calls & the amount
> >of time spent in each of these operations on a 'normal' system, using
> >both approaches.  If
> >someone has some good ideas how to do that, I'm willing to learn
> >something and go do it.
> 
> Allocate an array of ints of size ncallout.  In softclock, increment the
> array entry corresponding to the number of entries traversed in a softclock
> run.  By periodically scanning this array, you'll get a good idea of the
> value of 'h' for your system. Add three more ints that count the number of
> invocations of softclock, untimeout, and timeout and you should be able to
> draw conclusions from that.
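
In code, that bookkeeping might look something like this (a sketch; the
hook points, names, and the fixed bound are assumptions, not existing
kernel code):

	/*
	 * Counters sketch along the lines suggested above.
	 */
	#define NCALLOUT_MAX 256	/* assumed bound; real code would size by ncallout */

	/* chain_hist[i] = # of softclock runs that traversed i entries */
	static int chain_hist[NCALLOUT_MAX];
	static int nsoftclock, ntimeout, nuntimeout;

	/* call from softclock() after walking the bucket */
	static void
	record_softclock(int traversed)
	{
		nsoftclock++;
		if (traversed < NCALLOUT_MAX)
			chain_hist[traversed]++;
	}

	/* call on entry to timeout()/untimeout() */
	static void record_timeout(void)   { ntimeout++; }
	static void record_untimeout(void) { nuntimeout++; }

Dumping chain_hist periodically (from DDB, say, or via a sysctl) would
show the distribution of 'h' over time.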

But, that doesn't give me any latency #'s for any of the operations.
Knowing how often they are called is one thing, and knowing how many
entries are traversed is good, but *times* are the bigger issue.

> You may also want to look at the other paper I mention in the timeout.9
> man page.  It may have some of the statistics you want, but I don't know
> for sure that they ever implemented their schemes in a real system.

I'll do that, but the reason I'm a bit skeptical is that these schemes
were never 'tested' in a real system, only in test setups stacked to
show that the new scheme does what it was intended to do.  (How well
this carries over to real systems remains to be seen.)



Nate


