Date:      Tue, 23 Sep 1997 16:52:16 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        julian@whistle.com (Julian Elischer)
Cc:        gibbs@plutotech.com, tlambert@primenet.com, nate@mt.sri.com, bde@zeta.org.au, current@FreeBSD.ORG
Subject:   Re: cvs commit: src/sys/conf files src/sys/dev/vx if_vx.c if_vxreg.h src/sys/i386/apm apm.c src/sys/i386/conf GENERIC files.i386
Message-ID:  <199709231652.JAA12392@usr01.primenet.com>
In-Reply-To: <Pine.BSF.3.95.970923013950.9006B-100000@current1.whistle.com> from "Julian Elischer" at Sep 23, 97 02:33:39 am

> > >> >Queue the entry for ordered insertion when timeout() is called, but
> > >> >delay the ordered insertion until softclock().
> 
> This was one of the things I have been wondering about.
> If each spoke of the wheel had an ordered and an unordered queue,
> then items would USUALLY be removed before their
> bucket was accessed. Any remaining ones could be
> moved to the ordered "long term" queue.
> In fact it might be arguable that they could be removed from
> the wheel entirely after surviving a full rotation of the softclock.
> They could be inserted into a timeout queue not dissimilar to the one
> we already run. The point is that there are several distinct usage
> models for the timeout facility. They have different characteristics:
> HARDWARE FAILURE TIMEOUTS: nearly ALWAYS expected to be removed very
> quickly. Ordering these is a complete waste of time.
> PERIODIC EVENTS: nearly always time out. These tend to last a large
> fraction of a second. It might be more efficient to have a
> different mechanism for periodic events.
> LONG TERM EVENT timing: These tend to be longer in duration, but are
> not recurring. These will be hit by the softclock several times, which
> could be considered a waste of resources; if we can hold them off for
> a while, we should.

Hmmmm.  This echoes my arguments about kernel memory object persistence:
there are low, medium, and high persistence timer events, just as there
are low, medium, and high persistence memory objects, which should be
kept separate to avoid KVM fragmentation (LKM vs. mbuf vs. nameibuf, etc.).


This might be better handled with ( ..., timeout, expected removal)
parameters.

If I expect my event to complete in M softclocks, but want a callback
if it doesn't complete in N softclocks, where N > M, then I can
categorize the request based on the number of hash buckets (in Justin's
scheme), or an arbitrary categorization value in the original scheme,
or in my adaptation to meet Justin's O(1) enqueueing requirements.
Call this value H.

1)	If H > N > M, it's a short duration event.
2)	If N > H > M, it's a medium duration event.
3)	If N > M > H, it's a long term event.

Cases where N == M collapse to case 1 or case 3.
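
A minimal sketch of that classification (the names here are
illustrative only, not from any proposed interface):

	/*
	 * H = wheel horizon in softclocks (hash bucket count),
	 * M = expected completion time, N = requested timeout.
	 * N > M is assumed, per the above.
	 */
	enum to_class { TO_SHORT, TO_MEDIUM, TO_LONG };

	static enum to_class
	timeout_classify(int H, int N, int M)
	{
		if (H > N)	/* H > N > M: softclock never sees it */
			return (TO_SHORT);
		if (H > M)	/* N > H > M: softclock might see it */
			return (TO_MEDIUM);
		return (TO_LONG); /* N > M > H: softclock will see it */
	}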

For Justin's scheme:

1)	The softclock is not expected to hit the event.
2)	The softclock is not expected to hit the event, but
	it might.
3)	The softclock is expected to hit the event one or more
	times.

I suspect that Justin is primarily worried about things in category (1);
things in this category, for a 256 entry hash size and his expected
event count of 30, may not live more than 256/30 = 8.53 ticks before
O(h) is damaged (assuming a perfect hash).

I'm concerned about things in category (2), which is where I would class
most (non-floppy) hardware failure timeouts.

I'm also concerned about user scheduled timers via itimer(), select(),
and poll(), as well as reschedulable one-shots used by the system to
implement drivers for slow hardware with strict timing requirements;
these are things I expect will live in category (3).

It seems to me that Justin's algorithm works best for category (1)
items, goes to a slightly higher order for category (2) items, and
fails miserably for category (3) items.

My suggested delayed ordered insertion mechanism works well for
category (1) items, so long as I have a back-pointer to allow O(1)
dequeueing, works more poorly than his for category (2) items (but
hardware failures are rare), and works better for category (3) items.
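
Something like the following is what I have in mind (a sketch only,
assuming sys/queue.h tail queues; struct tmo and the function names
are illustrative, not the real timeout structures):

	#include <sys/queue.h>

	struct tmo {
		TAILQ_ENTRY(tmo) t_link; /* back pointer: O(1) removal */
		int	t_ticks;	 /* absolute expiry, in ticks */
		void	(*t_func)(void *);
		void	*t_arg;
	};
	TAILQ_HEAD(tmo_list, tmo);

	struct tmo_list unordered = TAILQ_HEAD_INITIALIZER(unordered);
	struct tmo_list ordered = TAILQ_HEAD_INITIALIZER(ordered);

	/* timeout(): O(1); no ordering work in the common path */
	void
	tmo_start(struct tmo *t)
	{
		TAILQ_INSERT_TAIL(&unordered, t, t_link);
	}

	/* untimeout(): O(1), via the entry's own link pointers */
	void
	tmo_stop(struct tmo *t, struct tmo_list *q)
	{
		TAILQ_REMOVE(q, t, t_link);
	}

	/*
	 * softclock(): pay for the ordered insertion here, and only
	 * for entries that were not already dequeued.
	 */
	void
	tmo_softclock(void)
	{
		struct tmo *t, *p;

		while ((t = TAILQ_FIRST(&unordered)) != NULL) {
			TAILQ_REMOVE(&unordered, t, t_link);
			TAILQ_FOREACH(p, &ordered, t_link)
				if (p->t_ticks > t->t_ticks)
					break;
			if (p != NULL)
				TAILQ_INSERT_BEFORE(p, t, t_link);
			else
				TAILQ_INSERT_TAIL(&ordered, t, t_link);
		}
	}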


Your suggestion about split algorithms actually resolves my (2) problems
and Justin's (3) problems.

...With more code 8-).


> > Two pointers is additional space.  It also makes softclock become
> > q * O(h) + O(c) where q is the number of deferred insertion entries,
> > h is the hash chain length and c is the number of empty buckets
> > that are traversed to find the next timeout for scheduling the next
> > one shot.
>
> I see it as being (on each softclock) O(H x C)+1 where H is the
> unexpired (remaining) unordered entries and C is the length of the
> unordered queue for that bucket. The hope is that MOST short term
> items are removed BEFORE the softclock gets to them, and that any
> remaining items are probably long term items that should not be examined
> at every following rotation.

This was the calculation I had arrived at as well; the value of C is
not obviously > 1 if you accept Justin's 30 items in the queue, unless
a given entry is queued for 8.53 or more softclocks on a kernel compiled
with sufficient users to cause 256 hash buckets to be allocated.

> I would even consider that there should be 3 (!) queues at each
> bucket.
> Items move in un-sorted order from 1 to 2, and are then inserted in
> order into 3. Items going into 3 are guaranteed to have survived at
> LEAST one full rotation, and can be assumed to be long-term.

You could preclassify them with an expectation-of-removal parameter,
or you could do this, and take a one-softclock examination for each
category transition (i.e., one hit for category (2), two hits for
category (3)).
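
As a sketch, the three-list bucket might look like this (reusing the
illustrative struct tmo from the earlier fragment, with the lists
assumed initialized via TAILQ_INIT(); again, not the real code):

	struct bucket {
		struct tmo_list	b_fresh;    /* unordered, just queued */
		struct tmo_list	b_aging;    /* survived one rotation */
		struct tmo_list	b_longterm; /* ordered; survived two */
	};

	void
	bucket_rotate(struct bucket *b)
	{
		struct tmo *t, *p;

		/* promote aging survivors into the ordered list */
		while ((t = TAILQ_FIRST(&b->b_aging)) != NULL) {
			TAILQ_REMOVE(&b->b_aging, t, t_link);
			TAILQ_FOREACH(p, &b->b_longterm, t_link)
				if (p->t_ticks > t->t_ticks)
					break;
			if (p != NULL)
				TAILQ_INSERT_BEFORE(p, t, t_link);
			else
				TAILQ_INSERT_TAIL(&b->b_longterm, t, t_link);
		}
		/* anything still fresh has now survived one rotation */
		while ((t = TAILQ_FIRST(&b->b_fresh)) != NULL) {
			TAILQ_REMOVE(&b->b_fresh, t, t_link);
			TAILQ_INSERT_TAIL(&b->b_aging, t, t_link);
		}
	}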


> If the wheel is very sparsely populated it doesn't gain much, but then
> again it isn't that much more expensive than what you have now.
> 
> When the wheel gets heavily populated, or when there are a lot of
> long-term entries, it saves time.

I think the ratio of the number of entries times the average lifetime
to the number of hash buckets is going to get much larger than one.
Potentially much, much larger.  If this happens, the hash chain length
will grow fast, quickly devaluing the hash.


> What the right answer is will depend greatly on the particular usage
> model. A method of trying to handle these different models is probably
> worth working for.

I think this is where we are arguing at cross-purposes.  Each of us
sees generic timer code as solving our own timeout problems, and we
ignore the fact that the problems are of different character.  Thanks
for adding some categorization!

> Especially as networking code switches to using more generic
> timeouts. If you disconnect a network, we suddenly get hundreds
> of actual networking timeouts...

Yep.  And don't forget FIN_WAIT_2 even without network disconnects;
DOS network application programmers seldom call shutdown(s, 2) like
they need to...


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


