Date:      Mon, 16 Sep 2002 12:12:01 +0200
From:      Poul-Henning Kamp <phk@critter.freebsd.dk>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        Josef Karthauser <joe@FreeBSD.org>, "David O'Brien" <obrien@FreeBSD.org>, cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject:   Re: cvs commit: src/sys/kern kern_timeout.c 
Message-ID:  <13645.1032171121@critter.freebsd.dk>
In-Reply-To: Your message of "Mon, 16 Sep 2002 20:07:40 +1000." <20020916195344.S6593-100000@gamplex.bde.org> 

In message <20020916195344.S6593-100000@gamplex.bde.org>, Bruce Evans writes:

>> > Sep 15 19:40:26 kernel: Expensive timeout(9) function: 0xc65dfc80(0xc659f000) 0.008711721
>> > Sep 15 19:40:26 kernel: Expensive timeout(9) function: 0xc65dfc80(0xc659f000) 0.001068850
>
>This is hard to interpret without the function names (or a full stack
>trace).

Yes, but as far as I'm aware, we still don't have a print_backtrace(9) ?

>I get these on an old dual Celeron system mainly for fxp_tick()
>and uma_timeout().

uma_timeout() seems to trigger on practically all systems.
I've talked with Jeff about it already.

>fxp_tick() almost always causes one under load,
>with an average of about 2 msec and a maximum of about 8 msec.

I have seen other ${netcard}_tick() routines do the same.  I wonder
if the one-second timers in network drivers should be moved from
timeout(9) to a kernel thread doing just that: polling the network
drivers once per second.
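Something along these lines, purely as a sketch (the nettick_* names
are made up for illustration, and the thread would have to be started
at boot with kthread_create(9)):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/queue.h>

struct nettick_entry {
	SLIST_ENTRY(nettick_entry) link;
	void	(*func)(void *);	/* driver's per-second tick routine */
	void	*arg;
};

static SLIST_HEAD(, nettick_entry) nettick_list =
    SLIST_HEAD_INITIALIZER(nettick_list);

static void
nettick_thread(void *dummy)
{
	struct nettick_entry *ne;

	for (;;) {
		/* Run every registered driver tick, outside softclock(). */
		SLIST_FOREACH(ne, &nettick_list, link)
			ne->func(ne->arg);
		/* Sleep roughly one second between passes. */
		tsleep(&nettick_list, PWAIT, "nettick", hz);
	}
}

Drivers would then register and unregister a tick routine on that list
instead of rearming a timeout(9) every second.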

>8 msec
>is a huge bite out of the 10 msec between softclock for the default
>HZ, and huger for larger HZ.  softclock() expects to run multiple
>timeout routines and shouldn't take more than 1% of the CPU(s) altogether,
>but 8 msec per second is already 0.8% of 1 CPU for just 1 device.

But couldn't some of this be due to evil DELAY() calls or lock waits
in the timeout routine ?  In that case it is only the duration, not
the CPU usage, which is 0.8% ?
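A measurement along these lines (just a sketch, not the actual
kern_timeout.c code) counts elapsed wall-clock time, so any spinning
in DELAY() or waiting for a lock inside the handler inflates the
figure without being CPU consumed by the handler itself:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/time.h>

static void
run_timeout(void (*func)(void *), void *arg)
{
	struct bintime bt1, bt2;

	binuptime(&bt1);
	func(arg);			/* DELAY() or lock waits happen in here */
	binuptime(&bt2);
	bintime_sub(&bt2, &bt1);	/* bt2 = elapsed wall-clock time */
	if (bt2.sec > 0 || bt2.frac > ((uint64_t)1 << 54))	/* ~1 msec */
		printf("Expensive timeout(9) function: %p(%p)\n",
		    (void *)func, arg);
}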

>I think most of the time is spent
>in mii_tick(), and the overhead is not much larger than in RELENG_4.

Yes, I've identified mii_tick() as well.  Wild Bill Ethernet some
time ago pointed out the parallel issue that MIIs should be run
interrupt driven if at all possible, so as not to steal bandwidth
from the actual data.

>In theory, running with Giant held is no worse than running at
>splsoftclock() in RELENG_4, but in practice most interrupt handlers
>wait for Giant, so the low priority fxp_tick() timeout routine blocks
>many high priority interrupts.

Which is why I added this diagnostic in the first place:  My interrupt
latency measurements pointed to timeout(9) users as the worst
offender(s).

I agree that the prints are annoying, but right now I think they provide
prime-quality data, so I'd rather not put them behind an option yet.

If somebody wants to write some rate-limiting thing, they're more than
welcome.
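For example, the existing ppsratecheck(9) helper could cap it at one
message per second.  Just a sketch, not a patch against kern_timeout.c:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/time.h>

static void
report_expensive(void (*func)(void *), void *arg)
{
	static struct timeval lasttime;
	static int curpps;

	/* Allow at most one "Expensive timeout(9)" message per second. */
	if (ppsratecheck(&lasttime, &curpps, 1))
		printf("Expensive timeout(9) function: %p(%p)\n",
		    (void *)func, arg);
}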

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
