Date:      Fri, 18 Jan 2002 01:06:19 +0100 (CET)
From:      Michal Mertl <mime@traveller.cz>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        arch@FreeBSD.ORG
Subject:   Re: 64 bit counters again
Message-ID:  <Pine.BSF.4.41.0201180033210.82507-100000@prg.traveller.cz>
In-Reply-To: <3C473436.BBBABF6B@mindspring.com>

On Thu, 17 Jan 2002, Terry Lambert wrote:

> Michal Mertl wrote:
> > I want to inform you that I tried to look at some system pushing data
> > with different size/implementation network counters. I did my last test on
> > dual PIII 750. I don't know of any good way to measure the load, so I
> > just ran vmstat -w1 (and calculated average idle) while pushing the data
> > and also looked at the throughput at 100Mbit Full-Duplex. System was
> > performing about 10000 interrupts and 15000 packets per second. I didn't
> > notice any difference between using 32 bits non atomic operations (3
> > clocks per op) or 64 bit atomic (lock;cmpxchg8b - 50 clocks). I did also
> > measure it on single Duron 800 with the same result.
>
> 1)	Use a gigabit interface, not an FXP.
>

Don't have any :-(. It probably doesn't change the situation, but on the
Duron I use a Realtek card, which changes the counter inside the interrupt.

> 2)	Use 1K payload data.
>
> 3)	Tune your system so that it can push data at wire speed
> 	on the gigabit; this isn't hard, but it is a test of
> 	whether you can recognize overhead when you see it.
>
> 4)	Measure CPU overhead as well as I/O overhead.
>

I don't know what you mean by I/O overhead here.

> 5)	Measure total number of mbufs and clusters in use,
>
> 6)	Use an SMP system, make sure that you have a sender
> 	on both CPUs, and measure TLB shootdown and page
> 	mapping turnover to ensure you get that overhead in
> 	there, too (plus the lock overhead).
>

I'm afraid I don't understand. Unfortunately I don't see that deep into
the kernel. If you tell me what to look at and how...

> 7)	Make sure you are sending data already in the kernel,
> 	so you aren't including copy overhead in the CPU cost,
> 	since practically no one implements servers with copy
> 	overhead these days.
>

What do you mean by that? Zero-copy operation? Like sendfile? Is Apache
1.x zero-copy?

> If you push data at 100Mbit, and not even at full throttle at
> that, you can't reasonably expect to see a slowdown when you
> have other bottlenecks between you and the changes.
>
> In particular, you're not going to see things like the pool
> size go up because of increased pool retention time, etc.,
> due to the overhead of doing the calculations.
>

That's probably correct, even though again I don't fully understand what
you're talking about :-).

> Also, realize that even though simply pushing data doesn't
> use up a lot of CPU on FreeBSD if you do it right, even 2%
> or 4% increase in CPU overhead overall is enough to cause
> problems for already CPU-bound applications (i.e. that's
> ~40 less SSL connections per server).
>

You're right about that too. Of course I know that at full CPU load the
clocks will be missed and maybe other things (memory bandwidth with
locked operations?) will suffer.

> --
>
> Personally, I don't consider 3 and a half million clocks to
> be peanuts, and if you ran at wire speed on the 100Mbit, I
> would expect that to become 35 million clocks on a gigabit.
>

Yes. It seems so.
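
To make the scaling explicit, here is a rough sketch of the arithmetic in C.
The packet rate (15000/s) and the per-operation costs (50 vs. 3 clocks) are
the figures from my test above; the function name and the number of counter
updates per packet are my assumptions, since the thread doesn't state how
the 3.5 million figure was derived.

```c
#include <stdint.h>

/*
 * Extra clocks per second spent on counter updates when every update
 * costs clocks_atomic instead of clocks_plain.
 * updates_per_packet is an assumed parameter, not a measured value.
 */
static uint64_t
extra_clocks_per_sec(uint64_t packets_per_sec, uint64_t updates_per_packet,
    uint64_t clocks_atomic, uint64_t clocks_plain)
{
	return (packets_per_sec * updates_per_packet *
	    (clocks_atomic - clocks_plain));
}
```

With one update per packet this gives 705000 extra clocks per second. With
an assumed five updates per packet it gives 3525000, which is close to the
3.5 million mentioned, or roughly 0.47% of one 750 MHz CPU. Ten times the
packet rate gives 35250000.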

> But we can wait for your effects on the mbuf count high
> watermark and CPU utilization values before jumping to any
> conclusions...
>

I'm afraid I can't provide any measurements with faster interfaces. I can
try to use a real server to send me some data so that it's executing on
both processors, but I would probably be limited by the 100Mbit link sooner
than I'd notice the processors having less time to do their job :(.


I forgot to repeat that the 3-clock operation (unlocked 32 bit add) is
potentially unsafe on SMP. So I was comparing apples to oranges, just to
show that even under such unfair conditions it's not such a big
problem.

THE MOST IMPORTANT QUESTION, to which lots of you probably know the answer,
is: DO WE NEED ATOMIC OPERATIONS FOR ACCESSING DIFFERENT COUNTERS (e.g.
network-device (modified in ISR? - YES/NO) or network-protocol or
filesystem ...)? NO MATTER WHAT THE SIZE OF THE COUNTER IS.

If we need atomic, we need atomic 32 bit as much as 64 bit. If we don't,
we can have cheaper 64 bit counters. My API allows for different
treatment of different classes of counters (if a simple answer to my
question exists) or of different places in the kernel (you know whether
you're called where an interrupt can occur, whether another CPU may modify
the same counter...). I ran the SMP kernel through the same test with a
"simple 64 bit add (addl,adcl)" without noticing anything going wrong, and
that sure isn't anywhere near as expensive as lock;cmpxchg8b.
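
For reference, the three variants I compared can be sketched in portable
C11 roughly as follows. This is only an illustration, not the code from my
patch: the function names are made up, and C11 atomics stand in for the
hand-written i386 instructions. On i386 a compiler would typically turn the
plain 64 bit add into an addl/adcl pair and the atomic one into a
lock;cmpxchg8b loop, but that depends on the compiler and target.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Variant 1: plain 32 bit add. Cheap, but an unlocked read-modify-write,
 * so concurrent updates from another CPU can be lost. */
static inline void
counter32_add(uint32_t *c, uint32_t n)
{
	*c += n;
}

/* Variant 2: plain 64 bit add. On a 32 bit CPU this is two instructions
 * (low word, then high word with carry), so it is not atomic either. */
static inline void
counter64_add(uint64_t *c, uint64_t n)
{
	*c += n;
}

/* Variant 3: atomic 64 bit add. Safe against other CPUs and interrupts,
 * at the price of a locked operation. */
static inline void
counter64_add_atomic(_Atomic uint64_t *c, uint64_t n)
{
	atomic_fetch_add_explicit(c, n, memory_order_relaxed);
}
```

Which variant a given counter needs is exactly the question above: if a
counter is only ever touched from one context, variant 2 would be enough.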


-- 
Michal Mertl
mime@traveller.cz







