Date:      Wed, 28 Nov 2001 22:48:05 -0800
From:      Peter Wemm <peter@wemm.org>
To:        Luigi Rizzo <luigi@FreeBSD.org>
Cc:        cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject:   Re: cvs commit: src/sys/pci if_sis.c 
Message-ID:  <20011129064805.C37793808@overcee.netplex.com.au>
In-Reply-To: <20011128141510.A13586@iguana.aciri.org> 

Luigi Rizzo wrote:
> > While this helps things like packet forwarding, it hurts things like
> 
> and generic servers (web, proxies) where things are done in
> userland and the content is opaque and the only unaligned
> accesses are for the IP/TCP headers (but those are touched
> already in the packet forwarding case).
> 
> > NFS which now have to do lots and lots of unaligned accesses.
> 
> I would actually like to see some numbers showing that this is the
> case.  Where else these unaligned accesses could be other than in
> creating the NFS/RPC headers ? Do a bunch of unaligned accesses
> really cost more than a memory-to-memory copy of 1500 bytes ?

Even just the IP, TCP and UDP header processing is affected.

> > Have you benchmarked anything else besides packet forwarding?
> 
> no, how would you benchmark this (that is without hitting a
> bottleneck elsewhere in the system) ?

You don't need to hit the wall. Supply a constant stream of requests and
measure the CPU time used in interrupt or system mode.

To show that unaligned accesses do have a measurable effect:
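#include <stdlib.h>

/* Build with -DOFF=0..3 to pick the byte offset of the first int read. */
char buf[100000*4];

int
main(void)
{
        int i;
        int j;
        int n;
        int *p;

        j = 0;
        for (n = 0; n < 10000; n++) {
                /* an OFF of 1..3 makes every load below unaligned */
                p = (int *)&buf[OFF];
                for (i = 0; i < 99999; i++)
                        j += *p++;
        }
        /* use j so the loads are not optimized away */
        exit(j);
}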

On an AthlonMP (SMP kernel, SMP is running, my X11 desktop):
peter@daintree[10:19pm]~-192> cc -O2 -DOFF=0 -o b0 b.c
peter@daintree[10:19pm]~-193> cc -O2 -DOFF=1 -o b1 b.c
peter@daintree[10:19pm]~-194> cc -O2 -DOFF=2 -o b2 b.c
peter@daintree[10:19pm]~-195> cc -O2 -DOFF=3 -o b3 b.c
peter@daintree[10:19pm]~-196> set time
peter@daintree[10:20pm]~-198> ./b0 ; ./b1 ; ./b0 ; ./b2 ; ./b0 ; ./b3
8.876u 0.023s 0:08.97 99.1%     5+671k 0+0io 0pf+0w
9.154u 0.007s 0:09.23 99.1%     5+671k 0+0io 0pf+0w
8.901u 0.000s 0:08.97 99.2%     5+671k 0+0io 0pf+0w
9.157u 0.007s 0:09.23 99.1%     5+671k 0+0io 0pf+0w
8.883u 0.015s 0:08.96 99.2%     5+670k 0+0io 0pf+0w
9.147u 0.015s 0:09.22 99.2%     5+671k 0+0io 0pf+0w

On a Pentium4:
peter@pentium4[10:25pm]/home/tmp-11> ./b0 ; ./b1 ; ./b0 ; ./b2 ; ./b0 ; ./b3
3.229u 0.000s 0:03.23 100.0%    5+673k 0+0io 0pf+0w
4.464u 0.000s 0:04.46 100.0%    5+672k 0+0io 0pf+0w
3.236u 0.000s 0:03.23 100.0%    5+671k 0+0io 0pf+0w
4.464u 0.000s 0:04.46 100.0%    5+672k 0+0io 0pf+0w
3.235u 0.000s 0:03.23 100.0%    5+671k 0+0io 0pf+0w
4.464u 0.000s 0:04.46 100.0%    5+670k 0+0io 0pf+0w

On a Pentium3 (coppermine):
> ./b0 ; ./b1 ; ./b0 ; ./b2 ; ./b0 ; ./b3
14.710u 0.000s 0:14.71 100.0%   5+671k 0+0io 0pf+0w
14.728u 0.000s 0:14.73 99.9%    5+671k 0+0io 0pf+0w
14.718u 0.000s 0:14.71 100.0%   5+671k 0+0io 0pf+0w
14.720u 0.007s 0:14.73 99.9%    5+671k 0+0io 0pf+0w
14.718u 0.000s 0:14.71 100.0%   5+671k 0+0io 0pf+0w
14.735u 0.000s 0:14.73 100.0%   5+670k 0+0io 0pf+0w

On a Pentium Pro (200MHz, I reduced the outer loop from 10000 to 1000):
> ./b0 ; ./b1 ; ./b0 ; ./b2 ; ./b0 ; ./b3
3.624u 0.007s 0:03.65 99.1%     5+677k 0+0io 0pf+0w
3.673u 0.007s 0:03.68 99.7%     5+673k 0+0io 0pf+0w
3.623u 0.015s 0:03.65 99.4%     5+674k 0+0io 0pf+0w
3.663u 0.007s 0:03.68 99.4%     5+671k 0+0io 0pf+0w
3.639u 0.007s 0:03.65 99.4%     5+674k 0+0io 0pf+0w
3.684u 0.000s 0:03.69 99.7%     5+673k 0+0io 0pf+0w

The most spectacular sufferer of unaligned accesses is the Pentium 4, which
takes ~38% longer (4.46s vs 3.23s) to do unaligned accesses.  The AthlonMP
takes about 3% longer, the Pentium Pro about 1%, and the Pentium III barely
differs.  I suspect the effect on writes is going to be more pronounced,
especially on systems with ECC that have to do read/merge/write for every
unaligned write.

> > >   Right now the new behaviour is controlled by a sysctl variable,
> > >   hw.sis_quick which defaults to 1 (on), you can set it to 0 to
> ...
> > 
> > Please do not remove this yet.
> 
> no problem. It will actually be useful to tell people who have
> a reasonable testbed to toggle this and see if it makes a difference.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5
