Date:      Wed, 16 Jan 2002 12:06:11 +1100
From:      Peter Jeremy <peter.jeremy@alcatel.com.au>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        Michal Mertl <mime@traveller.cz>, Bosko Milekic <bmilekic@technokratis.com>, "James E. Housley" <jeh@FreeBSD.ORG>, Thomas Hurst <tom.hurst@clara.net>, arch@FreeBSD.ORG
Subject:   Re: 64 bit counters again
Message-ID:  <20020116120611.A72285@gsmx07.alcatel.com.au>
In-Reply-To: <3C4492EE.5A60AD0B@mindspring.com>; from tlambert2@mindspring.com on Tue, Jan 15, 2002 at 12:37:02PM -0800
References:  <Pine.BSF.4.41.0201141848330.82342-100000@prg.traveller.cz> <3C4492EE.5A60AD0B@mindspring.com>

On 2002-Jan-15 12:37:02 -0800, Terry Lambert <tlambert2@mindspring.com> wrote:
>Michal Mertl wrote:
>> I wrote it so it's just matter of one #define change to have all counters
>> switch to 32 or 64 bit.
>
>If it is #if'ed (e.g. "#if (sizeof(long)>=8) || FORCE_64_BIT_COUNTERS"),
>then I have no problem with the code going in, so long as it is off by
>default on 32 bit machines.
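The preprocessor cannot evaluate sizeof, so a test written exactly like the quoted #if is rejected at compile time. A minimal sketch of the same switch, assuming a hypothetical build-time knob named FORCE_64_BIT_COUNTERS and testing ULONG_MAX from <limits.h> instead of sizeof(long):

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * sizeof() is not available to #if, so compare ULONG_MAX instead.
 * FORCE_64_BIT_COUNTERS is a hypothetical opt-in for 32-bit machines;
 * it is off unless defined on the compiler command line.
 */
#if ULONG_MAX > 0xffffffffUL || defined(FORCE_64_BIT_COUNTERS)
typedef uint64_t counter_t;     /* 64-bit long, or explicitly forced */
#define COUNTER_BITS 64
#else
typedef uint32_t counter_t;     /* default on 32-bit (ILP32) machines */
#define COUNTER_BITS 32
#endif

/* Catch a mismatch between the typedef and the advertised width. */
static_assert(sizeof(counter_t) * CHAR_BIT == COUNTER_BITS,
              "counter_t width does not match COUNTER_BITS");
```

Building with -DFORCE_64_BIT_COUNTERS would then select 64-bit counters on a 32-bit machine, while the default there stays 32 bits.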

I tend to agree with this point.  Most people probably don't care
about "bytes since reboot" and average bandwidth calculations don't
need more than 32 bits for most installations.  (Gigabit Ethernet
interfaces running close to wire speed on IA32 are not a common
configuration, and I doubt they ever will be.)

>> I kind of measured the speed of just different addition implementation
>> (DUAL pIII cumine, ServerWorks and 440BX dual boards with SMP kernel) -
>[ ... ]

>> It seems you and others were right. SMP atomic implementations are a bit
>> expensive. But even the worst case 50 clocks for locked cmpxchg8b isn't
>> that bad, but YMMV.
>
>Uh, that's a factor of 16 in the cases that I think most of
>us will care about, so I think MMDV.

Comparing an unlocked 32-bit operation to a locked 64-bit SMP
operation is unrealistic.  Like for like, the ratio is either ~2.7:1
(simple) or ~2.4:1 (locked), and neither is outrageous.  You also
need to consider the impact of this change on overall processing
time: I suspect the extra ~30 cycles for the locked case is still
only a small fraction of the total.
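For reference, the two 64-bit variants being compared can be sketched with C11 <stdatomic.h>. This is purely illustrative (kernel code would use its own primitives), and the function names are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Plain 64-bit add: cheap, but unsafe if two CPUs update concurrently. */
static void counter_add_plain(uint64_t *c, uint32_t n)
{
    *c += n;
}

/*
 * Locked 64-bit add.  On 32-bit x86 (i586 and later) compilers typically
 * emit a lock cmpxchg8b retry loop for this; on 64-bit x86 it is
 * typically a single locked add instruction.
 */
static void counter_add_atomic(_Atomic uint64_t *c, uint32_t n)
{
    atomic_fetch_add_explicit(c, n, memory_order_relaxed);
}
```

The exact instruction sequence depends on the compiler, target and flags; the comments describe what is typical, not guaranteed.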

Note that in the uniprocessor (UP) case, you don't need atomic ops to
access or update a multi-word object.  The only requirements are that
read-modify-write (RMW) primitives are used and that the words are
always updated in the same order (logically, LSW to MSW), i.e.
addl change,(mem); adcl $0,4(mem).

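For the update case, the sequence "addl, interrupt, addl, adcl, iret,
adcl" is safe.  If the first addl carried, the counter value in memory
will be off by 2^32 during the interrupt, but that doesn't matter to
the interrupt handler, and it is corrected by the second adcl: the
carry flag survives the interrupt because EFLAGS is saved on entry and
restored by iret.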
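The interleaving can be replayed deterministically to check this. The sketch below (hypothetical helper names) splits an update into its addl and adcl halves, runs a complete handler update in between, and returns what the handler would see in memory at that point:

```c
#include <assert.h>
#include <stdint.h>

struct counter64 { uint32_t lo, hi; };

static uint64_t value(const struct counter64 *c)
{
    return ((uint64_t)c->hi << 32) | c->lo;
}

/* First half of an update, like addl change,(mem).  Returns the carry. */
static uint32_t add_lo(struct counter64 *c, uint32_t change)
{
    uint32_t old = c->lo;

    c->lo = old + change;
    return c->lo < old;
}

/* Second half, like adcl $0,4(mem), using the carry from add_lo(). */
static void add_hi(struct counter64 *c, uint32_t carry)
{
    c->hi += carry;
}

/*
 * Replay "addl, interrupt, addl, adcl, iret, adcl".  saved_carry models
 * the carry flag, which the CPU preserves across the interrupt.
 * Returns the counter value visible to the handler mid-update.
 */
static uint64_t replay_interrupted_update(struct counter64 *c,
                                          uint32_t outer, uint32_t inner)
{
    uint32_t saved_carry = add_lo(c, outer);    /* interrupted addl */
    uint64_t seen = value(c);                   /* handler's view */

    add_hi(c, add_lo(c, inner));                /* handler: addl; adcl */
    add_hi(c, saved_carry);                     /* after iret: adcl */
    return seen;
}
```

Starting from 0xffffffff, an outer add of 1 and a handler add of 5 leave the handler looking at 0 (2^32 too low), while the final value is 0x100000005, as if the two updates had run back to back.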

For the read case, the reader uses something like:

loop:	movl	4(mem),%edx	# read MSW
	movl	(mem),%eax	# read LSW
	cmpl	4(mem),%edx	# has the MSW changed since?
	jnz	loop		# yes: read both words again

If an interrupt updates the MSW, you take another pass around the
loop; otherwise, you always read the correct value.
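The same read loop in C, as a sketch for the uniprocessor case only: volatile forces each word to be re-read from memory on every pass, but it is not a memory barrier and says nothing about ordering as seen from another CPU. The names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

struct counter64 {
    volatile uint32_t lo;   /* LSW at (mem) */
    volatile uint32_t hi;   /* MSW at 4(mem) */
};

/*
 * Read the MSW, then the LSW, then re-read the MSW; retry if the MSW
 * changed in between, mirroring the assembly loop line for line.
 */
static uint64_t counter64_read(const struct counter64 *c)
{
    uint32_t hi, lo;

    do {
        hi = c->hi;             /* movl 4(mem),%edx */
        lo = c->lo;             /* movl (mem),%eax */
    } while (hi != c->hi);      /* cmpl 4(mem),%edx; jnz loop */

    return ((uint64_t)hi << 32) | lo;
}
```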

For the SMP case, you either need to use locks or you need to use
per-CPU counters.  (And the per-CPU counters can be read by another
CPU using the above trick).
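A minimal sketch of per-CPU counters, assuming a hypothetical fixed NCPU, a 64-byte cache line, and a caller that stays on one CPU while updating. On a 32-bit machine, reading another CPU's 64-bit slot would use the MSW/LSW/MSW loop above (and, strictly, memory barriers), which this sketch omits:

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4      /* hypothetical CPU count for the sketch */

/*
 * One counter per CPU, padded to an assumed 64-byte cache line so that
 * CPUs updating their own slots do not share a line.
 */
struct pcpu_counter {
    uint64_t count;
    char pad[64 - sizeof(uint64_t)];
};

static struct pcpu_counter counters[NCPU];

/* Update path: only CPU 'cpu' writes its slot, so no lock is needed. */
static void pcpu_counter_add(unsigned cpu, uint32_t n)
{
    assert(cpu < NCPU);
    counters[cpu].count += n;
}

/* Read path: sum all slots.  Concurrent updates make this a snapshot. */
static uint64_t pcpu_counter_total(void)
{
    uint64_t total = 0;

    for (unsigned i = 0; i < NCPU; i++)
        total += counters[i].count;
    return total;
}
```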

>FWIW, HP is selling 10Gbit parts now; you can buy the cards
>from them online today, if you wanted to.  It is very likely
>that we will see IA64 based networking hardware to support
>multiple ports on these things.  10 ports is 100Gbit.  That
>puts us within one order of magnitude of terabit.

IA64 is 64-bit.  The current argument is about 64-bit operations
on a 32-bit machine.  Multi-port 10Gbit cards are going to need
a re-designed bus as well.

Peter
