Date:      Sat, 19 Jan 2002 01:57:56 +0100 (CET)
From:      Michal Mertl <mime@traveller.cz>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        arch@FreeBSD.ORG
Subject:   Re: 64 bit counters again
Message-ID:  <Pine.BSF.4.41.0201190128440.5921-100000@prg.traveller.cz>
In-Reply-To: <3C48A0E7.F97BC01@mindspring.com>

On Fri, 18 Jan 2002, Terry Lambert wrote:
> > > The additional locks required for i386 64 bit atomicity will,
> > > if the counter is accessed by more than one CPU, result in
> > > bus contention for inter-CPU coherency.
> >
> > What additional locks? The lock prefix for cmpxchg8b? It's required for 32
> > bit too and it increases time spent on operation from 3 to 21 clocks
> > making the difference between 32 and 64 bit "only" 29 clocks instead of
> > 47.
>
> The additional locks on PPC, SPARC, and Alpha.

Do I understand correctly that 64 bit atomic operations are significantly
more expensive, or even impossible, without a lock on these platforms? That
sounds strange for 64 bit platforms.

> The lock also is a barrier instruction.  You need to read the
> Intel programming guide on barrier instructions.  On a P4, it
<snip>

Yes. That's what I described with "(memory bandwidth with lock
operations?) will suffer".

> > > > What do you mean by that? Zero-copy operation? Like sendfile? Is Apache
> > > > 1.x zero-copy?
> > >
> > > Yes, zero copy.  Sendfile isn't ideal, but works.  Apache is
> > > not zero copy.  The idea is to not include a lot of CPU work
> > > on copies between the user space and the kernel, which aren't
> > > going to happen in an extremely optimized application.
> >
> > An "extremely optimized" application is a thing which would have
> > an administrator who doesn't enable costly counters.
>
> No. If we are talking a BSD-based embedded system, then it's just
> one written by someone who was not playing at being an engineer
> (assuming the performance requirements were there; otherwise,
> they're just an engineer who went after the low hanging fruit, and
> it's a legitimate design decision).
>

Yes, if the default mode of operation of the counters in question stays
the same as today.

I'd like to stop this thread because you're still explaining to me why 64
bit is expensive when I have already switched the subject. I'm not pushing
anyone to 64 bit.

What I'm offering now is:
 1) 64 bit atomic ops, which could probably be added easily - probably
	even to the base tree (/machine/atomic.h).
 2) An "API" for counters, which I think can help people change most/all
counter accesses to use the right operation. What's right depends on the
actual counter and/or the point in the kernel codepath where the access
occurs (I don't know what's right - you say we always need atomicity). It
can be atomic, "simple", per-cpu (this may be hard to implement, but with
inlining it should be possible) or whatever. It's everyone's decision what
to #define. I would keep the default 32 bit "simple" - same performance -
same (potential) problems.
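
To make item 2 concrete, here is a minimal userland sketch of such a
counter "API". It is not taken from the patch: CNT_MODE, CNT_ADD, CNT_READ,
cnt_t, struct if_stats and count_rx are all hypothetical names, and C11
<stdatomic.h> stands in for whatever the kernel's atomic header would
provide. The only point is that call sites stay the same while a single
#define picks the operation.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Hypothetical build-time switch; none of these names exist in FreeBSD. */
#define CNT_SIMPLE 0
#define CNT_ATOMIC 1
#ifndef CNT_MODE
#define CNT_MODE CNT_SIMPLE	/* default: 32 bit "simple", as today */
#endif

#if CNT_MODE == CNT_ATOMIC
/* 64 bit counter updated with an atomic read-modify-write. */
typedef _Atomic uint64_t cnt_t;
#define CNT_ADD(c, n) \
	((void)atomic_fetch_add_explicit(&(c), (n), memory_order_relaxed))
#define CNT_READ(c)	atomic_load_explicit(&(c), memory_order_relaxed)
#else
/* Plain 32 bit counter: cheapest, but may lose updates and wraps early. */
typedef uint32_t cnt_t;
#define CNT_ADD(c, n)	((void)((c) += (n)))
#define CNT_READ(c)	(c)
#endif

/* Hypothetical per-interface statistics using the counter type. */
struct if_stats {
	cnt_t ipackets;
	cnt_t ibytes;
};

/* A call site: identical source no matter which mode is compiled in. */
static void
count_rx(struct if_stats *s, uint32_t len)
{
	CNT_ADD(s->ipackets, 1);
	CNT_ADD(s->ibytes, len);
}
```

Building with -DCNT_MODE=CNT_ATOMIC would switch every call site to the 64
bit atomic variant; a per-cpu variant would need a third branch and is left
out of this sketch.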

I'll polish my patch to STABLE and post a link to hackers.

> > > Well, you probably should collect *all* statistics you can,
> > > in the most "this is the only thing I'm doing with the box"
> > > way you can, before and after the code change, and then plot
> > > the ones that get worse (or better) as a result of the change.
> >
> > Will do eventually, but unfortunately don't have the time to devote to it
> > at the moment.
>
> I think it's a requirement to advocate this change.

Not if it doesn't go into the standard tree, and not if the actual
functionality/performance isn't affected by default.

> > > I think the answer is "yes, we need atomic counters".  Whether they
> > > need to be 64 bit or just 32 bit is really application dependent
> > > (we have all agreed to that, I think).
> >
> > Thanks. Do you think it's always true (STABLE/CURRENT,network device
> > ISRs, /sys/netinet routines) ?
>
> I think it's true of all open-ended counters, where there is
> a risk of overflow if they are 32 bit, and some application
> could be bitten by the overflow, and still be considered to
> be "well written"... in other words, anywhere overflow is
> *expected*.

Again, I don't understand.

> > > See Bruce's posting about atomicity; I think it speaks very
> > > eloquently on the issue (much more brief than what I'd write
> > > to say the same thing ;^)).
> >
> > If you mean the email where he talks about atomic_t ("atomic_t would be
> > "int" if anything") it doesn't fully apply. I am not inventing atomic_t
> > anymore anyway :-). Isn't there a platform which works better with 64 bit
> > ints than with 32 bits (a-la 32/16 bits on modern i386)?
>
> Yes.  IA64.  SPARC 9b (SPARC64) and Alpha, which are 64
> bits, require locks, since they don't have the ability to
> do an atomic "lock; cmpxchg8b".

Can they do "lock; add const,(mem)" in 32 or 64 bit? I suppose not. I
don't care about the cmp part - we need addition. cmpxchg8b is used because
that's the only way on an x86 CPU to access (read/modify/write) 64 bit
memory in one operation (without using the FPU).
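
For illustration, this is roughly how an addition is built out of a
compare-and-exchange primitive. It is a portable C11 sketch, not the code
from my patch: add64_cas is a hypothetical name, and I am assuming that a
compiler targeting a 32 bit x86 CPU with cmpxchg8b would lower the 64 bit
compare-exchange to "lock; cmpxchg8b".

```c
#include <stdint.h>
#include <stdatomic.h>

/*
 * Hypothetical helper: atomically add v to *p using only compare-and-swap,
 * and return the new value.
 */
static uint64_t
add64_cas(_Atomic uint64_t *p, uint64_t v)
{
	/* Snapshot the current value; it may be stale by the time we swap. */
	uint64_t old = atomic_load_explicit(p, memory_order_relaxed);

	/*
	 * Try to replace 'old' with 'old + v'. If another CPU changed *p in
	 * the meantime, the exchange fails, 'old' is refreshed with the
	 * current contents of *p, and the loop retries with a new sum.
	 */
	while (!atomic_compare_exchange_weak_explicit(p, &old, old + v,
	    memory_order_relaxed, memory_order_relaxed))
		;

	return (old + v);
}
```

The compare is only there to detect a concurrent update; the operation we
actually want is the add, which is why a native atomic add is preferable
wherever the hardware has one.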

-- 
Michal Mertl
mime@traveller.cz




To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message



