Date: Fri, 18 Jan 2002 15:43:47 +0100 (CET)
From: Michal Mertl
To: Terry Lambert
Cc: arch@FreeBSD.ORG
Subject: Re: 64 bit counters again
In-Reply-To: <3C47E1B2.6938136@mindspring.com>

On Fri, 18 Jan 2002, Terry Lambert wrote:

> Michal Mertl wrote:
> > > 4) Measure CPU overhead as well as I/O overhead.
> >
> > I don't know what you mean by I/O overhead here.
>
> Say you could flood a gigabit interface, and it was 6% of the
> CPU on average.  Now after your patches, suppose that it's 10%
> of the CPU.  The limiting factor is the interface... but that's
> only for your application, which is not doing CPU intensive
> processing.  Something that did a lot of CPU work (like SSL),
> would have a different profile, and you would be limiting the
> application by causing it to become CPU bound earlier.
>

That explains only the CPU overhead, which I already knew existed.

> > > 6) Use an SMP system, make sure that you have a sender
> > >    on both CPUs, and measure TLB shootdown and page
> > >    mapping turnover to ensure you get that overhead in
> > >    there, too (plus the lock overhead).
> >
> > I'm afraid I don't understand.
> > I don't see that deep into the kernel,
> > unfortunately. If you tell me what to look at and how...
>
> The additional locks required for i386 64 bit atomicity will,
> if the counter is accessed by more than one CPU, result in
> bus contention for inter-CPU coherency.
>

What additional locks? The lock prefix for cmpxchg8b? It's required for
32 bit too, and it increases the time spent on the operation from 3 to 21
clocks, making the difference between 32 and 64 bit "only" 29 clocks
instead of 47.

> > > 7) Make sure you are sending data already in the kernel,
> > >    so you aren't including copy overhead in the CPU cost,
> > >    since practically no one implements servers with copy
> > >    overhead these days.
> >
> > What do you mean by that? Zero-copy operation? Like sendfile? Is Apache
> > 1.x zero-copy?
>
> Yes, zero copy.  Sendfile isn't ideal, but works.  Apache is
> not zero copy.  The idea is to not include a lot of CPU work
> on copies between the user space and the kernel, which aren't
> going to happen in an extremely optimized application.
>

An "extremely optimized" application is the kind of thing whose
administrator wouldn't enable costly counters.

> > > If you push data at 100Mbit, and not even at full throttle at
> > > that, you can't reasonably expect to see a slowdown when you
> > > have other bottlenecks between you and the changes.
> > >
> > > In particular, you're not going to see things like the pool
> > > size go up because of increased pool retention time, etc.,
> > > due to the overhead of doing the calculations.
> >
> > That's probably correct even though I again don't fully understand what
> > you're talking about :-).
>
> Look at the max number of mbufs allocated.  They form a pool
> of type stable memory from which mbufs are allocated (things

Thanks.

> > > Also, realize that even though simply pushing data doesn't
> > > use up a lot of CPU on FreeBSD if you do it right, even 2%
> > > or 4% increase in CPU overhead overall is enough to cause
> > > problems for already CPU-bound applications (i.e. that's
> > > ~40 less SSL connections per server).
> >
> > You're right with that too. Of course I know that at full CPU load the
> > clocks will be missing and maybe other things (memory bandwidth with
> > locked operations?) will suffer.
>
> Yes.  It's important to know whether it is significant for
> the bottleneck figure of merit for a particular application.
>
> For SSL, this is CPU cycles.  For an NFS server, this is how
> much data it can push in a given period of time (overall
> throughput).  For some other application, it's some other
> number.

Agreed.

> > > But we can wait for your effects on the mbuf count high
> > > watermark and CPU utilization values before jumping to any
> > > conclusions...
> >
> > I'm afraid I can't provide any measurement with faster interfaces. I can
> > try to use a real server to send me some data so it's executing on both
> > processors, but I would probably become limited by 100Mbit sooner than
> > I'll notice processors have less time to do their job :(.
>
> Well, you probably should collect *all* statistics you can,
> in the most "this is the only thing I'm doing with the box"
> way you can, before and after the code change, and then plot
> the ones that get worse (or better) as a result of the change.

Will do eventually, but unfortunately I don't have the time to devote to
it at the moment.

> > THE MOST IMPORTANT QUESTION, to which lots of you probably know the answer,
> > is: DO WE NEED ATOMIC OPERATIONS FOR ACCESSING DIFFERENT COUNTERS (e.g.
> > network-device (modified in ISR? - YES/NO) or network-protocol or
> > filesystem ...)? NO MATTER WHAT THE SIZE OF THE COUNTER IS.
> >
>
> I think the answer is "yes, we need atomic counters".
> Whether they
> need to be 64 bit or just 32 bit is really application dependent
> (we have all agreed to that, I think).

Thanks. Do you think that's always true (STABLE/CURRENT, network device
ISRs, /sys/netinet routines)?

> See Bruce's posting about atomicity; I think it speaks very
> eloquently on the issue (much more brief than what I'd write
> to say the same thing ;^)).

If you mean the email where he talks about atomic_t ("atomic_t would be
"int" if anything"), it doesn't fully apply. I am not inventing atomic_t
anymore anyway :-). Isn't there a platform which works better with 64 bit
ints than with 32 bit ones (a la 32/16 bits on modern i386)?

-- 
Michal Mertl
mime@traveller.cz
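
For readers following the cmpxchg8b discussion, here is a minimal sketch in
C of what a 64 bit atomic counter update looks like when written as a
compare-and-swap retry loop, which is the shape a cmpxchg8b-based version
takes on i386. It uses C11 <stdatomic.h>; the names counter64,
counter64_add and counter64_read are made up for this illustration and are
not FreeBSD kernel interfaces. Whether the compiler really emits a locked
cmpxchg8b depends on the target and compiler flags, and some 32 bit
toolchains may need -latomic, so treat this as an illustration of the idea
only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter type for illustration; not a kernel API. */
typedef struct {
    _Atomic uint64_t value;
} counter64;

/*
 * Add delta to the counter using a compare-and-swap retry loop.
 * If another CPU changes the counter between our load and our swap,
 * the swap fails, 'old' is refreshed with the current value, and we
 * simply retry with the new sum.
 */
static void
counter64_add(counter64 *c, uint64_t delta)
{
    assert(c != NULL);
    uint64_t old = atomic_load_explicit(&c->value, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(&c->value, &old,
        old + delta, memory_order_relaxed, memory_order_relaxed)) {
        /* 'old' now holds the latest value; loop and try again. */
    }
}

/* Read the whole 64 bit value in one atomic load (no torn halves). */
static uint64_t
counter64_read(counter64 *c)
{
    assert(c != NULL);
    return atomic_load_explicit(&c->value, memory_order_relaxed);
}
```

A counter initialised to 0xFFFFFFFF and bumped once with
counter64_add(&c, 1) should read back as 4294967296, i.e. the carry into
the upper 32 bits is never observed half done. Relaxed ordering is used
because a statistics counter only needs the update itself to be atomic; it
does not need to order other memory accesses.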