Date:      Sun, 24 Oct 2010 19:50:26 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Lawrence Stewart <lstewart@freebsd.org>
Cc:        freebsd-net@freebsd.org, Andre Oppermann <andre@freebsd.org>, Sriram Gorti <gsriram@gmail.com>
Subject:   Re: Question on TCP reassembly counter
Message-ID:  <alpine.BSF.2.00.1010241948240.90390@fledge.watson.org>
In-Reply-To: <4CC2254C.7070104@freebsd.org>
References:  <AANLkTikWWmrnBy_DGgSsDbh6NAzWGKCWiFPnCRkwoDRi@mail.gmail.com> <4CA5D1F0.3000307@freebsd.org> <4CA9B6AC.20403@freebsd.org> <4CBB6CE9.1030009@freebsd.org> <AANLkTinvt4kCQNkf1ueDw0CFaYE9SELsBK8nR2yQKytZ@mail.gmail.com> <4CC2254C.7070104@freebsd.org>

On Sat, 23 Oct 2010, Lawrence Stewart wrote:

>> One observation though: net.inet.tcp.reass.cursegments was non-zero (it was 
>> just 1) after 30 rounds, where each round is (as earlier) 15 concurrent 
>> instances of netperf for 20s. This was on the netserver side. And it was 
>> zero before the netperf runs. On the other hand, Andre told me (in a 
>> separate mail) that this counter is not relevant anymore - so should I 
>> just ignore it?
>
> It's relevant, just not guaranteed to be 100% accurate at any given point 
> in time. The value is calculated from synchronised access to the UMA zone 
> stats plus unsynchronised access to the UMA per-CPU zone stats. The latter 
> is safe, but can leave the overall result slightly inaccurate because the 
> per-CPU data may be stale. The accuracy vs. overhead tradeoff was deemed 
> worthwhile for informational counters like this one.
>
> That being said, I would not expect the value to remain persistently at 1 
> after all TCP activity on the machine has finished. It won't affect 
> performance, but I'm curious to know whether the calculation method has a 
> flaw. I'll try to reproduce locally, but can you please confirm whether the 
> value stays at 1 even after many minutes of no TCP activity?
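
To make the calculation concrete, here is a rough userland sketch of the 
scheme Lawrence describes -- all names are invented and pthreads stands in 
for the kernel's locking primitives, so treat it as an illustration rather 
than the actual UMA code:

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NCPU 4

struct pcpu_cache {
	uint64_t allocs;	/* updated lock-free by the owning CPU */
	uint64_t frees;
};

struct zone {
	pthread_mutex_t lock;	/* protects the zone-wide counters */
	uint64_t allocs;	/* totals drained from per-CPU caches */
	uint64_t frees;
	struct pcpu_cache cache[NCPU];
};

/*
 * Compute "current items" the way a sysctl handler might: read the
 * zone-wide counters under the zone lock, then add in each per-CPU
 * cache's counters without synchronisation.  A cache being updated
 * concurrently can make the result slightly stale -- the accuracy
 * vs. overhead tradeoff discussed above.
 */
static uint64_t
zone_get_cur(struct zone *z)
{
	uint64_t nitems;
	int cpu;

	pthread_mutex_lock(&z->lock);
	nitems = z->allocs - z->frees;
	pthread_mutex_unlock(&z->lock);

	for (cpu = 0; cpu < NCPU; cpu++)
		nitems += z->cache[cpu].allocs - z->cache[cpu].frees;

	return (nitems);
}

int
main(void)
{
	struct zone z = { .lock = PTHREAD_MUTEX_INITIALIZER,
	    .allocs = 100, .frees = 90 };

	z.cache[0].allocs = 5;	/* pretend CPU 0 holds 5 live items */
	printf("cursegments-style count: %ju\n",
	    (uintmax_t)zone_get_cur(&z));
	return (0);
}

The unsynchronised loop is the cheap part, and also the part where a 
reader can observe a cache mid-update and come away with a count that is 
off by a few items.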

It's possible we should revisit the current synchronisation model for per-CPU 
caches in this regard.  We switched to soft critical sections when the P4 Xeon 
was a popular CPU line -- it had extortionately expensive atomic operations, 
even when a cache line was in the local cache.  If we were to move back to 
mutexes for per-CPU caches, then we could acquire all the locks in sequence 
and get an atomic snapshot across them all (if desired).  This isn't a hard 
technical change, but would require very careful performance evaluation.
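
As a rough sketch of what that could look like, building on the snippet 
above (again, all names invented for illustration):

struct pcpu_cache_locked {
	pthread_mutex_t lock;	/* protects this CPU's counters */
	uint64_t allocs;
	uint64_t frees;
};

struct zone_locked {
	pthread_mutex_t lock;
	uint64_t allocs;
	uint64_t frees;
	struct pcpu_cache_locked cache[NCPU];
};

/*
 * With a mutex per cache, a reader can take every lock in a fixed
 * order (zone lock, then CPU 0..NCPU-1), sum a consistent snapshot,
 * and only then release the locks.
 */
static uint64_t
zone_get_cur_snapshot(struct zone_locked *z)
{
	uint64_t nitems;
	int cpu;

	pthread_mutex_lock(&z->lock);
	for (cpu = 0; cpu < NCPU; cpu++)
		pthread_mutex_lock(&z->cache[cpu].lock);

	nitems = z->allocs - z->frees;
	for (cpu = 0; cpu < NCPU; cpu++)
		nitems += z->cache[cpu].allocs - z->cache[cpu].frees;

	for (cpu = 0; cpu < NCPU; cpu++)
		pthread_mutex_unlock(&z->cache[cpu].lock);
	pthread_mutex_unlock(&z->lock);

	return (nitems);
}

The fixed lock order is what lets concurrent readers take consistent 
snapshots without deadlocking; the cost is that every snapshot serialises 
against every CPU's allocation fast path, which is precisely the overhead 
the critical-section scheme avoids -- hence the need for careful 
performance evaluation.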

Robert


