From: Sriram Gorti <gsriram@gmail.com>
To: freebsd-net@freebsd.org
Date: Fri, 1 Oct 2010 15:31:29 +0530
Subject: Question on TCP reassembly counter

Hi,

The following is an observation from testing our XLR/XLS network driver with 16 concurrent instances of netperf on FreeBSD-CURRENT. Based on this observation, I have a question on which I hope to get some understanding here.

When running 16 concurrent netperf instances (each for about 20 seconds), we found that after some number of runs performance degraded badly (almost by a factor of 5), and all subsequent runs stayed that way. We started debugging this from the TCP side, as other driver tests were doing fine for comparably long durations on the same board and software. netstat indicated the following:

$ netstat -s -f inet -p tcp | grep discarded
        0 discarded for bad checksums
        0 discarded for bad header offset fields
        0 discarded because packet too short
        7318 discarded due to memory problems

We then traced the "discarded due to memory problems" to the following counter:

$ sysctl -a net.inet.tcp.reass
net.inet.tcp.reass.overflows: 7318
net.inet.tcp.reass.maxqlen: 48
net.inet.tcp.reass.cursegments: 1594   <--- corresponds to the V_tcp_reass_qsize variable
net.inet.tcp.reass.maxsegments: 1600

Our guess for why reassembly is needed at all (in this low-packet-loss test setup) was the lack of per-flow classification in the driver, which causes it to spray incoming packets across the 16 h/w cpus instead of sending all packets of a flow to the same cpu. While we work on addressing this driver limitation, we debugged further to see how/why V_tcp_reass_qsize grew (expecting that the out-of-order segment count should have dropped to zero at the end of each run).
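(For illustration of what "per-flow classification" means here: hash the TCP/IP 4-tuple of each incoming segment so every packet of a connection is steered to the same CPU/queue and stays in order. A minimal software sketch follows; all names are hypothetical and this is not taken from the XLR/XLS driver, which would normally use the NIC's RSS/Toeplitz hash instead.)

#include <stdint.h>

/*
 * Illustrative only: pick a receive CPU from the TCP/IP 4-tuple so that all
 * segments of one flow are processed on the same CPU.  Hypothetical names;
 * a production driver would use the hardware RSS hash rather than this.
 */
static inline uint32_t
flow_to_cpu(uint32_t src_ip, uint32_t dst_ip, uint16_t src_port,
    uint16_t dst_port, uint32_t ncpus)
{
        uint32_t h;

        h  = src_ip ^ dst_ip;
        h ^= ((uint32_t)src_port << 16) | dst_port;
        h ^= h >> 16;           /* fold high bits into the low bits */
        h *= 0x9e3779b1;        /* arbitrary odd constant to spread bits */
        h ^= h >> 13;

        return (h % ncpus);     /* same flow -> same CPU, every time */
}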
It was seen that this counter was actually growing from the initial runs onward, but the performance degradation only showed up once it got close to maxsegments. We then also looked at vmstat to see how many of the reassembly segments were being lost, but no segments were lost. We could not reconcile "no lost segments" with "growth of this counter across test runs".

$ sysctl net.inet.tcp.reass ; vmstat -z | egrep "FREE|mbuf|tcpre"
net.inet.tcp.reass.overflows: 0
net.inet.tcp.reass.maxqlen: 48
net.inet.tcp.reass.cursegments: 147
net.inet.tcp.reass.maxsegments: 1600
ITEM                     SIZE     LIMIT      USED      FREE       REQ  FAIL  SLEEP
mbuf_packet:              256,        0,     4096,     3200,  5653833,    0,     0
mbuf:                     256,        0,        1,     2048,  4766910,    0,     0
mbuf_cluster:            2048,    25600,     7296,        6,     7297,    0,     0
mbuf_jumbo_page:         4096,    12800,        0,        0,        0,    0,     0
mbuf_jumbo_9k:           9216,     6400,        0,        0,        0,    0,     0
mbuf_jumbo_16k:         16384,     3200,        0,        0,        0,    0,     0
mbuf_ext_refcnt:            4,        0,        0,        0,        0,    0,     0
tcpreass:                  20,     1690,        0,      845,  1757074,    0,     0

In view of these observations, my question is: is it possible for the V_tcp_reass_qsize variable to be updated unsafely on SMP? (The particular flavor of XLS used in the test had 4 cores with 4 h/w threads per core.) I see that the tcp_reass function assumes some lock is held, but I am not sure whether it is the per-socket lock or the global TCP lock.

Any inputs on what I missed are most welcome.

Thanks,

Sriram Gorti
Netlogic Microsystems
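(To make the SMP question concrete, here is a minimal sketch, with hypothetical names and not the actual tcp_reass() code, of how a plain counter update can lose increments when two CPUs race, and how FreeBSD's atomic_add_int(9) avoids that:)

#include <sys/types.h>
#include <machine/atomic.h>     /* atomic_add_int() and friends */

static u_int reass_qsize;       /* stand-in for V_tcp_reass_qsize */

/*
 * Not SMP-safe unless every caller holds one common lock: the increment is
 * a load/add/store sequence, so two CPUs updating concurrently can read the
 * same old value and one update is lost, letting the counter drift away
 * from the real number of queued segments over many runs.
 */
static void
qsize_inc_racy(void)
{
        reass_qsize++;
}

/*
 * SMP-safe regardless of which locks the callers hold: the add is performed
 * as a single atomic read-modify-write on the shared memory location.
 */
static void
qsize_inc_atomic(void)
{
        atomic_add_int(&reass_qsize, 1);
}

(Whether such concurrent updates can actually happen in tcp_reass() depends on whether its callers serialize on a common lock; a per-socket lock would not be common across connections, which is exactly the question raised above.)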