From: "Li, Qing"
To: "Andrey Groshev"
Cc: "Li, Qing", freebsd-stable@freebsd.org
Subject: RE: upd: 7.2->8.1 & many networks trouble & flowtable
Date: Wed, 24 Nov 2010 08:21:12 -0800

You actually
haven't answered my questions.

I think you are reporting multiple issues in your original email,
which include issues in userland applications as well as kernel
issues (which may be related to flow-table being enabled).

The paper discussed quite a few topics, in the right sequence, but
if you jumped ahead or jumped around, you may make unintended
assumptions.

One of the main reasons for the flow-table enhancement, as its name
implies, is to "affinitize" TCP/UDP flows to a specific
route/interface, even when ECMP is enabled and the route entries in
the ECMP group are changing or being shuffled constantly. This
feature is especially important when an appliance is deployed, for
example, as a reverse proxy.

The other benefit of the flow-table is to reduce the L3 route table
and L2 ARP/ND6 lookups by caching the search result with the
connection. Earlier releases of FreeBSD had a field called
"inp_route" designed for this exact purpose; however, I believe it
was removed back in the FreeBSD 5.3 release.

So to summarize, the flow-table work is necessary and important,
though there may be bugs that we need to fix.

Flow-table's main benefit, as it stands currently, is for L4
connections, not for L3 forwarding.

When we were doing performance analysis through L4 connections to
measure the benefits of separating L2/L3, as noted in the paper, the
performance gain was not at the expected level. Further analysis
showed there was still lock contention due to L2 table lookups. This
was the other motivation for the flow-table work.

I have done performance evaluation at L3 for packet forwarding tests
with 100s of route entries, and I have not seen any degradation
compared with 7.2.

Recently we ran 8.1 on an i7 processor using an Avalanche testbed for
performance evaluation, and noticed that lock contention is still
very high in TCP connection setup and tear-down.
The CPU utilization is also high due to the lock contention, not due
to the flow-table feature, because it was disabled.

So before you conclude that all of the issues you are encountering
are caused by the flow-table, I urge you to describe the issues in
more detail.

Also, once you disable flow-table through sysctl, what issues are you
still running into?

Yes, I personally consider the flow-table work to be still
experimental. More work is being done as we speak. In addition, we
are considering other enhancements for the routing code.

Cheers,

-- Qing

________________________________

From: Andrey Groshev [mailto:greenx@yartv.ru]
Sent: Wed 11/24/2010 4:04 AM
To: Li, Qing
Cc: freebsd-stable@freebsd.org
Subject: Re: upd: 7.2->8.1 & many networks trouble & flowtable

24.11.2010 13:18, Li, Qing wrote:
> I am the main author of this paper you referenced in your email.
>
Hi!
I know that you also worked on this. I mentioned Kip Macy because I
found his statement regarding this issue.

> The main discussion and focus of my paper was on the design and work
> done to separate L2 and L3 for both IPv4 and IPv6 to facilitate the
> elimination of the GIANT lock in the networking subsystem, thus
> achieving high parallelism.
>
> This redesign of separately managing L2 ARP/ND6 and L3 routing tables
> already shows performance gains on multicore systems.
>
> The flow-table enhancement is just one other component, described
> towards the end of the paper. Yes, it is experimental and was
> discussed as such in the paper as well as on the mailing list.
>
I.e., you also confirm that this feature is still experimental?

> I did not know the flow-table feature was enabled by default. I
> wouldn't have done so myself.
>
Kip Macy added it to the generic kernel in head on 2009-06-14
(version 1.526). And so it happened that when RELENG_8 appeared, it
moved into the stable branch.

> So help me understand you better: are you complaining about the
> general L2/L3 separation work, or are you angry about the flow-table
> enhancement in particular?
>
> cheers,
>
> -- Qing
>
I understand the importance and necessity of these features. I'll be
glad when the feature actually does what it should. But in the
current situation, it should not be enabled by default in the generic
kernel of the stable branch.

Best regards,
Andrey Groshev.