From owner-freebsd-net@freebsd.org Sun Nov 22 05:29:34 2015
From: bugzilla-noreply@freebsd.org
To: freebsd-net@FreeBSD.org
Subject: [Bug 203630] [Hyper-V] [nat] [tcp] 10.2 NAT bug in TCP stack or hyperv netsvc driver
Date: Sun, 22 Nov 2015 05:29:33 +0000
List-Id: Networking and TCP/IP with FreeBSD
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.2-RELEASE
X-Bugzilla-Keywords: patch
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: emulation@FreeBSD.org
X-Bugzilla-Flags: maintainer-feedback? mfc-stable9? mfc-stable10?

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203630

--- Comment #17 from Wei Hu ---
The fix went into head as r291156. I will merge it to the stable/10 branch in a week.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-net@freebsd.org Sun Nov 22 12:06:15 2015
From: Daniel Bilik
To: Kristof Provost
Cc: freebsd-net@freebsd.org
Subject: Re: Outgoing packets being sent via wrong interface
Date: Sun, 22 Nov 2015 13:02:40 +0100
Message-Id: <20151122130240.165a50286cbaa9288ffc063b@neosystem.cz>
In-Reply-To: <20151121212043.GC2307@vega.codepro.be>
References: <20151120155511.5fb0f3b07228a0c829fa223f@neosystem.org> <20151120163431.3449a473db9de23576d3a4b4@neosystem.org> <20151121212043.GC2307@vega.codepro.be>

On Sat, 21 Nov 2015 22:20:43 +0100
Kristof Provost wrote:

>> Sure, pf.conf attached.
> Thanks. As a first guess, I think the origin of the problem might be
> related to the double nat rule you've got.

Well, even though pf may play some role in the problem, I tend to suspect
the routing table as the main trigger. Several facts support this:

1. After a reboot, the router runs fine, even with this "double nat" rule.
2. This "double nat" rule was also present on the router when it was
   running 9-stable, and it worked flawlessly for years.
3. When the problems start, there have already been one or more "hits" to
   the routing table (by the previously mentioned cron task that updates
   the default route to keep connectivity), i.e. touching the routing
   table does not always trigger the problems, but they only ever start
   after it has been touched.
4. It seems that touching the routing table can also "cure" the problem.
   Last week I noticed the router was unable to make TCP connections to
   one host over the VPN. Same problem: it was pushing packets via re0
   instead of tap0. Yesterday I found the problem was gone, without any
   reboot or other intervention, and, surprise... there was a short
   connectivity problem at the beginning of this week, so the default
   route was changed twice.

> I don't have the time to dig into this right away. Could you create a PR
> and cc me to it?

Created, bug id 204735.
Thank you.

-- 
Dan

From owner-freebsd-net@freebsd.org Sun Nov 22 12:51:43 2015
From: bugzilla-noreply@freebsd.org
To: freebsd-net@FreeBSD.org
Subject: [Bug 204735] [net] Outgoing packets being sent via wrong interface
Date: Sun, 22 Nov 2015 12:51:43 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.0-STABLE
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204735

Mark Linimon changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|freebsd-bugs@FreeBSD.org    |freebsd-net@FreeBSD.org

-- 
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-net@freebsd.org Mon Nov 23 06:13:50 2015
From: Hongjiang Zhang
To: "freebsd-net@freebsd.org"
Subject: Is it allowed to copy hyper-v drivers from FreeBSD 10 and packed it into FreeBSD 9.2
Date: Mon, 23 Nov 2015 05:58:04 +0000

Hi,

Some people who used FreeBSD 9.2 with the Hyper-V network driver back-ported from FreeBSD 10 encountered a network issue. They installed 2 VMs (FreeBSD 9.2 with the customized FreeBSD kernel) on Azure. The network went offline very soon when a big file (~320 MB) was copied from one VM to another through "scp". If TSO is disabled through "sysctl -w net.inet.tcp.tso=0", the issue is alleviated but not eliminated. I have not figured out why.

I have checked the release notes of FreeBSD 9.2/9.3/10, but did not find anything that would block the back-port. My assumption is that 9.2 can take the back-ported Hyper-V drivers from 10. Is this assumption correct?
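For readers following the TSO observation above: one hypothesis, raised in Adrian Chadd's reply later in this thread, is a per-packet mbuf limit. The short Python sketch below illustrates the arithmetic under stated assumptions. With TSO enabled, the stack may hand the driver a single TCP burst of up to IP_MAXPACKET (65535) bytes; with TSO disabled, each packet carries at most one MSS of payload. The sketch counts the best-case number of 2048-byte mbuf clusters (MCLBYTES) for each payload, plus one assumed header mbuf, and compares the total with HYPOTHETICAL_MAX_SEGS, a limit invented purely for illustration. None of these numbers come from the Hyper-V driver, and real mbuf chains are often less densely packed, so actual counts can be higher.

```python
# Illustrative constants. MCLBYTES and IP_MAXPACKET match FreeBSD's values;
# the header size assumes IPv4 + TCP without options. The segment limit is
# invented for this example and is NOT the Hyper-V driver's real limit.
MCLBYTES = 2048                   # size of a standard FreeBSD mbuf cluster
IP_MAXPACKET = 65535              # largest IPv4 datagram, the usual TSO ceiling
HDR_LEN = 20 + 20                 # option-free IPv4 header + TCP header
ETHER_MTU = 1500                  # typical Ethernet MTU without TSO
HYPOTHETICAL_MAX_SEGS = 32        # assumed per-packet scatter/gather limit

MSS_PAYLOAD = ETHER_MTU - HDR_LEN        # 1460 bytes per non-TSO segment
TSO_PAYLOAD = IP_MAXPACKET - HDR_LEN     # 65495 bytes in one maximal TSO burst


def segments_needed(payload_len: int, cluster_size: int = MCLBYTES,
                    header_mbufs: int = 1) -> int:
    """Best-case mbuf count for one packet: every data cluster is filled
    completely, plus an assumed separate mbuf holding the headers."""
    data_clusters = (payload_len + cluster_size - 1) // cluster_size
    return data_clusters + header_mbufs


def fits(payload_len: int, max_segs: int = HYPOTHETICAL_MAX_SEGS) -> bool:
    """True if the best-case chain stays within the (hypothetical) limit."""
    return segments_needed(payload_len) <= max_segs


if __name__ == "__main__":
    for label, size in (("TSO off, one full-MSS segment", MSS_PAYLOAD),
                        ("TSO on, maximum-size burst", TSO_PAYLOAD)):
        print(f"{label}: {size} payload bytes -> "
              f"{segments_needed(size)} mbufs, "
              f"fits within {HYPOTHETICAL_MAX_SEGS}: {fits(size)}")
```

With these assumed numbers, a full 1460-byte segment needs 2 mbufs, while the largest TSO burst (65495 payload bytes) needs 33, one more than the invented limit. A driver that hits such a limit generally has to copy the chain into fewer, larger buffers (FreeBSD provides m_defrag() for this) or drop the packet, which is one plausible way a TSO-related problem could show up only under bulk transfers like scp.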
From owner-freebsd-net@freebsd.org Mon Nov 23 12:52:00 2015
From: bugzilla-noreply@freebsd.org
To: freebsd-net@FreeBSD.org
Subject: [Bug 204437] 10.2 STABLE Crashing with IPSec Support
Date: Mon, 23 Nov 2015 12:52:00 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.2-STABLE
X-Bugzilla-Keywords: crash, needs-qa
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org
X-Bugzilla-Flags: mfc-stable9? mfc-stable10?

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204437

--- Comment #16 from Cassiano Peixoto ---
(In reply to emeric.poupon from comment #15)
I'll try it now and let you know the results. Thanks.

-- 
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-net@freebsd.org Mon Nov 23 14:19:14 2015
From: "ShangKun Co.,Ltd."
To: freebsd-net@freebsd.org
Subject: Repeater Supplier-China
Date: Mon, 23 Nov 2015 21:26:30 +0800 (CST)

Dear Manager, 
How are you?
We specialize in repeater for 10 years, with mature one-stop.


Also we have our own professional designers to meet any of your requirements. For sent us more detail requirement,we will supply best price for you.


Should you have any questions, call me, let's talk details. 


Sincerely,
Andy/Sales Manager.
--

******************************************************************
TEL：0086-0579-85118359
FAX：0086-0579-85118359
SKYPE：zjshangkun
Email:zjshangkun@gmail.com
Add：No.699 Chouzhou North Road ,YiWu City,322000,ZheJiang Province, China.
******************************************************************

From owner-freebsd-net@freebsd.org Mon Nov 23 15:14:18 2015
From: bugzilla-noreply@freebsd.org
To: freebsd-net@FreeBSD.org
Subject: [Bug 201694] 10.2-BETA2 crashing when killing VIMAGE/VNET jails
Date: Mon, 23 Nov 2015 15:14:17 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.2-RELEASE
X-Bugzilla-Keywords: crash, needs-qa
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Status: Open
X-Bugzilla-Priority: Normal
X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org
X-Bugzilla-Flags: mfc-stable10?

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=201694

Kubilay Kocak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|10.2-BETA1                  |10.2-RELEASE
           Severity|Affects Only Me             |Affects Some People
           Assignee|freebsd-jail@FreeBSD.org    |freebsd-net@FreeBSD.org
              Flags|                            |mfc-stable10?
                 CC|                            |koobs@FreeBSD.org
             Status|New                         |Open
           Keywords|                            |crash, needs-qa
           Priority|---                         |Normal

--- Comment #4 from Kubilay Kocak ---
Bartek / Paul,

To get this issue the attention it needs, I'd appreciate it if you could both provide:

* Updated backtraces for this panic on the latest 10.2-RELEASE / CURRENT (for extra debugging)
* Steps to reproduce. The summary mentions a crash on 'killing' jails. What steps exactly?
* A reproduction case and system configuration isolated/reduced as much as possible (kernel, ifconfig, whatever)
* Hardware (and virtualization if applicable) details. dmesg.boot should be fine for now

Note: Please use attachments for any large outputs to keep the conversation clear and easy to follow.

-- 
You are receiving this mail because:
You are the assignee for the bug.
From owner-freebsd-net@freebsd.org Mon Nov 23 15:30:34 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DDA72A33D54 for ; Mon, 23 Nov 2015 15:30:34 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-ig0-x232.google.com (mail-ig0-x232.google.com [IPv6:2607:f8b0:4001:c05::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A12681BCD for ; Mon, 23 Nov 2015 15:30:34 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: by igcto18 with SMTP id to18so58509182igc.0 for ; Mon, 23 Nov 2015 07:30:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=1QtS485PSWFGE9oTiDQx3xYmFXehBrv5V+OsVQV1TKg=; b=cS0/BTr3wsg0N8o9JXOQjP4PxZiIeVp00wJ03Z7YtDuz638Rgjf7fVsAiALc8M/RuQ eujudjZvZCB5t6WspAayWFk3L3MiF+RIaAeroDLTKYZEVfRbOBHSzqr6NtzRnau3p9pp dtzbHb0pHTiZk2RGxjLqdYPgX3M1C+SSS/5jBpBsM8D/Cr3Emm48xznRfhBg+7jRwF39 YAIY4Ud24CMj8ZRxQ+4ok1/24j3zQtnVhVpxdsVGj18sVwoSZCQ0rcA54nS73wk8GpaD A8bYPIO097+RQLgBXCz4nECLcRm+s8ZkcJIh8R3jbBWjNlH2ZAYFbsRr572pOcpI17W9 q8NQ== MIME-Version: 1.0 X-Received: by 10.50.65.74 with SMTP id v10mr13293413igs.61.1448292633707; Mon, 23 Nov 2015 07:30:33 -0800 (PST) Received: by 10.36.217.196 with HTTP; Mon, 23 Nov 2015 07:30:33 -0800 (PST) In-Reply-To: References: Date: Mon, 23 Nov 2015 07:30:33 -0800 Message-ID: Subject: Re: Is it allowed to copy hyper-v drivers from FreeBSD 10 and packed it into FreeBSD 9.2 From: Adrian Chadd To: Hongjiang Zhang Cc: "freebsd-net@freebsd.org" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: 
freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Nov 2015 15:30:35 -0000 On 22 November 2015 at 21:58, Hongjiang Zhang wrote= : > Hi, > > Some people, who used FreeBSD 9.2 and back-port network driver for Hyper-= v from FreeBSD 10, encountered a network issue. They installed 2 VM (FreeBS= D 9.2 with the customized FreeBSD kernel) on Azure. Network went offline ve= ry soon when the big file (~320M byte) is copied from one VM to anther thro= ugh "scp". If TSO is disabled through "sysctl -w net.inet.tcp.tso=3D0", thi= s issue will be alleviated but cannot be eliminated. I did not figure out w= hy. > > I have checked the release notes of FreeBSD 9.2/9.3/10, but did not find = anything which blocked the back-port. It is supposed 9.2 allows the back-po= rted Hyper-v drivers from 10. Is this assumption correct? Hi! It may be something to do with maximum mbufs per packet or some other limit like that. Is there a lot of interest in backporting the latest hyperv driver to 9.2? 
-a > _______________________________________________ > freebsd-net@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-net@freebsd.org Mon Nov 23 16:11:50 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0AD15A36548 for ; Mon, 23 Nov 2015 16:11:50 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CBE32149C for ; Mon, 23 Nov 2015 16:11:49 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from Julian-MBP3.local (50-196-156-133-static.hfc.comcastbusiness.net [50.196.156.133]) (authenticated bits=0) by vps1.elischer.org (8.15.2/8.15.2) with ESMTPSA id tANGBhfB062679 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO); Mon, 23 Nov 2015 08:11:46 -0800 (PST) (envelope-from julian@freebsd.org) Subject: Re: Is it allowed to copy hyper-v drivers from FreeBSD 10 and packed it into FreeBSD 9.2 To: Adrian Chadd , Hongjiang Zhang References: Cc: "freebsd-net@freebsd.org" From: Julian Elischer Message-ID: <56533AB9.80707@freebsd.org> Date: Tue, 24 Nov 2015 00:11:37 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Nov 2015 16:11:50 -0000 On 23/11/2015 11:30 PM, Adrian Chadd 
wrote: > On 22 November 2015 at 21:58, Hongjiang Zhang wrote: >> Hi, >> >> Some people, who used FreeBSD 9.2 and back-port network driver for Hyper-v from FreeBSD 10, encountered a network issue. They installed 2 VM (FreeBSD 9.2 with the customized FreeBSD kernel) on Azure. Network went offline very soon when the big file (~320M byte) is copied from one VM to another through "scp". If TSO is disabled through "sysctl -w net.inet.tcp.tso=0", this issue will be alleviated but cannot be eliminated. I did not figure out why. >> >> I have checked the release notes of FreeBSD 9.2/9.3/10, but did not find anything which blocked the back-port. It is supposed 9.2 allows the back-ported Hyper-v drivers from 10. Is this assumption correct? > Hi! > > It may be something to do with maximum mbufs per packet or some other > limit like that. Is there a lot of interest in backporting the latest > hyperv driver to 9.2? I believe we ($JOB) have them back ported to 8.0. (I didn't do the work) > > > -a > >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-net@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@freebsd.org Tue Nov 24 06:09:49 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 496E1A362D2 for ; Tue, 24 Nov 2015 06:09:49 +0000 (UTC) (envelope-from honzhan@microsoft.com) Received: from na01-bn1-obe.outbound.protection.outlook.com (mail-bn1bn0107.outbound.protection.outlook.com [157.56.110.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN
"mail.protection.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A0AA512EB for ; Tue, 24 Nov 2015 06:09:48 +0000 (UTC) (envelope-from honzhan@microsoft.com) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=selector1; h=From:To:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=4oIZybNa/dzCYXmGnUR1T18A7Uocu37MK9RyxRslx4k=; b=V1A6nIbkV83+vE8Nb1542IqtY2svL7BEDgdyK3YYdbWbTTVlataPAMVRlQyaxgyvNV6lL/9ABnRetcRDbIEQJPFB1+m7Z5fT1cqLrN+tVYHij8kvYrRfxfI3JnoY7jrzQttQ39G1uDN/EiKs0Ter7wKPB3eYwt7xgptSIHPS+qs= Received: from BLUPR0301CA0018.namprd03.prod.outlook.com (10.162.113.156) by BY2PR03MB506.namprd03.prod.outlook.com (10.141.143.18) with Microsoft SMTP Server (TLS) id 15.1.331.20; Tue, 24 Nov 2015 06:09:44 +0000 Received: from BL2FFO11FD053.protection.gbl (2a01:111:f400:7c09::112) by BLUPR0301CA0018.outlook.office365.com (2a01:111:e400:5259::28) with Microsoft SMTP Server (TLS) id 15.1.331.20 via Frontend Transport; Tue, 24 Nov 2015 06:09:43 +0000 Authentication-Results: spf=pass (sender IP is 206.191.228.180) smtp.mailfrom=microsoft.com; gmail.com; dkim=none (message not signed) header.d=none;gmail.com; dmarc=pass action=none header.from=microsoft.com; Received-SPF: Pass (protection.outlook.com: domain of microsoft.com designates 206.191.228.180 as permitted sender) receiver=protection.outlook.com; client-ip=206.191.228.180; helo=064-smtp-out.microsoft.com; Received: from 064-smtp-out.microsoft.com (206.191.228.180) by BL2FFO11FD053.mail.protection.outlook.com (10.173.161.181) with Microsoft SMTP Server (TLS) id 15.1.331.11 via Frontend Transport; Tue, 24 Nov 2015 06:09:41 +0000 Received: from SG2PR3002MB0106.064d.mgd.msft.net (141.251.56.18) by SG2PR3002MB0106.064d.mgd.msft.net (141.251.56.18) with Microsoft SMTP Server (TLS) id 15.1.337.9; Tue, 24 Nov 2015 06:09:32 +0000 Received: from SG2PR3002MB0106.064d.mgd.msft.net ([141.251.56.18]) by SG2PR3002MB0106.064d.mgd.msft.net 
([141.251.56.18]) with mapi id 15.01.0337.009; Tue, 24 Nov 2015 06:09:32 +0000 From: Hongjiang Zhang To: Adrian Chadd CC: "freebsd-net@freebsd.org" Subject: RE: Is it allowed to copy hyper-v drivers from FreeBSD 10 and packed it into FreeBSD 9.2 Thread-Topic: Is it allowed to copy hyper-v drivers from FreeBSD 10 and packed it into FreeBSD 9.2 Thread-Index: AdEls7mtuSdVMBL4RcS0JahNHNGLGAAUCrGAABhWRmA= Date: Tue, 24 Nov 2015 06:09:31 +0000 Message-ID: References: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [141.251.57.5] Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 MIME-Version: 1.0 X-EOPAttributedMessage: 0
X-OriginatorOrg: microsoft.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 24 Nov 2015 06:09:41.3204 (UTC) X-MS-Exchange-CrossTenant-Id: 72f988bf-86f1-41af-91ab-2d7cd011db47 X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=72f988bf-86f1-41af-91ab-2d7cd011db47; Ip=[206.191.228.180]; Helo=[064-smtp-out.microsoft.com] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: BY2PR03MB506 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Nov 2015 06:09:49 -0000 Thanks. Enabling TSO can make the issue easily occur. Well, this is just a clue. The back port was not done by me; the people who did it did so because their system is based on 9.2, and it takes a lot of effort to upgrade a FreeBSD system. -----Original Message----- From: Adrian Chadd [mailto:adrian.chadd@gmail.com] Sent: 23 November 2015 23:31 To: Hongjiang Zhang Cc: freebsd-net@freebsd.org Subject: Re: Is it allowed to copy hyper-v drivers from FreeBSD 10 and packed it into FreeBSD 9.2 On 22 November 2015 at 21:58, Hongjiang Zhang wrote: > Hi, > > Some people, who used FreeBSD 9.2 and back-port network driver for Hyper-v from FreeBSD 10, encountered a network issue. They installed 2 VM (FreeBSD 9.2 with the customized FreeBSD kernel) on Azure. Network went offline very soon when the big file (~320M byte) is copied from one VM to another through "scp". If TSO is disabled through "sysctl -w net.inet.tcp.tso=0", this issue will be alleviated but cannot be eliminated. I did not figure out why. > > I have checked the release notes of FreeBSD 9.2/9.3/10, but did not find anything which blocked the back-port. It is supposed 9.2 allows the back-ported Hyper-v drivers from 10. Is this assumption correct? Hi! It may be something to do with maximum mbufs per packet or some other limit like that. Is there a lot of interest in backporting the latest hyperv driver to 9.2? -a > _______________________________________________ > freebsd-net@freebsd.org mailing list > https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2flists > .freebsd.org%2fmailman%2flistinfo%2ffreebsd-net&data=01%7c01%7chonzhan > %40064d.mgd.microsoft.com%7c1bc312a0449d4fb61bc708d2f41b07db%7c72f988b > f86f141af91ab2d7cd011db47%7c1&sdata=bNEhroIlwhoTHyG3bhHlQ2DYJY5FWMi8A1 > 23TVeZ7Dc%3d To unsubscribe, send any mail to > "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-net@freebsd.org Tue Nov 24 11:39:46 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5B95CA36ED7 for ; Tue, 24 Nov 2015 11:39:46 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3EA931FD1 for ; Tue, 24 Nov 2015 11:39:46 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tAOBdkZ9060863 for ; Tue, 24 Nov 2015 11:39:46 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 204437] 10.2
STABLE Crashing with IPSec Support Date: Tue, 24 Nov 2015 11:39:45 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-STABLE X-Bugzilla-Keywords: crash, needs-qa X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: peixotocassiano@gmail.com X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: mfc-stable9? mfc-stable10? X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Nov 2015 11:39:46 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204437 --- Comment #17 from Cassiano Peixoto --- (In reply to emeric.poupon from comment #15) Hi Emeric, Your patch fixed the bug. Thank you very much for your help. My system is now running for 15 hours with no reboot :) Will you commit this patch ASAP? Thank you again. -- You are receiving this mail because: You are the assignee for the bug. 
From owner-freebsd-net@freebsd.org Tue Nov 24 13:21:12 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4594CA36205 for ; Tue, 24 Nov 2015 13:21:12 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 306D112BB for ; Tue, 24 Nov 2015 13:21:12 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tAODLCbJ076671 for ; Tue, 24 Nov 2015 13:21:12 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 204437] 10.2 STABLE Crashing with IPSec Support Date: Tue, 24 Nov 2015 13:21:10 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-STABLE X-Bugzilla-Keywords: crash, needs-qa X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: emeric.poupon@stormshield.eu X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: mfc-stable9? mfc-stable10? 
X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Nov 2015 13:21:12 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204437 --- Comment #18 from emeric.poupon@stormshield.eu --- Hi, I am glad you confirm it fixes the problem. It is planned to be committed soon. -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Tue Nov 24 13:46:45 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0A0CDA367BD for ; Tue, 24 Nov 2015 13:46:45 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E44981099 for ; Tue, 24 Nov 2015 13:46:44 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tAODkiUR033047 for ; Tue, 24 Nov 2015 13:46:44 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 204437] 10.2 STABLE Crashing with IPSec Support Date: Tue, 24 Nov 2015 13:46:44 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-STABLE 
X-Bugzilla-Keywords: crash, needs-qa X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: peixotocassiano@gmail.com X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: mfc-stable9? mfc-stable10? X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Nov 2015 13:46:45 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204437 --- Comment #19 from Cassiano Peixoto --- (In reply to emeric.poupon from comment #18) Please update this PR when it's committed, so I can keep posted :) Thanks again. -- You are receiving this mail because: You are the assignee for the bug.
From owner-freebsd-net@freebsd.org Tue Nov 24 22:22:41 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F0DA8A372F7 for ; Tue, 24 Nov 2015 22:22:40 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id DA6671EA1 for ; Tue, 24 Nov 2015 22:22:40 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tAOMMeMQ032960 for ; Tue, 24 Nov 2015 22:22:40 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 202983] ixv driver in 11.0-CURRENT(10.1 & 10.2 RELEASE) doesn't pass traffic using XEN hypervisor(AWS EC2) Date: Tue, 24 Nov 2015 22:22:40 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: IntelNetworking X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: jlpetz@gmail.com X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Tue, 24 Nov 2015 22:22:41 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=202983 --- Comment #4 from Jarrod Petz --- I have had feedback from other engineers who confirmed that this patch fixes the issue. https://reviews.freebsd.org/D4186 However, there were some small issues with it, as detailed below. ------------------------------------------------------------------------------------- I applied the changes from https://reviews.freebsd.org/D4186 to 11.0-CURRENT (which among other things adds the missing VF-PF API renegotiation on the reset path) and saw packets arriving in the instance, but tagged with vlan 2048. # tcpdump -i ixv0 -e -vvv tcpdump: listening on ixv0, link-type EN10MB (Ethernet), capture size 262144 bytes 10:39:07.551985 12:8d:18:b1:e5:6b (oui Unknown) > 12:39:94:73:0b:1d (oui Unknown), ethertype 802.1Q (0x8100), length 60: vlan 2048, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has ip-10-0-3-114.ec2.internal tell ip-10-0-3-1.ec2.internal, length 42 10:39:08.552133 12:8d:18:b1:e5:6b (oui Unknown) > 12:39:94:73:0b:1d (oui Unknown), ethertype 802.1Q (0x8100), length 60: vlan 2048, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has ip-10-0-3-114.ec2.internal tell ip-10-0-3-1.ec2.internal, length 42 After creating a vlan0 interface with ID 2048 on top of ixv0, I saw traffic passing and DHCP worked.
# ifconfig vlan0 create # ifconfig vlan0 vlan 2048 vlandev ixv0 # tcpdump -i vlan0 -vvv -s65534 -n tcpdump: listening on vlan0, link-type EN10MB (Ethernet), capture size 65534 bytes 10:42:00.342629 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328) 0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 12:39:94:73:0b:1d, length 300, xid 0x5d968cbb, Flags [none] (0x0000) Client-Ethernet-Address 12:39:94:73:0b:1d Vendor-rfc1048 Extensions Magic Cookie 0x63825363 DHCP-Message Option 53, length 1: Discover Client-ID Option 61, length 7: ether 12:39:94:73:0b:1d Hostname Option 12, length 13: "ip-10-0-0-203" Parameter-Request Option 55, length 9: Subnet-Mask, BR, Time-Zone, Classless-Static-Route Default-Gateway, Domain-Name, Domain-Name-Server, Hostname Option 119 END Option 255, length 0 PAD Option 0, length 0, occurs 21 10:42:00.342916 IP (tos 0x10, ttl 16, id 0, offset 0, flags [none], proto UDP (17), length 337) 10.0.3.1.67 > 10.0.3.114.68: [udp sum ok] BOOTP/DHCP, Reply, length 309, xid 0x5d968cbb, Flags [none] (0x0000) Your-IP 10.0.3.114 Client-Ethernet-Address 12:39:94:73:0b:1d Vendor-rfc1048 Extensions Magic Cookie 0x63825363 DHCP-Message Option 53, length 1: Offer Server-ID Option 54, length 4: 10.0.3.1 Lease-Time Option 51, length 4: 3600 Subnet-Mask Option 1, length 4: 255.255.255.0 BR Option 28, length 4: 10.0.3.255 Default-Gateway Option 3, length 4: 10.0.3.1 Domain-Name Option 15, length 12: "ec2.internal" Domain-Name-Server Option 6, length 4: 10.0.0.2 Hostname Option 12, length 13: "ip-10-0-3-114" END Option 255, length 0 10:42:02.365085 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328) 0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 12:39:94:73:0b:1d, length 300, xid 0x5d968cbb, Flags [none] (0x0000) Client-Ethernet-Address 12:39:94:73:0b:1d Vendor-rfc1048 Extensions Magic Cookie 0x63825363 DHCP-Message Option 53, length 1: Request Server-ID Option 54, 
length 4: 10.0.3.1 Requested-IP Option 50, length 4: 10.0.3.114 Client-ID Option 61, length 7: ether 12:39:94:73:0b:1d Hostname Option 12, length 13: "ip-10-0-0-203" Parameter-Request Option 55, length 9: Subnet-Mask, BR, Time-Zone, Classless-Static-Route Default-Gateway, Domain-Name, Domain-Name-Server, Hostname Option 119 END Option 255, length 0 PAD Option 0, length 0, occurs 9 10:42:02.365274 IP (tos 0x10, ttl 16, id 0, offset 0, flags [none], proto UDP (17), length 337) 10.0.3.1.67 > 10.0.3.114.68: [udp sum ok] BOOTP/DHCP, Reply, length 309, xid 0x5d968cbb, Flags [none] (0x0000) Your-IP 10.0.3.114 Client-Ethernet-Address 12:39:94:73:0b:1d Vendor-rfc1048 Extensions Magic Cookie 0x63825363 DHCP-Message Option 53, length 1: ACK Server-ID Option 54, length 4: 10.0.3.1 Lease-Time Option 51, length 4: 3600 Subnet-Mask Option 1, length 4: 255.255.255.0 BR Option 28, length 4: 10.0.3.255 Default-Gateway Option 3, length 4: 10.0.3.1 Domain-Name Option 15, length 12: "ec2.internal" Domain-Name-Server Option 6, length 4: 10.0.0.2 Hostname Option 12, length 13: "ip-10-0-3-114" END Option 255, length 0 10:42:02.370732 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.3.114 tell 10.0.3.114, length 28 10:42:16.345260 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.0.3.114 tell 10.0.3.1, length 42 10:42:16.345280 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.0.3.114 is-at 12:39:94:73:0b:1d, length 28 ^C So I added the following patch to the VF driver in the instance to force the VF into stripping VLAN tags on RX and now the instance is able to acquire a DHCP lease and pass traffic on the interface. 
diff --git a/dev/ixgbe/if_ixv.c b/dev/ixgbe/if_ixv.c
index bd06492..a90b4f2 100644
--- a/dev/ixgbe/if_ixv.c
+++ b/dev/ixgbe/if_ixv.c
@@ -1700,6 +1700,7 @@ ixv_initialize_receive_units(struct adapter *adapter)
 		/* Do the queue enabling last */
 		rxdctl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(i));
 		rxdctl |= IXGBE_RXDCTL_ENABLE;
+		rxdctl |= IXGBE_RXDCTL_VME;
 		IXGBE_WRITE_REG(hw, IXGBE_VFRXDCTL(i), rxdctl);
 		for (int k = 0; k < 10; k++) {
 			if (IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(i)) &
All this with an unmodified host driver. The patch probably breaks VLANs inside the instance in some way. ------------------------------------------------------------------------------------- -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Wed Nov 25 08:25:00 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8E5D1A37A1C for ; Wed, 25 Nov 2015 08:25:00 +0000 (UTC) (envelope-from daniel.bilik@neosystem.cz) Received: from mail.neosystem.cz (mail.neosystem.cz [94.23.169.88]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 55D8C190D; Wed, 25 Nov 2015 08:24:59 +0000 (UTC) (envelope-from daniel.bilik@neosystem.cz) Received: from mail.neosystem.cz (unknown [127.0.10.15]) by mail.neosystem.cz (Postfix) with ESMTP id 57990B8C7; Wed, 25 Nov 2015 09:24:51 +0100 (CET) X-Virus-Scanned: amavisd-new at mail.neosystem.cz Received: from dragon.sn.neosystem.cz (unknown [IPv6:2001:41d0:2:5ab8::100:101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.neosystem.cz (Postfix) with ESMTPSA id 76B2DB8C1; Wed, 25 Nov 2015 09:24:50 +0100 (CET) Date: Wed, 25 Nov 2015 09:21:45 +0100 From: Daniel Bilik To: Kristof Provost Cc: freebsd-net@freebsd.org
Subject: Re: Outgoing packets being sent via wrong interface Message-Id: <20151125092145.e93151af70085c2b3393f149@neosystem.cz> In-Reply-To: <20151122130240.165a50286cbaa9288ffc063b@neosystem.cz> References: <20151120155511.5fb0f3b07228a0c829fa223f@neosystem.org> <20151120163431.3449a473db9de23576d3a4b4@neosystem.org> <20151121212043.GC2307@vega.codepro.be> <20151122130240.165a50286cbaa9288ffc063b@neosystem.cz> Organization: neosystem.cz X-Mailer: Sylpheed 3.4.3 (GTK+ 2.24.28; x86_64-portbld-dragonfly4.3) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Nov 2015 08:25:00 -0000 On Sun, 22 Nov 2015 13:02:40 +0100 Daniel Bilik wrote: > Well, even though pf may play some role in the problem, I tend to suspect > the routing table as the main trigger. There are several facts to support > this... It happened again, yesterday, and I can now definitely confirm that it's related to default route. In this case, affected address was 192.168.2.33. This host was unable to connect to 192.168.2.15 (jail on the router), and router itself was unable to even ping the affected host... PING 192.168.2.33 (192.168.2.33): 56 data bytes ping: sendto: Operation not permitted ping: sendto: Operation not permitted ... because again it was pushing outgoing packets wrong way, via public interface, where it's dropped by pf... 00:00:07.091814 rule 53..16777216/0(match): block out on re0: 82.x.y.50 > 192.168.2.33: ICMP echo request, id 12037, seq 0, length 64 00:00:01.011536 rule 53..16777216/0(match): block out on re0: 82.x.y.50 > 192.168.2.33: ICMP echo request, id 12037, seq 1, length 64 I've tried to just delete default route and enter it back to routing table. 
In one tmux session ping was running; in another session I ran this...

# route delete default ; sleep 1 ; route add default 82.x.y.29

... and voila, ping started to reach the affected host...

ping: sendto: Operation not permitted
ping: sendto: Operation not permitted
64 bytes from 192.168.2.33: icmp_seq=12 ttl=128 time=0.535 ms
64 bytes from 192.168.2.33: icmp_seq=13 ttl=128 time=0.264 ms

Touching nothing else (pf etc.), not rebooting, just "refreshing" the default route entry, and the problem disappeared.

--
Dan

Date: Wed, 25 Nov 2015 12:20:33 +0000
From: Gary Palmer
To: Daniel Bilik
Cc: Kristof Provost, freebsd-net@freebsd.org
Subject: Re: Outgoing packets being sent via wrong interface
On Wed, Nov 25, 2015 at 09:21:45AM +0100, Daniel Bilik wrote:
> Touching nothing else (pf etc.), not rebooting, just "refreshing" the
> default route entry, and the problem disappeared.

When the problem happens, what does the output of
route -n get
show?

It would also be worth checking the arp table.

Gary

Date: Wed, 25 Nov 2015 13:52:58 +0100
From: Kristof Provost
To: Daniel Bilik
Cc: freebsd-net@freebsd.org
Subject: Re: Outgoing packets being sent via wrong interface
On 2015-11-25 09:21:45 (+0100), Daniel Bilik wrote:
> Touching nothing else (pf etc.), not rebooting, just "refreshing" the
> default route entry, and the problem disappeared.
>
I was still inclined to suspect pf based on your previous findings, because pf subscribes to IP address (and group) information, so changing those could have triggered something in pf. It doesn't subscribe to routing information, though, so right now a pf issue does look unlikely.

Regards,
Kristof
Date: Wed, 25 Nov 2015 14:16:26 +0100
From: Daniel Bilik
To: Gary Palmer
Cc: freebsd-net@freebsd.org
Subject: Re: Outgoing packets being sent via wrong interface

On Wed, 25 Nov 2015 12:20:33 +0000
Gary Palmer wrote:

> When the problem happens, what does the output of
> route -n get
> show?

I'll check this next time it happens. Thanks for the tip. Right now it seems correct:

   route to: 192.168.2.33
destination: 192.168.2.0
       mask: 255.255.255.0
        fib: 0
  interface: re1
      flags:
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         0

> It would also be worth checking the arp table.

Yes, checking the arp table was one of the first things I did when analyzing the problem. All arp entries seem correct, and they do not change before, during or after the problem. I've also tried manually removing the arp entry for the affected address (i.e. forcing it to be refreshed), but it does not help.
--
Dan
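The `route -n get` output above reflects longest-prefix matching: 192.168.2.33 falls inside the connected 192.168.2.0/24 route on re1, which is more specific than the 0/0 default via re0. So, with the table as Daniel captured it, a packet to that host should leave via re1, and egress via re0 contradicts that lookup. A toy C model of just that selection rule, assuming a flat array in place of the kernel's radix tree; `struct route4`, `ip4()`, `prefix_len()` and `lookup_route()` are hypothetical names for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy route entry; dst and mask are in host byte order. */
struct route4 {
    uint32_t dst;
    uint32_t mask;
    const char *ifname;
};

/* Build a host-order IPv4 address from dotted-quad components. */
uint32_t ip4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Prefix length of a contiguous netmask (number of leading one bits). */
int prefix_len(uint32_t mask)
{
    int n = 0;
    while (mask & 0x80000000u) {
        n++;
        mask <<= 1;
    }
    return n;
}

/*
 * Return the interface of the most specific route matching addr, or
 * NULL if nothing matches. A 0.0.0.0/0 default matches every address
 * but loses to any longer prefix.
 */
const char *lookup_route(const struct route4 *tbl, size_t n, uint32_t addr)
{
    const struct route4 *best = NULL;

    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((addr & tbl[i].mask) != (tbl[i].dst & tbl[i].mask))
            continue;                       /* addr not inside this prefix */
        if (best == NULL || prefix_len(tbl[i].mask) > prefix_len(best->mask))
            best = &tbl[i];                 /* longer prefix wins */
    }
    return best ? best->ifname : NULL;
}
```

Table order does not matter in this model; only the mask length decides. It says nothing about how the kernel ended up choosing re0 in Daniel's case, only what the correct answer for this table is.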
Date: Wed, 25 Nov 2015 22:30:25 +0800
Subject: Can I send the Ethernet frames with particular payload via Netmap?
From: Hao Wu
To: net@freebsd.org

Hi all,

I have just started using Netmap. I want to know whether I can build Ethernet frames and send them out via Netmap. I used to build the Ethernet frames with Libnet, but it is too slow, so I have turned to Netmap. But I have no idea how to write the code using Netmap or which functions I should call.

Any reply is highly appreciated!
+++++++++++++++++
Best,
Hao
Date: Wed, 25 Nov 2015 09:29:18 -0800
Subject: Re: Outgoing packets being sent via wrong interface
From: Kevin Oberman
To: Daniel Bilik
Cc: Gary Palmer, "freebsd-net@freebsd.org"

On Wed, Nov 25, 2015 at 5:16 AM, Daniel Bilik wrote:

> Yes, checking arp table was one of the first things I did when analyzing
> the problem. All arp entries seem correct, and do not change
> before-during-after the problem. I've also tried to manually remove arp
> entry for affected address (ie. forcing it to be refreshed), but it
> does not help.

Have you looked for ICMP redirect traffic? Does your firewall allow redirects? If so, could you try adding a rule to block them? I can't provide a sample rule as I don't use pf, but you want to block ICMP type 5 messages. For a good overview of redirects, see the Wikipedia or Cisco articles (or Google for many others).
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkoberman@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
Date: Wed, 25 Nov 2015 17:45:13 -0500
Subject: Re: Outgoing packets being sent via wrong interface
From: Ryan Stone
To: Kevin Oberman
Cc: Daniel Bilik, "freebsd-net@freebsd.org"

An easier way to block ICMP redirects would be to set the sysctl:

sysctl net.inet.icmp.drop_redirect=1
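Kevin's suggestion targets ICMP type 5. Per RFC 792, a redirect carries, in bytes 4 to 7 of the ICMP header, the gateway address the sender wants the receiving host to use for the original destination, which is why accepting redirects can change a host's next hop. A minimal C sketch of recognizing one, assuming the buffer starts at the IPv4 header (no link-layer header) and ignoring checksums; `icmp_redirect_gateway()` and the two macros are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPPROTO_ICMP_NUM   1   /* IANA protocol number for ICMP */
#define ICMP_TYPE_REDIRECT 5   /* RFC 792 Redirect */

/*
 * If pkt (starting at the IPv4 header) is an ICMP redirect, copy the
 * advertised gateway address (raw bytes, network order) into gw and
 * return true; otherwise return false.
 */
bool icmp_redirect_gateway(const uint8_t *pkt, size_t len, uint8_t gw[4])
{
    assert(pkt != NULL && gw != NULL);
    if (len < 20 || (pkt[0] >> 4) != 4)
        return false;                        /* too short or not IPv4 */

    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;
    if (ihl < 20 || len < ihl + 8)
        return false;                        /* bad IHL or no full ICMP header */

    /* Only the first fragment (offset 0) carries the ICMP header. */
    if (((((unsigned)pkt[6] & 0x1fu) << 8) | pkt[7]) != 0)
        return false;

    if (pkt[9] != IPPROTO_ICMP_NUM)
        return false;                        /* not ICMP */

    const uint8_t *icmp = pkt + ihl;
    if (icmp[0] != ICMP_TYPE_REDIRECT)
        return false;

    for (int i = 0; i < 4; i++)
        gw[i] = icmp[4 + i];                 /* Gateway Internet Address */
    return true;
}
```

Ryan's `net.inet.icmp.drop_redirect=1` makes the kernel ignore incoming redirects without any user-space code; the sketch only shows which fields such a filter keys on.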
Date: Thu, 26 Nov 2015 11:25:53 +0100
From: ulric@siag.nu
To: Hao Wu
Cc: net@freebsd.org, owner-freebsd-net@freebsd.org
Subject: Re: Can I send the Ethernet frames with particular payload via Netmap?

2015-11-25 15:30 Hao Wu wrote:
> Hi all,
>
> I just start using Netmap. I want to know can I build the Ethernet
> frames and send out via Netmap?
> I used to build the Ethernet frames via
> Libnet, but it is too slow. So I turn to Netmap now. But I have no idea on
> how to write the code using Netmap or what functions should I call?
>
> Any replay is highly appreciated!

It's crazy simple. To open a netmap descriptor:

    d = nm_open(ifname, NULL, 0, 0);

To receive a frame:

    uint8_t *b = nm_nextpkt(d, &h);

To send a frame:

    int n = nm_inject(d, b, len);

Working example from Pen:

https://github.com/UlricE/pen/blob/master/dsr.c

Ulric
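Ulric's calls move bytes; building the frame itself is ordinary buffer work. A minimal C sketch of laying out an Ethernet II frame with an arbitrary payload, assuming the NIC appends the 4-byte FCS (so the buffer is zero-padded only to 60 bytes). `build_eth_frame()` and the `ETH_*` macros are hypothetical names, not part of netmap:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN  6
#define ETH_HDR_LEN   14   /* dst MAC + src MAC + EtherType */
#define ETH_MIN_NOFCS 60   /* 64-byte minimum frame minus the 4-byte FCS */

/*
 * Write dst MAC, src MAC, EtherType (big-endian) and payload into buf,
 * zero-padding to the Ethernet minimum. Returns the frame length, or 0
 * if buf is too small.
 */
size_t build_eth_frame(uint8_t *buf, size_t buflen,
                       const uint8_t dst[ETH_ADDR_LEN],
                       const uint8_t src[ETH_ADDR_LEN],
                       uint16_t ethertype,
                       const uint8_t *payload, size_t plen)
{
    assert(buf != NULL && dst != NULL && src != NULL);
    assert(payload != NULL || plen == 0);

    if (plen > buflen || buflen - plen < ETH_HDR_LEN)
        return 0;                            /* header + payload do not fit */
    size_t len = ETH_HDR_LEN + plen;
    size_t framelen = len < ETH_MIN_NOFCS ? ETH_MIN_NOFCS : len;
    if (framelen > buflen)
        return 0;                            /* no room for the padding */

    memcpy(buf, dst, ETH_ADDR_LEN);
    memcpy(buf + ETH_ADDR_LEN, src, ETH_ADDR_LEN);
    buf[12] = (uint8_t)(ethertype >> 8);     /* EtherType, network order */
    buf[13] = (uint8_t)(ethertype & 0xff);
    if (plen > 0)
        memcpy(buf + ETH_HDR_LEN, payload, plen);
    memset(buf + len, 0, framelen - len);    /* pad short frames with zeros */
    return framelen;
}
```

The resulting buffer can then go to `nm_inject(d, buf, n)` as in Ulric's example. As far as I know, the `nm_*` helpers require `NETMAP_WITH_LIBS` to be defined before including `<net/netmap_user.h>`, `nm_open()` expects a port name such as `"netmap:ix0"`, and `nm_inject()` returns 0 when no TX slot is free, in which case the caller can `poll()` on `d->fd` for `POLLOUT` and retry.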
Date: Thu, 26 Nov 2015 20:14:08 +0800
Subject: Re: Can I send the Ethernet frames with particular payload via Netmap?
From: Hao Wu
To: ulric@siag.nu
Cc: net@freebsd.org, owner-freebsd-net@freebsd.org

Hi Ulric,

Got it! Many thanks :)

+++++++++++++++++
Best,
Hao
Subject: Re: Intel XL710 broken link down detection?
To: "Pieper, Jeffrey E", Ryan Stone
Cc: Jack F Vogel, "freebsd-net@freebsd.org"
From: Steven Hartland
Date: Thu, 26 Nov 2015 16:41:48 +0000

It's been a couple of weeks now, so I wanted to check whether you had any news on this, Pieper?
Also, for extra visibility: I'm looking to bring ixl in stable/10 up to date by MFC'ing the commits present in HEAD, as stable/10 is quite a way behind. One commit, which combines RSS support (not going to be MFC'ed) with some bug fixes and therefore requires big changes between HEAD and stable/10, is up for review here:

https://reviews.freebsd.org/D4265

I'm looking to use this for a 10.x-based DC rollout over the next few weeks, so if anyone can look at it, it would be most appreciated.

Regards
Steve

On 12/11/2015 15:18, Pieper, Jeffrey E wrote:
> We already have a fix in place that will be committed for review shortly.
>
> Thanks,
> Jeff
>
> -----Original Message-----
> From: owner-freebsd-net@freebsd.org [mailto:owner-freebsd-net@freebsd.org] On Behalf Of Steven Hartland
> Sent: Thursday, November 12, 2015 1:52 AM
> To: Ryan Stone
> Cc: Jack F Vogel; freebsd-net@freebsd.org
> Subject: Re: Intel XL710 broken link down detection?
>
> Yes this works but a better way IMO would be to invert the bits we want:
> https://people.freebsd.org/~smh/ixl_int_init.patch
>
> If there are no objections then I'll commit this later today.
>
> Also just fixed the debug sysctls from causing panics when compiled with
> INVARIANTS see:
> https://svnweb.freebsd.org/base?view=revision&revision=290708
>
> Regards
> Steve
>
> On 11/11/2015 16:31, Ryan Stone wrote:
>> On Wed, Nov 11, 2015 at 11:18 AM, Steven Hartland
>> wrote:
>>
>> Comparing this to the Linux driver which does detect the link down
>> I've discovered it actually polls the link status by default in
>> its watchdog.
>>
>> Disabling this with "ethtool --set-priv-flags eth1 LinkPolling
>> off" and the Linux driver also fails to detect link down.
>>
>> So this seems like a firmware or even hardware bug where it should
>> be reporting down events and the Linux driver has been updated to
>> work around the problem?
>>
>> No, apparently the Linux devs just didn't read the datasheet closely
>> enough (and presumably the FreeBSD driver copied the mistake). There
>> is a mask of interrupt causes that works backwards from how one would
>> expect; you mask out events that you *don't* want rather than events
>> that you do want. Both the Linux and FreeBSD drivers pass a mask of
>> events that they want interrupts for (the only reason why it appears
>> to work on link up is that the AN Completed event fires when link
>> is up, as far as I can tell). Try the following patch:
>>
>> https://people.freebsd.org/~rstone/patches/ixl_link_int.diff
>>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

From: bugzilla-noreply@freebsd.org
To: freebsd-net@FreeBSD.org
Subject: [Bug 204831] mld_v2 listener report does not report all active groups to the router
Date: Thu, 26 Nov 2015 22:56:35 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204831

Mark Linimon changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|freebsd-bugs@FreeBSD.org    |freebsd-net@FreeBSD.org
From owner-freebsd-net@freebsd.org Fri Nov 27 01:57:46 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4D72DA3A73F for ; Fri, 27 Nov 2015 01:57:46 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from sender163-mail.zoho.com (sender163-mail.zoho.com [74.201.84.163]) (using TLSv1 with cipher ECDHE-RSA-AES128-SHA (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3E34D1A7F for ; Fri, 27 Nov 2015 01:57:45 +0000 (UTC) (envelope-from mmacy@nextbsd.org) Received: from mail.zoho.com by mx.zohomail.com with SMTP id 1448589456055734.7743441994829; Thu, 26 Nov 2015 17:57:36 -0800 (PST) Date: Thu, 26 Nov 2015 17:57:35 -0800 From: Matthew Macy To: "freebsd-net@freebsd.org" Message-ID: <15146a8f285.b094791a15089.3823664487014698900@nextbsd.org> Subject: TCP notes and incast recommendations MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Priority: Medium User-Agent: Zoho Mail X-Mailer: Zoho Mail X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Nov 2015 01:57:46 -0000

In an effort to be somewhat current on the state of TCP, I've collected a small bibliography. I've tried to summarize RFCs and papers that I believe to be important and to provide some general background for others who do not have a deep familiarity with TCP or congestion control, in particular as it impacts DCTCP. The recommendations reference Phabricator changes.
Table Of Contents:

I) - A Roadmap for Transmission Control Protocol (TCP) Specification Documents (RFC 7414)
II) - Metrics for the Evaluation of Congestion Control Mechanisms (RFC 5166)
III) - TCP Congestion Control (RFC 5681)
IV) - Computing TCP's Retransmission Timer (RFC 6298)
V) - Increasing TCP's Initial Window (RFC 6928)
VI) - TCP Extensions for High Performance [RTO updates and changes to RFC 1323] (RFC 7323)
VII) - Updating TCP to Support Rate-Limited Traffic [Congestion Window Validation] (RFC 7661)
VIII) - Active Queue Management (AQM)
IX) - Explicit Congestion Notification (ECN)
X) - AccurateECN (AccECN)
XI) - Incast Causes and Solutions
XII) - Data Center Transmission Control Protocol (DCTCP)
XIII) - Incast TCP (ICTCP)
XIV) - Quantized Congestion Notification (QCN)
XV) - Recommendations

A Roadmap for Transmission Control Protocol (TCP) Specification Documents [important]: https://tools.ietf.org/html/rfc7414

A correct and efficient implementation of the Transmission Control Protocol (TCP) is a critical part of the software of most Internet hosts. As TCP has evolved over the years, many distinct documents have become part of the accepted standard for TCP. At the same time, a large number of experimental modifications to TCP have also been published in the RFC series, along with informational notes, case studies, and other advice. As an introduction to newcomers and an attempt to organize the plethora of information for old hands, this document contains a roadmap to the TCP-related RFCs. It provides a brief summary of the RFC documents that define TCP. This should provide guidance to implementers on the relevance and significance of the standards-track extensions, informational notes, and best current practices that relate to TCP. This roadmap includes a brief description of the contents of each TCP-related RFC [N.B. I only include an excerpt of the summary for those that I consider interesting or important].
In some cases, we simply supply the abstract or a key summary sentence from the text as a terse description. In addition, a letter code after an RFC number indicates its category in the RFC series (see BCP 9 [RFC2026] for explanation of these categories):

S - Standards Track (Proposed Standard, Draft Standard, or Internet Standard)
E - Experimental
I - Informational
H - Historic
B - Best Current Practice
U - Unknown (not formally defined)

[2.] Core Functionality A small number of documents compose the core specification of TCP. These define the required core functionalities of TCP's header parsing, state machine, congestion control, and retransmission timeout computation. These base specifications must be correctly followed for interoperability. RFC 793 S: "Transmission Control Protocol", STD 7 (September 1981) (Errata) This is the fundamental TCP specification document [RFC793]. Written by Jon Postel as part of the Internet protocol suite's core, it describes the TCP packet format, the TCP state machine and event processing, and TCP's semantics for data transmission, reliability, flow control, multiplexing, and acknowledgment. RFC 1122 S: "Requirements for Internet Hosts - Communication Layers" (October 1989) This document [RFC1122] updates and clarifies RFC 793 (see above in Section 2), fixing some specification bugs and oversights. It also explains some features such as keep-alives and Karn's and Jacobson's RTO estimation algorithms [KP87][Jac88][JK92]. ICMP interactions are mentioned, and some tips are given for efficient implementation. RFC 1122 is an Applicability Statement, listing the various features that MUST, SHOULD, MAY, SHOULD NOT, and MUST NOT be present in standards-conforming TCP implementations. Unlike a purely informational roadmap, this Applicability Statement is a standards document and gives formal rules for implementation.
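RFC 1122 is where Karn's and Jacobson's RTO estimation became host-requirements material; the current form of that estimator is RFC 6298 (item IV in the table of contents). As a rough illustration, here is a minimal C sketch of the RFC 6298 computation, assuming RTT samples in seconds held in doubles. The struct and function names are invented for this example; real stacks typically keep these values as scaled integer ticks rather than floating point.

```c
#include <assert.h>

/* RFC 6298 constants (section numbers in parentheses). */
#define RTO_ALPHA 0.125		/* 1/8 */
#define RTO_BETA  0.25		/* 1/4 */
#define RTO_K     4.0
#define RTO_INIT  1.0		/* (2.1) initial RTO in seconds; 3 s under RFC 2988 */
#define RTO_MIN   1.0		/* (2.4) RTO SHOULD be rounded up to 1 s */
#define RTO_MAX   60.0		/* (2.5) optional cap, at least 60 s */

/* Hypothetical per-connection estimator state. */
struct rto_state {
	int    have_sample;	/* has the first RTT measurement been taken? */
	double srtt;		/* smoothed RTT, seconds */
	double rttvar;		/* RTT variation, seconds */
	double rto;		/* current retransmission timeout, seconds */
};

static double
rto_clamp(double rto)
{
	if (rto < RTO_MIN)
		rto = RTO_MIN;
	if (rto > RTO_MAX)
		rto = RTO_MAX;
	return (rto);
}

static void
rto_init(struct rto_state *s)
{
	s->have_sample = 0;
	s->srtt = 0.0;
	s->rttvar = 0.0;
	s->rto = RTO_INIT;
}

/*
 * Feed one RTT measurement r (seconds); g is the clock granularity.
 * Per Karn's algorithm, the caller must not sample retransmitted segments.
 */
static void
rto_sample(struct rto_state *s, double r, double g)
{
	double err, var_term;

	assert(r >= 0.0 && g >= 0.0);
	if (!s->have_sample) {
		/* (2.2) first measurement */
		s->srtt = r;
		s->rttvar = r / 2.0;
		s->have_sample = 1;
	} else {
		/* (2.3) RTTVAR is updated first, using the OLD SRTT */
		err = s->srtt > r ? s->srtt - r : r - s->srtt;
		s->rttvar = (1.0 - RTO_BETA) * s->rttvar + RTO_BETA * err;
		s->srtt = (1.0 - RTO_ALPHA) * s->srtt + RTO_ALPHA * r;
	}
	var_term = RTO_K * s->rttvar;
	s->rto = rto_clamp(s->srtt + (g > var_term ? g : var_term));
}

/* (5.5) on timer expiry, back off: RTO <- RTO * 2 */
static void
rto_backoff(struct rto_state *s)
{
	s->rto = rto_clamp(s->rto * 2.0);
}
```

With a 100 ms first sample the raw value is 0.1 + 4 * 0.05 = 0.3 s, which the 1 s floor then rounds up; that floor is why lowering the initial RTO from 3 s to 1 s matters mostly for the handshake and the first few round trips.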
RFC 2460 S: "Internet Protocol, Version 6 (IPv6) Specification" (December 1998) (Errata) This document [RFC2460] is of relevance to TCP because it defines how the pseudo-header for TCP's checksum computation is derived when 128-bit IPv6 addresses are used instead of 32-bit IPv4 addresses. Additionally, RFC 2675 (see Section 3.1 of this document) describes TCP changes required to support IPv6 jumbograms. RFC 2873 S: "TCP Processing of the IPv4 Precedence Field" (June 2000) (Errata) This document [RFC2873] removes from the TCP specification all processing of the precedence bits of the TOS byte of the IP header. This resolves a conflict over the use of these bits between RFC 793 (see above in Section 2) and Differentiated Services [RFC2474]. RFC 5681 S: "TCP Congestion Control" (August 2009) Although RFC 793 (see above in Section 2) did not contain any congestion control mechanisms, today congestion control is a required component of TCP implementations. This document [RFC5681] defines congestion avoidance and control mechanism for TCP, based on Van Jacobson's 1988 SIGCOMM paper [Jac88]. A number of behaviors that together constitute what the community refers to as "Reno TCP" is described in RFC 5681. The name "Reno" comes from the Net/2 release of the 4.3 BSD operating system. This is generally regarded as the least common denominator among TCP flavors currently found running on Internet hosts. Reno TCP includes the congestion control features of slow start, congestion avoidance, fast retransmit, and fast recovery. RFC 5681 details the currently accepted congestion control mechanism, while RFC 1122, (see above in Section 2) mandates that such a congestion control mechanism must be implemented. RFC 5681 differs slightly from the other documents listed in this section, as it does not affect the ability of two TCP endpoints to communicate; RFCs 2001 and 2581 are the conceptual precursors of RFC 5681. 
The most important changes relative to RFC 2581 are: (a) The initial window requirements were changed to allow larger Initial Windows as standardized in [RFC3390] (see Section 3.2 of this document). (b) During slow start and congestion avoidance, the usage of Appropriate Byte Counting [RFC3465] (see Section 3.2 of this document) is explicitly recommended. (c) The use of Limited Transmit [RFC3042] (see Section 3.3 of this document) is now recommended. RFC 6093 S: "On the Implementation of the TCP Urgent Mechanism" (January 2011) This document [RFC6093] analyzes how current TCP stacks process TCP urgent indications, ... and recommends against the use of the urgent mechanism. RFC 6298 S: "Computing TCP's Retransmission Timer" (June 2011) Abstract of RFC 6298 [RFC6298]: "This document defines the standard algorithm that Transmission Control Protocol (TCP) senders are required to use to compute and manage their retransmission timer. It expands on the discussion in Section 4.2.3.1 of RFC 1122 and upgrades the requirement of supporting the algorithm from a SHOULD to a MUST." RFC 6298 updates RFC 2988 by _changing_ the initial RTO from _3s_ to _1s_ [emphasis mine]. RFC 6691 I: "TCP Options and Maximum Segment Size (MSS)" (July 2012) This document [RFC6691] clarifies what value to use with the TCP Maximum Segment Size (MSS) option when IP and TCP options are in use. [3.] Strongly Encouraged Enhancements This section describes recommended TCP modifications that improve performance and security. Section 3.1 represents fundamental changes to the protocol. Sections 3.2 and 3.3 list improvements over the congestion control and loss recovery mechanisms as specified in RFC 5681 (see Section 2). Section 3.4 describes algorithms that allow a TCP sender to detect whether it has entered loss recovery spuriously. Section 3.5 comprises Path MTU Discovery mechanisms. Schemes for TCP/IP header compression are listed in Section 3.6.
Finally, Section 3.7 deals with the problem of preventing acceptance of forged segments and flooding attacks. [3.1.] Fundamental Changes RFCs 2675 and 7323 represent fundamental changes to TCP by redefining how parts of the basic TCP header and options are interpreted. RFC 7323 defines the Window Scale option, which reinterprets the advertised receive window. RFC 2675 specifies that MSS option and urgent pointer fields with a value of 65,535 are to be treated differently. RFC 2675 S: "IPv6 Jumbograms" (August 1999) (Errata) RFC 7323 S: "TCP Extensions for High Performance" (September 2014) This document [RFC7323] defines TCP extensions for window scaling, timestamps, and protection against wrapped sequence numbers, for efficient and safe operation over paths with large bandwidth-delay products. These extensions are commonly found in currently used systems. The predecessor of this document, RFC 1323, was published in 1992, and is deployed in most TCP implementations. This document includes fixes and clarifications based on the gained deployment experience. One specific issue addressed in this specification is a recommendation on how to modify the algorithm for estimating the mean RTT when timestamps are used. RFCs 1072, 1185, and 1323 are the conceptual precursors of RFC 7323. [3.2.] Congestion Control Extensions Two of the most important aspects of TCP are its congestion control and loss recovery features. TCP treats lost packets as indicating congestion-related loss and cannot distinguish between congestion-related loss and loss due to transmission errors. Even when ECN is in use, there is a rather intimate coupling between congestion control and loss recovery mechanisms. There are several extensions to both features, and more often than not, a particular extension applies to both. In this subsection, we group enhancements to TCP's congestion control, while the next subsection focuses on TCP's loss recovery.
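The RFC 7323 Window Scale option mentioned above is easiest to see with numbers: without it, the 16-bit window caps a connection at 65,535 bytes per round trip, roughly 10.5 Mbit/s on a 50 ms path (65535 * 8 / 0.05 s). The following C sketch, with hypothetical helper names, sizes a receive buffer to the bandwidth-delay product, picks the smallest shift count that can advertise it, and decodes a scaled window. It assumes integer rates in bit/s and RTTs in microseconds; per RFC 7323, the window field of a SYN or SYN-ACK is never scaled, so `effective_window()` applies only to later segments.

```c
#include <assert.h>
#include <stdint.h>

#define TCP_MAX_WINSHIFT 14	/* RFC 7323: shift counts above 14 are treated as 14 */

/* Bandwidth-delay product in bytes for a rate_bps (bit/s) path with an RTT of rtt_us. */
static uint64_t
bdp_bytes(uint64_t rate_bps, uint64_t rtt_us)
{
	return (rate_bps / 8 * rtt_us / 1000000);
}

/* Smallest shift that lets a buffer of buf bytes fit the 16-bit window field. */
static uint8_t
wscale_for_buffer(uint64_t buf)
{
	uint8_t s = 0;

	assert(buf <= ((uint64_t)0xffff << TCP_MAX_WINSHIFT));
	while (s < TCP_MAX_WINSHIFT && (buf >> s) > 0xffff)
		s++;
	return (s);
}

/* Window granted by a non-SYN segment: SEG.WND << shift. */
static uint32_t
effective_window(uint16_t seg_wnd, uint8_t shift)
{
	if (shift > TCP_MAX_WINSHIFT)
		shift = TCP_MAX_WINSHIFT;
	return ((uint32_t)seg_wnd << shift);
}
```

For a 1 Gbit/s, 50 ms path the BDP is 6,250,000 bytes, which needs a shift of 7; the advertised value (6,250,000 >> 7 = 48,828) then decodes to 6,249,984 bytes, showing the small granularity loss that scaling introduces.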
RFC 3168 S: "The Addition of Explicit Congestion Notification (ECN) to IP" (September 2001) This document [RFC3168] defines a means for end hosts to detect congestion before congested routers are forced to discard packets. Although congestion notification takes place at the IP level, ECN requires support at the transport level (e.g., in TCP) to echo the bits and adapt the sending rate. This document updates RFC 793 (see Section 2 of this document) to define two previously unused flag bits in the TCP header for ECN support. RFC 3390 S: "Increasing TCP's Initial Window" (October 2002) This document [RFC3390] specifies an increase in the permitted initial window for TCP from one segment to three or four segments during the slow start phase, depending on the segment size. RFC 3465 E: "TCP Congestion Control with Appropriate Byte Counting (ABC)" (February 2003) This document [RFC3465] suggests that congestion control use the number of bytes acknowledged instead of the number of acknowledgments received. This change improves the performance of TCP in situations where there is no one-to-one relationship between data segments and acknowledgments (e.g., delayed ACKs or ACK loss). ABC is recommended by RFC 5681 (see Section 2). RFC 6633 S: "Deprecation of ICMP Source Quench Messages" (May 2012) This document [RFC6633] formally deprecates the use of ICMP Source Quench messages by transport protocols and recommends against the implementation of [RFC1016]. [3.3.] Loss Recovery Extensions For the typical implementation of the TCP fast recovery algorithm described in RFC 5681 (see Section 2 of this document), a TCP sender only retransmits a segment after a retransmit timeout has occurred, or after three duplicate ACKs have arrived triggering the fast retransmit. A single RTO might result in the retransmission of several segments, while the fast retransmit algorithm in RFC 5681 leads only to a single retransmission. 
Hence, multiple losses from a single window of data can lead to a performance degradation. Documents listed in this section aim to improve the overall performance of TCP's standard loss recovery algorithms. In particular, some of them allow TCP senders to recover more effectively when multiple segments are lost from a single flight of data. RFC 2018 S: "TCP Selective Acknowledgment Options" (October 1996) (Errata) When more than one packet is lost during one RTT, TCP may experience poor performance since a TCP sender can only learn about a single lost packet per RTT from cumulative acknowledgments. This document [RFC2018] defines the basic selective acknowledgment (SACK) mechanism for TCP, which can help to overcome these limitations. The receiving TCP returns SACK blocks to inform the sender which data has been received. The sender can then retransmit only the missing data segments. RFC 3042 S: "Enhancing TCP's Loss Recovery Using Limited Transmit" (January 2001) Abstract of RFC 3042 [RFC3042]: "This document proposes a new Transmission Control Protocol (TCP) mechanism that can be used to more effectively recover lost segments when a connection's congestion window is small, or when a large number of segments are lost in a single transmission window." The algorithm described in RFC 3042 is called "Limited Transmit". Limited Transmit is recommended by RFC 5681 (see Section 2 of this document). RFC 6582 S: "The NewReno Modification to TCP's Fast Recovery Algorithm" (April 2012) This document [RFC6582] specifies a modification to the standard Reno fast recovery algorithm, whereby a TCP sender can use partial acknowledgments to make inferences determining the next segment to send in situations where SACK would be helpful but isn't available. Although it is only a slight modification, the NewReno behavior can make a significant difference in performance when multiple segments are lost from a single window of data.
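To tie these loss-recovery pieces together (the three-duplicate-ACK trigger from RFC 5681, Limited Transmit from RFC 3042, and NewReno's partial ACKs and `recover` variable from RFC 6582), here is a simplified C sketch of how a sender might classify each incoming cumulative ACK. It rests on stated assumptions: a duplicate ACK is reduced to "acknowledges exactly snd_una while data is outstanding" (RFC 5681 additionally requires no payload, no SYN/FIN and an unchanged window), the cwnd arithmetic is left to the caller, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sequence-space comparisons modulo 2^32, in the style of BSD's tcp_seq.h. */
#define SEQ_GT(a, b)  ((int32_t)((a) - (b)) > 0)
#define SEQ_GEQ(a, b) ((int32_t)((a) - (b)) >= 0)
#define SEQ_LEQ(a, b) ((int32_t)((a) - (b)) <= 0)

#define DUPTHRESH 3	/* RFC 5681 duplicate ACK threshold */

enum ack_action {
	ACT_NEW_DATA,		/* ordinary cumulative ACK */
	ACT_LIMITED_TRANSMIT,	/* 1st/2nd dupack: may send one new segment (RFC 3042) */
	ACT_FAST_RETRANSMIT,	/* 3rd dupack: retransmit snd_una, enter recovery */
	ACT_RECOVERY_DUPACK,	/* further dupack while recovering: inflate cwnd */
	ACT_PARTIAL_ACK,	/* NewReno: retransmit the next hole, stay in recovery */
	ACT_FULL_ACK,		/* everything up to 'recover' acked: leave recovery */
	ACT_IGNORE		/* old ACK, ACK of unsent data, or nothing outstanding */
};

struct rx_state {
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	uint32_t snd_max;	/* highest sequence number sent so far */
	uint32_t recover;	/* RFC 6582 'recover'; starts at the ISS */
	int dupacks;
	int in_recovery;
};

static enum ack_action
classify_ack(struct rx_state *s, uint32_t ack)
{
	assert(SEQ_LEQ(s->snd_una, s->snd_max));
	if (SEQ_GT(ack, s->snd_max))
		return (ACT_IGNORE);
	if (SEQ_GT(ack, s->snd_una)) {
		enum ack_action a = ACT_NEW_DATA;

		if (s->in_recovery) {
			if (SEQ_GEQ(ack, s->recover)) {
				s->in_recovery = 0;
				a = ACT_FULL_ACK;
			} else
				a = ACT_PARTIAL_ACK;
		}
		s->snd_una = ack;
		s->dupacks = 0;
		return (a);
	}
	if (ack != s->snd_una || s->snd_una == s->snd_max)
		return (ACT_IGNORE);
	if (s->in_recovery)
		return (ACT_RECOVERY_DUPACK);
	s->dupacks++;
	if (s->dupacks < DUPTHRESH)
		return (ACT_LIMITED_TRANSMIT);
	/* RFC 6582: only start a new recovery if the ACK covers more than 'recover'. */
	if (s->dupacks == DUPTHRESH && SEQ_GT(ack, s->recover)) {
		s->recover = s->snd_max;
		s->in_recovery = 1;
		return (ACT_FAST_RETRANSMIT);
	}
	return (ACT_IGNORE);
}
```

The `recover` check is what keeps NewReno from starting a second fast retransmit for losses that belong to a window it is already recovering. The macros use signed 32-bit differences so the comparisons keep working across sequence-number wraparound.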
RFC 6675 S: "A Conservative Loss Recovery Algorithm Based on Selective Acknowledgment (SACK) for TCP" (August 2012) This document [RFC6675] describes a conservative loss recovery algorithm for TCP that is based on the use of the selective acknowledgment (SACK) TCP option [RFC2018] (see above in Section 3.3). The algorithm conforms to the spirit of the congestion control specification in RFC 5681 (see Section 2 of this document), but allows TCP senders to recover more effectively when multiple segments are lost from a single flight of data. RFC 6675 is a revision of RFC 3517 that addresses several situations that were not handled explicitly before. In particular, (a) it improves the loss detection in the event that the sender has outstanding segments that are smaller than Sender Maximum Segment Size (SMSS). (b) it modifies the definition of a "duplicate acknowledgment" to utilize the SACK information in detecting loss. (c) it maintains the ACK clock under certain circumstances involving loss at the end of the window. 3.4. Detection and Prevention of Spurious Retransmissions Spurious retransmission timeouts are harmful to TCP performance and multiple algorithms have been defined for detecting when spurious retransmissions have occurred, but they respond differently with regard to their manners of recovering performance. The IETF defined multiple algorithms because there are trade-offs in whether or not certain TCP options need to be implemented and concerns about IPR status. The Standards Track RFCs in this section are closely related to the Experimental RFCs in Section 4.5 also addressing this topic. RFC 2883 S: "An Extension to the Selective Acknowledgement (SACK) Option for TCP" (July 2000) This document [RFC2883] extends RFC 2018 (see Section 3.3 of this document). It enables use of the SACK option to acknowledge duplicate packets.
With this extension, called DSACK, the sender is able to infer the order of packets received at the receiver and, therefore, to infer when it has unnecessarily retransmitted a packet. A TCP sender could then use this information to detect spurious retransmissions (see [RFC3708]). RFC 4015 S: "The Eifel Response Algorithm for TCP" (February 2005) Abstract of RFC 4015 [RFC4015]: "Based on an appropriate detection algorithm, the Eifel response algorithm provides a way for a TCP sender to respond to a detected spurious timeout." RFC 5682 S: "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting Spurious Retransmission Timeouts with TCP" (September 2009) The F-RTO detection algorithm [RFC5682], originally described in RFC 4138, provides an option for inferring spurious retransmission timeouts. Unlike some similar detection methods (e.g., RFCs 3522 and 3708, both listed in Section 4.5 of this document), F-RTO does not rely on the use of any TCP options. The basic idea is to send previously unsent data after the first retransmission after an RTO. If the ACKs advance the window, the RTO may be declared spurious. [3.5.] Path MTU Discovery RFC 1191 S: "Path MTU Discovery" (November 1990) RFC 1981 S: "Path MTU Discovery for IP version 6" (August 1996) RFC 4821 S: "Packetization Layer Path MTU Discovery" (March 2007) Abstract of RFC 4821 [RFC4821]: "This document describes a robust method for Path MTU Discovery (PMTUD) that relies on TCP or some other Packetization Layer to probe an Internet path with progressively larger packets." [3.6.] Header Compression Especially in streaming applications, the overhead of TCP/IP headers could correspond to more than 50% of the total amount of data sent. Such large overheads may be tolerable in wired LANs where capacity is often not an issue, but are excessive for WANs and wireless systems where bandwidth is scarce. Header compression schemes for TCP/IP like RObust Header Compression (ROHC) can significantly compress this overhead.
It performs well over links with significant error rates and long round-trip times. RFC 1144 S: "Compressing TCP/IP Headers for Low-Speed Serial Links" (February 1990) RFC 6846 S: "RObust Header Compression (ROHC): A Profile for TCP/IP (ROHC-TCP)" (January 2013) 3.7. Defending Spoofing and Flooding Attacks By default, TCP lacks any cryptographic structures to differentiate legitimate segments from those spoofed from malicious hosts. Spoofing valid segments requires correctly guessing a number of fields. The documents in this subsection describe ways to make that guessing harder or to prevent it from being able to affect a connection negatively. RFC 4953 I: "Defending TCP Against Spoofing Attacks" (July 2007) RFC 4987 I: "TCP SYN Flooding Attacks and Common Mitigations" (August 2007) RFC 5925 S: "The TCP Authentication Option" (June 2010) RFC 5926 S: "Cryptographic Algorithms for the TCP Authentication Option (TCP-AO)" (June 2010) RFC 5927 I: "ICMP Attacks against TCP" (July 2010) RFC 5961 S: "Improving TCP's Robustness to Blind In-Window Attacks" (August 2010) RFC 6528 S: "Defending against Sequence Number Attacks" (February 2012) [4.] Experimental Extensions The RFCs in this section are either Experimental and may become Proposed Standards in the future or are Proposed Standards (or Informational), but can be considered experimental due to lack of wide deployment. At least part of the reason that they are still experimental is to gain more wide-scale experience with them before a standards track decision is made. [4.1.] Architectural Guidelines As multiple flows may share the same paths, sections of paths, or other resources, the TCP implementation may benefit from sharing information across TCP connections or other flows. Some experimental proposals have been documented and some implementations have included the concepts. 
RFC 2140 I: "TCP Control Block Interdependence" (April 1997) RFC 3124 S: "The Congestion Manager" (June 2001) This document [RFC3124] is a related proposal to RFC 2140 (see above in Section 4.1). The idea behind the Congestion Manager, moving congestion control outside of individual TCP connections, represents a modification to the core of TCP, which supports sharing information among TCP connections. Although a Proposed Standard, some pieces of the Congestion Manager support architecture have not been specified yet, and it has not achieved use or implementation beyond experimental stacks, so it is not listed among the standard TCP enhancements in this roadmap. [4.2.] Fundamental Changes Like the Standards Track documents listed in Section 3.1, there also exist new Experimental RFCs that specify fundamental changes to TCP. At the time of writing, the only example so far is TCP Fast Open that deviates from the standard TCP semantics of [RFC793]. RFC 7413 E: "TCP Fast Open" (December 2014) This document [RFC7413] describes TCP Fast Open that allows data to be carried in the SYN and SYN-ACK packets and consumed by the receiver during the initial connection handshake. [4.3.] Congestion Control Extensions TCP congestion control has been an extremely active research area for many years (see RFC 5783 discussed in Section 7.6 of this document), as it determines the performance of many applications that use TCP. A number of Experimental RFCs address issues with flow start up, overshoot, and steady-state behavior in the basic algorithms of RFC 5681 (see Section 2 of this document). 
In these subsections, enhancements to TCP's congestion control are listed. RFC 2861 E: "TCP Congestion Window Validation" (June 2000) RFC 3540 E: "Robust Explicit Congestion Notification (ECN) Signaling with Nonces" (June 2003) RFC 3649 E: "HighSpeed TCP for Large Congestion Windows" (December 2003) RFC 3742 E: "Limited Slow-Start for TCP with Large Congestion Windows" (March 2004) RFC 4782 E: "Quick-Start for TCP and IP" (January 2007) (Errata) RFC 5562 E: "Adding Explicit Congestion Notification (ECN) Capability to TCP's SYN/ACK Packets" (June 2009) RFC 5690 I: "Adding Acknowledgement Congestion Control to TCP" (February 2010) RFC 6928 E: "Increasing TCP's Initial Window" (April 2013) This document [RFC6928] proposes to increase the TCP initial window from between 2 and 4 segments, as specified in RFC 3390 (see Section 3.2 of this document), to 10 segments with a fallback to the existing recommendation when performance issues are detected. [4.4.] Loss Recovery Extensions RFC 5827 E: "Early Retransmit for TCP and Stream Control Transmission Protocol (SCTP)" (April 2010) This document [RFC5827] proposes the "Early Retransmit" mechanism for TCP (and SCTP) that can be used to recover lost segments when a connection's congestion window is small. In certain special circumstances, Early Retransmit reduces the number of duplicate acknowledgments required to trigger fast retransmit to recover segment losses without waiting for a lengthy retransmission timeout. RFC 6069 E: "Making TCP More Robust to Long Connectivity Disruptions (TCP-LCD)" (December 2010) RFC 6937 E: "Proportional Rate Reduction for TCP" (May 2013) This document [RFC6937] describes an experimental Proportional Rate Reduction (PRR) algorithm as an alternative to the widely deployed Fast Recovery algorithm, to improve the accuracy of the amount of data sent by TCP during loss recovery.
[4.5.] Detection and Prevention of Spurious Retransmissions In addition to the Standards Track extensions to deal with spurious retransmissions in Section 3.4, Experimental proposals have also been documented. RFC 3522 E: "The Eifel Detection Algorithm for TCP" (April 2003) RFC 3708 E: "Using TCP Duplicate Selective Acknowledgement (DSACKs) and Stream Control Transmission Protocol (SCTP) Duplicate Transmission Sequence Numbers (TSNs) to Detect Spurious Retransmissions" (February 2004) RFC 4653 E: "Improving the Robustness of TCP to Non-Congestion Events" (August 2006) [4.6.] TCP Timeouts RFC 5482 S: "TCP User Timeout Option" (March 2009) [4.7.] Multipath TCP MultiPath TCP (MPTCP) is an ongoing effort within the IETF that allows a TCP connection to simultaneously use multiple IP addresses / interfaces to spread their data across several subflows, while presenting a regular TCP interface to applications. Benefits of this include better resource utilization, better throughput and smoother reaction to failures. The documents listed in this section specify the Multipath TCP scheme, while the documents in Sections 7.2, 7.4, and 7.5 provide some additional background information. RFC 6356 E: "Coupled Congestion Control for Multipath Transport Protocols" (October 2011) RFC 6824 E: "TCP Extensions for Multipath Operation with Multiple Addresses" (January 2013) (Errata) [5.] TCP Parameters at IANA RFC 2780 B: "IANA Allocation Guidelines For Values In the Internet Protocol and Related Headers" (March 2000) RFC 4727 S: "Experimental Values in IPv4, IPv6, ICMPv4, ICMPv6, UDP, and TCP Headers" (November 2006) RFC 6335 B: "Internet Assigned Numbers Authority (IANA) Procedures for the Management of the Service Name and Transport Protocol Port Number Registry" (August 2011) RFC 6994 S: "Shared Use of Experimental TCP Options" (August 2013)
[7.] Support Documents This section contains several classes of documents that do not necessarily define current protocol behaviors but that are nevertheless of interest to TCP implementers. Section 7.1 describes several foundational RFCs that give modern readers a better understanding of the principles underlying TCP's behaviors and development over the years. Section 7.2 contains architectural guidelines and principles for TCP architects and designers. The documents listed in Section 7.3 provide advice on using TCP in various types of network situations that pose challenges above those of typical wired links. Guidance for developing, analyzing, and evaluating TCP is given in Section 7.4. Some implementation notes and implementation advice can be found in Section 7.5. RFCs that describe tools for testing and debugging TCP implementations or that contain high-level tutorials on the protocol are listed in Section 7.6. The TCP Management Information Bases are described in Section 7.7, and Section 7.8 lists a number of case studies that have explored TCP performance. 7.4. Guidance for Developing, Analyzing, and Evaluating TCP Documents in this section give general guidance for developing, analyzing, and evaluating TCP. Some of the documents discuss, for example, the properties of congestion control protocols that are "safe" for Internet deployment as well as how to measure the properties of congestion control mechanisms and transport protocols. RFC 5033 B: "Specifying New Congestion Control Algorithms" (August 2007) This document [RFC5033] considers the evaluation of suggested congestion control algorithms that differ from the principles outlined in RFC 2914 (see Section 7.2 of this document). It is useful for authors of such algorithms as well as for IETF members reviewing the associated documents.
RFC 5166 I: "Metrics for the Evaluation of Congestion Control Mechanisms" (March 2008) This document [RFC5166] discusses metrics that need to be considered when evaluating new or modified congestion control mechanisms for the Internet. Among other topics, the document discusses throughput, delay, loss rates, response times, fairness, and robustness for challenging environments. RFC 6077 I: "Open Research Issues in Internet Congestion Control" (February 2011) This document [RFC6077] summarizes the main open problems in the domain of Internet congestion control. As a good starting point for newcomers, the document describes several new challenges that are becoming important as the network grows, as well as some issues that have been known for many years. RFC 6181 I: "Threat Analysis for TCP Extensions for Multipath Operation with Multiple Addresses" (March 2011) This document [RFC6181] describes a threat analysis for Multipath TCP (MPTCP) (see Section 4.7 of this document). The document discusses several types of attacks and provides recommendations for MPTCP designers on how to create an MPTCP specification that is as secure as the current (single-path) TCP. RFC 6349 I: "Framework for TCP Throughput Testing" (August 2011) From the Abstract of RFC 6349 [RFC6349]: "This framework describes a practical methodology for measuring end-to-end TCP Throughput in a managed IP network. The goal is to provide a better indication in regard to user experience. In this framework, TCP and IP parameters are specified to optimize TCP Throughput." 7.5. Implementation Advice RFC 794 U: "PRE-EMPTION" (September 1981) This document [RFC794] clarifies that operating systems need to manage their limited resources, which may include TCP connection state, and that these decisions can be made with application input, but they do not need to be part of the TCP protocol specification itself.
RFC 879 U: "The TCP Maximum Segment Size and Related Topics" (November 1983) RFC 1071 U: "Computing the Internet Checksum" (September 1988) (Errata) RFC 1624 I: "Computation of the Internet Checksum via Incremental Update" (May 1994) RFC 1936 I: "Implementing the Internet Checksum in Hardware" (April 1996) RFC 2525 I: "Known TCP Implementation Problems" (March 1999) RFC 2923 I: "TCP Problems with Path MTU Discovery" (September 2000) RFC 3493 I: "Basic Socket Interface Extensions for IPv6" (February 2003) RFC 6056 B: "Recommendations for Transport-Protocol Port Randomization" (December 2010) RFC 6191 B: "Reducing the TIME-WAIT State Using TCP Timestamps" (April 2011) RFC 6429 I: "TCP Sender Clarification for Persist Condition" (December 2011) RFC 6897 I: "Multipath TCP (MPTCP) Application Interface Considerations" (March 2013) 7.6. Tools and Tutorials RFC 1180 I: "TCP/IP Tutorial" (January 1991) (Errata) This document [RFC1180] is an extremely brief overview of the TCP/ IP protocol suite as a whole. It gives some explanation as to how and where TCP fits in. RFC 1470 I: "FYI on a Network Management Tool Catalog: Tools for Monitoring and Debugging TCP/IP Internets and Interconnected Devices" (June 1993) A few of the tools that this document [RFC1470] describes are still maintained and in use today, for example, ttcp and tcpdump. However, many of the tools described do not relate specifically to TCP and are no longer used or easily available. RFC 2398 I: "Some Testing Tools for TCP Implementors" (August 1998) This document [RFC2398] describes a number of TCP packet generation and analysis tools. Although some of these tools are no longer readily available or widely used, for the most part they are still relevant and usable. RFC 5783 I: "Congestion Control in the RFC Series" (February 2010) This document [RFC5783] provides an overview of RFCs related to congestion control that had been published at the time. 
The focus of the document is on end-host-based congestion control.

8. Undocumented TCP Features

There are a few important implementation tactics for the TCP that have not yet been described in any RFC. Although this roadmap is primarily concerned with mapping the TCP RFCs, this section is included because an implementer needs to be aware of these important issues.

Header Prediction

Header prediction is a trick to speed up the processing of segments. Van Jacobson and Mike Karels developed the technique in the late 1980s. The basic idea is that some processing time can be saved when most of a segment's fields can be predicted from previous segments. A good description of this was sent to the TCP-IP mailing list by Van Jacobson on March 9, 1988 (see [Jacobson] for the full message):

   Quite a bit of the speedup comes from an algorithm that we ('we' refers to collaborator Mike Karels and myself) are calling "header prediction". The idea is that if you're in the middle of a bulk data transfer and have just seen a packet, you know what the next packet is going to look like: It will look just like the current packet with either the sequence number or ack number updated (depending on whether you're the sender or receiver). Combining this with the "Use hints" epigram from Butler Lampson's classic "Epigrams for System Designers", you start to think of the tcp state (rcv.nxt, snd.una, etc.) as "hints" about what the next packet should look like.

   If you arrange those "hints" so they match the layout of a tcp packet header, it takes a single 14-byte compare to see if your prediction is correct (3 longword compares to pick up the send & ack sequence numbers, header length, flags and window, plus a short compare on the length). If the prediction is correct, there's a single test on the length to see if you're the sender or receiver followed by the appropriate processing.
   E.g., if the length is non-zero (you're the receiver), checksum and append the data to the socket buffer then wake any process that's sleeping on the buffer. Update rcv.nxt by the length of this packet (this updates your "prediction" of the next packet). Check if you can handle another packet the same size as the current one. If not, set one of the unused flag bits in your header prediction to guarantee that the prediction will fail on the next packet and force you to go through full protocol processing. Otherwise, you're done with this packet. So, the *total* tcp protocol processing, exclusive of checksumming, is on the order of 6 compares and an add.

Forward Acknowledgement (FACK)

FACK [MM96] includes an alternate algorithm for triggering fast retransmit [RFC5681], based on the extent of the SACK scoreboard. Its goal is to trigger fast retransmit as soon as the receiver's reassembly queue is larger than the duplicate ACK threshold, as indicated by the difference between the forwardmost SACK block edge and SND.UNA. This algorithm quickly and reliably triggers fast retransmit in the presence of burst losses, often on the first SACK following such a loss. Such a threshold-based algorithm also triggers fast retransmit immediately in the presence of any reordering with extent greater than the duplicate ACK threshold. FACK is implemented in Linux and enabled by default.

Congestion Control for High Rate Flows

In the last decade, significant research effort has been put into experimental TCP congestion control modifications for obtaining high throughput with reduced startup and recovery times. Only a few RFCs have been published on some of these modifications, including HighSpeed TCP [RFC3649], Limited Slow-Start [RFC3742], and Quick-Start [RFC4782] (see Section 4.3 of this document for more information on each), but high-rate congestion control mechanisms are still considered an open issue in congestion control research.
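Returning to the first two undocumented features above, the header-prediction fast path and the FACK trigger can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the BSD or Linux code: the `Tcb` and `Segment` types, the `ACK` flag value, and the byte-based FACK threshold (`dupthresh * smss`) are hypothetical, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass

ACK = 0x10  # hypothetical flag encoding: only the ACK bit set


@dataclass
class Tcb:
    """Hypothetical per-connection state used as the prediction 'hints'."""
    rcv_nxt: int  # next sequence number expected from the peer
    snd_una: int  # oldest unacknowledged sequence number we sent
    snd_max: int  # highest sequence number we have sent
    snd_wnd: int  # window the peer last advertised


@dataclass
class Segment:
    seq: int
    ack: int
    flags: int
    window: int
    length: int  # payload bytes


def header_predict(tcb: Tcb, seg: Segment) -> str:
    """Return 'receiver' or 'sender' on the fast path, 'slow' otherwise."""
    # One combined comparison against the hints: ACK-only flags, the
    # expected sequence number, and an unchanged window.
    if (seg.flags, seg.seq, seg.window) != (ACK, tcb.rcv_nxt, tcb.snd_wnd):
        return "slow"
    # A single test on the length picks receiver vs. sender processing.
    if seg.length > 0:
        if seg.ack != tcb.snd_una:
            return "slow"  # data that also acks new data: full processing
        tcb.rcv_nxt += seg.length  # updates the prediction for the next packet
        return "receiver"
    if tcb.snd_una < seg.ack <= tcb.snd_max:
        tcb.snd_una = seg.ack  # pure ACK covering new data
        return "sender"
    return "slow"  # e.g. a duplicate ACK


def fack_triggers_fast_retransmit(snd_una: int,
                                  sack_blocks: list[tuple[int, int]],
                                  smss: int, dupthresh: int = 3) -> bool:
    """Trigger when the forwardmost SACKed byte lies more than
    dupthresh segments beyond SND.UNA (threshold in bytes is an assumption)."""
    if not sack_blocks:
        return False
    forward_most = max(right for _left, right in sack_blocks)
    return forward_most - snd_una > dupthresh * smss
```

With SMSS = 1460, a first SACK block of (4380, 5840) after a three-segment burst loss already places the forwardmost edge four segments past SND.UNA, so the FACK check fires without waiting for three duplicate ACKs.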
Some other schemes have been published as Internet-Drafts, e.g., CUBIC [CUBIC] (the default TCP congestion control algorithm in Linux), Compound TCP [CTCP], and H-TCP [HTCP], or have been discussed a little by the IETF, but much of the work in this area has not yet been adopted within the IETF, so the majority of this work is outside the RFC series and may be discussed in other products of the IRTF Internet Congestion Control Research Group (ICCRG).

Metrics for the Evaluation of Congestion Control Mechanisms
https://tools.ietf.org/html/rfc5166

Discusses the metrics to be considered in an evaluation of new or modified congestion control mechanisms for the Internet. These include metrics for the evaluation of new transport protocols, of proposed modifications to TCP, of application-level congestion control, and of Active Queue Management (AQM) mechanisms in the router. This document is the first in a series of documents aimed at improving the models that we use in the evaluation of transport protocols.

Types Of Metrics:

- Throughput, Delay, and Loss Rates
  - Throughput: can be measured as
    - a router-based metric of aggregate link utilization
    - a flow-based metric of per-connection transfer times
    - a user-based metric of utility functions or user wait times
  - Goodput: sometimes distinguished from throughput, where throughput is the link utilization or flow rate in bytes per second; goodput is the subset of throughput (also measured in bytes/s) consisting of useful traffic [i.e., excluding duplicate packets].
  - Delay: Like throughput, delay can be measured as a router-based metric of queueing delay over time, or as a flow-based metric in terms of per-packet transfer times. Per-packet delay can also include delay at the sender waiting for the transport protocol to send the packet. For reliable transfer, the per-packet transfer time seen by the application includes the possible delay of retransmitting a lost packet.
  - Packet Loss Rates: can be measured as a network-based or as a flow-based metric. One network-related reason to avoid high steady-state packet loss rates is to avoid congestion collapse in environments containing paths with multiple congested links.
- Response Times and Minimizing Oscillations
  - Response to Changes: One of the key concerns in the design of congestion control mechanisms has been the response times to sudden congestion in the network. On the one hand, congestion control mechanisms should respond reasonably promptly to sudden congestion from routing or bandwidth changes or from a burst of competing traffic. At the same time, congestion control mechanisms should not respond too severely to transient changes, e.g., to a sudden increase in delay that will dissipate in less than the connection's round-trip time.
  - Minimizing Oscillations: One goal is that of stability, in terms of minimizing oscillations of queueing delay or of throughput. In practice, stability is frequently associated with rate fluctuations or variance. Rate variations can result in fluctuations in router queue size and therefore of queue overflows. These queue overflows can cause loss synchronizations across coexisting flows and periodic under-utilization of link capacity, both of which are considered to be general signs of network instability. Thus, measuring the rate variations of flows is often used to measure the stability of transport protocols. To measure rate variations, [JWL04], [RX05], and [FHPW00] use the coefficient of variation (CoV) of per-flow transmission rates, and [WCL05] suggests the use of standard deviations of per-flow rates. Since rate variations are a function of time scales, it makes sense to measure these rate variations over various time scales.
- Fairness and Convergence
  - Fairness between Flows: let x_i be the throughput for the i-th connection.
    - Jain's fairness index: The fairness index in [JCH84] is:

          ((sum_i x_i)^2) / (n * sum_i ((x_i)^2)),

      where there are n users. This fairness index ranges from 1/n to 1, and it is maximum when all users receive the same allocation. This index is k/n when k users equally share the resource, and the other n-k users receive zero allocation.
    - The product measure:

          product_i x_i

      the product of the throughput of the individual connections, is also used as a measure of fairness. (In some contexts x_i is taken as the power of the i-th connection, and the product measure is referred to as network power.) The product measure is particularly sensitive to segregation; the product measure is zero if any connection receives zero throughput. [N.B. If one normalizes to actual bandwidth by taking the Nth root of the product, where N = number of connections, this is the geometric mean. The geometric mean will be less than the arithmetic mean unless all flows have equivalent throughput.]
    - Epsilon-fairness: A rate allocation is defined as epsilon-fair if

          (min_i x_i) / (max_i x_i) >= 1 - epsilon

      Epsilon-fairness measures the worst-case ratio between any two throughput rates [ZKL04]. Epsilon-fairness is related to max-min fairness.
  - Fairness between Flows with Different Resource Requirements
    - Max-min fairness: In order to satisfy the max-min fairness criterion, the smallest throughput rate must be as large as possible. Given this condition, the next-smallest throughput rate must be as large as possible, and so on. Thus, max-min fairness gives absolute priority to the smallest flows. (Max-min fairness can be explained by the progressive filling algorithm, where all flow rates start at zero, and the rates all grow at the same pace. Each flow rate stops growing only when one or more links on the path reach link capacity.)
    - Proportional fairness: A feasible allocation, x, is defined as proportionally fair if, for any other feasible allocation x*, the aggregate of proportional changes is zero or negative:

          sum_i ((x*_i - x_i) / x_i) <= 0.

      "This criterion favours smaller flows, but less emphatically than max-min fairness" [K01]. (Using the language of utility functions, proportional fairness can be achieved by using logarithmic utility functions, and maximizing the sum of the per-flow utility functions; see [KMT98] for a fuller explanation.)
    - Minimum potential delay fairness: Minimum potential delay fairness has been shown to model TCP [KS03], and is a compromise between max-min fairness and proportional fairness. An allocation, x, is defined as having minimum potential delay fairness if:

          sum_i (1/x_i)

      is smaller than for any other feasible allocation. That is, it would minimize the average download time if each flow was an equal-sized file.
  - Comments on Fairness
    - Trade-offs between fairness and throughput: The fairness measures in the section above generally measure both fairness and throughput, giving different weights to each. Potential trade-offs between fairness and throughput are also discussed by Tang, et al. in [TWL06], for a framework where max-min fairness is defined as the most fair. In particular, [TWL06] shows that in some topologies, throughput is proportional to fairness, while in other topologies, throughput is inversely proportional to fairness.
    - Fairness and the number of congested links: Some of these fairness metrics are discussed in more detail in [F91]. We note that there is not a clear consensus for the fairness goals, in particular for fairness between flows that traverse different numbers of congested links [F91]. Utility maximization provides one framework for describing this trade-off in fairness.
    - Fairness and round-trip times: One goal cited in a number of new transport protocols has been that of fairness between flows with different round-trip times [KHR02] [XHR04]. We note that there is not a consensus in the networking community about the desirability of this goal, or about the implications and interactions between this goal and other metrics [FJ92] (Section 3.3). One common argument against the goal of fairness between flows with different round-trip times has been that flows with long round-trip times consume more resources; this aspect is covered by the previous item. Researchers have also noted the difference between the RTT-unfairness of standard TCP, and the greater RTT-unfairness of some proposed modifications to TCP [LLS05].
    - Fairness and packet size: One fairness issue is that of the relative fairness for flows with different packet sizes. Many file transfer applications will use the maximum packet size possible; in contrast, low-bandwidth VoIP flows are likely to send small packets, sending a new packet every 10 to 40 ms to limit delay. Should a small-packet VoIP connection receive the same sending rate in *bytes* per second as a large-packet TCP connection in the same environment, or should it receive the same sending rate in *packets* per second? This fairness issue has been discussed in more detail in [RFC3714], with [RFC4828] also describing the ways that packet size can affect the packet drop rate experienced by a flow.
  - Convergence times: Convergence times concern the time for convergence to fairness between an existing flow and a newly starting one, and are a special concern for environments with high-bandwidth long-delay flows. Convergence times also concern the time for convergence to fairness after a sudden change such as a change in the network path, the competing cross-traffic, or the characteristics of a wireless link.
    As with fairness, convergence times can matter both between flows of the same protocol, and between flows using different protocols [SLFK03]. One metric used for convergence times is the delta-fair convergence time, defined as the time taken for two flows with the same round-trip time to go from shares of 100/101-th and 1/101-th of the link bandwidth, to having close to fair sharing with shares of (1+delta)/2 and (1-delta)/2 of the link bandwidth [BBFS01]. A similar metric for convergence times measures the convergence time as the number of round-trip times for two flows to reach epsilon-fairness, when starting from a maximally-unfair state [ZKL04].

TCP Congestion Control (RFC 5681):
http://www.rfc-editor.org/rfc/rfc5681.txt

Specifies four TCP congestion control algorithms: slow start, congestion avoidance, fast retransmit and fast recovery. They were devised in [Jac88] and [Jac90]. Their use with TCP is standardized in [RFC1122]. In addition, the document specifies what TCP connections should do after a relatively long idle period, as well as clarifying some of the issues pertaining to TCP ACK generation. Obsoletes [RFC2581], which in turn obsoleted [RFC2001].

The slow start and congestion avoidance algorithms MUST be used by the TCP sender to control the amount of outstanding data being injected into the network. These rely on three state variables:

- Congestion Window (cwnd): a sender-side limit on the amount of data the sender can transmit before receiving an ACK.
- Receiver's Advertised Window (rwnd): a receiver-side limit on the amount of outstanding data.
- Slow Start Threshold (ssthresh): used to determine whether the slow start or congestion avoidance algorithm is used to control data transmission.

Slow Start: Used to determine available link capacity at the beginning of a transfer, after repairing loss detected by the retransmission timer, or [potentially] after a long idle period. It is additionally used to start the "ACK clock".
- SMSS: Sender Maximum Segment Size
- IW: Initial Window, the initial value of cwnd. It MUST be set using the following guidelines as an upper bound:

      If SMSS > 2190 bytes:
          IW = 2 * SMSS bytes and MUST NOT be more than 2 segments
      If (SMSS > 1095 bytes) and (SMSS <= 2190 bytes):
          IW = 3 * SMSS bytes and MUST NOT be more than 3 segments
      If SMSS <= 1095 bytes:
          IW = 4 * SMSS bytes and MUST NOT be more than 4 segments

- Ssthresh:
  - SHOULD be set arbitrarily high (e.g., to the size of the largest possible advertised window), but ssthresh MUST be reduced in response to congestion.
  - The slow start algorithm is used when cwnd < ssthresh, while the congestion avoidance algorithm is used when cwnd > ssthresh. When cwnd and ssthresh are equal, the sender may use either slow start or congestion avoidance.
  - When a TCP sender detects segment loss using the retransmission timer and the given segment has not yet been resent once by way of the retransmission timer, the value of ssthresh MUST be set to no more than the value given in equation (4):

        ssthresh = max (FlightSize / 2, 2*SMSS)            (4)

    where FlightSize is the amount of outstanding data in the network.
- Growing cwnd: During slow start, a TCP increments cwnd by at most SMSS bytes for each ACK received that cumulatively acknowledges new data. Slow start ends when cwnd reaches or exceeds ssthresh.
  - While traditionally TCP implementations have increased cwnd by precisely SMSS bytes upon receipt of an ACK covering new data, we RECOMMEND that TCP implementations increase cwnd per:

        cwnd += min (N, SMSS)                              (2)

    where N is the number of previously unacknowledged bytes acknowledged in the incoming ACK.

Congestion Avoidance: during congestion avoidance, cwnd is incremented by roughly 1 full-sized segment per RTT. Congestion avoidance continues until congestion is detected.
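The initial-window, ssthresh, and slow-start rules above can be expressed as a few small functions. This is a minimal Python sketch, not a reference implementation: the function names are hypothetical, all quantities are in bytes, and for congestion avoidance it uses the per-ACK approximation that RFC 5681 gives as equation (3), cwnd += SMSS*SMSS/cwnd. When cwnd equals ssthresh the sketch picks congestion avoidance, which the text permits.

```python
def initial_window(smss: int) -> int:
    """RFC 5681 upper bound on IW, in bytes."""
    if smss > 2190:
        return 2 * smss
    if smss > 1095:
        return 3 * smss
    return 4 * smss


def ssthresh_after_loss(flight_size: int, smss: int) -> int:
    """Equation (4): ssthresh = max(FlightSize / 2, 2*SMSS)."""
    return max(flight_size // 2, 2 * smss)


def on_new_ack(cwnd: int, ssthresh: int, bytes_acked: int, smss: int) -> int:
    """Return cwnd after an ACK that cumulatively acknowledges new data."""
    if cwnd < ssthresh:
        # Slow start, equation (2): grow by at most one SMSS per ACK.
        return cwnd + min(bytes_acked, smss)
    # Congestion avoidance, equation (3). With integer arithmetic the
    # increment can round to 0 for large cwnd; RFC 5681 says to round
    # such a result up to 1 byte.
    return cwnd + max(1, smss * smss // cwnd)
```

For example, with SMSS = 1460 the IW bound is 4380 bytes (3 segments), and an ACK for a full segment at cwnd = ssthresh = 14600 grows cwnd by 1460 * 1460 / 14600 = 146 bytes.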
The basic guidelines for incrementing cwnd during congestion avoidance are:

- MAY increment cwnd by SMSS bytes
- SHOULD increment cwnd per equation (2) once per RTT
- MUST NOT increment cwnd by more than SMSS bytes

[RFC3465] allows for cwnd increases of more than SMSS bytes for incoming acknowledgments during slow start on an experimental basis; however, such behavior is not allowed as part of the standard.

Another common formula that a TCP MAY use to update cwnd during congestion avoidance is given in equation (3):

    cwnd += SMSS*SMSS/cwnd                                 (3)

This adjustment is executed on every incoming ACK that acknowledges new data. Equation (3) provides an acceptable approximation to the underlying principle of increasing cwnd by 1 full-sized segment per RTT.

Upon a timeout (as specified in [RFC2988]) cwnd MUST be set to no more than the loss window, LW, which equals 1 full-sized segment (regardless of the value of IW). Therefore, after retransmitting the dropped segment, the TCP sender uses the slow start algorithm to increase the window from 1 full-sized segment to the new value of ssthresh, at which point congestion avoidance again takes over.

Fast Retransmit/Fast Recovery: A TCP receiver SHOULD send an immediate duplicate ACK when an out-of-order segment arrives. The purpose of this ACK is to inform the sender that a segment was received out-of-order and which sequence number is expected. In addition, a TCP receiver SHOULD send an immediate ACK when the incoming segment fills in all or part of a gap in the sequence space. This will generate more timely information for a sender recovering from a loss through a retransmission timeout, a fast retransmit, or an advanced loss recovery algorithm.

The TCP sender SHOULD use the "fast retransmit" algorithm to detect and repair loss, based on incoming duplicate ACKs. The fast retransmit algorithm uses the arrival of 3 duplicate ACKs as an indication that a segment has been lost.
TCP then performs a retransmission of what appears to be the missing segment, without waiting for the retransmission timer to expire.

The fast retransmit and fast recovery algorithms are implemented together as follows.

1. On the first and second duplicate ACKs received at a sender, a TCP SHOULD send a segment of previously unsent data per [RFC3042] provided that the receiver's advertised window allows, the total FlightSize would remain less than or equal to cwnd plus 2*SMSS, and that new data is available for transmission. Further, the TCP sender MUST NOT change cwnd to reflect these two segments [RFC3042]. Note that a sender using SACK [RFC2018] MUST NOT send new data unless the incoming duplicate acknowledgment contains new SACK information.

2. When the third duplicate ACK is received, a TCP MUST set ssthresh to no more than the value given in equation (4). When [RFC3042] is in use, additional data sent in limited transmit MUST NOT be included in this calculation.

3. The lost segment starting at SND.UNA MUST be retransmitted and cwnd set to ssthresh plus 3*SMSS. This artificially "inflates" the congestion window by the number of segments (three) that have left the network and which the receiver has buffered.

4. For each additional duplicate ACK received (after the third), cwnd MUST be incremented by SMSS. This artificially inflates the congestion window in order to reflect the additional segment that has left the network.

   Note: [SCWA99] discusses a receiver-based attack whereby many bogus duplicate ACKs are sent to the data sender in order to artificially inflate cwnd and cause a higher than appropriate sending rate to be used. A TCP MAY therefore limit the number of times cwnd is artificially inflated during loss recovery to the number of outstanding segments (or, an approximation thereof).
   Note: When an advanced loss recovery mechanism (such as outlined in section 4.3 of RFC 5681) is not in use, this increase in FlightSize can cause equation (4) to slightly inflate cwnd and ssthresh, as some of the segments between SND.UNA and SND.NXT are assumed to have left the network but are still reflected in FlightSize.

5. When previously unsent data is available and the new value of cwnd and the receiver's advertised window allow, a TCP SHOULD send 1*SMSS bytes of previously unsent data.

6. When the next ACK arrives that acknowledges previously unacknowledged data, a TCP MUST set cwnd to ssthresh (the value set in step 2). This is termed "deflating" the window.

   This ACK should be the acknowledgment elicited by the retransmission from step 3, one RTT after the retransmission (though it may arrive sooner in the presence of significant out-of-order delivery of data segments at the receiver). Additionally, this ACK should acknowledge all the intermediate segments sent between the lost segment and the receipt of the third duplicate ACK, if none of these were lost.

Note: This algorithm is known to generally not recover efficiently from multiple losses in a single flight of packets.

RTO: https://tools.ietf.org/html/rfc6298

Does not modify the behaviour in RFC 5681. The RTO is a function of two state variables, SRTT and RTTVAR. The following constants are used for calculations:

    G <- clock granularity in seconds
    K <- 4

[(2.1)] Until a round-trip time (RTT) measurement has been made for a segment sent between the sender and the receiver, the sender SHOULD set RTO <- 1 second [i.e., not the outdated 3s currently in FreeBSD]; the "backing off" on repeated retransmission still applies.
[(2.2)] When the first RTT measurement R is made, the host MUST set

    SRTT <- R
    RTTVAR <- R/2
    RTO <- SRTT + max (G, K*RTTVAR)

[(2.3)] When a subsequent RTT measurement R' is made, a host MUST set

    RTTVAR <- (1 - beta) * RTTVAR + beta * |SRTT - R'|
    SRTT <- (1 - alpha) * SRTT + alpha * R'

The value of SRTT used in updating RTTVAR is the one prior to the update in the second assignment, i.e., the updates are done RTTVAR then SRTT. The above calculation SHOULD be done with alpha=1/8 and beta=1/4 (as suggested in [JK88]). After the computation, the host MUST update RTO <- SRTT + max (G, K*RTTVAR). [N.B. Should these values be smaller in the data center so that the SRTT maintains a longer memory and isn't compromised by a transient microburst?]

[(2.4)] Whenever RTO is computed, if it is less than 1 second, then the RTO SHOULD be rounded up to 1 second. [See the incast section for why this is unequivocally wrong in the data center.]

Traditionally, TCP implementations use coarse grain clocks to measure the RTT and trigger the RTO, which imposes a large minimum value on the RTO. Research suggests that a large minimum RTO is needed to keep TCP conservative and avoid spurious retransmissions [AP99]. Therefore, this specification requires a large minimum RTO as a conservative approach, while at the same time acknowledging that at some future point, research may show that a smaller minimum RTO is acceptable or superior. [Vasudevan09 (incast section) clearly shows this to be the case.]

Note that a TCP implementation MAY clear SRTT and RTTVAR after backing off the timer multiple times as it is likely that the current SRTT and RTTVAR are bogus in this situation. Once SRTT and RTTVAR are cleared, they should be initialized with the next RTT sample taken per (2.2) rather than using (2.3).

[(7)] Changes from RFC 2988

This document reduces the initial RTO from the previous 3 seconds [PA00] to 1 second, unless the SYN or the ACK of the SYN is lost, in which case the default RTO is reverted to 3 seconds before data transmission begins.
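Rules (2.1) through (2.4) map onto a small estimator. This Python sketch is an illustration, not FreeBSD's implementation: the class, method, and helper names are hypothetical and times are floats in seconds. `min_rto` defaults to the 1-second floor of (2.4) but is a parameter so the data-center question raised above can be explored, and the optional `expected_samples` argument applies the RFC 7323 Appendix G scaling (alpha' = alpha / ExpectedSamples, beta' = beta / ExpectedSamples) covered later in these notes.

```python
import math


class RtoEstimator:
    """Hypothetical sketch of the RFC 6298 RTO computation (seconds)."""

    def __init__(self, granularity=0.001, k=4, alpha=1 / 8, beta=1 / 4,
                 min_rto=1.0):
        self.g = granularity
        self.k = k
        self.alpha = alpha
        self.beta = beta
        self.min_rto = min_rto
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0  # (2.1): initial RTO before any RTT measurement

    def _update_rto(self):
        rto = self.srtt + max(self.g, self.k * self.rttvar)
        self.rto = max(rto, self.min_rto)  # (2.4): round up to the floor

    def on_rtt_sample(self, r, expected_samples=1):
        if self.srtt is None:
            # (2.2): first measurement
            self.srtt = r
            self.rttvar = r / 2
        else:
            # (2.3): update RTTVAR with the *old* SRTT, then SRTT.
            a = self.alpha / expected_samples
            b = self.beta / expected_samples
            self.rttvar = (1 - b) * self.rttvar + b * abs(self.srtt - r)
            self.srtt = (1 - a) * self.srtt + a * r
        self._update_rto()
        return self.rto

    def backoff(self):
        """Double the RTO after a retransmission timeout."""
        self.rto *= 2


def expected_samples(flight_size, smss):
    """RFC 7323 Appendix G: ceiling(FlightSize / (SMSS * 2)); the floor of 1
    for an empty flight is an assumption of this sketch."""
    return max(1, math.ceil(flight_size / (smss * 2)))
```

With the default floor, a 100 ms first sample still yields RTO = 1 s; with `min_rto=0.0` the same sample yields 0.1 + 4 * 0.05 = 0.3 s, and a second sample of 200 ms gives RTTVAR = 0.0625, SRTT = 0.1125 and RTO = 0.3625 s.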
Increasing TCP's initial window:
http://www.rfc-editor.org/rfc/rfc3390.txt
http://www.rfc-editor.org/rfc/rfc6928.txt

Proposes an experiment to increase the permitted TCP initial window (IW) from between 2 and 4 segments, as specified in RFC 3390, to 10 segments, with a fallback to the existing recommendation when performance issues are detected. It discusses the motivation behind the increase, the advantages and disadvantages of the higher initial window, and presents results from several large-scale experiments showing that the higher initial window improves the overall performance of many web services without resulting in a congestion collapse.

TCP Modification:

- The upper bound for the initial window will be:

      min (10*MSS, max (2*MSS, 14600))                     (1)

- This change applies to the initial window of the connection in the first round-trip time (RTT) of data transmission during or following the TCP three-way handshake.
- All the test results described in this document were based on the regular Ethernet MTU of 1500 bytes. Future study of the effect of a different MTU may be needed to fully validate (1) above.
- [In contrast to RFC 3390 and RFC 5681] The proposed change to reduce the default retransmission timeout (RTO) to 1 second [RFC6298] increases the chance for spurious SYN or SYN/ACK retransmission, thus unnecessarily penalizing connections with RTT > 1 second if their initial window is reduced to 1 segment. For this reason, it is RECOMMENDED that implementations refrain from resetting the initial window to 1 segment, unless there have been more than one SYN or SYN/ACK retransmissions or true loss detection has been made.
- TCP implementations use slow start in as many as three different ways: (1) to start a new connection (the initial window); (2) to restart transmission after a long idle period (the restart window); and (3) to restart transmission after a retransmit timeout (the loss window).
  The change specified in this document affects the value of the initial window. Optionally, a TCP MAY set the restart window to the minimum of the value used for the initial window and the current value of cwnd (in other words, using a larger value for the restart window should never increase the size of cwnd). These changes do NOT change the loss window, which must remain 1 segment of MSS bytes (to permit the lowest possible window size in the case of severe congestion).
- To limit any negative effect that a larger initial window may have on links with limited bandwidth or buffer space, implementations SHOULD fall back to RFC 3390 for the restart window (RW) if any packet loss is detected during either the initial window or a restart window, and more than 4 KB of data is sent.

4. Background

- According to the latest report from Akamai [AKAM10], the global broadband (> 2 Mbps) adoption has surpassed 50%, propelling the average connection speed to reach 1.7 Mbps, while the narrowband (< 256 Kbps) usage has dropped to 5%. In contrast, TCP's initial window has remained 4 KB for a decade [RFC2414], corresponding to a bandwidth utilization of less than 200 Kbps per connection, assuming an RTT of 200 ms.
- A large proportion of flows on the Internet are short web transactions over TCP and complete before exiting TCP slow start.
- Applications have responded to TCP's "slow" start. Web sites use multiple subdomains [Bel10] to circumvent the HTTP 1.1 limit of two connections per physical host [RFC2616]. As of today, major web browsers open multiple connections to the same site (up to six connections per domain [Ste08] and the number is growing). This trend is to remedy HTTP serialized download to achieve parallelism and higher performance. But it also implies that today most access links are severely under-utilized, hence having multiple TCP connections improves performance most of the time.
- Persistent connections and pipelining are designed to address some of the above issues with HTTP [RFC2616]. Their presence does not diminish the need for a larger initial window, e.g., data from the Chrome browser shows that 35% of HTTP requests are made on new TCP connections. Our test data also shows significant latency reduction with the large initial window even in conjunction with these two HTTP features [Duk10].

5. Advantages of Larger Initial Windows

- Reducing Latency

  An increase of the initial window from 3 segments to 10 segments reduces the total transfer time for data sets greater than 4 KB by up to 4 round trips. The table below compares the number of round trips between IW=3 and IW=10 for different transfer sizes, assuming infinite bandwidth, no packet loss, and the standard delayed ACKs with large delayed-ACK timer.

  ---------------------------------------
  | total segments |   IW=3   |  IW=10  |
  ---------------------------------------
  |         3      |     1    |    1    |
  |         6      |     2    |    1    |
  |        10      |     3    |    1    |
  |        12      |     3    |    2    |
  |        21      |     4    |    2    |
  |        25      |     5    |    2    |
  |        33      |     5    |    3    |
  |        46      |     6    |    3    |
  |        51      |     6    |    4    |
  |        78      |     7    |    4    |
  |        79      |     8    |    4    |
  |       120      |     8    |    5    |
  |       127      |     9    |    5    |
  ---------------------------------------

  For example, with the larger initial window, a transfer of 32 segments of data will require only 2 rather than 5 round trips to complete.

- Recovering Faster from Loss on Under-Utilized or Wireless Links

  A greater-than-3-segment initial window increases the chance to recover packet loss through Fast Retransmit rather than the lengthy initial RTO [RFC5681]. This is because the fast retransmit algorithm requires three duplicate ACKs as an indication that a segment has been lost rather than reordered. While newer loss recovery techniques such as Limited Transmit [RFC3042] and Early Retransmit [RFC5827] have been proposed to help speed up loss recovery from a smaller window, both algorithms can still benefit from the larger initial window because of a better chance to receive more ACKs.
8. Mitigation of Negative Impact

Much of the negative impact from an increase in the initial window is likely to be felt by users behind slow links with limited buffers. The negative impact can be mitigated by hosts directly connected to a low-speed link advertising an initial receive window smaller than 10 segments. This can be achieved either through manual configuration by the users or through the host stack auto-detecting the low-bandwidth links. Additional suggestions to improve the end-to-end performance of slow links can be found in RFC 3150 [RFC3150].

RTO & High Performance: https://tools.ietf.org/html/rfc7323

Obsoletes the venerable RFC 1323.

[Also in RFC 1323] An additional mechanism could be added to the TCP, a per-host cache of the last timestamp received from any connection. This value could then be used in the PAWS mechanism to reject old duplicate segments from earlier incarnations of the connection, if the timestamp clock can be guaranteed to have ticked at least once since the old connection was open. This would require that the TIME-WAIT delay plus the RTT together must be at least one tick of the sender's timestamp clock. Such an extension is not part of the proposal of this RFC.

Appendix G. RTO Calculation Modification

Taking multiple RTT samples per window would shorten the history calculated by the RTO mechanism in [RFC6298], and the below algorithm aims to maintain a similar history as originally intended by [RFC6298].

It is roughly known how many samples a congestion window worth of data will yield, not accounting for ACK compression and ACK losses. Such events will result in more history of the path being reflected in the final value for RTO, and are uncritical.
   This modification will ensure that a similar amount of time is taken into account for the RTO estimation, regardless of how many samples are taken per window:

      ExpectedSamples = ceiling(FlightSize / (SMSS * 2))

      alpha' = alpha / ExpectedSamples

      beta' = beta / ExpectedSamples

   Note that the factor 2 in ExpectedSamples is due to "Delayed ACKs".

   Instead of using alpha and beta in the algorithm of [RFC6298], use alpha' and beta' instead:

      RTTVAR <- (1 - beta') * RTTVAR + beta' * |SRTT - R'|

      SRTT <- (1 - alpha') * SRTT + alpha' * R'

      (for each sample R')

Appendix H. Changes from RFC 1323

   Several important updates and clarifications to the specification in RFC 1323 are made in this document. The [important] technical changes are summarized below:

   (d) The description of which TSecr values can be used to update the measured RTT has been clarified. Specifically, with timestamps, the Karn algorithm [Karn87] is disabled. The Karn algorithm disables all RTT measurements during retransmission, since it is ambiguous whether the <ACK> is for the original segment, or the retransmitted segment. With timestamps, that ambiguity is removed since the TSecr in the <ACK> will contain the TSval from whichever data segment made it to the destination.

   (e) RTTM update processing explicitly excludes segments not updating SND.UNA. The original text could be interpreted to allow taking RTT samples when SACK acknowledges some new, non-continuous data.

   (f) In RFC 1323, Section 3.4, step (2) of the algorithm to control which timestamp is echoed was incorrect in two regards:

       (1) It failed to update TS.Recent for a retransmitted segment that resulted from a lost <ACK>.

       (2) It failed if SEG.LEN = 0.

       In the new algorithm, the case of SEG.TSval >= TS.Recent is included for consistency with the PAWS test.

   (g) It is now recommended that the Timestamps option is included in <RST> segments if the incoming segment contained a Timestamps option.

   (h) <RST> segments are explicitly excluded from PAWS processing.
   (j) Snd.TSoffset and Snd.TSclock variables have been added. Snd.TSclock is the sum of my.TSclock and Snd.TSoffset. This allows the starting points for timestamp values to be randomized on a per-connection basis. Setting Snd.TSoffset to zero yields the same results as [RFC1323]. Text was added to guide implementers to the proper selection of these offsets, as entirely random offsets for each new connection will conflict with PAWS.

Congestion Window Validation (CWV):

http://www.ietf.org/proceedings/69/slides/tcpm-7.pdf
https://tools.ietf.org/html/rfc7661

Provides a mechanism to address issues that arise when TCP is used for traffic that exhibits periods where the sending rate is limited by the application rather than the congestion window. This RFC provides an experimental update to TCP that allows a TCP sender to restart quickly following a rate-limited interval. This method is expected to benefit applications that send rate-limited traffic using TCP while also providing an appropriate response if congestion is experienced.

Motivation:

Standard TCP states that a TCP sender SHOULD set cwnd to no more than the Restart Window (RW) before beginning transmission if the TCP sender has not sent data in an interval exceeding the retransmission timeout, i.e., when an application becomes idle [RFC5681]. [RFC2861] notes that this TCP behaviour was not always observed in current implementations. Experiments confirm this to still be the case (see [Bis08]).

Congestion Window Validation (CWV) [RFC2861] introduced the term "application-limited period" for the time when the sender sends less than is allowed by the congestion or receiver windows. Standard TCP does not impose additional restrictions on the growth of the congestion window when a TCP sender is unable to send at the maximum rate allowed by the cwnd.
In this case, the rate-limited sender may grow a cwnd far beyond that corresponding to the current transmit rate, resulting in a value that does not reflect current information about the state of the network path the flow is using. Use of such an invalid cwnd may result in reduced application performance and/or could significantly contribute to network congestion.

Active Queue Management (AQM):

Active Queue Management is an effort to avoid the latency increases (and increase in time in the feedback loop) and bursty losses caused by naive tail drop in intermediate buffering. The concept was introduced along with a discussion of the queue management algorithm "RED" (Random Early Detect/Drop) by RFC 2309. The most current RFC is 7567.

The usual mix of long high-throughput and short low-latency flows places conflicting demands on the queue occupancy of a switch:

   o The queue must be short enough that it does not impose excessive latency on short flows.
   o The queue must be long enough to buffer sufficient data for the long flows to saturate the path capacity.
   o The queue must be short enough to absorb incast bursts without excessive packet loss.

RED:

The RED algorithm itself consists of two main parts: estimation of the average queue size and the decision of whether or not to drop an incoming packet.

   (a) Estimation of Average Queue Size

       RED estimates the average queue size, either in the forwarding path using a simple exponentially weighted moving average (such as presented in Appendix A of [Jacobson88]), or in the background (i.e., not in the forwarding path) using a similar mechanism.

   (b) Packet Drop Decision

       In the second portion of the algorithm, RED decides whether or not to drop an incoming packet. It is RED's particular algorithm for dropping that results in performance improvement for responsive flows. Two RED parameters, minth (minimum threshold) and maxth (maximum threshold), figure prominently in this decision process.
       Minth specifies the average queue size *below which* no packets will be dropped, while maxth specifies the average queue size *above which* all packets will be dropped. As the average queue size varies from minth to maxth, packets will be dropped with a probability that varies linearly from 0 to maxp.

Recommendations on Queue Management and Congestion Avoidance in the Internet
https://tools.ietf.org/html/rfc2309

IETF Recommendations Regarding Active Queue Management
https://tools.ietf.org/html/rfc7567

https://en.wikipedia.org/wiki/Active_queue_management

Explicit Congestion Notification (ECN):

At its core, ECN in TCP allows compliant routers to provide compliant senders with notification of "virtual drops" as a congestion indicator to halve their congestion window. This allows the sender to not wait for the retransmit timeout or repeated ACKs to learn of a congestion event, and allows the receiver to avoid the latency induced by drop/retransmit. ECN relies on some form of AQM in the intermediate routers/switches to decide when to set the CE (Congestion Experienced) codepoint in the IP header; it is then the receiver's responsibility to set the ECE (ECN-Echo) flag in the TCP header of the subsequent ACK. The receiver will continue to send packets marked with the ECE bit until it receives a packet with the CWR (Congestion Window Reduced) bit set. Note that although this last design decision makes it robust in the presence of ACK loss (the original version of ECN specifies that ACKs / SYNs / SYN-ACKs not be marked as ECN capable and thus are not eligible for marking), it limits the use of ECN to once per RTT. As we'll see later, this leads to interoperability issues with DCTCP.

ECN is negotiated at connection time. In FreeBSD it is configured by a sysctl defaulting to off for all connections. Enabling the sysctl enables it for all connections. The last time a survey was done, 2.7% of the internet would not respond to a SYN negotiating ECN.
This isn't fatal, as subsequent SYNs will switch to not requesting ECN. This just adds the default RTO to connection establishment (3s in FreeBSD, 1s per RFC6298 - discussed later).

Linux has some very common-sense configurability improvements. Its ECN knob takes on _3_ values: 0) no request / no accept, 1) request / accept, 2) no request / accept. The default is (2), supporting it for those adventurous enough to request it. The route command can specify ECN by subnet, in effect allowing servers / clients to only use it within a data center or between compliant data centers.

ECN sees very little usage due to continued compatibility concerns. Although the difficulty of correctly tuning maxth and minth in RED and many other AQM mechanisms is not specific to ECN, RED et al. are necessary to use ECN and thus further add to the difficulties associated with its use.

Talks:

More Accurate ECN Feedback in TCP (AccECN) - https://www.ietf.org/proceedings/90/slides/slides-90-tcpm-10.pdf
ECN is slow and does not report the extent of congestion, just its existence. It lacks interoperability with DCTCP. A mechanism needs to be added for negotiating finer-grained, adaptive congestion notification.

RFCs:

A Proposal to add Explicit Congestion Notification (ECN) to IP - https://tools.ietf.org/html/rfc2481
Initial proposal.

The Addition of Explicit Congestion Notification (ECN) to IP - https://tools.ietf.org/html/rfc3168
Elaboration and further specification of how to tie it in to TCP.

Adding Explicit Congestion Notification (ECN) Capability to TCP's SYN/ACK Packets - https://tools.ietf.org/html/rfc5562
Sometimes referred to as ECN+. This extends ECN to SYN/ACK packets. Note that SYN packets are still not covered, being considered a potential security hole.
Accurate ECN (AccECN): Problem Statement and Requirements for Increased Accuracy in Explicit Congestion Notification (ECN) Feedback - https://tools.ietf.org/html/rfc7560
"A primary motivation for this document is to intervene before each proprietary implementation invents its own non-interoperable handshake, which could lead to _de facto_ consumption of the few flags or codepoints that remain available for standardizing capability negotiation."

Incast:

The term was coined in [PANFS] for the case of increasing the number of simultaneously initiated, effectively barrier-synchronized, fan-in flows into a single port to the point where the instantaneous switch / NIC buffering capacity was exceeded, thus causing a decline in aggregate bandwidth as the need for retransmits increases. This is further exacerbated by tail-drop behavior in the switch, whereby multiple losses within individual streams exceed the recovery abilities of duplicate ACKs or SACK, leading to RTOs before the flow is resumed.

The Panasas ActiveScale Storage Cluster - Delivering Scalable High Bandwidth Storage [PANFS] - http://acm.supercomputing.org/sc2004/schedule/pdfs/pap207.pdf

Focuses on the Object-based Storage Device (OSD) component backing the PanFS distributed file system. PanFS runs on the client; backend storage consists of networked object-based storage devices (OSDs). The intelligence lies in how stripes are laid out across OSDs. PanFS relies on a Metadata Server (MDS) to control the interaction of clients with the objects on OSDs and maintain cache coherency.

Scalable bandwidth is achieved through aggregation by striping data across many OSDs. In principle it would be desirable to stripe files as widely as possible.
In practice, in their 1Gbps testbed (this is 2004), bandwidth scaled linearly from 3 to 7 OSDs, but after 14 OSDs aggregate bandwidth actually decreases. With a 10ms disk access latency, if just one OSD experienced enough packet loss to result in one 200ms RTO, the system would suffer a 10x decrease in performance.

Changes to address the incast problem:

   - Reduce the minRTO from 200ms to 50ms.
   - Tune the _individual_ socket buffer size. While a client must have a large aggregate receive buffer size, each individual stream's receive buffer should be relatively small. Thus they reduced the clients' (per OSD) receive socket buffer to under 64K.
   - To reduce the size of a single synchronized incast response, PanFS implements a two-level striping pattern. The first level is optimized for RAID's parity update performance and read overhead. The second level of striping is designed to resist incast-induced bandwidth penalties by stacking successive parity stripes in the same subset of objects. They call N sequential parity stripes that are stacked in the same set of objects a 'visit', because a client repeatedly fetches data from just a few OSDs (whose number is controlled by the parity stripe width) for a while, then moves on to the next set of OSDs. This striping pattern minimizes simultaneous fan-in and thus the potential for incast. Typically PanFS stripes about 1GB of data per visit, using a round-robin layout algorithm of visits across all OSDs.

Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems - https://www.usenix.org/legacy/event/fast08/tech/full_papers/phanishayee/phanishayee_html/

Attempts a more general analysis of incast than [PANFS]. The analysis is based on the model of a cluster-based storage system with data blocks striped over a number of servers. They refer to the fragment of a block stored on each server as a Server Request Unit (SRU).
A subsequent block request will only be made after the client has received all the data for the current block. They refer to such reads as 'synchronized reads'. The paper makes three contributions to the literature:

   - Explores the root causes of incast, characterizing it under a variety of conditions (buffer space, varying number of servers, etc.). Buffer space can delay the onset of incast, but any particular switch configuration will have some maximum number of servers that can send simultaneously before throughput collapse occurs.
      - Reproduces incast collapse on 3 different models of switches. In some cases disabling QoS can help delay incast by freeing up packet buffers for general switching.
      - Demonstrates the applicability of simulation by showing that the throughput collapse curve produced by ns-2 with a simulated 32KB buffer closely matches that shown by the HP Procurve 2848 with QoS disabled.
      - Analysis of TCP traces obtained from simulation reveals that TCP retransmission timeouts are the primary cause of incast.
      - Displays the effect of varying the switch buffer size. Doubling the size of the switch's output port buffer doubles the number of servers that can be supported before the system experiences incast.
      - TCP performs well in settings without synchronized reads, which can be modelled by an infinite SRU size. Running netperf across many servers does not induce incast. With larger SRU sizes, servers can use the spare link capacity made available by any stalled flow waiting for a timeout event.
   - Examines the effectiveness of existing TCP variants (e.g. Reno, NewReno, SACK, and limited transmit). Although the move from Reno to NewReno improves performance, none of the additional improvements help. When TCP loses all packets in its window or loses retransmissions, no clever loss recovery algorithms can help.
   - Examines a set of techniques that are moderately effective in masking incast, such as drastically reducing TCP's retransmission timeout timer. None of these techniques are without drawbacks.
      - Reducing RTOmin from 200ms to 200us improves throughput by an order of magnitude for 8-32 servers. However, at the time of the paper Linux and BSD TCP implementations were unable to provide a timer of sufficient granularity to calculate RTT at less than the system clock frequency.

Understanding TCP Incast Throughput Collapse in Datacenter Networks - http://conferences.sigcomm.org/sigcomm/2009/workshops/wren/papers/p73.pdf

Proposes an analytical model of limited generality based on the results observed in two test beds.

   - Observed little benefit from disabling delayed ACKs.
   - Observed a much shallower decline in throughput after 4 servers with 1ms minRTO vs 200ms minRTO. No benefit was shown for 200us over 1ms. [The next paper concludes that this was because the calculated RTO never went below 5ms, so a 200us minRTO was equivalent to disabling minRTO in this setting.]
   - For large RTO timer values, reducing the RTO timer value is a first-order mitigation. For smaller RTO timer values, intelligently controlling the inter-packet wait time [pacing] becomes crucial.
   - Observes two regions of throughput increase. Following the initial throughput decline there is an increasing region. They reason that:
        As the number of senders increases, 'T' increases, and there is less overlap in the RTO periods for different senders. This means the impact of RTO events is less severe - a mitigating effect.

           Prob(enter RTO at t) = 1/T   if d < t < d + T
                                = 0     otherwise

        where d is the delay for congestion info to propagate back to the sender and T is the width of the uniform distribution in time.
   - The smaller the RTO timer values, the faster the rate of recovery between the throughput minimum and the second-order throughput maximum.
     For smaller RTO timer values, the same increase in 'T' will have a larger mitigating effect. Hence, as the number of senders increases, the same increase in 'T' will result in a faster increase in the goodput for smaller RTO timer values.
   - After the second-order goodput maximum, the slope of throughput decrease is the same for different RTO timer values. When 'T' becomes comparable to or larger than the RTO timer value, the amount of interference between retransmits after RTO and transmissions before RTO no longer depends on the value of the RTO timer.

Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication - https://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/ekrevat/docs/SIGCOMMIncast.pdf

Effectively makes the case for using high resolution timers to enable microsecond-granularity TCP timeouts. The authors claim to demonstrate that this technique is effective in avoiding TCP incast collapse in both simulation and real-world experiments.

   - The prototype uses Linux's high resolution kernel timers.
   - Demonstrates that this change prevents incast collapse in practice for up to 47 senders.
   - Demonstrates that simply reducing RTOmin in today's [2009] TCP implementations without also improving the timing granularity does not prevent TCP incast.
   - Even without incast patterns, the RTO can determine observed performance. Simple example: they started ten bulk-data transfer TCP flows from ten clients to one server. They then had another client issue small request packets for 1KB of data from the server, waiting for the response before sending the next request. Approximately 1% of these requests experienced a TCP timeout, delaying the response by at least 200ms. Finer-grained retransmission handling can improve the performance of latency-sensitive applications.

Evaluating Throughput with Fine-Grained RTO:

   - To be maximally effective, timers must operate on a granularity close to the RTT of the network.
   - Jacobson RTO Estimation:
      - The standard RTO estimator [V. Jacobson, 88] tracks a smoothed estimate of the round-trip time, and sets the timeout to this RTT estimate plus 4 times the mean deviation (a simpler calculation than the standard deviation; given a normal distribution of prediction errors, sdev = sqrt(pi/2) * mdev).

           RTO = SRTT + (4 x RTTMDEV)

      - Two factors set lower bounds on the value that the RTO can achieve:
         - the explicit configuration parameter RTOmin
         - the implicit effects of the granularity with which the RTT is measured and with which the kernel sets and checks timers. Most implementations track RTTs and timers at a granularity of 1ms or larger. Thus the minimum achievable RTO is 5ms.
   - In Simulation (one client with multiple servers connected through a single switch with an unloaded RTT of 100us; each node has a 1Gbps link, the switch buffers have 32KB of space per output port, and a random timer scheduling delay of up to 20us accounts for real-world variance):
      - With an RTOmin of 200ms, throughput drops by an order of magnitude with 8 concurrent senders.
      - Reducing RTOmin to 1ms is effective for 8-16 concurrent senders, fully utilizing the client's link. However, throughput declines as the number of servers is increased. 128 concurrent senders use only 50% of the available link bandwidth even with a 1ms RTOmin.
   - In Real Clusters (sixteen-node cluster w/ HP Procurve 2848 & 48-node cluster w/ Force10 S50 switch - all nodes 1Gbps and a client-to-server RTT of ~100us):
      - Modified the Linux 2.6.28 kernel to use 'microsecond-accurate' timers with microsecond-granularity RTT estimation.
      - For all configurations, throughput drops with increasing RTOmin above 1ms. For 8 and 16 concurrent senders, the default RTOmin of 200ms results in a drop in throughput of nearly 2 orders of magnitude.
      - Results show identical performance for RTOmin values of 200us and 1ms. Although the baseline RTTs can be between 50-100us, increased congestion causes RTTs to rise to 400us on average with spikes as high as 850us. Thus the higher RTTs combined with increased RTT variance cause the RTO estimator to set timeouts of 1-3ms, and an RTOmin below 1ms will not lead to shorter retransmission times. In effect, specifying an RTOmin <= 1ms is equivalent to eliminating RTOmin.

Next-Generation Datacenters:

   - 10Gbps networks have smaller RTTs than 1Gbps - port-to-port latency can be as low as 10us. In a sampling of an active storage node at LANL, 20% of RTTs are below 100us even when accounting for kernel scheduling.
   - Smaller RTO values are required to avoid idle link time.
   - Scaling to Thousands [simulating large numbers of servers on a 10Gbps network] (reduce baseline RTTs from 100us to 20us, eliminate the 20us timer scheduling variance, increase link capacity to 10Gbps, set per-port buffer size to 32KB, increase blocksize to 80MB to ensure each flow can saturate a 10Gbps link, vary the number of servers from 32 to 2048):
      - Having an artificial bound of either 1ms or 200us results in low throughput in a network whose RTTs are 20us - underscoring the requirement that retransmission timeouts should be on the same timescale as network latency to avoid incast collapse.
      - Eliminating a lower bound on RTO performs well for up to 512 concurrent senders. For 1024 servers and beyond, even the aggressively low RTO configuration sees up to a 50% reduction in throughput, resulting from significant periods of link idle time caused by repeated, simultaneous, successive timeouts.
      - For incast communication, the standard exponential backoff increase of RTO can overshoot some portion of the time the link is actually idle.
        Because only one flow must overshoot to delay the entire transfer, the probability of overshooting increases with the number of flows.
      - Decreased throughput for a large number of flows can be attributed to many flows timing out simultaneously, backing off deterministically, and retransmitting at the same time. While some flows are successful on this retransmission, a majority of flows lose their retransmitted packet and back off by another factor of two, sometimes far beyond when the link becomes idle.
   - Desynchronizing Retransmissions
      - Adding some randomness to the RTO will desynchronize retransmissions.
      - Adding an adaptive randomized RTO to the scheduled timeout:

           timeout = (RTO + (rand(0.5) x RTO)) x 2^backoff

        performs well regardless of the number of concurrent senders. Nonetheless, real-world variances may be large enough to avoid the need for explicit randomization in practice.
      - They do not evaluate the impact on wide area flows.
   - Implementing fine-grained retransmissions
      - Three changes to the Linux TCP stack were required:
         - microsecond resolution time accounting to track RTTs with greater precision - store microseconds in the TCP timestamp option [timestamp resolution can go as high as 57ns without violating the requirements of PAWS]
         - redefinition of TCP constants - timer constants formerly defined in terms of jiffies [ticks] are converted to absolute values (e.g. 1ms instead of 1 jiffy)
         - replacement of low-resolution timers with hrtimers - replace standard timer objects in the socket structure with the hrtimer structure, ensuring that all calls to set, reset, or clear timers use the hrtimer functions.
   - Results:
      - Using the default 200ms RTOmin, throughput plummets beyond 8 concurrent senders on both testbeds.
      - On the 16-server testbed, with a 5ms jiffy-based RTOmin, throughput begins to drop at 8 servers to ~70% of link capacity and slowly decreases thereafter. On the 47-server testbed [Force10 switch] the 5ms RTOmin kernel obtained 70-80% throughput, with a substantial decline after 40 servers.
      - The TCP hrtimer implementation / microsecond RTO kernel is able to saturate the link for up to 16/47 servers [the total number in each testbed].
   - Implications of Fine-Grained TCP Retransmissions:
      - A receiver's delayed ACK timer should always fire before the sender's retransmission timer fires, to prevent the sender from timing out waiting for an ACK that is merely delayed. Current systems protect against this by setting the delayed ACK timer to a value (40ms) that is safely under the RTOmin (200ms).
      - A host with microsecond-granularity retransmissions would periodically experience an unnecessary timeout when communicating with unmodified hosts in environments where the RTO is below 40ms (e.g., in the data center and for short flows in the WAN), because the sender incorrectly assumes that a loss has occurred. In practice the consequences are mitigated by newer TCP features and the limited circumstances in which they occur (and bulk data transfer is essentially unimpacted by the issue).
         - The major potential effect of a spurious timeout is a loss of performance: a flow that experiences a timeout will reduce its slow-start threshold (ssthresh) by half, its window to one, and attempt to rediscover link capacity. It is important to understand that spurious timeouts do not endanger network stability through increased congestion [On estimating end-to-end network path properties. SIGCOMM 99]. Spurious timeouts occur not when the network path drops packets, but rather when the path observes a sudden, higher delay.
         - Several algorithms have been proposed to undo the effects of spurious timeouts and, in the case of F-RTO [Forward RTO-Recovery, RFC 4138], adopted in the Linux TCP implementation.
      - When seeding torrents over a WAN there was no observable difference in performance between the 200us and 200ms RTOmin [no penalty].
      - Interaction with Delayed ACK in the Datacenter: for servers using a reduced RTO in a datacenter environment, the server's retransmission timer may expire long before an unmodified client's 40ms delayed ACK timer expires. As a result, the server will time out and resend the unacked packet, cutting ssthresh in half and rediscovering link capacity using slow-start. Because the client acknowledges the retransmitted segment immediately, the server does not observe a coarse-grained 40ms delay, only an unnecessary timeout.
      - Although for full performance delayed ACKs should be disabled, unmodified clients still achieve good performance and avoid incast when only the servers implement fine-grained retransmissions.

Data Center Transmission Control Protocol (DCTCP):

The CC protocol developed by Microsoft & Stanford uses simplified switch RED/ECN CE marking to provide fine-grained congestion notification to senders. RED is enabled in the switch but with minth = maxth = K, where K is an empirically determined constant that is a function of bandwidth and desired switch utilization vs. rate of convergence. Common values for K are 5 for 1Gbps and 60 for 10Gbps. The value for 40Gbps is presumably on the order of 240. The sender's congestion window is scaled back once per RTT as a function of (#ECE/(#segments in window))/2. In the degenerate case of all segments being marked, the window is scaled back a la a loss in Reno.
In the steady state, latencies are much lower than in Reno due to considerably reduced switch occupancy.

There is currently no mechanism for negotiating CC protocols, and DCTCP's reliance on continuous ECE notifications is incompatible with ECN's continuous repeating of the same ECE until a CWR is received. In effect, ECN support has to be successfully negotiated when establishing the connection, but the receiver has to instead provide one ECE per new CE seen.

RFC:

Datacenter TCP (DCTCP): TCP Congestion Control for Datacenters
https://tools.ietf.org/pdf/draft-ietf-tcpm-dctcp-00.pdf

The window scaling constant is referred to as 'alpha'. Alpha=0 corresponds to no congestion; alpha=1 corresponds to a loss event in Reno or an ECE mark in standard ECN, resulting in a halving of the congestion window. 'g' is the feedback gain, and 'M' is the fraction of bytes marked to bytes acknowledged. Alpha and the congestion window 'cwnd' are calculated as follows:

   alpha = alpha * (1 - g) + g * M

   cwnd = cwnd * (1 - alpha/2)

To cope with delayed ACKs, DCTCP specifies the following state machine. CE refers to DCTCP.CE, a new Boolean TCP state variable, "DCTCP Congestion Encountered", which is initialized to false and stored in the Transmission Control Block (TCB).

                      Send immediate
                      ACK with ECE=0
             .---.    .-------------.    .---.
Send 1 ACK  /    v    v             |    |    \
 for every |  .--------.          .--------.   |  Send 1 ACK
 m packets |  |  CE=0  |          |  CE=1  |   |  for every
with ECE=0 |  '--------'          '--------'   |  m packets
            \    |    |             ^    ^    /  with ECE=1
             '---'    '-------------'    '---'
                      Send immediate
                      ACK with ECE=1

The clear implication of this is that if the ACK is delayed by more than m packets, as with differing assumptions between peers or dropped ACKs, the signal can underestimate the level of encountered congestion. None of the literature suggests that this has been a problem in practice.
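The two sender-side update rules above can be sketched as one small class. This is a minimal illustration, not the draft's reference behaviour, and it assumes the following:

- The reaction runs once per observation window (roughly once per RTT) from hypothetical byte counters.
- alpha is folded in first and the cwnd reduction is applied only when some bytes were marked.
- The initial alpha of 1.0 is our choice.
- The default gain is 1/16, the fixed value the draft attributes to the Microsoft implementation.
- Window growth and loss recovery are not modelled.

The class and method names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DctcpSender:
    """Hypothetical sketch of DCTCP's per-window alpha and cwnd updates.

    Only the two formulas quoted from the draft are modelled; slow start,
    congestion-avoidance growth, and loss recovery are deliberately omitted.
    """
    cwnd: float             # congestion window, in segments
    alpha: float = 1.0      # congestion estimate in [0, 1]; initial value is an assumption
    g: float = 1.0 / 16     # estimation (feedback) gain

    def on_window_end(self, bytes_acked: int, bytes_marked: int) -> None:
        """Fold one observation window into alpha and react to marks.

        bytes_acked:  bytes newly acknowledged during the window
        bytes_marked: subset of those bytes whose ACKs carried ECE
        """
        if bytes_acked <= 0:
            return  # nothing acknowledged, nothing to learn from
        if not 0 <= bytes_marked <= bytes_acked:
            raise ValueError("bytes_marked must be within [0, bytes_acked]")
        m = bytes_marked / bytes_acked                        # M: fraction marked
        self.alpha = self.alpha * (1 - self.g) + self.g * m   # EWMA of M
        if bytes_marked > 0:
            # Congestion was signalled in this window: scale back by alpha/2.
            self.cwnd = self.cwnd * (1 - self.alpha / 2)


if __name__ == "__main__":
    s = DctcpSender(cwnd=100.0, alpha=0.0)
    s.on_window_end(bytes_acked=14600, bytes_marked=14600)
    print(s.alpha, s.cwnd)
```

Starting from alpha=0, one fully marked window only raises alpha to g, so cwnd shrinks by about 3% rather than by half. If alpha is already 1 and every byte is marked, the same call halves cwnd, matching the Reno-like degenerate case described earlier.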
[Section 3.4 of RFC] Handling of SYN, SYN-ACK, RST Packets

[RFC3168] requires that a compliant TCP MUST NOT set ECT on SYN or SYN-ACK packets. [RFC5562] proposes setting ECT on SYN-ACK packets, but maintains the restriction of no ECT on SYN packets. Both these RFCs prohibit ECT in SYN packets due to security concerns regarding malicious SYN packets with ECT set. These RFCs, however, are intended for general Internet use, and do not directly apply to a controlled datacenter environment. The switching fabric can drop TCP packets that do not have the ECT set in the IP header. If SYN and SYN-ACK packets for DCTCP connections do not have ECT set, they will be dropped with high probability. For DCTCP connections, the sender SHOULD set ECT for SYN, SYN-ACK and RST packets.

[Section 4] Implementation Issues

   - The implementation must choose a suitable estimation gain (feedback gain).
      - [DCTCP10] provides a theoretical basis for its selection; in practice it is easier to select it empirically per network/workload.
      - The Microsoft implementation uses a fixed estimation gain of 1/16.
   - The implementation must decide when to use DCTCP. DCTCP may not be suitable or supported for all peers.
   - It is RECOMMENDED that the implementation deal with loss episodes in the same way as conventional TCP.
   - To prevent incast throughput collapse, the minimum RTO (MinRTO) should be lowered significantly. The default value of MinRTO in Windows is 300ms, Linux 200ms, and FreeBSD 233ms. A lower MinRTO requires a correspondingly lower delayed ACK timeout on the receiver. Thus, it is RECOMMENDED that an implementation allow configuration of lower timeouts for DCTCP connections.
   - It is also RECOMMENDED that an implementation allow configuration of restarting the congestion window (cwnd) of idle DCTCP connections as described in [RFC5681].
  - [RFC3168] forbids the ECN-marking of pure ACK packets, because of the inability of TCP to mitigate ACK-path congestion and protocol-wise preferential treatment by routers. However, dropping pure ACKs, rather than ECN-marking them, has disadvantages for typical datacenter traffic patterns. Dropping of ACKs causes subsequent retransmissions. It is RECOMMENDED that an implementation provide a configuration knob that forces ECT to be set on pure ACKs.

[Section 5] Deployment Issues

  - DCTCP and conventional TCP congestion control do not coexist well in the same network. In DCTCP, the marking threshold is set to a very low value to reduce queueing delay, and a relatively small amount of congestion will exceed the marking threshold. During such periods of congestion, conventional TCP will suffer packet loss and quickly and drastically reduce cwnd. DCTCP, on the other hand, will use the fraction of marked packets to reduce cwnd more gradually. Thus, the rate reduction in DCTCP will be much slower than that of conventional TCP, and DCTCP traffic will gain a larger share of the capacity compared to conventional TCP traffic traversing the same path. It is RECOMMENDED that DCTCP traffic be segregated from conventional TCP traffic. [MORGANSTANLEY] describes a deployment that uses the IP DSCP bits to segregate the network such that AQM is applied to DCTCP traffic, whereas TCP traffic is managed via drop-tail queueing.
  - Since DCTCP relies on congestion marking by the switches, DCTCP can only be deployed in datacenters where the entire network infrastructure supports ECN. The switches may also support configuration of the congestion threshold used for marking. The proposed parameterization can be configured with switches that implement RED. [DCTCP10] provides a theoretical basis for selecting the congestion threshold, but as with the estimation gain, it may be more practical to rely on experimentation or simply to use the default configuration of the device.
    DCTCP will degrade to loss-based congestion control when transiting a congested drop-tail link.
  - DCTCP requires changes on both the sender and the receiver, so both endpoints must support DCTCP. Furthermore, DCTCP provides no mechanism for negotiating its use, so both endpoints must be configured through some out-of-band mechanism to use DCTCP. A variant of DCTCP that can be deployed unilaterally and only requires standard ECN behavior has been described in [ODCTCP][BSDCAN], but requires additional experimental evaluation.

[Section 6] Known Issues

  - DCTCP relies on the sender's ability to reconstruct the stream of CE codepoints received by the remote endpoint. To accomplish this, DCTCP avoids using a single ACK packet to acknowledge segments received both with and without the CE codepoint set. However, if one or more ACK packets are dropped, it is possible that a subsequent ACK will cumulatively acknowledge a mix of CE and non-CE segments. This will, of course, result in a less accurate congestion estimate.
    o Even with an inaccurate congestion estimate, DCTCP may still perform better than [RFC3168].
    o If the estimation gain is small relative to the packet loss rate, the estimate may not be too inaccurate.
    o If packet loss mostly occurs under heavy congestion, most drops will occur during an unbroken string of CE packets, and the estimate will be unaffected.
  - The effect of packet drops on DCTCP under real world conditions has not been analyzed.
  - Much like standard TCP, DCTCP is biased against flows with longer RTTs. A method for improving the fairness of DCTCP has been proposed in [ADCTCP], but requires additional experimental evaluation.

Papers:

Data Center TCP [DCTCP10]
  - http://research.microsoft.com/en-us/um/people/padhye/publications/dctcp-sigcomm2010.pdf
  The original DCTCP SIGCOMM paper by Stanford and Microsoft Research. It is very accessible even for those of us not well versed in CC protocols.
  - reduce minRTO to 10ms.
  - suggest that K > (RTT * C)/7, where C is the sending rate in packets per second.

Attaining the Promise and Avoiding the Pitfalls of TCP in the Datacenter [MORGANSTANLEY]
  - https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-judd.pdf
  Real world experience deploying DCTCP on Linux at Morgan Stanley.
  - reduce minRTO to 5ms.
  - reduce delayed ACK to 1ms.
  - Only ToR switches support ECN marking; higher level switches are purely tail-drop. Tests show that DCTCP successfully resorts to loss-based congestion control when transiting a congested drop-tail link.
  - Find that setting ECT on SYN and SYN-ACK is critical for the practical deployment of DCTCP. Under load, DCTCP would fail to establish network connections in the absence of ECT in SYN and SYN-ACK packets. (DCTCP+)
  - Without correct receive buffer tuning DCTCP will converge _faster_ than TCP, rather than the theoretical 1.4 x TCP.

  Per-packet latency in ms

               TCP      DCTCP+
    Mean       4.01     0.0422
    Median     4.06     0.0395
    Maximum    4.20     0.0850
    Minimum    3.32     0.0280
    sigma      0.167    0.0106

Extensions to FreeBSD Datacenter TCP for Incremental Deployment Support [BSDCAN]
  - https://www.bsdcan.org/2015/schedule/attachments/315_dctcp-bsdcan2015-paper.pdf
  Proposes a variant of DCTCP that can be deployed on only one endpoint of a connection, provided the peer is ECN-capable.
  ODCTCP changes:
  - In order to facilitate one-sided deployment, a DCTCP sender should set the CWR mark after receiving an ECE-marked ACK once per RTT. It is safe in two-sided deployments, because a regular DCTCP receiver will simply ignore the CWR mark.
  - A one-sided DCTCP receiver should always delay an ACK for incoming packets marked with CWR, which is the only indication of recovery exit.
  DCTCP improvements:
  - ECE processing: Under standard ECN an ACK with an ECE mark will trigger congestion recovery. When this happens a sender stops increasing cwnd for one RTT.
    For DCTCP there is no reason for this response. ECEs are used not for detecting congestion events, but to quantify the extent of congestion and react proportionally. Thus, there is no need to stop cwnd from increasing.
  - Set initial value of alpha to 0 (i.e. don't halve cwnd on first ECE seen).
  - Idle Periods: The same tradeoffs regarding "slow-start restart" apply to alpha. The FreeBSD implementation re-initializes alpha after an idle period longer than the RTO.
  - Timeouts and Packet Loss: The DCTCP specification defines the update interval for alpha as one RTT. To track this, DCTCP compares received ACKs against the sequence numbers of outgoing packets. This is not robust in the face of packet loss. The FreeBSD implementation addresses this by updating alpha when it detects duplicate ACKs or timeouts.

Data Center TCP (DCTCP)
  - http://www.ietf.org/proceedings/80/slides/iccrg-3.pdf
  Case studies, workloads, latency and flow completion time of TCP vs DCTCP. Interesting set of slides worth skimming.
  - Small (10-100KB & 100KB - 1MB) background flows complete in ~45% less time than TCP.
  - 99th %ile & 99.9th %ile query flows are 2/3rds and 4/7ths respectively
  - large (1-10MB & > 10MB) flows unchanged
  - query completion time with 10 to 1 background incast unchanged with DCTCP, ~5x slower with TCP

Analysis of DCTCP: Stability, Convergence, and Fairness [ADCTCP]
  - http://sedcl.stanford.edu/files/dctcp-analysis.pdf
  Follow-up mathematical analysis of DCTCP using a fluid model. Contains interesting graphs showing how the gain factor affects the convergence rate between two flows.
  - Analyzes the convergence of DCTCP sources to their fair share, obtaining an explicit characterization of the convergence rate.
  - Proposes a simple change to DCTCP suggested by the fluid model which significantly improves DCTCP's RTT-fairness. It suggests updating the congestion window continuously rather than once per RTT.
  - Finds that with a marking threshold, K, of about 17% of the bandwidth-delay product, DCTCP achieves 100% throughput, and that even for values of K as small as 1% of the bandwidth-delay product, its throughput is at least 94%.
  - Shows that DCTCP's convergence rate is no more than a factor of 1.4 slower than TCP's.

Using Data Center TCP (DCTCP) in the Internet [ADCTCP]
  - http://www.ikr.uni-stuttgart.de/Content/Publications/Archive/Wa_GLOBECOM_14_40260.pdf
  Investigates what would be needed to deploy DCTCP incrementally outside the data center.
  - Proposes finer resolution for the alpha value
  - Allow the congestion window to grow in the CWR state (similar to [BSDCAN])
  - Continuous update of alpha: Define a smaller gain factor (1/2^8 instead of 1/2^4) to permit an EWMA updated every packet. However, g should actually be a function of the number of packets in flight.
  - Progressive congestion window reduction: Similar to [ADCTCP], reduce the congestion window on the reception of each ECE.
  - Develops a formula for AQM RED parameters that always results in equal sharing between DCTCP and non-DCTCP.

Incast Transmission Control Protocol (ICTCP):
  In ICTCP the receiver plays a direct role in estimating the per-flow available bandwidth and actively re-sizes each connection's receive window accordingly.
  - http://research.microsoft.com/pubs/141115/ictcp.pdf

Quantized Congestion Notification (QCN):
  Congestion control in Ethernet. Introduced as part of the IEEE 802.1 Standards Body discussions for Data Center Bridging [DCB], motivated by the needs of FCoE. The initial congestion control protocol was standardized as 802.1Qau. Unlike the single bit of congestion information per packet in TCP, QCN uses 6 bits.

  The algorithm is composed of two main parts: Switch or Control Point (CP) Dynamics and Rate Limiter or Reaction Point (RP) Dynamics.

  - The CP Algorithm runs at the network nodes.
    Its objective is to maintain the node's buffer occupancy at the operating point 'Beq'. It computes a congestion measure Fb and randomly samples an incoming packet with a probability proportional to the severity of the congestion. The node sends a 6-bit quantized value of Fb back to the source of the sampled packet.

    - B: Value of the current queue length
    - Bold: Value of the buffer occupancy when the last feedback message was generated
    - w: a non-negative constant, equal to 2 for the baseline implementation
    - Boff = B - Beq
    - Bd = B - Bold
    - Fb = Boff + w*Bd
      This is essentially equivalent to the PI AQM. The first term is the offset from the target operating point and the second term is proportional to the rate at which the queue size is changing.

    When Fb < 0, there is no congestion, and no feedback messages are sent. When Fb >= 0, either the buffers or the link is oversubscribed, and control action needs to be taken.

  - The RP algorithm runs on end systems (NICs) and controls the rate at which Ethernet packets are transmitted. Unlike TCP, the RP algorithm does not get positive ACKs from the network and thus needs alternative mechanisms for increasing its sending rate.

    - Current Rate (Rc): The transmission rate of the source
    - Target Rate (Rt): The transmission rate of the source just before the arrival of the last feedback message
    - Gain (Gd): a constant chosen so that Gd*|Fbmax| = 1/2, that is to say the rate can decrease by at most 50%. Only 6 bits are available for feedback, so Fbmax = 64, and thus Gd = 1/128.
    - Byte Counter: A counter at the RP for counting transmitted bytes; used to time rate increases
    - Timer: A clock at the RP used for timing rate increases

    Rate Decreases:
      A rate decrease is only done when a feedback message is received:
      - Rt <- Rc
      - Rc <- Rc*(1 - Gd*|Fb|)

    Rate Increases:
      Rate Increase is done in two phases: Fast Recovery and Active Increase.
      Fast Recovery (FR): The source enters the FR state immediately after a rate decrease event, at which point the Byte Counter is reset. FR consists of 5 cycles, in each of which 150KB of data (assuming full-sized regular frames) are transmitted (100 packets of 1500 bytes each), as counted by the Byte Counter. At the end of each cycle, Rt remains unchanged, and Rc is updated as follows:

          Rc <- (Rc + Rt)/2

      The rationale is that, when congested, Rate Decrease messages are sent by the CP once every 100 packets. Thus the absence of a Rate Decrease message during this interval indicates that the CP is no longer congested.

      Active Increase (AI): After 5 cycles of FR, the source enters the AI state, in which it probes for extra bandwidth. AI consists of multiple cycles of 50 packets each. Rt and Rc are updated as follows:

          - Rt <- Rt + Rai
          - Rc <- (Rc + Rt)/2
          - Rai: a constant set to 5Mbps by default.

      When Rc is extremely small after a rate decrease, the time required to send out 150 KB can be excessive. To speed up the rate increase, the source also uses a timer, as follows:

          1) reset the timer when a rate decrease message arrives
          2) the source enters FR and counts out 5 cycles of T ms duration (T = 10ms in the baseline implementation), and in the AI state each cycle is T/2 ms long
          3) in the AI state, Rc is updated when _either_ the Byte Counter or the Timer completes a cycle.
          4) The source is in the AI state iff either the Byte Counter or the Timer is in the AI state.
          5) if _both_ the Byte Counter and the Timer are in AI, the source is said to be in Hyper-Active Increase (HAI). In this case, at the completion of the ith Byte Counter and Timer cycle, Rt and Rc are updated:
             - Rt <- Rt + i*Rhai
             - Rc <- (Rc + Rt) / 2
             - Rhai: 50Mbps in the baseline

[Taken from "Internet Congestion Control" by Subir Varma, ch.
8]

Performance of Quantized Congestion Notification in TCP Incast Scenarios of Data Centers
  - http://eprints.networks.imdea.org/131/1/Performance_of_Quantized_Congestion_Notification_-_2010_EN.pdf
  Using the QCN pseudocode version released by Rong Pan [IEEE EDCS-608482], the authors simulated the performance of QCN at 1Gbps under a number of incast scenarios, reaching the conclusion that the default QCN behaviors will not scale to a large number of flows with full link utilization. It goes on to propose a small number of changes to the QCN algorithm that _will_ support a large number of flows at full link utilization. However, there is no indication in the literature that these ideas have been taken any further in practice. A survey paper written in 2014 [A Survey on TCP Incast in Data Center Networks] indicates that these problems still exist. It is unclear what the current state of the art is in shipping hardware.

http://www.ieee802.org/3/ar/public/0505/bergamasco_1_0505.pdf
http://www.ieee802.org/1/files/public/docs2007/au-bergamasco-ecm-v0.1.pdf
http://www.cs.wustl.edu/~jain/papers/ftp/bcn.pdf
http://www.cse.wustl.edu/~jain/papers/ftp/icc08.pdf

Recommendations:

RFC 6298:
  - change starting RTO from 3s to 1s (in /dctcp)    D4294
  - DO NOT round RTO up to 1s, counter to the suggestions here (long done)
  - simplify setting of minRTO sysctl to eliminate "slop" component    D4294 (in /dctcp)
RFC 6928:
  - increase initial / idle window to 10 segments when connecting to data center peers (done by hiren)
RFC 7323:
  - stop truncating SRTT prematurely on low-latency connections (see Appendix G for the calculation) to reduce potentially detrimental fluctuations in calculated RTO    D4293
Incast:
  - do SW TSO only
  - add rudimentary pacing by interleaving streams
  - fine grained timers    D4292
  - scale RTO down to same granularity as RTT (patch in progress)
ECN:
  - change default to allow ECN on incoming connections
  - set ECT on _ALL_
    packets sent by a host using a DCTCP connection
  - add facility to enable ECN by subnet
DCTCP:
  - add facility to enable DCTCP by subnet
  - set ECT on _ALL_ packets sent by a host using a DCTCP connection
  - update TCP to use microsecond-granularity timers and timestamps (patch in progress)
  - when using DCTCP with the current coarse-grained timers, reduce minRTO to 3ms    D4294
  - when using DCTCP and fine-grained timers are available, disable minRTO
  - reduce delack to 1/5th of min(minRTO, RTO) (reduced to 1/2 in /dctcp)    D4294
ICTCP:
  - if there is time, investigate its use and the ability to use socket buffer sizing to communicate the amount of anticipated data, for the purpose of TCBs sharing the port's connection optimally

From owner-freebsd-net@freebsd.org Fri Nov 27 07:52:59 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7C692A3A371 for ; Fri, 27 Nov 2015 07:52:59 +0000 (UTC) (envelope-from hiren@strugglingcoder.info) Received: from mail.strugglingcoder.info (strugglingcoder.info [65.19.130.35]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.strugglingcoder.info", Issuer "mail.strugglingcoder.info" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 6C7E21862 for ; Fri, 27 Nov 2015 07:52:59 +0000 (UTC) (envelope-from hiren@strugglingcoder.info) Received: from localhost (unknown [10.1.1.3]) (Authenticated sender: hiren@strugglingcoder.info) by mail.strugglingcoder.info (Postfix) with ESMTPA id 8EA6BC4BE6; Thu, 26 Nov 2015 23:52:58 -0800 (PST) Date: Thu, 26 Nov 2015 23:52:58 -0800 From: hiren panchasara To: Matthew Macy Cc: "freebsd-net@freebsd.org" Subject: Re: TCP notes and incast recommendations Message-ID: <20151127075258.GD68002@strugglingcoder.info> References: <15146a8f285.b094791a15089.3823664487014698900@nextbsd.org> MIME-Version: 1.0 Content-Type:
multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="C1iGAkRnbeBonpVg" Content-Disposition: inline In-Reply-To: <15146a8f285.b094791a15089.3823664487014698900@nextbsd.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Nov 2015 07:52:59 -0000 --C1iGAkRnbeBonpVg Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On 11/26/15 at 05:57P, Matthew Macy wrote: > In an effort to be somewhat current on the state TCP I've collected a small bibliography. This is beyond awesome! Thank you for this work. Cheers, Hiren --C1iGAkRnbeBonpVg Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQF8BAABCgBmBQJWWAvXXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRBNEUyMEZBMUQ4Nzg4RjNGMTdFNjZGMDI4 QjkyNTBFMTU2M0VERkU1AAoJEIuSUOFWPt/lcHgH/RFscV6eCMNap2wqsFAl0Bcw 7mmQqA8L2WRi1qMoz8Lrxw/RnOGKfn5cXXO5i/ntbV7HEIqvkQXkzsixfHN4nRFV /lnrLEJC/DHwpgno7diU4zPNcxOoENpX/pMwakcXzhQpaWkf8f7NgcECPQRDgDhF 8kCTAzQfH8WNKGBiEXDCM7xdrtByyBQItB9JAw+2oJ1zMxkg+Y5F6tIFnOfDR4F1 WLL6mp/mtvUhp8S8UhWM3ytFUsjSH1X2iRbOBD7Bda5F+jzl5WhqrrtlLIsUqgQg SSKXjOrS+s0q7a5pc0Y/Dzsx7BM6lKLOFmxzu9+xOgaets3AVKHMBd0+Jczv+M4= =/1Uh -----END PGP SIGNATURE----- --C1iGAkRnbeBonpVg-- From owner-freebsd-net@freebsd.org Fri Nov 27 09:18:09 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A548EA36DFA for ; Fri, 27 Nov 2015 09:18:09 +0000 (UTC) (envelope-from ddb@neosystem.org) Received: from mail.neosystem.cz (mail.neosystem.cz [IPv6:2001:41d0:2:5ab8::10:15]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a 
certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6D7151983; Fri, 27 Nov 2015 09:18:09 +0000 (UTC) (envelope-from ddb@neosystem.org) Received: from mail.neosystem.cz (unknown [127.0.10.15]) by mail.neosystem.cz (Postfix) with ESMTP id 6AB32BD7D; Fri, 27 Nov 2015 10:18:06 +0100 (CET) X-Virus-Scanned: amavisd-new at mail.neosystem.cz Received: from iron.sn.neosystem.cz (unknown [IPv6:2001:41d0:2:5ab8::100:107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.neosystem.cz (Postfix) with ESMTPSA id 9CE92BD77; Fri, 27 Nov 2015 10:18:05 +0100 (CET) Date: Fri, 27 Nov 2015 10:13:49 +0100 From: Daniel Bilik To: Gary Palmer Cc: freebsd-net@freebsd.org Subject: Re: Outgoing packets being sent via wrong interface Message-Id: <20151127101349.752c94090e78ca68cf0f81fc@neosystem.org> In-Reply-To: <20151125122033.GB41119@in-addr.com> References: <20151120155511.5fb0f3b07228a0c829fa223f@neosystem.org> <20151120163431.3449a473db9de23576d3a4b4@neosystem.org> <20151121212043.GC2307@vega.codepro.be> <20151122130240.165a50286cbaa9288ffc063b@neosystem.cz> <20151125092145.e93151af70085c2b3393f149@neosystem.cz> <20151125122033.GB41119@in-addr.com> X-Mailer: Sylpheed 3.4.3 (GTK+ 2.24.28; amd64-portbld-freebsd10.2) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Nov 2015 09:18:09 -0000 On Wed, 25 Nov 2015 12:20:33 +0000 Gary Palmer wrote: > route -n get As suggested by Kevin and Ryan, I set the router to drop redirects... net.inet.icmp.drop_redirect: 1 ... but it happened again today, and again affected host was 192.168.2.33. Routing and arp entries were correct. Output of "route -n get"... 
route to: 192.168.2.33 destination: 192.168.2.0 mask: 255.255.255.0 fib: 0 interface: re1 flags: recvpipe sendpipe ssthresh rtt,msec mtu weight expire 0 0 0 0 1500 1 0 ... has not changed during the problem. Interesting was ping result... PING 192.168.2.33 (192.168.2.33): 56 data bytes ping: sendto: Operation not permitted ping: sendto: Operation not permitted ... 64 bytes from 192.168.2.33: icmp_seq=11 ttl=128 time=0.593 ms ping: sendto: Operation not permitted ... 64 bytes from 192.168.2.33: icmp_seq=20 ttl=128 time=0.275 ms 64 bytes from 192.168.2.33: icmp_seq=21 ttl=128 time=0.251 ms ping: sendto: Operation not permitted ... 64 bytes from 192.168.2.33: icmp_seq=40 ttl=128 time=0.245 ms ping: sendto: Operation not permitted 64 bytes from 192.168.2.33: icmp_seq=42 ttl=128 time=7.111 ms ping: sendto: Operation not permitted ... --- 192.168.2.33 ping statistics --- 46 packets transmitted, 5 packets received, 89.1% packet loss It seems _some_ packets go the right interface (re1), but most try to go wrong (re0) and are dropped by pf... 00:00:01.066886 rule 53..16777216/0(match): block out on re0: 82.x.y.50 > 192.168.2.33: ICMP echo request, id 58628, seq 39, length 64 00:00:02.017874 rule 53..16777216/0(match): block out on re0: 82.x.y.50 > 192.168.2.33: ICMP echo request, id 58628, seq 41, length 64 00:00:02.069634 rule 53..16777216/0(match): block out on re0: 82.x.y.50 > 192.168.2.33: ICMP echo request, id 58628, seq 43, length 64 And again, refreshing default route (delete default / add default) resolved it... 
PING 192.168.2.33 (192.168.2.33): 56 data bytes 64 bytes from 192.168.2.33: icmp_seq=0 ttl=128 time=0.496 ms 64 bytes from 192.168.2.33: icmp_seq=1 ttl=128 time=0.226 ms 64 bytes from 192.168.2.33: icmp_seq=2 ttl=128 time=0.242 ms 64 bytes from 192.168.2.33: icmp_seq=3 ttl=128 time=0.226 ms -- Dan From owner-freebsd-net@freebsd.org Fri Nov 27 20:28:25 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 200D3A3A1FB for ; Fri, 27 Nov 2015 20:28:25 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0C729156D for ; Fri, 27 Nov 2015 20:28:25 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id tARKSOoW081302 for ; Fri, 27 Nov 2015 20:28:24 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 204853] Panic after close openconnect VPN Cisco Date: Fri, 27 Nov 2015 20:28:24 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: linimon@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: 
auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Nov 2015 20:28:25 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204853 Mark Linimon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-bugs@FreeBSD.org |freebsd-net@FreeBSD.org -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Sat Nov 28 10:06:59 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 57051A3AFEE for ; Sat, 28 Nov 2015 10:06:59 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 26D591DFA; Sat, 28 Nov 2015 10:06:58 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from Julian-MBP3.local (ppp121-45-225-88.lns20.per1.internode.on.net [121.45.225.88]) (authenticated bits=0) by vps1.elischer.org (8.15.2/8.15.2) with ESMTPSA id tASA6pCe084546 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO); Sat, 28 Nov 2015 02:06:55 -0800 (PST) (envelope-from julian@freebsd.org) Subject: Re: Outgoing packets being sent via wrong interface To: Daniel Bilik , Gary Palmer References: <20151120155511.5fb0f3b07228a0c829fa223f@neosystem.org> <20151120163431.3449a473db9de23576d3a4b4@neosystem.org> <20151121212043.GC2307@vega.codepro.be> <20151122130240.165a50286cbaa9288ffc063b@neosystem.cz> <20151125092145.e93151af70085c2b3393f149@neosystem.cz> <20151125122033.GB41119@in-addr.com> 
<20151127101349.752c94090e78ca68cf0f81fc@neosystem.org> Cc: freebsd-net@freebsd.org From: Julian Elischer Message-ID: <56597CB5.7030307@freebsd.org> Date: Sat, 28 Nov 2015 18:06:45 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 MIME-Version: 1.0 In-Reply-To: <20151127101349.752c94090e78ca68cf0f81fc@neosystem.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 28 Nov 2015 10:06:59 -0000 On 27/11/2015 5:13 PM, Daniel Bilik wrote: > On Wed, 25 Nov 2015 12:20:33 +0000 > Gary Palmer wrote: > >> route -n get > As suggested by Kevin and Ryan, I set the router to drop redirects... > > net.inet.icmp.drop_redirect: 1 > > ... but it happened again today, and again affected host was 192.168.2.33. > Routing and arp entries were correct. Output of "route -n get"... > > route to: 192.168.2.33 > destination: 192.168.2.0 > mask: 255.255.255.0 > fib: 0 > interface: re1 > flags: > recvpipe sendpipe ssthresh rtt,msec mtu weight expire > 0 0 0 0 1500 1 0 > > ... has not changed during the problem. > > Interesting was ping result... > > PING 192.168.2.33 (192.168.2.33): 56 data bytes > ping: sendto: Operation not permitted > ping: sendto: Operation not permitted > ... > 64 bytes from 192.168.2.33: icmp_seq=11 ttl=128 time=0.593 ms > ping: sendto: Operation not permitted > ... > 64 bytes from 192.168.2.33: icmp_seq=20 ttl=128 time=0.275 ms > 64 bytes from 192.168.2.33: icmp_seq=21 ttl=128 time=0.251 ms > ping: sendto: Operation not permitted > ... > 64 bytes from 192.168.2.33: icmp_seq=40 ttl=128 time=0.245 ms > ping: sendto: Operation not permitted > 64 bytes from 192.168.2.33: icmp_seq=42 ttl=128 time=7.111 ms > ping: sendto: Operation not permitted > ... 
> --- 192.168.2.33 ping statistics --- > 46 packets transmitted, 5 packets received, 89.1% packet loss > > It seems _some_ packets go the right interface (re1), but most > try to go wrong (re0) and are dropped by pf... > > 00:00:01.066886 rule 53..16777216/0(match): block out on re0: 82.x.y.50 > 192.168.2.33: ICMP echo request, id 58628, seq 39, length 64 > 00:00:02.017874 rule 53..16777216/0(match): block out on re0: 82.x.y.50 > 192.168.2.33: ICMP echo request, id 58628, seq 41, length 64 > 00:00:02.069634 rule 53..16777216/0(match): block out on re0: 82.x.y.50 > 192.168.2.33: ICMP echo request, id 58628, seq 43, length 64 > > And again, refreshing default route (delete default / add default) > resolved it... > > PING 192.168.2.33 (192.168.2.33): 56 data bytes > 64 bytes from 192.168.2.33: icmp_seq=0 ttl=128 time=0.496 ms > 64 bytes from 192.168.2.33: icmp_seq=1 ttl=128 time=0.226 ms > 64 bytes from 192.168.2.33: icmp_seq=2 ttl=128 time=0.242 ms > 64 bytes from 192.168.2.33: icmp_seq=3 ttl=128 time=0.226 ms next time it happens try flushing the arp table. 
> > -- > Dan > _______________________________________________ > freebsd-net@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@freebsd.org Sat Nov 28 11:16:33 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B1252A3ACA4; Sat, 28 Nov 2015 11:16:33 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7D0E01393; Sat, 28 Nov 2015 11:16:30 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from Julian-MBP3.local (ppp121-45-225-88.lns20.per1.internode.on.net [121.45.225.88]) (authenticated bits=0) by vps1.elischer.org (8.15.2/8.15.2) with ESMTPSA id tASBGK0f085176 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO); Sat, 28 Nov 2015 03:16:23 -0800 (PST) (envelope-from julian@freebsd.org) Subject: Re: Kernel NAT issues To: Nathan Aherne References: <94B91F98-DE01-4A10-8AB5-4193FE11AF3F@reddog.com.au> <20151013142301.B67283@sola.nimnet.asn.au> <20151014232026.S15983@sola.nimnet.asn.au> <9908EC22-344F-4D0B-8930-7D2C70B084A1@reddog.com.au> <32DEEFB3-E41F-40CD-8E1A-520FB261C572@reddog.com.au> <564C8879.8070307@freebsd.org> <20151119032200.T27669@sola.nimnet.asn.au> <9D81BDD4-200C-40AB-AB24-B1112881E43A@reddog.com.au> <3BF360A8-35E6-4043-8AFF-87D983F29C66@reddog.com.au> <5652B9EB.10805@freebsd.org> Cc: freebsd-ipfw@freebsd.org, Ian Smith , "freebsd-net@freebsd.org" From: Julian Elischer Message-ID: <56598CFF.3060102@freebsd.org> Date: Sat, 28 Nov 2015 19:16:15 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 
MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 28 Nov 2015 11:16:33 -0000 On 27/11/2015 12:55 PM, Nathan Aherne wrote: > Hi Julian, > > Thank you for replying. I was completely off grid for a while and only got back on it today. > > I thought that Vimage was probably the way to achieve what I want. The main reason I was staying away from Vimage was the reported bugs with it, another reason was the extra overhead. I would like to be able to shut down jails quite regularly so was worried the kernel panic bug or memory leak bug might be a problem here. Is there any version of Vimage/FreeBSD which is stable?

Generally vimage is stable. It has had problems with pf over the years because pf is imported from OpenBSD and has some pretty vimage-unfriendly assumptions in its design, but I hear that even some of those have been ironed out. I know of vimage being used to run production virtual systems in some of the largest banks in the world, processing amounts of transactions that would make your head spin, so have a small play with it. Vimage overhead is negative in some situations, i.e. things work faster. This is especially true when non-vimage workloads contend heavily for a single lock, but vimage splits it over many locks, one for each VM. Run up a VirtualBox or Amazon or whatever FreeBSD instance and play around with it. Once you realize how insanely powerful it is, you will wonder how you ever did jails without it. You can use bridges, epairs or netgraph to do your networking... your choice.
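The bridge-and-epair option Julian mentions can be sketched as jail.conf(5) stanzas. Each jail gets its own epair(4). The "a" end joins a host bridge, and the "b" end moves into the jail's private network stack. The Python below only renders the configuration text so the shape is easy to see. It is a sketch, not a tested configuration. It assumes a kernel built with "options VIMAGE" (not part of GENERIC on 10.x), a bridge0 already created on the host (for example through cloned_interfaces in rc.conf), and hypothetical jail names, paths and addresses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VnetJail:
    name: str  # hypothetical jail name
    path: str  # hypothetical jail root directory
    addr: str  # address/prefix for the jail's end of the epair


def render_jail_conf(jails: List[VnetJail], bridge: str = "bridge0") -> str:
    """Render one jail.conf stanza per jail.

    Each jail gets its own epair: the "a" end is added to the host's
    bridge, the "b" end is handed to the jail's own network stack.
    """
    stanzas = []
    for unit, jail in enumerate(jails):
        pair = f"epair{unit}"
        stanzas.append(
            f"{jail.name} {{\n"
            f'    path = "{jail.path}";\n'
            f"    vnet;\n"
            f'    vnet.interface = "{pair}b";\n'
            # Host side, before the jail exists: create the pair, bridge it.
            f'    exec.prestart += "ifconfig {pair} create";\n'
            f'    exec.prestart += "ifconfig {bridge} addm {pair}a up";\n'
            f'    exec.prestart += "ifconfig {pair}a up";\n'
            # Jail side: address the jail's end, then run the jail's rc.
            f'    exec.start += "ifconfig {pair}b {jail.addr} up";\n'
            f'    exec.start += "/bin/sh /etc/rc";\n'
            # Host side, after the jail is gone: destroying one end removes both.
            f'    exec.poststop += "ifconfig {pair}a destroy";\n'
            f"}}\n"
        )
    return "\n".join(stanzas)


if __name__ == "__main__":
    print(render_jail_conf([
        VnetJail("app1", "/jails/app1", "10.0.0.2/24"),
        VnetJail("db1", "/jails/db1", "10.0.0.3/24"),
    ]))
```

A NAT jail router like the one in Julian's diagram would be one more stanza on the same bridge, with a second interface facing the outside. That part is left out here.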
> > Regards, > > Nathan > >> On 23 Nov 2015, at 5:02 pm, Julian Elischer wrote: >> >> On 21/11/2015 10:06 AM, Nathan Aherne wrote: >>> I had a bit of a think about how to describe what I am trying to achieve. >>> >>> I am treating each jail like its own little “virtual machine”. The jail provides certain services, using things like nginx or nodejs, php-fpm, mysql or postgresql. The jails can control connections to themselves by configuring the firewall ports that are opened on their IP (10.0.0.0/16 or a public IP). I know the jails have no firewall of their own; the firewall is configured from the host. >>> >>> I want each jail or “virtual machine” to be able to communicate with one another and the wider internet. When a jail does a DNS query for another App jail, it may get a public IP on its own Host (or on another host), and it should be able to communicate with another jail on the same host without issues. >>> >>> At the moment all of the above is working perfectly except for jail to jail communication on the same host (when the communication is not directly between 10.0.0.0/16 IP addresses). >> This is pretty much exactly where vimage/vnet jails could be used to great effect. >> Is there a reason you are not doing that? Each jail has its own routing tables, addresses and (virtual) interfaces.
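In the case Nathan describes, one jail on the host reaches another through a public address. That is the hairpin (NAT reflection) case this thread keeps returning to. The toy model below is plain Python, not ipfw or libalias code. It shows why a hairpin rule usually rewrites both ends of the connection. If only the destination were translated, the server jail would answer the client jail directly from its private address. The client's TCP stack would then fail to match that reply to the connection it opened to the public address. All addresses are hypothetical documentation addresses. The model also keeps the client's source port, where a real NAT may allocate a new one.

```python
import ipaddress
from typing import Dict, NamedTuple, Optional, Tuple


class Pkt(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int


class HairpinNat:
    """Toy port-forwarding NAT with hairpin (reflection) support."""

    def __init__(self, public_ip: str, inside_net: str,
                 forwards: Dict[Tuple[str, int], Tuple[str, int]]):
        self.public_ip = public_ip
        self.inside = ipaddress.ip_network(inside_net)
        self.forwards = forwards  # (public ip, port) -> (inside ip, port)
        self.state: Dict[Tuple[str, int, str, int], Pkt] = {}

    def forward(self, p: Pkt) -> Pkt:
        """Translate a client->server packet arriving at the NAT."""
        target = self.forwards.get((p.dst, p.dport))
        if target is None:
            return p  # not a forwarded service; leave it alone
        dst, dport = target
        if ipaddress.ip_address(p.src) in self.inside:
            # Hairpin: the client is inside too, so also rewrite the source
            # to the public address; otherwise the server would answer the
            # client directly and bypass the NAT.
            q = Pkt(self.public_ip, p.sport, dst, dport)
        else:
            q = Pkt(p.src, p.sport, dst, dport)
        # Remember the mapping, keyed the way the reply will look.
        self.state[(q.dst, q.dport, q.src, q.sport)] = p
        return q

    def reverse(self, r: Pkt) -> Optional[Pkt]:
        """Translate a server->client reply back to what the client expects."""
        orig = self.state.get((r.src, r.sport, r.dst, r.dport))
        if orig is None:
            return None  # no matching state: a real NAT would not translate it
        return Pkt(orig.dst, orig.dport, orig.src, orig.sport)
```

With both jails inside 10.0.0.0/16, forward() turns 10.0.0.25:42957 -> 203.0.113.5:5432 into 203.0.113.5:42957 -> 10.0.0.30:5432. The server's reply is therefore addressed to the NAT, and reverse() can translate it back.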
>> >> here's how I'd do it with vimage:
>>
>>                                        +--------------+
>>                        +---------------+              |  servers
>>                        |               +--------------+
>>                        |
>>                        |               +--------------+
>>                        |      +--------+              |
>>                        |      |        +--------------+
>>                        |      |
>>      +--------+     +--+------+----+
>>      | iface  |     |    bridge    |
>>      |        +-----+              |
>>      +--------+     +----+---------+
>>                          |
>>                          |
>>                          |
>>                          |
>>                          |
>>                          |
>> +------------------------+---------------------+
>> |                                              |
>> |                                              |
>> |               NAT jail router                |
>> |                                              |
>> |                                              |
>> +-------+--------+--------+-------+------------+
>>         |        |        |       |
>>      +--+--+  +--+--+  +--+--+ +--+--+
>>      |     |  |     |  |     | |     |
>>      |     |  |     |  |     | |     |
>>      |     |  |     |  |     | |     |  jails
>>      |     |  |     |  |     | |     |
>>      +-----+  +-----+  +-----+ +-----+
>>
>> however the hairpin idea might still be useful even in that scenario if they don't know about each other's 'local' addresses, but do NAT'd machines need to talk to each other by external addresses? >> >> i Nathan >>>> On 21 Nov 2015, at 9:12 am, Nathan Aherne wrote: >>>> >>>> I am not exactly sure how to draw the setup so it doesn’t confuse the situation. The setup is extremely simple (I am not running vimage): jails running on the 10.0.0.0/16 (cloned lo1 interface) network or with public IPs. The jails with private IPs are the HTTP app jails. The Host runs an HTTP proxy (nginx) and forwards traffic to each HTTP App jail based on the URL it receives. The jails with public IPs are things like database jails which cannot be proxied by the Host. >>>> >>>> I can happily communicate with any jail from my laptop (externally) but when I want one jail to communicate with another jail (for example an App Jail communicating with the database jail) the traffic shows as backwards (destination:port -> source:port) in the IPFW logs (tshark shows the traffic correctly source:port -> destination:port). The jail to jail traffic tries to go over the lo1 interface (backwards) and is blocked. Below are some IPFW logs of an App jail (10.0.0.25) communicating with the database jail (eee.fff.gg.hhh) >>>> >>>> IPFW logs.
The lines labelled UNKNOWN are from the check-state rule (everything is labelled UNKNOWN even if it is KNOWN traffic) >>>> >>>> Nov 21 08:49:07 host5 kernel: ipfw: 101 UNKNOWN TCP eee.fff.gg.hhh:5432 10.0.0.25:42957 out via lo1 >>>> Nov 21 08:49:07 host5 kernel: ipfw: 65501 Deny TCP eee.fff.gg.hhh:5432 10.0.0.25:42957 out via lo1 >>>> Nov 21 08:49:10 host5 kernel: ipfw: 101 UNKNOWN TCP eee.fff.gg.hhh:5432 10.0.0.25:42957 out via lo1 >>>> Nov 21 08:49:10 host5 kernel: ipfw: 65501 Deny TCP eee.fff.gg.hhh:5432 10.0.0.25:42957 out via lo1 >>>> Nov 21 08:49:13 host5 kernel: ipfw: 101 UNKNOWN TCP eee.fff.gg.hhh:5432 10.0.0.25:42957 out via lo1 >>>> Nov 21 08:49:13 host5 kernel: ipfw: 65501 Deny TCP eee.fff.gg.hhh:5432 10.0.0.25:42957 out via lo1 >>>> Nov 21 08:49:16 host5 kernel: ipfw: 101 UNKNOWN TCP eee.fff.gg.hhh:5432 10.0.0.25:42957 out via lo1 >>>> Nov 21 08:49:16 host5 kernel: ipfw: 65501 Deny TCP eee.fff.gg.hhh:5432 10.0.0.25:42957 out via lo1 >>>> >>>> tshark output (loopback and wan interface capture for port 5432) >>>> >>>> Capturing on 'Loopback' and 'bce0' >>>> 1 0.000000 10.0.0.25 -> eee.fff.gg.hhh TCP 64 42957→5432 [SYN] Seq=0 Win=65535 Len=0 MSS=16344 WS=64 SACK_PERM=1 TSval=142885525 TSecr=0 >>>> 2 3.013905 10.0.0.25 -> eee.fff.gg.hhh TCP 64 [TCP Retransmission] 42957→5432 [SYN] Seq=0 Win=65535 Len=0 MSS=16344 WS=64 SACK_PERM=1 TSval=142888539 TSecr=0 >>>> 3 6.241658 10.0.0.25 -> eee.fff.gg.hhh TCP 64 [TCP Retransmission] 42957→5432 [SYN] Seq=0 Win=65535 Len=0 MSS=16344 WS=64 SACK_PERM=1 TSval=142891767 TSecr=0 >>>> 4 9.451516 10.0.0.25 -> eee.fff.gg.hhh TCP 64 [TCP Retransmission] 42957→5432 [SYN] Seq=0 Win=65535 Len=0 MSS=16344 WS=64 SACK_PERM=1 TSval=142894976 TSecr=0 >>>> 5 12.654656 10.0.0.25 -> eee.fff.gg.hhh TCP 64 [TCP Retransmission] 42957→5432 [SYN] Seq=0 Win=65535 Len=0 MSS=16344 WS=64 SACK_PERM=1 TSval=142898180 TSecr=0 >>>> 6 15.863900 10.0.0.25 -> eee.fff.gg.hhh TCP 64 [TCP Retransmission] 42957→5432 [SYN] Seq=0 Win=65535 Len=0 MSS=16344 WS=64
SACK_PERM=1 TSval=142901389 TSecr=0 >>>> 7 22.076655 10.0.0.25 -> eee.fff.gg.hhh TCP 64 [TCP Retransmission] 42957→5432 [SYN] Seq=0 Win=65535 Len=0 MSS=16344 WS=64 SACK_PERM=1 TSval=142907602 TSecr=0 >>>> >>>> >>>>> If so, what sort of routing is set up on both host and jails? >>>> Routing is what would be added by default (whatever the host system adds when adding an IP); there is no custom routing. I have wondered if I need to modify the routing table to get this to work. >>>> >>>> Below is the output of netstat -rn >>>> >>>> www.xxx.yy.zzz is the gateway address >>>> eee.fff.gg.hhh is the database jail public IP >>>> aaa.bbb.cc.ddd is the public IP for NAT >>>> lll.mmm.nn.ooo is the Host's public IP
>>>>
>>>> Routing tables
>>>>
>>>> Internet:
>>>> Destination        Gateway            Flags  Netif Expire
>>>> default            www.xxx.yy.zzz     UGS    bce0
>>>> 10.0.0.1           link#6             UH     lo1
>>>> 10.0.0.2           link#6             UH     lo1
>>>> 10.0.0.3           link#6             UH     lo1
>>>> 10.0.0.4           link#6             UH     lo1
>>>> 10.0.0.5           link#6             UH     lo1
>>>> 10.0.0.6           link#6             UH     lo1
>>>> 10.0.0.7           link#6             UH     lo1
>>>> 10.0.0.8           link#6             UH     lo1
>>>> 10.0.0.9           link#6             UH     lo1
>>>> 10.0.0.10          link#6             UH     lo1
>>>> 10.0.0.11          link#6             UH     lo1
>>>> 10.0.0.12          link#6             UH     lo1
>>>> 10.0.0.13          link#6             UH     lo1
>>>> 10.0.0.14          link#6             UH     lo1
>>>> 10.0.0.15          link#6             UH     lo1
>>>> 10.0.0.16          link#6             UH     lo1
>>>> 10.0.0.17          link#6             UH     lo1
>>>> 10.0.0.18          link#6             UH     lo1
>>>> 10.0.0.19          link#6             UH     lo1
>>>> 10.0.0.20          link#6             UH     lo1
>>>> 10.0.0.21          link#6             UH     lo1
>>>> 10.0.0.22          link#6             UH     lo1
>>>> 10.0.0.23          link#6             UH     lo1
>>>> 10.0.0.24          link#6             UH     lo1
>>>> 10.0.0.25          link#6             UH     lo1
>>>> 10.0.0.26          link#6             UH     lo1
>>>> www.xxx.yy.zzz/25  link#1             U      bce0
>>>> eee.fff.gg.hhh     link#1             UHS    lo0
>>>> eee.fff.gg.hhh/32  link#1             U      bce0
>>>> aaa.bbb.cc.ddd     link#1             UHS    lo0
>>>> aaa.bbb.cc.ddd/32  link#1             U      bce0
>>>> lll.mmm.nn.ooo     link#1             UHS    lo0
>>>> 127.0.0.1          link#5             UH     lo0
>>>>
>>>> Internet6:
>>>> Destination        Gateway            Flags  Netif Expire
>>>> ::/96              ::1                UGRS   lo0
>>>> ::1                link#5             UH     lo0
>>>> ::ffff:0.0.0.0/96  ::1                UGRS   lo0
>>>> fe80::/10          ::1                UGRS   lo0
>>>> fe80::%lo0/64      link#5             U      lo0
>>>> fe80::1%lo0        link#5             UHS    lo0
>>>> ff01::%lo0/32      ::1                U      lo0
>>>> ff02::/16          ::1                UGRS   lo0
>>>> ff02::%lo0/32      ::1                U      lo0
>>>>
>>>>> Anything like ? >>>>> http://kb.juniper.net/InfoCenter/index?page=content&id=KB24639&actp=search >>>> Yes, just like that. >>>> >>>> Regards, >>>> >>>> Nathan >>>> >>>>> On 19 Nov 2015, at 2:46 am, Ian Smith > wrote: >>>>> >>>>> On Wed, 18 Nov 2015 22:17:29 +0800, Julian Elischer wrote: >>>>>> On 11/18/15 8:40 AM, Nathan Aherne wrote: >>>>>>> For some reason hairpin (loopback nat or nat reflection) does not seem to >>>>>>> be working, which is why I chose IPFW in the first place. >>>>>> it would be good to see a diagram of what this actually means. >>>>> Anything like ? >>>>> http://kb.juniper.net/InfoCenter/index?page=content&id=KB24639&actp=search >>>>> >>>>> Was this so one jail can only access service/s provided by other jail/s, >>>>> both/all with internal NAT'd addresses, by using only the public address >>>>> and port of the 'router', which IIRC this is a single system with jails? >>>>> >>>>> If so, what sort of routing is set up on both host and jails? >>>>> >>>>> (blindfolded, no idea where I've pinned the donkey's tail :) >>>>> >>>>> cheers, Ian >>> _______________________________________________ >>> freebsd-ipfw@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-ipfw >>> To unsubscribe, send any mail to "freebsd-ipfw-unsubscribe@freebsd.org"
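The ipfw log lines and the netstat -rn output in this thread can be cross-checked with a longest-prefix-match lookup. The sketch below is only an illustration, not how the kernel's routing code is written. It parses one logged tuple and looks up its destination in a simplified copy of the routing table. The masked addresses are replaced with hypothetical documentation addresses: 198.51.100.20 stands in for eee.fff.gg.hhh, and 198.51.100.0/25 stands in for www.xxx.yy.zzz/25. Read this way, the logged packets appear to travel from eee.fff.gg.hhh:5432 toward 10.0.0.25:42957. The most specific route for 10.0.0.25 is its host route on lo1, which is consistent with the logged "out via lo1".

```python
import ipaddress
import re
from typing import Dict, List, Optional, Tuple

# Simplified subset of the netstat -rn output; the masked public addresses
# are replaced by hypothetical documentation addresses.
ROUTES: List[Tuple[str, str]] = [
    ("0.0.0.0/0", "bce0"),         # default via the upstream gateway
    ("198.51.100.0/25", "bce0"),   # stands in for www.xxx.yy.zzz/25
    ("198.51.100.20/32", "lo0"),   # stands in for eee.fff.gg.hhh (local address)
    ("10.0.0.25/32", "lo1"),       # app jail address on the cloned lo1
    ("10.0.0.26/32", "lo1"),
    ("127.0.0.1/32", "lo0"),
]

LOG_RE = re.compile(
    r"ipfw: (\d+) (\S+) TCP ([\d.]+):(\d+) ([\d.]+):(\d+) (in|out) via (\S+)"
)


def lookup(dst: str) -> str:
    """Return the interface of the most specific route containing dst."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(net), ifname)
        for net, ifname in ROUTES
        if addr in ipaddress.ip_network(net)
    ]
    # The default route always matches, so matches is never empty.
    _, ifname = max(matches, key=lambda m: m[0].prefixlen)
    return ifname


def parse_log(line: str) -> Optional[Dict[str, object]]:
    """Split an ipfw log line into rule, action, endpoints and interface."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    rule, action, src, sport, dst, dport, direction, ifname = m.groups()
    return {
        "rule": int(rule), "action": action,
        "src": src, "sport": int(sport),
        "dst": dst, "dport": int(dport),
        "dir": direction, "if": ifname,
    }


if __name__ == "__main__":
    line = ("Nov 21 08:49:07 host5 kernel: ipfw: 65501 Deny TCP "
            "198.51.100.20:5432 10.0.0.25:42957 out via lo1")
    entry = parse_log(line)
    print(entry)
    print("route lookup for", entry["dst"], "->", lookup(entry["dst"]))
```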