From owner-freebsd-net@FreeBSD.ORG Wed Jul 16 23:58:20 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4D9C7D6A for ; Wed, 16 Jul 2014 23:58:20 +0000 (UTC) Received: from smtp.rlwinm.de (smtp.rlwinm.de [IPv6:2a01:4f8:201:31ef::e]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1283322AF for ; Wed, 16 Jul 2014 23:58:20 +0000 (UTC) Received: from hexe.rlwinm.de (p50834048.dip0.t-ipconnect.de [80.131.64.72]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by smtp.rlwinm.de (Postfix) with ESMTPSA id C2E00C0F3 for ; Thu, 17 Jul 2014 01:58:15 +0200 (CEST) Message-ID: <53C71196.4030501@rlwinm.de> Date: Thu, 17 Jul 2014 01:58:14 +0200 From: Jan Bramkamp User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: freebsd-net@freebsd.org Subject: Re: netmap, selective processing. References: In-Reply-To: X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Jul 2014 23:58:20 -0000 On 16.07.2014 19:48, Daniel Corbe wrote:> > I hope this it the right place to ask questions about netmap. I'm > toying with the idea of writing a netmap-based OSPF implementation > because bird's OSPF implementation isn't as good as its BGP > implementation, quagga doesn't scale well and openospfd doesn't compile > on 10-RELEASE or CURRENT. How many prefixes do you have in your OSPF area 0? 
If you run into scalability problems with OSPF on current x86 CPUs your network design probably is the cause of the problem e.g. redistributing announcements from BGP into OSPF. OSPF is just one more (rather ugly) IP protocol. Is moving the OSPF packets between kernel and userspace really a problem worth optimizing for? Putting netmap between the NIC and the kernel IP stack introduces overhead to all non OSPF packets unless your netmap application also implements IP routing and bypasses the kernel. From owner-freebsd-net@FreeBSD.ORG Thu Jul 17 00:19:26 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D54A114C for ; Thu, 17 Jul 2014 00:19:26 +0000 (UTC) Received: from mail-pa0-x233.google.com (mail-pa0-x233.google.com [IPv6:2607:f8b0:400e:c03::233]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A50152463 for ; Thu, 17 Jul 2014 00:19:26 +0000 (UTC) Received: by mail-pa0-f51.google.com with SMTP id ey11so2228542pad.38 for ; Wed, 16 Jul 2014 17:19:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=jQQlCAqmwh91Y5tmLJVQzgJrsm4wAFQHzbM2GX7p91M=; b=PAQxcnKv8Okv4WqT/nVOAmEDBbdYSnqa4gsBxaWdtHVRmvTk3gEzh6DzPKfajcUhXv Nja0TawI8iI1Dufm+khpDwvQ1PaBp1OqibiXQfR/NO96Get+7ZPPmVYIFA6cynb6P0zA uehmwd/215Se/0veJ74cGeR/wHwrGLzExTcZA1Q/F8TGN2fftrJExo00wdL7N6Kxao4I ib5SVv4p8XrRqibroL9n0K19bhgkN7y76wG7xlLqfc+cu3qF33iUKaIFnNJGTA0uWs6+ c5bQzTa0hPhCe0fhBKMwRJ6DaeBf9yGT9DoLfXP7OjGSMsBzyLuDY9xQ+HE0FclgADJO StlA== X-Received: by 10.69.17.230 with SMTP id 
        gh6mr33475962pbd.0.1405556366173; Wed, 16 Jul 2014 17:19:26 -0700 (PDT)
Received: from [10.192.166.0] (stargate.chelsio.com. [67.207.112.58]) by mx.google.com with ESMTPSA id nd10sm561921pbc.51.2014.07.16.17.19.25 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 16 Jul 2014 17:19:25 -0700 (PDT)
Sender: Navdeep Parhar
Message-ID: <53C7168C.3050702@FreeBSD.org>
Date: Wed, 16 Jul 2014 17:19:24 -0700
From: Navdeep Parhar
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: Luigi Rizzo
Subject: Re: change netmap global lock to sx?
References: <5385249D.9050501@FreeBSD.org>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: "freebsd-net@freebsd.org"
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 17 Jul 2014 00:19:27 -0000

On 05/27/14 17:32, Luigi Rizzo wrote:
>
>
>
> On Wed, May 28, 2014 at 1:49 AM, Navdeep Parhar wrote:
>
> I'd like to change the netmap global lock from a mutex into a sleepable
> shared/exclusive lock. This will allow a driver's nm_register hook
> (which is called with the global lock held) to sleep if it has to. I've
> casually used pkt-gen after this conversion (patch attached) and the
> witness hasn't complained about it.
>
>
> no objections, let me give this a try on stable/10
> stable/9 to make sure we can use the same code there as well

Any updates? I'm considering what to have in cxgbe(4) in time for 10.1
and this needs to be sorted out before cxgbe's netmap support gets
MFC'd to any stable branch.

Regards,
Navdeep

>
> cheers
> luigi
>
>
> Thoughts?
>
> Regards,
> Navdeep
>
>
> diff -r 0300d80260f4 sys/dev/netmap/netmap_kern.h
> --- a/sys/dev/netmap/netmap_kern.h Fri May 23 19:00:56 2014 -0700
> +++ b/sys/dev/netmap/netmap_kern.h Sat May 24 12:49:15 2014 -0700
> @@ -43,13 +43,13 @@
>  #define unlikely(x) __builtin_expect((long)!!(x), 0L)
>
>  #define NM_LOCK_T struct mtx
> -#define NMG_LOCK_T struct mtx
> -#define NMG_LOCK_INIT() mtx_init(&netmap_global_lock, \
> - "netmap global lock", NULL, MTX_DEF)
> -#define NMG_LOCK_DESTROY() mtx_destroy(&netmap_global_lock)
> -#define NMG_LOCK() mtx_lock(&netmap_global_lock)
> -#define NMG_UNLOCK() mtx_unlock(&netmap_global_lock)
> -#define NMG_LOCK_ASSERT() mtx_assert(&netmap_global_lock, MA_OWNED)
> +#define NMG_LOCK_T struct sx
> +#define NMG_LOCK_INIT() sx_init(&netmap_global_lock, \
> + "netmap global lock")
> +#define NMG_LOCK_DESTROY() sx_destroy(&netmap_global_lock)
> +#define NMG_LOCK() sx_xlock(&netmap_global_lock)
> +#define NMG_UNLOCK() sx_xunlock(&netmap_global_lock)
> +#define NMG_LOCK_ASSERT() sx_assert(&netmap_global_lock, SA_XLOCKED)
>
>  #define NM_SELINFO_T struct selinfo
>  #define MBUF_LEN(m) ((m)->m_pkthdr.len)
>
>

From owner-freebsd-net@FreeBSD.ORG Thu Jul 17 02:04:18 2014
Return-Path:
Delivered-To: freebsd-net@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 47628A4D for ; Thu, 17 Jul 2014 02:04:18 +0000 (UTC)
Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2DE2B2C2D for ; Thu, 17 Jul 2014 02:04:18 +0000 (UTC)
Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.8/8.14.8) with ESMTP id s6H24Ige083429 for ; Thu, 17 Jul 2014 02:04:18 GMT (envelope-from
bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 187835] ngctl(8) strange behavior when adding more than 530 vlan through nethraph Date: Thu, 17 Jul 2014 02:04:18 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: bin X-Bugzilla-Version: 10.0-STABLE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: admin@support.od.ua X-Bugzilla-Status: In Discussion X-Bugzilla-Priority: Normal X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: version Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Jul 2014 02:04:18 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187835 Vladislav V. Prodan changed: What |Removed |Added ---------------------------------------------------------------------------- Version|unspecified |10.0-STABLE -- You are receiving this mail because: You are the assignee for the bug. 
From owner-freebsd-net@FreeBSD.ORG Thu Jul 17 02:41:42 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3D774F5 for ; Thu, 17 Jul 2014 02:41:42 +0000 (UTC) Received: from gpo3.cc.swin.edu.au (gpo3.cc.swin.edu.au [136.186.1.32]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C831E2EEF for ; Thu, 17 Jul 2014 02:41:41 +0000 (UTC) Received: from [136.186.229.154] (nwilliams-laptop.caia.swin.edu.au [136.186.229.154]) by gpo3.cc.swin.edu.au (8.14.3/8.14.3) with ESMTP id s6H2fdx3018271 for ; Thu, 17 Jul 2014 12:41:39 +1000 Message-ID: <53C737DB.4030804@swin.edu.au> Date: Thu, 17 Jul 2014 12:41:31 +1000 From: Nigel Williams User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: freebsd-net@freebsd.org Subject: Re: Multipath TCP for FreeBSD v0.4 References: <513CB9AF.3090409@swin.edu.au> <53BF8945.3000802@swin.edu.au> <20140711102535.7613DBE5@hub.freebsd.org> <53C341FC.4060307@swin.edu.au> <20140714063019.876218DD@hub.freebsd.org> In-Reply-To: <20140714063019.876218DD@hub.freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Jul 2014 02:41:42 -0000 Just a quick note for anyone else that might be trying out the patch... 
>
> and I've built the whole system on both nodes without WITNESS and other
> debugging functionalities:
> ===============================================================================
> Index: /usr/src/sys/amd64/conf/GENERIC
> ===================================================================
> --- /usr/src/sys/amd64/conf/GENERIC (revision 265307)
> +++ /usr/src/sys/amd64/conf/GENERIC (working copy)
> @@ -76,14 +76,14 @@
>  options KDB # Enable kernel debugger support.
>  options KDB_TRACE # Print a stack trace for a panic.
>  # For full debugger support use (turn off in stable branch):
> -options DDB # Support DDB.
> -options GDB # Support remote GDB.
> -options DEADLKRES # Enable the deadlock resolver
> -options INVARIANTS # Enable calls of extra sanity checking
> -options INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
> -options WITNESS # Enable checks to detect deadlocks and cycles
> -options WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
> -options MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
> +#options DDB # Support DDB.
> +#options GDB # Support remote GDB.
> +#options DEADLKRES # Enable the deadlock resolver
> +#options INVARIANTS # Enable calls of extra sanity checking
> +#options INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
> +#options WITNESS # Enable checks to detect deadlocks and cycles
> +#options WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
> +#options MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
>
>  # Make an SMP-capable kernel by default
>  options SMP # Symmetric MultiProcessor Kernel
> ===============================================================================

I'd recommend leaving debugging options on (at minimum INVARIANTS and
INVARIANT_SUPPORT). This will slow network performance but will allow a
number of assertions to run that can make it a little easier to debug
some issues.
cheers, nigel From owner-freebsd-net@FreeBSD.ORG Thu Jul 17 07:49:44 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3DF1C249 for ; Thu, 17 Jul 2014 07:49:44 +0000 (UTC) Received: from atl4mhfb03.myregisteredsite.com (atl4mhfb03.myregisteredsite.com [209.17.115.61]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 13EB628E7 for ; Thu, 17 Jul 2014 07:49:43 +0000 (UTC) Received: from atl4mhob12.myregisteredsite.com (atl4mhob12.myregisteredsite.com [209.17.115.50]) by atl4mhfb03.myregisteredsite.com (8.14.4/8.14.4) with ESMTP id s6H7nEun019630 for ; Thu, 17 Jul 2014 03:49:18 -0400 Received: from mailpod.hostingplatform.com ([10.30.71.211]) by atl4mhob12.myregisteredsite.com (8.14.4/8.14.4) with ESMTP id s6H7n6qJ017267 for ; Thu, 17 Jul 2014 03:49:06 -0400 Received: (qmail 15358 invoked by uid 0); 17 Jul 2014 07:49:06 -0000 X-TCPREMOTEIP: 118.186.129.16 X-Authenticated-UID: peterxu@cyphy.net Received: from unknown (HELO Peters-MacAir.local) (peterxu@cyphy.net@118.186.129.16) by 0 with ESMTPA; 17 Jul 2014 07:49:05 -0000 Message-ID: <53C77FEE.9000707@cyphy.net> Date: Thu, 17 Jul 2014 15:49:02 +0800 From: Xu Zhe User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: freebsd-net@freebsd.org Subject: Network unstability issue with ixgbe driver (ix0 local_faults non zero) Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Javen Wu , Jason Zhang X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Jul 2014 07:49:44 -0000 
Hi, FreeBSD developers,

We are encountering a network problem on FreeBSD (version 8.2), with an
Intel X540T 10G card and the ixgbe 2.5.15 driver (we also tried an older
version, 2.5.8).

We first noticed the problem when SSH connections kept failing with
timeouts. We then found that it is possibly a generic network issue
rather than an SSH problem.

We found non-zero local_faults and remote_faults in sysctl:

# sysctl -a | grep ix.0
dev.ix.0.%desc: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.8
dev.ix.0.%driver: ix
dev.ix.0.%location: slot=0 function=0 handle=\_SB_.PCI0.NPE3.TGBE
dev.ix.0.%pnpinfo: vendor=0x8086 device=0x1528 subvendor=0x152d subdevice=0x899f class=0x020000
dev.ix.0.%parent: pci3
dev.ix.0.fc: 3
dev.ix.0.enable_aim: 1
dev.ix.0.advertise_speed: 0
dev.ix.0.ts: 0
dev.ix.0.dropped: 0
dev.ix.0.mbuf_defrag_failed: 0
dev.ix.0.watchdog_events: 0
dev.ix.0.link_irq: 4
dev.ix.0.queue0.interrupt_rate: 55555
dev.ix.0.queue0.irqs: 1491075
dev.ix.0.queue0.txd_head: 604
dev.ix.0.queue0.txd_tail: 604
dev.ix.0.queue0.tso_tx: 154
dev.ix.0.queue0.no_tx_dma_setup: 0
dev.ix.0.queue0.no_desc_avail: 0
dev.ix.0.queue0.tx_packets: 948089
dev.ix.0.queue0.rxd_head: 620
dev.ix.0.queue0.rxd_tail: 619
dev.ix.0.queue0.rx_packets: 7799404
dev.ix.0.queue0.rx_bytes: 11075537104
dev.ix.0.queue0.rx_copies: 111468
dev.ix.0.queue0.lro_queued: 7788218
dev.ix.0.queue0.lro_flushed: 968958
dev.ix.0.queue1.interrupt_rate: 100000
dev.ix.0.queue1.irqs: 90817
dev.ix.0.queue1.txd_head: 1800
dev.ix.0.queue1.txd_tail: 1800
dev.ix.0.queue1.tso_tx: 2
dev.ix.0.queue1.no_tx_dma_setup: 0
dev.ix.0.queue1.no_desc_avail: 0
dev.ix.0.queue1.tx_packets: 32468
dev.ix.0.queue1.rxd_head: 1802
dev.ix.0.queue1.rxd_tail: 1801
dev.ix.0.queue1.rx_packets: 40714
dev.ix.0.queue1.rx_bytes: 4527395
dev.ix.0.queue1.rx_copies: 38784
dev.ix.0.queue1.lro_queued: 38668
dev.ix.0.queue1.lro_flushed: 38486
dev.ix.0.queue2.interrupt_rate: 71428
dev.ix.0.queue2.irqs: 28625
dev.ix.0.queue2.txd_head: 349
dev.ix.0.queue2.txd_tail: 349
dev.ix.0.queue2.tso_tx: 1
dev.ix.0.queue2.no_tx_dma_setup: 0
dev.ix.0.queue2.no_desc_avail: 0
dev.ix.0.queue2.tx_packets: 6981
dev.ix.0.queue2.rxd_head: 1952
dev.ix.0.queue2.rxd_tail: 1951
dev.ix.0.queue2.rx_packets: 6048
dev.ix.0.queue2.rx_bytes: 947930
dev.ix.0.queue2.rx_copies: 5241
dev.ix.0.queue2.lro_queued: 4846
dev.ix.0.queue2.lro_flushed: 4760
dev.ix.0.queue3.interrupt_rate: 500000
dev.ix.0.queue3.irqs: 54879
dev.ix.0.queue3.txd_head: 504
dev.ix.0.queue3.txd_tail: 504
dev.ix.0.queue3.tso_tx: 10
dev.ix.0.queue3.no_tx_dma_setup: 0
dev.ix.0.queue3.no_desc_avail: 0
dev.ix.0.queue3.tx_packets: 18406
dev.ix.0.queue3.rxd_head: 449
dev.ix.0.queue3.rxd_tail: 448
dev.ix.0.queue3.rx_packets: 20929
dev.ix.0.queue3.rx_bytes: 2572540
dev.ix.0.queue3.rx_copies: 20297
dev.ix.0.queue3.lro_queued: 19218
dev.ix.0.queue3.lro_flushed: 19102
dev.ix.0.queue4.interrupt_rate: 500000
dev.ix.0.queue4.irqs: 22609
dev.ix.0.queue4.txd_head: 1370
dev.ix.0.queue4.txd_tail: 1370
dev.ix.0.queue4.tso_tx: 1
dev.ix.0.queue4.no_tx_dma_setup: 0
dev.ix.0.queue4.no_desc_avail: 0
dev.ix.0.queue4.tx_packets: 3518
dev.ix.0.queue4.rxd_head: 1622
dev.ix.0.queue4.rxd_tail: 1621
dev.ix.0.queue4.rx_packets: 3670
dev.ix.0.queue4.rx_bytes: 474745
dev.ix.0.queue4.rx_copies: 3014
dev.ix.0.queue4.lro_queued: 2174
dev.ix.0.queue4.lro_flushed: 2171
dev.ix.0.queue5.interrupt_rate: 100000
dev.ix.0.queue5.irqs: 366375
dev.ix.0.queue5.txd_head: 833
dev.ix.0.queue5.txd_tail: 833
dev.ix.0.queue5.tso_tx: 326797
dev.ix.0.queue5.no_tx_dma_setup: 0
dev.ix.0.queue5.no_desc_avail: 0
dev.ix.0.queue5.tx_packets: 531092
dev.ix.0.queue5.rxd_head: 57
dev.ix.0.queue5.rxd_tail: 56
dev.ix.0.queue5.rx_packets: 796729
dev.ix.0.queue5.rx_bytes: 108295068
dev.ix.0.queue5.rx_copies: 582757
dev.ix.0.queue5.lro_queued: 795369
dev.ix.0.queue5.lro_flushed: 258290
dev.ix.0.queue6.interrupt_rate: 100000
dev.ix.0.queue6.irqs: 26775
dev.ix.0.queue6.txd_head: 1146
dev.ix.0.queue6.txd_tail: 1146
dev.ix.0.queue6.tso_tx: 13
dev.ix.0.queue6.no_tx_dma_setup: 0
dev.ix.0.queue6.no_desc_avail: 0
dev.ix.0.queue6.tx_packets: 5469
dev.ix.0.queue6.rxd_head: 1077
dev.ix.0.queue6.rxd_tail: 1076
dev.ix.0.queue6.rx_packets: 9269
dev.ix.0.queue6.rx_bytes: 6631479
dev.ix.0.queue6.rx_copies: 4878
dev.ix.0.queue6.lro_queued: 8054
dev.ix.0.queue6.lro_flushed: 4260
dev.ix.0.queue7.interrupt_rate: 55555
dev.ix.0.queue7.irqs: 243399
dev.ix.0.queue7.txd_head: 66
dev.ix.0.queue7.txd_tail: 66
dev.ix.0.queue7.tso_tx: 5
dev.ix.0.queue7.no_tx_dma_setup: 0
dev.ix.0.queue7.no_desc_avail: 0
dev.ix.0.queue7.tx_packets: 121101
dev.ix.0.queue7.rxd_head: 130
dev.ix.0.queue7.rxd_tail: 129
dev.ix.0.queue7.rx_packets: 127106
dev.ix.0.queue7.rx_bytes: 15197119
dev.ix.0.queue7.rx_copies: 118192
dev.ix.0.queue7.lro_queued: 125622
dev.ix.0.queue7.lro_flushed: 125138
dev.ix.0.mac_stats.crc_errs: 0
dev.ix.0.mac_stats.ill_errs: 0
dev.ix.0.mac_stats.byte_errs: 0
dev.ix.0.mac_stats.short_discards: 0
dev.ix.0.mac_stats.local_faults: 7 <=============== HERE
dev.ix.0.mac_stats.remote_faults: 1
dev.ix.0.mac_stats.rec_len_errs: 0
dev.ix.0.mac_stats.xon_txd: 0
dev.ix.0.mac_stats.xon_recvd: 0
dev.ix.0.mac_stats.xoff_txd: 0
dev.ix.0.mac_stats.xoff_recvd: 0
dev.ix.0.mac_stats.total_octets_rcvd: 11249450018
dev.ix.0.mac_stats.good_octets_rcvd: 11249396646
dev.ix.0.mac_stats.total_pkts_rcvd: 8804445
dev.ix.0.mac_stats.good_pkts_rcvd: 8803850
dev.ix.0.mac_stats.mcast_pkts_rcvd: 9311
dev.ix.0.mac_stats.bcast_pkts_rcvd: 1908
dev.ix.0.mac_stats.rx_frames_64: 18132
dev.ix.0.mac_stats.rx_frames_65_127: 759186
dev.ix.0.mac_stats.rx_frames_128_255: 116641
dev.ix.0.mac_stats.rx_frames_256_511: 686728
dev.ix.0.mac_stats.rx_frames_512_1023: 67041
dev.ix.0.mac_stats.rx_frames_1024_1522: 7156122
dev.ix.0.mac_stats.recv_undersized: 0
dev.ix.0.mac_stats.recv_fragmented: 0
dev.ix.0.mac_stats.recv_oversized: 0
dev.ix.0.mac_stats.recv_jabberd: 0
dev.ix.0.mac_stats.management_pkts_rcvd: 11219
dev.ix.0.mac_stats.management_pkts_drpd: 0
dev.ix.0.mac_stats.checksum_errs: 0
dev.ix.0.mac_stats.good_octets_txd: 20162287794
dev.ix.0.mac_stats.total_pkts_txd: 14419225
dev.ix.0.mac_stats.good_pkts_txd: 14419225
dev.ix.0.mac_stats.bcast_pkts_txd: 621
dev.ix.0.mac_stats.mcast_pkts_txd: 0
dev.ix.0.mac_stats.management_pkts_txd: 0
dev.ix.0.mac_stats.tx_frames_64: 12833
dev.ix.0.mac_stats.tx_frames_65_127: 549847
dev.ix.0.mac_stats.tx_frames_128_255: 80184
dev.ix.0.mac_stats.tx_frames_256_511: 631975
dev.ix.0.mac_stats.tx_frames_512_1023: 116264
dev.ix.0.mac_stats.tx_frames_1024_1522: 13028122

Does anyone know what local_faults/remote_faults mean here? Does this
mean there is a hardware error? (We tried to find the adapter manual,
but there is no detail on the IXGBE_MLFC [0x04034] register.)

Any suggestions on how to diagnose this problem are welcome too.

Thanks in advance!
Peter

From owner-freebsd-net@FreeBSD.ORG Thu Jul 17 11:46:16 2014
Return-Path:
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A4954CB0; Thu, 17 Jul 2014 11:46:16 +0000 (UTC)
Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "vps1.elischer.org", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 8117C2D57; Thu, 17 Jul 2014 11:46:16 +0000 (UTC)
Received: from Julian-MBP3.local (ppp121-45-250-191.lns20.per2.internode.on.net [121.45.250.191]) (authenticated bits=0) by vps1.elischer.org (8.14.9/8.14.9) with ESMTP id s6HBk2Wu072106 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO); Thu, 17 Jul 2014 04:46:05 -0700 (PDT) (envelope-from julian@freebsd.org)
Message-ID: <53C7B774.60304@freebsd.org>
Date: Thu, 17 Jul 2014 19:45:56 +0800
From: Julian Elischer
User-Agent: Mozilla/5.0
(Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: John Baldwin , Rick Macklem Subject: Re: NFS client READ performance on -current References: <2136988575.13956627.1405199640153.JavaMail.root@uoguelph.ca> <201407151034.54681.jhb@freebsd.org> In-Reply-To: <201407151034.54681.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: pyunyh@gmail.com, "Russell L. Carter" , freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Jul 2014 11:46:16 -0000 On 7/15/14, 10:34 PM, John Baldwin wrote: > On Saturday, July 12, 2014 5:14:00 pm Rick Macklem wrote: >> Yonghyeon Pyun wrote: >>> On Fri, Jul 11, 2014 at 09:54:23AM -0400, John Baldwin wrote: >>>> On Thursday, July 10, 2014 6:31:43 pm Rick Macklem wrote: >>>>> John Baldwin wrote: >>>>>> On Thursday, July 03, 2014 8:51:01 pm Rick Macklem wrote: >>>>>>> Russell L. Carter wrote: >>>>>>>> >>>>>>>> On 07/02/14 19:09, Rick Macklem wrote: >>>>>>>> >>>>>>>>> Could you please post the dmesg stuff for the network >>>>>>>>> interface, >>>>>>>>> so I can tell what driver is being used? I'll take a look >>>>>>>>> at >>>>>>>>> it, >>>>>>>>> in case it needs to be changed to use m_defrag(). >>>>>>>> em0: port >>>>>>>> 0xd020-0xd03f >>>>>>>> mem >>>>>>>> 0xfe4a0000-0xfe4bffff,0xfe480000-0xfe49ffff irq 44 at >>>>>>>> device 0.0 >>>>>>>> on >>>>>>>> pci2 >>>>>>>> em0: Using an MSI interrupt >>>>>>>> em0: Ethernet address: 00:15:17:bc:29:ba >>>>>>>> 001.000007 [2323] netmap_attach success for em0 >>>>>>>> tx >>>>>>>> 1/1024 >>>>>>>> rx >>>>>>>> 1/1024 queues/slots >>>>>>>> >>>>>>>> This is one of those dual nic cards, so there is em1 as >>>>>>>> well... 
>>>>>>>>
>>>>>>> Well, I took a quick look at the driver and it does use m_defrag(),
>>>>>>> but I think that the "retry:" label it does a goto after doing so
>>>>>>> might be in the wrong place.
>>>>>>>
>>>>>>> The attached untested patch might fix this.
>>>>>>>
>>>>>>> Is it convenient to build a kernel with this patch applied and then
>>>>>>> try it with TSO enabled?
>>>>>>>
>>>>>>> rick
>>>>>>> ps: It does have the transmit segment limit set to 32. I have no
>>>>>>> idea if this is a hardware limitation.
>>>>>> I think the retry is not in the wrong place, but the overhead of all
>>>>>> those pullups is apparently quite severe.
>>>>> The m_defrag() call after the first failure will just barely squeeze
>>>>> the just under 64K TSO segment into 32 mbuf clusters. Then I think any
>>>>> m_pullup() done during the retry will allocate an mbuf
>>>>> (at a glance it seems to always do this when the old mbuf is a cluster)
>>>>> and prepend that to the list.
>>>>> --> Now the list is > 32 mbufs again and the bus_dmamap_load_mbuf_sg()
>>>>> will fail again on the retry, this time fatally, I think?
>>>>>
>>>>> I can't see any reason to re-do all the stuff using m_pullup() and
>>>>> Russell reported that moving the "retry:" fixed his problem, from what
>>>>> I understood.
>>>> Ah, I had assumed (incorrectly) that the m_pullup()s would all be nops
>>>> in this case. It seems the NIC would really like to have all those
>>>> things in a single segment, but it is not required, so I agree that
>>>> your patch is fine.
>>>>
>>> I recall em(4) controllers have various limitation in TSO. Driver
>>> has to update IP header to make TSO work so driver has to get a
>>> writable mbufs. bpf(4) consumers will see IP packet length is 0
>>> after this change. I think tcpdump has a compile time option to
>>> guess correct IP packet length.
>>> The firmware of controller also should be able to access complete
>>> IP/TCP header in a single buffer. I don't remember more details in
>>> TSO limitation but I guess you may be able to get more details TSO
>>> limitation from publicly available Intel data sheet.
>> I think that the patch should handle this ok. All of the m_pullup()
>> stuff gets done the first time. Then, if the result is more than 32
>> mbufs in the list, m_defrag() is called to copy the chain. This should
>> result in all the header stuff in the first mbuf cluster and the map
>> call is done again with this list of clusters. (Without the patch,
>> m_pullup() would allocate another prepended mbuf and make the chain
>> more than 32 mbufs again.)
> Hmm, I am surprised by the m_pullup() behavior that it doesn't just
> notice that the first mbuf with a cluster has the desired data already
> and returns without doing anything. That is, I'm surprised the first
> statement in m_pullup() isn't just:
>
> if (n->m_len >= len)
>         return (n);

I seem to remember that the standard behaviour is for the caller to do
exactly that.
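
The caller-side check can be sketched roughly as below. This is a toy userland model, not kernel code: struct toy_mbuf, toy_pullup() and ensure_contiguous() are made-up names for illustration, and the model tracks only buffer lengths, not data. It only shows the shape of the idiom, where the expensive pull-up path is entered solely when the first buffer is too short.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct mbuf (hypothetical, lengths only). */
struct toy_mbuf {
	struct toy_mbuf *m_next;	/* next buffer in the chain */
	size_t		 m_len;		/* bytes held in this buffer */
};

/* Counts how often the expensive path below is taken. */
static int toy_pullup_calls;

/*
 * Hypothetical stand-in for m_pullup(): moves length accounting from the
 * following buffers into the first one until it holds len bytes.
 * Returns NULL if the chain is too short (the toy does not free anything).
 */
static struct toy_mbuf *
toy_pullup(struct toy_mbuf *m, size_t len)
{
	struct toy_mbuf *n;
	size_t need, take;

	toy_pullup_calls++;
	need = (m->m_len < len) ? len - m->m_len : 0;
	for (n = m->m_next; n != NULL && need > 0; n = n->m_next) {
		take = (n->m_len < need) ? n->m_len : need;
		n->m_len -= take;
		m->m_len += take;
		need -= take;
	}
	return (need == 0 ? m : NULL);
}

/* Caller-side guard: skip the pull-up when the data is already contiguous. */
static struct toy_mbuf *
ensure_contiguous(struct toy_mbuf *m, size_t len)
{
	assert(m != NULL);
	if (m->m_len >= len)
		return (m);
	return (toy_pullup(m, len));
}
```

With a guard like this, a retry that re-runs the header checks on an already defragmented chain would not take the pull-up path again, assuming the headers already sit in the first buffer.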
> From owner-freebsd-net@FreeBSD.ORG Thu Jul 17 18:39:34 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CDE5D307 for ; Thu, 17 Jul 2014 18:39:34 +0000 (UTC) Received: from a0i308.smtpcorp.com (a0i308.smtpcorp.com [216.22.15.140]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8ED1823CC for ; Thu, 17 Jul 2014 18:39:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=smtpcorp.com; s=a0_1; h=Content-Type:MIME-Version:Message-ID:In-Reply-To:Date:References:Subject:Cc:To:From; bh=jB3Y6nsqvhpkqVtuMs6Gn9SmJ8PAQNloHz9RmV7tRt8=; b=XPz/2aOa54FtmqRfQrggpv5SQ0Hid4MKaZz4cpUmF023TBQ/qmGvppqEdN38J9vd+5joQb0uPtGaOAHsty+bGaXwFFxBVnuuTIRxKz7EE5yRsZVc0leu4GbhtDNoeJq2gbXBv2EuFn81pP/+aYc1lA68isU0dE17KVdes77EdXA=; From: Daniel Corbe To: Jan Bramkamp Subject: Re: netmap, selective processing. References: <53C71196.4030501@rlwinm.de> Date: Thu, 17 Jul 2014 14:39:13 -0400 In-Reply-To: <53C71196.4030501@rlwinm.de> (Jan Bramkamp's message of "Thu, 17 Jul 2014 01:58:14 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain X-Smtpcorp-Track: 1b7qal4gfuSHlT.yUhfk05hs Cc: freebsd-net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Jul 2014 18:39:34 -0000 Jan Bramkamp writes: > On 16.07.2014 19:48, Daniel Corbe wrote:> >> I hope this it the right place to ask questions about netmap. 
>> I'm toying with the idea of writing a netmap-based OSPF implementation
>> because bird's OSPF implementation isn't as good as its BGP
>> implementation, quagga doesn't scale well and openospfd doesn't compile
>> on 10-RELEASE or CURRENT.
>
> How many prefixes do you have in your OSPF area 0? If you run into
> scalability problems with OSPF on current x86 CPUs your network design
> probably is the cause of the problem e.g. redistributing announcements
> from BGP into OSPF.

I have about 15k interior routes. And most of it is RFC1918 address
space or random /64s doing various things. So when I say I'm worried
about scale issues, I should more accurately be saying "I just don't
want to use quagga but I can't get anything else to work."

>
> OSPF is just one more (rather ugly) IP protocol. Is moving the OSPF
> packets between kernel and userspace really a problem worth optimizing
> for? Putting netmap between the NIC and the kernel IP stack introduces
> overhead to all non OSPF packets unless your netmap application also
> implements IP routing and bypasses the kernel.

I've been searching for a reason to play with netmap. It looks like a
neat toy. And at worst I will have implemented something only useful to
one person but I'll also have learned something in the process.

From the perspective of totally wrecking the performance of the host
network stack: how much more overhead am I really introducing by looking
at every packet inside of the netmap framework and going "am I really
interested in this? Or should I simply pass it through to the host."

And I'm hoping this leads me down the avenue of doing interesting things
with MPLS. MPLS is something that absolutely needs to look at
everything because labels should always be processed and forwarded
first.
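
The "am I really interested in this?" test could look roughly like the sketch below. It is plain C with no netmap calls; frame_is_ospf() and the TOY_* constants are made-up names for illustration. It assumes untagged Ethernet II frames carrying IPv4, so VLAN tags and OSPFv3 over IPv6 are not handled. In a netmap receive loop a check like this would presumably run on each received buffer, and everything it rejects would be handed on to the host stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETHER_HDR_LEN	14	/* dst MAC + src MAC + EtherType */
#define TOY_ETHERTYPE_IPV4	0x0800
#define TOY_IP_MIN_HDR_LEN	20	/* IPv4 header without options */
#define TOY_IPPROTO_OSPF	89	/* IP protocol number used by OSPF */

/*
 * Return 1 if buf holds an untagged Ethernet frame carrying an IPv4
 * packet whose protocol field is OSPF, 0 otherwise.
 */
static int
frame_is_ospf(const uint8_t *buf, size_t len)
{
	const uint8_t *ip;
	uint16_t ethertype;
	size_t ihl;

	assert(buf != NULL);
	if (len < TOY_ETHER_HDR_LEN + TOY_IP_MIN_HDR_LEN)
		return (0);

	/* EtherType is big-endian at offset 12. */
	ethertype = (uint16_t)((buf[12] << 8) | buf[13]);
	if (ethertype != TOY_ETHERTYPE_IPV4)
		return (0);

	ip = buf + TOY_ETHER_HDR_LEN;
	if ((ip[0] >> 4) != 4)			/* IP version */
		return (0);
	ihl = (size_t)(ip[0] & 0x0f) * 4;	/* header length in bytes */
	if (ihl < TOY_IP_MIN_HDR_LEN || len < TOY_ETHER_HDR_LEN + ihl)
		return (0);

	/* The protocol field sits at byte 9 of the IPv4 header. */
	return (ip[9] == TOY_IPPROTO_OSPF);
}
```

The check itself is a handful of byte comparisons per frame, so most of the added cost is likely to come from the trip through user space rather than from the classification.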
-Daniel From owner-freebsd-net@FreeBSD.ORG Thu Jul 17 19:54:12 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 922775AF for ; Thu, 17 Jul 2014 19:54:12 +0000 (UTC) Received: from smtp1.bushwire.net (f5.bushwire.net [199.48.133.46]) by mx1.freebsd.org (Postfix) with SMTP id 417572AC2 for ; Thu, 17 Jul 2014 19:54:10 +0000 (UTC) Received: (qmail 37369 invoked by uid 1001); 17 Jul 2014 19:47:28 -0000 Delivered-To: qmda-intercept-freebsd-net@freebsd.org DomainKey-Signature: a=rsa-sha1; q=dns; c=simple; s=s384; d=romeo.emu.st; b=PAyCzCGW1cOCD7uQ54tbh3ub8h+RyrWFS84RHERIo/Um4ajH9M+HqHEz8PI0ovoX; Comments: DomainKeys? See http://en.wikipedia.org/wiki/DomainKeys DomainKey-Trace-MD: h=14; b=29; l=C18R71D32M65F38T27S42R39?29?28M17C39C27I40; Comments: QMDA 0.3 Received: (qmail 37362 invoked by uid 1001); 17 Jul 2014 19:47:28 -0000 Date: 17 Jul 2014 19:47:28 +0000 Message-ID: <20140717194728.37361.qmail@f5-external.bushwire.net> From: "Mark Delany" To: freebsd-net@freebsd.org Subject: Re: netmap, selective processing. References: <53C71196.4030501@rlwinm.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Jul 2014 19:54:12 -0000 On 17Jul14, Daniel Corbe allegedly wrote: > From the perspective of totally wrecking the performance of the host > network stack: how much more overhead am I really introducing by looking > at every packet inside of the netmap framework and going "am I really > interested in this? Or should I simply pass it through to the host." 
If you haven't looked at netmap in detail yet, then the main thing to
remember is that once netmap is active on an interface, *all* packets on
that interface enter (and potentially leave) your netmap handler via an
excursion into user space.

If the majority of packets are untouched and merely pushed back thru the
stack, then for each batch of packets you've introduced an additional
user-space context switch, at least one system call and the cost of your
own packet selection code.

I'm not sure that constitutes "totally wrecking" but it is something to
keep in mind if you plan to run on a busy system.

Another thing to keep in mind is, if your netmap app has bugs you could
break all the regular applications sitting on top of sockets.

You are probably right that OSPF in netmap may not be directly useful to
anyone else, but I think more people using netmap to implement
interesting applications is of value to netmap, frankly.

Mark.

From owner-freebsd-net@FreeBSD.ORG Fri Jul 18 07:49:13 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 582B2812; Fri, 18 Jul 2014 07:49:13 +0000 (UTC) Received: from mail-wi0-x22e.google.com (mail-wi0-x22e.google.com [IPv6:2a00:1450:400c:c05::22e]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B34D3251D; Fri, 18 Jul 2014 07:49:12 +0000 (UTC) Received: by mail-wi0-f174.google.com with SMTP id d1so372478wiv.13 for ; Fri, 18 Jul 2014 00:49:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=NCTzDB/O8dFVlltq3iVJ6EZa92sSB1CrpvKhNdHMIzY=;
b=PTfTuiTc+kJmL+PXFeahnErRVAJZsgtJC46xRoN7tFyGjSJSEfNRE2xx4nqaOCK8ge LmykhJS/GVVvmeJ9UQOJBxDvYxAnZX14PEAu4kxQuKLCyRGfZANoOIRWKBcHe6By81Jq T87mmO0cxCuaUNaeI3lWsJJ6nL5t6QB6EAccYGWQqFYFwbeigLjA/8+WTrsYZpVDlnGO 7BxeFIJkTDxF26TBvBsI0wauc45UBe6dilbetDbP9YyXRh7pW/tAaQ8WX6q6+GTt2PoT HRf+B5MhtAM8/ydjArC+g4TCuV3f+t+ucYOkM+7qwNDo9Kia8GGywY7IR5ycLMJpjzd9 Vx7g== MIME-Version: 1.0 X-Received: by 10.180.91.6 with SMTP id ca6mr4463775wib.77.1405669750597; Fri, 18 Jul 2014 00:49:10 -0700 (PDT) Received: by 10.216.190.194 with HTTP; Fri, 18 Jul 2014 00:49:10 -0700 (PDT) Reply-To: araujo@FreeBSD.org In-Reply-To: References: Date: Fri, 18 Jul 2014 15:49:10 +0800 Message-ID: Subject: Re: [patch][lagg] - Set a better granularity and distribution on roundrobin protocol. From: Marcelo Araujo To: Adrian Chadd Content-Type: multipart/mixed; boundary=f46d043c7e0c28456204fe72fed0 X-Content-Filtered-By: Mailman/MimeDel 2.1.18 Cc: FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jul 2014 07:49:13 -0000 --f46d043c7e0c28456204fe72fed0 Content-Type: text/plain; charset=UTF-8 Hello guys, I made few changes on the lagg(4) patch. Also, I made tests using igb(4), ixgbe(4) and em(4); seems everything worked pretty well. I'm wondering if anyone else could make a review, and what I need to do, to see this patch committed. Best Regards, 2014-06-24 10:40 GMT+08:00 Marcelo Araujo : > > > 2014-06-24 6:54 GMT+08:00 Adrian Chadd : > > Hi, >> >> No, don't introduce out of order behaviour. Ever. > > > Yes, it has out of order behavior; with my patch much less. I upload two > pcap files and you can see by yourself, if you don't believe in what I'm > talking about. > > Test done using: "iperf -s" and "iperf -c -i 1 -t 10". > > 1) Don't change the number of packets(default round robin behavior). 
> http://people.freebsd.org/~araujo/lagg/lagg-nop.cap > 8 out of order packets. > Several SACKs. > > 2) Set the number of packets to 50. > http://people.freebsd.org/~araujo/lagg/lagg.cap > 0 out of order packets. > Less SACKs. > > >> You may not think >> it's a problem for TCP, but UDP things and VPN things will start >> getting very angry. There are VPN configurations out there that will >> drop the VPN if frames are out of order. >> > > I'm not thinking that will be a problem for TCP, but, in somehow it will > be, less throughput as I showed before, and less SACK. About the VPN, > please, tell me which softwares, and let me know where I can get a sample > to make a testbed. > > However to be very honest, I don't believe anyone here when change > something at network protocols will make this extensive testbed. It is > almost impossible to predict what software it will works or not, and I > don't believe anyone here has all these stuff in hands. > > >> >> The ixgbe driver is setting the flowid to the msix queue ID, rather >> than a 32 bit unique flow id hash value for the flow. That makes it >> hard to do traffic distribution where the flowid is available. >> > > Thanks for the explanation. > > >> >> There's an lagg option to re-hash the mbuf rather than rely on the >> flowid for outbound port choice - have you looked at using that? Did >> that make any difference? >> > > Yes, I set to 0 the net.link.lagg.0.use _flowid, it make a little > difference to the default round robin implementation, but yet I can't reach > more than 5 Gbit/s. With my patch and set the packets to 50, it improved a > bit too. > > So, thank you so much for all review, I don't know if you have time and a > testbed to make a real test, as I'm doing. I would be happy if you or more > people could make tests on that patch. Also, I have only ixgbe(4) to make > tests, would appreciate if this patch could be tested with other NICs too. 
> > Best Regards, > > -- > Marcelo Araujo (__) > araujo@FreeBSD.org \\\'',)http://www.FreeBSD.org \/ \ ^ > Power To Server. .\. /_) > > -- -- Marcelo Araujo (__)araujo@FreeBSD.org \\\'',)http://www.FreeBSD.org \/ \ ^ Power To Server. .\. /_) --f46d043c7e0c28456204fe72fed0 Content-Type: application/octet-stream; name="if_lagg-rr.patch" Content-Disposition: attachment; filename="if_lagg-rr.patch" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hxr7fodu0 SW5kZXg6IGlmX2xhZ2cuYwo9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBpZl9sYWdnLmMJKHJldmlzaW9uIDI2ODgz MikKKysrIGlmX2xhZ2cuYwkod29ya2luZyBjb3B5KQpAQCAtMTg3LDYgKzE4NywxMCBAQAogU1lT Q1RMX0lOVChfbmV0X2xpbmtfbGFnZywgT0lEX0FVVE8sIGRlZmF1bHRfZmxvd2lkX3NoaWZ0LCBD VExGTEFHX1JXVFVOLAogICAgICZkZWZfZmxvd2lkX3NoaWZ0LCAwLAogICAgICJEZWZhdWx0IHNl dHRpbmcgZm9yIGZsb3dpZCBzaGlmdCBmb3IgbG9hZCBzaGFyaW5nIik7CitzdGF0aWMgaW50IGxh Z2dfcnJfcGFja2V0cyA9IDA7IC8qIERlZmF1bHQgdmFsdWUgZm9yIHVzaW5nIHJyX3BhY2tldHMg Ki8KK1NZU0NUTF9JTlQoX25ldF9saW5rX2xhZ2csIE9JRF9BVVRPLCBycl9wYWNrZXRzLCBDVExG TEFHX1JXLAorICAgICZsYWdnX3JyX3BhY2tldHMsIDAsCisgICAgIkhvdyBtYW55IHBhY2tldHMg dG8gYmUgc2VuZCBwZXIgaW50ZXJmYWNlIik7CiAKIHN0YXRpYyBpbnQKIGxhZ2dfbW9kZXZlbnQo bW9kdWxlX3QgbW9kLCBpbnQgdHlwZSwgdm9pZCAqZGF0YSkKQEAgLTE2ODcsMTQgKzE2OTEsNzMg QEAKIHsKIAlzdHJ1Y3QgbGFnZ19wb3J0ICpscDsKIAl1aW50MzJfdCBwOworCXVpbnQzMl90IHAy OworCXVpbnQzMl90IHBrdF9zeXNjdGxfY291bnQ7CisJaW50IGlmcF9jb3VudCA9IDE7CiAKIAlw ID0gYXRvbWljX2ZldGNoYWRkXzMyKCZzYy0+c2Nfc2VxLCAxKTsKIAlwICU9IHNjLT5zY19jb3Vu dDsKKworCXAyID0gYXRvbWljX2ZldGNoYWRkXzMyKCZzYy0+c2Nfc2VxLCAxKTsKKwlwMiAlPSBz Yy0+c2NfY291bnQ7CisKIAlscCA9IFNMSVNUX0ZJUlNUKCZzYy0+c2NfcG9ydHMpOwotCXdoaWxl IChwLS0pCi0JCWxwID0gU0xJU1RfTkVYVChscCwgbHBfZW50cmllcyk7CiAKIAkvKgorCSAqIElm IHRoZXJlIGlzIG5vIHJlZmVyZW5jZSBmb3IgdGhlIElGUCwgd2UgbXVzdAorIAkgKiBjb3B5IGl0 IG5vdy4KKwkgKi8KKwlpZiAoc3RybGVuKHNjLT5zY19yZWZfaWZwKSA9PSAwKQorCQlzdHJuY3B5 
KHNjLT5zY19yZWZfaWZwLCBscC0+bHBfaWZwLT5pZl94bmFtZSwgc2l6ZW9mKHNjLT5zY19yZWZf aWZwKSk7CisgICAgICAgICAgICAgIAorCS8qCisJICogSWYgaWZwX2NvdW50IHdhcyBub3QgeWV0 IGluaXRpYWxpemVkLCB3ZSBtdXN0CisJICogaW5pdGlhbGl6ZSBub3cuCisJICovCisJaWYgKHNj LT5zY19pZnBfY291bnQgPT0gMCkKKwkJc2MtPnNjX2lmcF9jb3VudCA9IDE7CisKKwkvKgorCSAq IElmIHRoZSBzeXNjdGwgcnJfcGFja2V0cyBpcyBzZXQgdG8gMCwgd2UgbXVzdCB1c2UgdGhlCisJ ICogcm91bmRyb2JpbiBhcyBpdCBpcywgb3Igb3RoZXJ3aXNlLCB3ZSBtdXN0IGFwcGx5IHRoZQor CSAqIGdyYW51bGFyaXR5IGJldHdlZW4gdGhlIGludGVyZmFjZXMgdGhhdCBhcmUgcGFydCBvZiB0 aGUgZ3JvdXAuCisJICovCisJaWYgKCFsYWdnX3JyX3BhY2tldHMpIHsKKwkJd2hpbGUgKHAtLSkK KwkJCWxwID0gU0xJU1RfTkVYVChscCwgbHBfZW50cmllcyk7CisJCWdvdG8gc2VuZF9tYnVmOwor CX0gZWxzZSB7CisJCXBrdF9zeXNjdGxfY291bnQgPSBhdG9taWNfZmV0Y2hhZGRfMzIoJnNjLT5z Y19wa3RfY291bnQsIDEpOworCQlpZiAocGt0X3N5c2N0bF9jb3VudCA9PSBsYWdnX3JyX3BhY2tl dHMpIHsKKwkJCWlmIChzYy0+c2NfaWZwX2NvdW50IDw9IHNjLT5zY19jb3VudCkgeworCQkJCXdo aWxlIChpZnBfY291bnQgPCBzYy0+c2NfaWZwX2NvdW50KSB7CisJCQkJCWxwID0gU0xJU1RfTkVY VChscCwgbHBfZW50cmllcyk7CisJCQkJCWlmcF9jb3VudCsrOworCQkJCX0KKwkJCQlzYy0+c2Nf aWZwX2NvdW50Kys7CisJCQkJaWYgKHNjLT5zY19pZnBfY291bnQgPiBzYy0+c2NfY291bnQpCisJ CQkJCXNjLT5zY19pZnBfY291bnQgPSAwOworCQkJfQorCQkJc3RybmNweShzYy0+c2NfcmVmX2lm cCwgbHAtPmxwX2lmcC0+aWZfeG5hbWUsIHNpemVvZihzYy0+c2NfcmVmX2lmcCkpOworCQkJc2Mt PnNjX3BrdF9jb3VudCA9IDA7CisJCX0KKwl9CisKKwkvKgorCSAqIENoZWNrIGlmIHRoZSBjdXJy ZW50IGludGVyZmFjZSB0byBiZSBlbnF1ZXVlIGlzIG5vdCB0aGUKKwkgKiBzYW1lIHVzZWQgaW4g dGhlIGxhc3Qgcm91bmQuCisJICovCisJbHAgPSBTTElTVF9GSVJTVCgmc2MtPnNjX3BvcnRzKTsK Kwl3aGlsZSAocDItLSkgeworCQlpZiAoc3RyY21wKGxwLT5scF9pZnAtPmlmX3huYW1lLCBzYy0+ c2NfcmVmX2lmcCkgPT0gMCkKKwkJCWJyZWFrOworCQllbHNlCisJCQlscCA9IFNMSVNUX05FWFQo bHAsIGxwX2VudHJpZXMpOworCX0KKwlnb3RvIHNlbmRfbWJ1ZjsKKworc2VuZF9tYnVmOgorCS8q CiAJICogQ2hlY2sgdGhlIHBvcnQncyBsaW5rIHN0YXRlLiBUaGlzIHdpbGwgcmV0dXJuIHRoZSBu ZXh0IGFjdGl2ZQogCSAqIHBvcnQgaWYgdGhlIGxpbmsgaXMgZG93biBvciB0aGUgcG9ydCBpcyBO 
VUxMLgogCSAqLwpJbmRleDogaWZfbGFnZy5oCj09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIGlmX2xhZ2cuaAkocmV2 aXNpb24gMjY4ODMyKQorKysgaWZfbGFnZy5oCSh3b3JraW5nIGNvcHkpCkBAIC0yMzIsNiArMjMy LDkgQEAKIAlzdHJ1Y3Qgc3lzY3RsX29pZAkJKnNjX29pZDsJLyogc3lzY3RsIHRyZWUgb2lkICov CiAJaW50CQkJCXVzZV9mbG93aWQ7CS8qIHVzZSBNX0ZMT1dJRCAqLwogCWludAkJCQlmbG93aWRf c2hpZnQ7CS8qIHNoaWZ0IHRoZSBmbG93aWQgKi8KKwl1aW50MzJfdAkJCXNjX3BrdF9jb3VudDsg LyogdXNlIGZvciBjb3VudCBwYWNrYXRlcyBwZXIgaWZwICovCisJaW50CQkJCXNjX2lmcF9jb3Vu dDsgLyogY291bnRlciByZWZlcmVuY2Ugb2YgaW50ZXJmYWNlcyBvbiByciAqLworCWNoYXIJCQkJ c2NfcmVmX2lmcFtJRk5BTVNJWl07IC8qIG5hbWUgb2YgdGhlIGlmcCAqLwogfTsKIAogc3RydWN0 IGxhZ2dfcG9ydCB7Cg== --f46d043c7e0c28456204fe72fed0-- From owner-freebsd-net@FreeBSD.ORG Fri Jul 18 08:03:58 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AC4BDDBE; Fri, 18 Jul 2014 08:03:58 +0000 (UTC) Received: from mail-qc0-x22a.google.com (mail-qc0-x22a.google.com [IPv6:2607:f8b0:400d:c01::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5A7E826BA; Fri, 18 Jul 2014 08:03:58 +0000 (UTC) Received: by mail-qc0-f170.google.com with SMTP id c9so3137223qcz.29 for ; Fri, 18 Jul 2014 01:03:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=9ZLM3khD5OJzkQ2LG3bNwSSfKcCflqp89lF43c4fpbA=; b=NnzsvMkTB1SYl2bk37gIVrnGsQo+vRbGJvE/Cg2mdiFtJLT0/e4vPQz5wVVj3Iv0xB KSOVn3GQ9Fyi146+ut5L3wmWtOexUcQMDkHW9ebkD0PV6jtt8dWBvR0AdhIjq0uZ1B4Q r6aOBcHtqIKxPjygl8DHT3cu3FPAq2srjRrFaQES/1lb8u5p5YhkZOgkacrkLwNXda3P 
qxeiY7TTHEJIcJZjPF+ovRNo8Yjnmr0LSHqaB5j/TuvdTzDU2GV+mEtkTEQVPgkmL9Vu 7g1gzSroxJKYRJOiCP/mey1/9VRIcZct9NZ+CaY5a577zEZGR5RnlWQYgOoR1Ejiveya cNuQ== MIME-Version: 1.0 X-Received: by 10.140.38.169 with SMTP id t38mr4954791qgt.3.1405670635347; Fri, 18 Jul 2014 01:03:55 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.224.202.193 with HTTP; Fri, 18 Jul 2014 01:03:55 -0700 (PDT) In-Reply-To: References: Date: Fri, 18 Jul 2014 01:03:55 -0700 X-Google-Sender-Auth: vvQfWXgrH5GdC8-IuckQYdtUeZU Message-ID: Subject: Re: [patch][lagg] - Set a better granularity and distribution on roundrobin protocol. From: Adrian Chadd To: araujo@freebsd.org Content-Type: text/plain; charset=UTF-8 Cc: FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jul 2014 08:03:58 -0000 Hi, I strongly object to having a round-robin method like this. Yes, we won't get > 1 link of bandwidth out of a single stream, but you're showing that you can't even get that. There's still something else weird going on. I'm sorry, but introducing more out of order possibilities is being a bad network citizen. -a On 18 July 2014 00:49, Marcelo Araujo wrote: > Hello guys, > > I made few changes on the lagg(4) patch. Also, I made tests using igb(4), > ixgbe(4) and em(4); seems everything worked pretty well. > > I'm wondering if anyone else could make a review, and what I need to do, to > see this patch committed. > > Best Regards, > > > > > 2014-06-24 10:40 GMT+08:00 Marcelo Araujo : > >> >> >> 2014-06-24 6:54 GMT+08:00 Adrian Chadd : >> >>> Hi, >>> >>> No, don't introduce out of order behaviour. Ever. >> >> >> Yes, it has out of order behavior; with my patch much less. I upload two >> pcap files and you can see by yourself, if you don't believe in what I'm >> talking about. 
>> >> Test done using: "iperf -s" and "iperf -c -i 1 -t 10". >> >> 1) Don't change the number of packets(default round robin behavior). >> http://people.freebsd.org/~araujo/lagg/lagg-nop.cap >> 8 out of order packets. >> Several SACKs. >> >> 2) Set the number of packets to 50. >> http://people.freebsd.org/~araujo/lagg/lagg.cap >> 0 out of order packets. >> Less SACKs. >> >>> >>> You may not think >>> it's a problem for TCP, but UDP things and VPN things will start >>> getting very angry. There are VPN configurations out there that will >>> drop the VPN if frames are out of order. >> >> >> I'm not thinking that will be a problem for TCP, but, in somehow it will >> be, less throughput as I showed before, and less SACK. About the VPN, >> please, tell me which softwares, and let me know where I can get a sample to >> make a testbed. >> >> However to be very honest, I don't believe anyone here when change >> something at network protocols will make this extensive testbed. It is >> almost impossible to predict what software it will works or not, and I don't >> believe anyone here has all these stuff in hands. >> >>> >>> >>> The ixgbe driver is setting the flowid to the msix queue ID, rather >>> than a 32 bit unique flow id hash value for the flow. That makes it >>> hard to do traffic distribution where the flowid is available. >> >> >> Thanks for the explanation. >> >>> >>> >>> There's an lagg option to re-hash the mbuf rather than rely on the >>> flowid for outbound port choice - have you looked at using that? Did >>> that make any difference? >> >> >> Yes, I set to 0 the net.link.lagg.0.use _flowid, it make a little >> difference to the default round robin implementation, but yet I can't reach >> more than 5 Gbit/s. With my patch and set the packets to 50, it improved a >> bit too. >> >> So, thank you so much for all review, I don't know if you have time and a >> testbed to make a real test, as I'm doing. 
I would be happy if you or more >> people could make tests on that patch. Also, I have only ixgbe(4) to make >> tests, would appreciate if this patch could be tested with other NICs too. >> >> Best Regards, >> >> -- >> Marcelo Araujo (__) >> araujo@FreeBSD.org \\\'',) >> http://www.FreeBSD.org \/ \ ^ >> Power To Server. .\. /_) > > > > > -- > > -- > Marcelo Araujo (__) > araujo@FreeBSD.org \\\'',) > http://www.FreeBSD.org \/ \ ^ > Power To Server. .\. /_) From owner-freebsd-net@FreeBSD.ORG Fri Jul 18 18:18:38 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C451266B; Fri, 18 Jul 2014 18:18:38 +0000 (UTC) Received: from mail-pd0-x229.google.com (mail-pd0-x229.google.com [IPv6:2607:f8b0:400e:c02::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 89CE82FB4; Fri, 18 Jul 2014 18:18:38 +0000 (UTC) Received: by mail-pd0-f169.google.com with SMTP id y10so5473823pdj.28 for ; Fri, 18 Jul 2014 11:18:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=I/wfJBWA56SOxl24a2TyVa4MKrrU78ZtbWfQoVuJHis=; b=VPW19GBZS/FjnqX3iMGzXhsAQXNKXiUBI6aZ52hFg18l8h29hHw8GPAXLaFcjggshW W2tqtiaWDA5U60pJAyCUi78M4aFp16gtWzc3QsCB7p+3+Dpd6ajlYPDSYvhRve8/5Po5 GImt9R/KsJcO7mSZ1KXl1yAvxaVb5jfIv+upiLtQnwbl/40dwkXwSXB4hm/WG/pPomTO AgduyZaPCb9GjiKQHLLJOluNZryZwegvWixraiMlKls8QneKdn0rCHDhp3IP+erbcI7t KzPDmVfDMBCeggORr/5VAQq9oq0oMowNX4e3p/2nK8DaKO1jCg+QkXmzyviqZ0DK4SAR OQQg== X-Received: by 10.69.12.33 with SMTP id en1mr7571950pbd.66.1405707518042; Fri, 18 Jul 2014 11:18:38 -0700 (PDT) 
Received: from [10.192.166.0] (stargate.chelsio.com. [67.207.112.58]) by mx.google.com with ESMTPSA id xh10sm25630043pac.24.2014.07.18.11.18.36 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 18 Jul 2014 11:18:37 -0700 (PDT) Message-ID: <53C964F7.8060503@gmail.com> Date: Fri, 18 Jul 2014 11:18:31 -0700 From: Navdeep Parhar User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: araujo@FreeBSD.org, Adrian Chadd Subject: Re: [patch][lagg] - Set a better granularity and distribution on roundrobin protocol. References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: FreeBSD Net X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jul 2014 18:18:38 -0000 On 07/18/14 00:49, Marcelo Araujo wrote: > Hello guys, > > I made few changes on the lagg(4) patch. Also, I made tests using igb(4), > ixgbe(4) and em(4); seems everything worked pretty well. > > I'm wondering if anyone else could make a review, and what I need to do, to > see this patch committed. Deliberately putting out-of-order packets on the wire is never a good idea. This would count as a serious regression in lagg(4) imho. Regards, Navdeep > > Best Regards, > > > > > 2014-06-24 10:40 GMT+08:00 Marcelo Araujo : > >> >> >> 2014-06-24 6:54 GMT+08:00 Adrian Chadd : >> >> Hi, >>> >>> No, don't introduce out of order behaviour. Ever. >> >> >> Yes, it has out of order behavior; with my patch much less. I upload two >> pcap files and you can see by yourself, if you don't believe in what I'm >> talking about. >> >> Test done using: "iperf -s" and "iperf -c -i 1 -t 10". >> >> 1) Don't change the number of packets(default round robin behavior). >> http://people.freebsd.org/~araujo/lagg/lagg-nop.cap >> 8 out of order packets. 
>> Several SACKs. >> >> 2) Set the number of packets to 50. >> http://people.freebsd.org/~araujo/lagg/lagg.cap >> 0 out of order packets. >> Less SACKs. >> >> >>> You may not think >>> it's a problem for TCP, but UDP things and VPN things will start >>> getting very angry. There are VPN configurations out there that will >>> drop the VPN if frames are out of order. >>> >> >> I'm not thinking that will be a problem for TCP, but, in somehow it will >> be, less throughput as I showed before, and less SACK. About the VPN, >> please, tell me which softwares, and let me know where I can get a sample >> to make a testbed. >> >> However to be very honest, I don't believe anyone here when change >> something at network protocols will make this extensive testbed. It is >> almost impossible to predict what software it will works or not, and I >> don't believe anyone here has all these stuff in hands. >> >> >>> >>> The ixgbe driver is setting the flowid to the msix queue ID, rather >>> than a 32 bit unique flow id hash value for the flow. That makes it >>> hard to do traffic distribution where the flowid is available. >>> >> >> Thanks for the explanation. >> >> >>> >>> There's an lagg option to re-hash the mbuf rather than rely on the >>> flowid for outbound port choice - have you looked at using that? Did >>> that make any difference? >>> >> >> Yes, I set to 0 the net.link.lagg.0.use _flowid, it make a little >> difference to the default round robin implementation, but yet I can't reach >> more than 5 Gbit/s. With my patch and set the packets to 50, it improved a >> bit too. >> >> So, thank you so much for all review, I don't know if you have time and a >> testbed to make a real test, as I'm doing. I would be happy if you or more >> people could make tests on that patch. Also, I have only ixgbe(4) to make >> tests, would appreciate if this patch could be tested with other NICs too. 
>> >> Best Regards, >> >> -- >> Marcelo Araujo (__) >> araujo@FreeBSD.org \\\'',)http://www.FreeBSD.org \/ \ ^ >> Power To Server. .\. /_) >> >> > > > > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > From owner-freebsd-net@FreeBSD.ORG Fri Jul 18 18:28:42 2014 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A17A6944; Fri, 18 Jul 2014 18:28:42 +0000 (UTC) Received: from mail-qc0-x234.google.com (mail-qc0-x234.google.com [IPv6:2607:f8b0:400d:c01::234]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 52B4520EC; Fri, 18 Jul 2014 18:28:42 +0000 (UTC) Received: by mail-qc0-f180.google.com with SMTP id l6so3634396qcy.39 for ; Fri, 18 Jul 2014 11:28:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=fTGVmHvqa3TrcVyq4bV8IPKl+EgHkjxKJhyKv8MCULo=; b=Z3EC9Sfr27sahja6EjoZPtTZrVqzgPv4XZ4K5HByGkZmlEVTAqkyEAv/k+nm4nwsHl nvbVEGerjLzCwR4qiV8GroO2aJoOgk+k0M1S9AveZ6b60cIKReiBS2iKgO2nKmGdgDBt 8+F0F42gz+/sVFeHiIoRdEr8CuQ8UXLvWpCnHvoggDUenu7nhhxk9JdtkBeJnxK/NIKN AutFQvjObs7VM4UkP7dQd4eU8P8v6hphAkjgb0RHBRa8pmlH9CWz0NfyD85r3J8LEfVp P1hLeoAQa/qBLM+lvztdAPkYL5GXhWiYLOZYqCh8jAkCa3f8poLCbg20+eiAZLst6eyY TrgA== MIME-Version: 1.0 X-Received: by 10.140.90.7 with SMTP id w7mr10544705qgd.52.1405708121024; Fri, 18 Jul 2014 11:28:41 -0700 (PDT) Received: by 10.96.73.39 with HTTP; Fri, 18 Jul 2014 11:28:40 -0700 (PDT) In-Reply-To: References: Date: Fri, 18 Jul 2014 11:28:40 
-0700 Message-ID: Subject: Re: UDP sendto() returning ENOBUFS - "No buffer space available" From: hiren panchasara To: Adrian Chadd Content-Type: text/plain; charset=UTF-8 Cc: "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jul 2014 18:28:42 -0000 On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd wrote: > Hi! > > So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() -> > udp_output() -> ip_output() > > udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output > can also return ENOBUFS. > > it doesn't look like the socket code (eg sosend_dgram()) is doing any > buffering - it's just copying the frame and stuffing it up to the > driver. No queuing involved before the NIC. Right. Thanks for confirming. > > So a _well behaved_ driver will return ENOBUFS _and_ not queue the > frame. However, it's entirely plausible that the driver isn't well > behaved - the intel drivers screwed up here and there with transmit > queue and failure to queue vs failure to transmit. > > So yeah, try tweaking the tx ring descriptor for the driver your'e > using and see how big a bufring it's allocating. Yes, so I am dealing with Broadcom BCM5706/BCM5708 Gigabit Ethernet, i.e. bce(4). I bumped up tx_pages from 2 (default) to 8 where each page is 255 buffer descriptors. 
I am seeing quite nice improvement on stable/10 where I can send *more* stuff :-) cheers, Hiren From owner-freebsd-net@FreeBSD.ORG Fri Jul 18 21:02:59 2014 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9D378427 for ; Fri, 18 Jul 2014 21:02:59 +0000 (UTC) Received: from mail-qc0-x234.google.com (mail-qc0-x234.google.com [IPv6:2607:f8b0:400d:c01::234]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5FFAA2F06 for ; Fri, 18 Jul 2014 21:02:59 +0000 (UTC) Received: by mail-qc0-f180.google.com with SMTP id l6so3782550qcy.39 for ; Fri, 18 Jul 2014 14:02:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:cc:content-type; bh=MCOdsC3I7Tiw25AoXctlCaUo1MQlIWqAHd/ZPpKBA6Y=; b=D17lmimicmB/zptIOv+X+eMq3C143Y/PgLtx7f8Z1V+/ZIFoLspQ2HELFXCiOMecoG Oc1xniIIVPeFQQNGjhKyHHnWoZ30vPcyg8SHmiT8IHIUqyzJzLO2zESbzozOaSC37/bi 4O1xgDvj5HBq528ZSLeF+uj7LD/cP120IPu+qqhOdS+9GEAm2RRopwtIWOGksC2N3Vh2 /pGfm4aboRYx1A5cHmSw8tApeX2yAIHGV3obULd9sm98GrWJ11M1Ypn4bniMpcZ/Mwa3 H9sEqNBky1P+d+HInK5CAjF2SJKC5M5oNN7WiDwYpVN77elHTrtXHirO7kchTra0Nagq r3jA== MIME-Version: 1.0 X-Received: by 10.224.15.72 with SMTP id j8mr13333873qaa.8.1405717378548; Fri, 18 Jul 2014 14:02:58 -0700 (PDT) Received: by 10.96.25.164 with HTTP; Fri, 18 Jul 2014 14:02:58 -0700 (PDT) Date: Fri, 18 Jul 2014 14:02:58 -0700 Message-ID: Subject: Error building netmap for centOS 6.5 From: Morgan Yang To: net@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.18 Cc: rizzo@iet.unipi.it X-BeenThere: freebsd-net@freebsd.org 
X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jul 2014 21:02:59 -0000

Hi:

I downloaded the latest netmap repo from git and attempted to compile it.

git clone https://code.google.com/p/netmap/
cd netmap/Linux
make

I get the following errors:

[devusr@testbox LINUX]$ make
LIN_VER 20620
---- Building from /lib/modules/2.6.32-431.20.3.el6.x86_64/build/drivers/net
---- copying e1000 e1000e r8169.c
--- From /lib/modules/2.6.32-431.20.3.el6.x86_64/build/drivers/net :
drwxr-xr-x. 2 root root 4096 Jul 16 16:56 e1000/
drwxr-xr-x. 2 root root 4096 Jul 16 16:56 e1000e/
** patch with diff--e1000--20620--99999
The text leading up to this was:
--------------------------
|diff --git a/e1000/e1000_main.c b/e1000/e1000_main.c
|index bcd192c..5de7009 100644
|--- a/e1000/e1000_main.c
|+++ b/e1000/e1000_main.c
--------------------------
No file to patch. Skipping patch.
9 out of 9 hunks ignored
** patch with diff--e1000e--20620--20623
The text leading up to this was:
--------------------------
|diff --git a/e1000e/netdev.c b/e1000e/netdev.c
|index fad8f9e..50f74e2 100644
|--- a/e1000e/netdev.c
|+++ b/e1000e/netdev.c
--------------------------
No file to patch. Skipping patch.
8 out of 8 hunks ignored
** patch with diff--r8169.c--20620--20625
The text leading up to this was:
--------------------------
|diff --git a/r8169.c b/r8169.c
|index 0fe2fc9..efee0a4 100644
|--- a/r8169.c
|+++ b/r8169.c
--------------------------
No file to patch. Skipping patch.
9 out of 9 hunks ignored
Building the following drivers: e1000 e1000e r8169.c
make -C /lib/modules/2.6.32-431.20.3.el6.x86_64/build M=/home/devusr/Documents/netmap/LINUX CONFIG_NETMAP=m CONFIG_E1000=m CONFIG_E1000E=m CONFIG_IXGBE=m CONFIG_IGB=m CONFIG_BNX2X=m CONFIG_MLX4=m CONFIG_VIRTIO_NET=m \
EXTRA_CFLAGS='-I/home/devusr/Documents/netmap/LINUX -I/home/devusr/Documents/netmap/LINUX/../sys -I/home/devusr/Documents/netmap/LINUX/../sys/dev -DCONFIG_NETMAP -Wno-unused-but-set-variable' \
O_DRIVERS="e1000/ e1000e/" modules
make[1]: Entering directory `/usr/src/kernels/2.6.32-431.20.3.el6.x86_64'
CC [M] /home/devusr/Documents/netmap/LINUX/netmap.o
/home/devusr/Documents/netmap/LINUX/../sys/dev/netmap/netmap.c: In function ‘netmap_attach’:
/home/devusr/Documents/netmap/LINUX/../sys/dev/netmap/netmap.c:2255: error: ‘struct ethtool_ops’ has no member named ‘set_channels’
make[2]: *** [/home/devusr/Documents/netmap/LINUX/netmap.o] Error 1
make[1]: *** [_module_/home/devusr/Documents/netmap/LINUX] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.32-431.20.3.el6.x86_64'
make: *** [build] Error 2

Has anyone come across this issue before?
Much Thanks Morgan Yang From owner-freebsd-net@FreeBSD.ORG Fri Jul 18 22:20:24 2014 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4AA518A6; Fri, 18 Jul 2014 22:20:24 +0000 (UTC) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id CAECD2579; Fri, 18 Jul 2014 22:20:23 +0000 (UTC) Received: from c122-106-147-133.carlnfd1.nsw.optusnet.com.au (c122-106-147-133.carlnfd1.nsw.optusnet.com.au [122.106.147.133]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 33EC110C8258; Sat, 19 Jul 2014 06:40:16 +1000 (EST) Date: Sat, 19 Jul 2014 06:40:14 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: hiren panchasara Subject: Re: UDP sendto() returning ENOBUFS - "No buffer space available" In-Reply-To: Message-ID: <20140719053318.I15959@besplex.bde.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=dZS5gxne c=1 sm=1 tr=0 a=7NqvjVvQucbO2RlWB8PEog==:117 a=PO7r1zJSAAAA:8 a=qwo5vUNQll0A:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=6I5d2MoRAAAA:8 a=uUY9mpc2aE6cjnkfEgIA:9 a=qYTI4x4OGrQ0lSoH:21 a=OaoyB03c6sN3E27u:21 a=CjuIK1q_8ugA:10 a=SV7veod9ZcQA:10 Cc: Adrian Chadd , "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Jul 2014 22:20:24 -0000 On Fri, 18 Jul 2014, hiren panchasara wrote: > On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd wrote: >> Hi! 
>>
>> So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() ->
>> udp_output() -> ip_output()
>>
>> udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output
>> can also return ENOBUFS.
>>
>> it doesn't look like the socket code (eg sosend_dgram()) is doing any
>> buffering - it's just copying the frame and stuffing it up to the
>> driver. No queuing involved before the NIC.
>
> Right. Thanks for confirming.

Most buffering should be in ifq above the NIC. For UDP, I think
udp_output() puts buffers on the ifq and calls the driver for every
one, but the driver shouldn't do anything for most calls. The driver
can't possibly do anything if its ring buffer is full, and shouldn't do
anything if it is nearly full. Buffers accumulate in the ifq until the
driver gets around to them or the queue fills up. Most ENOBUFS errors
are for when it fills up. It can very easily fill up, especially since
it is too small in most configurations. Just loop calling sendto().
This will fill the ifq almost instantly unless the hardware is faster
than the software.

>> So a _well behaved_ driver will return ENOBUFS _and_ not queue the
>> frame. However, it's entirely plausible that the driver isn't well
>> behaved - the intel drivers screwed up here and there with transmit
>> queue and failure to queue vs failure to transmit.

No, the driver doesn't have much control over the ifq.

>> So yeah, try tweaking the tx ring descriptor for the driver you're
>> using and see how big a bufring it's allocating.
>
> Yes, so I am dealing with Broadcom BCM5706/BCM5708 Gigabit Ethernet,
> i.e. bce(4).
>
> I bumped up tx_pages from 2 (default) to 8 where each page is 255
> buffer descriptors.
>
> I am seeing quite nice improvement on stable/10 where I can send
> *more* stuff :-)

255 is not many. I am most familiar with bge where there is a single tx
ring with 511 or 512 buffer descriptors (some bge's have more, but this
is unportable and was not supported last time I looked.
The extras might be only for input). One of my bge's can do 640 kpps
with tiny packets (only 80 kpps with normal packets) and the other only
200 (?) kpps (both should be limited mainly by the PCI bus, but the
slow one is limited by it being a dumbed down 5705"plus"). At 640 kpps,
it takes 800 microseconds to transmit 512 packets. (There is 1 packet
per buffer descriptor for small packets.)

Considerable buffering in ifq is needed to prevent the transmitter
running dry whenever the application stops generating packets for more
than 800 microseconds for some reason, but the default buffering is
stupidly small. The default is given by net.link.ifqmaxlen and some
corresponding macros, and is still just 50. 50 was enough for 1 Mbps
ethernet and perhaps even for 10 Mbps, but is now too small. Most
drivers don't use it, but use their own too-small value. bge uses just
its own ring buffer size of 511. I use 10000 or 40000 depending on hz:

% diff -u2 if_bge.c~ if_bge.c
% --- if_bge.c~ 2012-03-13 02:13:48.144002000 +0000
% +++ if_bge.c 2012-03-13 02:13:50.123023000 +0000
% @@ -3315,5 +3316,6 @@
%   ifp->if_start = bge_start;
%   ifp->if_init = bge_init;
% - ifp->if_snd.ifq_drv_maxlen = BGE_TX_RING_CNT - 1;
% + ifp->if_snd.ifq_drv_maxlen = BGE_TX_RING_CNT +
% +     imax(4 * tick, 10000) / 1;
%   IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
%   IFQ_SET_READY(&ifp->if_snd);

40000 is what is needed for 4 ticks' worth of buffering at hz = 100.
40000 is far too large where 50 is far too small, but something like it
is needed when hz is large due to another problem: select() on the
ENOBUFS condition is broken (unsupported), so when sendto() returns
ENOBUFS there is no way for the application to tell how long it should
wait before retrying. If it wants to burn CPU then it can spin calling
sendto(). Otherwise, it should sleep, but with a sleep granularity of
1 tick this requires several ticks' worth of buffering to avoid the
transmitter running dry.
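The figures above can be checked with a little arithmetic. What follows
is only a minimal userland sketch in C, not kernel code. The function
names are made up for illustration, and reading the "/ 1" in the patch
as "one packet per microsecond" (roughly 1 Mpps) is an assumption. The
only kernel fact relied on is that tick is the number of microseconds
per clock tick (1000000 / hz).

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds needed to transmit `packets` packets at `pps` packets/s. */
uint64_t
drain_time_us(uint64_t packets, uint64_t pps)
{
	assert(pps > 0);
	return (packets * 1000000ULL / pps);
}

/* Queue length (in packets) that covers `ticks` clock ticks at `pps`. */
uint64_t
queue_len_for_ticks(uint64_t ticks, uint64_t hz, uint64_t pps)
{
	assert(hz > 0);
	return (ticks * pps / hz);
}

/*
 * Extra ifq length added by the patch: imax(4 * tick, 10000) / 1,
 * where tick = 1000000 / hz.  The divisor of 1 is taken here to mean
 * 1 microsecond per packet (an assumption, see above).
 */
uint64_t
patch_extra_len(uint64_t hz)
{
	uint64_t four_ticks_us;

	assert(hz > 0);
	four_ticks_us = 4 * (1000000ULL / hz);
	return (four_ticks_us > 10000 ? four_ticks_us : 10000);
}
```

With these, drain_time_us(512, 640000) gives the 800 microseconds quoted
for a 512-entry ring, and patch_extra_len() gives 40000 at hz = 100 and
10000 at hz = 1000.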
Large queue lengths give a large latency for packets at the end of the
queue and give no chance of the working set fitting in an Ln cache for
small n.

The precise stupidly small value of (tx_ring_count - 1) for the ifq
length seems to be for no good reason. Subtracting 1 is apparently to
increase the chance that all packets in the ifq can be fitted into the
tx ring. But this is silly since the ifq packet count is in different
units from the buffer descriptor count. For normal-size packets, there
are 2 descriptors per packet. So in the usual case where the ifq is
full, only about half of it can be moved to the tx ring. And this is
good since it gives a little more buffering. Otherwise, the effective
buffering is just what is in the tx ring, since none is left in the ifq
after transferring everything.

(tx_ring_count - 1) is used by many other drivers. E.g., fxp. fxp is
only 100 Mbps and its tx_ring_count is 128. 128 is a bit larger than 50
but not enough. Scaling down my 40000 gives 4000 for hz = 100 and 400
for hz = 1000. I never worried about this problem at 100 Mbps.

Changing from 2 rings of length 255 to 8 of length 255 shouldn't make
much difference if other things are configured correctly. It doesn't
matter much if the buffering is in ifq or in ring buffers. Multiple
ring buffers, filled in advance of the active one running dry so that
the next one can be switched to quickly, mainly give you a tiny latency
optimization. I get similar optimizations more in software for bge, by
doing watermark stuff. The boundary between the ifq and the tx ring
also acts as a primitive watermark. With watermarks, it is best to not
divide up the buffer evenly, but that is what the (tx_ring_count - 1)
sizing for the ifq sort of does.
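The mismatch between ifq units (packets) and tx ring units (buffer
descriptors) is easy to see with numbers. This is a rough sketch in C
under simplifying assumptions: the ring starts empty, every packet uses
the same number of descriptors, and no descriptor is held in reserve.
The function names are hypothetical and are not taken from any driver.

```c
#include <assert.h>
#include <stdint.h>

/* Packets that fit in an empty tx ring of `ring_desc` descriptors. */
uint32_t
packets_fitting_ring(uint32_t ring_desc, uint32_t desc_per_pkt)
{
	assert(desc_per_pkt > 0);
	return (ring_desc / desc_per_pkt);
}

/* Packets still queued in a full ifq after the empty ring is filled. */
uint32_t
packets_left_in_ifq(uint32_t ifq_len, uint32_t ring_desc,
    uint32_t desc_per_pkt)
{
	uint32_t fit;

	fit = packets_fitting_ring(ring_desc, desc_per_pkt);
	return (ifq_len > fit ? ifq_len - fit : 0);
}
```

For a 512-descriptor ring with an ifq of 511 packets, tiny packets
(1 descriptor each) all move to the ring and nothing is left in the
ifq. With 2 descriptors per packet only 256 packets fit, so 255 stay
behind as extra buffering.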
Bruce From owner-freebsd-net@FreeBSD.ORG Sat Jul 19 02:06:24 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EFCA7844; Sat, 19 Jul 2014 02:06:24 +0000 (UTC) Received: from mail-wi0-x22b.google.com (mail-wi0-x22b.google.com [IPv6:2a00:1450:400c:c05::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5CBB227EC; Sat, 19 Jul 2014 02:06:24 +0000 (UTC) Received: by mail-wi0-f171.google.com with SMTP id hi2so1680693wib.10 for ; Fri, 18 Jul 2014 19:06:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=XSIJg3/nRFJDGMbWLISWTRGC9h4+O32W0b+40fqgf1g=; b=w1a55eeur52wJUXMGz4nd474nIl1oFcXObgdcwBOsJ6q3krG/DisPap8zCqnti7i23 oXG8jODH7xeD0FA4YXcLPxBS7Z7BSbgyM2FJR8bM7lrCeKtFUVGqXlmc8fgV77gL5/bj ScUJ9/YOYNdQNmhAhuAIs7V8riMVXEbkkWjRyBA2DJFpEcOhQU35BKtHRIMH83NMpZWJ oZElvpUPk3Sb+hcfQH2FCO5c7ZifycsE6PoNIgEWyvL1o28zclP28xZwbb6cDKP9DG8s mzOSPUZnEjuzvzAi4JRAyqjatYIE2DtOGOIPwMJBuRtfX1QOahivOdZtsvjT/LIh7ejG OXEA== MIME-Version: 1.0 X-Received: by 10.194.158.226 with SMTP id wx2mr1490962wjb.107.1405735581697; Fri, 18 Jul 2014 19:06:21 -0700 (PDT) Received: by 10.216.190.194 with HTTP; Fri, 18 Jul 2014 19:06:21 -0700 (PDT) Reply-To: araujo@FreeBSD.org In-Reply-To: <53C964F7.8060503@gmail.com> References: <53C964F7.8060503@gmail.com> Date: Sat, 19 Jul 2014 10:06:21 +0800 Message-ID: Subject: Re: [patch][lagg] - Set a better granularity and distribution on roundrobin protocol. 
From: Marcelo Araujo To: Navdeep Parhar Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18 Cc: FreeBSD Net , Adrian Chadd X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Jul 2014 02:06:25 -0000 2014-07-19 2:18 GMT+08:00 Navdeep Parhar : > On 07/18/14 00:49, Marcelo Araujo wrote: > > Hello guys, > > > > I made few changes on the lagg(4) patch. Also, I made tests using igb(4), > > ixgbe(4) and em(4); seems everything worked pretty well. > > > > I'm wondering if anyone else could make a review, and what I need to do, > to > > see this patch committed. > > Deliberately putting out-of-order packets on the wire is never a good > idea. This would count as a serious regression in lagg(4) imho. > > Regards, > Navdeep > > > I'm wondering if anyone have tested the patch; because as I have explained in another email, the number of SACK is much less with this patch. I have put some pcap files here: http://people.freebsd.org/~araujo/lagg/ Also, as far as I know, the current roundrobin implementation has no such kind of mechanism to control the order of the packages that goes to the wire. And this patch, what it only does is, instead to send only one package through one interface and switch to the another one, it will send X(where X is the number of packets defined via sysctl) packets and then, switch to the next interface. So, could you show me, where this patch deliberately put out-of-order packets? Did I miss anything? Best Regards, -- -- Marcelo Araujo (__)araujo@FreeBSD.org \\\'',)http://www.FreeBSD.org \/ \ ^ Power To Server. .\. 
/_) From owner-freebsd-net@FreeBSD.ORG Sat Jul 19 02:44:00 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C477A1D7; Sat, 19 Jul 2014 02:44:00 +0000 (UTC) Received: from mail-pd0-x22f.google.com (mail-pd0-x22f.google.com [IPv6:2607:f8b0:400e:c02::22f]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8682E2B15; Sat, 19 Jul 2014 02:44:00 +0000 (UTC) Received: by mail-pd0-f175.google.com with SMTP id r10so4514753pdi.20 for ; Fri, 18 Jul 2014 19:44:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=Sv3LLxcITLr/WH3IY8GYApG2HrC7OWCauaIhWaAOcJk=; b=0cMSWx92kOv4V6VJwkDdGhDfjOsR+FESJ+PKnSe/pIrMjzLu57PdfSgtO7v1XU/Nlt o7PxS1sD/a8hUOednq5bpw8SOoe765eNmZHRXIXGYpuKWiXCGjm4BkeRoiNV73l6ASQe Iuhu2ObM13xJAnx+2PuYxUu8EAfocnP3v+FqE3tau+W2s5rlH2X+4ecv2NZfyeCoHjkz gyZiUAdugHMUe0bnId6OTIjSruJClIxumH6tGtUnFy3GbHMSM5LfgRsBxRpDWsDdFfHE 8x82o4YZT8QRGzeGVKBOwq1x1OrbVG0DNm4kBn3xMA6t7B6++CBZ76kwn4rQodBmLIam cvjg== X-Received: by 10.68.136.226 with SMTP id qd2mr9759318pbb.72.1405737840010; Fri, 18 Jul 2014 19:44:00 -0700 (PDT) Received: from [10.192.166.0] (stargate.chelsio.com. 
[67.207.112.58]) by mx.google.com with ESMTPSA id kt2sm6953199pbc.83.2014.07.18.19.43.59 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 18 Jul 2014 19:43:59 -0700 (PDT) Message-ID: <53C9DB6E.8040205@gmail.com> Date: Fri, 18 Jul 2014 19:43:58 -0700 From: Navdeep Parhar User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: araujo@FreeBSD.org Subject: Re: [patch][lagg] - Set a better granularity and distribution on roundrobin protocol. References: <53C964F7.8060503@gmail.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: FreeBSD Net , Adrian Chadd X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Jul 2014 02:44:00 -0000 On 07/18/14 19:06, Marcelo Araujo wrote: > > > > 2014-07-19 2:18 GMT+08:00 Navdeep Parhar >: > > On 07/18/14 00:49, Marcelo Araujo wrote: > > Hello guys, > > > > I made few changes on the lagg(4) patch. Also, I made tests using > igb(4), > > ixgbe(4) and em(4); seems everything worked pretty well. > > > > I'm wondering if anyone else could make a review, and what I need > to do, to > > see this patch committed. > > Deliberately putting out-of-order packets on the wire is never a good > idea. This would count as a serious regression in lagg(4) imho. > > Regards, > Navdeep > > > > I'm wondering if anyone have tested the patch; because as I have > explained in another email, the number of SACK is much less with this > patch. I have put some pcap files > here: http://people.freebsd.org/~araujo/lagg/ > > Also, as far as I know, the current roundrobin implementation has no > such kind of mechanism to control the order of the packages that goes to > the wire. 
And this patch, what it only does is, instead to send only one > package through one interface and switch to the another one, it will > send X(where X is the number of packets defined via sysctl) packets and > then, switch to the next interface. > > So, could you show me, where this patch deliberately put out-of-order > packets? Did I miss anything? Are you saying lagg's roundrobin implementation is already spraying packets for the same flow across interfaces? That would make it unsuitable for anything TCP. But then your patch isn't making it any worse so I don't have any objection to it any more. Looks like loadbalance does the right thing for flows. Regards, Navdeep From owner-freebsd-net@FreeBSD.ORG Sat Jul 19 03:28:11 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 22302864; Sat, 19 Jul 2014 03:28:11 +0000 (UTC) Received: from mail-qa0-x22c.google.com (mail-qa0-x22c.google.com [IPv6:2607:f8b0:400d:c00::22c]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C2B2C2EC6; Sat, 19 Jul 2014 03:28:10 +0000 (UTC) Received: by mail-qa0-f44.google.com with SMTP id f12so3614547qad.3 for ; Fri, 18 Jul 2014 20:28:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=w75OtZ7K9jbkw50JF/3vly0uRQTFw8VRS0hdBlaSZP4=; b=vZ0vMhJPvlmIUVh56qQY84qYj9n3I5yy+OFM4HiBKTkzUf9Tbpot8j+lzhftHbs5KI hpVaFrIWtmdld22dSimlX4WiT14+cWxmGQTDbwMmkCp5q+pWikowUbineZn73ttjd7mO SegsOoqTGZATe1s/QqYxISJKHC0SqK9a0/TKUiKmuZXoGQxs7gztju+KkPFl3G3XbI46 cNbWDvimbf7OgtE8fF6CCfToyMo07w6DpHDqv1mh5I7eM27ufU6ypFP24HyC1jNwxgax 
zHn5S0yj23T2+DQS4vLlGvsGT/RhcjVXhbDWj/PSAM8LCUTv40TixfCBDugsm0GiibuF ihBg== MIME-Version: 1.0 X-Received: by 10.229.174.70 with SMTP id s6mr14798770qcz.29.1405740489801; Fri, 18 Jul 2014 20:28:09 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.224.1.6 with HTTP; Fri, 18 Jul 2014 20:28:09 -0700 (PDT) In-Reply-To: References: <53C964F7.8060503@gmail.com> Date: Fri, 18 Jul 2014 20:28:09 -0700 X-Google-Sender-Auth: e4_Jf2H8yjiTLJmu-inCWeXE9oI Message-ID: Subject: Re: [patch][lagg] - Set a better granularity and distribution on roundrobin protocol. From: Adrian Chadd To: araujo@freebsd.org Content-Type: text/plain; charset=UTF-8 Cc: FreeBSD Net , Navdeep Parhar X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Jul 2014 03:28:11 -0000 On 18 July 2014 19:06, Marcelo Araujo wrote: > > > > 2014-07-19 2:18 GMT+08:00 Navdeep Parhar : > >> On 07/18/14 00:49, Marcelo Araujo wrote: >> > Hello guys, >> > >> > I made few changes on the lagg(4) patch. Also, I made tests using >> > igb(4), >> > ixgbe(4) and em(4); seems everything worked pretty well. >> > >> > I'm wondering if anyone else could make a review, and what I need to do, >> > to >> > see this patch committed. >> >> Deliberately putting out-of-order packets on the wire is never a good >> idea. This would count as a serious regression in lagg(4) imho. >> >> Regards, >> Navdeep >> >> > > I'm wondering if anyone have tested the patch; because as I have explained > in another email, the number of SACK is much less with this patch. I have > put some pcap files here: http://people.freebsd.org/~araujo/lagg/ > > Also, as far as I know, the current roundrobin implementation has no such > kind of mechanism to control the order of the packages that goes to the > wire. 
And this patch, what it only does is, instead to send only one package > through one interface and switch to the another one, it will send X(where X > is the number of packets defined via sysctl) packets and then, switch to the > next interface. > > So, could you show me, where this patch deliberately put out-of-order > packets? Did I miss anything? It doesn't introduce it, but it still continues potentially out of order behaviour depending upon CPU loading and NIC scheduling. If you're seeing reduced ACK / retransmits by doing this then there's gotta be some other underlying factor causing it. That's what I think needs to be fixed, not papering over it by more round robin hacks. :-P -a From owner-freebsd-net@FreeBSD.ORG Sat Jul 19 03:34:49 2014 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 70E9496E for ; Sat, 19 Jul 2014 03:34:49 +0000 (UTC) Received: from mail-qg0-x22f.google.com (mail-qg0-x22f.google.com [IPv6:2607:f8b0:400d:c04::22f]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 30AF92F68 for ; Sat, 19 Jul 2014 03:34:49 +0000 (UTC) Received: by mail-qg0-f47.google.com with SMTP id i50so3769848qgf.20 for ; Fri, 18 Jul 2014 20:34:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=IsKxbeb78G3FjT0R+NRwPfnryv4r/GuQWF4t4XzaMoI=; b=qbsOEVTEIcEbonn7Ww6lxxsD6kINRCuLbwRfz15tlD+sWGkSK7KIWR5ybvEvMkzA9H VLvz63CSScSUyejxl/fUv2iLkKw3RkldWGtZfrsERoDJh8jdMzbq76bIrIDZngOoirHN JvFi46QuLgcGy7702iIBGOZ+/GepClIPAxNw29Pkgah5c5+1c6Fe/rpB41JZey261NdT OFg65SLaN1beq/l3CS3vGHtVV/TF+EujS0r9KrBhp4urQRU1gaw+bDjKtpCNokl/xang 
7op2mc9cVbs7A1HBf0WBeQpOXXw4SeGCemMH8KsvkvSzaqodZG+h8Yr/rr4PGaFNSA/7 c9TA== MIME-Version: 1.0 X-Received: by 10.229.171.196 with SMTP id i4mr14934441qcz.15.1405740888293; Fri, 18 Jul 2014 20:34:48 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.224.1.6 with HTTP; Fri, 18 Jul 2014 20:34:48 -0700 (PDT) In-Reply-To: <20140719053318.I15959@besplex.bde.org> References: <20140719053318.I15959@besplex.bde.org> Date: Fri, 18 Jul 2014 20:34:48 -0700 X-Google-Sender-Auth: 2QgPtedPlrYmwj0jBHxikPv50sI Message-ID: Subject: Re: UDP sendto() returning ENOBUFS - "No buffer space available" From: Adrian Chadd To: Bruce Evans Content-Type: text/plain; charset=UTF-8 Cc: hiren panchasara , "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Jul 2014 03:34:49 -0000 Hi, On 18 July 2014 13:40, Bruce Evans wrote: > On Fri, 18 Jul 2014, hiren panchasara wrote: > >> On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd wrote: >>> >>> Hi! >>> >>> So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() -> >>> udp_output() -> ip_output() >>> >>> udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output >>> can also return ENOBUFS. >>> >>> it doesn't look like the socket code (eg sosend_dgram()) is doing any >>> buffering - it's just copying the frame and stuffing it up to the >>> driver. No queuing involved before the NIC. >> >> >> Right. Thanks for confirming. > > > Most buffering should be in ifq above the NIC. For UDP, I think > udp_output() puts buffers on the ifq and calls the driver for every > one, but the driver shouldn't do anything for most calls. The > driver can't possibly do anything if its ring buffer is full, and > shouldn't do anything if it is nearly full. Buffers accumulate in > the ifq until the driver gets around to them or the queue fills up. 
> Most ENOBUFS errors are for when it fills up. It can very easily > fill up, especially since it is too small in most configurations. > Just loop calling sendto(). This will fill the ifq almost > instantly unless the hardware is faster than the software. For if_transmit() drivers, there's no ifp queue. The queuing is being done in the driver. For drivers with if_transmit(), they may end up doing direct DMA ring dispatch or they may have a buf_ring in front of it.There's no ifq anymore. It upsets the ALTQ people too. -a From owner-freebsd-net@FreeBSD.ORG Sat Jul 19 03:49:58 2014 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 84D06AC5 for ; Sat, 19 Jul 2014 03:49:58 +0000 (UTC) Received: from mail-oa0-f51.google.com (mail-oa0-f51.google.com [209.85.219.51]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 474062044 for ; Sat, 19 Jul 2014 03:49:57 +0000 (UTC) Received: by mail-oa0-f51.google.com with SMTP id o6so4604448oag.38 for ; Fri, 18 Jul 2014 20:49:51 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:references:mime-version:in-reply-to:content-type :content-transfer-encoding:message-id:cc:from:subject:date:to; bh=7tXeLXBv24tQCNgBhIo7EWl1SoADfn6DaAotBnQR6Mw=; b=g3P/59wJ6vsGqZy3ICvciE5kEI86obT6BZ6q0dI7XudHVh3XPZBnovWXBd1rQCiLDi 5uU9ufGgSS2dC7BT81ZxY1d5OQzRVZAz4a21YddLJwkr0yMUndjrII0lQIEJ3ZtLqYMj gKahklux50pD5STvYQBBx9+qr4OvTScGxKP8OlJ1gLVomqgfxbGu1cCWvQ+ArMf0bXNn KGkpMDgPaVgamynJMf+LK60XsA9OgrC4n4zAaggbjJ+52JKCdYGn7R/8eTHNudT8Pgzt jXCHttz12xUGV9zwG4HU0OT3FCWCp8dJhJ6r/VjuVfx1NHRBBAZu5v3bu6h5g3NEcT3x viTQ== X-Gm-Message-State: 
ALoCoQmakv3UjOgADkFTFZVZ6VPiik4sF+8hxArzlVt+0JwCZuNM53Mt8fuZ5i1i989//rYeEwgh X-Received: by 10.182.89.164 with SMTP id bp4mr12924315obb.21.1405741791128; Fri, 18 Jul 2014 20:49:51 -0700 (PDT) Received: from [29.201.216.104] (66-87-116-104.pools.spcsdns.net. [66.87.116.104]) by mx.google.com with ESMTPSA id of9sm12687511obb.25.2014.07.18.20.49.50 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 18 Jul 2014 20:49:50 -0700 (PDT) References: <20140719053318.I15959@besplex.bde.org> Mime-Version: 1.0 (1.0) In-Reply-To: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <82879751-237C-4BEB-8DD7-45884A5BB705@netgate.com> X-Mailer: iPhone Mail (11D257) From: Jim Thompson Subject: Re: UDP sendto() returning ENOBUFS - "No buffer space available" Date: Fri, 18 Jul 2014 23:49:48 -0400 To: Adrian Chadd Cc: hiren panchasara , "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Jul 2014 03:49:58 -0000 > On Jul 18, 2014, at 23:34, Adrian Chadd wrote: > > It upsets the ALTQ people too. I'm an ALTQ person (pfSense, so maybe one if the biggest) and I'm not upset. That cr*p needs to die in a fire. 
From owner-freebsd-net@FreeBSD.ORG Sat Jul 19 06:16:44 2014 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 36C19826; Sat, 19 Jul 2014 06:16:44 +0000 (UTC) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id B06CF2BE1; Sat, 19 Jul 2014 06:16:43 +0000 (UTC) Received: from c122-106-147-133.carlnfd1.nsw.optusnet.com.au (c122-106-147-133.carlnfd1.nsw.optusnet.com.au [122.106.147.133]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id A8BE13CF4BA; Sat, 19 Jul 2014 16:16:35 +1000 (EST) Date: Sat, 19 Jul 2014 16:16:28 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Adrian Chadd Subject: Re: UDP sendto() returning ENOBUFS - "No buffer space available" In-Reply-To: Message-ID: <20140719152125.A874@besplex.bde.org> References: <20140719053318.I15959@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=eojmkOZX c=1 sm=1 tr=0 a=7NqvjVvQucbO2RlWB8PEog==:117 a=PO7r1zJSAAAA:8 a=qwo5vUNQll0A:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=6I5d2MoRAAAA:8 a=-bYW5cNtWXaxnCy8UjIA:9 a=lee2i37mc5CgPEWu:21 a=XQH7iBNxuV895bsf:21 a=CjuIK1q_8ugA:10 a=SV7veod9ZcQA:10 Cc: hiren panchasara , "freebsd-net@freebsd.org" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Jul 2014 06:16:44 -0000 On Fri, 18 Jul 2014, Adrian Chadd wrote: > On 18 July 2014 13:40, Bruce Evans wrote: >> On Fri, 18 Jul 2014, hiren panchasara wrote: >> >>> On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd wrote: >>>> >>>> Hi! 
>>>> >>>> So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() -> >>>> udp_output() -> ip_output() >>>> >>>> udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output >>>> can also return ENOBUFS. >>>> >>>> it doesn't look like the socket code (eg sosend_dgram()) is doing any >>>> buffering - it's just copying the frame and stuffing it up to the >>>> driver. No queuing involved before the NIC. >>> >>> Right. Thanks for confirming. >> >> Most buffering should be in ifq above the NIC. For UDP, I think >> udp_output() puts buffers on the ifq and calls the driver for every >> one, but the driver shouldn't do anything for most calls. The >> driver can't possibly do anything if its ring buffer is full, and >> shouldn't do anything if it is nearly full. Buffers accumulate in >> the ifq until the driver gets around to them or the queue fills up. >> Most ENOBUFS errors are for when it fills up. It can very easily >> fill up, especially since it is too small in most configurations. >> Just loop calling sendto(). This will fill the ifq almost >> instantly unless the hardware is faster than the software. > > For if_transmit() drivers, there's no ifp queue. The queuing is being > done in the driver. > > For drivers with if_transmit(), they may end up doing direct DMA ring > dispatch or they may have a buf_ring in front of it.There's no ifq > anymore. It upsets the ALTQ people too. Ah, a new source of bugs. Most drivers don't use this yet. Most still use ifq with the bogus size of (tx_ring_size - 1): Ones converted to the indirect API: % dev/bge/if_bge.c: if_setsendqlen(ifp, BGE_TX_RING_CNT - 1); % dev/bxe/bxe.c: if_setsendqlen(ifp, sc->tx_ring_size); bxe is one of the few without the silly subtraction of 1. 
% dev/e1000/if_em.c: if_setsendqlen(ifp, adapter->num_tx_desc - 1);
% dev/e1000/if_lem.c: if_setsendqlen(ifp, adapter->num_tx_desc - 1);
% dev/fxp/if_fxp.c: if_setsendqlen(ifp, FXP_NTXCB - 1);
% dev/nfe/if_nfe.c: if_setsendqlen(ifp, NFE_TX_RING_COUNT - 1);

Ones not converted:

% dev/ae/if_ae.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/ae/if_ae.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);

The double setting is related to ALTQ. I grepped for maxlen to find
both. I might have missed alternative spellings.

ifqmaxlen is usually 50, so all drivers using it have very little
buffering. Even if their tx ring is tiny, this 50 is too small above
1 or 10 Mbps.

% dev/age/if_age.c: ifp->if_snd.ifq_drv_maxlen = AGE_TX_RING_CNT - 1;
% dev/age/if_age.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/alc/if_alc.c: ifp->if_snd.ifq_drv_maxlen = ALC_TX_RING_CNT - 1;
% dev/alc/if_alc.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/ale/if_ale.c: ifp->if_snd.ifq_drv_maxlen = ALE_TX_RING_CNT - 1;
% dev/ale/if_ale.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/an/if_an.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/an/if_an.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/asmc/asmc.c: uint8_t maxlen;
% dev/asmc/asmc.c: maxlen = type[0];

Grepping for maxlen unfortunately found unrelated things. I deleted
most after this.
% dev/ath/if_ath.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ath/if_ath.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/bce/if_bce.c: ifp->if_snd.ifq_drv_maxlen = USABLE_TX_BD_ALLOC;
% dev/bce/if_bce.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/bfe/if_bfe.c: ifp->if_snd.ifq_drv_maxlen = BFE_TX_QLEN;
% dev/bm/if_bm.c: ifp->if_snd.ifq_drv_maxlen = BM_MAX_TX_PACKETS;
% dev/bwi/if_bwi.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/bwi/if_bwi.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/bwn/if_bwn.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/bwn/if_bwn.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/cadence/if_cgem.c: ifp->if_snd.ifq_drv_maxlen = IFQ_MAXLEN;
% dev/cas/if_cas.c: ifp->if_snd.ifq_drv_maxlen = CAS_TXQUEUELEN;
% dev/ce/if_ce.c: d->queue.ifq_maxlen = ifqmaxlen;
% dev/ce/if_ce.c: d->hi_queue.ifq_maxlen = ifqmaxlen;
% dev/ce/if_ce.c: d->rqueue.ifq_maxlen = ifqmaxlen;
% dev/ce/if_ce.c: d->rqueue.ifq_maxlen = ifqmaxlen;

Seems silly to have many tiny queues, especially when their length is
nominal and can be changed by tunables if not sysctls so that it is not
actually tiny. But good for latency.

% dev/cm/smc90cx6.c: ifp->if_snd.ifq_maxlen = ifqmaxlen;
% dev/cp/if_cp.c: d->queue.ifq_maxlen = ifqmaxlen;
% dev/cp/if_cp.c: d->hi_queue.ifq_maxlen = ifqmaxlen;
% dev/cp/if_cp.c: d->queue.ifq_maxlen = NRBUF;
% dev/cs/if_cs.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ctau/if_ct.c: d->queue.ifq_maxlen = ifqmaxlen;
% dev/ctau/if_ct.c: d->hi_queue.ifq_maxlen = ifqmaxlen;
% dev/ctau/if_ct.c: d->queue.ifq_maxlen = NBUF;
% dev/cx/if_cx.c: d->lo_queue.ifq_maxlen = ifqmaxlen;
% dev/cx/if_cx.c: d->hi_queue.ifq_maxlen = ifqmaxlen;
% dev/cx/if_cx.c: d->queue.ifq_maxlen = 2;

Now that's a tiny queue which can't be broken by changing the sysctl.
% dev/dc/if_dc.c: ifp->if_snd.ifq_drv_maxlen = DC_TX_LIST_CNT - 1; % dev/de/if_de.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/de/if_de.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/e1000/if_igb.c: ifp->if_snd.ifq_drv_maxlen = adapter->num_tx_desc - 1; % dev/ed/if_ed.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/ed/if_ed.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/ep/if_ep.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/ep/if_ep.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/et/if_et.c: ifp->if_snd.ifq_drv_maxlen = ET_TX_NDESC - 1; % dev/ex/if_ex.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/fatm/if_fatm.c: ifp->if_snd.ifq_maxlen = 512; % dev/fe/if_fe.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/ffec/if_ffec.c: ifp->if_snd.ifq_drv_maxlen = TX_DESC_COUNT - 1; % dev/firewire/if_fwe.c: ifp->if_snd.ifq_maxlen = TX_MAX_QUEUE; % dev/firewire/if_fwip.c: ifp->if_snd.ifq_maxlen = TX_MAX_QUEUE; % dev/gem/if_gem.c: ifp->if_snd.ifq_drv_maxlen = GEM_TXQUEUELEN; % dev/hme/if_hme.c: ifp->if_snd.ifq_drv_maxlen = HME_NTXQ; % dev/i40e/if_i40e.c: ifp->if_snd.ifq_maxlen = que->num_desc - 2; % dev/ie/if_ie.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/if_ndis/if_ndis.c: ifp->if_snd.ifq_drv_maxlen = 25; % dev/iicbus/if_ic.c: ifp->if_snd.ifq_maxlen = ifqmaxlen; % dev/ipw/if_ipw.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/ipw/if_ipw.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/iwi/if_iwi.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/iwi/if_iwi.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/iwn/if_iwn.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/iwn/if_iwn.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/ixgb/if_ixgb.c: ifp->if_snd.ifq_maxlen = adapter->num_tx_desc - 1; % dev/ixgbe/ixgbe.c: ifp->if_snd.ifq_drv_maxlen = adapter->num_tx_desc - 2; % dev/ixgbe/ixv.c: ifp->if_snd.ifq_maxlen = adapter->num_tx_desc - 2; % dev/jme/if_jme.c: ifp->if_snd.ifq_drv_maxlen = JME_TX_RING_CNT - 1; % dev/jme/if_jme.c: IFQ_SET_MAXLEN(&ifp->if_snd, 
ifp->if_snd.ifq_drv_maxlen); % dev/le/lance.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/le/lance.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/lge/if_lge.c: ifp->if_snd.ifq_maxlen = LGE_TX_LIST_CNT - 1; % dev/malo/if_malo.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/malo/if_malo.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/mge/if_mge.c: ifp->if_snd.ifq_drv_maxlen = MGE_TX_DESC_NUM - 1; % dev/mge/if_mge.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen); % dev/msk/if_msk.c: ifp->if_snd.ifq_drv_maxlen = MSK_TX_RING_CNT - 1; % dev/mwl/if_mwl.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/mwl/if_mwl.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/mxge/if_mxge.c: sc->ifp->if_snd.ifq_drv_maxlen = sc->ifp->if_snd.ifq_maxlen; % dev/my/if_my.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen); % dev/my/if_my.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen; % dev/nge/if_nge.c: ifp->if_snd.ifq_drv_maxlen = NGE_TX_RING_CNT - 1; % dev/nge/if_nge.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen); % dev/nxge/if_nxge.c: ifnetp->if_snd.ifq_maxlen = ifqmaxlen; % dev/oce/oce_if.c: sc->ifp->if_snd.ifq_drv_maxlen = OCE_MAX_TX_DESC - 1; % dev/oce/oce_if.c: IFQ_SET_MAXLEN(&sc->ifp->if_snd, sc->ifp->if_snd.ifq_drv_maxlen); % dev/patm/if_patm.c: sc->scd0->q.ifq_maxlen = PATM_DLFT_MAXQ; % dev/patm/if_patm.c: scd->q.ifq_maxlen = PATM_TX_IFQLEN; % dev/pcn/if_pcn.c: ifp->if_snd.ifq_maxlen = PCN_TX_LIST_CNT - 1; % dev/pdq/pdq_ifsubr.c: ifp->if_snd.ifq_maxlen = ifqmaxlen; % dev/ppbus/if_plip.c: ifp->if_snd.ifq_maxlen = ifqmaxlen; ifqmaxlen = 50 is actually right for plip, but only if someone doesn't bump it up using the sysctl. 
% dev/qlxgb/qla_os.c: IFQ_SET_MAXLEN(&ifp->if_snd, qla_get_ifq_snd_maxlen(ha));
% dev/qlxgb/qla_os.c: ifp->if_snd.ifq_drv_maxlen = qla_get_ifq_snd_maxlen(ha);
% dev/qlxgbe/ql_os.c: IFQ_SET_MAXLEN(&ifp->if_snd, qla_get_ifq_snd_maxlen(ha));
% dev/qlxgbe/ql_os.c: ifp->if_snd.ifq_drv_maxlen = qla_get_ifq_snd_maxlen(ha);
% dev/qlxge/qls_os.c: IFQ_SET_MAXLEN(&ifp->if_snd, qls_get_ifq_snd_maxlen(ha));
% dev/qlxge/qls_os.c: ifp->if_snd.ifq_drv_maxlen = qls_get_ifq_snd_maxlen(ha);
% dev/ral/rt2560.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ral/rt2560.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/ral/rt2661.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ral/rt2661.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/ral/rt2860.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ral/rt2860.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/re/if_re.c: ifp->if_snd.ifq_drv_maxlen = RL_IFQ_MAXLEN;
% dev/rt/if_rt.c: ifp->if_snd.ifq_drv_maxlen = RT_TX_QLEN;
% dev/sbni/if_sbni.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/sf/if_sf.c: ifp->if_snd.ifq_drv_maxlen = SF_TX_DLIST_CNT - 1;
% dev/sfxge/sfxge.c: ifp->if_snd.ifq_drv_maxlen = SFXGE_NDESCS - 1;
% dev/sge/if_sge.c: ifp->if_snd.ifq_drv_maxlen = SGE_TX_RING_CNT - 1;
% dev/sge/if_sge.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/sis/if_sis.c: ifp->if_snd.ifq_drv_maxlen = SIS_TX_LIST_CNT - 1;
% dev/sk/if_sk.c: ifp->if_snd.ifq_drv_maxlen = SK_TX_RING_CNT - 1;
% dev/smc/if_smc.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/sn/if_sn.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/sn/if_sn.c: ifp->if_snd.ifq_maxlen = ifqmaxlen;
% dev/snc/dp83932.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ste/if_ste.c: ifp->if_snd.ifq_drv_maxlen = STE_TX_LIST_CNT - 1;
% dev/stge/if_stge.c: ifp->if_snd.ifq_drv_maxlen = STGE_TX_RING_CNT - 1;
% dev/stge/if_stge.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/ti/if_ti.c: ifp->if_snd.ifq_drv_maxlen = TI_TX_RING_CNT - 1;
% dev/ti/if_ti.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/tl/if_tl.c: ifp->if_snd.ifq_maxlen = TL_TX_LIST_CNT - 1;
% dev/tsec/if_tsec.c: ifp->if_snd.ifq_drv_maxlen = TSEC_TX_NUM_DESC - 1;
% dev/txp/if_txp.c: ifp->if_snd.ifq_drv_maxlen = TX_ENTRIES - 1;
% dev/txp/if_txp.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/usb/usb_dev.c: f->free_q.ifq_maxlen = nbuf;
% dev/usb/usb_dev.c: f->used_q.ifq_maxlen = nbuf;
% dev/vge/if_vge.c: ifp->if_snd.ifq_drv_maxlen = VGE_TX_DESC_CNT - 1;
% dev/vr/if_vr.c: ifp->if_snd.ifq_maxlen = VR_TX_RING_CNT - 1;
% dev/vte/if_vte.c: ifp->if_snd.ifq_drv_maxlen = VTE_TX_RING_CNT - 1;
% dev/vte/if_vte.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/vx/if_vx.c: ifp->if_snd.ifq_maxlen = ifqmaxlen;
% dev/vxge/vxge.c: ifp->if_snd.ifq_drv_maxlen = max(vdev->config.ifq_maxlen, ifqmaxlen);
% dev/vxge/vxge.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/wb/if_wb.c: ifp->if_snd.ifq_maxlen = WB_TX_LIST_CNT - 1;
% dev/wi/if_wi.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/wi/if_wi.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/wl/if_wl.c: ifp->if_snd.ifq_maxlen = ifqmaxlen;
% dev/wpi/if_wpi.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/wpi/if_wpi.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/wtap/if_wtap.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/wtap/if_wtap.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/xe/if_xe.c: IFQ_SET_MAXLEN(&scp->ifp->if_snd, ifqmaxlen);
% dev/xl/if_xl.c: ifp->if_snd.ifq_drv_maxlen = XL_TX_LIST_CNT - 1;

There are so many drivers that just removing the silly subtraction of 1 or the excessive use of the global ifqmaxlen in them is a daunting task. You would have to check that the silly subtraction isn't actually needed. Changing ifqmaxlen is easier. It is supposed to be variable and not closely related to devices, so it's not your fault if changing to another value (not so directly connected to ifqmaxlen) breaks the driver.
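One case where a "ring size minus 1" limit would actually be needed is the common producer/consumer index convention in which one slot is kept empty so that "full" can be told apart from "empty". The sketch below is a userspace model of that convention only. TX_RING_CNT, struct tx_ring and the ring_* helpers are hypothetical names, and nothing here claims that any driver in the list above uses this scheme; that is the per-driver check the paragraph above refers to.

```c
#include <assert.h>
#include <stddef.h>

#define TX_RING_CNT 8U          /* hypothetical descriptor count */

/* Model of a tx ring tracked only by two indices (assumption). */
struct tx_ring {
    unsigned prod;              /* next slot the driver would fill */
    unsigned cons;              /* next slot the hardware would complete */
};

/* Number of slots currently occupied. */
static unsigned
ring_used(const struct tx_ring *r)
{
    return ((r->prod + TX_RING_CNT - r->cons) % TX_RING_CNT);
}

/* Full means advancing prod would make it equal cons (looks empty). */
static int
ring_full(const struct tx_ring *r)
{
    return ((r->prod + 1) % TX_RING_CNT == r->cons);
}

/* Occupy one slot; returns 0 on success, -1 if the ring is full. */
static int
ring_put(struct tx_ring *r)
{
    assert(r != NULL);
    if (ring_full(r))
        return (-1);
    r->prod = (r->prod + 1) % TX_RING_CNT;
    return (0);
}

/* Fill the ring until it refuses more; returns how many slots fit. */
static unsigned
ring_fill(struct tx_ring *r)
{
    unsigned n = 0;

    while (ring_put(r) == 0)
        n++;
    return (n);
}
```

With this convention an empty ring accepts TX_RING_CNT - 1 entries before ring_full() becomes true. A ring that keeps a separate occupancy count does not lose the slot, and for such a ring the subtraction would be unnecessary.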
I have only worked on bge and sk much. This required intricate device-dependent changes to implement watermark stuff in the tx rings. The hardware doesn't really support watermark stuff but it is possible to emulate it.

% net/if.c:SYSCTL_INT(_net_link, OID_AUTO, ifqmaxlen, CTLFLAG_RDTUN,
% net/if.c: &ifqmaxlen, 0, "max send queue size");
% net/if.c:int ifqmaxlen = IFQ_MAXLEN;
% net/if_atmsubr.c: ifp->if_snd.ifq_maxlen = 50; /* dummy */

No reason to spell 50 as 50 instead of as ifqmaxlen? A random default works especially well when it is not used.

% net/if_disc.c: ifp->if_snd.ifq_maxlen = 20;

Tiny queues may be even worse for synthetic devices than for real ones. The real ones tend to have large enough tx rings except under load, and the load is limited by the link speed. But for synthetic ones there might not be any more buffering, and the only speed limits are in software. When the queue fills up, the application has the same problem of restarting as soon as possible without busy-waiting as for hardware devices.

% net/if_edsc.c: ifp->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_enc.c: ifp->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_epair.c: ifp->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_epair.c: ifp->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_epair.c: epair_nh.nh_qlimit = 42 * ifqmaxlen; /* 42 shall be the number. */

It is a better too-small number than 50.
% net/if_faith.c: ifp->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_gif.c: GIF2IFP(sc)->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_gre.c: GRE2IFP(sc)->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_loop.c: ifp->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_mib.c: ifmd.ifmd_snd_maxlen = ifp->if_snd.ifq_maxlen;
% net/if_mib.c: ifp->if_snd.ifq_maxlen = ifmd.ifmd_snd_maxlen;
% net/if_spppsubr.c: ifp->if_snd.ifq_maxlen = 32;
% net/if_spppsubr.c: sp->pp_fastq.ifq_maxlen = 32;
% net/if_spppsubr.c: sp->pp_cpq.ifq_maxlen = 20;
% net/if_stf.c: ifp->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_tap.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% net/if_tun.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% net/if_tun.c: ifp->if_snd.ifq_drv_maxlen = 0;
% netgraph/ng_device.c: IFQ_SET_MAXLEN(&priv->readq, ifqmaxlen);
% netgraph/ng_eiface.c: ifp->if_snd.ifq_maxlen = ifqmaxlen;
% netgraph/ng_iface.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% netgraph/ng_iface.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% netgraph/ng_source.c: sc->snd_queue.ifq_maxlen = 2048; /* XXX not checked */
% netgraph/ng_tty.c: IFQ_SET_MAXLEN(&sc->outq, ifqmaxlen);

End of synthetic devices.
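The remark about tiny queues on synthetic devices can be made concrete with a small userspace model. This is not the kernel's struct ifqueue or its macros: struct model_ifq, model_enqueue() and model_burst() are hypothetical names. The limit of 20 is the if_disc value quoted above, and the burst size in the usage note is made up. Returning ENOBUFS on overflow follows what the kernel queue macros are generally understood to report, which is treated as an assumption here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace model of a bounded send queue (hypothetical, not struct ifqueue). */
struct model_ifq {
    unsigned len;       /* packets currently queued */
    unsigned maxlen;    /* queue limit, in the role of ifq_maxlen */
    unsigned drops;     /* packets rejected because the queue was full */
};

/* Queue one packet; a full queue drops it and reports ENOBUFS. */
static int
model_enqueue(struct model_ifq *q)
{
    assert(q != NULL);
    if (q->len >= q->maxlen) {
        q->drops++;
        return (ENOBUFS);
    }
    q->len++;
    return (0);
}

/*
 * Offer a burst of n packets with nothing draining the queue in between,
 * which is the worst case for a software-only device with no further
 * buffering. Returns how many packets were accepted.
 */
static unsigned
model_burst(struct model_ifq *q, unsigned n)
{
    unsigned accepted = 0;

    for (unsigned i = 0; i < n; i++) {
        if (model_enqueue(q) == 0)
            accepted++;
    }
    return (accepted);
}
```

With maxlen set to 20, an undrained burst of 64 packets is cut to 20 accepted and 44 dropped, and the sender then has to find out when to retry. A larger limit only moves the point at which that happens.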
% pci/if_rl.c: IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% pci/if_rl.c: ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;

if_transmit is relatively rarely used:

% dev/ath/if_ath.c: ifp->if_transmit = ath_transmit;
% dev/cxgb/cxgb_main.c: ifp->if_transmit = cxgb_transmit;
% dev/cxgbe/t4_main.c: ifp->if_transmit = cxgbe_transmit;
% dev/cxgbe/t4_netmap.c: ifp->if_transmit = cxgbe_nm_transmit;
% dev/cxgbe/t4_tracer.c: ifp->if_transmit = tracer_transmit;
% dev/e1000/if_igb.c: ifp->if_transmit = igb_mq_start;
% dev/i40e/if_i40e.c: ifp->if_transmit = i40e_mq_start;
% dev/ixgbe/ixgbe.c: ifp->if_transmit = ixgbe_mq_start;
% dev/ixgbe/ixv.c: ifp->if_transmit = ixv_mq_start;
% dev/mxge/if_mxge.c: ifp->if_transmit = mxge_transmit;
% dev/netmap/netmap_freebsd.c: na->if_transmit = ifp->if_transmit;
% dev/netmap/netmap_freebsd.c: ifp->if_transmit = netmap_transmit;
% dev/netmap/netmap_freebsd.c: ifp->if_transmit = na->if_transmit;
% dev/oce/oce_if.c: sc->ifp->if_transmit = oce_multiq_start;
% dev/sfxge/sfxge.c: ifp->if_transmit = sfxge_if_transmit;
% dev/sfxge/sfxge_tx.c:sfxge_if_transmit(struct ifnet *ifp, struct mbuf *m)
% dev/vxge/vxge.c: ifp->if_transmit = vxge_mq_send;
% dev/wtap/if_wtap.c:wtap_if_transmit(struct ifnet *ifp, struct mbuf *m)
% dev/wtap/if_wtap.c: sc->if_transmit = ifp->if_transmit;
% dev/wtap/if_wtap.c: ifp->if_transmit = wtap_if_transmit;

Bruce

From owner-freebsd-net@FreeBSD.ORG Sat Jul 19 09:26:47 2014
Return-Path:
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AAA18F3C for ; Sat, 19 Jul 2014 09:26:47 +0000 (UTC)
Received: from mail.ipfw.ru (mail.ipfw.ru [IPv6:2a01:4f8:120:6141::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6E98229BC for ; Sat, 19 Jul 2014 09:26:47 +0000 (UTC)
Received: from [2a02:6b8:0:401:222:4dff:fe50:cd2f] (helo=ptichko.yndx.net) by mail.ipfw.ru with esmtpsa (TLSv1:DHE-RSA-AES128-SHA:128) (Exim 4.82 (FreeBSD)) (envelope-from ) id 1X8MyC-0009CH-NN; Sat, 19 Jul 2014 09:14:04 +0400
Message-ID: <53CA39BD.6050900@FreeBSD.org>
Date: Sat, 19 Jul 2014 13:26:21 +0400
From: "Alexander V. Chernikov"
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: hiren panchasara , Kajetan Staszkiewicz
Subject: Re: Why is r250764 not in 9.3?
References: <201407151132.53587.vegeta@tuxpowered.net>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "freebsd-net@freebsd.org"
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 19 Jul 2014 09:26:47 -0000

On 15.07.2014 21:03, hiren panchasara wrote:
> + Alexander
>
> On Tue, Jul 15, 2014 at 2:32 AM, Kajetan Staszkiewicz
> wrote:
>> The time has come to upgrade my routers to FreeBSD 9.3.
>>
>> While going through list of patches I had on 9.1, I've noticed that r248070 got
>> into 9.3 but r250764 did not. Why is that?
> Probably just missed it.
Yes, I've missed it. Unfortunately, I'm unable to merge it until 26 July; feel free to do so if you wish.
>
> cheers,
> Hiren
>

From owner-freebsd-net@FreeBSD.ORG Sat Jul 19 09:33:27 2014
Return-Path:
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8E32D28B for ; Sat, 19 Jul 2014 09:33:27 +0000 (UTC)
Received: from mail.ipfw.ru (mail.ipfw.ru [IPv6:2a01:4f8:120:6141::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 541AE2A79 for ; Sat, 19 Jul 2014 09:33:27 +0000 (UTC)
Received: from [2a02:6b8:0:401:222:4dff:fe50:cd2f] (helo=ptichko.yndx.net) by mail.ipfw.ru with esmtpsa (TLSv1:DHE-RSA-AES128-SHA:128) (Exim 4.82 (FreeBSD)) (envelope-from ) id 1X8N4g-0009HL-27; Sat, 19 Jul 2014 09:20:46 +0400
Message-ID: <53CA3B4E.8080608@FreeBSD.org>
Date: Sat, 19 Jul 2014 13:33:02 +0400
From: "Alexander V. Chernikov"
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: Daniel Corbe , freebsd-net@freebsd.org
Subject: Re: netmap, selective processing.
References:
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 19 Jul 2014 09:33:27 -0000

On 16.07.2014 21:48, Daniel Corbe wrote:
> I hope this it the right place to ask questions about netmap. I'm
> toying with the idea of writing a netmap-based OSPF implementation
> because bird's OSPF implementation isn't as good as its BGP
Hm. What do you need from bird's OSPF implementation? IMHO it is much easier to improve bird's code and merge the changes than to write another OSPF implementation from scratch.
There are _some_ unresolved issues with OSPF LSA withdrawal/announcement, but they will be fixed "soon".
> implementation, quagga doesn't scale well and openospfd doesn't compile
> on 10-RELEASE or CURRENT.
>
> But I'm only interested in selectively processing packets on the
> netmap-enabled interface. Is there a way to do this? Or alternatively
Yes, you can do this by adding another to-host interface. AFAIK the current bridge code for netmap is a good example. In fact, we're using netmap as a forwarding appliance with bird as the control-plane mechanism.
> if I throw the IF into netmap mode, can I process what I'm interested in
> processing and then somehow throw the rest of the traffic back up to the
> host's IP stack?
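For the selective-processing question, the per-frame decision could look roughly like the classifier below. It is a minimal sketch, not code from netmap or bird: frame_is_ospf() and the constant names are hypothetical, it only recognises untagged Ethernet carrying IPv4 with IP protocol 89 (OSPF), and it ignores VLAN tags and OSPFv3 over IPv6. Frames for which it returns 0 would be the ones passed on to the host stack through the to-host interface mentioned above; that forwarding step is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ETHER_HDR_LEN   14       /* dst MAC + src MAC + EtherType */
#define SK_ETHERTYPE_IPV4  0x0800
#define SK_IPV4_MIN_HDR    20       /* IPv4 header without options */
#define SK_IPPROTO_OSPF    89       /* IP protocol number used by OSPF */

/*
 * Return 1 if buf holds an untagged Ethernet frame carrying an IPv4
 * packet whose protocol field is OSPF, 0 otherwise.
 */
static int
frame_is_ospf(const uint8_t *buf, size_t len)
{
    const uint8_t *ip;
    uint16_t ethertype;

    assert(buf != NULL);
    if (len < SK_ETHER_HDR_LEN + SK_IPV4_MIN_HDR)
        return (0);

    /* EtherType is big-endian at bytes 12..13 of the frame. */
    ethertype = (uint16_t)((buf[12] << 8) | buf[13]);
    if (ethertype != SK_ETHERTYPE_IPV4)
        return (0);

    ip = buf + SK_ETHER_HDR_LEN;
    if ((ip[0] >> 4) != 4)          /* IP version */
        return (0);
    if ((ip[0] & 0x0f) < 5)         /* header length in 32-bit words */
        return (0);

    /* The protocol field is byte 9 of the IPv4 header. */
    return (ip[9] == SK_IPPROTO_OSPF);
}
```

A netmap application would call something like this on each received slot's buffer and keep only the matching frames for its own OSPF code.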