From owner-freebsd-net@FreeBSD.ORG  Wed Jul 16 23:58:20 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 4D9C7D6A
 for <freebsd-net@freebsd.org>; Wed, 16 Jul 2014 23:58:20 +0000 (UTC)
Received: from smtp.rlwinm.de (smtp.rlwinm.de [IPv6:2a01:4f8:201:31ef::e])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 1283322AF
 for <freebsd-net@freebsd.org>; Wed, 16 Jul 2014 23:58:20 +0000 (UTC)
Received: from hexe.rlwinm.de (p50834048.dip0.t-ipconnect.de [80.131.64.72])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (No client certificate requested)
 by smtp.rlwinm.de (Postfix) with ESMTPSA id C2E00C0F3
 for <freebsd-net@freebsd.org>; Thu, 17 Jul 2014 01:58:15 +0200 (CEST)
Message-ID: <53C71196.4030501@rlwinm.de>
Date: Thu, 17 Jul 2014 01:58:14 +0200
From: Jan Bramkamp <crest@rlwinm.de>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: freebsd-net@freebsd.org
Subject: Re: netmap, selective processing.
References: <ygfzjg9tcs5.fsf@corbe.net>
In-Reply-To: <ygfzjg9tcs5.fsf@corbe.net>
X-Enigmail-Version: 1.6
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 16 Jul 2014 23:58:20 -0000

On 16.07.2014 19:48, Daniel Corbe wrote:
> I hope this is the right place to ask questions about netmap.  I'm
> toying with the idea of writing a netmap-based OSPF implementation
> because bird's OSPF implementation isn't as good as its BGP
> implementation, quagga doesn't scale well and openospfd doesn't compile
> on 10-RELEASE or CURRENT.

How many prefixes do you have in your OSPF area 0? If you run into
scalability problems with OSPF on current x86 CPUs, your network design
is probably the cause, e.g. redistributing announcements from BGP into
OSPF.

OSPF is just one more (rather ugly) IP protocol. Is moving the OSPF
packets between kernel and userspace really a problem worth optimizing
for? Putting netmap between the NIC and the kernel IP stack adds
overhead to all non-OSPF packets unless your netmap application also
implements IP routing and bypasses the kernel.
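
To make the per-packet cost concrete: an application sitting between the
NIC and the host stack has to classify every frame before deciding what
to do with it. A minimal sketch of such a check in plain C follows. It
is not netmap API code; the function name frame_is_ospf() and the
constants are hypothetical, and it assumes untagged Ethernet carrying
IPv4 (OSPFv2 uses IP protocol number 89).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_SIZE	14	/* untagged Ethernet header (assumption) */
#define IPV4_MIN_HDR	20	/* IPv4 header without options */
#define ETYPE_IPV4	0x0800	/* EtherType for IPv4 */
#define IP_PROTO_OSPF	89	/* IP protocol number for OSPF */

/* Return true if the untagged Ethernet frame carries an IPv4 OSPF packet. */
bool
frame_is_ospf(const uint8_t *frame, size_t len)
{
	const uint8_t *ip;
	uint16_t ethertype;

	if (frame == NULL || len < ETH_HDR_SIZE + IPV4_MIN_HDR)
		return (false);

	/* EtherType is stored big-endian at offset 12 of the frame. */
	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	if (ethertype != ETYPE_IPV4)
		return (false);

	ip = frame + ETH_HDR_SIZE;
	if ((ip[0] >> 4) != 4)		/* IP version nibble */
		return (false);

	/* The protocol field is byte 9 of the IPv4 header. */
	return (ip[9] == IP_PROTO_OSPF);
}
```

Every frame that fails this check would still have to be handed back to
the host stack, which is the overhead for non-OSPF traffic described
above.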

From owner-freebsd-net@FreeBSD.ORG  Thu Jul 17 00:19:26 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id D54A114C
 for <freebsd-net@freebsd.org>; Thu, 17 Jul 2014 00:19:26 +0000 (UTC)
Received: from mail-pa0-x233.google.com (mail-pa0-x233.google.com
 [IPv6:2607:f8b0:400e:c03::233])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id A50152463
 for <freebsd-net@freebsd.org>; Thu, 17 Jul 2014 00:19:26 +0000 (UTC)
Received: by mail-pa0-f51.google.com with SMTP id ey11so2228542pad.38
 for <freebsd-net@freebsd.org>; Wed, 16 Jul 2014 17:19:26 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject
 :references:in-reply-to:content-type:content-transfer-encoding;
 bh=jQQlCAqmwh91Y5tmLJVQzgJrsm4wAFQHzbM2GX7p91M=;
 b=PAQxcnKv8Okv4WqT/nVOAmEDBbdYSnqa4gsBxaWdtHVRmvTk3gEzh6DzPKfajcUhXv
 Nja0TawI8iI1Dufm+khpDwvQ1PaBp1OqibiXQfR/NO96Get+7ZPPmVYIFA6cynb6P0zA
 uehmwd/215Se/0veJ74cGeR/wHwrGLzExTcZA1Q/F8TGN2fftrJExo00wdL7N6Kxao4I
 ib5SVv4p8XrRqibroL9n0K19bhgkN7y76wG7xlLqfc+cu3qF33iUKaIFnNJGTA0uWs6+
 c5bQzTa0hPhCe0fhBKMwRJ6DaeBf9yGT9DoLfXP7OjGSMsBzyLuDY9xQ+HE0FclgADJO
 StlA==
X-Received: by 10.69.17.230 with SMTP id gh6mr33475962pbd.0.1405556366173;
 Wed, 16 Jul 2014 17:19:26 -0700 (PDT)
Received: from [10.192.166.0] (stargate.chelsio.com. [67.207.112.58])
 by mx.google.com with ESMTPSA id nd10sm561921pbc.51.2014.07.16.17.19.25
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Wed, 16 Jul 2014 17:19:25 -0700 (PDT)
Sender: Navdeep Parhar <nparhar@gmail.com>
Message-ID: <53C7168C.3050702@FreeBSD.org>
Date: Wed, 16 Jul 2014 17:19:24 -0700
From: Navdeep Parhar <np@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: Luigi Rizzo <rizzo@iet.unipi.it>
Subject: Re: change netmap global lock to sx?
References: <5385249D.9050501@FreeBSD.org>
 <CA+hQ2+gBypQa3KiSdZZS9ytW4dqE5jMNkuTK2FYwRtDGQAxtPA@mail.gmail.com>
In-Reply-To: <CA+hQ2+gBypQa3KiSdZZS9ytW4dqE5jMNkuTK2FYwRtDGQAxtPA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Jul 2014 00:19:27 -0000


On 05/27/14 17:32, Luigi Rizzo wrote:
>
> On Wed, May 28, 2014 at 1:49 AM, Navdeep Parhar wrote:
>
>     I'd like to change the netmap global lock from a mutex into a sleepable
>     shared/exclusive lock.  This will allow a driver's nm_register hook
>     (which is called with the global lock held) to sleep if it has to.  I've
>     casually used pkt-gen after this conversion (patch attached) and the
>     witness hasn't complained about it.
>
> no objections, let me give this a try on stable/10
> stable/9 to make sure we can use the same code there as well

Any updates?  I'm considering what to have in cxgbe(4) in time for 10.1
and this needs to be sorted out before cxgbe's netmap support gets MFC'd
to any stable branch.

Regards,
Navdeep


>
> cheers
> luigi
>
>     Thoughts?
>
>     Regards,
>     Navdeep
>
>
>     diff -r 0300d80260f4 sys/dev/netmap/netmap_kern.h
>     --- a/sys/dev/netmap/netmap_kern.h      Fri May 23 19:00:56 2014 -0700
>     +++ b/sys/dev/netmap/netmap_kern.h      Sat May 24 12:49:15 2014 -0700
>     @@ -43,13 +43,13 @@
>      #define unlikely(x)    __builtin_expect((long)!!(x), 0L)
>
>      #define        NM_LOCK_T       struct mtx
>     -#define        NMG_LOCK_T      struct mtx
>     -#define NMG_LOCK_INIT()        mtx_init(&netmap_global_lock, \
>     -                               "netmap global lock", NULL, MTX_DEF)
>     -#define NMG_LOCK_DESTROY()     mtx_destroy(&netmap_global_lock)
>     -#define NMG_LOCK()     mtx_lock(&netmap_global_lock)
>     -#define NMG_UNLOCK()   mtx_unlock(&netmap_global_lock)
>     -#define NMG_LOCK_ASSERT()      mtx_assert(&netmap_global_lock, MA_OWNED)
>     +#define        NMG_LOCK_T      struct sx
>     +#define NMG_LOCK_INIT()        sx_init(&netmap_global_lock, \
>     +                               "netmap global lock")
>     +#define NMG_LOCK_DESTROY()     sx_destroy(&netmap_global_lock)
>     +#define NMG_LOCK()     sx_xlock(&netmap_global_lock)
>     +#define NMG_UNLOCK()   sx_xunlock(&netmap_global_lock)
>     +#define NMG_LOCK_ASSERT()      sx_assert(&netmap_global_lock, SA_XLOCKED)
>
>      #define        NM_SELINFO_T    struct selinfo
>      #define        MBUF_LEN(m)     ((m)->m_pkthdr.len)
>


From owner-freebsd-net@FreeBSD.ORG  Thu Jul 17 02:04:18 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 47628A4D
 for <freebsd-net@FreeBSD.org>; Thu, 17 Jul 2014 02:04:18 +0000 (UTC)
Received: from kenobi.freebsd.org (kenobi.freebsd.org
 [IPv6:2001:1900:2254:206a::16:76])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 2DE2B2C2D
 for <freebsd-net@FreeBSD.org>; Thu, 17 Jul 2014 02:04:18 +0000 (UTC)
Received: from bugs.freebsd.org ([127.0.1.118])
 by kenobi.freebsd.org (8.14.8/8.14.8) with ESMTP id s6H24Ige083429
 for <freebsd-net@FreeBSD.org>; Thu, 17 Jul 2014 02:04:18 GMT
 (envelope-from bugzilla-noreply@freebsd.org)
From: bugzilla-noreply@freebsd.org
To: freebsd-net@FreeBSD.org
Subject: [Bug 187835] ngctl(8) strange behavior when adding more than 530
 vlan through nethraph
Date: Thu, 17 Jul 2014 02:04:18 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: bin
X-Bugzilla-Version: 10.0-STABLE
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: admin@support.od.ua
X-Bugzilla-Status: In Discussion
X-Bugzilla-Priority: Normal
X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: version
Message-ID: <bug-187835-2472-RpCNQVwSJd@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-187835-2472@https.bugs.freebsd.org/bugzilla/>
References: <bug-187835-2472@https.bugs.freebsd.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Jul 2014 02:04:18 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187835

Vladislav V. Prodan <admin@support.od.ua> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unspecified                 |10.0-STABLE

-- 
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-net@FreeBSD.ORG  Thu Jul 17 02:41:42 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 3D774F5
 for <freebsd-net@freebsd.org>; Thu, 17 Jul 2014 02:41:42 +0000 (UTC)
Received: from gpo3.cc.swin.edu.au (gpo3.cc.swin.edu.au [136.186.1.32])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id C831E2EEF
 for <freebsd-net@freebsd.org>; Thu, 17 Jul 2014 02:41:41 +0000 (UTC)
Received: from [136.186.229.154] (nwilliams-laptop.caia.swin.edu.au
 [136.186.229.154])
 by gpo3.cc.swin.edu.au (8.14.3/8.14.3) with ESMTP id s6H2fdx3018271
 for <freebsd-net@freebsd.org>; Thu, 17 Jul 2014 12:41:39 +1000
Message-ID: <53C737DB.4030804@swin.edu.au>
Date: Thu, 17 Jul 2014 12:41:31 +1000
From: Nigel Williams <njwilliams@swin.edu.au>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: freebsd-net@freebsd.org
Subject: Re: Multipath TCP for FreeBSD v0.4
References: <513CB9AF.3090409@swin.edu.au> <53BF8945.3000802@swin.edu.au>
 <20140711102535.7613DBE5@hub.freebsd.org> <53C341FC.4060307@swin.edu.au>
 <20140714063019.876218DD@hub.freebsd.org>
In-Reply-To: <20140714063019.876218DD@hub.freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Jul 2014 02:41:42 -0000

Just a quick note for anyone else that might be trying out the patch...

>
> and I've built the whole system on both nodes without WITNESS and other
> debugging functionality:
> ===============================================================================
> Index: /usr/src/sys/amd64/conf/GENERIC
> ===================================================================
> --- /usr/src/sys/amd64/conf/GENERIC     (revision 265307)
> +++ /usr/src/sys/amd64/conf/GENERIC     (working copy)
> @@ -76,14 +76,14 @@
>   options        KDB                     # Enable kernel debugger support.
>   options        KDB_TRACE               # Print a stack trace for a panic.
>   # For full debugger support use (turn off in stable branch):
> -options        DDB                     # Support DDB.
> -options        GDB                     # Support remote GDB.
> -options        DEADLKRES               # Enable the deadlock resolver
> -options        INVARIANTS              # Enable calls of extra sanity checking
> -options        INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
> -options        WITNESS                 # Enable checks to detect deadlocks and cycles
> -options        WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
> -options        MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
> +#options       DDB                     # Support DDB.
> +#options       GDB                     # Support remote GDB.
> +#options       DEADLKRES               # Enable the deadlock resolver
> +#options       INVARIANTS              # Enable calls of extra sanity checking
> +#options       INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
> +#options       WITNESS                 # Enable checks to detect deadlocks and cycles
> +#options       WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
> +#options       MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
>
>   # Make an SMP-capable kernel by default
>   options        SMP                     # Symmetric MultiProcessor Kernel
> ===============================================================================

I'd recommend leaving debugging options on (at minimum INVARIANTS and 
INVARIANT_SUPPORT). This will slow network performance but will allow a 
number of assertions to run that can make it a little easier to debug 
some issues.
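
As a rough illustration of what such assertions buy you, here is a
userspace sketch in plain C. It only imitates the idea behind the
kernel's KASSERT() (checks that are compiled in when INVARIANTS is
defined and vanish otherwise); SKETCH_ASSERT, struct toy_ring and
toy_ring_used() are hypothetical names, not anything from the patch or
the kernel.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for "options INVARIANTS"; remove to compile the checks away. */
#define INVARIANTS 1

#ifdef INVARIANTS
#define SKETCH_ASSERT(exp, msg) do {					\
	if (!(exp)) {							\
		fprintf(stderr, "assertion failed: %s\n", (msg));	\
		abort();						\
	}								\
} while (0)
#else
#define SKETCH_ASSERT(exp, msg) do { } while (0)
#endif

/* Toy descriptor ring with head and tail indices. */
struct toy_ring {
	unsigned head;
	unsigned tail;
	unsigned size;
};

/* Number of slots between tail and head, with an index sanity check. */
unsigned
toy_ring_used(const struct toy_ring *r)
{
	/* Costs a few cycles per call, but catches corruption early. */
	SKETCH_ASSERT(r->head < r->size && r->tail < r->size,
	    "ring index out of range");
	return ((r->head + r->size - r->tail) % r->size);
}
```

The extra comparison on every call is where the performance cost comes
in; the benefit is that a corrupted index stops the program at the point
of the check instead of causing a harder-to-trace failure later.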

cheers,
nigel


From owner-freebsd-net@FreeBSD.ORG  Thu Jul 17 07:49:44 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 3DF1C249
 for <freebsd-net@freebsd.org>; Thu, 17 Jul 2014 07:49:44 +0000 (UTC)
Received: from atl4mhfb03.myregisteredsite.com
 (atl4mhfb03.myregisteredsite.com [209.17.115.61])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 13EB628E7
 for <freebsd-net@freebsd.org>; Thu, 17 Jul 2014 07:49:43 +0000 (UTC)
Received: from atl4mhob12.myregisteredsite.com
 (atl4mhob12.myregisteredsite.com [209.17.115.50])
 by atl4mhfb03.myregisteredsite.com (8.14.4/8.14.4) with ESMTP id
 s6H7nEun019630
 for <freebsd-net@freebsd.org>; Thu, 17 Jul 2014 03:49:18 -0400
Received: from mailpod.hostingplatform.com ([10.30.71.211])
 by atl4mhob12.myregisteredsite.com (8.14.4/8.14.4) with ESMTP id
 s6H7n6qJ017267
 for <freebsd-net@freebsd.org>; Thu, 17 Jul 2014 03:49:06 -0400
Received: (qmail 15358 invoked by uid 0); 17 Jul 2014 07:49:06 -0000
X-TCPREMOTEIP: 118.186.129.16
X-Authenticated-UID: peterxu@cyphy.net
Received: from unknown (HELO Peters-MacAir.local)
 (peterxu@cyphy.net@118.186.129.16)
 by 0 with ESMTPA; 17 Jul 2014 07:49:05 -0000
Message-ID: <53C77FEE.9000707@cyphy.net>
Date: Thu, 17 Jul 2014 15:49:02 +0800
From: Xu Zhe <peterxu@cyphy.net>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9;
 rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: freebsd-net@freebsd.org
Subject: Network unstability issue with ixgbe driver (ix0 local_faults non
 zero)
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: Javen Wu <javenwu@cyphy.net>, Jason Zhang <jasonzhang@cyphy.net>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Jul 2014 07:49:44 -0000

Hi, FreeBSD developers,

We are encountering a network problem on FreeBSD (version 8.2) with an
Intel X540T 10G card and the ixgbe 2.5.15 driver (we also tried an older
version, 2.5.8). We first noticed the problem when SSH connections kept
failing with timeouts. We then found that it is probably a generic
network issue rather than an SSH problem.

We found non-zero local_faults and remote_faults in sysctl:

# sysctl -a | grep ix.0
dev.ix.0.%desc: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.8
dev.ix.0.%driver: ix
dev.ix.0.%location: slot=0 function=0 handle=\_SB_.PCI0.NPE3.TGBE
dev.ix.0.%pnpinfo: vendor=0x8086 device=0x1528 subvendor=0x152d subdevice=0x899f class=0x020000
dev.ix.0.%parent: pci3
dev.ix.0.fc: 3
dev.ix.0.enable_aim: 1
dev.ix.0.advertise_speed: 0
dev.ix.0.ts: 0
dev.ix.0.dropped: 0
dev.ix.0.mbuf_defrag_failed: 0
dev.ix.0.watchdog_events: 0
dev.ix.0.link_irq: 4
dev.ix.0.queue0.interrupt_rate: 55555
dev.ix.0.queue0.irqs: 1491075
dev.ix.0.queue0.txd_head: 604
dev.ix.0.queue0.txd_tail: 604
dev.ix.0.queue0.tso_tx: 154
dev.ix.0.queue0.no_tx_dma_setup: 0
dev.ix.0.queue0.no_desc_avail: 0
dev.ix.0.queue0.tx_packets: 948089
dev.ix.0.queue0.rxd_head: 620
dev.ix.0.queue0.rxd_tail: 619
dev.ix.0.queue0.rx_packets: 7799404
dev.ix.0.queue0.rx_bytes: 11075537104
dev.ix.0.queue0.rx_copies: 111468
dev.ix.0.queue0.lro_queued: 7788218
dev.ix.0.queue0.lro_flushed: 968958
dev.ix.0.queue1.interrupt_rate: 100000
dev.ix.0.queue1.irqs: 90817
dev.ix.0.queue1.txd_head: 1800
dev.ix.0.queue1.txd_tail: 1800
dev.ix.0.queue1.tso_tx: 2
dev.ix.0.queue1.no_tx_dma_setup: 0
dev.ix.0.queue1.no_desc_avail: 0
dev.ix.0.queue1.tx_packets: 32468
dev.ix.0.queue1.rxd_head: 1802
dev.ix.0.queue1.rxd_tail: 1801
dev.ix.0.queue1.rx_packets: 40714
dev.ix.0.queue1.rx_bytes: 4527395
dev.ix.0.queue1.rx_copies: 38784
dev.ix.0.queue1.lro_queued: 38668
dev.ix.0.queue1.lro_flushed: 38486
dev.ix.0.queue2.interrupt_rate: 71428
dev.ix.0.queue2.irqs: 28625
dev.ix.0.queue2.txd_head: 349
dev.ix.0.queue2.txd_tail: 349
dev.ix.0.queue2.tso_tx: 1
dev.ix.0.queue2.no_tx_dma_setup: 0
dev.ix.0.queue2.no_desc_avail: 0
dev.ix.0.queue2.tx_packets: 6981
dev.ix.0.queue2.rxd_head: 1952
dev.ix.0.queue2.rxd_tail: 1951
dev.ix.0.queue2.rx_packets: 6048
dev.ix.0.queue2.rx_bytes: 947930
dev.ix.0.queue2.rx_copies: 5241
dev.ix.0.queue2.lro_queued: 4846
dev.ix.0.queue2.lro_flushed: 4760
dev.ix.0.queue3.interrupt_rate: 500000
dev.ix.0.queue3.irqs: 54879
dev.ix.0.queue3.txd_head: 504
dev.ix.0.queue3.txd_tail: 504
dev.ix.0.queue3.tso_tx: 10
dev.ix.0.queue3.no_tx_dma_setup: 0
dev.ix.0.queue3.no_desc_avail: 0
dev.ix.0.queue3.tx_packets: 18406
dev.ix.0.queue3.rxd_head: 449
dev.ix.0.queue3.rxd_tail: 448
dev.ix.0.queue3.rx_packets: 20929
dev.ix.0.queue3.rx_bytes: 2572540
dev.ix.0.queue3.rx_copies: 20297
dev.ix.0.queue3.lro_queued: 19218
dev.ix.0.queue3.lro_flushed: 19102
dev.ix.0.queue4.interrupt_rate: 500000
dev.ix.0.queue4.irqs: 22609
dev.ix.0.queue4.txd_head: 1370
dev.ix.0.queue4.txd_tail: 1370
dev.ix.0.queue4.tso_tx: 1
dev.ix.0.queue4.no_tx_dma_setup: 0
dev.ix.0.queue4.no_desc_avail: 0
dev.ix.0.queue4.tx_packets: 3518
dev.ix.0.queue4.rxd_head: 1622
dev.ix.0.queue4.rxd_tail: 1621
dev.ix.0.queue4.rx_packets: 3670
dev.ix.0.queue4.rx_bytes: 474745
dev.ix.0.queue4.rx_copies: 3014
dev.ix.0.queue4.lro_queued: 2174
dev.ix.0.queue4.lro_flushed: 2171
dev.ix.0.queue5.interrupt_rate: 100000
dev.ix.0.queue5.irqs: 366375
dev.ix.0.queue5.txd_head: 833
dev.ix.0.queue5.txd_tail: 833
dev.ix.0.queue5.tso_tx: 326797
dev.ix.0.queue5.no_tx_dma_setup: 0
dev.ix.0.queue5.no_desc_avail: 0
dev.ix.0.queue5.tx_packets: 531092
dev.ix.0.queue5.rxd_head: 57
dev.ix.0.queue5.rxd_tail: 56
dev.ix.0.queue5.rx_packets: 796729
dev.ix.0.queue5.rx_bytes: 108295068
dev.ix.0.queue5.rx_copies: 582757
dev.ix.0.queue5.lro_queued: 795369
dev.ix.0.queue5.lro_flushed: 258290
dev.ix.0.queue6.interrupt_rate: 100000
dev.ix.0.queue6.irqs: 26775
dev.ix.0.queue6.txd_head: 1146
dev.ix.0.queue6.txd_tail: 1146
dev.ix.0.queue6.tso_tx: 13
dev.ix.0.queue6.no_tx_dma_setup: 0
dev.ix.0.queue6.no_desc_avail: 0
dev.ix.0.queue6.tx_packets: 5469
dev.ix.0.queue6.rxd_head: 1077
dev.ix.0.queue6.rxd_tail: 1076
dev.ix.0.queue6.rx_packets: 9269
dev.ix.0.queue6.rx_bytes: 6631479
dev.ix.0.queue6.rx_copies: 4878
dev.ix.0.queue6.lro_queued: 8054
dev.ix.0.queue6.lro_flushed: 4260
dev.ix.0.queue7.interrupt_rate: 55555
dev.ix.0.queue7.irqs: 243399
dev.ix.0.queue7.txd_head: 66
dev.ix.0.queue7.txd_tail: 66
dev.ix.0.queue7.tso_tx: 5
dev.ix.0.queue7.no_tx_dma_setup: 0
dev.ix.0.queue7.no_desc_avail: 0
dev.ix.0.queue7.tx_packets: 121101
dev.ix.0.queue7.rxd_head: 130
dev.ix.0.queue7.rxd_tail: 129
dev.ix.0.queue7.rx_packets: 127106
dev.ix.0.queue7.rx_bytes: 15197119
dev.ix.0.queue7.rx_copies: 118192
dev.ix.0.queue7.lro_queued: 125622
dev.ix.0.queue7.lro_flushed: 125138
dev.ix.0.mac_stats.crc_errs: 0
dev.ix.0.mac_stats.ill_errs: 0
dev.ix.0.mac_stats.byte_errs: 0
dev.ix.0.mac_stats.short_discards: 0
dev.ix.0.mac_stats.local_faults: 7              <=============== HERE
dev.ix.0.mac_stats.remote_faults: 1
dev.ix.0.mac_stats.rec_len_errs: 0
dev.ix.0.mac_stats.xon_txd: 0
dev.ix.0.mac_stats.xon_recvd: 0
dev.ix.0.mac_stats.xoff_txd: 0
dev.ix.0.mac_stats.xoff_recvd: 0
dev.ix.0.mac_stats.total_octets_rcvd: 11249450018
dev.ix.0.mac_stats.good_octets_rcvd: 11249396646
dev.ix.0.mac_stats.total_pkts_rcvd: 8804445
dev.ix.0.mac_stats.good_pkts_rcvd: 8803850
dev.ix.0.mac_stats.mcast_pkts_rcvd: 9311
dev.ix.0.mac_stats.bcast_pkts_rcvd: 1908
dev.ix.0.mac_stats.rx_frames_64: 18132
dev.ix.0.mac_stats.rx_frames_65_127: 759186
dev.ix.0.mac_stats.rx_frames_128_255: 116641
dev.ix.0.mac_stats.rx_frames_256_511: 686728
dev.ix.0.mac_stats.rx_frames_512_1023: 67041
dev.ix.0.mac_stats.rx_frames_1024_1522: 7156122
dev.ix.0.mac_stats.recv_undersized: 0
dev.ix.0.mac_stats.recv_fragmented: 0
dev.ix.0.mac_stats.recv_oversized: 0
dev.ix.0.mac_stats.recv_jabberd: 0
dev.ix.0.mac_stats.management_pkts_rcvd: 11219
dev.ix.0.mac_stats.management_pkts_drpd: 0
dev.ix.0.mac_stats.checksum_errs: 0
dev.ix.0.mac_stats.good_octets_txd: 20162287794
dev.ix.0.mac_stats.total_pkts_txd: 14419225
dev.ix.0.mac_stats.good_pkts_txd: 14419225
dev.ix.0.mac_stats.bcast_pkts_txd: 621
dev.ix.0.mac_stats.mcast_pkts_txd: 0
dev.ix.0.mac_stats.management_pkts_txd: 0
dev.ix.0.mac_stats.tx_frames_64: 12833
dev.ix.0.mac_stats.tx_frames_65_127: 549847
dev.ix.0.mac_stats.tx_frames_128_255: 80184
dev.ix.0.mac_stats.tx_frames_256_511: 631975
dev.ix.0.mac_stats.tx_frames_512_1023: 116264
dev.ix.0.mac_stats.tx_frames_1024_1522: 13028122

Does anyone know what local_faults/remote_faults mean here? Does this
mean there is a hardware error? (We looked at the adapter manual, but
found no detail on the IXGBE_MLFC [0x04034] register.)

Any suggestions on how to diagnose this problem are welcome too.
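
For tracking these two counters while reproducing the problem, a small
helper along the following lines could be run over periodic sysctl
output. This is only a sketch in plain C; the function name
parse_fault_counter() is hypothetical, and it assumes the "name: value"
line format shown in the listing above. It does not say anything about
what the counters mean.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one "name: value" line of sysctl output.  If the name ends in
 * "local_faults" or "remote_faults", store the counter in *value and
 * return true; otherwise return false.
 */
bool
parse_fault_counter(const char *line, unsigned long long *value)
{
	static const char *suffixes[] = { "local_faults", "remote_faults" };
	const char *colon;
	size_t namelen, i;

	colon = strchr(line, ':');
	if (colon == NULL)
		return (false);
	namelen = (size_t)(colon - line);

	for (i = 0; i < sizeof(suffixes) / sizeof(suffixes[0]); i++) {
		size_t slen = strlen(suffixes[i]);

		/* Compare the tail of the name against the suffix. */
		if (namelen >= slen &&
		    strncmp(colon - slen, suffixes[i], slen) == 0) {
			char *end;

			/* strtoull() skips the blank after the colon. */
			*value = strtoull(colon + 1, &end, 10);
			return (end != colon + 1);
		}
	}
	return (false);
}
```

Feeding it each line of the output and logging a timestamp whenever a
returned value increases would show whether the faults line up with the
SSH timeouts.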

Thanks in advance!
Peter

From owner-freebsd-net@FreeBSD.ORG  Thu Jul 17 11:46:16 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A4954CB0;
 Thu, 17 Jul 2014 11:46:16 +0000 (UTC)
Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "vps1.elischer.org",
 Issuer "CA Cert Signing Authority" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 8117C2D57;
 Thu, 17 Jul 2014 11:46:16 +0000 (UTC)
Received: from Julian-MBP3.local
 (ppp121-45-250-191.lns20.per2.internode.on.net [121.45.250.191])
 (authenticated bits=0)
 by vps1.elischer.org (8.14.9/8.14.9) with ESMTP id s6HBk2Wu072106
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO);
 Thu, 17 Jul 2014 04:46:05 -0700 (PDT)
 (envelope-from julian@freebsd.org)
Message-ID: <53C7B774.60304@freebsd.org>
Date: Thu, 17 Jul 2014 19:45:56 +0800
From: Julian Elischer <julian@freebsd.org>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9;
 rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: John Baldwin <jhb@freebsd.org>, Rick Macklem <rmacklem@uoguelph.ca>
Subject: Re: NFS client READ performance on -current
References: <2136988575.13956627.1405199640153.JavaMail.root@uoguelph.ca>
 <201407151034.54681.jhb@freebsd.org>
In-Reply-To: <201407151034.54681.jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: pyunyh@gmail.com, "Russell L. Carter" <rcarter@pinyon.org>,
 freebsd-net@freebsd.org
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Jul 2014 11:46:16 -0000

On 7/15/14, 10:34 PM, John Baldwin wrote:
> On Saturday, July 12, 2014 5:14:00 pm Rick Macklem wrote:
>> Yonghyeon Pyun wrote:
>>> On Fri, Jul 11, 2014 at 09:54:23AM -0400, John Baldwin wrote:
>>>> On Thursday, July 10, 2014 6:31:43 pm Rick Macklem wrote:
>>>>> John Baldwin wrote:
>>>>>> On Thursday, July 03, 2014 8:51:01 pm Rick Macklem wrote:
>>>>>>> Russell L. Carter wrote:
>>>>>>>>
>>>>>>>> On 07/02/14 19:09, Rick Macklem wrote:
>>>>>>>>
>>>>>>>>> Could you please post the dmesg stuff for the network interface,
>>>>>>>>> so I can tell what driver is being used? I'll take a look at it,
>>>>>>>>> in case it needs to be changed to use m_defrag().
>>>>>>>> em0: <Intel(R) PRO/1000 Network Connection 7.4.2> port 0xd020-0xd03f mem
>>>>>>>> 0xfe4a0000-0xfe4bffff,0xfe480000-0xfe49ffff irq 44 at device 0.0 on pci2
>>>>>>>> em0: Using an MSI interrupt
>>>>>>>> em0: Ethernet address: 00:15:17:bc:29:ba
>>>>>>>> 001.000007 [2323] netmap_attach             success for em0 tx 1/1024
>>>>>>>> rx 1/1024 queues/slots
>>>>>>>>
>>>>>>>> This is one of those dual nic cards, so there is em1 as well...
>>>>>>>>
>>>>>>> Well, I took a quick look at the driver and it does use m_defrag(),
>>>>>>> but I think that the "retry:" label it does a goto after doing so
>>>>>>> might be in the wrong place.
>>>>>>>
>>>>>>> The attached untested patch might fix this.
>>>>>>>
>>>>>>> Is it convenient to build a kernel with this patch applied and then
>>>>>>> try it with TSO enabled?
>>>>>>>
>>>>>>> rick
>>>>>>> ps: It does have the transmit segment limit set to 32. I have no
>>>>>>>      idea if this is a hardware limitation.
>>>>>> I think the retry is not in the wrong place, but the overhead of all
>>>>>> those pullups is apparently quite severe.
>>>>> The m_defrag() call after the first failure will just barely squeeze
>>>>> the just under 64K TSO segment into 32 mbuf clusters. Then I think any
>>>>> m_pullup() done during the retry will allocate an mbuf
>>>>> (at a glance it seems to always do this when the old mbuf is a cluster)
>>>>> and prepend that to the list.
>>>>> --> Now the list is > 32 mbufs again and the bus_dmamap_load_mbuf_sg()
>>>>>      will fail again on the retry, this time fatally, I think?
>>>>>
>>>>> I can't see any reason to re-do all the stuff using m_pullup() and
>>>>> Russell reported that moving the "retry:" fixed his problem, from what
>>>>> I understood.
>>>> Ah, I had assumed (incorrectly) that the m_pullup()s would all be nops
>>>> in this case.  It seems the NIC would really like to have all those
>>>> things in a single segment, but it is not required, so I agree that
>>>> your patch is fine.
>>>>
>>> I recall em(4) controllers have various limitations in TSO. The driver
>>> has to update the IP header to make TSO work, so it has to get a
>>> writable mbuf.  bpf(4) consumers will see an IP packet length of 0
>>> after this change.  I think tcpdump has a compile-time option to
>>> guess the correct IP packet length.  The firmware of the controller
>>> should also be able to access the complete IP/TCP header in a single
>>> buffer.  I don't remember more details of the TSO limitations, but you
>>> may be able to get them from the publicly available Intel data sheet.
>> I think that the patch should handle this ok. All of the m_pullup()
>> stuff gets done the first time. Then, if the result is more than 32
>> mbufs in the list, m_defrag() is called to copy the chain. This should
>> result in all the header stuff in the first mbuf cluster and the map
>> call is done again with this list of clusters. (Without the patch,
>> m_pullup() would allocate another prepended mbuf and make the chain
>> more than 32 mbufs again.)
> Hmm, I am surprised by the m_pullup() behavior that it doesn't just
> notice that the first mbuf with a cluster has the desired data already
> and returns without doing anything.  That is, I'm surprised the first
> statement in m_pullup() isn't just:
>
> 	if (n->m_len >= len)
> 		return (n);
I seem to remember that the standard behaviour is for the caller to do 
exactly that.
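
A caller-side check of that kind can be sketched in userland as below.
This is only an illustration under stated assumptions: struct sk_mbuf,
sk_pullup() and sk_ensure_contig() are hypothetical stand-ins, not the
kernel's struct mbuf or m_pullup(), and the stand-in pullup always
allocates a new head buffer, which is the behaviour described upthread
for the cluster case. In kernel code the guard usually appears inline,
in the form "if (m->m_len < len && (m = m_pullup(m, len)) == NULL)".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userland stand-in for a kernel mbuf; NOT the real struct mbuf. */
struct sk_mbuf {
	struct sk_mbuf *m_next;		/* next buffer in the chain */
	size_t m_len;			/* bytes of data in this buffer */
	unsigned char m_data[256];
};

static int sk_pullup_calls;		/* how many times the slow path ran */

/* Count the buffers in a chain. */
static int
sk_chain_count(const struct sk_mbuf *m)
{
	int n = 0;

	for (; m != NULL; m = m->m_next)
		n++;
	return (n);
}

/*
 * Stand-in for m_pullup() (an assumption, simplified): always allocates a
 * new head buffer, moves the first len bytes of the chain into it and
 * prepends it.  Returns NULL and leaves the chain untouched if that is not
 * possible; the real routine frees the chain on failure.
 */
static struct sk_mbuf *
sk_pullup(struct sk_mbuf *m, size_t len)
{
	struct sk_mbuf *head, *p;
	size_t total = 0, n;

	sk_pullup_calls++;
	for (p = m; p != NULL; p = p->m_next)
		total += p->m_len;
	if (len > sizeof(head->m_data) || total < len)
		return (NULL);
	if ((head = calloc(1, sizeof(*head))) == NULL)
		return (NULL);
	head->m_next = m;
	/* total >= len guarantees p stays non-NULL until head is filled. */
	for (p = m; head->m_len < len; p = p->m_next) {
		n = len - head->m_len;
		if (n > p->m_len)
			n = p->m_len;
		memcpy(head->m_data + head->m_len, p->m_data, n);
		memmove(p->m_data, p->m_data + n, p->m_len - n);
		p->m_len -= n;
		head->m_len += n;
	}
	return (head);
}

/* Caller-side guard: skip the pullup when the first buffer is long enough. */
static struct sk_mbuf *
sk_ensure_contig(struct sk_mbuf *m, size_t len)
{
	assert(m != NULL);
	if (m->m_len >= len)
		return (m);
	return (sk_pullup(m, len));
}
```

With the guard in place the chain only grows when the leading buffer is
really too short, so a chain that m_defrag() has just packed down to the
segment limit is not pushed over it again by a redundant pullup.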

>


From owner-freebsd-net@FreeBSD.ORG  Thu Jul 17 18:39:34 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id CDE5D307
 for <freebsd-net@freebsd.org>; Thu, 17 Jul 2014 18:39:34 +0000 (UTC)
Received: from a0i308.smtpcorp.com (a0i308.smtpcorp.com [216.22.15.140])
 (using TLSv1.2 with cipher DHE-RSA-AES128-SHA (128/128 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 8ED1823CC
 for <freebsd-net@freebsd.org>; Thu, 17 Jul 2014 18:39:34 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=smtpcorp.com;
 s=a0_1; 
 h=Content-Type:MIME-Version:Message-ID:In-Reply-To:Date:References:Subject:Cc:To:From;
 bh=jB3Y6nsqvhpkqVtuMs6Gn9SmJ8PAQNloHz9RmV7tRt8=; 
 b=XPz/2aOa54FtmqRfQrggpv5SQ0Hid4MKaZz4cpUmF023TBQ/qmGvppqEdN38J9vd+5joQb0uPtGaOAHsty+bGaXwFFxBVnuuTIRxKz7EE5yRsZVc0leu4GbhtDNoeJq2gbXBv2EuFn81pP/+aYc1lA68isU0dE17KVdes77EdXA=;
From: Daniel Corbe <corbe@corbe.net>
To: Jan Bramkamp <crest@rlwinm.de>
Subject: Re: netmap, selective processing.
References: <ygfzjg9tcs5.fsf@corbe.net> <53C71196.4030501@rlwinm.de>
Date: Thu, 17 Jul 2014 14:39:13 -0400
In-Reply-To: <53C71196.4030501@rlwinm.de> (Jan Bramkamp's message of "Thu, 17
 Jul 2014 01:58:14 +0200")
Message-ID: <ygf4myfu8xa.fsf@corbe.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain
X-Smtpcorp-Track: 1b7qal4gfuSHlT.yUhfk05hs
Cc: freebsd-net@freebsd.org
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Jul 2014 18:39:34 -0000


Jan Bramkamp <crest@rlwinm.de> writes:

> On 16.07.2014 19:48, Daniel Corbe wrote:
>> I hope this is the right place to ask questions about netmap.  I'm
>> toying with the idea of writing a netmap-based OSPF implementation
>> because bird's OSPF implementation isn't as good as its BGP
>> implementation, quagga doesn't scale well and openospfd doesn't compile
>> on 10-RELEASE or CURRENT.
>
> How many prefixes do you have in your OSPF area 0? If you run into
> scalability problems with OSPF on current x86 CPUs your network design
> probably is the cause of the problem e.g. redistributing announcements
> from BGP into OSPF.

I have about 15k interior routes.  And most of it is RFC1918 address
space or random /64s doing various things.  So when I say I'm worried
about scale issues, I should more accurately be saying "I just don't
want to use quagga but I can't get anything else to work."

>
> OSPF is just one more (rather ugly) IP protocol. Is moving the OSPF
> packets between kernel and userspace really a problem worth optimizing
> for? Putting netmap between the NIC and the kernel IP stack introduces
> overhead to all non OSPF packets unless your netmap application also
> implements IP routing and bypasses the kernel.

I've been searching for a reason to play with netmap.  It looks like a
neat toy.  And at worst I will have implemented something only useful to
one person but I'll also have learned something in the process.

>From the perspective of totally wrecking the performance of the host
network stack: how much more overhead am I really introducing by looking
at every packet inside of the netmap framework and going "am I really
interested in this?  Or should I simply pass it through to the host."

And I'm hoping this leads me down the avenue of doing interesting things
with MPLS.  MPLS is something that absolutely needs to look at
everything because labels should always be processed and forwarded
first.

-Daniel

From owner-freebsd-net@FreeBSD.ORG  Thu Jul 17 19:54:12 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 922775AF
 for <freebsd-net@freebsd.org>; Thu, 17 Jul 2014 19:54:12 +0000 (UTC)
Received: from smtp1.bushwire.net (f5.bushwire.net [199.48.133.46])
 by mx1.freebsd.org (Postfix) with SMTP id 417572AC2
 for <freebsd-net@freebsd.org>; Thu, 17 Jul 2014 19:54:10 +0000 (UTC)
Received: (qmail 37369 invoked by uid 1001); 17 Jul 2014 19:47:28 -0000
Delivered-To: qmda-intercept-freebsd-net@freebsd.org
DomainKey-Signature: a=rsa-sha1; q=dns; c=simple; s=s384; d=romeo.emu.st; 
 b=PAyCzCGW1cOCD7uQ54tbh3ub8h+RyrWFS84RHERIo/Um4ajH9M+HqHEz8PI0ovoX;
Comments: DomainKeys? See http://en.wikipedia.org/wiki/DomainKeys
DomainKey-Trace-MD: h=14; b=29; l=C18R71D32M65F38T27S42R39?29?28M17C39C27I40;
Comments: QMDA 0.3
Received: (qmail 37362 invoked by uid 1001); 17 Jul 2014 19:47:28 -0000
Date: 17 Jul 2014 19:47:28 +0000
Message-ID: <20140717194728.37361.qmail@f5-external.bushwire.net>
From: "Mark Delany" <c2h@romeo.emu.st>
To: freebsd-net@freebsd.org
Subject: Re: netmap, selective processing.
References: <ygfzjg9tcs5.fsf@corbe.net> <53C71196.4030501@rlwinm.de>
 <ygf4myfu8xa.fsf@corbe.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <ygf4myfu8xa.fsf@corbe.net>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 17 Jul 2014 19:54:12 -0000

On 17Jul14, Daniel Corbe allegedly wrote:

> From the perspective of totally wrecking the performance of the host
> network stack: how much more overhead am I really introducing by looking
> at every packet inside of the netmap framework and going "am I really
> interested in this?  Or should I simply pass it through to the host."

If you haven't looked at netmap in detail yet, then the main thing to
remember is that once netmap is active on an interface, *all* packets
on that interface enter (and potentially leave) your netmap handler
via an excursion into user space.

If the majority of packets are untouched and merely pushed back thru
the stack, then for each batch of packets you've introduced an
additional user-space context switch, at least one system call and the
cost of your own packet selection code.
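
For a sense of how small the per-packet selection step can be, here is a
minimal sketch of such a test for OSPF. It is an illustration, not
netmap code: frame_is_ospf() is a hypothetical helper, and it assumes an
untagged Ethernet II frame carrying IPv4 (no VLAN tag, no IPv6/OSPFv3).
The only protocol facts it relies on are the IPv4 EtherType 0x0800 and
IP protocol number 89 for OSPF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
	SK_ETH_HDR_LEN = 14,		/* untagged Ethernet II header */
	SK_IPV4_MIN_HDR_LEN = 20,	/* IPv4 header without options */
	SK_ETHERTYPE_IPV4 = 0x0800,
	SK_IPPROTO_OSPF = 89
};

/*
 * Return true if the frame is IPv4 carrying OSPF.  Everything else
 * (including frames too short to tell) would be passed to the host.
 */
static bool
frame_is_ospf(const uint8_t *buf, size_t len)
{
	const uint8_t *ip;
	uint16_t ethertype;

	assert(buf != NULL);
	if (len < SK_ETH_HDR_LEN + SK_IPV4_MIN_HDR_LEN)
		return (false);
	/* The EtherType is big-endian at bytes 12..13 of the frame. */
	ethertype = (uint16_t)((buf[12] << 8) | buf[13]);
	if (ethertype != SK_ETHERTYPE_IPV4)
		return (false);
	ip = buf + SK_ETH_HDR_LEN;
	if ((ip[0] >> 4) != 4)		/* IP version nibble */
		return (false);
	return (ip[9] == SK_IPPROTO_OSPF);	/* protocol field at offset 9 */
}
```

A check like this is a handful of loads and compares per frame; the
context switch and system call per batch are likely the larger share of
the added cost.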

I'm not sure that constitutes "totally wrecking" but something to keep
in mind if you plan to run on a busy system.

Another thing to keep in mind is, if your netmap app has bugs you could
break all the regular applications sitting on top of sockets.

You are probably right that OSPF in netmap may not be directly useful
to anyone else, but I think more people using netmap to implement
interesting applications is of value to netmap, frankly.


Mark.

From owner-freebsd-net@FreeBSD.ORG  Fri Jul 18 07:49:13 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 582B2812;
 Fri, 18 Jul 2014 07:49:13 +0000 (UTC)
Received: from mail-wi0-x22e.google.com (mail-wi0-x22e.google.com
 [IPv6:2a00:1450:400c:c05::22e])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id B34D3251D;
 Fri, 18 Jul 2014 07:49:12 +0000 (UTC)
Received: by mail-wi0-f174.google.com with SMTP id d1so372478wiv.13
 for <multiple recipients>; Fri, 18 Jul 2014 00:49:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:reply-to:in-reply-to:references:date:message-id
 :subject:from:to:cc:content-type;
 bh=NCTzDB/O8dFVlltq3iVJ6EZa92sSB1CrpvKhNdHMIzY=;
 b=PTfTuiTc+kJmL+PXFeahnErRVAJZsgtJC46xRoN7tFyGjSJSEfNRE2xx4nqaOCK8ge
 LmykhJS/GVVvmeJ9UQOJBxDvYxAnZX14PEAu4kxQuKLCyRGfZANoOIRWKBcHe6By81Jq
 T87mmO0cxCuaUNaeI3lWsJJ6nL5t6QB6EAccYGWQqFYFwbeigLjA/8+WTrsYZpVDlnGO
 7BxeFIJkTDxF26TBvBsI0wauc45UBe6dilbetDbP9YyXRh7pW/tAaQ8WX6q6+GTt2PoT
 HRf+B5MhtAM8/ydjArC+g4TCuV3f+t+ucYOkM+7qwNDo9Kia8GGywY7IR5ycLMJpjzd9
 Vx7g==
MIME-Version: 1.0
X-Received: by 10.180.91.6 with SMTP id ca6mr4463775wib.77.1405669750597; Fri,
 18 Jul 2014 00:49:10 -0700 (PDT)
Received: by 10.216.190.194 with HTTP; Fri, 18 Jul 2014 00:49:10 -0700 (PDT)
Reply-To: araujo@FreeBSD.org
In-Reply-To: <CAOfEmZj5pk7bFB-PBqaJsi+bA73gbsUZzqggs4yEVky3_61NpQ@mail.gmail.com>
References: <CAOfEmZjmb1bdvn0gR6vD1WeP8o8g7KwXod4TE0iJfa=nicyeng@mail.gmail.com>
 <CAJ-Vmomt2QDXAVBVUk6m8oH4Pa5yErDdG6wWrP3X7+DW137xiA@mail.gmail.com>
 <CAOfEmZja8Tkv_xG8LyR5Nbj+Oga=vvdy=b3pxHqZi0-BBq25Uw@mail.gmail.com>
 <CAJ-VmomY2wP1EyVK4J16sGmMid=sJ9MPZrUY6pgcKGBDXm1T4g@mail.gmail.com>
 <CAOfEmZj5pk7bFB-PBqaJsi+bA73gbsUZzqggs4yEVky3_61NpQ@mail.gmail.com>
Date: Fri, 18 Jul 2014 15:49:10 +0800
Message-ID: <CAOfEmZhtZCettzD6pKQMHRiQE42nQmBuimOq28cA23R+Yyc13w@mail.gmail.com>
Subject: Re: [patch][lagg] - Set a better granularity and distribution on
 roundrobin protocol.
From: Marcelo Araujo <araujobsdport@gmail.com>
To: Adrian Chadd <adrian@freebsd.org>
Content-Type: multipart/mixed; boundary=f46d043c7e0c28456204fe72fed0
X-Content-Filtered-By: Mailman/MimeDel 2.1.18
Cc: FreeBSD Net <freebsd-net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Jul 2014 07:49:13 -0000

--f46d043c7e0c28456204fe72fed0
Content-Type: text/plain; charset=UTF-8

Hello guys,

I made a few changes to the lagg(4) patch. I also ran tests using igb(4),
ixgbe(4) and em(4); everything seems to have worked pretty well.

I'm wondering if anyone else could review it, and what I need to do to
see this patch committed.

Best Regards,


2014-06-24 10:40 GMT+08:00 Marcelo Araujo <araujobsdport@gmail.com>:

>
>
> 2014-06-24 6:54 GMT+08:00 Adrian Chadd <adrian@freebsd.org>:
>
> Hi,
>>
>> No, don't introduce out of order behaviour. Ever.
>
>
> Yes, it has out of order behavior; with my patch much less. I upload two
> pcap files and you can see by yourself, if you don't believe in what I'm
> talking about.
>
> Test done using: "iperf -s" and "iperf -c <ip> -i 1 -t 10".
>
> 1) Don't change the number of packets(default round robin behavior).
> http://people.freebsd.org/~araujo/lagg/lagg-nop.cap
> 8 out of order packets.
> Several SACKs.
>
> 2) Set the number of packets to 50.
> http://people.freebsd.org/~araujo/lagg/lagg.cap
> 0 out of order packets.
> Less SACKs.
>
>
>> You may not think
>> it's a problem for TCP, but UDP things and VPN things will start
>> getting very angry. There are VPN configurations out there that will
>> drop the VPN if frames are out of order.
>>
>
> I'm not thinking that will be a problem for TCP, but, in somehow it will
> be, less throughput as I showed before, and less SACK. About the VPN,
> please, tell me which softwares, and let me know where I can get a sample
> to make a testbed.
>
> However to be very honest, I don't believe anyone here when change
> something at network protocols will make this extensive testbed. It is
> almost impossible to predict what software it will works or not, and I
> don't believe anyone here has all these stuff in hands.
>
>
>>
>> The ixgbe driver is setting the flowid to the msix queue ID, rather
>> than a 32 bit unique flow id hash value for the flow. That makes it
>> hard to do traffic distribution where the flowid is available.
>>
>
> Thanks for the explanation.
>
>
>>
>> There's an lagg option to re-hash the mbuf rather than rely on the
>> flowid for outbound port choice - have you looked at using that? Did
>> that make any difference?
>>
>
> Yes, I set net.link.lagg.0.use_flowid to 0. It made a little
> difference to the default round robin implementation, but I still can't
> reach more than 5 Gbit/s. With my patch and the packets set to 50, it
> improved a bit too.
>
> So, thank you so much for all review, I don't know if you have time and a
> testbed to make a real test, as I'm doing. I would be happy if you or more
> people could make tests on that patch. Also, I have only ixgbe(4) to make
> tests, would appreciate if this patch could be tested with other NICs too.
>
> Best Regards,
>
> --
> Marcelo Araujo            (__)
> araujo@FreeBSD.org     \\\'',)
> http://www.FreeBSD.org   \/  \ ^
> Power To Server.         .\. /_)
>
>


-- 
Marcelo Araujo            (__)
araujo@FreeBSD.org     \\\'',)
http://www.FreeBSD.org   \/  \ ^
Power To Server.         .\. /_)

--f46d043c7e0c28456204fe72fed0
Content-Type: application/octet-stream; name="if_lagg-rr.patch"
Content-Disposition: attachment; filename="if_lagg-rr.patch"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_hxr7fodu0

SW5kZXg6IGlmX2xhZ2cuYwo9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBpZl9sYWdnLmMJKHJldmlzaW9uIDI2ODgz
MikKKysrIGlmX2xhZ2cuYwkod29ya2luZyBjb3B5KQpAQCAtMTg3LDYgKzE4NywxMCBAQAogU1lT
Q1RMX0lOVChfbmV0X2xpbmtfbGFnZywgT0lEX0FVVE8sIGRlZmF1bHRfZmxvd2lkX3NoaWZ0LCBD
VExGTEFHX1JXVFVOLAogICAgICZkZWZfZmxvd2lkX3NoaWZ0LCAwLAogICAgICJEZWZhdWx0IHNl
dHRpbmcgZm9yIGZsb3dpZCBzaGlmdCBmb3IgbG9hZCBzaGFyaW5nIik7CitzdGF0aWMgaW50IGxh
Z2dfcnJfcGFja2V0cyA9IDA7IC8qIERlZmF1bHQgdmFsdWUgZm9yIHVzaW5nIHJyX3BhY2tldHMg
Ki8KK1NZU0NUTF9JTlQoX25ldF9saW5rX2xhZ2csIE9JRF9BVVRPLCBycl9wYWNrZXRzLCBDVExG
TEFHX1JXLAorICAgICZsYWdnX3JyX3BhY2tldHMsIDAsCisgICAgIkhvdyBtYW55IHBhY2tldHMg
dG8gYmUgc2VuZCBwZXIgaW50ZXJmYWNlIik7CiAKIHN0YXRpYyBpbnQKIGxhZ2dfbW9kZXZlbnQo
bW9kdWxlX3QgbW9kLCBpbnQgdHlwZSwgdm9pZCAqZGF0YSkKQEAgLTE2ODcsMTQgKzE2OTEsNzMg
QEAKIHsKIAlzdHJ1Y3QgbGFnZ19wb3J0ICpscDsKIAl1aW50MzJfdCBwOworCXVpbnQzMl90IHAy
OworCXVpbnQzMl90IHBrdF9zeXNjdGxfY291bnQ7CisJaW50IGlmcF9jb3VudCA9IDE7CiAKIAlw
ID0gYXRvbWljX2ZldGNoYWRkXzMyKCZzYy0+c2Nfc2VxLCAxKTsKIAlwICU9IHNjLT5zY19jb3Vu
dDsKKworCXAyID0gYXRvbWljX2ZldGNoYWRkXzMyKCZzYy0+c2Nfc2VxLCAxKTsKKwlwMiAlPSBz
Yy0+c2NfY291bnQ7CisKIAlscCA9IFNMSVNUX0ZJUlNUKCZzYy0+c2NfcG9ydHMpOwotCXdoaWxl
IChwLS0pCi0JCWxwID0gU0xJU1RfTkVYVChscCwgbHBfZW50cmllcyk7CiAKIAkvKgorCSAqIElm
IHRoZXJlIGlzIG5vIHJlZmVyZW5jZSBmb3IgdGhlIElGUCwgd2UgbXVzdAorIAkgKiBjb3B5IGl0
IG5vdy4KKwkgKi8KKwlpZiAoc3RybGVuKHNjLT5zY19yZWZfaWZwKSA9PSAwKQorCQlzdHJuY3B5
KHNjLT5zY19yZWZfaWZwLCBscC0+bHBfaWZwLT5pZl94bmFtZSwgc2l6ZW9mKHNjLT5zY19yZWZf
aWZwKSk7CisgICAgICAgICAgICAgIAorCS8qCisJICogSWYgaWZwX2NvdW50IHdhcyBub3QgeWV0
IGluaXRpYWxpemVkLCB3ZSBtdXN0CisJICogaW5pdGlhbGl6ZSBub3cuCisJICovCisJaWYgKHNj
LT5zY19pZnBfY291bnQgPT0gMCkKKwkJc2MtPnNjX2lmcF9jb3VudCA9IDE7CisKKwkvKgorCSAq
IElmIHRoZSBzeXNjdGwgcnJfcGFja2V0cyBpcyBzZXQgdG8gMCwgd2UgbXVzdCB1c2UgdGhlCisJ
ICogcm91bmRyb2JpbiBhcyBpdCBpcywgb3Igb3RoZXJ3aXNlLCB3ZSBtdXN0IGFwcGx5IHRoZQor
CSAqIGdyYW51bGFyaXR5IGJldHdlZW4gdGhlIGludGVyZmFjZXMgdGhhdCBhcmUgcGFydCBvZiB0
aGUgZ3JvdXAuCisJICovCisJaWYgKCFsYWdnX3JyX3BhY2tldHMpIHsKKwkJd2hpbGUgKHAtLSkK
KwkJCWxwID0gU0xJU1RfTkVYVChscCwgbHBfZW50cmllcyk7CisJCWdvdG8gc2VuZF9tYnVmOwor
CX0gZWxzZSB7CisJCXBrdF9zeXNjdGxfY291bnQgPSBhdG9taWNfZmV0Y2hhZGRfMzIoJnNjLT5z
Y19wa3RfY291bnQsIDEpOworCQlpZiAocGt0X3N5c2N0bF9jb3VudCA9PSBsYWdnX3JyX3BhY2tl
dHMpIHsKKwkJCWlmIChzYy0+c2NfaWZwX2NvdW50IDw9IHNjLT5zY19jb3VudCkgeworCQkJCXdo
aWxlIChpZnBfY291bnQgPCBzYy0+c2NfaWZwX2NvdW50KSB7CisJCQkJCWxwID0gU0xJU1RfTkVY
VChscCwgbHBfZW50cmllcyk7CisJCQkJCWlmcF9jb3VudCsrOworCQkJCX0KKwkJCQlzYy0+c2Nf
aWZwX2NvdW50Kys7CisJCQkJaWYgKHNjLT5zY19pZnBfY291bnQgPiBzYy0+c2NfY291bnQpCisJ
CQkJCXNjLT5zY19pZnBfY291bnQgPSAwOworCQkJfQorCQkJc3RybmNweShzYy0+c2NfcmVmX2lm
cCwgbHAtPmxwX2lmcC0+aWZfeG5hbWUsIHNpemVvZihzYy0+c2NfcmVmX2lmcCkpOworCQkJc2Mt
PnNjX3BrdF9jb3VudCA9IDA7CisJCX0KKwl9CisKKwkvKgorCSAqIENoZWNrIGlmIHRoZSBjdXJy
ZW50IGludGVyZmFjZSB0byBiZSBlbnF1ZXVlIGlzIG5vdCB0aGUKKwkgKiBzYW1lIHVzZWQgaW4g
dGhlIGxhc3Qgcm91bmQuCisJICovCisJbHAgPSBTTElTVF9GSVJTVCgmc2MtPnNjX3BvcnRzKTsK
Kwl3aGlsZSAocDItLSkgeworCQlpZiAoc3RyY21wKGxwLT5scF9pZnAtPmlmX3huYW1lLCBzYy0+
c2NfcmVmX2lmcCkgPT0gMCkKKwkJCWJyZWFrOworCQllbHNlCisJCQlscCA9IFNMSVNUX05FWFQo
bHAsIGxwX2VudHJpZXMpOworCX0KKwlnb3RvIHNlbmRfbWJ1ZjsKKworc2VuZF9tYnVmOgorCS8q
CiAJICogQ2hlY2sgdGhlIHBvcnQncyBsaW5rIHN0YXRlLiBUaGlzIHdpbGwgcmV0dXJuIHRoZSBu
ZXh0IGFjdGl2ZQogCSAqIHBvcnQgaWYgdGhlIGxpbmsgaXMgZG93biBvciB0aGUgcG9ydCBpcyBO
VUxMLgogCSAqLwpJbmRleDogaWZfbGFnZy5oCj09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIGlmX2xhZ2cuaAkocmV2
aXNpb24gMjY4ODMyKQorKysgaWZfbGFnZy5oCSh3b3JraW5nIGNvcHkpCkBAIC0yMzIsNiArMjMy
LDkgQEAKIAlzdHJ1Y3Qgc3lzY3RsX29pZAkJKnNjX29pZDsJLyogc3lzY3RsIHRyZWUgb2lkICov
CiAJaW50CQkJCXVzZV9mbG93aWQ7CS8qIHVzZSBNX0ZMT1dJRCAqLwogCWludAkJCQlmbG93aWRf
c2hpZnQ7CS8qIHNoaWZ0IHRoZSBmbG93aWQgKi8KKwl1aW50MzJfdAkJCXNjX3BrdF9jb3VudDsg
LyogdXNlIGZvciBjb3VudCBwYWNrYXRlcyBwZXIgaWZwICovCisJaW50CQkJCXNjX2lmcF9jb3Vu
dDsgLyogY291bnRlciByZWZlcmVuY2Ugb2YgaW50ZXJmYWNlcyBvbiByciAqLworCWNoYXIJCQkJ
c2NfcmVmX2lmcFtJRk5BTVNJWl07IC8qIG5hbWUgb2YgdGhlIGlmcCAqLwogfTsKIAogc3RydWN0
IGxhZ2dfcG9ydCB7Cg==
--f46d043c7e0c28456204fe72fed0--

From owner-freebsd-net@FreeBSD.ORG  Fri Jul 18 08:03:58 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id AC4BDDBE;
 Fri, 18 Jul 2014 08:03:58 +0000 (UTC)
Received: from mail-qc0-x22a.google.com (mail-qc0-x22a.google.com
 [IPv6:2607:f8b0:400d:c01::22a])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 5A7E826BA;
 Fri, 18 Jul 2014 08:03:58 +0000 (UTC)
Received: by mail-qc0-f170.google.com with SMTP id c9so3137223qcz.29
 for <multiple recipients>; Fri, 18 Jul 2014 01:03:57 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=9ZLM3khD5OJzkQ2LG3bNwSSfKcCflqp89lF43c4fpbA=;
 b=NnzsvMkTB1SYl2bk37gIVrnGsQo+vRbGJvE/Cg2mdiFtJLT0/e4vPQz5wVVj3Iv0xB
 KSOVn3GQ9Fyi146+ut5L3wmWtOexUcQMDkHW9ebkD0PV6jtt8dWBvR0AdhIjq0uZ1B4Q
 r6aOBcHtqIKxPjygl8DHT3cu3FPAq2srjRrFaQES/1lb8u5p5YhkZOgkacrkLwNXda3P
 qxeiY7TTHEJIcJZjPF+ovRNo8Yjnmr0LSHqaB5j/TuvdTzDU2GV+mEtkTEQVPgkmL9Vu
 7g1gzSroxJKYRJOiCP/mey1/9VRIcZct9NZ+CaY5a577zEZGR5RnlWQYgOoR1Ejiveya
 cNuQ==
MIME-Version: 1.0
X-Received: by 10.140.38.169 with SMTP id t38mr4954791qgt.3.1405670635347;
 Fri, 18 Jul 2014 01:03:55 -0700 (PDT)
Sender: adrian.chadd@gmail.com
Received: by 10.224.202.193 with HTTP; Fri, 18 Jul 2014 01:03:55 -0700 (PDT)
In-Reply-To: <CAOfEmZhtZCettzD6pKQMHRiQE42nQmBuimOq28cA23R+Yyc13w@mail.gmail.com>
References: <CAOfEmZjmb1bdvn0gR6vD1WeP8o8g7KwXod4TE0iJfa=nicyeng@mail.gmail.com>
 <CAJ-Vmomt2QDXAVBVUk6m8oH4Pa5yErDdG6wWrP3X7+DW137xiA@mail.gmail.com>
 <CAOfEmZja8Tkv_xG8LyR5Nbj+Oga=vvdy=b3pxHqZi0-BBq25Uw@mail.gmail.com>
 <CAJ-VmomY2wP1EyVK4J16sGmMid=sJ9MPZrUY6pgcKGBDXm1T4g@mail.gmail.com>
 <CAOfEmZj5pk7bFB-PBqaJsi+bA73gbsUZzqggs4yEVky3_61NpQ@mail.gmail.com>
 <CAOfEmZhtZCettzD6pKQMHRiQE42nQmBuimOq28cA23R+Yyc13w@mail.gmail.com>
Date: Fri, 18 Jul 2014 01:03:55 -0700
X-Google-Sender-Auth: vvQfWXgrH5GdC8-IuckQYdtUeZU
Message-ID: <CAJ-VmomH6RqK92s1wO8C3w3nZTcV=qsgnOU7GFX2SxDU8uMysA@mail.gmail.com>
Subject: Re: [patch][lagg] - Set a better granularity and distribution on
 roundrobin protocol.
From: Adrian Chadd <adrian@freebsd.org>
To: araujo@freebsd.org
Content-Type: text/plain; charset=UTF-8
Cc: FreeBSD Net <freebsd-net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Jul 2014 08:03:58 -0000

Hi,

I strongly object to having a round-robin method like this. Yes, we
won't get > 1 link of bandwidth out of a single stream, but you're
showing that you can't even get that. There's still something else
weird going on.

I'm sorry, but introducing more out of order possibilities is being a
bad network citizen.
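
For contrast, the order-preserving alternative is to pick the outbound
port from a hash of the flow, so every packet of a given flow leaves on
the same link. The sketch below illustrates that idea only: struct
flow_key and flow_port() are hypothetical, and FNV-1a is used as a
stand-in hash; it is not the hash lagg(4) itself uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow identifier (the usual 5-tuple). */
struct flow_key {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
	uint8_t proto;
};

/* Fold the nbytes low-order bytes of v into a 32-bit FNV-1a hash. */
static uint32_t
fnv1a_mix(uint32_t h, uint32_t v, size_t nbytes)
{
	size_t i;

	for (i = 0; i < nbytes; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 16777619u;		/* FNV-1a 32-bit prime */
	}
	return (h);
}

/* Map a flow to a port index in [0, nports). */
static uint32_t
flow_port(const struct flow_key *k, uint32_t nports)
{
	uint32_t h = 2166136261u;	/* FNV-1a 32-bit offset basis */

	assert(k != NULL && nports > 0);
	h = fnv1a_mix(h, k->src_ip, 4);
	h = fnv1a_mix(h, k->dst_ip, 4);
	h = fnv1a_mix(h, k->src_port, 2);
	h = fnv1a_mix(h, k->dst_port, 2);
	h = fnv1a_mix(h, k->proto, 1);
	return (h % nports);
}
```

The trade-off is the one stated above: a single stream never gets more
than one link of bandwidth, but its packets cannot be reordered by the
aggregate.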


-a


On 18 July 2014 00:49, Marcelo Araujo <araujobsdport@gmail.com> wrote:
> Hello guys,
>
> I made few changes on the lagg(4) patch. Also, I made tests using igb(4),
> ixgbe(4) and em(4); seems everything worked pretty well.
>
> I'm wondering if anyone else could make a review, and what I need to do, to
> see this patch committed.
>
> Best Regards,
>
>
>
>
> 2014-06-24 10:40 GMT+08:00 Marcelo Araujo <araujobsdport@gmail.com>:
>
>>
>>
>> 2014-06-24 6:54 GMT+08:00 Adrian Chadd <adrian@freebsd.org>:
>>
>>> Hi,
>>>
>>> No, don't introduce out of order behaviour. Ever.
>>
>>
>> Yes, it has out of order behavior; with my patch much less. I upload two
>> pcap files and you can see by yourself, if you don't believe in what I'm
>> talking about.
>>
>> Test done using: "iperf -s" and "iperf -c <ip> -i 1 -t 10".
>>
>> 1) Don't change the number of packets(default round robin behavior).
>> http://people.freebsd.org/~araujo/lagg/lagg-nop.cap
>> 8 out of order packets.
>> Several SACKs.
>>
>> 2) Set the number of packets to 50.
>> http://people.freebsd.org/~araujo/lagg/lagg.cap
>> 0 out of order packets.
>> Less SACKs.
>>
>>>
>>> You may not think
>>> it's a problem for TCP, but UDP things and VPN things will start
>>> getting very angry. There are VPN configurations out there that will
>>> drop the VPN if frames are out of order.
>>
>>
>> I'm not thinking that will be a problem for TCP, but, in somehow it will
>> be, less throughput as I showed before, and less SACK. About the VPN,
>> please, tell me which softwares, and let me know where I can get a sample to
>> make a testbed.
>>
>> However to be very honest, I don't believe anyone here when change
>> something at network protocols will make this extensive testbed. It is
>> almost impossible to predict what software it will works or not, and I don't
>> believe anyone here has all these stuff in hands.
>>
>>>
>>>
>>> The ixgbe driver is setting the flowid to the msix queue ID, rather
>>> than a 32 bit unique flow id hash value for the flow. That makes it
>>> hard to do traffic distribution where the flowid is available.
>>
>>
>> Thanks for the explanation.
>>
>>>
>>>
>>> There's an lagg option to re-hash the mbuf rather than rely on the
>>> flowid for outbound port choice - have you looked at using that? Did
>>> that make any difference?
>>
>>
>> Yes, I set net.link.lagg.0.use_flowid to 0. It made a little
>> difference to the default round robin implementation, but I still can't
>> reach more than 5 Gbit/s. With my patch and the packets set to 50, it
>> improved a bit too.
>>
>> So, thank you so much for all review, I don't know if you have time and a
>> testbed to make a real test, as I'm doing. I would be happy if you or more
>> people could make tests on that patch. Also, I have only ixgbe(4) to make
>> tests, would appreciate if this patch could be tested with other NICs too.
>>
>> Best Regards,
>>
>> --
>> Marcelo Araujo            (__)
>> araujo@FreeBSD.org     \\\'',)
>> http://www.FreeBSD.org   \/  \ ^
>> Power To Server.         .\. /_)
>
>
>
>
> --
>
> --
> Marcelo Araujo            (__)
> araujo@FreeBSD.org     \\\'',)
> http://www.FreeBSD.org   \/  \ ^
> Power To Server.         .\. /_)

From owner-freebsd-net@FreeBSD.ORG  Fri Jul 18 18:18:38 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C451266B;
 Fri, 18 Jul 2014 18:18:38 +0000 (UTC)
Received: from mail-pd0-x229.google.com (mail-pd0-x229.google.com
 [IPv6:2607:f8b0:400e:c02::229])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 89CE82FB4;
 Fri, 18 Jul 2014 18:18:38 +0000 (UTC)
Received: by mail-pd0-f169.google.com with SMTP id y10so5473823pdj.28
 for <multiple recipients>; Fri, 18 Jul 2014 11:18:38 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=message-id:date:from:user-agent:mime-version:to:cc:subject
 :references:in-reply-to:content-type:content-transfer-encoding;
 bh=I/wfJBWA56SOxl24a2TyVa4MKrrU78ZtbWfQoVuJHis=;
 b=VPW19GBZS/FjnqX3iMGzXhsAQXNKXiUBI6aZ52hFg18l8h29hHw8GPAXLaFcjggshW
 W2tqtiaWDA5U60pJAyCUi78M4aFp16gtWzc3QsCB7p+3+Dpd6ajlYPDSYvhRve8/5Po5
 GImt9R/KsJcO7mSZ1KXl1yAvxaVb5jfIv+upiLtQnwbl/40dwkXwSXB4hm/WG/pPomTO
 AgduyZaPCb9GjiKQHLLJOluNZryZwegvWixraiMlKls8QneKdn0rCHDhp3IP+erbcI7t
 KzPDmVfDMBCeggORr/5VAQq9oq0oMowNX4e3p/2nK8DaKO1jCg+QkXmzyviqZ0DK4SAR
 OQQg==
X-Received: by 10.69.12.33 with SMTP id en1mr7571950pbd.66.1405707518042;
 Fri, 18 Jul 2014 11:18:38 -0700 (PDT)
Received: from [10.192.166.0] (stargate.chelsio.com. [67.207.112.58])
 by mx.google.com with ESMTPSA id xh10sm25630043pac.24.2014.07.18.11.18.36
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Fri, 18 Jul 2014 11:18:37 -0700 (PDT)
Message-ID: <53C964F7.8060503@gmail.com>
Date: Fri, 18 Jul 2014 11:18:31 -0700
From: Navdeep Parhar <nparhar@gmail.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: araujo@FreeBSD.org, Adrian Chadd <adrian@freebsd.org>
Subject: Re: [patch][lagg] - Set a better granularity and distribution on
 roundrobin protocol.
References: <CAOfEmZjmb1bdvn0gR6vD1WeP8o8g7KwXod4TE0iJfa=nicyeng@mail.gmail.com>
 <CAJ-Vmomt2QDXAVBVUk6m8oH4Pa5yErDdG6wWrP3X7+DW137xiA@mail.gmail.com>
 <CAOfEmZja8Tkv_xG8LyR5Nbj+Oga=vvdy=b3pxHqZi0-BBq25Uw@mail.gmail.com>
 <CAJ-VmomY2wP1EyVK4J16sGmMid=sJ9MPZrUY6pgcKGBDXm1T4g@mail.gmail.com>
 <CAOfEmZj5pk7bFB-PBqaJsi+bA73gbsUZzqggs4yEVky3_61NpQ@mail.gmail.com>
 <CAOfEmZhtZCettzD6pKQMHRiQE42nQmBuimOq28cA23R+Yyc13w@mail.gmail.com>
In-Reply-To: <CAOfEmZhtZCettzD6pKQMHRiQE42nQmBuimOq28cA23R+Yyc13w@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: FreeBSD Net <freebsd-net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Jul 2014 18:18:38 -0000

On 07/18/14 00:49, Marcelo Araujo wrote:
> Hello guys,
> 
> I made few changes on the lagg(4) patch. Also, I made tests using igb(4),
> ixgbe(4) and em(4); seems everything worked pretty well.
> 
> I'm wondering if anyone else could make a review, and what I need to do, to
> see this patch committed.

Deliberately putting out-of-order packets on the wire is never a good
idea.  This would count as a serious regression in lagg(4) imho.

Regards,
Navdeep


> 
> Best Regards,
> 
> 
> 
> 
> 2014-06-24 10:40 GMT+08:00 Marcelo Araujo <araujobsdport@gmail.com>:
> 
>>
>>
>> 2014-06-24 6:54 GMT+08:00 Adrian Chadd <adrian@freebsd.org>:
>>
>> Hi,
>>>
>>> No, don't introduce out of order behaviour. Ever.
>>
>>
>> Yes, it has out of order behavior; with my patch much less. I upload two
>> pcap files and you can see by yourself, if you don't believe in what I'm
>> talking about.
>>
>> Test done using: "iperf -s" and "iperf -c <ip> -i 1 -t 10".
>>
>> 1) Don't change the number of packets(default round robin behavior).
>> http://people.freebsd.org/~araujo/lagg/lagg-nop.cap
>> 8 out of order packets.
>> Several SACKs.
>>
>> 2) Set the number of packets to 50.
>> http://people.freebsd.org/~araujo/lagg/lagg.cap
>> 0 out of order packets.
>> Less SACKs.
>>
>>
>>> You may not think
>>> it's a problem for TCP, but UDP things and VPN things will start
>>> getting very angry. There are VPN configurations out there that will
>>> drop the VPN if frames are out of order.
>>>
>>
>> I'm not thinking that will be a problem for TCP, but, in somehow it will
>> be, less throughput as I showed before, and less SACK. About the VPN,
>> please, tell me which softwares, and let me know where I can get a sample
>> to make a testbed.
>>
>> However to be very honest, I don't believe anyone here when change
>> something at network protocols will make this extensive testbed. It is
>> almost impossible to predict what software it will works or not, and I
>> don't believe anyone here has all these stuff in hands.
>>
>>
>>>
>>> The ixgbe driver is setting the flowid to the msix queue ID, rather
>>> than a 32 bit unique flow id hash value for the flow. That makes it
>>> hard to do traffic distribution where the flowid is available.
>>>
>>
>> Thanks for the explanation.
>>
>>
>>>
>>> There's an lagg option to re-hash the mbuf rather than rely on the
>>> flowid for outbound port choice - have you looked at using that? Did
>>> that make any difference?
>>>
>>
>> Yes, I set net.link.lagg.0.use_flowid to 0. It made a little
>> difference to the default round robin implementation, but I still can't
>> reach more than 5 Gbit/s. With my patch and the packets set to 50, it
>> improved a bit too.
>>
>> So, thank you so much for all review, I don't know if you have time and a
>> testbed to make a real test, as I'm doing. I would be happy if you or more
>> people could make tests on that patch. Also, I have only ixgbe(4) to make
>> tests, would appreciate if this patch could be tested with other NICs too.
>>
>> Best Regards,
>>
>> --
>> Marcelo Araujo            (__)
>> araujo@FreeBSD.org     \\\'',)http://www.FreeBSD.org <http://www.freebsd.org/>   \/  \ ^
>> Power To Server.         .\. /_)
>>
>>
> 
> 
> 
> 
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
> 


From owner-freebsd-net@FreeBSD.ORG  Fri Jul 18 18:28:42 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A17A6944;
 Fri, 18 Jul 2014 18:28:42 +0000 (UTC)
Received: from mail-qc0-x234.google.com (mail-qc0-x234.google.com
 [IPv6:2607:f8b0:400d:c01::234])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 52B4520EC;
 Fri, 18 Jul 2014 18:28:42 +0000 (UTC)
Received: by mail-qc0-f180.google.com with SMTP id l6so3634396qcy.39
 for <multiple recipients>; Fri, 18 Jul 2014 11:28:41 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=fTGVmHvqa3TrcVyq4bV8IPKl+EgHkjxKJhyKv8MCULo=;
 b=Z3EC9Sfr27sahja6EjoZPtTZrVqzgPv4XZ4K5HByGkZmlEVTAqkyEAv/k+nm4nwsHl
 nvbVEGerjLzCwR4qiV8GroO2aJoOgk+k0M1S9AveZ6b60cIKReiBS2iKgO2nKmGdgDBt
 8+F0F42gz+/sVFeHiIoRdEr8CuQ8UXLvWpCnHvoggDUenu7nhhxk9JdtkBeJnxK/NIKN
 AutFQvjObs7VM4UkP7dQd4eU8P8v6hphAkjgb0RHBRa8pmlH9CWz0NfyD85r3J8LEfVp
 P1hLeoAQa/qBLM+lvztdAPkYL5GXhWiYLOZYqCh8jAkCa3f8poLCbg20+eiAZLst6eyY
 TrgA==
MIME-Version: 1.0
X-Received: by 10.140.90.7 with SMTP id w7mr10544705qgd.52.1405708121024; Fri,
 18 Jul 2014 11:28:41 -0700 (PDT)
Received: by 10.96.73.39 with HTTP; Fri, 18 Jul 2014 11:28:40 -0700 (PDT)
In-Reply-To: <CAJ-VmokEiZMpdfNjs+-C9pmRcjOOjjNGTvM88muh940sr7SmPw@mail.gmail.com>
References: <CALCpEUE7OtbXjVTk2C8+V7fjOKutuNq04BTo0SN42YEgX81k-Q@mail.gmail.com>
 <CAJ-VmokEiZMpdfNjs+-C9pmRcjOOjjNGTvM88muh940sr7SmPw@mail.gmail.com>
Date: Fri, 18 Jul 2014 11:28:40 -0700
Message-ID: <CALCpEUE-vebmaGSK5aGM+3q5YqzXkn1P=St7R8G_ztmHmgUBBA@mail.gmail.com>
Subject: Re: UDP sendto() returning ENOBUFS - "No buffer space available"
From: hiren panchasara <hiren.panchasara@gmail.com>
To: Adrian Chadd <adrian@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Cc: "freebsd-net@freebsd.org" <net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Jul 2014 18:28:42 -0000

On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd <adrian@freebsd.org> wrote:
> Hi!
>
> So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() ->
> udp_output() -> ip_output()
>
> udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output
> can also return ENOBUFS.
>
> it doesn't look like the socket code (eg sosend_dgram()) is doing any
> buffering - it's just copying the frame and stuffing it up to the
> driver. No queuing involved before the NIC.

Right. Thanks for confirming.
>
> So a _well behaved_ driver will return ENOBUFS _and_ not queue the
> frame. However, it's entirely plausible that the driver isn't well
> behaved - the intel drivers screwed up here and there with transmit
> queue and failure to queue vs failure to transmit.
>
> So yeah, try tweaking the tx ring descriptor for the driver you're
> using and see how big a bufring it's allocating.

Yes, so I am dealing with Broadcom BCM5706/BCM5708 Gigabit Ethernet,
i.e. bce(4).

I bumped up tx_pages from 2 (default) to 8 where each page is 255
buffer descriptors.

I am seeing quite nice improvement on stable/10 where I can send
*more* stuff :-)

cheers,
Hiren

From owner-freebsd-net@FreeBSD.ORG  Fri Jul 18 21:02:59 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 9D378427
 for <net@freebsd.org>; Fri, 18 Jul 2014 21:02:59 +0000 (UTC)
Received: from mail-qc0-x234.google.com (mail-qc0-x234.google.com
 [IPv6:2607:f8b0:400d:c01::234])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 5FFAA2F06
 for <net@freebsd.org>; Fri, 18 Jul 2014 21:02:59 +0000 (UTC)
Received: by mail-qc0-f180.google.com with SMTP id l6so3782550qcy.39
 for <net@freebsd.org>; Fri, 18 Jul 2014 14:02:58 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:date:message-id:subject:from:to:cc:content-type;
 bh=MCOdsC3I7Tiw25AoXctlCaUo1MQlIWqAHd/ZPpKBA6Y=;
 b=D17lmimicmB/zptIOv+X+eMq3C143Y/PgLtx7f8Z1V+/ZIFoLspQ2HELFXCiOMecoG
 Oc1xniIIVPeFQQNGjhKyHHnWoZ30vPcyg8SHmiT8IHIUqyzJzLO2zESbzozOaSC37/bi
 4O1xgDvj5HBq528ZSLeF+uj7LD/cP120IPu+qqhOdS+9GEAm2RRopwtIWOGksC2N3Vh2
 /pGfm4aboRYx1A5cHmSw8tApeX2yAIHGV3obULd9sm98GrWJ11M1Ypn4bniMpcZ/Mwa3
 H9sEqNBky1P+d+HInK5CAjF2SJKC5M5oNN7WiDwYpVN77elHTrtXHirO7kchTra0Nagq
 r3jA==
MIME-Version: 1.0
X-Received: by 10.224.15.72 with SMTP id j8mr13333873qaa.8.1405717378548; Fri,
 18 Jul 2014 14:02:58 -0700 (PDT)
Received: by 10.96.25.164 with HTTP; Fri, 18 Jul 2014 14:02:58 -0700 (PDT)
Date: Fri, 18 Jul 2014 14:02:58 -0700
Message-ID: <CAEsLtnn3T3OH0OSFhcNNn8c1PZ07fP8rNX_fwm0S0-WWtMiJGA@mail.gmail.com>
Subject: Error building netmap for centOS 6.5
From: Morgan Yang <morphyno@gmail.com>
To: net@freebsd.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.18
Cc: rizzo@iet.unipi.it
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Jul 2014 21:02:59 -0000

Hi:

I downloaded the latest netmap repo from git and attempted to compile it.

git clone https://code.google.com/p/netmap/
cd netmap/LINUX
make

I get the following errors

[devusr@testbox LINUX]$ make
LIN_VER 20620
---- Building from /lib/modules/2.6.32-431.20.3.el6.x86_64/build/drivers/net
---- copying e1000 e1000e r8169.c ---
  From /lib/modules/2.6.32-431.20.3.el6.x86_64/build/drivers/net :
    drwxr-xr-x. 2 root root 4096 Jul 16 16:56 e1000/
    drwxr-xr-x. 2 root root 4096 Jul 16 16:56 e1000e/
** patch with diff--e1000--20620--99999
The text leading up to this was:
--------------------------
|diff --git a/e1000/e1000_main.c b/e1000/e1000_main.c
|index bcd192c..5de7009 100644
|--- a/e1000/e1000_main.c
|+++ b/e1000/e1000_main.c
--------------------------
No file to patch.  Skipping patch.
9 out of 9 hunks ignored
** patch with diff--e1000e--20620--20623
The text leading up to this was:
--------------------------
|diff --git a/e1000e/netdev.c b/e1000e/netdev.c
|index fad8f9e..50f74e2 100644
|--- a/e1000e/netdev.c
|+++ b/e1000e/netdev.c
--------------------------
No file to patch.  Skipping patch.
8 out of 8 hunks ignored
** patch with diff--r8169.c--20620--20625
The text leading up to this was:
--------------------------
|diff --git a/r8169.c b/r8169.c
|index 0fe2fc9..efee0a4 100644
|--- a/r8169.c
|+++ b/r8169.c
--------------------------
No file to patch.  Skipping patch.
9 out of 9 hunks ignored
Building the following drivers: e1000 e1000e r8169.c
make -C /lib/modules/2.6.32-431.20.3.el6.x86_64/build
M=/home/devusr/Documents/netmap/LINUX CONFIG_NETMAP=m CONFIG_E1000=m
CONFIG_E1000E=m CONFIG_IXGBE=m CONFIG_IGB=m CONFIG_BNX2X=m CONFIG_MLX4=m
CONFIG_VIRTIO_NET=m \
                EXTRA_CFLAGS='-I/home/devusr/Documents/netmap/LINUX
-I/home/devusr/Documents/netmap/LINUX/../sys
-I/home/devusr/Documents/netmap/LINUX/../sys/dev -DCONFIG_NETMAP
-Wno-unused-but-set-variable'                       \
                O_DRIVERS="e1000/ e1000e/" modules
make[1]: Entering directory `/usr/src/kernels/2.6.32-431.20.3.el6.x86_64'
  CC [M]  /home/devusr/Documents/netmap/LINUX/netmap.o
/home/devusr/Documents/netmap/LINUX/../sys/dev/netmap/netmap.c: In function
‘netmap_attach’:
/home/devusr/Documents/netmap/LINUX/../sys/dev/netmap/netmap.c:2255: error:
‘struct ethtool_ops’ has no member named ‘set_channels’
make[2]: *** [/home/devusr/Documents/netmap/LINUX/netmap.o] Error 1
make[1]: *** [_module_/home/devusr/Documents/netmap/LINUX] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.32-431.20.3.el6.x86_64'
make: *** [build] Error 2

Has anyone come across this issue before?

Much Thanks
Morgan Yang

From owner-freebsd-net@FreeBSD.ORG  Fri Jul 18 22:20:24 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 4AA518A6;
 Fri, 18 Jul 2014 22:20:24 +0000 (UTC)
Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au
 [211.29.132.249])
 by mx1.freebsd.org (Postfix) with ESMTP id CAECD2579;
 Fri, 18 Jul 2014 22:20:23 +0000 (UTC)
Received: from c122-106-147-133.carlnfd1.nsw.optusnet.com.au
 (c122-106-147-133.carlnfd1.nsw.optusnet.com.au [122.106.147.133])
 by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 33EC110C8258;
 Sat, 19 Jul 2014 06:40:16 +1000 (EST)
Date: Sat, 19 Jul 2014 06:40:14 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: hiren panchasara <hiren.panchasara@gmail.com>
Subject: Re: UDP sendto() returning ENOBUFS - "No buffer space available"
In-Reply-To: <CALCpEUE-vebmaGSK5aGM+3q5YqzXkn1P=St7R8G_ztmHmgUBBA@mail.gmail.com>
Message-ID: <20140719053318.I15959@besplex.bde.org>
References: <CALCpEUE7OtbXjVTk2C8+V7fjOKutuNq04BTo0SN42YEgX81k-Q@mail.gmail.com>
 <CAJ-VmokEiZMpdfNjs+-C9pmRcjOOjjNGTvM88muh940sr7SmPw@mail.gmail.com>
 <CALCpEUE-vebmaGSK5aGM+3q5YqzXkn1P=St7R8G_ztmHmgUBBA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=dZS5gxne c=1 sm=1 tr=0
 a=7NqvjVvQucbO2RlWB8PEog==:117 a=PO7r1zJSAAAA:8 a=qwo5vUNQll0A:10
 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=6I5d2MoRAAAA:8
 a=uUY9mpc2aE6cjnkfEgIA:9 a=qYTI4x4OGrQ0lSoH:21 a=OaoyB03c6sN3E27u:21
 a=CjuIK1q_8ugA:10 a=SV7veod9ZcQA:10
Cc: Adrian Chadd <adrian@freebsd.org>,
 "freebsd-net@freebsd.org" <net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Jul 2014 22:20:24 -0000

On Fri, 18 Jul 2014, hiren panchasara wrote:

> On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd <adrian@freebsd.org> wrote:
>> Hi!
>>
>> So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() ->
>> udp_output() -> ip_output()
>>
>> udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output
>> can also return ENOBUFS.
>>
>> it doesn't look like the socket code (eg sosend_dgram()) is doing any
>> buffering - it's just copying the frame and stuffing it up to the
>> driver. No queuing involved before the NIC.
>
> Right. Thanks for confirming.

Most buffering should be in ifq above the NIC.  For UDP, I think
udp_output() puts buffers on the ifq and calls the driver for every
one, but the driver shouldn't do anything for most calls.  The
driver can't possibly do anything if its ring buffer is full, and
shouldn't do anything if it is nearly full.  Buffers accumulate in
the ifq until the driver gets around to them or the queue fills up.
Most ENOBUFS errors are for when it fills up.  It can very easily
fill up, especially since it is too small in most configurations.
Just loop calling sendto().  This will fill the ifq almost
instantly unless the hardware is faster than the software.
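
To make the failure mode concrete, here is a toy model of a bounded
send queue with tail drop.  It is only a sketch: the struct and function
names are invented for illustration and are not the kernel's ifq code.
Enqueueing fails with ENOBUFS once the queue holds maxlen packets, and
space only reappears when the driver side drains some.

```c
#include <assert.h>
#include <errno.h>

/* Toy model of a bounded interface send queue (not kernel code). */
struct toy_ifq {
	int len;	/* packets currently queued */
	int maxlen;	/* capacity, e.g. 50 */
};

/* Enqueue one packet; fail with ENOBUFS when the queue is full. */
int
toy_enqueue(struct toy_ifq *q)
{
	assert(q->maxlen > 0);
	if (q->len >= q->maxlen)
		return (ENOBUFS);
	q->len++;
	return (0);
}

/* Driver side: move up to n packets from the queue to the tx ring. */
int
toy_drain(struct toy_ifq *q, int n)
{
	int moved = (n < q->len) ? n : q->len;

	q->len -= moved;
	return (moved);
}
```

A sender that calls toy_enqueue() in a tight loop gets ENOBUFS as soon
as it is maxlen packets ahead of toy_drain().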

>> So a _well behaved_ driver will return ENOBUFS _and_ not queue the
>> frame. However, it's entirely plausible that the driver isn't well
>> behaved - the intel drivers screwed up here and there with transmit
>> queue and failure to queue vs failure to transmit.

No, the driver doesn't have much control over the ifq.

>> So yeah, try tweaking the tx ring descriptor for the driver you're
>> using and see how big a bufring it's allocating.
>
> Yes, so I am dealing with Broadcom BCM5706/BCM5708 Gigabit Ethernet,
> i.e. bce(4).
>
> I bumped up tx_pages from 2 (default) to 8 where each page is 255
> buffer descriptors.
>
> I am seeing quite nice improvement on stable/10 where I can send
> *more* stuff :-)

255 is not many.  I am most familiar with bge where there is a single
tx ring with 511 or 512 buffer descriptors (some bge's have more, but
this is unportable and was not supported last time I looked.  The
extras might be only for input).  One of my bge's can do 640 kpps with
tiny packets (only 80 kpps with normal packets) and the other only 200
(?) kpps (both should be limited mainly by the PCI bus, but the slow
one is limited by it being a dumbed down 5705"plus").  At 640 kpps,
it takes 800 microseconds to transmit 512 packets.  (There is 1 packet
per buffer descriptor for small packets.)
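
The 800 microsecond figure follows from the ring size and the packet
rate.  A small sketch of that arithmetic, assuming one packet per buffer
descriptor and a steady rate (the function name is made up for
illustration):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time in microseconds for the NIC to drain npackets queued packets
 * at a steady rate of pps packets per second.
 * Hypothetical helper, not part of any driver.
 */
uint64_t
drain_time_us(uint64_t npackets, uint64_t pps)
{
	assert(pps > 0);
	return (npackets * 1000000ULL / pps);
}
```

drain_time_us(512, 640000) evaluates to 800.  At 80 kpps the same 512
packets would last 6400 microseconds.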

Considerable buffering in ifq is needed to prevent the transmitter
running dry whenever the application stops generating packets for more
than 800 microseconds for some reason, but the default buffering is
stupidly small.  The default is given by net.link.ifqmaxlen and some
corresponding macros, and is still just 50.  50 was enough for 1 Mbps
ethernet and perhaps even for 10 Mbps, but is now too small.  Most
drivers don't use it, but use their own too-small value.  bge uses
just its own ring buffer size of 511.  I use 10000 or 40000 depending
on hz:

% diff -u2 if_bge.c~ if_bge.c
% --- if_bge.c~	2012-03-13 02:13:48.144002000 +0000
% +++ if_bge.c	2012-03-13 02:13:50.123023000 +0000
% @@ -3315,5 +3316,6 @@
%  	ifp->if_start = bge_start;
%  	ifp->if_init = bge_init;
% -	ifp->if_snd.ifq_drv_maxlen = BGE_TX_RING_CNT - 1;
% +	ifp->if_snd.ifq_drv_maxlen = BGE_TX_RING_CNT +
% +	    imax(4 * tick, 10000) / 1;
%  	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
%  	IFQ_SET_READY(&ifp->if_snd);
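
In the kernel, tick is the number of microseconds per clock tick
(1000000 / hz), so the amount the diff above adds to BGE_TX_RING_CNT
depends on hz.  A userland sketch of the same expression, with imax()
reimplemented locally and function names chosen only for illustration:

```c
#include <assert.h>

/* Userland stand-in for the kernel's imax(). */
int
imax_sketch(int a, int b)
{
	return (a > b ? a : b);
}

/*
 * Extra ifq slots added on top of BGE_TX_RING_CNT by the expression
 * imax(4 * tick, 10000) / 1, for a given kernel clock rate hz.
 */
int
extra_ifq_slots(int hz)
{
	int tick;

	assert(hz > 0);
	tick = 1000000 / hz;	/* microseconds per tick */
	return (imax_sketch(4 * tick, 10000) / 1);
}
```

This gives 40000 for hz = 100 and 10000 for hz = 1000.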

40000 is what is needed for 4 tick's worth of buffering at hz = 100.
40000 is far too large where 50 is far too small, but something like
it is needed when hz is large due to another problem: select() on
the ENOBUFS condition is broken (unsupported), so when sendto()
returns ENOBUFS there is no way for the application to tell how
long it should wait before retrying.  If it wants to burn CPU then
it can spin calling sendto().  Otherwise, it should sleep, but
with a sleep granularity of 1 tick this requires several ticks worth
of buffering to avoid the transmitter running dry.  Large queue lengths
give a large latency for packets at the end of the queue and give no
chance of the working set fitting in an Ln cache for small n.
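
An application that does not want to spin therefore has to sleep and
retry blindly.  A minimal sketch of such a loop follows.  The function
name, the retry limit and the 1 ms requested sleep are arbitrary choices
for illustration, not recommended values, and the real sleep is subject
to the kernel's timer granularity.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * sendto() wrapper that sleeps and retries on ENOBUFS.
 * Returns the sendto() result; on failure returns -1 with errno set
 * (ENOBUFS if the retries ran out).  Hypothetical helper.
 */
ssize_t
sendto_retry(int s, const void *buf, size_t len, int flags,
    const struct sockaddr *to, socklen_t tolen, int max_retries)
{
	const struct timespec pause = { 0, 1000000 };	/* 1 ms requested */
	ssize_t n;
	int tries;

	assert(max_retries >= 0);
	for (tries = 0; ; tries++) {
		n = sendto(s, buf, len, flags, to, tolen);
		/* Success, a different error, or out of retries: stop. */
		if (n >= 0 || errno != ENOBUFS || tries >= max_retries)
			return (n);
		/* Queue full: give the transmitter time to drain it. */
		(void)nanosleep(&pause, NULL);
	}
}
```

Errors other than ENOBUFS are returned to the caller immediately, so
only the queue-full case pays for the sleep.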

The precise stupidly small value of (tx_ring_count - 1) for the ifq
length seems to be for no good reason.  Subtracting 1 is apparently
to increase the chance that all packets in the ifq can be fitted into
the tx ring.  But this is silly since the ifq packet count is in
different units to the buffer descriptor count.  For normal-size
packets, there are 2 descriptors per packet.  So in the usual case
where the ifq is full, only about half of it can be moved to the tx
ring.  And this is good since it gives a little more buffering.
Otherwise, the effective buffering is just what is in the tx ring,
since none is left in the ifq after transferring everything.
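
The units mismatch can be put in numbers.  This sketch assumes an empty
tx ring and ignores any descriptors a driver keeps in reserve.  The
function name is invented for illustration.

```c
#include <assert.h>

/*
 * Number of whole packets from the ifq that fit into an empty tx
 * ring when each packet consumes desc_per_pkt buffer descriptors.
 * Hypothetical helper for illustration only.
 */
int
pkts_moved_to_ring(int ifq_len, int ring_desc, int desc_per_pkt)
{
	int fit;

	assert(desc_per_pkt > 0);
	fit = ring_desc / desc_per_pkt;
	return (fit < ifq_len ? fit : ifq_len);
}
```

With a 512-descriptor ring, a full ifq of 511 packets and 2 descriptors
per packet, 256 packets move to the ring and 255 stay queued behind it.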

(tx_ring_count - 1) is used by many other drivers.  E.g., fxp.  fxp
is only 100 Mbps and its tx_ring_count is 128.  128 is a bit larger
than 50 but not enough.  Scaling down my 40000 gives 4000 for hz = 100
and 400 for hz = 1000.  I never worried about this problem at 100 Mbps.

Changing from 2 rings of length 255 to 8 of length 255 shouldn't make
much difference if other things are configured correctly.  It doesn't
matter much if the buffering is in ifq or in ring buffers.  Multiple
ring buffers, filled in advance of the active one running dry so that
the next one can be switched to quickly, mainly give you a tiny latency
optimization.  I get similar optimizations more in software for bge,
by doing watermark stuff.  The boundary between the ifq and the tx
ring also acts as a primitive watermark.  With watermarks, it is best
to not divide up the buffer evenly, but that is what the
(tx_ring_count - 1) sizing for the ifq sort of does.

Bruce

From owner-freebsd-net@FreeBSD.ORG  Sat Jul 19 02:06:24 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id EFCA7844;
 Sat, 19 Jul 2014 02:06:24 +0000 (UTC)
Received: from mail-wi0-x22b.google.com (mail-wi0-x22b.google.com
 [IPv6:2a00:1450:400c:c05::22b])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 5CBB227EC;
 Sat, 19 Jul 2014 02:06:24 +0000 (UTC)
Received: by mail-wi0-f171.google.com with SMTP id hi2so1680693wib.10
 for <multiple recipients>; Fri, 18 Jul 2014 19:06:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:reply-to:in-reply-to:references:date:message-id
 :subject:from:to:cc:content-type;
 bh=XSIJg3/nRFJDGMbWLISWTRGC9h4+O32W0b+40fqgf1g=;
 b=w1a55eeur52wJUXMGz4nd474nIl1oFcXObgdcwBOsJ6q3krG/DisPap8zCqnti7i23
 oXG8jODH7xeD0FA4YXcLPxBS7Z7BSbgyM2FJR8bM7lrCeKtFUVGqXlmc8fgV77gL5/bj
 ScUJ9/YOYNdQNmhAhuAIs7V8riMVXEbkkWjRyBA2DJFpEcOhQU35BKtHRIMH83NMpZWJ
 oZElvpUPk3Sb+hcfQH2FCO5c7ZifycsE6PoNIgEWyvL1o28zclP28xZwbb6cDKP9DG8s
 mzOSPUZnEjuzvzAi4JRAyqjatYIE2DtOGOIPwMJBuRtfX1QOahivOdZtsvjT/LIh7ejG
 OXEA==
MIME-Version: 1.0
X-Received: by 10.194.158.226 with SMTP id wx2mr1490962wjb.107.1405735581697; 
 Fri, 18 Jul 2014 19:06:21 -0700 (PDT)
Received: by 10.216.190.194 with HTTP; Fri, 18 Jul 2014 19:06:21 -0700 (PDT)
Reply-To: araujo@FreeBSD.org
In-Reply-To: <53C964F7.8060503@gmail.com>
References: <CAOfEmZjmb1bdvn0gR6vD1WeP8o8g7KwXod4TE0iJfa=nicyeng@mail.gmail.com>
 <CAJ-Vmomt2QDXAVBVUk6m8oH4Pa5yErDdG6wWrP3X7+DW137xiA@mail.gmail.com>
 <CAOfEmZja8Tkv_xG8LyR5Nbj+Oga=vvdy=b3pxHqZi0-BBq25Uw@mail.gmail.com>
 <CAJ-VmomY2wP1EyVK4J16sGmMid=sJ9MPZrUY6pgcKGBDXm1T4g@mail.gmail.com>
 <CAOfEmZj5pk7bFB-PBqaJsi+bA73gbsUZzqggs4yEVky3_61NpQ@mail.gmail.com>
 <CAOfEmZhtZCettzD6pKQMHRiQE42nQmBuimOq28cA23R+Yyc13w@mail.gmail.com>
 <53C964F7.8060503@gmail.com>
Date: Sat, 19 Jul 2014 10:06:21 +0800
Message-ID: <CAOfEmZigg8_3b073aEU7kJd9i+jLFOVvAV_V4aU0jHOAJGLVBg@mail.gmail.com>
Subject: Re: [patch][lagg] - Set a better granularity and distribution on
 roundrobin protocol.
From: Marcelo Araujo <araujobsdport@gmail.com>
To: Navdeep Parhar <nparhar@gmail.com>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.18
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Adrian Chadd <adrian@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Jul 2014 02:06:25 -0000

2014-07-19 2:18 GMT+08:00 Navdeep Parhar <nparhar@gmail.com>:

> On 07/18/14 00:49, Marcelo Araujo wrote:
> > Hello guys,
> >
> > I made few changes on the lagg(4) patch. Also, I made tests using igb(4),
> > ixgbe(4) and em(4); seems everything worked pretty well.
> >
> > I'm wondering if anyone else could make a review, and what I need to do,
> to
> > see this patch committed.
>
> Deliberately putting out-of-order packets on the wire is never a good
> idea.  This would count as a serious regression in lagg(4) imho.
>
> Regards,
> Navdeep
>
>
>
I'm wondering if anyone has tested the patch, because as I explained
in another email, the number of SACKs is much lower with this patch. I have
put some pcap files here: http://people.freebsd.org/~araujo/lagg/

Also, as far as I know, the current roundrobin implementation has no
mechanism to control the order of the packets that go onto the
wire. All this patch does is, instead of sending only one
packet through one interface and then switching to the next one, send
X packets (where X is the number of packets defined via sysctl) and then
switch to the next interface.
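
The behaviour described above can be sketched roughly as follows.  This
is not the lagg(4) code or the patch itself.  The struct, its fields and
the function name are hypothetical and only illustrate the idea.

```c
#include <assert.h>

/*
 * Round-robin port selector that sends `burst` consecutive packets on
 * one port before moving to the next.  burst == 1 is plain per-packet
 * round robin.
 */
struct rr_state {
	int nports;	/* number of member ports */
	int burst;	/* packets per port before switching (the "X") */
	int cur;	/* port currently in use */
	int sent;	/* packets already sent on cur in this burst */
};

/* Return the port index to use for the next outgoing packet. */
int
rr_next_port(struct rr_state *st)
{
	int port;

	assert(st->nports > 0 && st->burst > 0);
	port = st->cur;
	if (++st->sent >= st->burst) {
		st->sent = 0;
		st->cur = (st->cur + 1) % st->nports;
	}
	return (port);
}
```

With two ports and a burst of 3, successive calls return 0, 0, 0, 1, 1,
1, 0 and so on.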

So, could you show me where this patch deliberately puts packets
out of order? Did I miss anything?


Best Regards,
-- 

-- 
Marcelo Araujo            (__)araujo@FreeBSD.org
\\\'',)http://www.FreeBSD.org <http://www.freebsd.org/>   \/  \ ^
Power To Server.         .\. /_)

From owner-freebsd-net@FreeBSD.ORG  Sat Jul 19 02:44:00 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C477A1D7;
 Sat, 19 Jul 2014 02:44:00 +0000 (UTC)
Received: from mail-pd0-x22f.google.com (mail-pd0-x22f.google.com
 [IPv6:2607:f8b0:400e:c02::22f])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 8682E2B15;
 Sat, 19 Jul 2014 02:44:00 +0000 (UTC)
Received: by mail-pd0-f175.google.com with SMTP id r10so4514753pdi.20
 for <multiple recipients>; Fri, 18 Jul 2014 19:44:00 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=message-id:date:from:user-agent:mime-version:to:cc:subject
 :references:in-reply-to:content-type:content-transfer-encoding;
 bh=Sv3LLxcITLr/WH3IY8GYApG2HrC7OWCauaIhWaAOcJk=;
 b=0cMSWx92kOv4V6VJwkDdGhDfjOsR+FESJ+PKnSe/pIrMjzLu57PdfSgtO7v1XU/Nlt
 o7PxS1sD/a8hUOednq5bpw8SOoe765eNmZHRXIXGYpuKWiXCGjm4BkeRoiNV73l6ASQe
 Iuhu2ObM13xJAnx+2PuYxUu8EAfocnP3v+FqE3tau+W2s5rlH2X+4ecv2NZfyeCoHjkz
 gyZiUAdugHMUe0bnId6OTIjSruJClIxumH6tGtUnFy3GbHMSM5LfgRsBxRpDWsDdFfHE
 8x82o4YZT8QRGzeGVKBOwq1x1OrbVG0DNm4kBn3xMA6t7B6++CBZ76kwn4rQodBmLIam
 cvjg==
X-Received: by 10.68.136.226 with SMTP id qd2mr9759318pbb.72.1405737840010;
 Fri, 18 Jul 2014 19:44:00 -0700 (PDT)
Received: from [10.192.166.0] (stargate.chelsio.com. [67.207.112.58])
 by mx.google.com with ESMTPSA id kt2sm6953199pbc.83.2014.07.18.19.43.59
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Fri, 18 Jul 2014 19:43:59 -0700 (PDT)
Message-ID: <53C9DB6E.8040205@gmail.com>
Date: Fri, 18 Jul 2014 19:43:58 -0700
From: Navdeep Parhar <nparhar@gmail.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: araujo@FreeBSD.org
Subject: Re: [patch][lagg] - Set a better granularity and distribution on
 roundrobin protocol.
References: <CAOfEmZjmb1bdvn0gR6vD1WeP8o8g7KwXod4TE0iJfa=nicyeng@mail.gmail.com>	<CAJ-Vmomt2QDXAVBVUk6m8oH4Pa5yErDdG6wWrP3X7+DW137xiA@mail.gmail.com>	<CAOfEmZja8Tkv_xG8LyR5Nbj+Oga=vvdy=b3pxHqZi0-BBq25Uw@mail.gmail.com>	<CAJ-VmomY2wP1EyVK4J16sGmMid=sJ9MPZrUY6pgcKGBDXm1T4g@mail.gmail.com>	<CAOfEmZj5pk7bFB-PBqaJsi+bA73gbsUZzqggs4yEVky3_61NpQ@mail.gmail.com>	<CAOfEmZhtZCettzD6pKQMHRiQE42nQmBuimOq28cA23R+Yyc13w@mail.gmail.com>	<53C964F7.8060503@gmail.com>
 <CAOfEmZigg8_3b073aEU7kJd9i+jLFOVvAV_V4aU0jHOAJGLVBg@mail.gmail.com>
In-Reply-To: <CAOfEmZigg8_3b073aEU7kJd9i+jLFOVvAV_V4aU0jHOAJGLVBg@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Adrian Chadd <adrian@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Jul 2014 02:44:00 -0000

On 07/18/14 19:06, Marcelo Araujo wrote:
> 
> 
> 
> 2014-07-19 2:18 GMT+08:00 Navdeep Parhar <nparhar@gmail.com
> <mailto:nparhar@gmail.com>>:
> 
>     On 07/18/14 00:49, Marcelo Araujo wrote:
>     > Hello guys,
>     >
>     > I made few changes on the lagg(4) patch. Also, I made tests using
>     igb(4),
>     > ixgbe(4) and em(4); seems everything worked pretty well.
>     >
>     > I'm wondering if anyone else could make a review, and what I need
>     to do, to
>     > see this patch committed.
> 
>     Deliberately putting out-of-order packets on the wire is never a good
>     idea.  This would count as a serious regression in lagg(4) imho.
> 
>     Regards,
>     Navdeep
> 
> 
> 
> I'm wondering if anyone have tested the patch; because as I have
> explained in another email, the number of SACK is much less with this
> patch. I have put some pcap files
> here: http://people.freebsd.org/~araujo/lagg/
> 
> Also, as far as I know, the current roundrobin implementation has no
> such kind of mechanism to control the order of the packages that goes to
> the wire. And this patch, what it only does is, instead to send only one
> package through one interface and switch to the another one, it will
> send X(where X is the number of packets defined via sysctl) packets and
> then, switch to the next interface.
> 
> So, could you show me, where this patch deliberately put out-of-order
> packets? Did I miss anything?

Are you saying lagg's roundrobin implementation is already spraying
packets for the same flow across interfaces?  That would make it
unsuitable for anything TCP.  But then your patch isn't making it any
worse so I don't have any objection to it any more.

Looks like loadbalance does the right thing for flows.
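
For contrast with round robin, per-flow port selection can be sketched
as below.  This only illustrates the general idea of hashing a flow to a
port and is not taken from the lagg(4) source.  The function name is
hypothetical, and flow_hash stands for any stable hash of the flow,
computed elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick an output port from a per-flow hash so that every packet of a
 * flow leaves through the same port.
 */
int
flow_to_port(uint32_t flow_hash, int nports)
{
	assert(nports > 0);
	return ((int)(flow_hash % (uint32_t)nports));
}
```

Because the result depends only on the hash and the port count, packets
of one flow are never split across ports as long as the set of ports
does not change.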

Regards,
Navdeep

From owner-freebsd-net@FreeBSD.ORG  Sat Jul 19 03:28:11 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 22302864;
 Sat, 19 Jul 2014 03:28:11 +0000 (UTC)
Received: from mail-qa0-x22c.google.com (mail-qa0-x22c.google.com
 [IPv6:2607:f8b0:400d:c00::22c])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id C2B2C2EC6;
 Sat, 19 Jul 2014 03:28:10 +0000 (UTC)
Received: by mail-qa0-f44.google.com with SMTP id f12so3614547qad.3
 for <multiple recipients>; Fri, 18 Jul 2014 20:28:09 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=w75OtZ7K9jbkw50JF/3vly0uRQTFw8VRS0hdBlaSZP4=;
 b=vZ0vMhJPvlmIUVh56qQY84qYj9n3I5yy+OFM4HiBKTkzUf9Tbpot8j+lzhftHbs5KI
 hpVaFrIWtmdld22dSimlX4WiT14+cWxmGQTDbwMmkCp5q+pWikowUbineZn73ttjd7mO
 SegsOoqTGZATe1s/QqYxISJKHC0SqK9a0/TKUiKmuZXoGQxs7gztju+KkPFl3G3XbI46
 cNbWDvimbf7OgtE8fF6CCfToyMo07w6DpHDqv1mh5I7eM27ufU6ypFP24HyC1jNwxgax
 zHn5S0yj23T2+DQS4vLlGvsGT/RhcjVXhbDWj/PSAM8LCUTv40TixfCBDugsm0GiibuF
 ihBg==
MIME-Version: 1.0
X-Received: by 10.229.174.70 with SMTP id s6mr14798770qcz.29.1405740489801;
 Fri, 18 Jul 2014 20:28:09 -0700 (PDT)
Sender: adrian.chadd@gmail.com
Received: by 10.224.1.6 with HTTP; Fri, 18 Jul 2014 20:28:09 -0700 (PDT)
In-Reply-To: <CAOfEmZigg8_3b073aEU7kJd9i+jLFOVvAV_V4aU0jHOAJGLVBg@mail.gmail.com>
References: <CAOfEmZjmb1bdvn0gR6vD1WeP8o8g7KwXod4TE0iJfa=nicyeng@mail.gmail.com>
 <CAJ-Vmomt2QDXAVBVUk6m8oH4Pa5yErDdG6wWrP3X7+DW137xiA@mail.gmail.com>
 <CAOfEmZja8Tkv_xG8LyR5Nbj+Oga=vvdy=b3pxHqZi0-BBq25Uw@mail.gmail.com>
 <CAJ-VmomY2wP1EyVK4J16sGmMid=sJ9MPZrUY6pgcKGBDXm1T4g@mail.gmail.com>
 <CAOfEmZj5pk7bFB-PBqaJsi+bA73gbsUZzqggs4yEVky3_61NpQ@mail.gmail.com>
 <CAOfEmZhtZCettzD6pKQMHRiQE42nQmBuimOq28cA23R+Yyc13w@mail.gmail.com>
 <53C964F7.8060503@gmail.com>
 <CAOfEmZigg8_3b073aEU7kJd9i+jLFOVvAV_V4aU0jHOAJGLVBg@mail.gmail.com>
Date: Fri, 18 Jul 2014 20:28:09 -0700
X-Google-Sender-Auth: e4_Jf2H8yjiTLJmu-inCWeXE9oI
Message-ID: <CAJ-VmonNuu5YunzZjUYsMFK9rFMbTf7=nswpPisjwV-RpoHzRA@mail.gmail.com>
Subject: Re: [patch][lagg] - Set a better granularity and distribution on
 roundrobin protocol.
From: Adrian Chadd <adrian@freebsd.org>
To: araujo@freebsd.org
Content-Type: text/plain; charset=UTF-8
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Navdeep Parhar <nparhar@gmail.com>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Jul 2014 03:28:11 -0000

On 18 July 2014 19:06, Marcelo Araujo <araujobsdport@gmail.com> wrote:
>
>
>
> 2014-07-19 2:18 GMT+08:00 Navdeep Parhar <nparhar@gmail.com>:
>
>> On 07/18/14 00:49, Marcelo Araujo wrote:
>> > Hello guys,
>> >
>> > I made few changes on the lagg(4) patch. Also, I made tests using
>> > igb(4),
>> > ixgbe(4) and em(4); seems everything worked pretty well.
>> >
>> > I'm wondering if anyone else could make a review, and what I need to do,
>> > to
>> > see this patch committed.
>>
>> Deliberately putting out-of-order packets on the wire is never a good
>> idea.  This would count as a serious regression in lagg(4) imho.
>>
>> Regards,
>> Navdeep
>>
>>
>
> I'm wondering if anyone have tested the patch; because as I have explained
> in another email, the number of SACK is much less with this patch. I have
> put some pcap files here: http://people.freebsd.org/~araujo/lagg/
>
> Also, as far as I know, the current roundrobin implementation has no such
> kind of mechanism to control the order of the packages that goes to the
> wire. And this patch, what it only does is, instead to send only one package
> through one interface and switch to the another one, it will send X(where X
> is the number of packets defined via sysctl) packets and then, switch to the
> next interface.
>
> So, could you show me, where this patch deliberately put out-of-order
> packets? Did I miss anything?

It doesn't introduce reordering, but it still allows potentially
out-of-order behaviour, depending upon CPU loading and NIC scheduling.

If you're seeing reduced ACK / retransmits by doing this then there's
gotta be some other underlying factor causing it. That's what I think
needs to be fixed, not papering over it by more round robin hacks. :-P
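
For reference, the burst-style round robin described in the quoted text
(send X packets on one port, then move to the next) can be sketched
roughly as below. This is only an illustration and is not taken from the
patch; struct rr_state, its fields and rr_next_port() are hypothetical
names, and locking is left out.

```c
#include <stdint.h>

/* Hypothetical per-lagg state; not the layout used by lagg(4). */
struct rr_state {
	uint32_t nports;	/* number of active ports, must be > 0 */
	uint32_t burst;		/* packets per port before switching, > 0 */
	uint32_t cur;		/* index of the port currently in use */
	uint32_t sent;		/* packets already sent in this burst */
};

/* Return the port index for the next packet and advance the state. */
static uint32_t
rr_next_port(struct rr_state *s)
{
	uint32_t port = s->cur;

	if (++s->sent >= s->burst) {
		/* Burst finished: start a new one on the next port. */
		s->sent = 0;
		s->cur = (s->cur + 1) % s->nports;
	}
	return (port);
}
```

With burst set to 1 this degenerates to switching ports on every packet.
Nothing in it orders packets across ports, so a single flow can still be
reordered whenever one port drains more slowly than another.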


-a

From owner-freebsd-net@FreeBSD.ORG  Sat Jul 19 03:34:49 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 70E9496E
 for <net@freebsd.org>; Sat, 19 Jul 2014 03:34:49 +0000 (UTC)
Received: from mail-qg0-x22f.google.com (mail-qg0-x22f.google.com
 [IPv6:2607:f8b0:400d:c04::22f])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 30AF92F68
 for <net@freebsd.org>; Sat, 19 Jul 2014 03:34:49 +0000 (UTC)
Received: by mail-qg0-f47.google.com with SMTP id i50so3769848qgf.20
 for <net@freebsd.org>; Fri, 18 Jul 2014 20:34:48 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=IsKxbeb78G3FjT0R+NRwPfnryv4r/GuQWF4t4XzaMoI=;
 b=qbsOEVTEIcEbonn7Ww6lxxsD6kINRCuLbwRfz15tlD+sWGkSK7KIWR5ybvEvMkzA9H
 VLvz63CSScSUyejxl/fUv2iLkKw3RkldWGtZfrsERoDJh8jdMzbq76bIrIDZngOoirHN
 JvFi46QuLgcGy7702iIBGOZ+/GepClIPAxNw29Pkgah5c5+1c6Fe/rpB41JZey261NdT
 OFg65SLaN1beq/l3CS3vGHtVV/TF+EujS0r9KrBhp4urQRU1gaw+bDjKtpCNokl/xang
 7op2mc9cVbs7A1HBf0WBeQpOXXw4SeGCemMH8KsvkvSzaqodZG+h8Yr/rr4PGaFNSA/7
 c9TA==
MIME-Version: 1.0
X-Received: by 10.229.171.196 with SMTP id i4mr14934441qcz.15.1405740888293;
 Fri, 18 Jul 2014 20:34:48 -0700 (PDT)
Sender: adrian.chadd@gmail.com
Received: by 10.224.1.6 with HTTP; Fri, 18 Jul 2014 20:34:48 -0700 (PDT)
In-Reply-To: <20140719053318.I15959@besplex.bde.org>
References: <CALCpEUE7OtbXjVTk2C8+V7fjOKutuNq04BTo0SN42YEgX81k-Q@mail.gmail.com>
 <CAJ-VmokEiZMpdfNjs+-C9pmRcjOOjjNGTvM88muh940sr7SmPw@mail.gmail.com>
 <CALCpEUE-vebmaGSK5aGM+3q5YqzXkn1P=St7R8G_ztmHmgUBBA@mail.gmail.com>
 <20140719053318.I15959@besplex.bde.org>
Date: Fri, 18 Jul 2014 20:34:48 -0700
X-Google-Sender-Auth: 2QgPtedPlrYmwj0jBHxikPv50sI
Message-ID: <CAJ-Vmonunb=WqcyJ+hJNDknj+zjMQVOWWk+u=MqhXkJkcm_DFQ@mail.gmail.com>
Subject: Re: UDP sendto() returning ENOBUFS - "No buffer space available"
From: Adrian Chadd <adrian@freebsd.org>
To: Bruce Evans <brde@optusnet.com.au>
Content-Type: text/plain; charset=UTF-8
Cc: hiren panchasara <hiren.panchasara@gmail.com>,
 "freebsd-net@freebsd.org" <net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Jul 2014 03:34:49 -0000

Hi,


On 18 July 2014 13:40, Bruce Evans <brde@optusnet.com.au> wrote:
> On Fri, 18 Jul 2014, hiren panchasara wrote:
>
>> On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd <adrian@freebsd.org> wrote:
>>>
>>> Hi!
>>>
>>> So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() ->
>>> udp_output() -> ip_output()
>>>
>>> udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output
>>> can also return ENOBUFS.
>>>
>>> it doesn't look like the socket code (eg sosend_dgram()) is doing any
>>> buffering - it's just copying the frame and stuffing it up to the
>>> driver. No queuing involved before the NIC.
>>
>>
>> Right. Thanks for confirming.
>
>
> Most buffering should be in ifq above the NIC.  For UDP, I think
> udp_output() puts buffers on the ifq and calls the driver for every
> one, but the driver shouldn't do anything for most calls.  The
> driver can't possibly do anything if its ring buffer is full, and
> shouldn't do anything if it is nearly full.  Buffers accumulate in
> the ifq until the driver gets around to them or the queue fills up.
> Most ENOBUFS errors are for when it fills up.  It can very easily
> fill up, especially since it is too small in most configurations.
> Just loop calling sendto().  This will fill the ifq almost
> instantly unless the hardware is faster than the software.

For if_transmit() drivers, there's no ifp queue. The queuing is being
done in the driver.

For drivers with if_transmit(), they may end up doing direct DMA ring
dispatch or they may have a buf_ring in front of it. There's no ifq
anymore. It upsets the ALTQ people too.


-a

From owner-freebsd-net@FreeBSD.ORG  Sat Jul 19 03:49:58 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 84D06AC5
 for <net@freebsd.org>; Sat, 19 Jul 2014 03:49:58 +0000 (UTC)
Received: from mail-oa0-f51.google.com (mail-oa0-f51.google.com
 [209.85.219.51])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 474062044
 for <net@freebsd.org>; Sat, 19 Jul 2014 03:49:57 +0000 (UTC)
Received: by mail-oa0-f51.google.com with SMTP id o6so4604448oag.38
 for <net@freebsd.org>; Fri, 18 Jul 2014 20:49:51 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:references:mime-version:in-reply-to:content-type
 :content-transfer-encoding:message-id:cc:from:subject:date:to;
 bh=7tXeLXBv24tQCNgBhIo7EWl1SoADfn6DaAotBnQR6Mw=;
 b=g3P/59wJ6vsGqZy3ICvciE5kEI86obT6BZ6q0dI7XudHVh3XPZBnovWXBd1rQCiLDi
 5uU9ufGgSS2dC7BT81ZxY1d5OQzRVZAz4a21YddLJwkr0yMUndjrII0lQIEJ3ZtLqYMj
 gKahklux50pD5STvYQBBx9+qr4OvTScGxKP8OlJ1gLVomqgfxbGu1cCWvQ+ArMf0bXNn
 KGkpMDgPaVgamynJMf+LK60XsA9OgrC4n4zAaggbjJ+52JKCdYGn7R/8eTHNudT8Pgzt
 jXCHttz12xUGV9zwG4HU0OT3FCWCp8dJhJ6r/VjuVfx1NHRBBAZu5v3bu6h5g3NEcT3x
 viTQ==
X-Gm-Message-State: ALoCoQmakv3UjOgADkFTFZVZ6VPiik4sF+8hxArzlVt+0JwCZuNM53Mt8fuZ5i1i989//rYeEwgh
X-Received: by 10.182.89.164 with SMTP id bp4mr12924315obb.21.1405741791128;
 Fri, 18 Jul 2014 20:49:51 -0700 (PDT)
Received: from [29.201.216.104] (66-87-116-104.pools.spcsdns.net.
 [66.87.116.104])
 by mx.google.com with ESMTPSA id of9sm12687511obb.25.2014.07.18.20.49.50
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Fri, 18 Jul 2014 20:49:50 -0700 (PDT)
References: <CALCpEUE7OtbXjVTk2C8+V7fjOKutuNq04BTo0SN42YEgX81k-Q@mail.gmail.com>
 <CAJ-VmokEiZMpdfNjs+-C9pmRcjOOjjNGTvM88muh940sr7SmPw@mail.gmail.com>
 <CALCpEUE-vebmaGSK5aGM+3q5YqzXkn1P=St7R8G_ztmHmgUBBA@mail.gmail.com>
 <20140719053318.I15959@besplex.bde.org>
 <CAJ-Vmonunb=WqcyJ+hJNDknj+zjMQVOWWk+u=MqhXkJkcm_DFQ@mail.gmail.com>
Mime-Version: 1.0 (1.0)
In-Reply-To: <CAJ-Vmonunb=WqcyJ+hJNDknj+zjMQVOWWk+u=MqhXkJkcm_DFQ@mail.gmail.com>
Content-Type: text/plain;
	charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <82879751-237C-4BEB-8DD7-45884A5BB705@netgate.com>
X-Mailer: iPhone Mail (11D257)
From: Jim Thompson <jim@netgate.com>
Subject: Re: UDP sendto() returning ENOBUFS - "No buffer space available"
Date: Fri, 18 Jul 2014 23:49:48 -0400
To: Adrian Chadd <adrian@freebsd.org>
Cc: hiren panchasara <hiren.panchasara@gmail.com>,
 "freebsd-net@freebsd.org" <net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Jul 2014 03:49:58 -0000


> On Jul 18, 2014, at 23:34, Adrian Chadd <adrian@freebsd.org> wrote:
> 
> It upsets the ALTQ people too.

I'm an ALTQ person (pfSense, so maybe one of the biggest) and I'm not upset.

That cr*p needs to die in a fire. 

From owner-freebsd-net@FreeBSD.ORG  Sat Jul 19 06:16:44 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 36C19826;
 Sat, 19 Jul 2014 06:16:44 +0000 (UTC)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42])
 by mx1.freebsd.org (Postfix) with ESMTP id B06CF2BE1;
 Sat, 19 Jul 2014 06:16:43 +0000 (UTC)
Received: from c122-106-147-133.carlnfd1.nsw.optusnet.com.au
 (c122-106-147-133.carlnfd1.nsw.optusnet.com.au [122.106.147.133])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id A8BE13CF4BA;
 Sat, 19 Jul 2014 16:16:35 +1000 (EST)
Date: Sat, 19 Jul 2014 16:16:28 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Adrian Chadd <adrian@freebsd.org>
Subject: Re: UDP sendto() returning ENOBUFS - "No buffer space available"
In-Reply-To: <CAJ-Vmonunb=WqcyJ+hJNDknj+zjMQVOWWk+u=MqhXkJkcm_DFQ@mail.gmail.com>
Message-ID: <20140719152125.A874@besplex.bde.org>
References: <CALCpEUE7OtbXjVTk2C8+V7fjOKutuNq04BTo0SN42YEgX81k-Q@mail.gmail.com>
 <CAJ-VmokEiZMpdfNjs+-C9pmRcjOOjjNGTvM88muh940sr7SmPw@mail.gmail.com>
 <CALCpEUE-vebmaGSK5aGM+3q5YqzXkn1P=St7R8G_ztmHmgUBBA@mail.gmail.com>
 <20140719053318.I15959@besplex.bde.org>
 <CAJ-Vmonunb=WqcyJ+hJNDknj+zjMQVOWWk+u=MqhXkJkcm_DFQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=eojmkOZX c=1 sm=1 tr=0
 a=7NqvjVvQucbO2RlWB8PEog==:117 a=PO7r1zJSAAAA:8 a=qwo5vUNQll0A:10
 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=6I5d2MoRAAAA:8
 a=-bYW5cNtWXaxnCy8UjIA:9 a=lee2i37mc5CgPEWu:21 a=XQH7iBNxuV895bsf:21
 a=CjuIK1q_8ugA:10 a=SV7veod9ZcQA:10
Cc: hiren panchasara <hiren.panchasara@gmail.com>,
 "freebsd-net@freebsd.org" <net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Jul 2014 06:16:44 -0000

On Fri, 18 Jul 2014, Adrian Chadd wrote:

> On 18 July 2014 13:40, Bruce Evans <brde@optusnet.com.au> wrote:
>> On Fri, 18 Jul 2014, hiren panchasara wrote:
>>
>>> On Wed, Jul 16, 2014 at 11:00 AM, Adrian Chadd <adrian@freebsd.org> wrote:
>>>>
>>>> Hi!
>>>>
>>>> So the UDP transmit path is udp_usrreqs->pru_send() == udp_send() ->
>>>> udp_output() -> ip_output()
>>>>
>>>> udp_output() does do a M_PREPEND() which can return ENOBUFS. ip_output
>>>> can also return ENOBUFS.
>>>>
>>>> it doesn't look like the socket code (eg sosend_dgram()) is doing any
>>>> buffering - it's just copying the frame and stuffing it up to the
>>>> driver. No queuing involved before the NIC.
>>>
>>> Right. Thanks for confirming.
>>
>> Most buffering should be in ifq above the NIC.  For UDP, I think
>> udp_output() puts buffers on the ifq and calls the driver for every
>> one, but the driver shouldn't do anything for most calls.  The
>> driver can't possibly do anything if its ring buffer is full, and
>> shouldn't do anything if it is nearly full.  Buffers accumulate in
>> the ifq until the driver gets around to them or the queue fills up.
>> Most ENOBUFS errors are for when it fills up.  It can very easily
>> fill up, especially since it is too small in most configurations.
>> Just loop calling sendto().  This will fill the ifq almost
>> instantly unless the hardware is faster than the software.
>
> For if_transmit() drivers, there's no ifp queue. The queuing is being
> done in the driver.
>
> For drivers with if_transmit(), they may end up doing direct DMA ring
> dispatch or they may have a buf_ring in front of it. There's no ifq
> anymore. It upsets the ALTQ people too.

Ah, a new source of bugs.  Most drivers don't use this yet.  Most still
use ifq with the bogus size of (tx_ring_size - 1):

Ones converted to the indirect API:
% dev/bge/if_bge.c:	if_setsendqlen(ifp, BGE_TX_RING_CNT - 1);
% dev/bxe/bxe.c:    if_setsendqlen(ifp, sc->tx_ring_size);

bxe is one of the few without the silly subtraction of 1.

% dev/e1000/if_em.c:	if_setsendqlen(ifp, adapter->num_tx_desc - 1);
% dev/e1000/if_lem.c:	if_setsendqlen(ifp, adapter->num_tx_desc - 1);
% dev/fxp/if_fxp.c:	if_setsendqlen(ifp, FXP_NTXCB - 1);
% dev/nfe/if_nfe.c:	if_setsendqlen(ifp, NFE_TX_RING_COUNT - 1);

Ones not converted:
% dev/ae/if_ae.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/ae/if_ae.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);

The double setting is related to ALTQ.  I grepped for maxlen to find both.
I might have missed alternative spellings.

ifqmaxlen is usually 50, so all drivers using it have very little buffering.
Even if their tx ring is tiny, this 50 is too small above 1 or 10 Mbps.
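
To put a rough number on that, here is a small sketch of how long a full
queue keeps the link busy. It assumes 1500-byte packets and ignores
framing overhead; queue_drain_usec() is a hypothetical helper written
for this mail, not something in the tree.

```c
#include <stdint.h>

/*
 * Microseconds needed to drain qlen queued packets of pkt_bytes each
 * at link_bps bits per second.  Framing overhead (preamble, inter-frame
 * gap, headers below IP) is ignored, so real times are slightly longer.
 */
static uint64_t
queue_drain_usec(uint64_t qlen, uint64_t pkt_bytes, uint64_t link_bps)
{
	return (qlen * pkt_bytes * 8 * 1000000 / link_bps);
}
```

With qlen = 50 and 1500-byte packets this gives 60000 usec at 10 Mbps
but only 600 usec at 1 Gbps, and smaller packets shrink it further.  A
sender that is descheduled for longer than that leaves the link idle.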

% dev/age/if_age.c:	ifp->if_snd.ifq_drv_maxlen = AGE_TX_RING_CNT - 1;
% dev/age/if_age.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/alc/if_alc.c:	ifp->if_snd.ifq_drv_maxlen = ALC_TX_RING_CNT - 1;
% dev/alc/if_alc.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/ale/if_ale.c:	ifp->if_snd.ifq_drv_maxlen = ALE_TX_RING_CNT - 1;
% dev/ale/if_ale.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/an/if_an.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/an/if_an.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/asmc/asmc.c:	uint8_t maxlen;
% dev/asmc/asmc.c:		maxlen = type[0];

Grepping for maxlen unfortunately found related things.  I deleted most
after this.

% dev/ath/if_ath.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ath/if_ath.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/bce/if_bce.c:	ifp->if_snd.ifq_drv_maxlen = USABLE_TX_BD_ALLOC;
% dev/bce/if_bce.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/bfe/if_bfe.c:	ifp->if_snd.ifq_drv_maxlen = BFE_TX_QLEN;
% dev/bm/if_bm.c:	ifp->if_snd.ifq_drv_maxlen = BM_MAX_TX_PACKETS;
% dev/bwi/if_bwi.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/bwi/if_bwi.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/bwn/if_bwn.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/bwn/if_bwn.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/cadence/if_cgem.c:	ifp->if_snd.ifq_drv_maxlen = IFQ_MAXLEN;
% dev/cas/if_cas.c:	ifp->if_snd.ifq_drv_maxlen = CAS_TXQUEUELEN;
% dev/ce/if_ce.c:		d->queue.ifq_maxlen	= ifqmaxlen;
% dev/ce/if_ce.c:		d->hi_queue.ifq_maxlen	= ifqmaxlen;
% dev/ce/if_ce.c:		d->rqueue.ifq_maxlen	= ifqmaxlen;
% dev/ce/if_ce.c:		d->rqueue.ifq_maxlen	= ifqmaxlen;

Seems silly to have many tiny queues, especially when their length is
nominal and can be changed by tunables if not sysctls so that it is
not actually tiny.  But good for latency.

% dev/cm/smc90cx6.c:	ifp->if_snd.ifq_maxlen = ifqmaxlen;
% dev/cp/if_cp.c:		d->queue.ifq_maxlen = ifqmaxlen;
% dev/cp/if_cp.c:		d->hi_queue.ifq_maxlen = ifqmaxlen;
% dev/cp/if_cp.c:		d->queue.ifq_maxlen	= NRBUF;
% dev/cs/if_cs.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ctau/if_ct.c:		d->queue.ifq_maxlen = ifqmaxlen;
% dev/ctau/if_ct.c:		d->hi_queue.ifq_maxlen = ifqmaxlen;
% dev/ctau/if_ct.c:		d->queue.ifq_maxlen	= NBUF;
% dev/cx/if_cx.c:		d->lo_queue.ifq_maxlen = ifqmaxlen;
% dev/cx/if_cx.c:		d->hi_queue.ifq_maxlen = ifqmaxlen;
% dev/cx/if_cx.c:		d->queue.ifq_maxlen	= 2;

Now that's a tiny queue which can't be broken by changing the sysctl.

% dev/dc/if_dc.c:	ifp->if_snd.ifq_drv_maxlen = DC_TX_LIST_CNT - 1;
% dev/de/if_de.c:    IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/de/if_de.c:    ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/e1000/if_igb.c:	ifp->if_snd.ifq_drv_maxlen = adapter->num_tx_desc - 1;
% dev/ed/if_ed.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ed/if_ed.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/ep/if_ep.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ep/if_ep.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/et/if_et.c:	ifp->if_snd.ifq_drv_maxlen = ET_TX_NDESC - 1;
% dev/ex/if_ex.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/fatm/if_fatm.c:	ifp->if_snd.ifq_maxlen = 512;
% dev/fe/if_fe.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ffec/if_ffec.c:	ifp->if_snd.ifq_drv_maxlen = TX_DESC_COUNT - 1;
% dev/firewire/if_fwe.c:	ifp->if_snd.ifq_maxlen = TX_MAX_QUEUE;
% dev/firewire/if_fwip.c:	ifp->if_snd.ifq_maxlen = TX_MAX_QUEUE;
% dev/gem/if_gem.c:	ifp->if_snd.ifq_drv_maxlen = GEM_TXQUEUELEN;
% dev/hme/if_hme.c:	ifp->if_snd.ifq_drv_maxlen = HME_NTXQ;
% dev/i40e/if_i40e.c:	ifp->if_snd.ifq_maxlen = que->num_desc - 2;
% dev/ie/if_ie.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/if_ndis/if_ndis.c:	ifp->if_snd.ifq_drv_maxlen = 25;
% dev/iicbus/if_ic.c:	ifp->if_snd.ifq_maxlen = ifqmaxlen;
% dev/ipw/if_ipw.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ipw/if_ipw.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/iwi/if_iwi.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/iwi/if_iwi.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/iwn/if_iwn.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/iwn/if_iwn.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/ixgb/if_ixgb.c:	ifp->if_snd.ifq_maxlen = adapter->num_tx_desc - 1;
% dev/ixgbe/ixgbe.c:	ifp->if_snd.ifq_drv_maxlen = adapter->num_tx_desc - 2;
% dev/ixgbe/ixv.c:	ifp->if_snd.ifq_maxlen = adapter->num_tx_desc - 2;
% dev/jme/if_jme.c:	ifp->if_snd.ifq_drv_maxlen = JME_TX_RING_CNT - 1;
% dev/jme/if_jme.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/le/lance.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/le/lance.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/lge/if_lge.c:	ifp->if_snd.ifq_maxlen = LGE_TX_LIST_CNT - 1;
% dev/malo/if_malo.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/malo/if_malo.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/mge/if_mge.c:	ifp->if_snd.ifq_drv_maxlen = MGE_TX_DESC_NUM - 1;
% dev/mge/if_mge.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/msk/if_msk.c:	ifp->if_snd.ifq_drv_maxlen = MSK_TX_RING_CNT - 1;
% dev/mwl/if_mwl.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/mwl/if_mwl.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/mxge/if_mxge.c:	sc->ifp->if_snd.ifq_drv_maxlen = sc->ifp->if_snd.ifq_maxlen;
% dev/my/if_my.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/my/if_my.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/nge/if_nge.c:	ifp->if_snd.ifq_drv_maxlen = NGE_TX_RING_CNT - 1;
% dev/nge/if_nge.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/nxge/if_nxge.c:	ifnetp->if_snd.ifq_maxlen = ifqmaxlen;
% dev/oce/oce_if.c:	sc->ifp->if_snd.ifq_drv_maxlen = OCE_MAX_TX_DESC - 1;
% dev/oce/oce_if.c:	IFQ_SET_MAXLEN(&sc->ifp->if_snd, sc->ifp->if_snd.ifq_drv_maxlen);
% dev/patm/if_patm.c:	sc->scd0->q.ifq_maxlen = PATM_DLFT_MAXQ;
% dev/patm/if_patm.c:	scd->q.ifq_maxlen = PATM_TX_IFQLEN;
% dev/pcn/if_pcn.c:	ifp->if_snd.ifq_maxlen = PCN_TX_LIST_CNT - 1;
% dev/pdq/pdq_ifsubr.c:    ifp->if_snd.ifq_maxlen = ifqmaxlen;
% dev/ppbus/if_plip.c:	ifp->if_snd.ifq_maxlen = ifqmaxlen;

ifqmaxlen = 50 is actually right for plip, but only if someone doesn't
bump it up using the sysctl.

% dev/qlxgb/qla_os.c:	IFQ_SET_MAXLEN(&ifp->if_snd, qla_get_ifq_snd_maxlen(ha));
% dev/qlxgb/qla_os.c:	ifp->if_snd.ifq_drv_maxlen = qla_get_ifq_snd_maxlen(ha);
% dev/qlxgbe/ql_os.c:	IFQ_SET_MAXLEN(&ifp->if_snd, qla_get_ifq_snd_maxlen(ha));
% dev/qlxgbe/ql_os.c:	ifp->if_snd.ifq_drv_maxlen = qla_get_ifq_snd_maxlen(ha);
% dev/qlxge/qls_os.c:	IFQ_SET_MAXLEN(&ifp->if_snd, qls_get_ifq_snd_maxlen(ha));
% dev/qlxge/qls_os.c:	ifp->if_snd.ifq_drv_maxlen = qls_get_ifq_snd_maxlen(ha);
% dev/ral/rt2560.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ral/rt2560.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/ral/rt2661.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ral/rt2661.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/ral/rt2860.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ral/rt2860.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/re/if_re.c:	ifp->if_snd.ifq_drv_maxlen = RL_IFQ_MAXLEN;
% dev/rt/if_rt.c:	ifp->if_snd.ifq_drv_maxlen = RT_TX_QLEN;
% dev/sbni/if_sbni.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/sf/if_sf.c:	ifp->if_snd.ifq_drv_maxlen = SF_TX_DLIST_CNT - 1;
% dev/sfxge/sfxge.c:	ifp->if_snd.ifq_drv_maxlen = SFXGE_NDESCS - 1;
% dev/sge/if_sge.c:	ifp->if_snd.ifq_drv_maxlen = SGE_TX_RING_CNT - 1;
% dev/sge/if_sge.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/sis/if_sis.c:	ifp->if_snd.ifq_drv_maxlen = SIS_TX_LIST_CNT - 1;
% dev/sk/if_sk.c:	ifp->if_snd.ifq_drv_maxlen = SK_TX_RING_CNT - 1;
% dev/smc/if_smc.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/sn/if_sn.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/sn/if_sn.c:	ifp->if_snd.ifq_maxlen = ifqmaxlen;
% dev/snc/dp83932.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/ste/if_ste.c:	ifp->if_snd.ifq_drv_maxlen = STE_TX_LIST_CNT - 1;
% dev/stge/if_stge.c:	ifp->if_snd.ifq_drv_maxlen = STGE_TX_RING_CNT - 1;
% dev/stge/if_stge.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/ti/if_ti.c:	ifp->if_snd.ifq_drv_maxlen = TI_TX_RING_CNT - 1;
% dev/ti/if_ti.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/tl/if_tl.c:	ifp->if_snd.ifq_maxlen = TL_TX_LIST_CNT - 1;
% dev/tsec/if_tsec.c:	ifp->if_snd.ifq_drv_maxlen = TSEC_TX_NUM_DESC - 1;
% dev/txp/if_txp.c:	ifp->if_snd.ifq_drv_maxlen = TX_ENTRIES - 1;
% dev/txp/if_txp.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/usb/usb_dev.c:	f->free_q.ifq_maxlen = nbuf;
% dev/usb/usb_dev.c:	f->used_q.ifq_maxlen = nbuf;
% dev/vge/if_vge.c:	ifp->if_snd.ifq_drv_maxlen = VGE_TX_DESC_CNT - 1;
% dev/vr/if_vr.c:	ifp->if_snd.ifq_maxlen = VR_TX_RING_CNT - 1;
% dev/vte/if_vte.c:	ifp->if_snd.ifq_drv_maxlen = VTE_TX_RING_CNT - 1;
% dev/vte/if_vte.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/vx/if_vx.c:	ifp->if_snd.ifq_maxlen = ifqmaxlen;
% dev/vxge/vxge.c:	ifp->if_snd.ifq_drv_maxlen = max(vdev->config.ifq_maxlen, ifqmaxlen);
% dev/vxge/vxge.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
% dev/wb/if_wb.c:	ifp->if_snd.ifq_maxlen = WB_TX_LIST_CNT - 1;
% dev/wi/if_wi.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/wi/if_wi.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/wl/if_wl.c:    ifp->if_snd.ifq_maxlen = ifqmaxlen;
% dev/wpi/if_wpi.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/wpi/if_wpi.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/wtap/if_wtap.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% dev/wtap/if_wtap.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% dev/xe/if_xe.c:	IFQ_SET_MAXLEN(&scp->ifp->if_snd, ifqmaxlen);
% dev/xl/if_xl.c:	ifp->if_snd.ifq_drv_maxlen = XL_TX_LIST_CNT - 1;

There are so many drivers that just removing the silly subtraction of 1
or the excessive use of the global ifqmaxlen in them is a daunting task.
You would have to check that the silly subtraction isn't actually needed.
Changing ifqmaxlen is easier.  It is supposed to be variable and not
closely related to devices, so it's not your fault if changing to another
value (not so directly connected to ifqmaxlen) breaks the driver.

I have only worked on bge and sk much.  This required intricate
device-dependent changes to implement watermark stuff in the tx rings.
The hardware doesn't really support watermark stuff but it is possible
to emulate it.

% net/if.c:SYSCTL_INT(_net_link, OID_AUTO, ifqmaxlen, CTLFLAG_RDTUN,
% net/if.c:    &ifqmaxlen, 0, "max send queue size");
% net/if.c:int	ifqmaxlen = IFQ_MAXLEN;
% net/if_atmsubr.c:	ifp->if_snd.ifq_maxlen = 50;	/* dummy */

No reason to spell 50 as 50 instead of as ifqmaxlen?  A random default
works especially well when it is not used.

% net/if_disc.c:	ifp->if_snd.ifq_maxlen = 20;

Tiny queues may be even worse for synthetic devices than for real ones.
The real ones tend to have large enough tx rings except under load,
and the load is limited by the link speed.  But for synthetic ones
there might not be any more buffering, and the only speed limits are
in software.  When the queue fills up, the application has the same
problem of restarting as soon as possible without busy-waiting as for
hardware devices.
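
On the application side, the crudest way to restart without
busy-waiting is to sleep briefly and retry when sendto() fails with
ENOBUFS.  A minimal sketch, assuming a datagram socket that is already
set up; sendto_retry(), the 100 usec starting delay and the 50 msec cap
are arbitrary choices for illustration, not a recommendation.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical wrapper: retry sendto() while it fails with ENOBUFS,
 * sleeping between attempts with a doubling delay.  Any other result
 * (success or a different error) is returned to the caller at once.
 */
static ssize_t
sendto_retry(int s, const void *buf, size_t len, const struct sockaddr *to,
    socklen_t tolen, int max_tries)
{
	struct timespec delay = { 0, 100 * 1000 };	/* 100 usec */
	ssize_t n;
	int i;

	for (i = 0; i < max_tries; i++) {
		n = sendto(s, buf, len, 0, to, tolen);
		if (n >= 0 || errno != ENOBUFS)
			return (n);
		/* Queue full: give the driver time to drain it. */
		(void)nanosleep(&delay, NULL);
		if (delay.tv_nsec < 50 * 1000 * 1000)
			delay.tv_nsec *= 2;
	}
	/* Out of attempts; nanosleep() may have clobbered errno. */
	errno = ENOBUFS;
	return (-1);
}
```

The sleep granularity is much coarser than the time a short queue takes
to drain, so this trades throughput for not spinning.  It does not solve
the problem of restarting as soon as possible.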

% net/if_edsc.c:	ifp->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_enc.c:	ifp->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_epair.c:	ifp->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_epair.c:	ifp->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_epair.c:		epair_nh.nh_qlimit = 42 * ifqmaxlen; /* 42 shall be the number. */

It is a better too-small number than 50.

% net/if_faith.c:	ifp->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_gif.c:	GIF2IFP(sc)->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_gre.c:	GRE2IFP(sc)->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_loop.c:	ifp->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_mib.c:		ifmd.ifmd_snd_maxlen = ifp->if_snd.ifq_maxlen;
% net/if_mib.c:		ifp->if_snd.ifq_maxlen = ifmd.ifmd_snd_maxlen;
% net/if_spppsubr.c: 	ifp->if_snd.ifq_maxlen = 32;
% net/if_spppsubr.c: 	sp->pp_fastq.ifq_maxlen = 32;
% net/if_spppsubr.c: 	sp->pp_cpq.ifq_maxlen = 20;
% net/if_stf.c:	ifp->if_snd.ifq_maxlen = ifqmaxlen;
% net/if_tap.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% net/if_tun.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% net/if_tun.c:	ifp->if_snd.ifq_drv_maxlen = 0;
% netgraph/ng_device.c:	IFQ_SET_MAXLEN(&priv->readq, ifqmaxlen);
% netgraph/ng_eiface.c:	ifp->if_snd.ifq_maxlen = ifqmaxlen;
% netgraph/ng_iface.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% netgraph/ng_iface.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;
% netgraph/ng_source.c:	sc->snd_queue.ifq_maxlen = 2048;	/* XXX not checked */
% netgraph/ng_tty.c:	IFQ_SET_MAXLEN(&sc->outq, ifqmaxlen);

End of synthetic devices.

% pci/if_rl.c:	IFQ_SET_MAXLEN(&ifp->if_snd, ifqmaxlen);
% pci/if_rl.c:	ifp->if_snd.ifq_drv_maxlen = ifqmaxlen;

if_transmit is relatively rarely used:

% dev/ath/if_ath.c:	ifp->if_transmit = ath_transmit;
% dev/cxgb/cxgb_main.c:	ifp->if_transmit = cxgb_transmit;
% dev/cxgbe/t4_main.c:	ifp->if_transmit = cxgbe_transmit;
% dev/cxgbe/t4_netmap.c:	ifp->if_transmit = cxgbe_nm_transmit;
% dev/cxgbe/t4_tracer.c:	ifp->if_transmit = tracer_transmit;
% dev/e1000/if_igb.c:	ifp->if_transmit = igb_mq_start;
% dev/i40e/if_i40e.c:	ifp->if_transmit = i40e_mq_start;
% dev/ixgbe/ixgbe.c:	ifp->if_transmit = ixgbe_mq_start;
% dev/ixgbe/ixv.c:	ifp->if_transmit = ixv_mq_start;
% dev/mxge/if_mxge.c:	ifp->if_transmit = mxge_transmit;
% dev/netmap/netmap_freebsd.c:		na->if_transmit = ifp->if_transmit;
% dev/netmap/netmap_freebsd.c:		ifp->if_transmit = netmap_transmit;
% dev/netmap/netmap_freebsd.c:		ifp->if_transmit = na->if_transmit;
% dev/oce/oce_if.c:	sc->ifp->if_transmit = oce_multiq_start;
% dev/sfxge/sfxge.c:	ifp->if_transmit = sfxge_if_transmit;
% dev/sfxge/sfxge_tx.c:sfxge_if_transmit(struct ifnet *ifp, struct mbuf *m)
% dev/vxge/vxge.c:	ifp->if_transmit = vxge_mq_send;
% dev/wtap/if_wtap.c:wtap_if_transmit(struct ifnet *ifp, struct mbuf *m)
% dev/wtap/if_wtap.c:	sc->if_transmit = ifp->if_transmit;
% dev/wtap/if_wtap.c:	ifp->if_transmit = wtap_if_transmit;

Bruce

From owner-freebsd-net@FreeBSD.ORG  Sat Jul 19 09:26:47 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id AAA18F3C
 for <freebsd-net@freebsd.org>; Sat, 19 Jul 2014 09:26:47 +0000 (UTC)
Received: from mail.ipfw.ru (mail.ipfw.ru [IPv6:2a01:4f8:120:6141::2])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 6E98229BC
 for <freebsd-net@freebsd.org>; Sat, 19 Jul 2014 09:26:47 +0000 (UTC)
Received: from [2a02:6b8:0:401:222:4dff:fe50:cd2f] (helo=ptichko.yndx.net)
 by mail.ipfw.ru with esmtpsa (TLSv1:DHE-RSA-AES128-SHA:128)
 (Exim 4.82 (FreeBSD)) (envelope-from <melifaro@FreeBSD.org>)
 id 1X8MyC-0009CH-NN; Sat, 19 Jul 2014 09:14:04 +0400
Message-ID: <53CA39BD.6050900@FreeBSD.org>
Date: Sat, 19 Jul 2014 13:26:21 +0400
From: "Alexander V. Chernikov" <melifaro@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: hiren panchasara <hiren.panchasara@gmail.com>, 
 Kajetan Staszkiewicz <vegeta@tuxpowered.net>
Subject: Re: Why is r250764 not in 9.3?
References: <201407151132.53587.vegeta@tuxpowered.net>
 <CALCpEUFAsdTeVivGqG0LzYWW0J==_vMC6muUjg4VsC5=NaefDQ@mail.gmail.com>
In-Reply-To: <CALCpEUFAsdTeVivGqG0LzYWW0J==_vMC6muUjg4VsC5=NaefDQ@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Jul 2014 09:26:47 -0000

On 15.07.2014 21:03, hiren panchasara wrote:
> + Alexander
>
> On Tue, Jul 15, 2014 at 2:32 AM, Kajetan Staszkiewicz
> <vegeta@tuxpowered.net> wrote:
>> The time has come to upgrade my routers to FreeBSD 9.3.
>>
>> While going through list of patches I had on 9.1, I've noticed that r248070 got
>> into 9.3 but r250764 did not. Why is that?
> Probably just missed it.
Yes, I've missed it.
Unfortunately, I'm unable to merge it until 26 July; feel free to do so 
if you wish.
>
> cheers,
> Hiren
>


From owner-freebsd-net@FreeBSD.ORG  Sat Jul 19 09:33:27 2014
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 8E32D28B
 for <freebsd-net@freebsd.org>; Sat, 19 Jul 2014 09:33:27 +0000 (UTC)
Received: from mail.ipfw.ru (mail.ipfw.ru [IPv6:2a01:4f8:120:6141::2])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 541AE2A79
 for <freebsd-net@freebsd.org>; Sat, 19 Jul 2014 09:33:27 +0000 (UTC)
Received: from [2a02:6b8:0:401:222:4dff:fe50:cd2f] (helo=ptichko.yndx.net)
 by mail.ipfw.ru with esmtpsa (TLSv1:DHE-RSA-AES128-SHA:128)
 (Exim 4.82 (FreeBSD)) (envelope-from <melifaro@FreeBSD.org>)
 id 1X8N4g-0009HL-27; Sat, 19 Jul 2014 09:20:46 +0400
Message-ID: <53CA3B4E.8080608@FreeBSD.org>
Date: Sat, 19 Jul 2014 13:33:02 +0400
From: "Alexander V. Chernikov" <melifaro@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: Daniel Corbe <corbe@corbe.net>, freebsd-net@freebsd.org
Subject: Re: netmap, selective processing.
References: <ygfzjg9tcs5.fsf@corbe.net>
In-Reply-To: <ygfzjg9tcs5.fsf@corbe.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 19 Jul 2014 09:33:27 -0000

On 16.07.2014 21:48, Daniel Corbe wrote:
> I hope this is the right place to ask questions about netmap.  I'm
> toying with the idea of writing a netmap-based OSPF implementation
> because bird's OSPF implementation isn't as good as its BGP
Hm. What do you need from bird's OSPF implementation?
IMHO it is much easier to improve bird's code and merge the changes
than to write another OSPF implementation from scratch.

There are _some_ unresolved issues with OSPF LSA withdrawal/announcement, 
but they will be fixed "soon".

> implementation, quagga doesn't scale well and openospfd doesn't compile
> on 10-RELEASE or CURRENT.
>
> But I'm only interested in selectively processing packets on the
> netmap-enabled interface.  Is there a way to do this?  Or alternatively
Yes, you can do this by adding another to-host interface. AFAIK the current 
bridge code for netmap is a good example.
In fact, we're using netmap as a forwarding appliance with bird as the 
control plane.
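
To illustrate the "selective" part, here is a minimal sketch of the
per-packet decision such a program would have to make: keep the frame if
it carries OSPF, otherwise pass it on to the host side. It only shows the
classification step and does not touch the netmap API. The function name
and constants are hypothetical, and it assumes untagged Ethernet frames
carrying IPv4 (no VLAN tags, no OSPFv3 over IPv6). OSPF is IP protocol 89.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants, named here only for this sketch. */
#define SKETCH_ETHER_HDR_LEN   14      /* untagged Ethernet header */
#define SKETCH_ETHERTYPE_IPV4  0x0800
#define SKETCH_IPPROTO_OSPF    89      /* IANA-assigned protocol number */

/*
 * Return 1 if the frame is an IPv4 packet whose protocol field is OSPF,
 * 0 otherwise. Assumes an untagged Ethernet frame starting at 'frame'.
 */
static int
is_ospf_frame(const uint8_t *frame, size_t len)
{
	const uint8_t *ip;
	uint16_t ethertype;
	size_t ihl;

	/* Need at least an Ethernet header plus a minimal IPv4 header. */
	if (len < SKETCH_ETHER_HDR_LEN + 20)
		return (0);

	/* EtherType is stored in network byte order at offset 12. */
	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	if (ethertype != SKETCH_ETHERTYPE_IPV4)
		return (0);

	ip = frame + SKETCH_ETHER_HDR_LEN;
	if ((ip[0] >> 4) != 4)
		return (0);

	/* Header length is given in 32-bit words in the low nibble. */
	ihl = (size_t)(ip[0] & 0x0f) * 4;
	if (ihl < 20 || len < SKETCH_ETHER_HDR_LEN + ihl)
		return (0);

	/* The protocol field is byte 9 of the IPv4 header. */
	return (ip[9] == SKETCH_IPPROTO_OSPF);
}
```

In a real program this check would run on each received slot; frames for
which it returns 0 would be handed to the to-host side, along the lines of
what the netmap bridge example does when it moves packets between ports.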
> if I throw the IF into netmap mode, can I process what I'm interested in
> processing and then somehow throw the rest of the traffic back up to the
> host's IP stack?