Date:      Mon, 4 May 2015 16:41:46 +0000 (UTC)
From:      Barney Cordoba <barney_cordoba@yahoo.com>
To:        Ian Smith <smithi@nimnet.asn.au>
Cc:        Luigi Rizzo <rizzo@iet.unipi.it>,  "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: Fwd: netmap-ipfw on em0 em1
Message-ID:  <268446884.1206550.1430757707012.JavaMail.yahoo@mail.yahoo.com>
In-Reply-To: <20150505014431.O26659@sola.nimnet.asn.au>
References:  <20150505014431.O26659@sola.nimnet.asn.au>

Nothing freely available. Many commercial companies have done such
things. Why limit the general community by force-feeding a really fast
packet generator into the mainstream by squashing other ideas in their
infancy? Anyone who understands how the kernel works understands what
I'm saying. A packet forwarder is a 3 day project (which means 2 weeks
as we all know).

When you can't debate the merits of an implementation without having
some weenie ask if you have a finished implementation to offer up for
free, you end up stuck with misguided junk like netgraph and flowtables.

The mediocrity of freebsd network "utilities" is a function of the
collective imagination of its users. It's unfortunate that these lists
can't be used to brainstorm potentially better ideas. Luigi's efforts
are not diminished by arguing that there is a better way to do something
that he recommends to be done with netmap.

BC


On Monday, May 4, 2015 11:52 AM, Ian Smith <smithi@nimnet.asn.au> wrote:

 On Mon, 4 May 2015 15:29:13 +0000, Barney Cordoba via freebsd-net wrote:

 > It's not faster than "wedging" into the if_input()s. It simply can't
 > be. You're getting packets at interrupt time as soon as they're
 > processed, there's no network stack involved, and you're able to
 > receive and transmit without a process switch. At worst it's the same,
 > without the extra plumbing. It's not rocket science to "bypass the
 > network stack".

 > The only advantage of bringing it into user space would be that it's
 > easier to write threaded handlers for complex uses; but not as a
 > firewall (which is the limit of the context of my comment). You can
 > do anything in the kernel that you can do in user space. The reason a
 > kernel module with if_input() hooks is better is that you can use the
 > standard kernel without all of the netmap hacks. You can just pop it
 > into any kernel and it works.

Barney, do you have a working alternative implementation you can share
with us to help put this silly inferior netmap thingy out of business?

Thanks, Ian


[I'm sorry, pine doesn't quote messages from some yahoo users properly:]

On Sunday, May 3, 2015 2:13 PM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:

 On Sun, May 3, 2015 at 6:17 PM, Barney Cordoba via freebsd-net <
freebsd-net@freebsd.org> wrote:

> Frankly I'm baffled by netmap. You can easily write a loadable kernel
> module that moves packets from 1 interface to another and hook in the
> firewall; why would you want to bring them up into user space? It's 1000s
> of lines of unnecessary code.
>
>
Because it is much faster.

The motivation for netmap-like solutions (that includes Intel's DPDK,
PF_RING/DNA and several proprietary implementations) is speed: they
bypass the entire network stack, and a good part of the device drivers,
so you can access packets 10+ times faster.

So things are actually the other way around: the 1000's of unnecessary
lines of code (not really thousands, though) are those that you'd pay
going through the standard network stack when you don't need any of its
services.

Going to userspace is just a side effect -- turns out to
be easier to develop and run your packet processing code
in userspace, but there are netmap clients (e.g. the
VALE software switch) which run entirely in the kernel.

cheers
luigi



>
>
>      On Sunday, May 3, 2015 3:10 AM, Raimundo Santos <raitech@gmail.com>
> wrote:
>
>
>  Clarifying things for the sake of documentation:
>
> To use the host stack, append a ^ character after the name of the interface
> you want to use. (Info from netmap(4) shipped with FreeBSD 10.1 RELEASE.)
>
> Examples:
>
> "kipfw em0" does nothing useful.
> "kipfw netmap:em0" disconnects the NIC from the usual data path, i.e.,
> there are no host communications.
> "kipfw netmap:em0 netmap:em0^" or "kipfw netmap:em0+" places the
> netmap-ipfw rules between the NIC and the host stack entry point associated
> (the IP addresses configured on it with ifconfig, ARP and RARP, etc...)
> with the same NIC.
>
> On 10 November 2014 at 18:29, Evandro Nunes <evandronunes12@gmail.com>
> wrote:
>
> > dear professor luigi,
> > i have some numbers, I am filtering 773Kpps with kipfw using 60% of CPU and
> > system using the rest, this system is a 8core at 2.4Ghz, but only one core
> > is in use
> > in this next round of tests, my NIC is now an avoton with igb(4) driver,
> > currently with 4 queues per NIC (total 8 queues for kipfw bridge)
> > i have read in your papers we should expect something similar to 1.48Mpps
> > how can I benefit from the other CPUs which are completely idle? I tried
> > CPU Affinity (cpuset) kipfw but system CPU usage follows userland kipfw so
> > I could not set one CPU to userland while other for system
> >
>
> All the papers talk about *generating* lots of packets, not *processing*
> lots of packets. What this netmap example does is processing. If someone
> really wants to use the host stack, the expected performance WILL BE worse
> - what's the point of using a host stack bypassing tool/framework if
> someone will end up using the host stack?
>
> And by generating, the papers usually mean minimum-sized UDP packets.
>
>
> >
> > can you please enlighten?
> >
>
> For everyone: read the manuals, read related and indicated materials
> (papers, web sites, etc), and, as a last resort, read the code. With
> netmap's code, it's easier than it sounds.
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"



-- 
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2217533               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------


  
Date:      Mon, 4 May 2015 16:53:51 +0000 (UTC)
From:      Barney Cordoba <barney_cordoba@yahoo.com>
To:        Jim Thompson <jim@netgate.com>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Luigi Rizzo <rizzo@iet.unipi.it>
Subject:   Re: netmap-ipfw on em0 em1
Message-ID:  <612079785.1200069.1430758431058.JavaMail.yahoo@mail.yahoo.com>
In-Reply-To: <CDE844AB-1F64-4922-AA45-D6710C6BD99E@netgate.com>
References:  <CDE844AB-1F64-4922-AA45-D6710C6BD99E@netgate.com>

I'll assume you're just not that clear on specific implementation.
Hooking directly into if_input() bypasses all of the "cruft". It
basically uses the driver "as-is", so any driver can be used and it will
be as good as the driver. The bloat starts in if_ethersubr.c, which is
easily completely avoided. Most drivers need to be tuned (or modified a
bit) as most freebsd drivers are full of bloat and forced into a bad,
cookie-cutter type way of doing things.

The problem with doing things in user space is that user space is
unpredictable. Things work just dandily when nothing else is going on,
but you can't control when a user space program gets context under heavy
loads. In the kernel you can control almost exactly what the polling
interval is through interrupt moderation on most modern controllers.

Many otherwise credible programmers argued for years that polling was
"faster", but it was only faster in an artificially controlled
environment. It's mainly because 1) they're not thinking about the
entire context of what "can" happen, 2) they test under unrealistic
conditions that don't represent real world events, and 3) they don't
have properly tuned ethernet drivers.

BC


On Monday, May 4, 2015 12:37 PM, Jim Thompson <jim@netgate.com> wrote:

While it is a true statement that, “You can do anything in the kernel
that you can do in user space.”, it is not a helpful statement.  Yes,
the kernel is just a program.
In a similar way, “You can just pop it into any kernel and it works.” is
also not helpful.  It works, but it doesn’t work well, because of other
infrastructure issues.
Both of your statements reduce to the age-old, “proof is left as an
exercise for the student”.

There is a lot of kernel infrastructure that is just plain crusty(*) and
which directly impedes performance in this area.

But there is plenty of cruft, Barney.  Here are two threads which are
three years old, with the issues they point out still unresolved, and
multiple places where 100ns or more is lost:
https://lists.freebsd.org/pipermail/freebsd-current/2012-April/033287.html
https://lists.freebsd.org/pipermail/freebsd-current/2012-April/033351.html

100ns is death at 10Gbps with min-sized packets.

quoting: http://luca.ntop.org/10g.pdf
---
“Taking as a reference a 10 Gbit/s link, the raw throughput is well
below the memory bandwidth of modern systems (between 6 and 8 GBytes/s
for CPU to memory, up to 5 GBytes/s on PCI-Express x16). However a
10Gbit/s link can generate up to 14.88 million Packets Per Second (pps),
which means that the system must be able to process one packet every
67.2 ns. This translates to about 200 clock cycles even for the faster
CPUs, and might be a challenge considering the per-packet overheads
normally involved by general-purpose operating systems. The use of large
frames reduces the pps rate by a factor of 20..50, which is great on end
hosts only concerned in bulk data transfer.  Monitoring systems and
traffic generators, however, must be able to deal with worst case
conditions.”
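
The figures in the quoted passage can be reproduced with a quick back-of-the-envelope calculation. The sketch below is only an illustration: it assumes standard Ethernet framing overhead (8 bytes of preamble/SFD and a 12-byte inter-frame gap on top of the 64-byte minimum frame) and a hypothetical 3 GHz CPU clock, neither of which is stated in the thread. The function names are made up for this example.

```python
# Back-of-the-envelope check of the 10 Gbit/s worst-case packet rate.
# Assumptions (not from the thread): standard Ethernet framing overhead
# and a hypothetical 3 GHz CPU clock for the cycle budget.

LINK_BPS = 10_000_000_000   # 10 Gbit/s link
MIN_FRAME_BYTES = 64        # minimum Ethernet frame, including FCS
PREAMBLE_SFD_BYTES = 8      # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # mandatory idle time between frames
CPU_HZ = 3_000_000_000      # assumed clock rate


def wire_bits(frame_bytes: int) -> int:
    """Bits one frame occupies on the wire, including per-frame overhead."""
    return (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8


def packets_per_second(frame_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Maximum frame rate the link can carry for a given frame size."""
    return link_bps / wire_bits(frame_bytes)


def ns_per_packet(frame_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Time budget per packet, in nanoseconds, at line rate."""
    return 1e9 / packets_per_second(frame_bytes, link_bps)


def cycles_per_packet(frame_bytes: int, cpu_hz: int = CPU_HZ) -> float:
    """CPU cycles available per packet at line rate on one core."""
    return cpu_hz / packets_per_second(frame_bytes)


if __name__ == "__main__":
    print(f"{packets_per_second(MIN_FRAME_BYTES) / 1e6:.2f} Mpps")
    print(f"{ns_per_packet(MIN_FRAME_BYTES):.1f} ns per packet")
    print(f"{cycles_per_packet(MIN_FRAME_BYTES):.0f} cycles per packet")
    # How many packet slots a single 100 ns stall costs at line rate.
    print(f"{100 / ns_per_packet(MIN_FRAME_BYTES):.2f} packet times per 100 ns")
```

Under these assumptions a single 100 ns delay costs roughly one and a half packet slots at line rate, which is the point behind the "100ns is death" remark.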

Forwarding and filtering must also be able to deal with worst case, and
nobody does well with kernel-based networking here.
https://github.com/gvnn3/netperf/blob/master/Documentation/Papers/ABSDCon2015Paper.pdf

10Gbps NICs are $200-$300 today, and they’ll be included on the
motherboard during the next hardware refresh.  Broadwell-DE (Xeon-D) has
10G in the SoC, and others are coming.
10Gbps switches can be had at around $100/port.  This is exactly the
point at which the adoption curve for 1Gbps Ethernet ramped over a
decade ago.


(*) A few more simple examples of cruft:

Why, in 2015, does the kernel have a ‘fast forwarding’ option, and
worse, one that isn’t enabled by default?  Shouldn’t “fast forwarding”
be the default?

Why, in 2015, does FreeBSD not ship with IPSEC enabled in GENERIC?
(Reason: each and every time this has come up in recent memory, someone
has pointed out that it impacts performance.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=128030)

Why, in 2015, does anyone think it’s acceptable for “fast forwarding” to
break IPSEC?

Why, in 2015, does anyone think it’s acceptable that the setkey(8) man
page documents, of all things, DES-CBC and HMAC-MD5 for a SA?  That’s
some kind of sick joke, right?
This completely flies in the face of RFC 4835.


Date:      Mon, 4 May 2015 12:32:38 -0500
From:      Jim Thompson <jim@netgate.com>
To:        Barney Cordoba <barney_cordoba@yahoo.com>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Luigi Rizzo <rizzo@iet.unipi.it>
Subject:   Re: netmap-ipfw on em0 em1
Message-ID:  <9A95B98D-9B33-4F13-9106-504596365F3C@netgate.com>
In-Reply-To: <612079785.1200069.1430758431058.JavaMail.yahoo@mail.yahoo.com>
References:  <CDE844AB-1F64-4922-AA45-D6710C6BD99E@netgate.com> <612079785.1200069.1430758431058.JavaMail.yahoo@mail.yahoo.com>


> On May 4, 2015, at 11:53 AM, Barney Cordoba <barney_cordoba@yahoo.com> wrote:
>
> I'll assume you're just not that clear on specific implementation.

Thank you for your assumption.  Have you noticed that you tend to argue
by insinuating that the other party is stupid, or at best, responsible
for “hacks”?

> Hooking directly into if_input() bypasses all of the "cruft". It
> basically uses the driver "as-is", so any driver can be used and it will
> be as good as the driver. The bloat starts in if_ethersubr.c, which is
> easily completely avoided. Most drivers need to be tuned (or modified a
> bit) as most freebsd drivers are full of bloat and forced into a bad,
> cookie-cutter type way of doing things.

I am familiar, as I am familiar with exactly how fast DPDK (which uses
the FreeBSD NIC drivers) can route and filter, vs. how fast FreeBSD can
do the same.
They are, quite simply, an order of magnitude apart.


> The problem with doing things in user space is that user space is
> unpredictable.

The problem with people who think this way is that they forget that the
kernel exists to run on behalf of user space, not the other way around.

> Things work just dandily when nothing else is going on, but you can't
> control when a user space program gets context under heavy loads.

The kernel is certainly in control of same.

> In the kernel you can control almost exactly what the polling interval
> is through interrupt moderation on most modern controllers.
>
> Many otherwise credible programmers argued for years that polling was
> "faster", but it was only faster in an artificially controlled
> environment. It's mainly because 1) they're not thinking about the
> entire context of what "can" happen, 2) they test under unrealistic
> conditions that don't represent real world events, and 3) they don't
> have properly tuned ethernet drivers.
>
> BC
>
>=20
>=20
> On Monday, May 4, 2015 12:37 PM, Jim Thompson <jim@netgate.com> wrote:
>=20
>=20
>=20
> While it is a true statement that, "You can do anything in the kernel =
that you can do in user space.=E2=80=9D, it is not a helpful statement.  =
Yes, the kernel is just a program.
> In a similar way, =E2=80=9CYou can just pop it into any kernel and it =
works.=E2=80=9D is also not helpful.  It works, but it doesn=E2=80=99t =
work well, because of other infrastructure issues.
> Both of your statements reduce to the age-old, =E2=80=9Cproof is left =
as an exercise for the student=E2=80=9D.
>=20
> There is a lot of kernel infrastructure that is just plain crusty(*) =
and which directly impedes performance in this area.
>=20
> But there is plenty of cruft, Barney.  Here are two threads which are =
three years old, with the issues it points out still unresolved, and =
multiple places where 100ns or more is lost:
> =
https://lists.freebsd.org/pipermail/freebsd-current/2012-April/033287.html=
 =
<https://lists.freebsd.org/pipermail/freebsd-current/2012-April/033287.htm=
l>
> =
https://lists.freebsd.org/pipermail/freebsd-current/2012-April/033351.html=
 =
<https://lists.freebsd.org/pipermail/freebsd-current/2012-April/033351.htm=
l>
>=20
> 100ns is death at 10Gbps with min-sized packets.
>=20
> quoting: http://luca.ntop.org/10g.pdf <http://luca.ntop.org/10g.pdf>;
> ---
> Taking as a reference a 10 Gbit/s link, the raw throughput is well =
below the memory bandwidth of modern systems (between 6 and 8 GBytes/s =
for CPU to memory, up to 5 GBytes/s on PCI-Express x16). How- ever a =
10Gbit/s link can generate up to 14.88 million Packets Per Second (pps), =
which means that the system must be able to process one packet every =
67.2 ns. This translates to about 200 clock cycles even for the faster =
CPUs, and might be a challenge considering the per- packet overheads =
normally involved by general-purpose operating systems. The use of large =
frames reduces the pps rate by a factor of 20..50, which is great on end =
hosts only concerned in bulk data transfer.  Monitoring systems and =
traffic generators, however, must be able to deal with worst case =
conditions.=E2=80=9D
>=20
> Forwarding and filtering must also be able to deal with worst case, =
and nobody does well with kernel-based networking here.  =
https://github.com/gvnn3/netperf/blob/master/Documentation/Papers/ABSDCon2=
015Paper.pdf =
<https://github.com/gvnn3/netperf/blob/master/Documentation/Papers/ABSDCon=
2015Paper.pdf>
>=20
> 10Gbps NICs are $200-$300 today, and they=E2=80=99ll be included on =
the motherboard during the next hardware refresh.  Broadwell-DE (Xeon-D) =
has 10G in the SoC, and others are coming.
> 10Gbps switches can be had at around $100/port.  This is exactly the =
point at which the adoption curve for 1Gbps Ethernet ramped over a =
decade ago.
>=20
>=20
> (*) A few more simple examples of cruft:
>=20
> Why, in 2015 does the kernel have a =E2=80=98fast forwarding=E2=80=99 =
option, and worse, one that isn=E2=80=99t enabled by default?  =
Shouldn=E2=80=99t =E2=80=9Cfast forwarding" be the default?
>=20
> Why, in 2015, does FreeBSD not ship with IPSEC enabled in GENERIC?  =
(Reason: each and every time this has come up in recent memory, someone =
has pointed out that it impacts performance.  =
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D128030 =
<https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D128030>)
>=20
> Why, in 2015, does anyone think it=E2=80=99s acceptable for =E2=80=9Cfas=
t forwarding=E2=80=9D to break IPSEC?
>=20
> Why, in 2015, does anyone think it=E2=80=99s acceptable that the =
setkey(8) man page documents, of all things, DES-CBC and HMAC-MD5 for a =
SA?  That=E2=80=99s some kind of sick joke, right?
> This completely flies in the face of RFC 4835.
>=20
>=20
> > On May 4, 2015, at 10:29 AM, Barney Cordoba via freebsd-net =
<freebsd-net@freebsd.org <mailto:freebsd-net@freebsd.org>> wrote:
> >=20
> > It's not faster than "wedging" into the if_input()s. It simply can't =
be. Your getting packets at interrupt time as soon as their processed =
and  you there's no network stack involved, and your able to receive and =
transmit without a process switch. At worst it's the same, without the =
extra plumbing. It's not rocket science to "bypass the network stack".
> > The only advantage of bringing it into user space would be that it's =
easier to write threaded handlers for complex uses; but not as a =
firewall (which is the limit of the context of my comment). You can do =
anything in the kernel that you can do in user space. The reason a =
kernel module with if_input() hooks is better is that you can use the =
standard kernel without all of the netmap hacks. You can just pop it =
into any kernel and it works.
> > BC=20
> >=20
> >=20
> >    On Sunday, May 3, 2015 2:13 PM, Luigi Rizzo <rizzo@iet.unipi.it =
<mailto:rizzo@iet.unipi.it>> wrote:
> >=20
> >=20
> > On Sun, May 3, 2015 at 6:17 PM, Barney Cordoba via freebsd-net <
> > freebsd-net@freebsd.org <mailto:freebsd-net@freebsd.org>> wrote:
> >=20
> >> Frankly I'm baffled by netmap. You can easily write a loadable kernel
> >> module that moves packets from 1 interface to another and hook in the
> >> firewall; why would you want to bring them up into user space? It's
> >> 1000s of lines of unnecessary code.
> >>
> >>
> > Because it is much faster.
> >
> > The motivation for netmap-like solutions (that includes Intel's DPDK,
> > PF_RING/DNA and several proprietary implementations) is speed: they
> > bypass the entire network stack, and a good part of the device
> > drivers, so you can access packets 10+ times faster.
> > So things are actually the other way around: the 1000's of unnecessary
> > lines of code (not really thousands, though) are those that you'd pay
> > going through the standard network stack when you don't need any of
> > its services.
> >
> > Going to userspace is just a side effect -- turns out to
> > be easier to develop and run your packet processing code
> > in userspace, but there are netmap clients (e.g. the
> > VALE software switch) which run entirely in the kernel.
> >
> > cheers
> > luigi
> >
> >>
> >>
> >>      On Sunday, May 3, 2015 3:10 AM, Raimundo Santos <raitech@gmail.com>
> >> wrote:
> >>
> >>
> >>  Clarifying things for the sake of documentation:
> >>
> >> To use the host stack, append a ^ character after the name of the
> >> interface you want to use. (Info from netmap(4) shipped with FreeBSD
> >> 10.1 RELEASE.)
> >>
> >> Examples:
> >>
> >> "kipfw em0" does nothing useful.
> >> "kipfw netmap:em0" disconnects the NIC from the usual data path, i.e.,
> >> there are no host communications.
> >> "kipfw netmap:em0 netmap:em0^" or "kipfw netmap:em0+" places the
> >> netmap-ipfw rules between the NIC and the host stack entry point
> >> associated with the same NIC (the IP addresses configured on it with
> >> ifconfig, ARP and RARP, etc.).
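
As an inline aside, the naming convention in the examples above can be captured in a few lines of Python. This is only a sketch: describe_port is a made-up helper, not part of kipfw or netmap, and it encodes nothing beyond the three cases listed in the quoted message (the ^ and + suffixes are taken from that message as written, not re-checked against netmap(4)).

```python
def describe_port(arg: str) -> str:
    """Classify a kipfw port argument per the conventions quoted above.

    Hypothetical helper for illustration only; it does not open any device.
    """
    prefix = "netmap:"
    if not arg.startswith(prefix):
        # e.g. "em0": no netmap prefix, so nothing useful happens
        return "plain interface name, not opened in netmap mode"
    ifname = arg[len(prefix):]
    if ifname.endswith("^"):
        # e.g. "netmap:em0^": the host stack side of the interface
        return f"host stack side of {ifname[:-1]}"
    if ifname.endswith("+"):
        # e.g. "netmap:em0+": shorthand for NIC plus its own host stack
        return f"NIC {ifname[:-1]} paired with its own host stack"
    # e.g. "netmap:em0": NIC taken off the usual data path
    return f"NIC {ifname} only, host stack disconnected"


if __name__ == "__main__":
    for arg in ("em0", "netmap:em0", "netmap:em0^", "netmap:em0+"):
        print(f"{arg:14} -> {describe_port(arg)}")
```

The point of spelling it out is that only the last two forms keep the host reachable through the interface; "netmap:em0" alone cuts it off.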
> >>
> >> On 10 November 2014 at 18:29, Evandro Nunes <evandronunes12@gmail.com>
> >> wrote:
> >>
> >>> dear professor luigi,
> >>> i have some numbers, I am filtering 773Kpps with kipfw using 60% of
> >>> CPU and system using the rest, this system is a 8core at 2.4Ghz, but
> >>> only one core is in use
> >>> in this next round of tests, my NIC is now an avoton with igb(4)
> >>> driver, currently with 4 queues per NIC (total 8 queues for kipfw
> >>> bridge)
> >>> i have read in your papers we should expect something similar to
> >>> 1.48Mpps
> >>> how can I benefit from the other CPUs which are completely idle? I
> >>> tried CPU Affinity (cpuset) kipfw but system CPU usage follows
> >>> userland kipfw so I could not set one CPU to userland while other for
> >>> system
> >>>
> >>
> >> All the papers talk about *generating* lots of packets, not
> >> *processing* lots of packets. What this netmap example does is
> >> processing. If someone really wants to use the host stack, the
> >> expected performance WILL BE worse - what's the point of using a host
> >> stack bypassing tool/framework if someone will end up using the host
> >> stack?
> >>
> >> And by generating, the papers usually mean minimum-sized UDP packets.
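
For scale: the 1.48 Mpps figure matches the theoretical frame rate of 1 Gbit/s Ethernet with minimum-sized (64-byte) frames, since each frame also occupies 20 bytes of preamble, start-of-frame delimiter and inter-frame gap on the wire. A quick Python check of the arithmetic, assuming the quoted figure refers to a 1 GbE link (the function and constant names are mine, not from any paper or tool):

```python
# Bytes each Ethernet frame occupies on the wire beyond the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 7 + 1 + 12


def line_rate_pps(link_bps: float, frame_bytes: int = 64) -> float:
    """Maximum frames per second for a link speed and frame size (incl. FCS)."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


gige = line_rate_pps(1e9)  # assumption: 1 Gbit/s link, 64-byte frames
measured = 773_000         # packets per second reported in the quoted message

if __name__ == "__main__":
    print(f"1 GbE line rate: {gige / 1e6:.3f} Mpps")       # 1.488 Mpps
    print(f"773 Kpps is {measured / gige:.0%} of that")    # 52%
```

So the reported 773 Kpps is roughly half of line rate for minimum-sized frames, on a single core that is also filtering rather than just generating.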
> >>
> >>
> >>>
> >>> can you please enlighten?
> >>>
> >>
> >> For everyone: read the manuals, read related and indicated materials
> >> (papers, web sites, etc.), and, as a last resort, read the code. With
> >> netmap's code, it's easier than it sounds.
> >> _______________________________________________
> >> freebsd-net@freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
> >>
> >
> > --
> > -----------------------------------------+-------------------------------
> > Prof. Luigi RIZZO, rizzo@iet.unipi.it    . Dip. di Ing. dell'Informazione
> > http://www.iet.unipi.it/~luigi/          . Universita` di Pisa
> > TEL      +39-050-2217533                 . via Diotisalvi 2
> > Mobile   +39-338-6809875                 . 56122 PISA (Italy)
> > -----------------------------------------+-------------------------------



