Date:      Tue, 21 Nov 2017 10:58:00 +0100
From:      Harry Schmalzbauer <freebsd@omnilan.de>
To:        Vincenzo Maffione <v.maffione@gmail.com>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Giuseppe Lettieri <g.lettieri@iet.unipi.it>
Subject:   Re: netmap/vale periodic deadlock
Message-ID:  <5A13F8A8.2020209@omnilan.de>
In-Reply-To: <CA%2B_eA9giPsMJ2_O1CLvOro=rMm5TaJyQ-et_U01Re5J9%2B9VSqg@mail.gmail.com>
References:  <5A0F14CD.3040407@omnilan.de> <CA%2B_eA9giPsMJ2_O1CLvOro=rMm5TaJyQ-et_U01Re5J9%2B9VSqg@mail.gmail.com>

Regarding Vincenzo Maffione's message of 21.11.2017 09:39 (localtime):
> Hi,
>   It's hard to say, specially because it happens after two days of
> normal use.
> Can't you enable deadlock debugging features in your kernel?
> https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> 
> However, if I understand correctly you have created some VLAN interfaces
> vlan0, vlan1, vlan2, ... on top of a NIC (say em0). And you have
> attached each VLAN interface to a vale switch:
> 
> # vale-ctl -a vale0:vlan0
> # vale-ctl -a vale1:vlan1
> # vale-ctl -a vale2:vlan2
> 
> and each VALE switch is attached to a different set of bhyve guests.

Hello Vincenzo,

thank you very much for your help again!

Your assumption is correct, here's my vale-ctl output:
603.811416 bdg_ctl [148] bridge:0 port:0 vale1:nic1_dmz
603.811428 bdg_ctl [148] bridge:0 port:1 vale1:styx0
603.811430 bdg_ctl [148] bridge:0 port:2 vale1:korso
603.811432 bdg_ctl [148] bridge:0 port:3 vale1:kallisto

603.811434 bdg_ctl [148] bridge:1 port:0 vale2:nic1_inop
603.811435 bdg_ctl [148] bridge:1 port:1 vale2:styx0

603.811437 bdg_ctl [148] bridge:2 port:0 vale3:nic1_vnl
603.811439 bdg_ctl [148] bridge:2 port:1 vale3:styx0

603.811441 bdg_ctl [148] bridge:3 port:0 vale4:nic1_egn
603.811442 bdg_ctl [148] bridge:3 port:1 vale4:styx0
603.811444 bdg_ctl [148] bridge:3 port:2 vale4:preed
…


> If this is the case, although you are allowed to do that, I don't think
> it's a convenient way to use netmap.
> Since VLAN interfaces like vlan0 do not have (and cannot have) native
> netmap support, you are falling back to emulated netmap adapters (which
> are probably buggy on FreeBSD, specially when combined with VALE).
> Apart from bugs I think that with this setup you can't get decent
> performance that would justify using netmap rather than the standard
> kernel bridge and TAP devices.

I'm aware that the netmap performance benefit is lost with the emulated
netmap fallback.
But there were some reasons why I chose vale(4) instead of if_bridge(4):

1) Inter-guest traffic (virtio-net causes a lot of LAPIC/IRQ overhead,
but still less overhead than tap(4)/if_bridge(4))

2) Future ptnetmap(4) upgrade path (which should save a lot of LAPIC/IRQ
CPU cycles and unleash huge performance benefits for inter-VM traffic)

3) Admin mess and MTU limitation.  Each if_bridge(4) creates a host-stack
interface, which I don't use and which clutters ifconfig(8) output, and
if_vtnet(4) even doubles that.
The most important disadvantage: if_bridge(4) requires all members to
have exactly the same MTU.  This has been a problem for me many times
over the last years in various setups.

So with my current setup the overhead/efficiency of host-external packet
flow of
	bhyve_virtio-net+dyn_vale_port+vale(4)
is equal to
	bhyve_virtio-net+if_vtnet(4)+if_bridge(4)

But I have fewer disadvantages with vale(4), as long as emulated netmap
mode doesn't destabilize my setup :-(


My second choice was ng_bridge(4), with which I have had very good
experiences in my router VM.  That VM runs on the host in question (and
in turn uses virtio-net interfaces attached to the individual vale(4)
switches on the host).
[ Even more impressive:  pf(4) runs in a VIMAGE jail in that guest,
utilizing those vale(4) interfaces.  Reason for that complicated setup:
closest hardware abstraction possible.  The setup (guest) should be
easily migratable to real hardware. ]


> The right way to do it imho would be to write your own (userspace)
> netmap application that forwards packets between your bhyve guests and
> the NIC, prepending/stripping VLAN headers according to configuration
> (e.g. guest A is configured to be on VLAN 100, guest B on VLAN 200), etc.
> I think this would be a very interesting netmap application in general,
> and more importantly you would get the performance that you can't get
> with your setup.

I agree that a userland application which, as you described, uses
netmap to provide minimalistic SDN features would be a great
solution.  But I would need a lot of time for that, since my C skills
are lousy, and I really don't have any time, not even one more day.
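
For illustration only, the VLAN header handling described above might
look roughly like the following.  This is a minimal sketch in Python of
802.1Q tag insertion and removal on a raw Ethernet frame, not a netmap
application: it does not touch netmap rings at all, and the names
push_vlan and pop_vlan are hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def push_vlan(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Insert an 802.1Q tag right after the two MAC addresses (12 bytes)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7:
        raise ValueError("priority must fit in 3 bits")
    tci = (pcp << 13) | vid  # DEI bit is left at 0
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def pop_vlan(frame: bytes):
    """Return (vid, untagged_frame); vid is None if the frame has no tag."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != TPID_8021Q:
        return None, frame
    if len(frame) < 18:
        raise ValueError("truncated 802.1Q header")
    (tci,) = struct.unpack_from("!H", frame, 14)
    # Drop the 4 tag bytes; the original EtherType follows them.
    return tci & 0x0FFF, frame[:12] + frame[16:]
```

With the example configuration from the quoted text, frames from guest A
would go through push_vlan(frame, 100) on their way to the NIC, and
frames arriving from the NIC would go through pop_vlan() to pick the
destination guest by VLAN ID.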


I'll see if I can get any useful information with the kernel deadlock
debugging features you suggested
(https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html),
as soon as the problem shows up again.
Since I had forgotten to install all of the production RAM, I had to
shut down yesterday, so the lockup counter was reset ;-)
Another last-minute change concerned the netmap ring size: I changed the
vale uplink interface.  The NIC I used for passthrough had 2 queues
(with EM_MULTIQUEUE support) and the one for the vale uplink only one,
and during the evaluation phase I reduced the rx/tx descriptors to make
netmap's default ring size work.
Now I use the 2-queue NIC as the vale uplink and have increased the ring
size to 81920 while leaving the hardware default of 4096 rx/tx
descriptors.

But I think my wording wasn't technically correct, because I guess what
I'm suffering from isn't a real deadlock in terms of locking, but some
netmap-internal lockup/overflow/limit/whatever.  Just guessing here!  I
don't know the netmap code!  I can only correlate symptoms, and since
this setup works really nicely for a limited time, I hoped you or
another netmap expert could teach me how to find the root cause.
Your remark about FreeBSD's netmap interface emulation leaves me with a
bad feeling...

Thank you very much,

-harry


