Date:      Mon, 28 Sep 2020 16:44:10 +0200
From:      Alexander Leidinger <Alexander@leidinger.net>
To:        Kristof Provost <kp@freebsd.org>
Cc:        Shawn Webb <shawn.webb@hardenedbsd.org>, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: iflib/bridge kernel panic
Message-ID:  <20200928164410.Horde.mYBkuEeD_Q6xgnKnwNomv7P@webmail.leidinger.net>
In-Reply-To: <33903BFF-4158-4CD9-AD79-360BCD81F1C9@FreeBSD.org>
References:  <CAExMvskTkVprZsfXHBUv9stpiCo1QBAzoOg1VrWd4kRbz0NyJg@mail.gmail.com> <58CADEBB-64FD-414E-AB19-E4F8D3CABCA5@FreeBSD.org> <20200921121627.3dovpumnl6xub3kn@mutt-hbsd> <7FE1F106-2CEE-4692-95D0-14C5229ED768@FreeBSD.org> <20200928124531.Horde.0EjsBzIG5ktLzby_tFcoPPS@webmail.leidinger.net> <33903BFF-4158-4CD9-AD79-360BCD81F1C9@FreeBSD.org>

This message is in MIME format and has been PGP signed.


Quoting Kristof Provost <kp@freebsd.org> (from Mon, 28 Sep 2020
13:53:16 +0200):

> On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
>> Quoting Kristof Provost <kp@freebsd.org> (from Sun, 27 Sep 2020
>> 17:51:32 +0200):
>>> Here’s an early version of a task queue based approach:
>>> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
>>>
>>> That still needs to be cleaned up, but this should resolve the
>>> sleep issue and the LOR.
>>
>> There are some issues... seems like inside a jail I can't ping
>> systems outside of the hardware.
>>
>> Bridge setup:
>>    - member jail A
>>    - member jail B
>>    - member external_if of host
>>
>> If I ping the router from the host, it works. If I ping from one
>> jail to another, it works. If I ping from the jail to the IP of the
>> external_if, it works. If I ping from a jail to the router, I do
>> not get a response.
>>
> Can you check for 'failed ifpromisc' error messages in dmesg? And
> verify that all bridge member interfaces are in promiscuous mode?

I have a panic for you...:
  - startup was still in progress (22 jails in startup); the panic
    happened somewhere after a few jails had started
  - tcpdump was running on the external interface
  - a ping to a jail IP from another system was running; the first
    ping went through, then it panicked

First, regarding your questions about promisc mode: no error, but
promisc mode is immediately disabled again on all interfaces.

Data (external_if = igb0, jail epairs are j_X_Yif with X the ID of the
jail and Y either h like host-side or j like jail-side):
---snip---
Host:

# ifconfig -a
igb0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=4a520b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,NOMAP>
         ether [...]:a4
         inet 192.168.1.x netmask 0xffffff00 broadcast 192.168.1.255
         inet6 fe80::[...]a4%igb0 prefixlen 64 scopeid 0x1
         inet6 fd73:[...] prefixlen 64
         inet6 2003:[...] prefixlen 64 autoconf
         inet6 fd73:[...] prefixlen 64 autoconf
         media: Ethernet autoselect (1000baseT <full-duplex>)
         status: active
         nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
igb1: flags=8822<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=4e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
         ether [...]:a5
         media: Ethernet autoselect
         status: no carrier
         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
         options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
         inet6 ::1 prefixlen 128
         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
         inet 127.0.0.1 netmask 0xff000000
         groups: lo
         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
vswitch0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         ether [...]:a3
         id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
         maxage 20 holdcnt 6 proto stp-rstp maxaddr 2000 timeout 1200
         root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
         member: j_weather_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 9 priority 128 path cost 2000
         member: j_web_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 8 priority 128 path cost 2000
         member: j_commit_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 7 priority 128 path cost 2000
         member: j_video_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 6 priority 128 path cost 2000
         member: j_dns_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 5 priority 128 path cost 2000
         member: igb0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 1 priority 128 path cost 20000
         groups: bridge
         nd6 options=9<PERFORMNUD,IFDISABLED>
j_dns_hif: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=8<VLAN_MTU>
         ether [...]:0a
         hwaddr [...]:0a
         inet6 fe80::[...]0a%j_dns_hif prefixlen 64 scopeid 0x5
         groups: epair
         media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
         status: active
         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
[... some more jail interfaces ...]

# dmesg | grep promis
igb0: promiscuous mode enabled
igb0: promiscuous mode disabled
j_dns_hif: promiscuous mode enabled
j_dns_hif: promiscuous mode disabled
[... some more like this ...]

# jexec 2 ifconfig -a
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
         options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
         inet6 ::1 prefixlen 128
         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
         inet 127.0.0.1 netmask 0xff000000
         groups: lo
         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
j_dns_jif: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=8<VLAN_MTU>
         ether [...]:0b
         hwaddr [...]:0b
         inet 192.168.1.y netmask 0xffffff00 broadcast 192.168.1.255
         inet6 fe80::[...]0b%j_dns_jif prefixlen 64 scopeid 0x2
         inet6 fd73:[...]:y prefixlen 64
         groups: epair
         media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
         status: active
         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
---snip---

And here the backtrace of the panic:
---snip---
panic: if_setflag: decrement non-positive refcount 0 for flag 256
cpuid = 4
time = 1601300532
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0378ea3920
vpanic() at vpanic+0x182/frame 0xfffffe0378ea3970
panic() at panic+0x43/frame 0xfffffe0378ea39d0
if_setflag() at if_setflag+0x137/frame 0xfffffe0378ea3a30
ifpromisc() at ifpromisc+0x2a/frame 0xfffffe0378ea3a60
bpf_detachd_locked() at bpf_detachd_locked+0x280/frame 0xfffffe0378ea3ab0
bpf_dtor() at bpf_dtor+0x87/frame 0xfffffe0378ea3ad0
devfs_destroy_cdevpriv() at devfs_destroy_cdevpriv+0xa1/frame 0xfffffe0378ea3af0
devfs_close_f() at devfs_close_f+0x6a/frame 0xfffffe0378ea3b20
_fdrop() at _fdrop+0x20/frame 0xfffffe0378ea3b40
closef() at closef+0x1ea/frame 0xfffffe0378ea3bd0
closefp() at closefp+0x90/frame 0xfffffe0378ea3c10
amd64_syscall() at amd64_syscall+0x13e/frame 0xfffffe0378ea3d30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0378ea3d30


__curthread () at /space/system/usr_src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /space/system/usr_src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=1) at /space/system/usr_src/sys/kern/kern_shutdown.c:394
#2  0xffffffff8051fb46 in kern_reboot (howto=260)
     at /space/system/usr_src/sys/kern/kern_shutdown.c:481
#3  0xffffffff8051ff8a in vpanic (fmt=<optimized out>, ap=<optimized out>)
     at /space/system/usr_src/sys/kern/kern_shutdown.c:913
#4  0xffffffff8051fcf3 in panic (fmt=<unavailable>)
     at /space/system/usr_src/sys/kern/kern_shutdown.c:839
#5  0xffffffff806321f7 in if_setflag (ifp=0xfffff800036cc000,
     flag=<unavailable>, pflag=<optimized out>, refcount=0xfffff800036cc3a8,
     onswitch=<unavailable>) at /space/system/usr_src/sys/net/if.c:3135
#6  0xffffffff8063206a in ifpromisc (ifp=0xfffff800036cc000,
     pswitch=<unavailable>) at /space/system/usr_src/sys/net/if.c:3196
#7  0xffffffff80626450 in bpf_detachd_locked (d=<optimized out>,
     detached_ifp=<optimized out>) at /space/system/usr_src/sys/net/bpf.c:882
#8  0xffffffff80629277 in bpf_detachd (d=0xfffff8074cf42800)
     at /space/system/usr_src/sys/net/bpf.c:836
#9  bpf_dtor (data=0xfffff8074cf42800)
     at /space/system/usr_src/sys/net/bpf.c:913
#10 0xffffffff80487531 in devfs_destroy_cdevpriv (p=0xfffff8074cf29c40)
     at /space/system/usr_src/sys/fs/devfs/devfs_vnops.c:197
#11 0xffffffff8048b16a in devfs_fpdrop (fp=0xfffff8074cebaaf0)
     at /space/system/usr_src/sys/fs/devfs/devfs_vnops.c:211
#12 devfs_close_f (fp=0xfffff8074cebaaf0, td=<optimized out>)
     at /space/system/usr_src/sys/fs/devfs/devfs_vnops.c:787
#13 0xffffffff804c4d70 in fo_close (fp=0xfffff8074cebaaf0, td=<unavailable>)
     at /space/system/usr_src/sys/sys/file.h:364
#14 _fdrop (fp=0xfffff8074cebaaf0, td=<unavailable>)
     at /space/system/usr_src/sys/kern/kern_descrip.c:3120
#15 0xffffffff804c7eca in closef (fp=0xfffff8074cebaaf0, td=0xfffffe0382567500)
     at /space/system/usr_src/sys/kern/kern_descrip.c:2606
#16 0xffffffff804c51e0 in closefp (fdp=0xfffffe0307cbd950, fd=3,
     fp=0xfffff8074cebaaf0, td=0xfffffe0382567500, holdleaders=<optimized out>)
     at /space/system/usr_src/sys/kern/kern_descrip.c:1263
#17 0xffffffff808000ae in syscallenter (td=<optimized out>)
     at /space/system/usr_src/sys/amd64/amd64/../../kern/subr_syscall.c:162
---snip---
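
For readers following the trace: flag 256 is 0x100, which is
IFF_PROMISC, and the panic fires when promiscuous mode is released on an
interface whose promisc reference count is already 0. The toy model
below only illustrates that accounting rule. It is not the kernel code:
the class, method and variable names are made up for illustration, and
the sequence of callers in demo() is an assumption, not something
established by the data above.

```python
IFF_PROMISC = 0x100  # 256, the flag value named in the panic message


class Interface:
    """Toy model of an interface with a reference-counted promisc flag."""

    def __init__(self, name):
        self.name = name
        self.flags = 0
        self.promisc_refs = 0  # number of outstanding "enable" requests

    def set_promisc(self, enable):
        if enable:
            self.promisc_refs += 1
            self.flags |= IFF_PROMISC
            return
        if self.promisc_refs <= 0:
            # The kernel panics at the equivalent point; the model raises.
            raise RuntimeError(
                f"decrement non-positive refcount {self.promisc_refs} "
                f"for flag {IFF_PROMISC}"
            )
        self.promisc_refs -= 1
        if self.promisc_refs == 0:
            self.flags &= ~IFF_PROMISC


def demo():
    # Hypothetical sequence: two users enable promisc, three release it.
    ifp = Interface("igb0")
    ifp.set_promisc(True)    # e.g. the bridge adds the member
    ifp.set_promisc(True)    # e.g. tcpdump opens a BPF descriptor
    ifp.set_promisc(False)   # first release
    ifp.set_promisc(False)   # second release, count is now 0
    try:
        ifp.set_promisc(False)  # one release too many
    except RuntimeError as err:
        return str(err)
    return None
```

In this model every enable must be paired with exactly one disable. An
extra disable from any user leaves the next legitimate release (here
the BPF close path in the trace) to hit the zero count.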

Bye,
Alexander.

-- 
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF

