Date:      Tue, 07 Mar 2017 09:08:24 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 217606] Bridge stops working after some days
Message-ID:  <bug-217606-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217606

            Bug ID: 217606
           Summary: Bridge stops working after some days
           Product: Base System
           Version: 11.0-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: aiko@torrentkino.de

Hello,

we recently upgraded our bridging firewalls from 10.1-RELEASE-pxx to
11.0-RELEASE-p8. Since then they have stopped passing traffic after some
time, in this case after ~4 days. One of them stopped yesterday evening.
(We have a failover mechanism to reduce the impact.)

$ uptime
9:26AM  up 4 days, 19:22, 2 users, load averages: 0.12, 0.06, 0.01

bridge0 consists of ix0/ix1:

ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port
0xecc0-0xecdf mem 0xd9e80000-0xd9efffff,0xd9ff8000-0xd9ffbfff irq 48 at device
0.0 numa-domain 0 on pci2
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port
0xece0-0xecff mem 0xd9f00000-0xd9f7ffff,0xd9ffc000-0xd9ffffff irq 52 at device
0.1 numa-domain 0 on pci2

When the error occurs, I see the following for IPv4. The bridge carries IPv6
as well, with the same problem.

ix0: A load balancer is asking for its default GW, but no reply comes back:

$ tcpdump -i ix0 \( arp \)
09:37:47.330361 ARP, Request who-has A.A.A.A tell B.B.B.B, length 46

ix1: The default GW actually sends a reply; I can see it on ix1:

$ tcpdump -i ix1 \( arp \)
09:38:59.328956 ARP, Request who-has A.A.A.A tell B.B.B.B, length 46
09:38:59.329374 ARP, Reply A.A.A.A is-at 00:00:0a:0b:0c:0d (oui Cisco), length 46

A tcpdump on bridge0 shows the same as ix1.
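
Next time it fails I also want to check whether bridge0 still learns the
GW's MAC; the learned address cache can be dumped like this (I would expect
00:00:0a:0b:0c:0d to show up behind ix1, but that is my assumption):

# list the bridge's learned MAC addresses and which member they sit behind
$ ifconfig bridge0 addr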

Some numbers from the system that is currently not working:

$ netstat -m
82409/6901/89310 mbufs in use (current/cache/total)
38692/4094/42786/1015426 mbuf clusters in use (current/cache/total/max)
38692/4065 mbuf+clusters out of packet secondary zone in use (current/cache)
0/192/192/507713 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/150433 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84618 16k jumbo clusters in use (current/cache/total/max)
97986K/10681K/108667K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
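
Nothing is denied or delayed there, so to see whether mbuf usage creeps up
towards the failure I will log it periodically; a minimal sketch (the log
path is arbitrary):

# append a timestamped mbuf summary once a minute
$ while :; do { date; netstat -m | head -3; } >> /var/log/mbuf.log; sleep 60; done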

$ netstat -b -d -h -i bridge0
Name    Mtu Network       Address              Ipkts Ierrs Idrop  Ibytes  Opkts Oerrs  Obytes  Coll  Drop
ix0    1.5K <Link#1>      00:00:00:00:00:0a      12G     0     0     11T   7.9G     0    1.1T     0  335k
ix1    1.5K <Link#2>      00:00:00:00:00:0b     7.9G     0     0    1.2T    12G     0     11T     0     0
bridg  1.5K <Link#8>      00:00:00:00:00:0c      20G     0     0     12T    20G  335k     12T     0     0

What I have tried so far:

# Disable Ethernet Flow-Control
# https://wiki.freebsd.org/10gFreeBSD/Router
dev.ix.0.fc=0
dev.ix.1.fc=0

# Disable TSO
cloned_interfaces="bridge0"
ifconfig_bridge0="addm ix0 addm ix1 up"
ifconfig_ix0="up -tso"
ifconfig_ix1="up -tso"
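
The fc values are sysctls, so the same setting can also be applied to the
running system without a reboot:

# apply the flow-control setting immediately
$ sysctl dev.ix.0.fc=0
$ sysctl dev.ix.1.fc=0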

I found the following bug reports:
2004: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=185633
2016: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212749

And since this system uses PF and scrubbing, I applied this patch manually:
https://reviews.freebsd.org/D7780
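
Roughly like this (a sketch: the source tree is assumed under /usr/src, and
the -p level depends on how the diff was generated):

# fetch the raw diff from Phabricator and apply it to the source tree
$ fetch -o D7780.diff 'https://reviews.freebsd.org/D7780?download=true'
$ cd /usr/src && patch -p0 < D7780.diff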

But I have had no success so far.

Shutting down ix0/ix1 and bringing them back up makes bridge0 responsive
again, but time is working against me. The recovery amounts to something
like the following sketch:
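
# cycle both members; bridge0 starts forwarding again afterwards
$ ifconfig ix0 down && ifconfig ix1 down
$ ifconfig ix0 up && ifconfig ix1 up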

Netstat after that procedure:

$ netstat -m
33281/56284/89565 mbufs in use (current/cache/total)
33280/9756/43036/2015426 mbuf clusters in use (current/cache/total/max)
33280/9730 mbuf+clusters out of packet secondary zone in use (current/cache)
0/192/192/507713 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/150433 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84618 16k jumbo clusters in use (current/cache/total/max)
74880K/34351K/109231K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed

Kind regards,
Aiko
