Date:      Thu, 28 Sep 2017 14:30:03 -0600
From:      Alan Somers <asomers@freebsd.org>
To:        Josh Gitlin <jgitlin@goboomtown.com>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: Help with mbuf exhaustion
Message-ID:  <CAOtMX2j7k7GLO2hm-QNJ9yef1V5WMP9SVbQs0p%2Bg7RJOabg-5w@mail.gmail.com>
In-Reply-To: <322F6F4B-1153-4ECE-B854-B2981B0CDDF2@goboomtown.com>
References:  <322F6F4B-1153-4ECE-B854-B2981B0CDDF2@goboomtown.com>

First of all, 10.3-RELEASE-p2 is very old and has known security
vulnerabilities.  Have you tried 10.3-RELEASE-p21 or even 10.4-RELEASE?

On Thu, Sep 28, 2017 at 1:30 PM, Josh Gitlin <jgitlin@goboomtown.com> wrote:
> Hi FreeBSD Gurus!
>
> We're having an issue with mbuf exhaustion on a FreeBSD server which was
> recently upgraded from 10.3-STABLE to 10.3-RELEASE-p2. In the course of
> normal operation, we see mbuf usage steadily increasing until we reach the
> kern.ipc.nmbufs limit, at which point the machine becomes unresponsive over
> the network (due to lack of mbufs for network access) and the console
> displays:
>
> cxl0: Interface stopped DISTRIBUTING, possible flapping
> cxl1: Interface stopped DISTRIBUTING, possible flapping
> [zone: mbuf] kern.ipc.nmbufs limit reached
> [zone: mbuf] kern.ipc.nmbufs limit reached
>
> The machine runs pf and acts as a packet filter, router, gateway and
> DHCP/DNS server. It has two Chelsio NICs in it, and is a CARP master with a
> secondary. The secondary has an identical hardware and software
> configuration and does not exhibit this issue.
>
> Given the downtime this causes, we set up our Nagios/Check_MK to graph the
> output of `netstat -m` and alert when the number of mbufs in use approaches
> `kern.ipc.nmbufs`, and we see a steady linear increase in mbuf usage until
> we reboot:
>
> https://i.stack.imgur.com/8bzAq.png
>
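
The `netstat -m` check described in the quoted paragraph could be sketched
roughly as below. This is only an illustration, not the poster's actual
Nagios/Check_MK plugin: the function names and the 0.8 alert threshold are
hypothetical, and the limit is taken from the LIMIT column that the quoted
`vmstat -z` output reports for the mbuf zone, on the assumption that it
matches `kern.ipc.nmbufs`.

```python
import re

# First two lines of the `netstat -m` output quoted in this thread.
SAMPLE = (
    "679270/3080/682350 mbufs in use (current/cache/total)\n"
    "10243/1657/11900/985360 mbuf clusters in use (current/cache/total/max)\n"
)


def parse_mbufs_in_use(netstat_m_output):
    """Return the current/cache/total counts from the 'mbufs in use' line."""
    match = re.search(
        r"^\s*(\d+)/(\d+)/(\d+) mbufs in use",
        netstat_m_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no 'mbufs in use' line found")
    current, cache, total = (int(group) for group in match.groups())
    return {"current": current, "cache": cache, "total": total}


def mbuf_alert(total, limit, threshold=0.8):
    """True when total mbufs reach the given fraction of the limit.

    The 0.8 default is an arbitrary, hypothetical alert threshold.
    """
    return total >= threshold * limit


counts = parse_mbufs_in_use(SAMPLE)
# 1587540 is the mbuf zone LIMIT from the quoted `vmstat -z` output
# (assumed here to equal kern.ipc.nmbufs).
print(counts, mbuf_alert(counts["total"], 1587540))
```

With the quoted numbers the total is roughly 43% of the limit, so a check
like this would not have fired yet at the time the sample was taken.
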
> mbuf *clusters* in use does not change when this happens and increasing
> mbuf cluster limits has no effect:
>
> https://i.stack.imgur.com/7OzdN.png
>
> This appears to me to be a kernel bug of some sort. I'm looking for advice
> on further troubleshooting or assistance in resolving this!
>
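
Since the quoted report describes the growth as steady and linear, the time
left before the limit is reached can be estimated with a least-squares line
through the monitoring samples. The sketch below is one way to do that; the
function name is hypothetical and the sample points in the usage lines are
made up for illustration, not taken from the graphs in this thread.

```python
def seconds_until_limit(samples, limit):
    """Estimate seconds from the last sample until usage reaches `limit`.

    `samples` is a list of (unix_time_seconds, mbufs_in_use) pairs.
    Returns None when the fitted trend is flat or decreasing.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    # Ordinary least-squares slope (mbufs per second) and intercept.
    slope = sum((t - mean_t) * (y - mean_y) for t, y in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_t
    time_at_limit = (limit - intercept) / slope
    return time_at_limit - samples[-1][0]


# Hypothetical samples: usage grows by 10 mbufs per second.
example = [(0, 100), (10, 200), (20, 300)]
print(seconds_until_limit(example, 1000))
```

A projection like this only holds while the leak rate stays constant, so it
is better suited to scheduling a controlled failover to the CARP secondary
than to diagnosing the leak itself.
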
> Helpful (maybe) information:
>
> netstat -m:
>
> 679270/3080/682350 mbufs in use (current/cache/total)
> 10243/1657/11900/985360 mbuf clusters in use (current/cache/total/max)
> 10243/1648 mbuf+clusters out of packet secondary zone in use (current/cache)
> 8128/482/8610/124025 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/36748 9k jumbo clusters in use (current/cache/total/max)
> 128/0/128/20670 16k jumbo clusters in use (current/cache/total/max)
> 224863K/6012K/230875K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
>
> vmstat -z|grep -E '^ITEM|mbuf':
>
> ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
> mbuf_packet:            256, 1587540,   10239,    1652,84058893,   0,   0
> mbuf:                   256, 1587540,  671533,    1206,914478880,   0,   0
> mbuf_cluster:          2048, 985360,   11891,       9,   11891,   0,   0
> mbuf_jumbo_page:       4096, 124025,    8128,     512,15011847,   0,   0
> mbuf_jumbo_9k:         9216,  36748,       0,       0,       0,   0,   0
> mbuf_jumbo_16k:       16384,  20670,     128,       0,     128,   0,   0
> mbuf_ext_refcnt:          4,      0,       0,       0,       0,   0,   0
>
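
To compare each zone's USED count against its LIMIT, the `vmstat -z` table
quoted above can be parsed along these lines. This is a sketch that assumes
the seven-column layout shown in this thread (SIZE, LIMIT, USED, FREE, REQ,
FAIL, SLEEP); the function names are hypothetical.

```python
# Rows copied from the `vmstat -z` output quoted in this thread.
SAMPLE = """\
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
mbuf_packet:            256, 1587540,   10239,    1652,84058893,   0,   0
mbuf:                   256, 1587540,  671533,    1206,914478880,   0,   0
mbuf_cluster:          2048, 985360,   11891,       9,   11891,   0,   0
mbuf_ext_refcnt:          4,      0,       0,       0,       0,   0,   0
"""

COLUMNS = ("size", "limit", "used", "free", "req", "fail", "sleep")


def parse_vmstat_z(output):
    """Map each zone name to a dict of its integer columns."""
    zones = {}
    for line in output.splitlines():
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue  # header or blank line: no "name:" prefix
        fields = [field.strip() for field in rest.split(",")]
        if len(fields) < len(COLUMNS):
            continue
        fields = fields[: len(COLUMNS)]
        if not all(field.isdigit() for field in fields):
            continue
        zones[name.strip()] = dict(zip(COLUMNS, map(int, fields)))
    return zones


def utilization(zone):
    """USED / LIMIT, or None when the zone has no limit (LIMIT of 0)."""
    if zone["limit"] == 0:
        return None
    return zone["used"] / zone["limit"]


zones = parse_vmstat_z(SAMPLE)
for name, zone in zones.items():
    print(name, utilization(zone))
```

In the quoted sample the mbuf zone sits at a little over 42% of its limit,
while mbuf_cluster is barely used, which matches the observation that only
plain mbufs are growing.
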
> vmstat -m:
>
>          Type InUse MemUse HighUse Requests  Size(s)
>  NFSD lckfile     1     1K       -        1  256
>      filedesc   103   383K       -  1134731  16,32,128,2048,4096,8192,16384,65536
>         sigio     1     1K       -        1  64
>      filecaps     0     0K       -      973  64
>       kdtrace   292    59K       -  1099386  64,256
>          kenv   121    13K       -      125  16,32,64,128,8192
>        kqueue    14    22K       -     5374  256,2048,8192
>     proc-args    54     5K       -   578448  16,32,64,128,256
>         hhook     2     1K       -        2  256
>       ithread   146    24K       -      146  32,128,256
>        KTRACE   100    13K       -      100  128
>        NFS fh     1     1K       -      584  32
>        linker   207  1052K       -      234  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
>         lockf    29     3K       -    20042  64,128
>    loginclass     2     1K       -     1192  64
>        devbuf 17205 36362K       -    17523  16,32,64,128,256,512,1024,2048,4096,8192,65536
>          temp   149    51K       -  1280113  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
>        ip6opt     5     2K       -        6  256
>        ip6ndp    27     2K       -       27  64,128
>        module   230    29K       -      230  128
>      mtx_pool     2    16K       -        2  8192
>           osd     3     1K       -        5  16,32,64
>      pmchooks     1     1K       -        1  128
>          pgrp    30     4K       -     2222  128
>       session    29     4K       -     2187  128
>          proc     2    32K       -        2  16384
>       subproc   211   368K       -  1099014  512,4096
>          cred   204    32K       -  6025704  64,256
>        plimit    19     5K       -     3985  256
>       uidinfo     9     5K       -    11892  128,4096
>  NFSD session     1     1K       -        1  1024
>        sysctl     0     0K       -    63851  16,32,64
>     sysctloid  7196   365K       -     7369  16,32,64,128
>     sysctltmp     0     0K       -    17834  16,32,64,128
>       tidhash     1    32K       -        1  32768
>       callout     5  2184K       -        5
>          umtx   522    66K       -      522  128
>      p1003.1b     1     1K       -        1  16
>          SWAP     2   549K       -        2  64
>           bus   802    86K       -     6536  16,32,64,128,256,1024
>        bus-sc    57  1671K       -     2431  16,32,64,128,256,512,1024,2048,4096,8192,16384,65536
>     newnfsmnt     1     1K       -        1  1024
>       devstat     8    17K       -        8  32,4096
>  eventhandler   116    10K       -      116  64,128
>          kobj   124   496K       -      296  4096
>      acpiintr     1     1K       -        1  64
>       Per-cpu     1     1K       -        1  32
>        acpica 14355  1420K       -   216546  16,32,64,128,256,512,1024,2048,4096
>      pci_link    16     2K       -       16  64,128
>     pfs_nodes    21     6K       -       21  256
>          rman   316    37K       -      716  16,32,128
>          sbuf     1     1K       -    41375  16,32,64,128,256,512,1024,2048,4096,8192,16384
>        sglist     8     8K       -        8  1024
>          GEOM    88    15K       -     1871  16,32,64,128,256,512,1024,2048,8192,16384
>       acpipwr     5     1K       -        5  64
>     taskqueue    43     7K       -       43  16,32,256
>        Unitno    22     2K       -  1208250  32,64
>          vmem     3   144K       -        6  1024,4096,8192
>      ioctlops     0     0K       -   185700  256,512,1024,2048,4096
>        select    89    12K       -       89  128
>           iov     0     0K       - 19808992  16,64,128,256,512,1024
>           msg     4    30K       -        4  2048,4096,8192,16384
>           sem     4   106K       -        4  2048,4096
>           shm     1    32K       -        1  32768
>           tty    20    20K       -      499  1024
>           pts     1     1K       -      480  256
>          accf     2     1K       -        2  64
>      mbuf_tag     0     0K       - 291472282  32,64,128
>         shmfd     1     8K       -        1  8192
>        soname    32     4K       -  1210442  16,32,128
>           pcb    36   663K       -    76872  16,32,64,128,1024,2048,8192
>       CAM CCB     0     0K       -   182128  2048
>           acl     0     0K       -        2  4096
>      vfscache     1  2048K       -        1
>    cl_savebuf     0     0K       -      480  64
>      vfs_hash     1  1024K       -        1
>        vnodes     1     1K       -        1  256
>       entropy  1026    65K       -    49107  32,64,4096
>         mount    64     3K       -      140  16,32,64,128,256
>   vnodemarker     0     0K       -     4212  512
>           BPF   112 20504K       -      131  16,64,128,512,4096
>      CAM path    11     1K       -       63  32
>         ifnet    29    57K       -       30  128,256,2048
>        ifaddr   315   105K       -      315  32,64,128,256,512,2048,4096
>   ether_multi   232    13K       -      282  16,32,64
>         clone    10     2K       -       10  128
>        arpcom    23     1K       -       23  16
>           gif     4     1K       -        4  32,256
>       lltable   155    53K       -      551  256,512
>          UART     6     5K       -        6  16,1024
>          vlan    56     5K       -       74  64,128
>      acpitask     1    16K       -        1  16384
>       acpisem   110    14K       -      110  128
>     raid_data     0     0K       -      108  32,128,256
>      routetbl   516   136K       -   101735  32,64,128,256,512
>          igmp    28     7K       -       28  256
>          CARP    76    30K       -       83  16,32,64,128,256,512,1024
>          ipid     2    24K       -        2  8192,16384
>    in_mfilter   112   112K       -      112  1024
>      in_multi    43    11K       -       43  256
>   ip_moptions   224    35K       -      224  64,256
>    CAM periph     7     2K       -       19  16,32,64,128,256
>       acpidev   128     8K       -      128  64
>     CAM queue    15     5K       -       39  16,32,512
> encap_export_host     4     4K       -        4  1024
>     sctp_a_it     0     0K       -       36  16
>      sctp_vrf     1     1K       -        1  64
>      sctp_ifa   115    15K       -      204  128
>      sctp_ifn    21     3K       -       23  128
>     sctp_iter     0     0K       -       36  256
>     hostcache     1    32K       -        1  32768
>      syncache     1    64K       -        1  65536
>   in6_mfilter     1     1K       -        1  1024
>     in6_multi    15     2K       -       15  32,256
>  ip6_moptions     2     1K       -        2  32,256
> CAM dev queue     6     1K       -        6  64
>        kbdmux     6    22K       -        6  16,512,1024,2048,16384
>           mld    26     4K       -       26  128
>           LED    20     2K       -       20  16,128
>   inpcbpolicy   365    12K       -   119277  32
>      secasvar     7     2K       -      214  256
>        sahead    10     3K       -       10  256
>   ipsecpolicy   748   187K       -   241562  256
>  ipsecrequest    18     3K       -       72  128
>    ipsec-misc    56     2K       -     1712  16,32,64
>     ipsec-saq     0     0K       -       24  128
>     ipsec-reg     3     1K       -        3  32
>        pfsync     2     2K       -      893  32,256,1024
>       pf_temp     0     0K       -       78  128
>       pf_hash     3  2880K       -        3
>      pf_ifnet    36    11K       -     9510  256,2048
>        pf_tag     7     1K       -        7  128
>       pf_altq     5     2K       -      125  256
>       pf_rule   964   904K       -    17500  128,1024
>       pf_osfp  1130   115K       -    28250  64,128
>      pf_table    49    98K       -      948  2048
>        crypto    37    11K       -     1072  64,128,256,512,1024
>         xform     7     1K       -  1530156  16,32,64,128,256
>           rpc    12    20K       -      304  64,128,512,1024,8192
> audit_evclass   187     6K       -      231  32
>   ufs_dirhash    93    18K       -       93  16,32,64,128,256,512
>     ufs_quota     1  1024K       -        1
>     ufs_mount     3    13K       -        3  512,4096,8192
>     vm_pgdata     2   513K       -        2  128
>       UMAHash     5     6K       -       10  512,1024,2048
>       CAM SIM     6     2K       -        6  256
>       CAM XPT    30     3K       -     1850  16,32,64,128,256,512,1024,2048,65536
>       CAM DEV     9    18K       -       16  2048
>   fpukern_ctx     3     6K       -        3  2048
>       memdesc     1     4K       -        1  4096
>           USB    23    33K       -       24  16,128,256,512,1024,2048,4096
>        DEVFS3   136    34K       -     2027  256
>        DEVFS1   108    54K       -      594  512
>        apmdev     1     1K       -        1  128
>    madt_table     0     0K       -        1  4096
>    DEVFS_RULE    55    26K       -       55  64,512
>         DEVFS    12     1K       -       13  16,128
>        DEVFSP    22     2K       -      167  64
>       io_apic     1     2K       -        1  2048
>        isadev     8     1K       -        8  128
>           MCA    15     2K       -       15  32,128
>           msi    30     4K       -       30  128
>      nexusdev     5     1K       -        5  16
>        USBdev    21     8K       -       21  32,64,128,256,512,1024,4096
> NFSD V4client     1     1K       -        1  256
>          cdev     5     2K       -        5  256
>         cxgbe    41   956K       -       44  128,256,512,1024,2048,4096,8192,16384
>          ipmi     0     0K       -    20155  128,2048
>     htcp data   127     4K       -    13675  32
>    aesni_data     3     3K       -        3  1024
>       solaris   142 12302K       -     3189  16,32,64,128,512,1024,8192
>    kstat_data     6     1K       -        6  64
>
> TCP States:
>
> https://i.stack.imgur.com/G7850.png
>
>
> --
>  <http://www.goboomtown.com/>
> Josh Gitlin
> Senior Full Stack Developer
> (415) 690-1610 x155
>
> Stay up to date and join the conversation in Relay <http://relay.goboomtown.com/>.
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"


