Date:      Thu, 20 Mar 2014 16:21:56 +0100
From:      Markus Gebert <markus.gebert@hostpoint.ch>
To:        wollman@bimajority.org
Cc:        jfv@freebsd.org, freebsd-net@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: Network stack returning EFBIG?
Message-ID:  <B0A8C00F-67C7-40B2-94DE-9449574FF63F@hostpoint.ch>
In-Reply-To: <201403201351.s2KDpghe080116@hergotha.csail.mit.edu>
References:  <201403201351.s2KDpghe080116@hergotha.csail.mit.edu>

On 20.03.2014, at 14:51, wollman@bimajority.org wrote:

> In article <21290.60558.750106.630804@hergotha.csail.mit.edu>, I wrote:
>
>> Since we put this server into production, random network system calls
>> have started failing with [EFBIG] or maybe sometimes [EIO].  I've
>> observed this with a simple ping, but various daemons also log the
>> errors:
>> Mar 20 09:22:04 nfs-prod-4 sshd[42487]: fatal: Write failed: File too
>> large [preauth]
>> Mar 20 09:23:44 nfs-prod-4 nrpe[42492]: Error: Could not complete SSL
>> handshake. 5
>
> I found at least one call stack where this happens and it does get
> returned all the way to userspace:
>
> 17  15547   _bus_dmamap_load_buffer:return
>              kernel`_bus_dmamap_load_mbuf_sg+0x5f
>              kernel`bus_dmamap_load_mbuf_sg+0x38
>              kernel`ixgbe_xmit+0xcf
>              kernel`ixgbe_mq_start_locked+0x94
>              kernel`ixgbe_mq_start+0x12a
>              if_lagg.ko`lagg_transmit+0xc4
>              kernel`ether_output_frame+0x33
>              kernel`ether_output+0x4fe
>              kernel`ip_output+0xd74
>              kernel`tcp_output+0xfea
>              kernel`tcp_usr_send+0x325
>              kernel`sosend_generic+0x3f6
>              kernel`soo_write+0x5e
>              kernel`dofilewrite+0x85
>              kernel`kern_writev+0x6c
>              kernel`sys_write+0x64
>              kernel`amd64_syscall+0x5ea
>              kernel`0xffffffff808443c7

This looks pretty similar to what we've seen when we got EFBIG:

 3  28502   _bus_dmamap_load_buffer:return
              kernel`_bus_dmamap_load_mbuf_sg+0x5f
              kernel`bus_dmamap_load_mbuf_sg+0x38
              kernel`ixgbe_xmit+0xcf
              kernel`ixgbe_mq_start_locked+0x94
              kernel`ixgbe_mq_start+0x12a
              kernel`ether_output_frame+0x33
              kernel`ether_output+0x4fe
              kernel`ip_output+0xd74
              kernel`rip_output+0x229
              kernel`sosend_generic+0x3f6
              kernel`kern_sendit+0x1a3
              kernel`sendit+0xdc
              kernel`sys_sendto+0x4d
              kernel`amd64_syscall+0x5ea
              kernel`0xffffffff80d35667

In our case it looks like some of the ixgbe tx queues get stuck and
some don't. You can test whether your server shows the same symptoms
with this command:

# for CPU in {0..7}; do echo "CPU${CPU}"; cpuset -l ${CPU} ping -i 0.5 -c 2 -W 1 10.0.0.1 | grep sendto; done
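
Note that {0..7} is brace expansion as found in bash or zsh. In a plain
/bin/sh, the same per-CPU check could be sketched roughly as follows.
This is only a sketch under assumptions: the function names probe_cpu
and check_queues are made up for illustration, the CPU count is passed
in by the caller, and the 2>&1 assumes that ping prints the sendto
error on stderr.

```shell
#!/bin/sh
# Sketch: ping a target once per CPU and report which CPUs hit a sendto error.
# Assumes FreeBSD cpuset(1) and ping(8); probe_cpu and check_queues are
# hypothetical helper names, not part of any tool.

# Run one ping pinned to CPU $1 against target $2, merging stderr into stdout
# so that a "sendto" error message can be matched by grep.
probe_cpu() {
    cpuset -l "$1" ping -i 0.5 -c 2 -W 1 "$2" 2>&1
}

# Loop over CPUs 0 .. ncpu-1 and print one status line per CPU.
check_queues() {
    target=$1
    ncpu=$2
    cpu=0
    while [ "$cpu" -lt "$ncpu" ]; do
        if probe_cpu "$cpu" "$target" | grep -q sendto; then
            echo "CPU${cpu}: sendto error"
        else
            echo "CPU${cpu}: ok"
        fi
        cpu=$((cpu + 1))
    done
}

# Example (as root): check_queues 10.0.0.1 "$(sysctl -n hw.ncpu)"
```

If some CPUs consistently report a sendto error while others are ok,
that would match the stuck tx queue pattern described above.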

We also use 82599EB based ixgbe controllers on affected systems.

Also see these two threads on freebsd-net:

http://lists.freebsd.org/pipermail/freebsd-net/2014-February/037967.html
http://lists.freebsd.org/pipermail/freebsd-net/2014-March/038061.html

I started the second one, and it has some more details of what we
were seeing, in case you're interested.

Then there is:

http://www.freebsd.org/cgi/query-pr.cgi?pr=183390
and:
https://bugs.freenas.org/issues/4560


Markus


