Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 25 Jan 2016 23:15:21 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 206624] igb(4) fails at fragmented mbufs
Message-ID:  <bug-206624-8@https.bugs.freebsd.org/bugzilla/>

next in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D206624

            Bug ID: 206624
           Summary: igb(4) fails at fragmented mbufs
           Product: Base System
           Version: 11.0-CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: glebius@FreeBSD.org

Hi!

There are at least two evindences that igb(4) fails to transmit fragmented
mbufs. Of course, fragmented mbufs degrade performance and their appearance=
 in
the stack should be avoided, but still drivers must be capable to transmit
packets, instead of failing them silently.

1) First evidence came in with some restructuring to the ARP code. It bite =
me,
and I didn't have time to investigate, quickly fixed on the ARP side.

------------------------------------------------------------------------
r288991 | glebius | 2015-10-07 06:10:26 -0700 (=D1=81=D1=80, 07 =D0=BE=D0=
=BA=D1=82. 2015) | 10 lines

Fix regression from r287779, that bite me. If we call m_pullup()
unconditionally, we end up with an mbuf chain of two mbufs, which
later in in_arpreply() is rewritten from ARP request to ARP reply
and is sent out. Looks like igb(4) (at least mine, and at least
at my network) fails on such mbuf chain, so ARP reply doesn't go
out wire. Thus, make the m_pullup() call conditional, as it is
everywhere. Of course, the bug in igb(?) should be investigated,
but better first fix the head. And unconditional m_pullup() was
suboptimal, anyway.

------------------------------------------------------------------------

2) Second evidence involves libalias(3). Let me quote Karim:

"I've hit a very interesting problem with ipfw-nat and local TCP traffic th=
at
has enough TCP options to hit a special case in m_megapullup(). Here is the
story:

I am using the following NIC:

igb0@pci0:4:0:0:        class=3D0x020000 card=3D0x00008086 chip=3D0x150e808=
6 rev=3D0x01
hdr=3D0x00

And when I do ipfw nat to locally emitted packets I see packets not being
processed in the igb driver for HW checksum. Now a quick search for m_pullu=
p in
the igb driver code will show that our igb driver expects a contiguous ethe=
rnet
+ ip header in igb_tx_ctx_setup(). Now the friendly m_megapullup() in alias=
.c
doesn't reserve any space before the ip header for the ethernet header after
its call to m_getcl like tcp_output.c (see m->m_data +=3D max_linkhdr in
tcp_output.c).

So the call to M_PREPEND() in ether_output() is forced to prepend a new mbuf
for the ethernet header, leading to a non contiguous ether + ip. This in tu=
rn
leads to a failure to properly read the IP protocol in the igb driver and a=
pply
the proper HW checksum function. Particularly this call in igb_tcp_ctx_setu=
p():
ip =3D (struct ip *)(mp->m_data + ehdrlen);

To reproduce the issue I simply create a NAT rule for an igb interface and
initiate a TCP connection locally going out through that interface (it shou=
ld
go through NAT obviously) something like:

ipfw nat 1 config igb0 reset
ipfw add 10 nat 1 via igb0

 Although you need to make sure you fill enough of the SYN packet to trigger
the allocation of new memory in m_megapullup. You can do this by using enou=
gh
TCP options so its filling up almost all of the 256 mbuf or make RESERVE
something like 300 bytes in alias.c.

The fix I propose is very simple and faster for all drivers, including the =
ones
that do perform a check for ether + ip to be contiguous upon accessing the =
IP
header. If the leading space is available it doesn't allocate any extra spa=
ce
(as it should for most cases) but if for some reason the mbuf used doesn't =
have
100 bytes (RESERVE in megapullup) of free space it will reserve some at the
front too. If the leading space isn't necessary then it won't cause any har=
m.


-Subproject commit cfe39807fe9b1a23c13f73aabde302046736fa1c
+Subproject commit cfe39807fe9b1a23c13f73aabde302046736fa1c-dirty
diff --git a/freebsd/sys/netinet/libalias/alias.c
b/freebsd/sys/netinet/libalias/alias.c
index 876e958..dc424a6 100644
--- a/freebsd/sys/netinet/libalias/alias.c
+++ b/freebsd/sys/netinet/libalias/alias.c
@@ -1757,7 +1757,8 @@ m_megapullup(struct mbuf *m, int len) {
         * writable and has some extra space for expansion.
         * XXX: Constant 100bytes is completely empirical. */
 #define        RESERVE 100
-   if (m->m_next =3D=3D NULL && M_WRITABLE(m) && M_TRAILINGSPACE(m) >=3D R=
ESERVE)
+ if (m->m_next =3D=3D NULL && M_WRITABLE(m) &&
+                 M_TRAILINGSPACE(m) >=3D RESERVE && M_LEADINGSPACE(m) >=3D
max_linkhdr)
                return (m);

        if (len <=3D MCLBYTES - RESERVE) {
@@ -1779,6 +1780,7 @@ m_megapullup(struct mbuf *m, int len) {
                goto bad;

        m_move_pkthdr(mcl, m);
+ mcl->m_data +=3D max_linkhdr;
        m_copydata(m, 0, len, mtod(mcl, caddr_t));
        mcl->m_len =3D mcl->m_pkthdr.len =3D len;
        m_freem(m);

It would be nice if some FBSD comitter could review and hopefully add this
patch to FBSD."

--=20
You are receiving this mail because:
You are the assignee for the bug.=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-206624-8>