Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 13 Jan 2016 14:55:42 -0500
From:      Karim Fodil-Lemelin <fodillemlinkarim@gmail.com>
To:        freebsd-ipfw@freebsd.org, freebsd-net <freebsd-net@FreeBSD.org>
Subject:   ipfw NAT, igb and hardware checksums
Message-ID:  <5696ABBE.4050709@gmail.com>

next in thread | raw e-mail | index | archive | help
Hi,

I've hit a very interesting problem with ipfw-nat and local TCP traffic 
that has enough TCP options to hit a special case in m_megapullup(). 
Here is the story:

I am using the following NIC:

igb0@pci0:4:0:0:        class=0x020000 card=0x00008086 chip=0x150e8086 
rev=0x01 hdr=0x00

And when I do ipfw nat to locally emitted packets I see packets not 
being processed in the igb driver for HW checksum. Now a quick search 
for m_pullup in the igb driver code will show that our igb driver 
expects a contiguous ethernet + ip header in igb_tx_ctx_setup(). Now the 
friendly m_megapullup() in alias.c doesn't reserve any space before the 
ip header for the ethernet header after its call to m_getcl like 
tcp_output.c (see m->m_data += max_linkhdr in tcp_output.c).

So the call to M_PREPEND() in ether_output() is forced to prepend a new 
mbuf for the ethernet header, leading to a non contiguous ether + ip. 
This in turn leads to a failure to properly read the IP protocol in the 
igb driver and apply the proper HW checksum function. Particularly this 
call in igb_tcp_ctx_setup(): ip = (struct ip *)(mp->m_data + ehdrlen);

To reproduce the issue I simply create a NAT rule for an igb interface 
and initiate a TCP connection locally going out through that interface 
(it should go through NAT obviously) something like:

ipfw nat 1 config igb0 reset
ipfw add 10 nat 1 via igb0

  Although you need to make sure you fill enough of the SYN packet to 
trigger the allocation of new memory in m_megapullup. You can do this by 
using enough TCP options so its filling up almost all of the 256 mbuf or 
make RESERVE something like 300 bytes in alias.c.

The fix I propose is very simple and faster for all drivers, including 
the ones that do perform a check for ether + ip to be contiguous upon 
accessing the IP header. If the leading space is available it doesn't 
allocate any extra space (as it should for most cases) but if for some 
reason the mbuf used doesn't have 100 bytes (RESERVE in megapullup) of 
free space it will reserve some at the front too. If the leading space 
isn't necessary then it won't cause any harm.


-Subproject commit cfe39807fe9b1a23c13f73aabde302046736fa1c
+Subproject commit cfe39807fe9b1a23c13f73aabde302046736fa1c-dirty
diff --git a/freebsd/sys/netinet/libalias/alias.c 
b/freebsd/sys/netinet/libalias/alias.c
index 876e958..dc424a6 100644
--- a/freebsd/sys/netinet/libalias/alias.c
+++ b/freebsd/sys/netinet/libalias/alias.c
@@ -1757,7 +1757,8 @@ m_megapullup(struct mbuf *m, int len) {
          * writable and has some extra space for expansion.
          * XXX: Constant 100bytes is completely empirical. */
  #define        RESERVE 100
-   if (m->m_next == NULL && M_WRITABLE(m) && M_TRAILINGSPACE(m) >= RESERVE)
+ if (m->m_next == NULL && M_WRITABLE(m) &&
+                 M_TRAILINGSPACE(m) >= RESERVE && M_LEADINGSPACE(m) >= 
max_linkhdr)
                 return (m);

         if (len <= MCLBYTES - RESERVE) {
@@ -1779,6 +1780,7 @@ m_megapullup(struct mbuf *m, int len) {
                 goto bad;

         m_move_pkthdr(mcl, m);
+ mcl->m_data += max_linkhdr;
         m_copydata(m, 0, len, mtod(mcl, caddr_t));
         mcl->m_len = mcl->m_pkthdr.len = len;
         m_freem(m);

It would be nice if some FBSD comitter could review and hopefully add 
this patch to FBSD.

Thank you,

Karim.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5696ABBE.4050709>