Date:      Fri, 13 Sep 2002 11:31:00 +0100 (BST)
From:      Dominic Froud <dominic@indigo-ic.co.uk>
To:        FreeBSD-gnats-submit@FreeBSD.org
Subject:   kern/42727: [PATCH] Wrong MTU in need-frag ICMP using IPSEC tunnels w/out GIF
Message-ID:  <200209131031.g8DAV0QI006119@the-mayor.dom>


>Number:         42727
>Category:       kern
>Synopsis:       [PATCH] Wrong MTU in need-frag ICMP using IPSEC tunnels w/out GIF
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Sep 13 03:40:02 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator:     Dominic Froud
>Release:        FreeBSD 4.6-RELEASE i386
>Organization:
>Environment:
System: FreeBSD the-mayor.dom 4.6-RELEASE FreeBSD 4.6-RELEASE #17: Wed Sep 11 17:13:53 BST 2002 root@the-mayor.dom:/usr/src/sys/compile/SERVER i386

Kernel options:
INET
INET6
IPSEC
IPSEC_ESP
IPSEC_DEBUG
MROUTING
IPFIREWALL
IPFIREWALL_FORWARD
IPDIVERT
RANDOM_IP_ID
ICMP_BANDLIM

Server has two Macronix 98715AEC-C 10/100BaseTX cards at dc0 and dc1.

net.inet.ipsec.dfbit=1

>Description:
I bridged my LAN (subnet 10.0.1.0/24) with a friend's LAN (10.0.0.0/24)
using IPSEC tunnels without GIF devices. I use FreeBSD 4.6 and he uses
Linux RedHat 7.x. My friend couldn't pull any packets from machines on
my LAN that required MTU reduction to prevent fragmentation, e.g. SMB
TCP packets. Upon further inspection, my FreeBSD server was telling the
machine on my LAN that fragmentation was needed but was suggesting an
incorrect MTU of 1500 instead of one that took the IPSEC tunnel headers
into account. This would cause the machine on my LAN to simply retry the
same over-sized packet again and again, causing the requesting machine
on his LAN to eventually time out with a short read. [Short-read
timeouts are a common symptom of other MTU issues too, but this specific
issue can be accurately diagnosed.]


There is code in netinet/ip_input.c:ip_forward() that should deal with
this, but it never gets the chance to do the calculation because an
earlier IPSEC function call returns an error. In ip_forward(), before
ip_output() is called, a rough copy of the top mbuf at 'm' is made and
pointed to by 'mcopy'. Only the IP header and up to 8 bytes of payload
are copied, but the length stored in the packet header (m_pkthdr)
remains unchanged and reflects the original packet length.


If ip_forward()'s call to ip_output() fails with EMSGSIZE and the packet
would have traversed an IPSEC tunnel, then ipsec_setspidx() in
netinet6/ipsec.c is (eventually) called. It sanity-checks the passed
mbuf and fails with an error like: "ipsec_setspidx: total of
m_len(28) != pkthdr.len(1500), ignored."


The 28 is obviously the truncated length of mcopy (IP header + max 8
bytes) and the 1500 is the size of the original packet. Hence the rest
of the reduced MTU calculation would be stopped at this point and an
unchanged MTU used to construct the ICMP frag-needed packet.
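
The mismatch described above can be reproduced in a small user-space
model. This is only a sketch: struct toy_mbuf and the function names are
hypothetical stand-ins, not the kernel's real mbuf structure or the real
ipsec_setspidx() code. It assumes the check simply compares the summed
m_len of the chain against the packet-header length, which is what the
logged error message suggests.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a kernel mbuf (not the real struct). */
struct toy_mbuf {
    struct toy_mbuf *m_next; /* next buffer in the chain, or NULL */
    int m_len;               /* bytes of data held in this buffer */
    int pkthdr_len;          /* total packet length; first buffer only */
};

/* Sum m_len over every buffer in the chain. */
int chain_len(const struct toy_mbuf *m)
{
    int total = 0;
    for (; m != NULL; m = m->m_next)
        total += m->m_len;
    return total;
}

/* Return 1 if the summed m_len matches the recorded packet length, else 0. */
int chain_is_sane(const struct toy_mbuf *m)
{
    return m != NULL && chain_len(m) == m->pkthdr_len;
}

/*
 * Model of the truncated copy: IP header plus at most 8 bytes of payload,
 * with the packet-header length left at the original packet's length.
 */
struct toy_mbuf make_truncated_copy(int ip_hlen, int orig_len)
{
    struct toy_mbuf c;
    int payload = orig_len - ip_hlen;

    c.m_next = NULL;
    c.m_len = ip_hlen + (payload < 8 ? payload : 8);
    c.pkthdr_len = orig_len; /* stale: still the original length */
    return c;
}
```

With a 20-byte IP header and a 1500-byte original packet, the copy holds
28 bytes while pkthdr_len still says 1500, so chain_is_sane() rejects it.
Setting pkthdr_len to m_len makes the same chain pass.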

>How-To-Repeat:
Bridge two subnets using IPSEC tunnels without the GIF device. If you
bridge the encapsulating machines themselves as well, you should end up
with 8 policies like the following:


81.5.133.243[any] 10.0.1.0/24[any] any
        in ipsec
        esp/tunnel/81.5.133.243-62.31.234.90/require
        spid=1 seq=7 pid=235
        refcnt=1
81.5.133.243[any] 62.31.234.90[any] any
        in ipsec
        esp/tunnel/81.5.133.243-62.31.234.90/require
        spid=3 seq=6 pid=235
        refcnt=1
10.0.0.0/24[any] 10.0.1.0/24[any] any
        in ipsec
        esp/tunnel/81.5.133.243-62.31.234.90/require
        spid=5 seq=5 pid=235
        refcnt=1
10.0.0.0/24[any] 62.31.234.90[any] any
        in ipsec
        esp/tunnel/81.5.133.243-62.31.234.90/require
        spid=7 seq=4 pid=235
        refcnt=1
10.0.1.0/24[any] 81.5.133.243[any] any
        out ipsec
        esp/tunnel/62.31.234.90-81.5.133.243/require
        spid=2 seq=3 pid=235
        refcnt=1
62.31.234.90[any] 81.5.133.243[any] any
        out ipsec
        esp/tunnel/62.31.234.90-81.5.133.243/require
        spid=4 seq=2 pid=235
        refcnt=1
10.0.1.0/24[any] 10.0.0.0/24[any] any
        out ipsec
        esp/tunnel/62.31.234.90-81.5.133.243/require
        spid=6 seq=1 pid=235
        refcnt=1
62.31.234.90[any] 10.0.0.0/24[any] any
        out ipsec
        esp/tunnel/62.31.234.90-81.5.133.243/require
        spid=8 seq=0 pid=235
        refcnt=1


I am 62.31.234.90 with protected subnet 10.0.1.0/24.
Peer is 81.5.133.243 with protected subnet 10.0.0.0/24.


I also have net.inet.ipsec.dfbit set to 1 via /etc/sysctl.conf.


I logged into the peer's server and used smbclient to request a file from
10.0.1.20 (a Windows 98 SE machine). Before each test, make sure all your
SAD entries are 'mature' and relatively fresh (i.e. not about to expire
during your test) using "setkey -D | egrep '(diff|state)'".


Use tcpdump to log data packets from, and icmp packets to, your
protected host (in my case this was 10.0.1.20). Increase IPSEC logging
using "sysctl net.key.debug=0x45". To turn these messages off, use
"sysctl net.key.debug=0".


Now try to transfer a file from your target host that is bigger than
your MTU (anything over 1500 bytes; say, 16 Kbytes).


tcpdump will produce output like:


11:44:03.378193 10.0.1.20.139 > 81.5.133.243.43396: tcp 1460 (DF) (ttl 128, id 26226, len 1500)
11:44:03.387030 10.0.1.2 > 10.0.1.20: icmp: 81.5.133.243 unreachable - need to frag (mtu 1500) (DF) (ttl 64, id 48070, len 56)
11:44:04.778191 10.0.1.20.139 > 81.5.133.243.43396: tcp 1460 (DF) (ttl 128, id 26226, len 1500)
11:44:04.787022 10.0.1.2 > 10.0.1.20: icmp: 81.5.133.243 unreachable - need to frag (mtu 1500) (DF) (ttl 64, id 48070, len 56)
(pattern repeats)


Your console should show lines like:
Sep 10 11:44:03 the-mayor /kernel: ipsec_setspidx: total of m_len(28) != pkthdr.len(1500), ignored.


The requesting host on the remote LAN will time out.

>Fix:
Simply update the packet length in mcopy->m_pkthdr.len to reflect the
truncated nature of mcopy. This can be done at line 1799 in
netinet/ip_input.c rev 1.130.2.35 for just the EMSGSIZE IPSEC case or
at line 1703 if this is of more general use within ip_forward() and
functions called by it. I've tried the following diff at both line 1703
and line 1799 and both cure the problem as expected. On my machine,
I've left the code in at line 1799 because I don't know if other code
using mcopy makes use of the original packet length.



--- patch begins here ---
--- ip_input.c  Wed Sep 11 17:55:09 2002
+++ ip_input.c-patched  Wed Sep 11 18:23:47 2002
@@ -1796,6 +1796,13 @@
                        int ipsechdr;
                        struct route *ro;


+                       /* Pretend original packet was only this long
+                        * as IPSEC functions like ipsec_setspidx(),
+                        * called by ipsec4_getpolicybyaddr() below,
+                        * expect a sane mbuf chain.
+                        */
+                       mcopy->m_pkthdr.len = mcopy->m_len;
+
                        sp = ipsec4_getpolicybyaddr(mcopy,
                                                    IPSEC_DIR_OUTBOUND,
                                                    IP_FORWARDING,
--- patch ends here ---



tcpdump output with the patched kernel looks like:


17:17:43.108193 10.0.1.20.139 > 81.5.133.243.43396: tcp 1460 (DF) (ttl 128, id 26226, len 1500)
17:17:43.108779 10.0.1.20.139 > 81.5.133.243.43396: tcp 652 (DF) (ttl 128, id 26482, len 692)
17:17:43.114394 10.0.1.2 > 10.0.1.20: icmp: 81.5.133.243 unreachable - need to frag (mtu 1443) (DF) (ttl 64, id 39851, len 56)
17:17:43.115869 10.0.1.20.139 > 81.5.133.243.43396: tcp 1403 (DF) (ttl 128, id 26738, len 1443)
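
The numbers in the patched trace are consistent with the interface MTU
being reduced by the tunnel overhead. The sketch below only redoes that
arithmetic; the function names are hypothetical, and the 57-byte overhead
is inferred from the trace (1500 - 1443), not read from the kernel. The
payload calculation assumes 20-byte IP and 20-byte TCP headers with no
options.

```c
/* MTU to advertise in the need-frag ICMP: interface MTU minus tunnel overhead. */
int reduced_mtu(int if_mtu, int ipsec_hdr_size)
{
    return if_mtu - ipsec_hdr_size;
}

/* TCP payload carried by an IP packet of ip_len bytes (20 + 20 header bytes assumed). */
int tcp_payload(int ip_len)
{
    return ip_len - 40;
}
```

reduced_mtu(1500, 57) gives the 1443 seen in the ICMP message, and
tcp_payload(1443) gives the 1403-byte segment the Windows host sends
next, compared with tcp_payload(1500) = 1460 for the original packet.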
>Release-Note:
>Audit-Trail:
>Unformatted:




