Date:      Wed, 14 Aug 2002 10:03:36 -0700
From:      Steve Francis <steve@expertcity.com>
To:        freebsd-hackers@freebsd.org, freebsd-net@freebsd.org
Subject:   pmtu-d broken
Message-ID:  <3D5A8D68.AE7B43A5@expertcity.com>

I find this hard to believe, but it seems PMTU-D is broken up to and
including 4.6.1-RELEASE-p10 (the latest I've tried; I also tried 4.4).

When FreeBSD sends a too-large packet and receives an ICMP
"fragmentation required but DF set" message, it does not honor it for
the packet that caused the ICMP. It does correctly put the new MTU in
its cloned route entry, and it does correctly send future packets in
segments no larger than that MTU, but it keeps retransmitting the packet
that caused the ICMP at the original, too-big size, so that packet never
gets through and just keeps generating more ICMPs.
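
For illustration only, here is a minimal Python sketch of the behaviour I
would expect. It is not taken from the FreeBSD source. The function names
and the 12-byte option length (nop,nop,timestamp, as seen in the traces
below) are assumptions made for the example. Once the path MTU drops, the
data still outstanding is cut into smaller segments before it is resent:

```python
IP_HEADER = 20   # bytes, IPv4 header without options
TCP_HEADER = 20  # bytes, TCP header without options


def payload_per_segment(path_mtu, tcp_options_len=12):
    """TCP payload that fits in one packet of size path_mtu."""
    return path_mtu - IP_HEADER - TCP_HEADER - tcp_options_len


def resegment(unacked_bytes, start_seq, path_mtu, tcp_options_len=12):
    """Split outstanding data into (seq, next_seq) ranges that fit path_mtu.

    Hypothetical helper: a sender that honors "fragmentation needed"
    would retransmit these ranges instead of the original large segment.
    """
    size = payload_per_segment(path_mtu, tcp_options_len)
    segments = []
    seq = start_seq
    end = start_seq + unacked_bytes
    while seq < end:
        nxt = min(seq + size, end)
        segments.append((seq, nxt))
        seq = nxt
    return segments


# 4344 bytes outstanding, starting at relative sequence number 1.
before = resegment(4344, 1, 1500)  # what was sent originally
after = resegment(4344, 1, 1420)   # what should be resent after the ICMP
```

With a 1500-byte MTU this gives three 1448-byte segments; with 1420 it
gives 1368-byte segments plus a short tail, so the first retransmission
would be 1:1369 rather than 1:1449.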

tcpdump examples:
First, note that there is no specific route entry for 10.4.0.80:
dell350-12# netstat -anlr | grep 10.4
10.4.1.55          63.251.224.129     UGHW        1      990   1500  fxp0
10.4.1.58          63.251.224.129     UGHW        7   478339   1420  fxp0
10.4.1.233         63.251.224.129     UGHW3       0     2735   1420  fxp0
dell350-12#
From 10.4.0.80, which is on the other side of a VPN tunnel with an MTU
of 1420 bytes, I run wget against the FreeBSD box. Note that despite the
ICMP messages telling it fragmentation is required, the FreeBSD box
keeps sending 1500-byte packets with DF set:
dell350-12# tcpdump -vvi fxp0 host wonko.corp or host 10.16.5.8
tcpdump: listening on fxp0
09:40:25.938609 10.4.0.80.2793 > dell350-12.snv.http: S [tcp sum ok] 3671603378:3671603378(0) win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp 1076950329 0> (DF) (ttl 61, id 35804, len 60)
09:40:25.938665 dell350-12.snv.http > 10.4.0.80.2793: S [tcp sum ok] 3749220980:3749220980(0) ack 3671603379 win 17376 <mss 1460,nop,wscale 0,nop,nop,timestamp 102689414 1076950329> (DF) (ttl 64, id 43056, len 60)
09:40:25.960106 10.4.0.80.2793 > dell350-12.snv.http: . [tcp sum ok] 1:1(0) ack 1 win 17376 <nop,nop,timestamp 1076950332 102689414> (DF) (ttl 61, id 35806, len 52)
09:40:25.961626 10.4.0.80.2793 > dell350-12.snv.http: P 1:147(146) ack 1 win 17376 <nop,nop,timestamp 1076950332 102689414> (DF) (ttl 61, id 35823, len 198)
09:40:25.961647 dell350-12.snv.http > 10.4.0.80.2793: . [tcp sum ok] 1:1(0) ack 147 win 17230 <nop,nop,timestamp 102689416 1076950332> (DF) (ttl 64, id 43078, len 52)
09:40:25.962318 dell350-12.snv.http > 10.4.0.80.2793: . 1:1449(1448) ack 147 win 17376 <nop,nop,timestamp 102689416 1076950332> (DF) (ttl 64, id 43079, len 1500)
09:40:25.962337 dell350-12.snv.http > 10.4.0.80.2793: . 1449:2897(1448) ack 147 win 17376 <nop,nop,timestamp 102689416 1076950332> (DF) (ttl 64, id 43080, len 1500)
09:40:25.962352 dell350-12.snv.http > 10.4.0.80.2793: . 2897:4345(1448) ack 147 win 17376 <nop,nop,timestamp 102689416 1076950332> (DF) (ttl 64, id 43081, len 1500)
09:40:25.963573 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 43079, len 1500) (ttl 254, id 16874, len 56)
09:40:25.963696 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 43080, len 1500) (ttl 254, id 16875, len 56)
09:40:25.963826 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 43081, len 1500) (ttl 254, id 16876, len 56)
09:40:26.953112 dell350-12.snv.http > 10.4.0.80.2793: . 1:1449(1448) ack 147 win 17376 <nop,nop,timestamp 102689516 1076950332> (DF) (ttl 64, id 43456, len 1500)
09:40:26.954116 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 43456, len 1500) (ttl 254, id 17435, len 56)
09:40:28.953114 dell350-12.snv.http > 10.4.0.80.2793: . 1:1449(1448) ack 147 win 17376 <nop,nop,timestamp 102689716 1076950332> (DF) (ttl 64, id 44025, len 1500)
09:40:28.954114 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 44025, len 1500) (ttl 254, id 18997, len 56)
09:40:32.953061 dell350-12.snv.http > 10.4.0.80.2793: . 1:1449(1448) ack 147 win 17376 <nop,nop,timestamp 102690116 1076950332> (DF) (ttl 64, id 45089, len 1500)
09:40:32.954454 10.16.5.8 > dell350-12.snv: icmp: 10.4.0.80 unreachable - need to frag (mtu 1420) for dell350-12.snv.http > 10.4.0.80.2793: [|tcp] (DF) (ttl 62, id 45089, len 1500) (ttl 254, id 22545, len 56)
^C
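
The spacing of the retransmissions of segment 1:1449 can be checked with
a few lines of Python. The timestamps are copied from the trace above;
the helper name is made up for this example. The gaps come out at
roughly 1, 2 and 4 seconds, which looks like ordinary retransmit timer
backoff rather than any reaction to the ICMPs:

```python
from datetime import datetime

# Send times of segment 1:1449 (len 1500), taken from the trace above.
SENDS = [
    "09:40:25.962318",  # original transmission
    "09:40:26.953112",  # first retransmission
    "09:40:28.953114",  # second retransmission
    "09:40:32.953061",  # third retransmission
]


def gaps(timestamps):
    """Seconds between consecutive tcpdump timestamps on the same day."""
    parsed = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


intervals = gaps(SENDS)
rounded = [round(g) for g in intervals]
print(rounded)
```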

However, we do see that it correctly updated the MTU:
dell350-12# netstat -anlr | grep 10.4
10.4.0.80          63.251.224.129     UGHW        1     2736   1420  fxp0
10.4.1.55          63.251.224.129     UGHW        1      990   1500  fxp0
10.4.1.58          63.251.224.129     UGHW        7   478423   1420  fxp0

dell350-12#
And a repeated wget works fine:
dell350-12# tcpdump -vvi fxp0 host wonko.corp or host 10.16.5.8
tcpdump: listening on fxp0
09:40:59.134496 10.4.0.80.1979 > dell350-12.snv.http: S [tcp sum ok] 926911706:926911706(0) win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp 1076953649 0> (DF) (ttl 61, id 38108, len 60)
09:40:59.134532 dell350-12.snv.http > 10.4.0.80.1979: S [tcp sum ok] 422570166:422570166(0) ack 926911707 win 16416 <mss 1460,nop,wscale 0,nop,nop,timestamp 102692734 1076953649> (DF) (ttl 64, id 53036, len 60)
09:40:59.156451 10.4.0.80.1979 > dell350-12.snv.http: . [tcp sum ok] 1:1(0) ack 1 win 17376 <nop,nop,timestamp 1076953651 102692734> (DF) (ttl 61, id 38170, len 52)
09:40:59.156820 10.4.0.80.1979 > dell350-12.snv.http: P 1:147(146) ack 1 win 17376 <nop,nop,timestamp 1076953651 102692734> (DF) (ttl 61, id 38171, len 198)
09:40:59.156842 dell350-12.snv.http > 10.4.0.80.1979: . [tcp sum ok] 1:1(0) ack 147 win 16270 <nop,nop,timestamp 102692736 1076953651> (DF) (ttl 64, id 53037, len 52)
09:40:59.157780 dell350-12.snv.http > 10.4.0.80.1979: . 1:1369(1368) ack 147 win 16416 <nop,nop,timestamp 102692736 1076953651> (DF) (ttl 64, id 53038, len 1420)
09:40:59.157799 dell350-12.snv.http > 10.4.0.80.1979: . 1369:2737(1368) ack 147 win 16416 <nop,nop,timestamp 102692736 1076953651> (DF) (ttl 64, id 53039, len 1420)
09:40:59.157816 dell350-12.snv.http > 10.4.0.80.1979: . 2737:4105(1368) ack 147 win 16416 <nop,nop,timestamp 102692736 1076953651> (DF) (ttl 64, id 53040, len 1420)
This time it sends nothing larger than 1420 bytes.
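
For anyone who wants to check a longer capture the same way, here is a
rough Python sketch that pulls the IP length out of tcpdump -vv records
and lists the packets larger than a given path MTU. It assumes each
record has first been joined onto a single line, and the function names
are invented for the example. On ICMP records the quoted inner packet
comes first, so only the last "len" on the line is used:

```python
import re

# Matches the "len N)" that closes each parenthesised IP summary.
LEN_RE = re.compile(r"len (\d+)\)")


def ip_lengths(records):
    """Outer IP length of each record, skipping lines without one."""
    lengths = []
    for rec in records:
        found = LEN_RE.findall(rec)
        if found:
            lengths.append(int(found[-1]))  # last match is the outer packet
    return lengths


def oversized(records, path_mtu):
    """Lengths of all packets that exceed path_mtu."""
    return [n for n in ip_lengths(records) if n > path_mtu]


# Three abbreviated records in the same shape as the traces above.
sample = [
    ". 1:1369(1368) ack 147 win 16416 (DF) (ttl 64, id 53038, len 1420)",
    ". 1:1449(1448) ack 147 win 17376 (DF) (ttl 64, id 43079, len 1500)",
    "icmp: need to frag (mtu 1420) (ttl 62, id 43079, len 1500) (ttl 254, id 16874, len 56)",
]
too_big = oversized(sample, 1420)
```

Run over the second trace this should return an empty list; over the
first trace it should list every 1500-byte transmission.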

So FreeBSD is behaving in a broken way that violates the RFC, unless I'm
much mistaken.

Unfortunately, I am not a coder, so I can't go poking at the source to
verify or fix this. (Well, it would take me a very long time.)
Anyone care to confirm (and ideally, fix)?
I can replicate this at will, so can easily gather more data if people
want.

TIA







