From owner-freebsd-stable  Wed Aug  1 11:36:28 2001
Delivered-To: freebsd-stable@freebsd.org
Received: from aussie.org (hallam.lnk.telstra.net [139.130.54.166])
	by hub.freebsd.org (Postfix) with ESMTP id 1C24737B401
	for <freebsd-stable@freebsd.org>; Wed,  1 Aug 2001 11:36:18 -0700 (PDT)
	(envelope-from mlnn4@oaks.com.au)
Received: from dualp2 (dualp2 [203.29.75.73])
	by aussie.org (8.11.3/8.11.4) with SMTP id f71IaFZ01795
	for <freebsd-stable@freebsd.org>; Thu, 2 Aug 2001 04:36:15 +1000 (EST)
	(envelope-from mlnn4@oaks.com.au)
Message-Id: <200108011836.f71IaFZ01795@aussie.org>
From: "Chris" <mlnn4@oaks.com.au>
To: "freebsd-stable@freebsd.org" <freebsd-stable@freebsd.org>
Date: Thu, 02 Aug 2001 04:34:29 +1000
Reply-To: "Chris" <mlnn4@oaks.com.au>
X-Mailer: PMMail 98 Standard (2.01.1600) For Windows NT (5.0.2195;2)
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Subject: kernel upgrade causing truncated IPSEC packets [followup]
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-stable.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-stable>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-stable>
X-Loop: FreeBSD.ORG

This is a followup to a message I posted in -net on the 29th of July. I'm
hoping that someone will at least be able to confirm that they also see
the problem.

As a quick summary, when I upgraded a few boxes to a recent kernel (their
old ones were from mid-April), our IPSEC VPN took a dive, and I'm trying
to find out why. I see the problem described below on multiple machines.

The actual cause seems to be that the IPSEC-encapsulated packets are being
truncated before they get to the PPP process, even though tcpdump and netstat
on the tunnel device indicate that the full packet is being passed through.

Additionally, I have now noticed that the truncation will happen even in 
transport mode, if the outgoing packets are large enough (I originally
thought it was confined to tunnel mode).

For example, if I ping a remote address with which I have a transport mode
session established, a payload of 204 bytes gets a reply. With a payload of
205 bytes or more, the truncation occurs and I do not get a reply.
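
One way to locate the exact cutoff is to bisect on the payload size. The
sketch below is a minimal Python illustration, not a tested tool: the
function names are made up for this example, it assumes that every size
above the first failure also fails (as observed here), and the ping wrapper
assumes a ping(8) that accepts -c and -s as used in this message.

```python
import subprocess


def largest_working_payload(probe, lo, hi):
    """Return the largest size in [lo, hi] for which probe(size) is True.

    Returns None if even probe(lo) fails. Assumes failures are monotonic:
    once a size fails, every larger size fails too.
    """
    if not probe(lo):
        return None
    while lo < hi:
        mid = (lo + hi + 1) // 2      # round up so the loop always shrinks
        if probe(mid):
            lo = mid                  # mid works, the cutoff is at or above it
        else:
            hi = mid - 1              # mid fails, the cutoff is below it
    return lo


def ping_probe(host, timeout=5):
    """Build a probe that sends one ICMP echo of the given payload size."""
    def probe(size):
        try:
            result = subprocess.run(
                ["ping", "-c", "1", "-s", str(size), host],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0
    return probe


# Offline demonstration with a stand-in probe that mimics the behaviour
# reported above (replies up to 204 bytes, nothing beyond).
cutoff = largest_working_payload(lambda size: size <= 204, 0, 1400)
```

Against a real peer, largest_working_payload(ping_probe("210.XXX.XXX.XXX"), 0, 1400)
would run the same search with actual pings.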

The truncation shows up clearly in the output below, which was obtained from
the PPP process by using the 'set log local Async' option.

Result of 'ping -s 204 -c 1 210.XXX.XXX.XXX'

  Async: Write
  Async:  7e 21 45 00 01 20 0b 88 00 00 40 33 XX ee d1 XX
  Async:  XX XX d2 XX XX XX 32 04 00 00 08 de 67 5e 00 00
  Async:  00 20 67 8e 44 39 6a 40 92 47 bf 62 c3 4f 0f 88
  Async:  f7 1b 00 00 00 20 a1 b4 83 d6 1b b3 ac 60 27 65
  Async:  3d 3a c3 af 80 5a 54 e0 1c 7d 5d 37 aa f0 ec 9c
  Async:  0f 9f e5 02 0a 8b 39 05 ea 8a 7d 5e 90 ca 50 60
  Async:  a5 80 a4 e2 85 f6 9a a0 47 32 19 c1 46 f7 f0 46
  Async:  8f 10 3a 49 dd 4d 21 32 61 7b 35 03 ee 71 68 75
  Async:  26 7a fd 18 d6 4e 1b 34 85 f9 bd 53 00 a2 8c ed
  Async:  3a 6e 8e 98 96 7d 33 39 37 06 5a 7b 9a a6 32 23
  Async:  ca f6 53 2c 56 f1 f3 43 02 43 2f 83 8a a1 b7 46
  Async:  4d 71 db 7d 5d a8 97 db 9f aa 8c 72 10 eb 58 77
  Async:  eb 4b 1f d2 a4 88 f9 77 e5 7a 3b 95 00 70 f2 7d
  Async:  5d ee 79 69 14 eb 78 ff ae 4f c4 b4 d7 b8 6e 65
  Async:  0d a6 0c 4a 1e 2b b4 b3 56 76 b1 28 82 de 6d c5
  Async:  7f 1f 3c 43 58 58 e3 6b 90 c0 e2 6e 86 6b 61 b6
  Async:  7a 93 8b d6 81 ff 60 fc 23 2a a0 c1 74 b2 a7 21
  Async:  fd c8 50 c0 4a 47 9f 2c cc 41 f0 95 a2 90 ca 7c
  Async:  98 51 70 c7 e4 19 7a 43 9e 7e
  Async: Read
  Async:  7e 21 45 00 01 20 c1 57 00 00 3b 33 XX 1e d2 XX
  Async:  XX XX d1 XX XX XX 32 04 00 00 09 4b e8 da 00 00
  Async:  00 16 7d 5d 5c 3d 9f f4 1a 23 28 73 53 f6 55 06

  [rest of reply snipped]

As you can see, the entire packet went out and I got a reply. The IP
header indicates that the proper packet length is 0x120 (288) bytes,
and 298 were sent (the rest being PPP overhead).
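
The 10 bytes of overhead can be accounted for from the dump itself: two 0x7e
flag bytes, one protocol byte (0x21), five 0x7d escape bytes, and two bytes
of frame check sequence. The Python sketch below just spells out that
arithmetic. The two-byte FCS is an assumption (the PPP default), and the
function names are illustrative.

```python
ESC = 0x7D  # PPP async escape byte; the byte that follows is XORed with 0x20


def unescape(second_byte):
    """Recover the original byte from the one following a 0x7d escape."""
    return second_byte ^ 0x20


def wire_length(ip_len, escapes, proto_bytes=1, fcs_bytes=2, flag_bytes=2):
    """Bytes on the async line for one IP packet of ip_len bytes."""
    return ip_len + proto_bytes + fcs_bytes + flag_bytes + escapes


# Escape pairs that appear in the dump above: 7d 5d, 7d 5e and 7d 33.
decoded = [unescape(b) for b in (0x5D, 0x5E, 0x33)]

# 0x120 bytes of IP plus framing, with the five escapes counted in the dump.
total = wire_length(0x120, escapes=5)
print(total)  # 298
```

The three escape pairs decode to 0x7d, 0x7e and 0x13, which is why those
bytes each occupy two positions in the dump.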

Result of 'ping -s 205 -c 1 210.XXX.XXX.XXX'

  Async: Write
  Async:  7e 21 45 00 01 20 0b d9 00 00 40 33 XX 9d d1 XX
  Async:  XX XX d2 XX XX XX 32 04 00 00 08 de 67 5e 00 00
  Async:  00 21 62 8e 15 84 58 c8 4f 64 8e f4 d2 b2 0f 88
  Async:  f7 1b 00 00 00 21 60 2c ea 2a a2 68 07 74 01 23
  Async:  7e

  [This is all that there was]

By adding one byte to the size of the output packet, the IPSEC transport
session now fails, with the above packet being truncated. I cannot get a
successful transmission of any larger packet, either. As you can see from
the above IP header, 0x0120 bytes should have been in the packet (identical
to the previous successful example due to padding within the IPSEC code),
but only 65 were sent.
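
The mismatch can be checked mechanically from the dump. The following Python
sketch parses the truncated frame above, strips the 0x7e flags and any 0x7d
byte stuffing, and compares the length declared in the IP header with the
bytes actually present. It assumes a one-byte protocol field (the 0x21 seen
in the dump) and the default two-byte FCS. The function names are
illustrative, and redacted XX bytes are kept as placeholders.

```python
FLAG, ESC = 0x7E, 0x7D

# Bytes copied from the 'ping -s 205' Async dump; XX marks redacted bytes.
TRUNCATED = """
7e 21 45 00 01 20 0b d9 00 00 40 33 XX 9d d1 XX
XX XX d2 XX XX XX 32 04 00 00 08 de 67 5e 00 00
00 21 62 8e 15 84 58 c8 4f 64 8e f4 d2 b2 0f 88
f7 1b 00 00 00 21 60 2c ea 2a a2 68 07 74 01 23
7e
"""


def parse_dump(text):
    """Hex tokens -> list of ints, with None for redacted 'XX' bytes."""
    return [None if tok.upper() == "XX" else int(tok, 16)
            for tok in text.split()]


def unstuff(wire):
    """Drop the opening and closing flags and undo 0x7d byte stuffing."""
    if len(wire) < 2 or wire[0] != FLAG or wire[-1] != FLAG:
        raise ValueError("frame is not delimited by 0x7e flags")
    body, out, i = wire[1:-1], [], 0
    while i < len(body):
        if body[i] == ESC and i + 1 < len(body):
            nxt = body[i + 1]
            out.append(None if nxt is None else nxt ^ 0x20)
            i += 2
        else:
            out.append(body[i])
            i += 1
    return out


def summarize(text, proto_bytes=1, fcs_bytes=2):
    """Compare the IP header's declared length with what the frame holds."""
    wire = parse_dump(text)
    payload = unstuff(wire)
    ip = payload[proto_bytes:len(payload) - fcs_bytes]
    declared = (ip[2] << 8) | ip[3]          # IP total length field
    return {
        "wire_bytes": len(wire),
        "ip_declared": declared,
        "ip_present": len(ip),
        "ip_protocol": ip[9],                # 51 is AH
        "missing": declared - len(ip),
    }


info = summarize(TRUNCATED)
print(info)
```

Under those assumptions the frame carries 60 bytes of IP packet out of a
declared 288, so 228 bytes never reached the line.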

Both tcpdump on tun0 and 'netstat -bI tun0' indicate that as far as the
kernel is concerned, the full packet went out, even though it did not.

FWIW here's the more verbose debug output for the second instance above -

  tun0: TCP/IP: OUT AH: 209.XXX.XXX.XXX ---> 210.XXX.XXX.XXX, spi 0xbfbff170
  tun0: Debug: m_enqueue: len = 1
  tun0: Debug: m_dequeue: queue len = 1
  tun0: Debug: proto_LayerPush: Using 0x0021
  tun0: HDLC: hdlc_Output
  tun0: HDLC:  21 45 00 01 20 1c 8b 00 00 40 33 XX eb d1 XX XX 
  tun0: HDLC:  XX d2 XX XX XX 32 04 00 00 09 c5 50 46 00 00 00 
  tun0: HDLC:  07 30 c8 2d d4 51 83 bb 6e ee e3 d5 f3 07 0c 2b 
  tun0: HDLC:  1f 00 00 00 07 b3 7f 42 e4 74 cd db c9 98 ca    
  tun0: Async: Write
  tun0: Async:  7e 21 45 00 01 20 1c 8b 00 00 40 33 XX eb d1 XX
  tun0: Async:  XX XX d2 XX XX XX 32 04 00 00 09 c5 50 46 00 00
  tun0: Async:  00 07 30 c8 2d d4 51 83 bb 6e ee e3 d5 f3 07 0c
  tun0: Async:  2b 1f 00 00 00 07 b3 7f 42 e4 74 cd db c9 98 ca
  tun0: Async:  7e
  tun0: Debug: link_PushPacket: Transmit proto 0x0021
  tun0: Debug: m_enqueue: len = 1
  tun0: Debug: m_dequeue: queue len = 1
  tun0: Debug: link_Dequeue: Dequeued from queue 0, containing 0 more packets
  tun0: Physical: write
  tun0: Physical:  7e 21 45 00 01 20 1c 8b 00 00 40 33 XX eb d1 XX
  tun0: Physical:  XX XX d2 XX XX XX 32 04 00 00 09 c5 50 46 00 00
  tun0: Physical:  00 07 30 c8 2d d4 51 83 bb 6e ee e3 d5 f3 07 0c
  tun0: Physical:  2b 1f 00 00 00 07 b3 7f 42 e4 74 cd db c9 98 ca
  tun0: Physical:  7e
  tun0: Debug: deflink: DescriptorWrite: wrote 65(65) to 2

This shows that, as far as the PPP process is concerned, there were only
65 bytes to write (including PPP overhead), despite the kernel thinking
otherwise.

I am running the most recent PPP and a 4.3-STABLE kernel that was cvsupped
on the 17th of July. A kernel built today shows basically identical behaviour.
Several of the machines on the VPN do not use modems and are unaffected by
the problem.

Can anyone confirm my findings or offer suggestions?

                     ----------------------------

Topology (IPs are illustrative) -

  o Local LAN has machine 'A' at 10.0.58.2/24
  o A's default gateway is the FBSD box 'B' at 10.0.58.1/24
  o B is dialled up using ppp to routable address 1.2.3.4
  o Central gateway 'C' is on the net at 5.6.7.8
  o C has an interface hosting a local LAN at 10.0.48.1/24

I use IPSEC AH/ESP transport mode between 'B' and 'C', and have set up a
native IPSEC tunnel (not using gif) between 10.0.58.0/24 and 10.0.48.0/24.

This has been in place and working for a good part of a year. Since I put
in the new kernel, the tunnel between A and C fails completely, and the
tunnel and transport mode between B and C are intermittent (depending on
the size of the packets). Swapping back to the April kernel made the
problem go away, so I do not think the problem is in the PPP process
per se.

-- Chris

