From owner-freebsd-net@FreeBSD.ORG Fri Feb 10 22:19:19 2006 Return-Path: X-Original-To: net@FreeBSD.org Delivered-To: freebsd-net@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1496916A420; Fri, 10 Feb 2006 22:19:19 +0000 (GMT) (envelope-from ru@ip.net.ua) Received: from tigra.ip.net.ua (tigra.ip.net.ua [82.193.96.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4759443D45; Fri, 10 Feb 2006 22:19:17 +0000 (GMT) (envelope-from ru@ip.net.ua) Received: from localhost (rocky.ip.net.ua [82.193.96.2]) by tigra.ip.net.ua (8.12.11/8.12.11) with ESMTP id k1AMJGte066362; Sat, 11 Feb 2006 00:19:16 +0200 (EET) (envelope-from ru@ip.net.ua) Received: from tigra.ip.net.ua ([82.193.96.10]) by localhost (rocky.ip.net.ua [82.193.96.2]) (amavisd-new, port 10024) with LMTP id 97555-03; Sat, 11 Feb 2006 00:19:11 +0200 (EET) Received: from heffalump.ip.net.ua (heffalump.ip.net.ua [82.193.96.213]) by tigra.ip.net.ua (8.12.11/8.12.11) with ESMTP id k1AMEEa7066289 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 11 Feb 2006 00:14:14 +0200 (EET) (envelope-from ru@ip.net.ua) Received: (from ru@localhost) by heffalump.ip.net.ua (8.13.4/8.13.4) id k1AMEFOM041979; Sat, 11 Feb 2006 00:14:15 +0200 (EET) (envelope-from ru) Date: Sat, 11 Feb 2006 00:14:15 +0200 From: Ruslan Ermilov To: Andre Oppermann Message-ID: <20060210221415.GC33588@ip.net.ua> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="St7VIuEGZ6dlpu13" Content-Disposition: inline User-Agent: Mutt/1.5.11 X-Virus-Scanned: amavisd-new at ip.net.ua Cc: net@FreeBSD.org Subject: [FIX] dummynet breaks IP reassembly X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Feb 2006 22:19:19 -0000 --St7VIuEGZ6dlpu13 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi Andre, If I'm not mistaken, *you* converted ipfw to use pfil(9). During the conversion, the following bug was introduced. When forwarding fragmented packets through a dummynet pipe (ip_input -> ip_forward -> ip_output -> pipe -> ip_output) the last ip_output() in the chain that does the actual IP delivery sets ip_id of all fragments to different values, making it impossible to reassemble the packet at receive side. Example of forwarding a fragmented IP datagram from em0 to xl0: # tcpdump -nvi em0 net 192.168.2 tcpdump: listening on em0, link-type EN10MB (Ethernet), capture size 96 byt= es 00:04:35.170994 IP (tos 0x0, ttl 64, id 2186, offset 0, flags [+], proto: = ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 28= 19, seq 0, length 1480 00:04:35.171242 IP (tos 0x0, ttl 64, id 2186, offset 1480, flags [none], p= roto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp # tcpdump -nvi xl0 net 192.168.2 tcpdump: listening on xl0, link-type EN10MB (Ethernet), capture size 96 byt= es 00:04:35.171028 IP (tos 0x0, ttl 63, id 10560, offset 0, flags [+], proto:= ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 2= 819, seq 0, length 1480 00:04:35.171266 IP (tos 0x0, ttl 63, id 10561, offset 1480, flags [none], = proto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp Note that IDs were rewritten from the original 2186 and worse, they are different. In 4,x, the "flags" field (see the patch below) was used to keep the IP_FORWARDING bit passed to ip_output() by ip_forward(). This bit was kept in the dummynet packet tag (in the "flags" field) and later passed to the second ip_output() call, causing the latter to NOT touch the IP header again. This feature was lost, resulting in a bug. Since dummynet never calls ip_output() with unfilled header, it's safe to always call ip_output() from dummynet with the IP_FORWARDING bit set, to indicate it's forwarded =66rom another ip_output() and so it shouldn't attempt to modify the header. Surprisingly, it "seemed" to work for packets exceeding MTU and originating from the dummynetting host, mainly because the packet wasn't fragmented while in dummynet. Yet, the ip_id field was always incremented in my tests (pipe with no bandwidth limitation), comparing it before and after the dummynet processing. Now, the ip_id is always preserved. # tcpdump -nvi em0 net 192.168.2 tcpdump: listening on em0, link-type EN10MB (Ethernet), capture size 96 byt= es 00:12:04.654669 IP (tos 0x0, ttl 64, id 2192, offset 0, flags [+], proto: = ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 79= 39, seq 0, length 1480 00:12:04.654917 IP (tos 0x0, ttl 64, id 2192, offset 1480, flags [none], p= roto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp # tcpdump -nvi xl0 net 192.168.2 tcpdump: listening on xl0, link-type EN10MB (Ethernet), capture size 96 byt= es 00:12:04.654703 IP (tos 0x0, ttl 63, id 2192, offset 0, flags [+], proto: = ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 79= 39, seq 0, length 1480 00:12:04.654939 IP (tos 0x0, ttl 63, id 2192, offset 1480, flags [none], p= roto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp (Note the ip_id is preserved when forwarding.) I'm not sure about the IPv6 portion but it's consistent with the normal forwarding path so I believe it's correct. Comments? %%% Index: ip_dummynet.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D RCS file: /home/ncvs/src/sys/netinet/ip_dummynet.c,v retrieving revision 1.98 diff -u -p -r1.98 ip_dummynet.c --- ip_dummynet.c 3 Feb 2006 11:38:19 -0000 1.98 +++ ip_dummynet.c 10 Feb 2006 21:20:59 -0000 @@ -769,7 +769,7 @@ dummynet_send(struct mbuf *m) pkt =3D dn_tag_get(m); switch (pkt->dn_dir) { case DN_TO_IP_OUT: - ip_output(m, NULL, NULL, pkt->flags, NULL, NULL); + ip_output(m, NULL, NULL, IP_FORWARDING, NULL, NULL); break ; case DN_TO_IP_IN : ip =3D mtod(m, struct ip *); @@ -783,7 +783,7 @@ dummynet_send(struct mbuf *m) break; =20 case DN_TO_IP6_OUT: - ip6_output(m, NULL, NULL, pkt->flags, NULL, NULL, NULL); + ip6_output(m, NULL, NULL, IPV6_FORWARDING, NULL, NULL, NULL); break; #endif case DN_TO_IFB_FWD: @@ -1129,7 +1129,6 @@ locate_pipe(int pipe_nr) * ifp the 'ifp' parameter from the caller. * NULL in ip_input, destination interface in ip_output, * rule matching rule, in case of multiple passes - * flags flags from the caller, only used in ip_output * */ static int @@ -1213,8 +1212,6 @@ dummynet_io(struct mbuf *m, int dir, str pkt->dn_dir =3D dir ; =20 pkt->ifp =3D fwa->oif; - if (dir =3D=3D DN_TO_IP_OUT || dir =3D=3D DN_TO_IP6_OUT) - pkt->flags =3D fwa->flags; =20 if (q->head =3D=3D NULL) q->head =3D m; Index: ip_dummynet.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D RCS file: /home/ncvs/src/sys/netinet/ip_dummynet.h,v retrieving revision 1.38 diff -u -p -r1.38 ip_dummynet.h --- ip_dummynet.h 29 Nov 2005 00:11:01 -0000 1.38 +++ ip_dummynet.h 10 Feb 2006 21:13:45 -0000 @@ -130,7 +130,6 @@ struct dn_pkt_tag { =20 dn_key output_time; /* when the pkt is due for delivery */ struct ifnet *ifp; /* interface, for ip_output */ - int flags ; /* flags, for ip_output (IPv6 ?) */ struct _ip6dn_args ip6opt; /* XXX ipv6 options */ }; #endif /* _KERNEL */ Index: ip_fw.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D RCS file: /home/ncvs/src/sys/netinet/ip_fw.h,v retrieving revision 1.103 diff -u -p -r1.103 ip_fw.h --- ip_fw.h 13 Dec 2005 12:16:02 -0000 1.103 +++ ip_fw.h 10 Feb 2006 21:15:41 -0000 @@ -510,8 +510,6 @@ struct ip_fw_args { struct ip_fw *rule; /* matching rule */ struct ether_header *eh; /* for bridged packets */ =20 - int flags; /* for dummynet */ - struct ipfw_flow_id f_id; /* grabbed from IP header */ u_int32_t cookie; /* a cookie depending on rule action */ struct inpcb *inp; %%% Cheers, --=20 Ruslan Ermilov ru@FreeBSD.org FreeBSD committer --St7VIuEGZ6dlpu13 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (FreeBSD) iD8DBQFD7RA3qRfpzJluFF4RAixfAJ9hPuJK5tiSZsXGuRzCIi/WsH9jrQCcCkAe cgPUE8BgRuyhmE8MvRF3FXc= =+aWq -----END PGP SIGNATURE----- --St7VIuEGZ6dlpu13--