From: Daniel Bilik
To: freebsd-net@freebsd.org
Date: Fri, 20 Nov 2015 15:55:11 +0100
Subject: Outgoing packets being sent via wrong interface
Message-Id: <20151120155511.5fb0f3b07228a0c829fa223f@neosystem.org>
List-Id: Networking and TCP/IP with FreeBSD

Hi.

(Please keep me in CC as I'm not subscribed to freebsd-net@.)

A router running recent 10-stable configured like this...

re0: flags=8843 metric 0 mtu 1500
        options=8209b
        ether 90:2b:34:bb:b2:e7
        inet 82.x.y.50 netmask 0xfffffff0 broadcast 82.x.y.255
        nd6 options=29
        media: Ethernet autoselect (100baseTX )
        status: active
re1: flags=8843 metric 0 mtu 1500
        options=8209b
        ether b8:a3:86:7b:e9:9c
        inet 192.168.2.8 netmask 0xffffff00 broadcast 192.168.2.255
        inet 192.168.2.5 netmask 0xffffff00 broadcast 192.168.2.255
        inet 192.168.2.15 netmask 0xffffff00 broadcast 192.168.2.255
        inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
        nd6 options=29
        media: Ethernet autoselect (1000baseT )
        status: active

... exhibits a very weird behaviour. When rebooted, it runs (= routes
traffic) with no problem. After some time (or maybe after some event,
see below), it starts to think that some hosts from the 192.168.2.x
subnet are "outside" and sends _some_ packets destined to 192.168.2.x
via the re0 interface.

The most absurd case is when a jail @ 192.168.2.5 makes a TCP connection
to a jail @ 192.168.2.15, and the system tries to push this via re0.
There is a pf nat rule on re0 that rewrites the sender address, and the
packets are then dropped by firewall rules that prohibit sending packets
to private subnets via the public interface. This is a capture from
pflog...

00:00:00.151624 rule 53..16777216/0(match): block out on re0: 82.x.y.50.59615 > 192.168.2.15.3306: Flags [.], ack 3421441707, win 1275, options [nop,nop,TS val 250304266 ecr 1531964207], length 0
00:00:00.000226 rule 53..16777216/0(match): block out on re0: 82.x.y.50.55539 > 192.168.2.15.3306: Flags [.], ack 3421441707, win 1275, options [nop,nop,TS val 250304266 ecr 1531964207], length 0
00:00:00.000143 rule 53..16777216/0(match): block out on re0: 82.x.y.50.50682 > 192.168.2.15.3306: Flags [P.], seq 2828804971:2828805095, ack 3421441707, win 1275, options [nop,nop,TS val 250304266 ecr 1531964207], length 124

... and this is the case when the jail @ 192.168.2.15 makes a request to
the DNS server running @ 192.168.2.1...

00:00:01.784024 rule 53..16777216/0(match): block out on re0: 82.x.y.50.57878 > 192.168.2.15.12269: UDP, length 104
00:00:00.001487 rule 53..16777216/0(match): block out on re0: 82.x.y.50.65204 > 192.168.2.15.12269: UDP, length 178
00:00:00.002425 rule 53..16777216/0(match): block out on re0: 82.x.y.50.59323 > 192.168.2.15.12269: UDP, length 229
00:00:00.026260 rule 53..16777216/0(match): block out on re0: 82.x.y.50.55382 > 192.168.2.15.12269: UDP, length 120

In both cases, response packets, originally from 192.168.2.5 and
192.168.2.1, are natted to sender 82.x.y.50 and pushed out via re0.

The ping result for an affected address is also weird...

PING 192.168.2.15 (192.168.2.15): 56 data bytes
ping: sendto: Operation not permitted
ping: sendto: Operation not permitted
64 bytes from 192.168.2.15: icmp_seq=2 ttl=64 time=0.027 ms
ping: sendto: Operation not permitted
64 bytes from 192.168.2.15: icmp_seq=4 ttl=64 time=0.102 ms

It seems like some packets go the right way, some go wrong (and are
dropped by pf -> "not permitted").

Routing and ARP table entries are correct...

82.x.y.0/24        link#2             U           re0
82.x.y.50          link#2             UHS         lo0
192.168.2.0/24     link#3             U           re1
192.168.2.1        link#3             UHS         lo0
192.168.2.5        link#3             UHS         lo0
192.168.2.8        link#3             UHS         lo0
192.168.2.15       link#3             UHS         lo0

? (192.168.2.15) at b8:a3:86:7b:e9:9c on re1 permanent [ethernet]
? (192.168.2.5) at b8:a3:86:7b:e9:9c on re1 permanent [ethernet]
? (82.x.y.50) at 90:2b:34:bb:b2:e7 on re0 permanent [ethernet]

Refreshing ARP entries (by removing them) and/or manually adding
specific host routes does not help. Only a reboot cures the problem.

Possibly related: the host has a cron task that changes the default
route based on a connectivity check. On another host (identical
software- and hardware-wise) that serves the same purpose at another
location, just without the default route mangling, we've never observed
this weird behaviour.

Note: the host was previously running 9-stable with the same
configuration, including the cron task to update the default route.
But the problems described above started just after upgrading it to
10-stable.

Has anybody observed such behaviour on any 10.x system? Any hints on how
to debug and/or solve the problem when it happens?

Thanks for your attention.

--
Dan
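
P.S. As a debugging aid for catching the moment this starts, here is a
minimal sketch (an illustration only, not something taken from the
affected router) that scans tcpdump-formatted pflog lines like the ones
above and reports packets blocked on the public interface although their
destination is a private address. The names BLOCK_OUT, misrouted and
public_if are hypothetical, and the regular expression assumes the exact
"block out on <if>: <src> > <dst>.<port>:" layout seen in the captures.

```python
import ipaddress
import re

# Matches the "block out on <if>: <src>.<port> > <dst>.<port>:" part of a
# tcpdump-formatted pflog line. The source side is matched loosely because
# it is redacted (82.x.y.50) in the captures quoted in this mail.
BLOCK_OUT = re.compile(
    r"block out on (?P<ifname>\w+): \S+ > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<port>\d+):"
)


def misrouted(lines, public_if="re0"):
    """Return (dst, port) pairs for packets that were blocked going out on
    public_if even though the destination address is private."""
    hits = []
    for line in lines:
        m = BLOCK_OUT.search(line)
        if m is None or m.group("ifname") != public_if:
            continue
        if ipaddress.ip_address(m.group("dst")).is_private:
            hits.append((m.group("dst"), int(m.group("port"))))
    return hits


# Two lines copied from the pflog captures above.
sample = [
    "00:00:00.151624 rule 53..16777216/0(match): block out on re0: "
    "82.x.y.50.59615 > 192.168.2.15.3306: Flags [.], ack 3421441707, "
    "win 1275, options [nop,nop,TS val 250304266 ecr 1531964207], length 0",
    "00:00:01.784024 rule 53..16777216/0(match): block out on re0: "
    "82.x.y.50.57878 > 192.168.2.15.12269: UDP, length 104",
]

for dst, port in misrouted(sample):
    print(f"private destination {dst}:{port} was sent out via re0")
```

Something like this could be fed from "tcpdump -n -e -ttt -r /var/log/pflog"
(assuming the default pflog file location) and run from cron next to the
default route check, to see whether the first misrouted packet lines up
with a route change.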