From: Jeff Davis
To: freebsd-stable@freebsd.org
Date: Wed, 31 Jan 2007 10:46:03 -0800
Subject: send() returns error even though data is sent, TCP connection still alive

I am on FreeBSD 6.1 and I'm seeing write() return EHOSTDOWN while keeping the connection alive.

I wrote a simple C client on the affected FreeBSD box to write a series of integers to a server program on another machine. When the client's write() fails with EHOSTDOWN, the data it sent arrives at the server program anyway. Moreover, when I write() again on the same socket, the data goes through without further errors, as if nothing had happened. The connection is not broken by the EHOSTDOWN, and the client never knows the difference.
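For illustration, here is a minimal sketch of a client loop of this kind. It is not the original test program: the helper name send_ints_ignoring_ehostdown, the 32-bit integer framing, and the policy of logging EHOSTDOWN and moving on to the next integer are all assumptions made for the example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post): writes the integers
 * 0..count-1 to fd, one write() per integer. A write() that fails with
 * EHOSTDOWN is logged and counted, but the loop carries on with the next
 * integer. Returns the number of EHOSTDOWN errors seen, or -1 on any
 * other failure (including a short write).
 */
int send_ints_ignoring_ehostdown(int fd, int count)
{
    int soft_errors = 0;

    for (int i = 0; i < count; i++) {
        int32_t value = i;
        ssize_t n = write(fd, &value, sizeof value);

        if (n == (ssize_t)sizeof value)
            continue;               /* normal case: whole integer written */

        if (n < 0 && errno == EHOSTDOWN) {
            /* Assumption: treat EHOSTDOWN as non-fatal and keep going. */
            fprintf(stderr, "write %d: %s (ignored)\n", i, strerror(errno));
            soft_errors++;
            continue;
        }

        if (n < 0)
            perror("write");
        else
            fprintf(stderr, "write %d: short write\n", i);
        return -1;
    }
    return soft_errors;
}
```

Called with a connected TCP socket as fd, a nonzero return value from a loop like this, combined with the server receiving every integer, would match the behavior described above.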
In fact, if the application just ignores the error from write(), everything appears fine after that.

The simplest way to see the problem is with SSH. Machine A is a FreeBSD box, and machine B is another box on the same switch.

(1) ssh from A to B
(2) see on A that "arp -a" shows the entry for B
(3) on A do "arp -d B"
(4) pull the network cable
(5) type to try to send data over the SSH session (of course nothing will happen, the network cable is still out)
(6) after the network cable has been unplugged for about 8 seconds, plug it back in
(7) type in the SSH session again

You should see something like "write failed: host is down" and the session will terminate.

Of course, when ssh exits, the TCP connection closes. The only way to see that it's still open and active is by writing (or using) an application that ignores EHOSTDOWN errors from write(). I think some scripting languages do not generate an exception in that case.

This is very strange behavior and it's causing all kinds of problems on our network. Does anyone have an explanation for this? Why would a TCP operation return an error without closing the connection, and send the data anyway?

This has existed for a long time. I believe this is related to:
http://www.freebsd.org/cgi/query-pr.cgi?pr=100172
which is related to:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/if_ether.c?only_with_tag=RELENG_6#rev1.137.2.5

I tried the patch here:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/if_ether.c?f=h#rev1.158
(rev 1.158) but I can still generate the error I mentioned.

Also, what's even more strange is that I set ARP to be static on the production machine, and I am still getting EHOSTDOWNs.

Regards,
Jeff Davis