From: Brandon Erhart
To: freebsd-net@freebsd.org
Date: Sat, 10 Apr 2004 18:57:55 -0600
Subject: Weird behavior with either reading or write()ing !?
Message-Id: <6.0.2.0.2.20040410185743.01cb85a8@mx1.erhartgroup.com>

Hello,

This is a rather odd bug or weird behavior, and I am fairly confident that it is not a logic error in my code this time. Please read the following carefully!

In a web-crawling program I am writing, I deal with several thousand fds at a time. I am using FreeBSD's kqueue to keep track of them all so that I am notified when an event is pending on a given socket. The program works as it should for about 75% of the connections. The other 25% don't work so well.

I have implemented read timeouts as follows: whenever I am in the callback function for data waiting to be read off an fd (EVFILT_READ or whatever), I store the time (via gettimeofday()) at which data was last read on that socket. Then, in my main loop, I check every socket to see whether its last read was more than 10 seconds ago.

However, I am getting a lot of read timeouts. I keep track of the last response from the remote server and the current state I'm in (e.g., sent another GET request on a keepalive connection). In several cases, I had received a response for the last page I requested, processed/parsed it, and sent another request. However, no data ever came back to me. Not even after 10 seconds. Hell, not even after 30 seconds in some cases.

What I am wondering is whether it is possible for my write() to be failing to get data to the remote site (I check the return value of write(), and it always returns the number of bytes I am writing), or for data to be getting "dropped", so to speak, on my end by the kernel (no data waiting on the socket). I have all my sockets in O_NONBLOCK mode.

To test whether kqueue is failing to notify me of waiting data, or whether I am not grabbing the event off the queue in time, I call read() on the socket one last time when I catch the read timeout. Most of the time (99% of it) there is no data waiting.

This all seems to be random. It's never consistent (same server) over several runs of the program.

Any ideas, folks? This has completely stumped me.

Thanks for your support,
Brandon
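
For reference, the read-timeout bookkeeping and the final read() check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the crawler's actual code: struct conn, conn_touch(), conn_timed_out(), conn_probe(), and set_nonblocking() are hypothetical names, and the kqueue event loop itself is left out so that the sketch stays portable.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-connection record; field names are illustrative only. */
struct conn {
    int fd;                   /* socket in O_NONBLOCK mode */
    struct timeval last_read; /* refreshed whenever data is read */
};

/* Possible outcomes of the last-chance read() done when a timeout fires. */
enum probe_result { PROBE_DATA, PROBE_NO_DATA, PROBE_EOF, PROBE_ERROR };

/* Put fd into non-blocking mode while preserving its other status flags. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Record "now" as the last read time; meant to be called from the
 * read-event callback after data has been read. */
static void conn_touch(struct conn *c)
{
    assert(c != NULL);
    gettimeofday(&c->last_read, NULL);
}

/* Return 1 if strictly more than limit_sec seconds separate the last
 * read from *now, 0 otherwise. */
static int conn_timed_out(const struct conn *c, const struct timeval *now,
                          long limit_sec)
{
    long sec = (long)(now->tv_sec - c->last_read.tv_sec);
    long usec = (long)(now->tv_usec - c->last_read.tv_usec);

    if (usec < 0) { /* borrow one second */
        sec -= 1;
        usec += 1000000L;
    }
    return sec > limit_sec || (sec == limit_sec && usec > 0);
}

/* One last read() on a timed-out socket. The result separates "nothing
 * pending" (EAGAIN/EWOULDBLOCK) from "peer closed" (read() returned 0). */
static enum probe_result conn_probe(const struct conn *c, char *buf,
                                    size_t len, ssize_t *nread)
{
    ssize_t n;

    do {
        n = read(c->fd, buf, len);
    } while (n < 0 && errno == EINTR); /* retry if interrupted by a signal */

    *nread = n;
    if (n > 0)
        return PROBE_DATA;
    if (n == 0)
        return PROBE_EOF;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return PROBE_NO_DATA;
    return PROBE_ERROR;
}
```

In the main loop, conn_timed_out() would be called with a fresh gettimeofday() value and a limit of 10. Keeping PROBE_NO_DATA and PROBE_EOF apart may be useful here: on a non-blocking socket, read() returning 0 means the peer closed the connection, whereas -1 with EAGAIN means that nothing is pending yet.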