From: Robert Watson
Date: Wed, 20 Feb 2008 11:35:34 +0000 (GMT)
To: Unga
Cc: mux@FreeBSD.org, freebsd-current@freebsd.org
Subject: Re: Frequent network access freeze (in 7.0)
Message-ID: <20080220112911.W44565@fledge.watson.org>
In-Reply-To: <235549.36535.qm@web57008.mail.re3.yahoo.com>

On Wed, 20 Feb 2008, Unga wrote:

> I'm running 7.0-PRERELEASE (RC2, dated 15/02/2008), compiled from sources on
> i386 machine (512MB RAM, 3.0GHz, tx0: ).
>
> Network access freezes very frequently. Cannot ping to any ip address. The
> only way to get networking working again is reboot.
>
> I'm having this problem on 7.0 ever since I tried it from BETA4. I have
> reported also to this list before but sadly nobody was interested on it.
>
> If somebody is interested to look into this problem, I could furnish with
> more detail and participate in testing.

This sort of problem frequently turns out to be a bug in a device driver or
a problem with interrupt probing/configuration, so my first guess would be a
problem with the if_tx driver.

The usual starting diagnostic when ping fails is to use tcpdump to determine
whether receive or transmit is failing (or both). Quiet the network between
the two endpoints as much as you can, so that unrelated traffic doesn't
clutter the dumps, and dump arp and icmp at both endpoints. Now try to ping
from each endpoint to the other. One potential source of confusion is that
ping requires ARP to work, and ARP can be a slightly confusing protocol: it
usually resolves actively (query, response), but sometimes it receives
passive updates or extends existing entries.

What you want to look for is a packet sent by one side that isn't received by
the other. You might find, for example, that your host receives packets
fine, but the packets it transmits are never received. This would be
indicative of a driver bug in which it fails to properly handle (for
example) transmit queues filling, and might only trigger under very high
load. Or, you might find that your host never receives anything the other
side transmits, but can send fine. This might be indicative of a driver bug
involving the receive code, or a problem with how interrupts are being
handled more generally.

It looks like the last non-routine maintenance to the driver was done by
Maxime in about 2003; the more recent changes have all been updates to
newbus/busdma infrastructure, ifnet changes, locking changes, etc. I've CC'd
him as it sounds like he may have hardware... My advice would be to do the
above tests and see if you can narrow down whether it's transmit, receive, or
both failing.

Robert N M Watson
Computer Laboratory
University of Cambridge
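
The transmit/receive comparison described above can be written down as a small decision procedure. What follows is only a rough sketch in Python, not anything taken from the driver or from tcpdump itself. It assumes you have captured at both ends with something like `tcpdump -n -i tx0 arp or icmp` (tx0 on the affected host; the peer's interface will differ) and have noted by hand which packets each side logged. The function name, the argument names, and the use of ICMP sequence numbers as packet labels are all hypothetical choices made for illustration.

```python
def classify(host_sent, peer_saw, peer_sent, host_saw):
    """Guess which direction is failing on the affected host.

    Each argument is a set of packet labels read off the tcpdump output
    (hypothetical labels, e.g. ICMP sequence numbers):
      host_sent - packets the affected host's dump shows going out
      peer_saw  - of those, the ones that appear in the peer's dump
      peer_sent - packets the peer's dump shows going out
      host_saw  - of those, the ones that appear in the affected host's dump
    """
    tx_lost = host_sent - peer_saw  # logged as sent by the host, never seen by the peer
    rx_lost = peer_sent - host_saw  # logged as sent by the peer, never seen by the host
    if tx_lost and rx_lost:
        return "both"
    if tx_lost:
        return "transmit"
    if rx_lost:
        return "receive"
    return "neither"


# Example: the host logs three echo requests going out, the peer sees none of
# them, while everything the peer sends does show up on the host.
print(classify({1, 2, 3}, set(), {1, 2, 3}, {1, 2, 3}))
```

Bear in mind that a packet showing up in the sending host's own dump only means the stack handed it off for transmission, not that the hardware put it on the wire, which is why the peer's dump is the one to trust for the transmit side.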