From: "Jack Vogel"
To: "Jack Vogel", "Patrick M. Hausen", freebsd-stable@freebsd.org, zenker@punkt.de
Date: Thu, 2 Nov 2006 10:39:16 -0800
Subject: Re: New em driver - still watchdog timeouts
Message-ID: <2a41acea0611021039j30b054a1w1462c9cc85bd661b@mail.gmail.com>
In-Reply-To: <20061102181059.GA23733@icarus.home.lan>

On 11/2/06, Jeremy Chadwick wrote:
> On Thu, Nov 02, 2006 at 09:43:34AM -0800, Jack Vogel wrote:
> > Yes, I know this is still happening. I also have pretty good data now
> > that its a bogus problem, meaning due to scheduling issues the
> > watchdog does not get reset even though the system is just fine
> > as far as transmit descriptors is concerned. I have a patch that
> > detects this and keeps the watchdog from erroneously resetting
> > you, it has been running on my test system for days now without
> > problems.
>
> I don't understand this explanation of the problem. Here's how I
> read this paragraph:
>
> * It's a "bogus problem" (which means there's not a problem)
> * ...due to "scheduling issues" (which means there IS a problem)
> * The watchdog does NOT get reset
> * ...but there's a patch (to fix the "bogus problem"? or what?)
> * ...which keeps the watchdog from resetting (but you just said...)
>
> Maybe you were in a hurry, I don't know. Either way, the paragraph
> doesn't make sense. I call for clarification! ;-)

OK OK, so I wasn't at my most lucid :)

When I said it's bogus, what I mean is that the watchdog is designed to
detect and correct a certain condition, but what is really happening
is NOT THAT condition.
The watchdog gets set when there is transmit cleanup work pending.
Every time SOME progress is made on cleaning, it gets restarted; if
you actually clean the WHOLE ring, then you turn it off. So the idea
is that it protects against transmit hangs.

So why do I say what we see is bogus? Because the watchdog is firing
even though we DON'T have tx hangs or descriptor shortages. I have a
hack that rechecks the number of free descriptors in the watchdog code
and returns without resetting if we have max free.

I am still trying to figure out how this can happen in the first
place, however; I'd rather do something that didn't feel quite as much
like a hack :)

So, is that somewhat clearer?

Jack
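
The watchdog scheme described above can be sketched roughly in C. This is only a toy model under stated assumptions, not the em driver's code: the struct, its fields, the function names (tx_enqueue, tx_clean, watchdog_tick) and the constants NUM_TX_DESC and WATCHDOG_TICKS are all hypothetical, and a "reset" is modeled as simply marking the whole ring free again.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_TX_DESC    8  /* hypothetical ring size */
#define WATCHDOG_TICKS 5  /* hypothetical timeout, in timer ticks */

/* Hypothetical model of one transmit ring and its watchdog. */
struct tx_ring {
    int num_free;        /* descriptors currently free */
    int watchdog_timer;  /* 0 = off, >0 = ticks left before it fires */
    int resets;          /* how many times the watchdog reset the ring */
};

/* Queue one packet: consumes a descriptor, so cleanup work is now
 * pending and the watchdog is armed. */
static bool tx_enqueue(struct tx_ring *r)
{
    if (r->num_free == 0)
        return false;
    r->num_free--;
    r->watchdog_timer = WATCHDOG_TICKS;
    return true;
}

/* Reclaim `done` completed descriptors. */
static void tx_clean(struct tx_ring *r, int done)
{
    assert(done >= 0);
    if (done == 0)
        return;                              /* no progress: leave timer alone */
    r->num_free += done;
    if (r->num_free >= NUM_TX_DESC) {
        r->num_free = NUM_TX_DESC;
        r->watchdog_timer = 0;               /* WHOLE ring clean: turn it off */
    } else {
        r->watchdog_timer = WATCHDOG_TICKS;  /* SOME progress: restart it */
    }
}

/* Called once per timer tick. Returns true if the ring was reset. */
static bool watchdog_tick(struct tx_ring *r)
{
    if (r->watchdog_timer == 0 || --r->watchdog_timer > 0)
        return false;                        /* off, or not expired yet */

    /* The recheck: the timer expired, but if every descriptor is free
     * there is no transmit hang, so return without resetting. */
    if (r->num_free == NUM_TX_DESC)
        return false;

    r->resets++;                             /* genuine hang: reset */
    r->num_free = NUM_TX_DESC;
    return true;
}
```

In this model a genuine hang (a packet queued and never cleaned) still triggers a reset after WATCHDOG_TICKS ticks, while an expired timer on a fully free ring is ignored. The sketch does not explain how the timer can expire with the ring fully free in the first place; it only shows where the recheck sits.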