Date:      Thu, 2 Nov 2006 10:39:16 -0800
From:      "Jack Vogel" <jfvogel@gmail.com>
To:        "Jack Vogel" <jfvogel@gmail.com>, "Patrick M. Hausen" <hausen@punkt.de>,  freebsd-stable@freebsd.org, zenker@punkt.de
Subject:   Re: New em driver - still watchdog timeouts
Message-ID:  <2a41acea0611021039j30b054a1w1462c9cc85bd661b@mail.gmail.com>
In-Reply-To: <20061102181059.GA23733@icarus.home.lan>
References:  <20061102094332.GA15810@hugo10.ka.punkt.de> <2a41acea0611020943p9c91b6fv1e61cd9ea0082b77@mail.gmail.com> <20061102181059.GA23733@icarus.home.lan>

On 11/2/06, Jeremy Chadwick <freebsd@jdc.parodius.com> wrote:
> On Thu, Nov 02, 2006 at 09:43:34AM -0800, Jack Vogel wrote:
> > Yes, I know this is still happening. I also have pretty good data now
> > that it's a bogus problem, meaning due to scheduling issues the
> > watchdog does not get reset even though the system is just fine
> > as far as transmit descriptors are concerned. I have a patch that
> > detects this and keeps the watchdog from erroneously resetting
> > you, it has been running on my test system for days now without
> > problems.
>
> I don't understand this explanation of the problem.  Here's how I
> read this paragraph:
>
> * It's a "bogus problem" (which means there's not a problem)
> * ...due to "scheduling issues" (which means there IS a problem)
> * The watchdog does NOT get reset
> * ...but there's a patch (to fix the "bogus problem"? or what?)
> * ...which keeps the watchdog from resetting (but you just said...)
>
> Maybe you were in a hurry, I don't know.  Either way, the paragraph
> doesn't make sense.  I call for clarification!  ;-)

OK OK, so I wasn't at my most lucid :)

When I said it's bogus, what I meant is that the watchdog is designed
to detect and correct a certain condition, but what is really happening
is NOT THAT condition.

The watchdog gets set when there is transmit cleanup work pending.
Every time SOME progress is made on cleaning, it gets restarted; if
you actually clean the WHOLE ring, you turn it off. So the idea is
that it protects against transmit hangs.
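
As a rough illustration of that lifecycle (arm on pending work, restart
on partial cleanup, turn off on a fully clean ring), here is a minimal
sketch in C. It is a toy model, not the em driver's code: the struct,
the function names, and the WATCHDOG_TICKS value are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of a transmit ring; NOT the em driver's real data. */
struct tx_ring {
    int total;          /* descriptors in the ring */
    int free;           /* descriptors currently free */
    int watchdog_ticks; /* 0 = watchdog off, >0 = ticks until it fires */
};

#define WATCHDOG_TICKS 5 /* assumed timeout, arbitrary for illustration */

/* Queue one packet: uses a descriptor, so cleanup work is now pending. */
static bool tx_enqueue(struct tx_ring *r)
{
    if (r->free == 0)
        return false;                   /* descriptor shortage */
    r->free--;
    r->watchdog_ticks = WATCHDOG_TICKS; /* set the watchdog */
    return true;
}

/* Reclaim up to `done` completed descriptors. */
static void tx_clean(struct tx_ring *r, int done)
{
    int pending = r->total - r->free;

    if (done > pending)
        done = pending;
    if (done <= 0)
        return;                             /* no progress: timer keeps running */
    r->free += done;
    if (r->free == r->total)
        r->watchdog_ticks = 0;              /* WHOLE ring clean: turn it off */
    else
        r->watchdog_ticks = WATCHDOG_TICKS; /* SOME progress: restart it */
}

/* Advance the timer one tick; returns true when the watchdog fires. */
static bool watchdog_tick(struct tx_ring *r)
{
    if (r->watchdog_ticks == 0)
        return false;                       /* watchdog is off */
    return --r->watchdog_ticks == 0;
}
```

In this model the watchdog can only fire if descriptors stay
outstanding for WATCHDOG_TICKS ticks with no cleanup progress at all,
which is the transmit-hang case it is meant to catch.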

So why do I say what we see is bogus? Because the watchdog is
firing even though we DON'T have tx hangs or descriptor shortages.

I have a hack that rechecks the number of free descriptors in the
watchdog code and returns without resetting the adapter if every
descriptor is free.
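
The recheck described above might look roughly like the following
sketch. This is not the actual patch: the struct, its fields, and both
function names are hypothetical stand-ins chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical adapter state; field names are illustrative only. */
struct adapter {
    int ring_size;  /* total transmit descriptors */
    int free_desc;  /* transmit descriptors currently free */
    int resets;     /* how many times the adapter was reset */
};

/* Stand-in for the reset the watchdog normally performs. */
static void adapter_reset(struct adapter *a)
{
    a->resets++;
    a->free_desc = a->ring_size; /* assume a reset leaves the ring empty */
}

/*
 * Called when the watchdog timer expires.
 * Returns true if the adapter was reset, false if the timeout was
 * judged spurious.
 */
static bool watchdog_expired(struct adapter *a)
{
    /* Recheck: with every descriptor free, nothing can be hung. */
    if (a->free_desc == a->ring_size)
        return false;

    adapter_reset(a);
    return true;
}
```

The check only suppresses the reset when the ring is completely free,
so a timeout with any descriptor still outstanding is handled as
before.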

I am still trying to figure out how this can happen in the first
place, however; I'd rather do something that didn't feel quite as
much like a hack :)

So, is that somewhat clearer?

Jack


