Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 24 Dec 2006 00:59:18 +0300
From:      Oleg Bulyzhin <oleg@FreeBSD.org>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        cvs-all@FreeBSD.org, cvs-src@FreeBSD.org, Gleb Smirnoff <glebius@FreeBSD.org>, Bruce Evans <bde@FreeBSD.org>, src-committers@FreeBSD.org
Subject:   Re: cvs commit: src/sys/dev/bge if_bge.c
Message-ID:  <20061223215918.GA33627@lath.rinet.ru>
In-Reply-To: <20061222003115.R16146@delplex.bde.org>
References:  <200612201203.kBKC3MhO053666@repoman.freebsd.org> <20061220132631.GH34400@FreeBSD.org> <20061222003115.R16146@delplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, Dec 22, 2006 at 01:24:45AM +1100, Bruce Evans wrote:
> On Wed, 20 Dec 2006, Gleb Smirnoff wrote:
> 
> >On Wed, Dec 20, 2006 at 12:03:21PM +0000, Bruce Evans wrote:
> >B> bde         2006-12-20 12:03:21 UTC
> >B>
> >B>   FreeBSD src repository
> >B>
> >B>   Modified files:
> >B>     sys/dev/bge          if_bge.c
> >B>   Log:
> >B>   In bge_txeof(), cancel the watchdog timeout if all descriptors have
> >B>   been handled instead of when at least one descriptor was just handled.
> >B>   For bge, it is normal to get a txeof when only a small fraction of the
> >B>   queued tx descriptors have been handled, so the bug broke the watchdog
> >B>   in a usual case.
> >
> >I have a suspicion that this may cause a problem under high load. Imagine
> >that thread #1 is spinning in bge_start_locked() getting packets out
> >of interface queue and putting them into TX ring. Some other threads are
> >putting the packets into interface queue while its lock is temporarily
> >relinguished be the thread #1. In the same time interrupts happen, some
> >packets are sent, but the TX ring is never got empty.
> >
> >The above scenario will cause a fake watchdog event.
> 
> bge_start_locked() starts with the bge (sc) lock held and never releases
> it as far as I can see.  This this problem can't happen (the lock
> prevents both txeof and the watchdog from being reached before start
> resets the timeout to 5 seconds).
> 
> I could only find the lock being released and reacquired in a nested
> routine in bge_rxeof() (for calling if_input()).  I hope this complication
> is never needed for start routines.
> 
> Bruce

I have to agree with Gleb here. In theory false watchdog is possible (though
it's quite unusal) and it is not lock related:
1) bge_start_locked() & bge_encap fills tx ring.
2) during next 5 seconds we do not have packets for transmit (i.e. no
   bge_start_locked() calls --> no bge_timer refreshing)
3) for any reason (don't ask me how can this happen), chip was unable to
   send whole tx ring (only part of it).
4) here we have false watchdog - chip is not wedged but bge_watchdog would
   reset it.

I'd say correct solution would be combination of yours and previous bge_txeof()
behaviour:
1) We should cancel watchdog if tx ring is empty (as you did)
2) We should not clear bge_timer inside loop (like it was before), but set it
   to 5. So watchdog is active until tx ring is empty and we will not get
   watchdog event while tx ring consumer index is moving.
   
What do you think?

-- 
Oleg.

================================================================
=== Oleg Bulyzhin -- OBUL-RIPN -- OBUL-RIPE -- oleg@rinet.ru ===
================================================================




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061223215918.GA33627>