From owner-freebsd-current  Mon Jan 28  1:57:47 2002
Delivered-To: freebsd-current@freebsd.org
Received: from harrier.prod.itd.earthlink.net (harrier.mail.pas.earthlink.net [207.217.120.12])
	by hub.freebsd.org (Postfix) with ESMTP id 7BE3037B402
	for <freebsd-current@freebsd.org>; Mon, 28 Jan 2002 01:57:38 -0800 (PST)
Received: from pool0101.cvx22-bradley.dialup.earthlink.net ([209.179.198.101] helo=mindspring.com)
	by harrier.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
	id 16V8Xg-0003xJ-00; Mon, 28 Jan 2002 01:57:37 -0800
Message-ID: <3C55208A.7FAF3C87@mindspring.com>
Date: Mon, 28 Jan 2002 01:57:30 -0800
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Shizuka Kudo <shizukakudo_99@yahoo.com>
Cc: freebsd-current@freebsd.org
Subject: Re: No buffer space available
References: <20020128034716.90219.qmail@web11408.mail.yahoo.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-current.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-current>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-current>
X-Loop: FreeBSD.ORG

Shizuka Kudo wrote:
> Thanks for your response. I wonder if I misunderstand
> your advice. When looking at the if_rl.c (dated Dec
> 14), there's already a timer attached to
> ifp->if_watchdog. Is this the timer you referred to?
> If so, it looks like this timer never called by the
> driver in my case as I never saw "watchdog timeout"
> error.
> 
> Any advice?

It's not clear to me that the watchdog timer has been
initialized at the time of the problem.

The rl_txeof() function zeros it.

Are you sure you are not getting *one* watchdog reset?

One thing you might try is to put:

	rl_reset(sc);

between the rl_rxeof() call and the rl_init() call in the
rl_watchdog() code.  I'm not positive this is the right
place for it: perhaps it -- and the rl_init()? -- should
be before the rl_txeof() call.  It is noticeable that the
rl_reset() function is used everywhere else before the
rl_init() in the error recovery case, but not here; it is
also what happens when you down and re-up the interface.
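
As a rough user-space sketch of that ordering (not the real
driver code): the struct and the stubs below are hypothetical
stand-ins that only log which routine ran, so the call order
can be checked without hardware.

```c
#include <string.h>

/* Hypothetical stand-in for the driver softc; the real one differs. */
struct mock_sc {
	char trace[64];	/* order of calls, for illustration only */
};

static void record(struct mock_sc *sc, const char *name)
{
	strncat(sc->trace, name, sizeof(sc->trace) - strlen(sc->trace) - 1);
}

/* Stubs: the real routines touch the chip; these only log. */
static void rl_txeof(struct mock_sc *sc) { record(sc, "txeof "); }
static void rl_rxeof(struct mock_sc *sc) { record(sc, "rxeof "); }
static void rl_reset(struct mock_sc *sc) { record(sc, "reset "); }
static void rl_init(struct mock_sc *sc)  { record(sc, "init "); }

/* Watchdog recovery path with the suggested reset added. */
static void rl_watchdog_sketch(struct mock_sc *sc)
{
	rl_txeof(sc);
	rl_rxeof(sc);
	rl_reset(sc);	/* the suggested addition */
	rl_init(sc);
}
```

Whether the reset belongs here or ahead of the rl_txeof()
call is still an open question; the sketch only shows the
first placement.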


It looks like the watchdog doesn't cover the case where
the receive interrupt is lost; it's specific to the
transmit interrupt.

This won't help with incoming connections initiated by
a remote side (the initial SYN of the three way handshake)
if the thing is wedged at the time, but...

One possible workaround that would cause the transmit to
fix the receive in case the receive interrupt was lost
would be to call rl_rxeof(sc) as the first thing in the
rl_txeof() routine.  That way, a lost interrupt would be
recovered when your ping packet went out by reaping the
receivable data without an interrupt at all (basically,
it makes it into a "poll on transmit" model, which is a
really bad model, since it fails in the case I noted,
but what the hack.  8-)).
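
A minimal sketch of the "poll on transmit" idea, assuming a
mock softc with a counter for frames waiting in the receive
ring.  The struct, its fields, and the stub bodies are
hypothetical; only the call order reflects the suggestion.

```c
/* Hypothetical mock; the real softc and ring handling differ. */
struct mock_sc {
	int rx_pending;		/* frames sitting unreaped in the ring */
	int rx_delivered;	/* frames handed up the stack */
	int tx_done;		/* transmit completions processed */
};

/* Stub receive routine: drains whatever is pending. */
static void rl_rxeof(struct mock_sc *sc)
{
	while (sc->rx_pending > 0) {
		sc->rx_pending--;
		sc->rx_delivered++;
	}
}

/* Transmit completion that reaps receive data first. */
static void rl_txeof_sketch(struct mock_sc *sc)
{
	rl_rxeof(sc);	/* recover from a lost receive interrupt */
	sc->tx_done++;	/* stands in for the real transmit cleanup */
}
```

Nothing is reaped until something is transmitted, which is
exactly why this fails for a wedged interface that is only
waiting on an incoming SYN.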

If the problem is a race window in the receive interrupt
for the flag getting set (bad hardware, bad flag checks
in the driver, etc.), one possible workaround would be to
call the rl_rxeof() unconditionally in the interrupt, even
if you *think* the interrupt is not for the rl device (i.e.
perhaps the interrupt is sent before the RL_INTRS flag is
set in the status word, or perhaps the reading of the
status word is prone to failure).

The way to handle this is to change the for(;;)
loop in the rl_intr() function;  specifically, move the

	if((status & RL_INTRS) == 0)
		break;

to the *end* of the loop, after the other checks.  You may also
want to *unconditionally* call rl_rxeof(), instead of
doing the call conditionally, just to be sure (do this
only if nothing else fixes the problem for you).
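
Roughly, the reshaped loop might look like the sketch below.
It runs in user space against a mock status register; the
struct, read_and_ack_status(), and the bit values are
assumptions for illustration, not the driver's definitions.

```c
#include <stdint.h>

/* Illustrative bit values; check the driver header for the real ones. */
#define RL_ISR_RX_OK	0x0001
#define RL_ISR_TX_OK	0x0004
#define RL_INTRS	(RL_ISR_RX_OK | RL_ISR_TX_OK)

/* Hypothetical mock of the device state. */
struct mock_sc {
	uint16_t isr;	/* pretend interrupt status register */
	int rx_pending;	/* frames waiting although no flag is set */
	int rx_calls;
	int tx_calls;
};

/* Read the status word and acknowledge (clear) it. */
static uint16_t read_and_ack_status(struct mock_sc *sc)
{
	uint16_t status = sc->isr;

	sc->isr = 0;
	return status;
}

static void rl_rxeof(struct mock_sc *sc) { sc->rx_calls++; sc->rx_pending = 0; }
static void rl_txeof(struct mock_sc *sc) { sc->tx_calls++; }

static void rl_intr_sketch(struct mock_sc *sc)
{
	uint16_t status;

	for (;;) {
		status = read_and_ack_status(sc);
		rl_rxeof(sc);			/* unconditional */
		if (status & RL_ISR_TX_OK)
			rl_txeof(sc);
		if ((status & RL_INTRS) == 0)	/* moved to the end */
			break;
	}
}
```

With the test at the bottom, the body runs at least once per
interrupt, so pending receive data gets reaped even when the
status word claims the interrupt was not for this device.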

What's the net effect of this?

The overall effect of doing this would be to slow down any
device that shared a PCI interrupt with the if_rl card(s)
in your system.  This is why it's not done by default.


Another possible approach is a *long* watchdog -- a second
watchdog timer.  Basically, this timer would fire and call
the rl_intr() function on the interface, as if there had
been a hardware interrupt.  You would not want to do this
more than once a second.  The tricky part here is that you
will need a wrapper function to raise and lower the SPL
over the call (I'm actually curious why the current
watchdog timer can get away with not raising the SPL to
splbio from splnet, but I suppose it's so incredibly rare
it's not a practical problem).
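
A sketch of what such a wrapper might look like, simulated in
user space.  spl_raise(), spl_restore(), SPL_MOCK_HIGH, HZ,
and the struct are all hypothetical stand-ins; in the kernel
you would use the appropriate spl*() call with splx(), and
presumably rearm the timer with timeout(9).

```c
#define HZ		100	/* assumed ticks per second */
#define SPL_MOCK_HIGH	5	/* made-up "raised" priority level */

static int cur_spl = 0;		/* mock of the current priority level */

/* Raise the mock level, returning the old one for later restore. */
static int spl_raise(int level)
{
	int old = cur_spl;

	if (level > cur_spl)
		cur_spl = level;
	return old;
}

static void spl_restore(int old) { cur_spl = old; }

/* Hypothetical mock softc. */
struct mock_sc {
	int intr_calls;		/* times the handler was invoked */
	int spl_seen;		/* level observed inside the handler */
	int next_timeout_ticks;	/* when the timer would fire again */
};

/* Stub interrupt handler: records that it ran and at what level. */
static void rl_intr(void *arg)
{
	struct mock_sc *sc = arg;

	sc->intr_calls++;
	sc->spl_seen = cur_spl;
}

/* Long watchdog: fake a hardware interrupt at raised priority. */
static void rl_long_watchdog(void *arg)
{
	struct mock_sc *sc = arg;
	int s;

	s = spl_raise(SPL_MOCK_HIGH);
	rl_intr(sc);
	spl_restore(s);
	sc->next_timeout_ticks = HZ;	/* no more than once a second */
}
```

The handler runs with the level raised and the old level is
restored afterwards, which is the whole point of the wrapper.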

If none of this works, you might consider labelling the
hardware as "broken", and swapping out the rl interface
(probably means swapping out the motherboard for you,
but rl 10/100 cards are US$9 at Frys, these days, so it
would be a cheap experiment, if you wanted to try it
that way, instead).

-- Terry
