From: Terry Lambert
To: Shizuka Kudo
Cc: freebsd-current@freebsd.org
Date: Mon, 28 Jan 2002 01:57:30 -0800
Subject: Re: No buffer space available
Message-ID: <3C55208A.7FAF3C87@mindspring.com>
References: <20020128034716.90219.qmail@web11408.mail.yahoo.com>

Shizuka Kudo wrote:
> Thanks for your response. I wonder if I misunderstand
> your advice. When looking at the if_rl.c (dated Dec
> 14), there's already a timer attached to
> ifp->if_watchdog. Is this the timer you referred to?
> If so, it looks like this timer never called by the
> driver in my case as I never saw "watchdog timeout"
> error.
>
> Any advice?

It's not clear to me that the watchdog timer has been
initialized at the time of the problem.  The rl_txeof()
function zeros it.  Are you sure you are not getting *one*
watchdog reset?

One thing you might try is to put:

    rl_reset(sc);

between the rl_rxeof() call and the rl_init() call in the
rl_watchdog() code.  I'm not positive this is the right place
for it: perhaps it (and the rl_init()?) should be before the
rl_txeof() call.
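A small user-space model may make it clearer why the watchdog can stay silent. It assumes the usual BSD scheme: the transmit path arms ifp->if_timer, a once-per-second if_slowtimo() pass counts it down and calls if_watchdog when it reaches zero, and rl_txeof() zeros it. Every name below (mock_softc, mock_start() and so on) is hypothetical, and the 5-second arming value is an assumption; the watchdog body follows the txeof, rxeof, init order, with the suggested reset inserted before init.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical user-space stand-in for the driver state; the real
 * struct rl_softc and struct ifnet are far larger. */
struct mock_softc {
    int  if_timer;      /* like ifp->if_timer: seconds left, 0 = unarmed */
    char calls[128];    /* records the order of driver calls */
};

void log_call(struct mock_softc *sc, const char *name)
{
    strcat(sc->calls, name);
    strcat(sc->calls, " ");
}

/* Transmit start arms the watchdog (5 s is an assumed value). */
void mock_start(struct mock_softc *sc) { sc->if_timer = 5; }

/* Transmit completion reaps frames and disarms the watchdog. */
void mock_txeof(struct mock_softc *sc)
{
    log_call(sc, "txeof");
    sc->if_timer = 0;
}

void mock_rxeof(struct mock_softc *sc) { log_call(sc, "rxeof"); }
void mock_reset(struct mock_softc *sc) { log_call(sc, "reset"); }
void mock_init(struct mock_softc *sc)  { log_call(sc, "init"); }

/* Watchdog body with the suggested reset between rxeof and init. */
void mock_watchdog(struct mock_softc *sc)
{
    log_call(sc, "watchdog");
    mock_txeof(sc);
    mock_rxeof(sc);
    mock_reset(sc);     /* the proposed rl_reset(sc) */
    mock_init(sc);
}

/* Once-per-second tick, modelled on the BSD if_slowtimo() countdown:
 * an unarmed timer is skipped; the watchdog fires on the 1 -> 0 step. */
void mock_slowtimo(struct mock_softc *sc)
{
    if (sc->if_timer == 0 || --sc->if_timer != 0)
        return;
    mock_watchdog(sc);
}
```

Calling mock_slowtimo() six times after mock_start() fires the watchdog exactly once, on the fifth tick, and leaves the timer disarmed. With no transmit pending the timer is never armed, so a wedged receive side alone never reaches the watchdog at all.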
It is noticeable that rl_reset() is called before rl_init()
everywhere else in the error recovery paths, but not here; and
when you down and re-up the interface, that's what happens as
well.

It looks like the watchdog doesn't cover the case where the
receive interrupt is lost; it's specific to the transmit
interrupt.  This won't help with incoming connections initiated
by the remote side (the initial SYN of the three-way handshake)
if the thing is wedged at the time, but...

One possible workaround that would let the transmit path fix the
receive path, in case the receive interrupt was lost, would be to
call rl_rxeof(sc) as the first thing in the rl_txeof() routine.
That way, a lost interrupt would be recovered when your ping
packet went out, by reaping the receivable data without an
interrupt at all (basically, it turns the driver into a "poll on
transmit" model, which is a really bad model, since it fails in
the case I noted, but what the hack.  8-)).

If the problem is a race window in the receive interrupt for the
flag getting set (bad hardware, bad flag checks in the driver,
etc.), one possible workaround would be to call rl_rxeof()
unconditionally in the interrupt, even if you *think* the
interrupt is not for the rl device (e.g. perhaps the interrupt is
sent before the RL_INTRS flag is set in the status word, or
perhaps the reading of the status word is prone to failure).

The way to handle this is to change the for(;;) loop in the
rl_intr() function; specifically, move the

    if ((status & RL_INTRS) == 0)
        break;

to the *end* of the loop, after the status checks.  You may also
want to *unconditionally* call rl_rxeof(), instead of doing the
call conditionally, just to be sure (do this only if nothing else
fixes the problem for you).

What's the net effect of this?  It would slow down any device
that shares a PCI interrupt with the if_rl card(s) in your
system, since every interrupt on that line, whatever its source,
would now pay for an extra pass through rl_rxeof().  This is why
it's not done by default.
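To compare the two receive-side workarounds, here is a user-space model. The bit values, the struct and every mock_* name are hypothetical (the real RL_ISR_* and RL_INTRS definitions live in the driver's register header). mock_txeof() applies the "poll on transmit" change by reaping RX first; mock_intr_original() breaks out before doing any work, while mock_intr_tolerant() moves the RL_INTRS test to the end of the loop and calls the RX reaper unconditionally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define MOCK_ISR_RX_OK 0x0001u
#define MOCK_ISR_TX_OK 0x0004u
#define MOCK_INTRS     (MOCK_ISR_RX_OK | MOCK_ISR_TX_OK)

struct mock_nic {
    int rx_waiting;           /* frames sitting in the RX ring */
    int rx_reaped;            /* frames passed up the stack */
    int tx_done;              /* TX completion passes */
    const uint16_t *isr_seq;  /* values successive ISR reads return */
    int isr_pos;
};

void mock_rxeof(struct mock_nic *sc)
{
    sc->rx_reaped += sc->rx_waiting;
    sc->rx_waiting = 0;
}

/* "Poll on transmit": reap RX before handling TX completion. */
void mock_txeof(struct mock_nic *sc)
{
    mock_rxeof(sc);
    sc->tx_done++;
}

uint16_t mock_read_isr(struct mock_nic *sc)
{
    return sc->isr_seq[sc->isr_pos++];
}

/* Original shape: bail out first if no known status bits are set. */
void mock_intr_original(struct mock_nic *sc)
{
    for (;;) {
        uint16_t status = mock_read_isr(sc);
        if ((status & MOCK_INTRS) == 0)
            break;
        if (status & MOCK_ISR_RX_OK)
            mock_rxeof(sc);
        if (status & MOCK_ISR_TX_OK)
            mock_txeof(sc);
    }
}

/* Tolerant shape: the RL_INTRS-style test moved to the end, and RX
 * reaped unconditionally, so the body runs at least once. */
void mock_intr_tolerant(struct mock_nic *sc)
{
    for (;;) {
        uint16_t status = mock_read_isr(sc);
        mock_rxeof(sc);
        if (status & MOCK_ISR_TX_OK)
            mock_txeof(sc);
        if ((status & MOCK_INTRS) == 0)
            break;
    }
}
```

With an ISR read that returns 0 and two frames waiting, mock_intr_original() reaps nothing while mock_intr_tolerant() reaps both. With a TX-only status, the polling mock_txeof() still picks up frames whose RX bit was never seen. The price is the one noted above: every pass through the handler, including one triggered by another device on a shared line, now walks the RX ring.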
Another possible approach is a *long* watchdog: a second
watchdog timer.  Basically, this timer would fire and call the
rl_intr() function on the interface, as if there had been a
hardware interrupt.  You would not want to do this more than once
a second.  The tricky part here is that you will need a wrapper
function to raise and lower the SPL around the call (I'm actually
curious why the current watchdog timer can get away with not
raising the SPL to splbio from splnet, but I suppose it's so
incredibly rare that it's not a practical problem).

If none of this works, you might consider labelling the hardware
as "broken" and swapping out the rl interface.  That probably
means swapping out the motherboard for you, but rl 10/100 cards
are US$9 at Fry's these days, so it would be a cheap experiment
if you wanted to try it that way instead.

-- 
Terry
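The second, "long" watchdog suggested in this message can be modelled in user space as well. In the kernel it would presumably be a callout that reschedules itself every second (for example via timeout(9)), with the wrapper using the spl*() calls; here mock_splraise(), mock_splx(), MOCK_SPLNET and the caller-supplied time in seconds are hypothetical stand-ins, chosen only so the once-a-second limit and the raise/restore pairing can be checked.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's spl*() calls: here they
 * only track a numeric priority level instead of masking interrupts. */
static int cur_spl = 0;
#define MOCK_SPLNET 3   /* assumed level number, illustration only */

int mock_splraise(int level)
{
    int old = cur_spl;
    if (level > cur_spl)
        cur_spl = level;
    return old;         /* caller restores this with mock_splx() */
}

void mock_splx(int s) { cur_spl = s; }

struct mock_nic {
    long last_poll;     /* time (s) of the last forced interrupt */
    int  polls;         /* how many times the handler was forced */
    int  spl_seen;      /* priority level observed inside the handler */
};

/* Stand-in for rl_intr(): records the priority it ran at. */
void mock_intr(void *arg)
{
    struct mock_nic *sc = arg;
    sc->polls++;
    sc->spl_seen = cur_spl;
}

/* Long watchdog: fakes a hardware interrupt at most once a second,
 * wrapping the call in a raise/restore of the priority level. */
void mock_long_watchdog(struct mock_nic *sc, long now)
{
    int s;

    if (now - sc->last_poll < 1)
        return;                     /* rate limit: at most 1 Hz */
    sc->last_poll = now;
    s = mock_splraise(MOCK_SPLNET); /* keep the real handler out */
    mock_intr(sc);
    mock_splx(s);                   /* restore the caller's level */
}
```

Starting from last_poll = -1, calls at times 0, 0 and 1 run the handler twice (the repeat within the same second is skipped), each time at the raised level, and leave the level back at 0 afterwards.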