Date: Fri, 20 Oct 2006 19:40:50 -0700
From: "Jack Vogel" <jfvogel@gmail.com>
To: "Bill Paul" <wpaul@freebsd.org>
Cc: freebsd-stable@freebsd.org, kris@obsecurity.org
Subject: Re: em network issues
Message-ID: <2a41acea0610201940g5718e12avf9bfa61bb38e777d@mail.gmail.com>
In-Reply-To: <20061020234636.1BD5216A40F@hub.freebsd.org>
References: <2a41acea0610201452v22f2bae9mcc0e71d2157d8bbb@mail.gmail.com> <20061020234636.1BD5216A40F@hub.freebsd.org>

On 10/20/06, Bill Paul <wpaul@freebsd.org> wrote:
>
> [...]
>
> > > Another thing that might be handy is improving the watchdog timeout
> > > message so that it dumps the state of the ICR and ICM registers (and
> > > maybe some other interesting driver and/or device state). The timeout
> > > implies no interrupts were delivered for a Long Time (tm). If the
> > > ICM register indicates interrupts have been masked, then that means
> > > em_intr_fast() was triggered by an interrupt and it scheduled work,
> > > but that work never executed. If that really is what happened, then
> > > I can understand the watchdog error occurring. If that's _not_ what
> > > happened, then something else is screwed up.
> >
> > Jesse Brandeburg just did an interesting hack for the Linux driver, I
> > was considering trying to code an equivalent thing up for us. We
> > have evidence that on some AMD based systems there are writebacks
> > that get lost, since the TX cleanup relies on the DD being set you
> > are hosed when this happens. What he did was make a cleanup
> > routine that ONLY uses the head and tail pointers and NOT the done
> > bit. Then, in the watchdog routine, if there is evidence of this problem
> > it will switch the cleanup function pointer to this alternate clean code.
>
> Oho, I didn't realize the 8254x had producer/consumer indexes like this.
> Hm. But the documentation for the Transmit Descriptor Head register
> says:
>
> "Reading the transmit descriptor head to determine which buffers
> have been used (and can be returned to the memory pool) is not reliable."
>
> There's a similar notation for the Receive Descriptor Head register.
>
> I wonder what's unreliable about it.
>
> > At least one user that was having a problem has reported this solved
> > it. It may be one of the issues hitting us as well.
>
> Switching from testing the descriptor completion bits to using the
> consumer indexes should be pretty straightforward. It's worth a shot
> at any rate.
>

I have not yet looked at Jesse's code to see if he does anything fancy,
but there is one other driver that I know of on our hardware (and no,
it's not for that so-called OS from Redmond) that has always done this,
so it must not be THAT unreliable. It just isn't using the full
capability of the hardware, but if it works.... :)

Jesse's code is supposed to be on our driver site on sourceforge, I just
have been too busy to go look for it, but it's public.

BTW, I got a Smartbits unit in my cubicle today, got software installed
and hardware almost there, not quite done yet. It sure can pump LOTS of
packets though :) Will report results as I get them.

Jack
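
For anyone following along, here is a rough sketch of the idea discussed above: a normal TX cleanup that trusts the DD bit, an alternate one that only compares the driver's clean index against the hardware head, and a watchdog that swaps the function pointer when it sees evidence of a lost writeback. This is NOT Jesse's code and not the em(4) driver; it is a userland simulation, and every name in it (struct tx_ring, hw_head, clean_by_dd, clean_by_head, watchdog, tx_clean, RING_SIZE) is made up for illustration. The hw_head field stands in for a read of the Transmit Descriptor Head register, which the documentation quoted above says is not reliable for this purpose, so treat it as a fallback only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE   8      /* hypothetical ring size for the sketch */
#define TXD_STAT_DD 0x01   /* descriptor-done bit in the status byte */

struct tx_desc {
    uint8_t status;        /* hardware writes DD here on completion */
};

struct tx_ring {
    struct tx_desc desc[RING_SIZE];
    uint32_t next_to_clean;          /* oldest slot not yet reclaimed */
    uint32_t tail;                   /* producer index written by the driver */
    uint32_t hw_head;                /* stand-in for reading the head register */
    int (*clean)(struct tx_ring *);  /* currently selected cleanup routine */
};

/* Normal path: reclaim a slot only once its DD bit has been written back. */
static int clean_by_dd(struct tx_ring *r)
{
    int n = 0;

    while (r->next_to_clean != r->tail &&
        (r->desc[r->next_to_clean].status & TXD_STAT_DD)) {
        r->desc[r->next_to_clean].status = 0;
        r->next_to_clean = (r->next_to_clean + 1) % RING_SIZE;
        n++;
    }
    return n;
}

/* Alternate path: ignore DD and reclaim everything behind the hardware head. */
static int clean_by_head(struct tx_ring *r)
{
    int n = 0;

    while (r->next_to_clean != r->hw_head) {
        r->desc[r->next_to_clean].status = 0;
        r->next_to_clean = (r->next_to_clean + 1) % RING_SIZE;
        n++;
    }
    return n;
}

/*
 * Watchdog check (assumed heuristic): the head has moved past the oldest
 * outstanding descriptor, yet that descriptor never got DD set. Treat that
 * as a lost writeback and switch to the head-based cleanup.
 */
static void watchdog(struct tx_ring *r)
{
    if (r->next_to_clean != r->hw_head &&
        !(r->desc[r->next_to_clean].status & TXD_STAT_DD))
        r->clean = clean_by_head;
}

/* Single entry point, so callers never care which routine is active. */
static int tx_clean(struct tx_ring *r)
{
    assert(r->clean != NULL);
    return r->clean(r);
}
```

The point of going through the function pointer is that the interrupt path stays unchanged: it always calls tx_clean(), and only the watchdog decides when the DD bit can no longer be trusted.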