Date:      Mon, 5 Oct 2009 22:09:00 +0200
From:      Greg Byshenk <freebsd@byshenk.net>
To:        Daniel Bond <db@danielbond.org>
Cc:        FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Re: em0 watchdog timeouts
Message-ID:  <20091005200900.GE15606@core.byshenk.net>
In-Reply-To: <57F8F331-E823-4F88-BDD5-A8B95A3B4CB6@danielbond.org>
References:  <2a41acea0909301556g1df7dbafv813f5924553c8bfb@mail.gmail.com> <4AC5198E.7030609@monkeybrains.net> <4AC51B4C.7080905@monkeybrains.net> <2a41acea0910011450v41590f3dn112f367f26faed2d@mail.gmail.com> <4AC64835.3060107@monkeybrains.net> <2a41acea0910021237w415efa2cs4354a0f99aef8f6@mail.gmail.com> <4AC66437.4040704@monkeybrains.net> <6194E9BC-3A3D-4941-A777-88C7411905B0@danielbond.org> <2a41acea0910050957x2d085e90w2ebea7f9eb87c3e4@mail.gmail.com> <57F8F331-E823-4F88-BDD5-A8B95A3B4CB6@danielbond.org>

On Mon, Oct 05, 2009 at 08:32:14PM +0200, Daniel Bond wrote:
 
> What I need is useful advice/help. I never stated I needed a driver  
> developer.
> 
> I'd like to be able to run my favorite OS on cool hardware, in the  
> future, for a high-performing NFS-server, without problems like I've  
> experienced over the past 6 months on a production system.
> Please note that I'm managing a server-park almost completely based on  
> FreeBSD, and I'm running many NFS servers on other hardware, for other  
> services, without issues.
> 
> I've seen several other FreeBSD-users having problems with this too,  
> so I think it's of importance for the project. As I mentioned  
> originally, I'm happy to make the hardware available to any FreeBSD developer
> that might want to look further into this. Debugging it further is  
> above my skill-set, I don't even know where to begin looking,  
> especially since I can't produce any panics.

I can give one bit of advice that helped me in a similar situation:
check your motherboards.

I run about a dozen fileservers on FreeBSD, and have always been very
happy with their performance, but some months ago I began to experience
problems with one of them: 'watchdog timeout' errors.  I tried all
manner of things (different NICs of different types, changing settings,
etc.), but nothing helped over the long term.  Whenever very heavy I/O
was going on to our Beowulf cluster, the 'watchdog timeouts' would
begin.  What was strange is that other (supposedly identical) machines
handled _more_ I/O without a problem.

Finally, while doing some comparisons, I realized that the motherboard
having the problem was _not_ the same as the others; it was similar, but
not identical.  I changed the motherboard and all the problems went away,
never to reappear.

I don't know if it was a specific problem with that particular
motherboard, or something about that model, but for whatever reason, it
appears that the buses just couldn't handle a RAID card and three active
NICs.


-- 
greg byshenk  -  gbyshenk@byshenk.net  -  Leiden, NL