Date:      Wed, 03 Feb 2010 20:43:38 +0100
From:      Attila Nagy <bra@fsn.hu>
To:        Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Machine stops for some seconds with ZFS
Message-ID:  <4B69D1EA.7020209@fsn.hu>
In-Reply-To: <alpine.GSO.2.01.1002031241270.17824@freddy.simplesystems.org>
References:  <4B694689.2030704@fsn.hu> <alpine.GSO.2.01.1002030935150.17824@freddy.simplesystems.org> <4B69BD8E.5020501@fsn.hu> <alpine.GSO.2.01.1002031241270.17824@freddy.simplesystems.org>

Bob Friesenhahn wrote:
> Your previous description made it sound like the fan speed changes 
> were quite abrupt and dramatic.  If this was so, then it could 
> indicate a software or hardware problem related to power management. 
> For example, some needed hardware might be temporarily shut down.
It seems this misled you (and maybe others); maybe I shouldn't have put 
that in. I mentioned the fan spinning up and stopping because my machine 
otherwise works OK with FreeBSD, and I know how fast the fan spins when 
the machine is idle and how fast when I use 100% CPU.
Compared to that, during those blackouts, when there was no activity, 
the fan stopped.
To you this seems to mean a hardware failure; to me it's a clear 
indicator that even the small load which normally keeps the fan running 
disappeared for that time.

>> Yes, of course this is ZFS on FreeBSD, I could write that into the 
>> subject, but if this wasn't on FreeBSD, I would wrote to the 
>> OpenSolaris list...
>
> My point is that you are assuming that the issue is with ZFS since 
> 12-days after switching to it, you encountered a problem.  The problem 
> may very well be something else such as a hardware problem, a device 
> driver problem, or something related to power management.
I'm not assuming, I know. I have been using ZFS on FreeBSD since it hit 
the tree, so I've seen a lot of odd things.
I see this on a number of other machines; the difference is that those 
are netbooted, and hence not everything is on ZFS, so I can do things 
while this happens and can also work on UFS as normal.

I already wrote about this quite a while ago, and others were seeing 
the same issue (and it seems it's still with us; as you can see, I'm not 
the only one noticing similar problems).

I would like to help get this sorted out, but I'm not sure how.
During the freeze (on the servers, not my desktop), even an NMI doesn't 
help (it prints an "NMI ... going to debugger" line once for each CPU 
core the machine has and then nothing happens; only a hard reset solves 
the issue).
On those machines we serve a moderate amount of NFS (30-50 Mbps), so I 
thought this was related to it, but I ran into the same thing on my 
desktop, which of course doesn't do NFS serving.
So it's not 12 days, but about two years (or whenever the ZFS code was 
imported into -CURRENT).

With heavy NFS I/O I can reproduce this somewhat, but I'm not sure 
whether anyone has the time to look into the issue.
As always, remote access can be granted to developers, if that helps.



