From: Attila Nagy
Date: Wed, 03 Feb 2010 20:43:38 +0100
To: Bob Friesenhahn
Cc: freebsd-fs@freebsd.org
Subject: Re: Machine stops for some seconds with ZFS
Message-ID: <4B69D1EA.7020209@fsn.hu>

Bob Friesenhahn wrote:
> Your previous description made it sound like the fan speed changes
> were quite abrupt and dramatic. If this was so, then it could
> indicate a software or hardware problem related to power management.
> For example, some needed hardware might be temporarily shut down.

It seems this misled you (and maybe others); perhaps I shouldn't have put that in.
I mentioned the fans spinning up and stopping because I know my machine otherwise works fine with FreeBSD, and I know how fast the fan spins when the machine is idle and how fast it spins at 100% CPU. Compared to that, during those blackouts, when there was no activity, the fan stopped. To you this seems to mean a hardware failure; to me it's a clear indicator that even the small load which normally keeps the fan running disappeared for that time.

>> Yes, of course this is ZFS on FreeBSD, I could write that into the
>> subject, but if this wasn't on FreeBSD, I would wrote to the
>> OpenSolaris list...
>
> My point is that you are assuming that the issue is with ZFS since
> 12-days after switching to it, you encountered a problem. The problem
> may very well be something else such as a hardware problem, a device
> driver problem, or something related to power management.

I'm not assuming, I know. I have used ZFS on FreeBSD since it hit the tree, so I've seen a lot of odd things. I see this on a number of other machines; the difference is that those are netbooted, so not everything is on ZFS, and I can still do things while this happens and can work on UFS as normal. I wrote about this quite a while ago, and others were seeing the same issue (and it seems it's still with us; as you can see, I'm not the only one noticing similar problems).

I would like to help get this sorted out, but I'm not sure how. During the freeze (on the servers, not my desktop), even an NMI doesn't help: it prints the "NMI ... going to debugger" line once for each CPU core the machine has, and then nothing happens. Only a hard reset solves the issue. On those machines we serve a moderate amount of NFS (30-50 Mbps), so I thought this was related to it, but I ran into the same thing on my desktop, which of course doesn't do NFS serving.

So it's not 12 days, but about two years (or whenever the ZFS code was imported into -CURRENT).
With heavy NFS IO I can reproduce this somewhat, but I'm not sure if anyone has the time to look into the issue. As always, remote access is granted to developers, if that helps.