From: Markus Gebert
To: freebsd-stable
Date: Sat, 17 Jul 2010 20:35:21 +0200
Subject: Re: 8.1-RC2 MCE caused by some LAPIC/clock changes? (was: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?)

On 13.07.2010, at 16:02, Markus Gebert wrote:

> Unfortunately, I have not been able to get anything useful out of the svn commit logs, which could explain this.
> Maybe someone else has an idea what could have changed between 7 and 8 to break it, and again between 8 and CURRENT to magically fix it again.

I tracked this down further. I couldn't easily downgrade my 8.1 installation to see when the problem was introduced, because the zpool version in use is 14. So I tried to figure out when the problem was solved in CURRENT instead.

I started with the first revision that can boot off my v14 pool (r201143, Dec 28, the zfs v14 commit). With this revision, I was able to trigger the MCE.

Then I took a later revision (r206010, Apr 1, chosen randomly), and I couldn't reproduce the problem. I narrowed the revisions down until I found that on r202386 I'm still able to trigger the MCE, while r202387 seems to solve the problem on CURRENT:

http://svn.freebsd.org/viewvc/base?view=revision&revision=202387

Since John Baldwin mentioned this problem could be timing related, it seems reasonable that a clock-related change could fix it. But this commit seems to have been MFC'd to 8-STABLE and 8.1 (at least as far as I can tell), along with some other changes to amd64-specific code. I thought that maybe these other MFC'd changes could have reintroduced the problem later on, but so far I could not reproduce the problem with newer CURRENT revisions. So I actually nailed this one down to a single commit on CURRENT, but still cannot tell what the actual difference is compared to 8-STABLE/8.1.

Any ideas how to proceed?

Markus
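
The narrowing between r201143 (MCE reproducible) and r206010 (not reproducible) amounts to a bisection over revision numbers. A minimal sketch in Python follows, in case it helps anyone repeating the search. The function name and the `reproduces` callback are hypothetical: in practice the callback would mean checking out that revision, building and booting it, and trying to trigger the MCE by hand. The sketch also assumes there is a single switch from "broken" to "fixed" inside the range.

```python
def first_fixed_revision(bad, good, reproduces):
    """Return the lowest revision at which the problem no longer reproduces.

    bad        -- a revision known to show the problem
    good       -- a later revision known not to show it
    reproduces -- hypothetical callback: reproduces(rev) is True if the
                  problem can be triggered on a kernel built from rev
    """
    if bad >= good:
        raise ValueError("bad must be an earlier revision than good")
    # Invariant: the problem reproduces at `bad` and does not at `good`.
    while good - bad > 1:
        mid = (bad + good) // 2
        if reproduces(mid):
            bad = mid    # still broken, so the fix came later
        else:
            good = mid   # already fixed, so the fix is at mid or earlier
    return good


if __name__ == "__main__":
    # Stand-in for real testing: pretend every revision up to and
    # including r202386 still triggers the MCE.
    def simulated(rev):
        return rev <= 202386

    print(first_fixed_revision(201143, 206010, simulated))
```

Each step halves the remaining range, so the number of kernels to build and test grows only with the logarithm of the range size.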