From: John-Mark Gurney
To: Chris Ross
Cc: freebsd-sparc64@freebsd.org
Date: Sun, 28 Sep 2014 21:22:49 -0700
Subject: Re: FreeBSD 10-STABLE/sparc64 panic
Message-ID: <20140929042249.GK43300@funkthat.com>
In-Reply-To: <456226AE-0712-4510-AEF5-2053F36F2181@distal.com>
List-Id: Porting FreeBSD to the Sparc

Chris Ross wrote this message on Mon, Sep 29, 2014 at 00:00 -0400:
> On Jun 30, 2014, at 10:40 , Chris Ross wrote:
> > tl;dr : I've finished my testing and have a result, but see other things I
> > don't understand. Could use more help.
>
> Old thread, problem still exists. Noticed in head around:
>
> http://lists.freebsd.org/pipermail/freebsd-sparc64/2014-March/009261.html
>
> And in stable/10 as of revision 263676 (likely earlier). As numerous people
> have tried, I have also tried, to narrow it down to a commit, or small number
> of commits, but the failure is sporadic. I think looking at the current code which
> is still failing may be most useful.
>
> I am right now seeing this on stable/10 code updated today, 10.1-BETA3,
> r272264. As noted earlier in these threads, I am running a Sun Fire v240. At
> least one or two other folks with v240's have seen this, and I think a variant
> of SunBlade that also has bge's on it.
>
> Multiuser boot panics at:
>
> Setting hostname: hostname.distal.com.
> bge0: link state changed to DOWN
> spin lock 0xc0c95330 (smp rendezvous) held by 0xfffff8000560a490 (tid 100347) too long
> timeout stopping cpus
> panic: spin lock held too long
> cpuid = 1
> KDB: stack backtrace:
> #0 0xc054a0d0 at _mtx_lock_spin_failed+0x50
> #1 0xc054a198 at _mtx_lock_spin_cookie+0xb8
> #2 0xc08b989c at tick_get_timecount_mp+0xdc
> #3 0xc056c33c at binuptime+0x3c
> #4 0xc08857ac at timercb+0x6c
> #5 0xc08b9c00 at tick_intr+0x220
> Uptime: 20s
> Automatic reboot in 15 seconds - press a key on the console to abort
>
> In past kernels, ones more recent than March 2014, it will sometimes
> boot [to multiuser] the first try, but usually will crash a few times, but
> eventually come all the way up. Given 30-40 minutes, it will usually
> recover to multiuser, and is stable forever (in past testing) at that point.
> This evening, it was rebooting for about 40 minutes (11 panic and
> reboot sequences), but then came up.
>
> I would be happy to dig into this further, but will need some advice and
> instruction. I fear I may not even have built the kernel with full debugging,
> but can do so. I'll look into that now that the machine is up again.
>
> Please let me know what I can do to help. Thanks.

If you could get a core dump (call doadump) that'd be good, but dumping
the stack of the tid that held the spinlock too long would be a good
start..

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
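
The tid whose stack is being asked for is printed in the "spin lock ... held
by ... (tid N) too long" console line of the quoted panic. When a machine
panics many times in a row, it may help to pull that value out of a saved
console log automatically. The following is only a rough sketch in Python,
assuming the message keeps the exact shape shown in the report; the function
name and the field names (lock_addr, lock_name, owner, tid) are made up here
for illustration and are not part of any FreeBSD tool.

```python
import re

# Matches the console line printed just before the panic, as quoted in the
# report. The group names are labels chosen for this sketch only.
SPIN_LOCK_RE = re.compile(
    r"spin lock (?P<lock_addr>0x[0-9a-fA-F]+) \((?P<lock_name>[^)]*)\) "
    r"held by (?P<owner>0x[0-9a-fA-F]+) \(tid (?P<tid>\d+)\) too long"
)


def parse_spin_lock_line(text):
    """Return lock and owner details from the first matching line, or None."""
    match = SPIN_LOCK_RE.search(text)
    if match is None:
        return None
    info = match.groupdict()
    info["tid"] = int(info["tid"])  # thread id as an integer
    return info


# Example input: the line from the quoted panic, including the "> " quoting.
console = ("> spin lock 0xc0c95330 (smp rendezvous) held by "
           "0xfffff8000560a490 (tid 100347) too long")
info = parse_spin_lock_line(console)
if info is not None:
    print(info["tid"], info["lock_name"], info["owner"])
```

The pattern uses search() rather than match() so that quoting prefixes or
timestamps in front of the message do not get in the way.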