From owner-freebsd-sparc64@FreeBSD.ORG  Mon Sep 29 04:23:05 2014
Return-Path: <owner-freebsd-sparc64@FreeBSD.ORG>
Delivered-To: freebsd-sparc64@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 789E1B4D
 for <freebsd-sparc64@freebsd.org>; Mon, 29 Sep 2014 04:23:05 +0000 (UTC)
Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "funkthat.com", Issuer "funkthat.com" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 524AD767
 for <freebsd-sparc64@freebsd.org>; Mon, 29 Sep 2014 04:23:05 +0000 (UTC)
Received: from h2.funkthat.com (localhost [127.0.0.1])
 by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id s8T4Mndg016588
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Sun, 28 Sep 2014 21:22:50 -0700 (PDT)
 (envelope-from jmg@h2.funkthat.com)
Received: (from jmg@localhost)
 by h2.funkthat.com (8.14.3/8.14.3/Submit) id s8T4Mn8t016587;
 Sun, 28 Sep 2014 21:22:49 -0700 (PDT) (envelope-from jmg)
Date: Sun, 28 Sep 2014 21:22:49 -0700
From: John-Mark Gurney <jmg@funkthat.com>
To: Chris Ross <cross+freebsd@distal.com>
Subject: Re: FreeBSD 10-STABLE/sparc64 panic
Message-ID: <20140929042249.GK43300@funkthat.com>
Mail-Followup-To: Chris Ross <cross+freebsd@distal.com>,
 freebsd-sparc64@freebsd.org
References: <CA75738D-066D-4EDC-9018-89936EE861C6@distal.com>
 <AB5649B5-BBFB-4284-9CFF-4784D28A18F3@distal.com>
 <A9D37635-CA61-401B-BEAE-14C4F370BFD6@distal.com>
 <BC35853D-DA5E-4799-947C-4C64A0BC7D36@distal.com>
 <D9350E94-1F01-4FFD-A51E-AD8761F5C9CF@distal.com>
 <E48E7175-310B-4449-B3E1-2058F9E681D0@distal.com>
 <323A3936-DE55-459A-B8AA-CFF463922F22@distal.com>
 <7DD7D2DC-A265-40D6-9995-16ABAF79C1FB@distal.com>
 <AF5EA0E6-860B-47DF-AC5E-6A45317C6092@distal.com>
 <456226AE-0712-4510-AEF5-2053F36F2181@distal.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <456226AE-0712-4510-AEF5-2053F36F2181@distal.com>
User-Agent: Mutt/1.4.2.3i
X-Operating-System: FreeBSD 7.2-RELEASE i386
X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88  9322 9CB1 8F74 6D3F A396
X-Files: The truth is out there
X-URL: http://resnet.uoregon.edu/~gurney_j/
X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html
X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE
X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger?
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
 (h2.funkthat.com [127.0.0.1]); Sun, 28 Sep 2014 21:22:50 -0700 (PDT)
Cc: freebsd-sparc64@freebsd.org
X-BeenThere: freebsd-sparc64@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Porting FreeBSD to the Sparc <freebsd-sparc64.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-sparc64>, 
 <mailto:freebsd-sparc64-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-sparc64/>
List-Post: <mailto:freebsd-sparc64@freebsd.org>
List-Help: <mailto:freebsd-sparc64-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-sparc64>,
 <mailto:freebsd-sparc64-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 29 Sep 2014 04:23:05 -0000

Chris Ross wrote this message on Mon, Sep 29, 2014 at 00:00 -0400:
> On Jun 30, 2014, at 10:40 , Chris Ross <cross+freebsd@distal.com> wrote:
> > tl;dr : I've finished my testing and have a result, but see other things I
> > don't understand.  Could use more help.
> 
>   Old thread, problem still exists.  Noticed in head around:
> 
> http://lists.freebsd.org/pipermail/freebsd-sparc64/2014-March/009261.html
> 
>   And in stable/10 as of revision 263676 (likely earlier).  Like numerous
> others, I have tried to narrow it down to a commit, or a small number of
> commits, but the failure is sporadic.  I think looking at the current code,
> which is still failing, may be most useful.
> 
>   I am right now seeing this on stable/10 code updated today, 10.1-BETA3,
> r272264.  As noted earlier in these threads, I am running a Sun Fire v240.  At
> least one or two other folks with v240's have seen this, and I think a variant
> of SunBlade that also has bge's on it.
> 
>   Multiuser boot panics at:
> 
> Setting hostname: hostname.distal.com.
> bge0: link state changed to DOWN
> spin lock 0xc0c95330 (smp rendezvous) held by 0xfffff8000560a490 (tid 100347) too long
> timeout stopping cpus
> panic: spin lock held too long
> cpuid = 1
> KDB: stack backtrace:
> #0 0xc054a0d0 at _mtx_lock_spin_failed+0x50
> #1 0xc054a198 at _mtx_lock_spin_cookie+0xb8
> #2 0xc08b989c at tick_get_timecount_mp+0xdc
> #3 0xc056c33c at binuptime+0x3c
> #4 0xc08857ac at timercb+0x6c
> #5 0xc08b9c00 at tick_intr+0x220
> Uptime: 20s
> Automatic reboot in 15 seconds - press a key on the console to abort
> 
>   In past kernels, ones more recent than March 2014, it will sometimes
> boot [to multiuser] on the first try, but usually it crashes a few times
> before eventually coming all the way up.  Given 30-40 minutes, it will usually
> recover to multiuser, and is stable forever (in past testing) at that point.
> This evening, it was rebooting for about 40 minutes (11 panic and
> reboot sequences), but then came up.
> 
>   I would be happy to dig into this further, but will need some advice and
> instruction.  I fear I may not even have built the kernel with full debugging,
> but can do so.  I'll look into that now that the machine is up again.
> 
>   Please let me know what I can do to help.  Thanks.

If you could get a core dump (call doadump), that'd be good, but dumping
the stack of the tid that held the spinlock too long would be a good
start.
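
As a rough illustration of the steps above, here is a minimal Python sketch
that pulls the tid out of the panic line and prints the commands one might
type at the ddb prompt. It assumes the kernel was built with KDB and DDB and
drops into the debugger on panic, which the report above does not confirm.
"show thread" and "trace" are ordinary ddb commands; the helper name
ddb_commands is made up for this example.

```python
import re

# Panic line copied from the console output quoted above.
PANIC_LINE = (
    "spin lock 0xc0c95330 (smp rendezvous) held by "
    "0xfffff8000560a490 (tid 100347) too long"
)

# Matches the "spin lock ... held by ... (tid N) too long" message.
PATTERN = re.compile(
    r"spin lock (?P<lock>0x[0-9a-f]+) \((?P<name>[^)]*)\) "
    r"held by (?P<owner>0x[0-9a-f]+) \(tid (?P<tid>\d+)\) too long"
)


def ddb_commands(line):
    """Return ddb commands for inspecting the thread named in the panic line.

    Hypothetical helper: it only formats text and does not talk to a kernel.
    """
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a 'spin lock held too long' message")
    tid = match.group("tid")
    return [
        "show thread " + tid,  # state of the thread holding the spin lock
        "trace " + tid,        # stack of the thread holding the spin lock
        "call doadump",        # write the core dump, as suggested above
    ]


if __name__ == "__main__":
    for command in ddb_commands(PANIC_LINE):
        print(command)
```

The tid changes from one panic to the next, so the commands need to be
regenerated from the latest console output each time.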

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."