Date:      Thu, 16 Jun 2011 22:55:31 +1000
From:      Peter Jeremy <peterjeremy@acm.org>
To:        Marius Strobl <marius@alchemy.franken.de>
Cc:        freebsd-sparc64@freebsd.org
Subject:   Re: 'make -j16 universe' gives SIReset
Message-ID:  <20110616125531.GA74096@server.vk2pj.dyndns.org>
In-Reply-To: <20110615231226.GY7064@alchemy.franken.de>
References:  <20110526234728.GA69750@server.vk2pj.dyndns.org> <20110527120659.GA78000@alchemy.franken.de> <20110601231237.GA5267@server.vk2pj.dyndns.org> <20110608224801.GB35494@alchemy.franken.de> <20110613235144.GA12470@server.vk2pj.dyndns.org> <20110614214959.GB91014@server.vk2pj.dyndns.org> <20110615231226.GY7064@alchemy.franken.de>

On 2011-Jun-16 01:12:26 +0200, Marius Strobl <marius@alchemy.franken.de> wrote:
>This backtrace shows two things that just shouldn't happen hardware-wise:
>a) The CPU issues a stray interrupt vector. This would explain the SIRs
>   you were seeing without the patch which tries to make these non-fatal.
>b) The CPU faults on an address which is covered by a locked TLB slot.
>
>The funny thing is that the CPU then actually still manages to panic; if
>something like b) occurs I'd expect it to be in a totally unusable state.
>I'm not sure what to do about these as it still looks like broken hardware
>or a silicon bug to me but at least the public errata doesn't mention
>something like that and the OpenSolaris source doesn't seem to work
>around something like these in an obvious way either. The only thing I
>can think of is to try whether just ignoring the stray interrupt vectors
>with the below patch avoids any further issues. You'll need to revert
>sparc64_intr_vector_stray.diff for that or at least the exception.S
>part.

I guess it's possible that neither Solaris nor SunVTS is tripping over
a hardware fault but I'm not sure how to prove or disprove that.

Just in case it was something I'd done, I reverted to a completely
stock -current (slightly newer than r223035 but I don't have the exact
revision) and that gave me a SIR during "make -j32 universe".

I then changed the DCR_DTPE to DCR_SI and that also gave a SIR.

I then added your original stray interrupt code and that managed to
complete a "make -j32 universe" without problems (which it has never
managed before). I've left it running in a loop and will check
tomorrow.  It does look like the issue is sensitive to code layout.

I'll try your latest suggestions tomorrow.

-- 
Peter Jeremy



