Date: Thu, 16 Jun 2011 22:55:31 +1000
From: Peter Jeremy <peterjeremy@acm.org>
To: Marius Strobl <marius@alchemy.franken.de>
Cc: freebsd-sparc64@freebsd.org
Subject: Re: 'make -j16 universe' gives SIReset
Message-ID: <20110616125531.GA74096@server.vk2pj.dyndns.org>
In-Reply-To: <20110615231226.GY7064@alchemy.franken.de>
References: <20110526234728.GA69750@server.vk2pj.dyndns.org> <20110527120659.GA78000@alchemy.franken.de> <20110601231237.GA5267@server.vk2pj.dyndns.org> <20110608224801.GB35494@alchemy.franken.de> <20110613235144.GA12470@server.vk2pj.dyndns.org> <20110614214959.GB91014@server.vk2pj.dyndns.org> <20110615231226.GY7064@alchemy.franken.de>
On 2011-Jun-16 01:12:26 +0200, Marius Strobl <marius@alchemy.franken.de> wrote:
>This backtrace shows two things that just shouldn't happen hardware-wise:
>a) The CPU issues a stray interrupt vector. This would explain the SIRs
>   you were seeing without the patch which tries to make these non-fatal.
>b) The CPU faults on an address which is covered by a locked TLB slot.
>
>The funny thing is that the CPU then actually still manages to panic; if
>something like b) occurs I'd expect it to be in a totally unusable state.
>I'm not sure what to do about these as it still looks like broken hardware
>or a silicon bug to me, but at least the public errata doesn't mention
>something like that and the OpenSolaris source doesn't seem to work
>around something like these in an obvious way either. The only thing I
>can think of is to try whether just ignoring the stray interrupt vectors
>with the below patch avoids any further issues. You'll need to revert
>sparc64_intr_vector_stray.diff for that, or at least the exception.S
>part.

I guess it's possible that there is a hardware fault that neither Solaris nor SunVTS trips over, but I'm not sure how to prove or disprove that.

Just in case it was something I'd done, I reverted to a completely stock -current (slightly newer than r223035, but I don't have the exact revision), and that gave me a SIR during "make -j32 universe". I then changed DCR_DTPE to DCR_SI, and that also gave a SIR. I then added your original stray interrupt code, and that managed to complete a "make -j32 universe" without problems (which it has never managed before). I've left it running that in a loop and will check tomorrow.

It does look like the issue is sensitive to code layout. I'll try your latest suggestions tomorrow.
-- 
Peter Jeremy