Date: Sat, 22 Oct 2011 09:17:05 +1100
From: Peter Jeremy <peterjeremy@acm.org>
To: Marius Strobl <marius@alchemy.franken.de>
Cc: freebsd-sparc64@freebsd.org
Subject: Re: 'make -j16 universe' gives SIReset
Message-ID: <20111021221705.GD45938@server.vk2pj.dyndns.org>
In-Reply-To: <20111018172718.GT39118@alchemy.franken.de>
References: <20110830152725.GA28552@alchemy.franken.de>
 <20110831212458.GA25926@server.vk2pj.dyndns.org>
 <20110902153206.GR40781@alchemy.franken.de>
 <20111006120411.GA903@alchemy.franken.de>
 <20111011030529.GA4093@server.vk2pj.dyndns.org>
 <20111011205543.GA81376@alchemy.franken.de>
 <20111013035648.GA54190@server.vk2pj.dyndns.org>
 <20111013184224.GG39118@alchemy.franken.de>
 <20111018042646.GA18863@server.vk2pj.dyndns.org>
 <20111018172718.GT39118@alchemy.franken.de>

On 2011-Oct-18 19:27:18 +0200, Marius Strobl <marius@alchemy.franken.de> wrote:
>On Tue, Oct 18, 2011 at 03:26:46PM +1100, Peter Jeremy wrote:
>> On 2011-Oct-13 20:42:25 +0200, Marius Strobl <marius@alchemy.franken.de> wrote:
>> >On Thu, Oct 13, 2011 at 02:56:48PM +1100, Peter Jeremy wrote:
>> >> Unfortunately, I can't get a crashdump because dumpon(8) doesn't like
>> >> my Solaris swap partitions:
>> >> GEOM_PART: Partition 'da0b' not suitable for kernel dumps (wrong type?)
>> >> GEOM_PART: Partition 'da6b' not suitable for kernel dumps (wrong type?)
>> >> No suitable dump device was found.
>> >>
>> >> I did write a patch for that but took it out during some earlier
>> >> testing to get back to stock code. It looks like I didn't PR it
>> >> either so I will do that when I get some time.
>>
>> I've resurrected that patch (and will send-pr it later).

Thanks for committing it.

>Hrm, AFAICT this would mean that the _mtx_obtain_lock(), which boils
>down to a atomic_cmpset_acq_ptr(), in _mtx_trylock() didn't work as
>expected, I currently can't think of a good reason why that could
>happen though. The assembly generated for that code also looks just
>fine. Have you run the workload which is triggering this before? It
>would be interesting to know whether it also happens with SCHED_4BSD
>with current sources, pre-r226054 and pre-r225889 if the machine
>previously survived that load.

It was running 6 parallel -j16 buildworlds. I switched to SCHED_4BSD
and haven't been able to reproduce it - even with a pile of added
"sysctl sysctl vm.vmtotal". I haven't tried rolling back to an earlier
kernel.

>Have you enabled PREEMPTION by chance?

That was using GENERIC and only changing the scheduler.
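
For anyone following the quoted remark that _mtx_obtain_lock() boils
down to an atomic_cmpset_acq_ptr(), here is a minimal userland sketch of
a trylock built on a single compare-and-set with acquire semantics. It
is not the FreeBSD kernel code: C11 <stdatomic.h> stands in for
atomic_cmpset_acq_ptr(), and struct sketch_mtx, sketch_trylock(),
sketch_unlock() and LOCK_UNOWNED are names invented for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "nobody owns the lock" value for this sketch. */
#define LOCK_UNOWNED ((uintptr_t)0)

/* Hypothetical lock word: LOCK_UNOWNED when free, else the owner's id. */
struct sketch_mtx {
    _Atomic uintptr_t owner;
};

/*
 * Try once to take the lock. The compare-and-set succeeds only if the
 * lock word still holds LOCK_UNOWNED, so at most one caller can win.
 * Acquire ordering on success keeps the critical section's loads and
 * stores from being reordered before the lock is seen as held.
 */
static bool
sketch_trylock(struct sketch_mtx *m, uintptr_t tid)
{
    uintptr_t expected = LOCK_UNOWNED;

    return atomic_compare_exchange_strong_explicit(&m->owner, &expected,
        tid, memory_order_acquire, memory_order_relaxed);
}

/* Release the lock; release ordering publishes the critical section. */
static void
sketch_unlock(struct sketch_mtx *m)
{
    atomic_store_explicit(&m->owner, LOCK_UNOWNED, memory_order_release);
}
```

With this shape, a second sketch_trylock() on a held lock must fail until
sketch_unlock() runs. A trylock that succeeds for two owners at once would
mean the compare-and-set itself misbehaved, which is the scenario the
quoted paragraph is puzzling over.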

>The other thing that worries me is that it could be a silicon bug,
>especially since that machine also has that issue of issuing stale
>vector interrupts along with a state in which it traps even on
>locked TLB entries, which isn't mentioned in the public erratum ...

I've had a rummage around in the OpenSolaris sources and nothing jumps
out at me. (Actually, I can't find any special case code that looks
like it addresses silicon bugs in Jaguar).

One other thing is that I'm getting lots of isp watchdog timeouts:
(da4:isp0:0:4:0): first watchdog (handle 0x5cf020f3) timed out- deferring for grace period
(da4:isp0:0:4:0): first watchdog (handle 0x5cf1206d) timed out- deferring for grace period
(da4:isp0:0:4:0): first watchdog (handle 0x5cf2203a) timed out- deferring for grace period
isp0: isp_watchdog: timeout for handle 0x5cad2046
(da4:isp0:0:4:0): FIN dl16384 resid 0 CDB=0x2a 0x00 0x0f 0xdd 0xe8 0xe0 0x00 0x00 0x20 0x00 STS 0x0 XS_ERR=0xb
isp0: bad request handle 0x5cad2046 (iocb type 0x3)
isp0: isp_watchdog: timeout for handle 0x5cdb20cb
(da4:isp0:0:4:0): FIN dl16384 resid 0 CDB=0x2a 0x00 0x0f 0xe3 0xa8 0x00 0x00 0x00 0x20 0x00 STS 0x0 XS_ERR=0xb
isp0: isp_watchdog: timeout for handle 0x5cdc2059
(da4:isp0:0:4:0): FIN dl16384 resid 0 CDB=0x2a 0x00 0x0f 0xe3 0xa8 0x20 0x00 0x00 0x20 0x00 STS 0x0 XS_ERR=0xb
isp0: isp_watchdog: timeout for handle 0x5cdd2020
(da4:isp0:0:4:0): FIN dl16384 resid 0 CDB=0x2a 0x00 0x0f 0xe3 0xa8 0x40 0x00 0x00 0x20 0x00 STS 0x0 XS_ERR=0xb
isp0: bad request handle 0x5cdb20cb (iocb type 0x3)
isp0: bad request handle 0x5cdc2059 (iocb type 0x3)
isp0: bad request handle 0x5cdd2020 (iocb type 0x3)
(da4:isp0:0:4:0): first watchdog (handle 0x6b9520bb) timed out- deferring for grace period
(da4:isp0:0:4:0): first watchdog (handle 0x6b96200e) timed out- deferring for grace period

Any ideas on that?
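
As an aid to reading the FIN lines above, the CDB bytes can be decoded
by hand. The sketch below assumes the standard 10-byte SCSI READ(10) /
WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2-5,
big-endian transfer length in bytes 7-8). struct rw10 and decode_rw10()
are names made up for this example and are not part of the isp(4)
driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for the decoded fields of a 10-byte R/W CDB. */
struct rw10 {
    uint8_t  opcode;   /* 0x28 = READ(10), 0x2a = WRITE(10) */
    uint32_t lba;      /* starting logical block address */
    uint16_t nblocks;  /* transfer length in blocks */
};

/*
 * Decode a READ(10)/WRITE(10) CDB. Returns false for any other opcode
 * so that unrelated commands are not misread as block transfers.
 */
static bool
decode_rw10(const uint8_t cdb[10], struct rw10 *out)
{
    if (cdb[0] != 0x28 && cdb[0] != 0x2a)
        return false;

    out->opcode = cdb[0];
    /* Bytes 2..5: LBA, most significant byte first. */
    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
        ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
    /* Bytes 7..8: transfer length in blocks, most significant first. */
    out->nblocks = (uint16_t)(((unsigned)cdb[7] << 8) | cdb[8]);
    return true;
}
```

Fed the first logged CDB (0x2a 0x00 0x0f 0xdd 0xe8 0xe0 0x00 0x00 0x20
0x00), this gives a WRITE(10) of 0x20 = 32 blocks at LBA 0x0fdde8e0. If
the disk uses 512-byte blocks (an assumption; the log does not say),
that is 16384 bytes, which would match the "dl16384" field. The three
later CDBs start at LBAs 0x0fe3a800, 0x0fe3a820 and 0x0fe3a840, each
0x20 blocks apart, so they look like consecutive 32-block writes.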

-- 
Peter Jeremy