From: Nathan Whitehorn
Date: Sat, 03 Apr 2010 15:48:50 -0500
To: Kevin Day
Cc: freebsd-ppc@freebsd.org
Subject: Re: Xserve G4 stability (random processes crashing)

Kevin Day wrote:
> On Apr 2, 2010, at 8:17 AM, Nathan Whitehorn wrote:
>
>> Kevin Day wrote:
>>
>>> Thanks to some help, we've got 8.0-STABLE running on several Xserve G4 boxes now, in both UP and SMP configurations.
>>>
>>> However, all of them are showing weird stability problems. Running OS X Server, they were completely stable for years doing pretty hard work (video encoding) with no errors. They all pass Apple's hardware burn-in, too. But, doing a "buildworld" or "buildkernel" will result in random segfaults, invalid .o files being created, or ICEs that go away after immediately retrying. (i.e. it doesn't appear to be data from the disks being cached incorrectly, I don't have to force a re-read to fix) Pure CPU tasks (like memtester from ports) work fine for days.
>>> Are there any known issues with 8.0 on an XServe G4?
>>>
>>> -- Kevin
>>>
>> Could you try rolling back from 8.0-STABLE to 8.0-RELEASE on one? I think Marcel was seeing similar G4-specific problems, and it is likely to have been something introduced recently.
>> -Nathan
>>
>
> If anything, it seems worse on -RELEASE than -STABLE. In -STABLE I was at least able to get through a buildworld with only restarting it once, and now in -RELEASE I've restarted about 10 times and still haven't made it all the way through.
>
> Same symptoms as before, gcc giving internal compiler errors, segfaults, or corrupt .o files being produced. Memtester (even running in parallel with buildworld) never reports any errors. I'll keep fiddling with this, but if anyone has any suggestions on where to look for some clues, it'd be appreciated.
>

Since you say UP kernels have the same problems, other G4 machines seem not to have issues, and SMP G5 Xserves are completely stable, that points at some G4 Xserve-specific piece of hardware. I'd guess the ATA controller.
Could you chroot to an NFS volume mounted from a known-stable machine, or to a USB or FireWire disk, and try the same things?
-Nathan
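
A rough way to compare storage paths before committing to a full buildworld on each one is a write-and-read-back checksum test. The sketch below is only an illustration of that idea, not something from this thread or from FreeBSD itself: the function names `sha256_of_file` and `write_and_verify` are made up for the example, and any directory passed in (the internal ATA disk, an NFS mount, a USB or FireWire disk) is whatever path applies on the machine under test. It also has a clear limitation: with a small file the read-backs may be served from the buffer cache rather than the device, so a file larger than RAM, or a remount between write and read, would be needed to exercise the controller itself.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_and_verify(directory, size_bytes=1 << 20, rounds=5):
    """Write random data into `directory`, read it back `rounds` times,
    and return the indices of the rounds whose checksum did not match.

    An empty list means every read-back matched what was written.
    """
    payload = os.urandom(size_bytes)
    # Checksum of the data as it exists in memory, before it touches storage.
    expected = hashlib.sha256(payload).hexdigest()

    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to the device
        return [i for i in range(rounds) if sha256_of_file(path) != expected]
    finally:
        os.remove(path)
```

Calling `write_and_verify(some_directory)` once for a directory on the internal disk and once for a directory on the NFS, USB, or FireWire volume gives two lists to compare. Mismatches on only the internal disk would be consistent with the ATA controller guess, while clean results on both would not rule it out, since this test does not reproduce the load of a compile.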