From owner-freebsd-questions@FreeBSD.ORG Wed Jul 11 17:34:04 2007
Message-ID: <4695148A.9010200@hdk5.net>
Date: Wed, 11 Jul 2007 07:34:02 -1000
From: NetOpsCenter
To: Feargal Reilly, FreeBSD Mailing List
References: <20070711182300.4634e278@mablung.edhellond.fbi.ie>
In-Reply-To: <20070711182300.4634e278@mablung.edhellond.fbi.ie>
Subject: Re: Complete hang during boot at boot2 prompt

Feargal Reilly wrote:

>Hi,
>
>I have a server which went down overnight, and
>would not subsequently boot. A reboot was performed by
>facilities staff before I got to look at it so I don't know what
>was showing on the console. The reason for the outage is
>unknown, and nothing showed in /var/log/messages, other than
>routine ntpd time sync messages.
>
>The server in question is an Intel SR1425BK1 server running
>FreeBSD 6.2 amd64 GENERIC with a SATA RAID-1 array
>provided by an onboard LSILogic MegaRAID controller.
>
>When booted, it would pass the various BIOS screens without
>problem, the RAID utility would say that the array was optimal,
>and then FreeBSD would start to boot, but it couldn't get past
>boot2:
>
>
>
>>>FreeBSD/amd64 BOOT
>>>
>>>
>Default: 0:ad(0,a)/boot/loader
>boot:
>
>At this point, the server emitted a single continuous beep, and
>nothing else happened. Keyboard input did nothing, although
>Ctrl-Alt-Del still worked, and at one point a heart symbol
>appeared after I hit keys randomly for a while.
>
>My question is, what could have caused this failure?
>
>My initial guesses were either a memory failure or a really
>badly corrupted boot sector, but I'm not convinced by either
>explanation, for reasons outlined below.
>
>I urgently needed the data to be online again, so I yanked one
>disk out of the machine and inserted it into another host, and
>took the server back to the office.
>
>There, I yanked a memory module, and it booted fine, albeit
>complaining about the degraded RAID array. However, when I
>reinserted the memory, it continued to boot. I didn't have the
>foresight to try it before I fiddled with the disks, but I can't
>imagine that it had been seated incorrectly as the server had
>been up for two months without problem. Also, the BIOS tests
>passed, although I know they aren't too in depth. I'll run
>sysutils/memtest anyway, and see what that throws up.
>
>Meanwhile, I inserted a replacement disk and rebuilt the RAID-1
>array, and it is still booting fine, so my best guess now is a
>corrupted boot sector. The disk that I removed to insert into
>another host was ad4, which I'm guessing is the disk that it
>would have been trying to boot from in the first place.
>So a bad sector could be responsible, but it would seem to be very
>convenient, as there does not appear to be any other data
>corruption on the disk.
>
>Also, I've run a short SMART test, and everything is okay as far
>as it is concerned. I'm in the process of running a long test,
>but that won't finish before I leave the office. If it were a
>corrupted sector, would it be able to get to boot2?
>
>Any other suggestions as to what caused the failure? I know I've
>changed the conditions and may never be able to reproduce it
>(nor do I want to), but if I've failing hardware, I'd like a
>best guess as to where it is.
>
>Thanks for your time,
>
>-fr.
>

Aloha,

I have had memory chips walk out of their slots on several occasions.
Sometimes it's vibration, and in Hawaii we occasionally have humidity
issues that tend to cause this too. I have learned to spray the sockets
and card connections with contact cleaner about every 6 months to avoid
this problem, especially in areas where servers are not in a cool
environment.

~Al Plant - Honolulu, Hawaii - Phone: 808-284-2740
 + http://hawaiidakine.com + http://freebsdinfo.org + noc@hdk5.net +
 + http://internetohana.org - Supporting - FreeBSD 6.* - 7.* +
 "All that's really worth doing is what we do for others."- Lewis Carrol