Date:      Wed, 11 Jul 2007 07:34:02 -1000
From:      NetOpsCenter <noc@hdk5.net>
To:        Feargal Reilly <feargal@fbi.ie>,  FreeBSD Mailing List <freebsd-questions@freebsd.org>
Subject:   Re: Complete hang during boot at boot2 prompt
Message-ID:  <4695148A.9010200@hdk5.net>
In-Reply-To: <20070711182300.4634e278@mablung.edhellond.fbi.ie>
References:  <20070711182300.4634e278@mablung.edhellond.fbi.ie>

Feargal Reilly wrote:

>Hi,
>
>I have a server which went down overnight, and
>would not subsequently boot. A reboot was performed by
>facilities staff before I got to look at it so I don't know what
>was showing on the console. The reason for the outage is
>unknown, and nothing showed in /var/log/messages, other than
>routine ntpd time sync messages.
>
>The server in question is an Intel SR1425BK1 server running
>FreeBSD 6.2 amd64 GENERIC with a SATA RAID-1 array
>provided by an onboard LSILogic MegaRAID controller.
>
>When booted, it would pass the various BIOS screens without
>problem, the RAID utility would say that the array was optimal,
>and then FreeBSD would start to boot, but it couldn't get past
>boot2:
>
>  >> FreeBSD/amd64 BOOT
>  Default: 0:ad(0,a)/boot/loader
>  boot:
>
>At this point, the server emitted a single continuous beep, and
>nothing else happened. Keyboard input did nothing, although
>Ctrl-Alt-Del still worked, and at one point a heart symbol
>appeared after I hit keys randomly for a while.
>
>My question is, what could have caused this failure? 
>
>My initial guesses were either a memory failure or a really
>badly corrupted boot sector, but I'm not convinced by either
>explanation, for reasons outlined below.
>
>I urgently needed the data to be online again, so I yanked one 
>disk out of the machine and inserted it into another host, and
>took the server back to the office.
>
>There, I yanked a memory module, and it booted fine, albeit
>complaining about the degraded RAID array. However, when I
>reinserted the memory, it continued to boot. I didn't have the
>foresight to try it before I fiddled with the disks, but I can't
>imagine that it had been seated incorrectly as the server had
>been up for two months without problem. Also, the BIOS tests
>passed, although I know they aren't too in depth. I'll run
>sysutils/memtest anyway, and see what that throws up.
>
>Meanwhile, I inserted a replacement disk and rebuilt the RAID-1
>array, and it is still booting fine, so my best guess now is a
>corrupted boot sector. The disk that I removed to insert into
>another host was ad4, which I'm guessing is the disk that it
>would have been trying to boot from in the first place. So a
>bad sector could be responsible, but it would seem to be very
>convenient, as there does not appear to be any other data
>corruption on the disk.
>
>Also, I've run a short SMART test, and everything is okay as far
>as it is concerned. I'm in the process of running a long test,
>but that won't finish before I leave the office. If it were a
>corrupted sector, would it be able to get to boot2?
>
>Any other suggestions as to what caused the failure? I know I've
>changed the conditions and may never be able to reproduce it
>(nor do I want to), but if I have failing hardware, I'd like a
>best guess as to where it is.
>
>Thanks for your time,
>
>-fr.
>
>  
>
Aloha,

I have had memory chips walk out of the slots on several occasions. 
Sometimes it's vibration; here in Hawaii we also occasionally have
humidity issues that tend to cause this.
I have learned to spray the sockets and card connections with contact
cleaner about every 6 months to avoid this problem. This is especially
important where servers are not kept in a cool environment.



~Al Plant - Honolulu, Hawaii -  Phone:  808-284-2740
  + http://hawaiidakine.com + http://freebsdinfo.org + noc@hdk5.net +
  + http://internetohana.org   - Supporting - FreeBSD 6.* - 7.* +
"All that's really worth doing is what we do for others." - Lewis Carroll