Date: Mon, 9 Jan 2012 12:50:54 -0500
From: John Nielsen <lists@jnielsen.net>
To: Freddie Cash <fjwcash@gmail.com>
Cc: FreeBSD Stable <freebsd-stable@freebsd.org>
Subject: Re: Upgrade from 8.2-STABLE to 9.0-RELEASE wedges on SuperMicro H8DGiF-based system
Message-ID: <F9A87D68-27E4-4872-A2F2-CD3F0F4D1BE4@jnielsen.net>
In-Reply-To: <CAOjFWZ6PbXCBoOinZRvXKmHDM8xWsYU657yPh5-i9TsmnFpdVg@mail.gmail.com>
References: <CAOjFWZ6PbXCBoOinZRvXKmHDM8xWsYU657yPh5-i9TsmnFpdVg@mail.gmail.com>
On Jan 9, 2012, at 12:40 PM, Freddie Cash wrote:

> Just wondering if anyone else has run into a similar issue.
>
> We have a ZFS storage server that was running 8.2-STABLE (from around
> beginning of Dec 2011) without any issues, that was upgraded to
> 9.0-RELEASE (to consolidate all the ZFS and networking fixes/updates
> and bring it up to version parity with our other ZFS storage server
> running 9.0) last Thursday. The "svn switch" of the source tree, the
> buildworld, the buildkernel, the installkernel, the reboot with the
> new kernel, the installworld, the reboot into the new world, the
> mergemaster processes all completed successfully. About half-way
> through the "make delete-old" process, the box locked up. No messages
> on the console, no log entries of any kind, everything just stopped.
> Had to do a power-cycle. And then everything went to hell. :(
>
> On reboot, the loader complained about not being able to determine
> which disk it was booting from (even though the new loader had already
> booted at least once), and gave strange messages about
> panic/free/something or other (didn't write that error down).
>
> I was able to boot using a 9.0 install CD, drop to a loader prompt,
> unload the kernel/modules from CD, load the kernel/modules from the
> harddrive, set currdev to the harddrive, and boot. But no matter what
> I did (gpart bootcode using pmbr/gptboot from CD or from HD; copy
> loader from CD, copy /boot from CD), I could not get the loader on the
> HD to load the kernel; always gave the same error message: can't
> determine which disk we're booting from.
>
> After trying for 24 hours to make it work, I just re-installed off the
> 9.0-RELEASE CD.
>
> Now, this box (alphadrive) will freeze after running for between 3 and
> 10 hours. Even when left completely idle, it will lock up after about
> 3 hours.
> :(
>
> I have another system (betadrive) that's almost identical hardware
> (chassis, backplane, SATA controllers are different, everything else
> is the same) that went from 8.2-STABLE to 9.0-RC2 to 9.0-RC3 to
> 9.0-RELEASE without any issues. I've tried copying /boot/loader.conf,
> /etc/make.conf, /etc/src.conf, /etc/sysctl.conf, /etc/rc.conf from
> betadrive to alphadrive, without any change in the freezing behaviour.
>
> These are ZFS storage systems, with / (UFS) and swap on SSDs, with 16
> or 24 SATA HDs in the pool (3x 5-disk raidz2 + spare and 4x 6-disk
> raidz2 resp). All of the ZFS settings are identical between the two
> systems (pool name, pool properties, ZFS filesystems, ZFS properties
> per filesystem). Dedupe and compression (LZJB) are enabled on both
> systems.
>
> When alphadrive locks up, there are no entries made in any log files;
> there are no log entries on the console; there are no entries in the
> BIOS event log; there are no entries in the IPMI event log; the
> CPU/case temps are below 40C (emergency shutoff is 75C) as shown via
> IPMI; RAM usage is under 20 GB (24 GB per box) with the lowest being
> under 2 GB used (I run top on the console so I can see the stats when
> it locks up, and the time it locks up). It just ... stops.
>
> The system will even lock up when running in single-user mode, with
> only / mounted (ZFS not loaded, zpool not imported).
>
> Hardware (alphadrive):
> Chenbro 5U rackmount chassis with 24 hot-swap drive bays
> SuperMicro H8DGi-F motherboard
> AMD Opteron 2218 CPU (8-cores at 2.0 GHz)
> 24 GB DDR3-SDRAM
> 3x SuperMicro AOC-USAS-L8i SATA controllers (multi-lane break-out cables)
> 8x Seagate 7200.12 1.5 TB SATA harddrives
> 16x WD RE4 1.0 TB SATA harddrives
> 1x Kingston 60 GB SSD (for /, swap, L2ARC)
>
> Hardware (betadrive):
> SuperMicro 4U rackmount chassis with 16 hot-swap drive bays
> SuperMicro H8DGi-F motherboard
> AMD Opteron 2218 CPU (8-cores at 2.0 GHz)
> 24 GB DDR3-SDRAM
> 2x SuperMicro AOC-USAS2-L8i SATA controllers (multi-lane cables)
> 16x WD RE4 2.0 TB SATA harddrives
> 1x Kingston 60 GB SSD (for /, swap, L2ARC)
>
> betadrive runs perfectly with FreeBSD 9.0-RELEASE.
> alphadrive locks up with FreeBSD 9.0-RELEASE.
>
> We're currently investigating hardware firmware revisions to see if
> anything else is different between the two systems.
>
> Has anyone experienced anything similar? Does anyone have any ideas on
> what to look for? Any suggestions on what to try next?

From what you've said I strongly suspect that you have some kind of
hardware issue. Dodgy RAM is my first guess, something cooling-related
is my 2nd, and PSU is my 3rd. It is a little suspicious that you only
started having problems after your upgrade but it could be coincidence
or it could be something about the new software tickling the hardware
differently than the old.

Open it up, make sure you don't have dust buildup and that all the fans
are spinning, re-seat the RAM and then boot into memtest for a few
hours. If you have spare similar hardware you can also try swapping
components until you isolate the fault.

Good luck,

JN
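
Since the lockups leave nothing in any log and the quoted post relies on top running on the console to see when the box froze, one way to capture that time unattended might be a small heartbeat logger. The following is only a minimal sketch, not something from this thread: the function names, the 60-second interval and the log path are all hypothetical, and it assumes Python is installed on the box. Each timestamp is forced to disk with fsync, so the last line in the file should roughly bound the moment of the freeze.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path, now=None):
    """Append one UTC timestamp line to `path` and force it to disk."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    with open(path, "a") as f:
        f.write(stamp + "\n")
        f.flush()
        # fsync so the line survives a hard lockup or power-cycle
        os.fsync(f.fileno())
    return stamp


def last_heartbeat(path):
    """Return the last timestamp written, or None if the file is missing or empty."""
    try:
        with open(path) as f:
            lines = [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        return None
    return lines[-1] if lines else None


def run(path, interval_seconds=60):
    """Write a heartbeat every `interval_seconds` until the machine stops."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_seconds)
```

Calling run("/var/log/heartbeat.log") (a hypothetical path) before leaving the machine idle, then last_heartbeat() on the same file after the power-cycle, would give the freeze time to within one interval. It does not say anything about the cause.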
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?F9A87D68-27E4-4872-A2F2-CD3F0F4D1BE4>