Date:      Wed, 28 Feb 2007 12:21:08 -0500
From:      alex@schnarff.com
To:        freebsd-questions@freebsd.org
Cc:        Jean Lagarde <jean.lagarde@gmail.com>
Subject:   Stability Issues on 5.4-RELEASE Box
Message-ID:  <20070228122108.bhd56o5wn4ss8c4g@mail.schnarff.com>

Hello All,

I've recently fallen into the task of administering a FreeBSD 
5.4-RELEASE box that acts as the web server for a small non-profit that 
I volunteer for. Unfortunately, the system has been having some 
extremely vexing stability issues over the last month or so, which even 
my 6+ years of experience as an OpenBSD admin have not helped me track 
down.

First things first, let me say explicitly that I'm not trying to say 
"FreeBSD sucks, it's not stable" or anything like that. It's a fine OS, 
and I'm sure that it's either faulty hardware or a misconfiguration of 
some sort causing these problems. :-)

That said, here are some of the symptoms the box has been experiencing:

* Occasional random reboots. I've only personally witnessed one, and 
they don't happen often, but any time a *NIX box just reboots for no 
apparent reason (there was no indication of a problem in any of the 
logs, at least that I could see), something really bad is going on.

* Random extreme slowness when logging in via SSH, with the time to get 
a shell ranging from a second or two all the way up to 80 seconds. The 
box isn't busy enough that it's just slow due to load (especially 
since, once you're in, things fly), and it's not just a reverse DNS 
issue like I've seen on OpenBSD (this occurs even when logging in from 
locations listed in /etc/hosts that resolve properly out of that file). 
Until I upgraded to the current version of OpenSSL/OpenSSH, the box 
would occasionally just become unresponsive altogether over SSH, not 
allowing logins for 15+ minutes at a time.

* Files that are sometimes not found at startup but are found at other 
times. Prime example: the Zope CMS that's been installed failed to find 
libmysqlclient.so after a planned soft reboot, but found it with no 
trouble on a subsequent boot a few minutes later, with no config 
changes in between.

* A warning in /var/log/messages that the root filesystem was full, 
when it was at 60% capacity (and something like 2% inode capacity); the 
problem has yet to repeat, though no files have been cleared off of 
that filesystem.

* Random crashes of the Zope/Plone system that's running the main part 
of the web site. While I realize that, in and of itself, this means 
nothing about the stability of the underlying OS, in the context of all 
of the other things going on (as well as the fact that the Zope list 
has been unable to help figure out why it's crashing), it seems like it 
might be further evidence of a larger problem.
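
Since the "filesystem full" warning came and went on its own, one way 
to catch it next time might be to log block and inode usage from cron 
and compare the numbers against the timestamp in /var/log/messages. 
This is only a rough sketch: the function name usage_percent is made 
up for illustration, and the block calculation just approximates what 
df reports.

```python
import os

def usage_percent(path="/"):
    """Return (block_pct, inode_pct) in use for the filesystem holding path."""
    st = os.statvfs(path)
    # Blocks: compare used blocks against used + available-to-non-root,
    # which is roughly how df computes its Capacity column.
    used = st.f_blocks - st.f_bfree
    denom = used + st.f_bavail
    block_pct = 100.0 * used / denom if denom else 0.0
    # Inodes: some filesystems report zero total inodes, so guard the division.
    inode_pct = 100.0 * (st.f_files - st.f_ffree) / st.f_files if st.f_files else 0.0
    return block_pct, inode_pct

blocks, inodes = usage_percent("/")
print(f"/ blocks {blocks:.1f}% inodes {inodes:.1f}%")
```

Run every minute or so with the output appended to a file, this would 
at least show whether the root filesystem really does fill up briefly.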

Thus far, besides simply scanning log files, constantly watching "top" 
and "ps", etc., I've not been able to do much with the box. As I said, 
I upgraded OpenSSL/OpenSSH to current versions, and I installed pf as 
the firewall (there was none before I arrived...don't even get me 
started on that). Since the server is in Oregon and I'm in the DC area, 
the previous admin will be running Memtest for me this weekend and 
disabling hyperthreading (which there's no performance justification 
for, and which has caused me stability issues in the past, at least on 
Linux). That's about the extent of what I've been able to do to date, 
since this is a production box.
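
To put numbers on the SSH login delay rather than relying on 
impressions, something like the following could be run from a remote 
machine. It is a minimal sketch under a few assumptions: key-based 
authentication is set up (BatchMode=yes makes ssh fail rather than 
prompt), the helper names time_command and ssh_login_command are 
invented for this example, and the host name in the comment is a 
placeholder.

```python
import subprocess
import time

def time_command(cmd, runs=5):
    """Run cmd `runs` times and return the wall-clock seconds for each run."""
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.monotonic() - start)
    return durations

def ssh_login_command(host):
    """Build a non-interactive ssh invocation that logs in, runs `true`, and exits."""
    # BatchMode=yes assumes key-based auth; ssh will not prompt for a password.
    return ["ssh", "-o", "BatchMode=yes", host, "true"]

# Example (hypothetical host name):
#   for secs in time_command(ssh_login_command("www.example.org"), runs=10):
#       print(f"{secs:.2f}s")
```

Logging these timings alongside the time of day would show whether the 
slow logins cluster around anything else happening on the box.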

What I'd like to know from you guys is:

* Am I justified in suspecting hyperthreading as a potential cause of 
instability?

* Does 5.4-RELEASE have any known bugs that might cause stability 
issues like the ones I've described here? More importantly, would an 
upgrade to 6.2-RELEASE be worthwhile (as is my instinct), in terms of 
being generally more stable and/or having better hardware support? 
Would such an upgrade be possible/relatively painless to perform 
without being physically at a console, as has been the case with 
OpenBSD over the years?

* Given my dmesg below, do you see any specific problems?

* Do you have any other suggestions for debugging this problem?

Thanks in advance for any help you can provide. :-)

Alex Kirk

dmesg:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
         The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RELEASE #0: Sun May  8 10:21:06 UTC 2005
     root@harlow.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
ACPI APIC Table: <INTEL  D945GTP >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 3.20GHz (3200.01-MHz 686-class CPU)
   Origin = "GenuineIntel"  Id = 0xf43  Stepping = 3
   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
   Hyperthreading: 2 logical CPUs
real memory  = 2137509888 (2038 MB)
avail memory = 2086207488 (1989 MB)
ioapic0: Changing APIC ID to 2
ioapic0 <Version 2.0> irqs 0-23 on motherboard
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <INTEL D945GTP> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Sleep Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <display, VGA> at device 2.0 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> at device 28.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 28.2 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 28.3 on pci0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 28.4 on pci0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 28.5 on pci0
pci5: <ACPI PCI bus> on pcib5
uhci0: <UHCI (generic) USB controller> port 0x2080-0x209f irq 23 at device 29.0 on pci0
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0x2060-0x207f irq 19 at device 29.1 on pci0
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <UHCI (generic) USB controller> port 0x2040-0x205f irq 18 at device 29.2 on pci0
usb2: <UHCI (generic) USB controller> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: <UHCI (generic) USB controller> port 0x2020-0x203f irq 16 at device 29.3 on pci0
usb3: <UHCI (generic) USB controller> on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
pci0: <serial bus, USB> at device 29.7 (no driver attached)
pcib6: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci6: <ACPI PCI bus> on pcib6
fxp0: <Intel 82550 Pro/100 Ethernet> port 0x1100-0x113f mem 0x88000000-0x8801ffff,0x88021000-0x88021fff irq 21 at device 0.0 on pci6
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:02:b3:d5:4d:3f
ahc0: <Adaptec 2940 Ultra SCSI adapter> port 0x1000-0x10ff mem 0x88020000-0x88020fff irq 22 at device 1.0 on pci6
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <GENERIC ATA controller> port 0x20b0-0x20bf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 irq 18 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
atapci1: <GENERIC ATA controller> port 0x20a0-0x20af,0x20e8-0x20eb,0x20c0-0x20c7,0x20ec-0x20ef,0x20c8-0x20cf irq 19 at device 31.2 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
fdc0: <floppy drive controller> port 0x3f0,0x3f0-0x3f5 irq 6 drq 2 on acpi0
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
orm0: <ISA Option ROMs> at iomem 0xcc800-0xccfff,0xcb000-0xcc7ff on isa0
pmtimer0 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
RTC BIOS diagnostic error 80<clock_battery>
Timecounter "TSC" frequency 3200012824 Hz quality 800
Timecounters tick every 10.000 msec
acd0: CDRW <LITE-ON CD-RW SOHR-5239S/2S03> at ata0-slave PIO4
Interrupt storm detected on "irq19: uhci1+"; throttling interrupt source
ad4: 238475MB <ST3250823AS/3.03> [484521/16/63] at ata2-master UDMA33
ad5: 238475MB <ST3250823AS/3.03> [484521/16/63] at ata2-slave UDMA33
ad6: 238475MB <ST3250823AS/3.03> [484521/16/63] at ata3-master UDMA33
ad7: 238475MB <ST3250823AS/3.03> [484521/16/63] at ata3-slave UDMA33
Waiting 15 seconds for SCSI devices to settle
sa0 at ahc0 bus 0 target 6 lun 0
sa0: <SEAGATE DAT    9SP40-000 910B> Removable Sequential Access SCSI-3 device
sa0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
Mounting root from ufs:/dev/ad4s1a
IP Filter: v3.4.35 initialized.  Default = pass all, Logging = enabled
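
(End of dmesg.) As a rough aid for reading output like the above, here 
is a small sketch that pulls out lines that look worth a second look. 
The pattern list is only my guess at what might matter (the interrupt 
storm, the RTC battery error, the generic ATA driver and the disks 
attaching at UDMA33); it is not an authoritative list of FreeBSD kernel 
warnings, and the name scan_dmesg is invented for this example.

```python
import re

# Patterns are an assumption about what is worth a second look,
# not an authoritative list of FreeBSD kernel warnings.
SUSPECT_PATTERNS = {
    "interrupt storm": re.compile(r"Interrupt storm detected", re.I),
    "RTC/BIOS battery": re.compile(r"RTC BIOS diagnostic error", re.I),
    "generic ATA driver": re.compile(r"GENERIC ATA controller", re.I),
    "slow disk mode": re.compile(r"^ad\d+:.*\bUDMA33\b"),
}

def scan_dmesg(text):
    """Return (label, line) pairs for each dmesg line matching a suspect pattern."""
    hits = []
    for line in text.splitlines():
        for label, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.strip()))
    return hits

# A few lines copied from the dmesg above, plus one harmless line.
SAMPLE = """\
RTC BIOS diagnostic error 80<clock_battery>
Interrupt storm detected on "irq19: uhci1+"; throttling interrupt source
ad4: 238475MB <ST3250823AS/3.03> [484521/16/63] at ata2-master UDMA33
fxp0: Ethernet address: 00:02:b3:d5:4d:3f
"""

for label, line in scan_dmesg(SAMPLE):
    print(f"{label}: {line}")
```

Feeding it the full contents of /var/run/dmesg.boot after each reboot 
would make it easy to see whether the same lines show up every time.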



