Date:      Thu, 9 Jun 2005 10:30:02 -0400
From:      Steve Richardson <prefect@sidehack.sat.gweep.net>
To:        freebsd-questions@freebsd.org
Subject:   FBSD 5.4-STABLE/3Ware Escalade 7506-4LP on dual Opteron issue
Message-ID:  <20050609143002.GA74546@sidehack.sat.gweep.net>


Hi,

We're building out a brand-new dual Opteron box to run our public-access Unix
site.  It runs FreeBSD 5.4-STABLE with a 3Ware Escalade 7506-4LP.  We are
having difficulties with the system, and any help you can offer would be
greatly appreciated.

For the most part, everything behaves fine.  We've got the system built and
installed.  Unfortunately, we're seeing an intermittent, catastrophic failure
involving the 3Ware card.

Periodically, the system will partly lock up with the following errors:

twe0: unexpected status bit(s) 100000<PCIABRT>
twe0: PCI abort, clearing.

I say "partly lock up" because the kernel does not panic, and the console
keyboard and network interfaces stay responsive (i.e. you can type at the
login prompt and ping the server).  However, the disk subsystem appears to
stop functioning once this has occurred.
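
For anyone reading the error above, here is a minimal sketch of how the
status value can be decoded.  It assumes the number before the angle brackets
is hexadecimal (which is how the kernel's %b format normally prints a bit
field) and that the names inside the brackets are comma-separated; the
function name decode_status is just for illustration.

```python
import re

# Matches the driver message quoted above, e.g.
# "twe0: unexpected status bit(s) 100000<PCIABRT>"
STATUS_RE = re.compile(r"unexpected status bit\(s\) ([0-9a-fA-F]+)<([^>]*)>")


def decode_status(line):
    """Return (mask, bit_positions, names) for a status message, or None."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    mask = int(m.group(1), 16)  # assumption: the value is printed in hex
    # List the positions of every set bit, lowest first.
    bits = [i for i in range(mask.bit_length()) if (mask >> i) & 1]
    names = [n for n in m.group(2).split(",") if n]
    return mask, bits, names


if __name__ == "__main__":
    print(decode_status("twe0: unexpected status bit(s) 100000<PCIABRT>"))
```

Under those assumptions the message reports a single set bit, bit 20
(0x100000), labelled PCIABRT.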

Frankly, at this point we are baffled, because the system is stable enough to
run for days on end under light load, and will even occasionally handle
periods of medium disk load (e.g. many hours of rsyncing from our live
server, buildworld, etc.).

We have been using the bonnie++ hard disk benchmarking suite to reproduce
the problem, as follows:

> mkdir testdir   
> bonnie++ -d ./dbench -s 2g -n 100:500000:1000 -x 100    
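
To make the size of that small-file test concrete, here is a rough sketch
that unpacks the -n argument.  It assumes the reading of the bonnie++ man page
in which the spec is number:max:min, with the number counted in multiples of
1024 files and the sizes in bytes; parse_n_spec is a hypothetical helper, not
part of bonnie++.

```python
def parse_n_spec(spec):
    """Unpack a bonnie++ style "-n number[:max[:min]]" argument.

    Assumption: 'number' is in multiples of 1024 files and the
    max/min file sizes are in bytes.
    """
    parts = spec.split(":")
    files = int(parts[0]) * 1024
    max_bytes = int(parts[1]) if len(parts) > 1 else 0
    min_bytes = int(parts[2]) if len(parts) > 2 else 0
    return {"files": files, "max_bytes": max_bytes, "min_bytes": min_bytes}


if __name__ == "__main__":
    # The value used in the command above.
    print(parse_n_spec("100:500000:1000"))
```

On that reading, each run creates 102400 files of between 1000 and 500000
bytes, on top of the 2 GB sequential I/O test selected with -s.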

I've included system information below, including dmesg output. 


regards,
Steve Richardson
System Administrator
GweepNet Cooperative Network



System Description:
Gigabyte GA-7A8DW motherboard
(2) AMD Opteron 246 2GHz CPUs
2GB Samsung PC3200 ECC RAM
3Ware Escalade 7506-4LP parallel ATA RAID, installed in a 64-bit PCI slot

OS:
FreeBSD 5.4-STABLE FreeBSD 5.4-STABLE amd64


dmesg output:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.4-STABLE #2: Tue Jun  7 00:10:29 EDT 2005
    root@newsidey.gweep.net:/usr/obj/usr/src/sys/SIDEHACK
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Opteron(tm) Processor 246 (1993.79-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0xf5a  Stepping = 10
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow+,3DNow>
real memory  = 2146893824 (2047 MB)
avail memory = 2061205504 (1965 MB)
ACPI APIC Table: <PTLTD  	 APIC  >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 24-27 on motherboard
ioapic2 <Version 1.1> irqs 28-31 on motherboard
acpi0: <PTLTD 	 XSDT> on motherboard
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
unknown: I/O range not supported
unknown: I/O range not supported
    ACPI-1304: *** Error: Method execution failed [\\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xffffff0000a70080), AE_AML_BUFFER_LIMIT
    ACPI-0239: *** Error: Method execution failed [\\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xffffff0000a70080), AE_AML_BUFFER_LIMIT
can't fetch resources for \\_SB_.PCI0.LPC_.LPT_ - AE_AML_BUFFER_LIMIT
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci2: <ACPI PCI bus> on pcib2
ohci0: <OHCI (generic) USB controller> mem 0xd0110000-0xd0110fff irq 19 at device 0.0 on pci2
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1: <OHCI (generic) USB controller> mem 0xd0111000-0xd0111fff irq 19 at device 0.1 on pci2
usb1: OHCI version 1.0, legacy support
usb1: SMM does not respond, resetting
usb1: <OHCI (generic) USB controller> on ohci1
usb1: USB revision 1.0
uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
ahc0: <Adaptec 2902/04/10/15/20C/30C SCSI adapter> port 0x3000-0x30ff mem 0xd0112000-0xd0112fff irq 17 at device 4.0 on pci2
aic7850: Single Channel A, SCSI Id=7, 3/253 SCBs
bge0: <Broadcom BCM5705 Gigabit Ethernet, ASIC rev. 0x3003> mem 0xd0100000-0xd010ffff irq 19 at device 5.0 on pci2
miibus0: <MII bus> on bge0
brgphy0: <BCM5705 10/100/1000baseTX PHY> on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:0f:ea:7e:b1:81
atapci0: <SiI 3114 SATA150 controller> port 0x3400-0x340f,0x3410-0x3413,0x3418-0x341f,0x3414-0x3417,0x3420-0x3427 mem 0xd0113000-0xd01133ff irq 18 at device 6.0 on pci2
ata2: channel #0 on atapci0
ata3: channel #1 on atapci0
ata4: channel #2 on atapci0
ata5: channel #3 on atapci0
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <AMD 8111 UDMA133 controller> port 0x1000-0x100f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
ata0: channel #0 on atapci1
ata1: channel #1 on atapci1
pci0: <bridge> at device 7.3 (no driver attached)
pcib3: <ACPI Host-PCI bridge> on acpi0
pci8: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 3.0 on pci8
pci9: <ACPI PCI bus> on pcib4
pci8: <base peripheral, interrupt controller> at device 3.1 (no driver attached)
pcib5: <ACPI PCI-PCI bridge> at device 4.0 on pci8
pci14: <ACPI PCI bus> on pcib5
twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0x4000-0x400f mem 0xf0800000-0xf0ffffff irq 30 at device 2.0 on pci14
twe0: 4 ports, Firmware FE7X 1.05.00.068, BIOS BE7X 1.08.00.048
pci8: <base peripheral, interrupt controller> at device 4.1 (no driver attached)
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: cannot reserve I/O port range
ppc0: cannot reserve I/O port range
orm0: <ISA Option ROMs> at iomem 0xd0000-0xd0fff,0xc0000-0xcffff on isa0
ppc0: cannot reserve I/O port range
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
ahc0: Someone reset channel A
ad0: 152627MB <SAMSUNG SP1614N/TM100-30> [310101/16/63] at ata0-master UDMA100
ad2: 286188MB <Maxtor 6B300R0/BAH41B70> [581463/16/63] at ata1-master UDMA133
Waiting 15 seconds for SCSI devices to settle
twed0: <Unit 0, RAID5, Normal> on twe0
twed0: 305253MB (625159424 sectors)
sa0 at ahc0 bus 0 target 3 lun 0
sa0: <EXABYTE EXB-89008E00012F V39e> Removable Sequential Access SCSI-2 device 
sa0: 10.000MB/s transfers (10.000MHz, offset 15)
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/twed0s1a
WARNING: / was not properly dismounted
WARNING: /home/crib was not properly dismounted
WARNING: /home/domus was not properly dismounted
WARNING: /tmp was not properly dismounted
WARNING: /u was not properly dismounted
WARNING: /u/backup/nearline was not properly dismounted
WARNING: /u/backup/online was not properly dismounted
WARNING: /u/news was not properly dismounted
WARNING: /u/news/nntpcached was not properly dismounted
WARNING: /usr was not properly dismounted
WARNING: /var was not properly dismounted
WARNING: /var/tmp was not properly dismounted
bge0: firmware handshake timed out
bge0: RX CPU self-diagnostics failed!
bge0: watchdog timeout -- resetting


