Date:      Thu, 9 Jun 2005 09:32:07 -0500 (CDT)
From:      Tony Shadwick <tshadwick@goinet.com>
To:        Steve Richardson <prefect@sidehack.sat.gweep.net>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: FBSD 5.4-STABLE/3Ware Escalade 7506-4LP on dual Opteron issue
Message-ID:  <20050609093125.O71755@mail.goinet.com>
In-Reply-To: <20050609143002.GA74546@sidehack.sat.gweep.net>
References:  <20050609143002.GA74546@sidehack.sat.gweep.net>

I'm not claiming this will fix your issue, but are you running the 
absolute latest kernel sources?  There is the possibility this issue has 
been resolved in a newer kernel.

cvsup your sources, rebuild the kernel, and see whether the error still occurs.
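
Roughly, something like the following. This is only a sketch: the mirror
host is a placeholder I made up, I'm assuming the example stable-supfile
shipped with the base system is what you want to track, and SIDEHACK is the
kernel config name taken from your dmesg.

```sh
# Sketch only. MIRROR is a placeholder: substitute a real CVSup mirror.
MIRROR=cvsup.example.org

# Fetch current sources using the example supfile from the base system.
cvsup -g -L 2 -h "$MIRROR" /usr/share/examples/cvsup/stable-supfile

# Rebuild and install the kernel with the SIDEHACK config from the dmesg.
cd /usr/src
make buildkernel KERNCONF=SIDEHACK
make installkernel KERNCONF=SIDEHACK

# Reboot into the new kernel before rerunning the bonnie++ test.
shutdown -r now
```

If the tree has moved a lot since your last build, doing a full buildworld
before the kernel is the safer route.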

On Thu, 9 Jun 2005, Steve Richardson wrote:

>
> Hi,
>
> We're building out a brand new dual Opteron box to run our public access unix
> site.  We're running FreeBSD 5.4 and a 3Ware Escalade 7506-4LP.  We are
> having difficulties with the system, and any help you can offer would be
> greatly appreciated.
>
> For the most part, everything behaves fine.  We've got the system built and
> installed.  Unfortunately, we're having a periodic, catastrophic failure
> involving the 3Ware card.
>
> Periodically, the system will partly lock up with the following errors:
>
> twe0: unexpected status bit(s) 100000<PCIABRT>
> twe0: PCI abort, clearing.
>
> I say partly lock up because the kernel does not panic, nor do the console
> keyboard or network interfaces become non-responsive (i.e. you can type
> stuff at the login prompt, and ping the server).  However, the disk
> subsystem does appear to cease functioning once this has occurred.
>
> Frankly at this point we are baffled, because the system is stable enough to
> run for days on end under light load, and will even occasionally handle
> periods of medium disk load (e.g. many hours of rsyncing from our live
> server, build world, etc).
>
> We have been using the bonnie++ hard disk benchmarking suite as a means for
> recreating the problem, as follows:
>
>> mkdir testdir
>> bonnie++ -d ./dbench -s 2g -n 100:500000:1000 -x 100
>
> I've included system information below, including dmesg output.
>
>
> regards,
> Steve Richardson
> System Administrator
> GweepNet Cooperative Network
>
>
>
> System Description:
> Gigabyte GA-7A8DW motherboard
> (2) AMD Opteron 246 2GHz CPUs
> 2GB Samsung PC3200 ECC RAM
> 3Ware Escalade 7506-4LP parallel ATA RAID, installed in 64 bit PCI slot
>
> OS:
> FreeBSD 5.4-STABLE FreeBSD 5.4-STABLE amd64
>
>
> dmesg output:
>
> Copyright (c) 1992-2005 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> 	The Regents of the University of California. All rights reserved.
> FreeBSD 5.4-STABLE #2: Tue Jun  7 00:10:29 EDT 2005
>    root@newsidey.gweep.net:/usr/obj/usr/src/sys/SIDEHACK
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: AMD Opteron(tm) Processor 246 (1993.79-MHz K8-class CPU)
>  Origin = "AuthenticAMD"  Id = 0xf5a  Stepping = 10
>  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
>  AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow+,3DNow>
> real memory  = 2146893824 (2047 MB)
> avail memory = 2061205504 (1965 MB)
> ACPI APIC Table: <PTLTD  	 APIC  >
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> cpu0 (BSP): APIC ID:  0
> cpu1 (AP): APIC ID:  1
> MADT: Forcing active-low polarity and level trigger for SCI
> ioapic0 <Version 1.1> irqs 0-23 on motherboard
> ioapic1 <Version 1.1> irqs 24-27 on motherboard
> ioapic2 <Version 1.1> irqs 28-31 on motherboard
> acpi0: <PTLTD 	 XSDT> on motherboard
> acpi0: Power Button (fixed)
> acpi0: Sleep Button (fixed)
> acpi_bus_number: can't get _ADR
> acpi_bus_number: can't get _ADR
> acpi_bus_number: can't get _ADR
> unknown: I/O range not supported
> unknown: I/O range not supported
>    ACPI-1304: *** Error: Method execution failed [\\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xffffff0000a70080), AE_AML_BUFFER_LIMIT
>    ACPI-0239: *** Error: Method execution failed [\\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xffffff0000a70080), AE_AML_BUFFER_LIMIT
> can't fetch resources for \\_SB_.PCI0.LPC_.LPT_ - AE_AML_BUFFER_LIMIT
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
> cpu0: <ACPI CPU> on acpi0
> cpu1: <ACPI CPU> on acpi0
> acpi_button0: <Power Button> on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
> pci1: <ACPI PCI bus> on pcib1
> pci1: <display, VGA> at device 0.0 (no driver attached)
> pcib2: <ACPI PCI-PCI bridge> at device 6.0 on pci0
> pci2: <ACPI PCI bus> on pcib2
> ohci0: <OHCI (generic) USB controller> mem 0xd0110000-0xd0110fff irq 19 at device 0.0 on pci2
> usb0: OHCI version 1.0, legacy support
> usb0: SMM does not respond, resetting
> usb0: <OHCI (generic) USB controller> on ohci0
> usb0: USB revision 1.0
> uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 3 ports with 3 removable, self powered
> ohci1: <OHCI (generic) USB controller> mem 0xd0111000-0xd0111fff irq 19 at device 0.1 on pci2
> usb1: OHCI version 1.0, legacy support
> usb1: SMM does not respond, resetting
> usb1: <OHCI (generic) USB controller> on ohci1
> usb1: USB revision 1.0
> uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub1: 3 ports with 3 removable, self powered
> ahc0: <Adaptec 2902/04/10/15/20C/30C SCSI adapter> port 0x3000-0x30ff mem 0xd0112000-0xd0112fff irq 17 at device 4.0 on pci2
> aic7850: Single Channel A, SCSI Id=7, 3/253 SCBs
> bge0: <Broadcom BCM5705 Gigabit Ethernet, ASIC rev. 0x3003> mem 0xd0100000-0xd010ffff irq 19 at device 5.0 on pci2
> miibus0: <MII bus> on bge0
> brgphy0: <BCM5705 10/100/1000baseTX PHY> on miibus0
> brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
> bge0: Ethernet address: 00:0f:ea:7e:b1:81
> atapci0: <SiI 3114 SATA150 controller> port 0x3400-0x340f,0x3410-0x3413,0x3418-0x341f,0x3414-0x3417,0x3420-0x3427 mem 0xd0113000-0xd01133ff irq 18 at device 6.0 on pci2
> ata2: channel #0 on atapci0
> ata3: channel #1 on atapci0
> ata4: channel #2 on atapci0
> ata5: channel #3 on atapci0
> isab0: <PCI-ISA bridge> at device 7.0 on pci0
> isa0: <ISA bus> on isab0
> atapci1: <AMD 8111 UDMA133 controller> port 0x1000-0x100f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
> ata0: channel #0 on atapci1
> ata1: channel #1 on atapci1
> pci0: <bridge> at device 7.3 (no driver attached)
> pcib3: <ACPI Host-PCI bridge> on acpi0
> pci8: <ACPI PCI bus> on pcib3
> pcib4: <ACPI PCI-PCI bridge> at device 3.0 on pci8
> pci9: <ACPI PCI bus> on pcib4
> pci8: <base peripheral, interrupt controller> at device 3.1 (no driver attached)
> pcib5: <ACPI PCI-PCI bridge> at device 4.0 on pci8
> pci14: <ACPI PCI bus> on pcib5
> twe0: <3ware Storage Controller. Driver version 1.50.01.002> port 0x4000-0x400f mem 0xf0800000-0xf0ffffff irq 30 at device 2.0 on pci14
> twe0: 4 ports, Firmware FE7X 1.05.00.068, BIOS BE7X 1.08.00.048
> pci8: <base peripheral, interrupt controller> at device 4.1 (no driver attached)
> atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
> atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
> kbd0 at atkbd0
> fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> sio0: type 16550A
> sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
> sio1: type 16550A
> ppc0: cannot reserve I/O port range
> ppc0: cannot reserve I/O port range
> orm0: <ISA Option ROMs> at iomem 0xd0000-0xd0fff,0xc0000-0xcffff on isa0
> ppc0: cannot reserve I/O port range
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> Timecounters tick every 1.000 msec
> ahc0: Someone reset channel A
> ad0: 152627MB <SAMSUNG SP1614N/TM100-30> [310101/16/63] at ata0-master UDMA100
> ad2: 286188MB <Maxtor 6B300R0/BAH41B70> [581463/16/63] at ata1-master UDMA133
> Waiting 15 seconds for SCSI devices to settle
> twed0: <Unit 0, RAID5, Normal> on twe0
> twed0: 305253MB (625159424 sectors)
> sa0 at ahc0 bus 0 target 3 lun 0
> sa0: <EXABYTE EXB-89008E00012F V39e> Removable Sequential Access SCSI-2 device
> sa0: 10.000MB/s transfers (10.000MHz, offset 15)
> SMP: AP CPU #1 Launched!
> Mounting root from ufs:/dev/twed0s1a
> WARNING: / was not properly dismounted
> WARNING: /home/crib was not properly dismounted
> WARNING: /home/domus was not properly dismounted
> WARNING: /tmp was not properly dismounted
> WARNING: /u was not properly dismounted
> WARNING: /u/backup/nearline was not properly dismounted
> WARNING: /u/backup/online was not properly dismounted
> WARNING: /u/news was not properly dismounted
> WARNING: /u/news/nntpcached was not properly dismounted
> WARNING: /usr was not properly dismounted
> WARNING: /var was not properly dismounted
> WARNING: /var/tmp was not properly dismounted
> bge0: firmware handshake timed out
> bge0: RX CPU self-diagnostics failed!
> bge0: watchdog timeout -- resetting
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
>


