From owner-freebsd-stable@FreeBSD.ORG Fri Jul 10 18:40:08 2009 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D557E1065672 for ; Fri, 10 Jul 2009 18:40:08 +0000 (UTC) (envelope-from quakelee@geekcn.org) Received: from tarsier.delphij.net (delphij-pt.tunnel.tserv2.fmt.ipv6.he.net [IPv6:2001:470:1f03:2c9::2]) by mx1.freebsd.org (Postfix) with ESMTP id EFAF48FC0C for ; Fri, 10 Jul 2009 18:40:07 +0000 (UTC) (envelope-from quakelee@geekcn.org) Received: from tarsier.geekcn.org (tarsier.geekcn.org [211.166.10.233]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by tarsier.delphij.net (Postfix) with ESMTPS id 216105C024 for ; Sat, 11 Jul 2009 02:40:07 +0800 (CST) Received: from localhost (tarsier.geekcn.org [211.166.10.233]) by tarsier.geekcn.org (Postfix) with ESMTP id DD8C755CD558 for ; Sat, 11 Jul 2009 02:40:06 +0800 (CST) X-Virus-Scanned: amavisd-new at geekcn.org Received: from tarsier.geekcn.org ([211.166.10.233]) by localhost (mail.geekcn.org [211.166.10.233]) (amavisd-new, port 10024) with ESMTP id 7INBONn2bQBj for ; Sat, 11 Jul 2009 02:39:07 +0800 (CST) Received: from qlmacbook.xbaynet.com (unknown [222.131.114.89]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by tarsier.geekcn.org (Postfix) with ESMTPSA id 51BC055CD54A for ; Sat, 11 Jul 2009 02:38:58 +0800 (CST) DomainKey-Signature: a=rsa-sha1; s=default; d=geekcn.org; c=nofws; q=dns; h=message-id:date:from:user-agent:mime-version:to:subject: content-type:content-transfer-encoding; b=npQuj+ZvnZn1sgS+KB9iedDta894Bpx+27SDCmhmGrzWNxYn1HAp5TjaMpLLevAIO FP6+tQ6S6qRLT+pYkup8g== Message-ID: <4A578AB6.3000600@geekcn.org> Date: Sat, 11 Jul 2009 02:38:46 +0800 From: Chao Shin User-Agent: Thunderbird 2.0.0.22 (Macintosh/20090605) MIME-Version: 1.0 To: freebsd-stable@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Subject: weird sata DMA error X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Jul 2009 18:40:09 -0000 Hi all, I have a dell optiplex 755 box. I plug a second disk on it for backup logs. It always report error message like below when heavy read load. ad9: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=1511796863 ad9: TIMEOUT - READ_DMA48 retrying (0 retries left) LBA=1511796863 ad9: FAILURE - READ_DMA48 timed out LBA=1511796863 g_vfs_done():ad9s1d[READ(offset=774039961600, length=16384)]error = 5 ad9: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=1513678143 ad9: TIMEOUT - READ_DMA48 retrying (0 retries left) LBA=1513678143 ad9: FAILURE - READ_DMA48 timed out LBA=1513678143 g_vfs_done():ad9s1d[READ(offset=775003176960, length=16384)]error = 5 We had changed another disk, sata cable, box and upgrade freebsd from 7.1-RELEASE-p5 to 7.2-RELEASE. but error message still come out. I have tried smartctl to send self-test command to disk, but it didn't start test, I have no idea about that now. There is smart message and dmesg below: # smartctl -a /dev/ad9 smartctl version 5.38 [amd64-portbld-freebsd7.2] Copyright (C) 2002-8 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Device Model: WDC WD15EADS-00R6B0 Serial Number: WD-WCAVY0256398 Firmware Version: 01.00A01 User Capacity: 1,500,301,910,016 bytes Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 8 ATA Standard is: Exact ATA specification draft version not indicated Local Time is: Thu Jul 9 13:42:22 2009 CST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x85) Offline data collection activity was aborted by an interrupting command from host. Auto Offline Data Collection: Enabled. Self-test execution status: ( 249) Self-test routine in progress... 90% of test remaining. Total time to complete Offline data collection: (32400) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 255) minutes. Conveyance self-test routine recommended polling time: ( 5) minutes. SCT capabilities: (0x303f) SCT Status supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 198 195 051 Pre-fail Always - 20340 3 Spin_Up_Time 0x0027 155 144 021 Pre-fail Always - 9250 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 22 5 Reallocated_Sector_Ct 0x0033 193 193 140 Pre-fail Always - 52 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0 9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 904 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 20 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 4 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 18 194 Temperature_Celsius 0x0022 115 100 000 Old_age Always - 37 196 Reallocated_Event_Count 0x0032 149 149 000 Old_age Always - 51 197 Current_Pending_Sector 0x0032 196 195 000 Old_age Always - 1213 198 Offline_Uncorrectable 0x0030 200 196 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0008 006 006 000 Old_age Offline - 38929 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 No self-tests have been logged. [To run self-tests, use: smartctl -t] SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. There is hardware info, Copyright (c) 1992-2009 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 7.2-RELEASE-p2 #0: Thu Jul 9 12:46:37 CST 2009 root@birdspark1.intra.umessage.com.cn:/usr/obj/usr/src/sys/BIRD Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz (2327.50-MHz K8-class CPU) Origin = "GenuineIntel" Id = 0x6fb Stepping = 11 Features=0xbfebfbff Features2=0xe3fd AMD Features=0x20100800 AMD Features2=0x1 Cores per package: 2 usable memory = 6287015936 (5995 MB) avail memory = 6046175232 (5766 MB) ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 ioapic0: Changing APIC ID to 8 ioapic0 irqs 0-23 on motherboard lapic0: Forcing LINT1 to edge trigger kbd1 at kbdmux0 acpi0: on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0 acpi_hpet0: iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 900 acpi_button0: on acpi0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 pcib1: irq 16 at device 1.0 on pci0 pci1: on pcib1 vgapci0: port 0xec90-0xec97 mem 0xfea00000-0xfea7ffff,0xd0000000-0xdfffffff,0xfeb00000-0xfebfffff irq 16 at device 2.0 on pci0 agp0: on vgapci0 agp0: detected 7164k stolen memory agp0: aperture size is 256M vgapci1: mem 0xfea80000-0xfeafffff at device 2.1 on pci0 pci0: at device 3.0 (no driver attached) atapci0: port 0xfe80-0xfe87,0xfe90-0xfe93,0xfea0-0xfea7,0xfeb0-0xfeb3,0xfef0-0xfeff irq 18 at device 3.2 on pci0 atapci0: [ITHREAD] ata2: on atapci0 ata2: [ITHREAD] ata3: on atapci0 ata3: [ITHREAD] pci0: at device 3.3 (no driver attached) em0: port 0xecc0-0xecdf mem 0xfe9e0000-0xfe9fffff,0xfe9db000-0xfe9dbfff irq 21 at device 25.0 on pci0 em0: Using MSI interrupt em0: [FILTER] em0: Ethernet address: 00:1a:a0:d9:35:9b uhci0: port 0xff20-0xff3f irq 16 at device 26.0 on pci0 uhci0: [GIANT-LOCKED] uhci0: [ITHREAD] usb0: on uhci0 usb0: USB revision 1.0 uhub0: on usb0 uhub0: 2 ports with 2 removable, self powered uhci1: port 0xff00-0xff1f irq 17 at device 26.1 on pci0 uhci1: [GIANT-LOCKED] uhci1: [ITHREAD] usb1: on uhci1 usb1: USB revision 1.0 uhub1: on usb1 uhub1: 2 ports with 2 removable, self powered ehci0: mem 0xfe9d9c00-0xfe9d9fff irq 22 at device 26.7 on pci0 ehci0: [GIANT-LOCKED] ehci0: [ITHREAD] usb2: waiting for BIOS to give up control usb2: EHCI version 1.0 usb2: wrong number of companions (3 != 2) usb2: companion controllers, 2 ports each: usb0 usb1 usb2: on ehci0 usb2: USB revision 2.0 uhub2: on usb2 uhub2: 6 ports with 6 removable, self powered pci0: at device 27.0 (no driver attached) pcib2: irq 16 at device 28.0 on pci0 pci2: on pcib2 uhci2: port 0xff80-0xff9f irq 23 at device 29.0 on pci0 uhci2: [GIANT-LOCKED] uhci2: [ITHREAD] usb3: on uhci2 usb3: USB revision 1.0 uhub3: on usb3 uhub3: 2 ports with 2 removable, self powered uhci3: port 0xff60-0xff7f irq 17 at device 29.1 on pci0 uhci3: [GIANT-LOCKED] uhci3: [ITHREAD] usb4: on uhci3 usb4: USB revision 1.0 uhub4: on usb4 uhub4: 2 ports with 2 removable, self powered uhci4: port 0xff40-0xff5f irq 18 at device 29.2 on pci0 uhci4: [GIANT-LOCKED] uhci4: [ITHREAD] usb5: on uhci4 usb5: USB revision 1.0 uhub5: on usb5 uhub5: 2 ports with 2 removable, self powered ehci1: mem 0xff980800-0xff980bff irq 23 at device 29.7 on pci0 ehci1: [GIANT-LOCKED] ehci1: [ITHREAD] usb6: waiting for BIOS to give up control usb6: timed out waiting for BIOS usb6: EHCI version 1.0 usb6: companion controllers, 2 ports each: usb3 usb4 usb5 usb6: on ehci1 usb6: USB revision 2.0 uhub6: on usb6 uhub6: 6 ports with 6 removable, self powered pcib3: at device 30.0 on pci0 pci3: on pcib3 isab0: at device 31.0 on pci0 isa0: on isab0 atapci1: port 0xfe00-0xfe07,0xfe10-0xfe13,0xfe20-0xfe27,0xfe30-0xfe33,0xfec0-0xfecf,0xeca0-0xecaf irq 18 at device 31.2 on pci0 atapci1: [ITHREAD] ata4: on atapci1 ata4: [ITHREAD] ata5: on atapci1 ata5: [ITHREAD] pci0: at device 31.3 (no driver attached) atapci2: port 0xfe40-0xfe47,0xfe50-0xfe53,0xfe60-0xfe67,0xfe70-0xfe73,0xfed0-0xfedf,0xecb0-0xecbf irq 18 at device 31.5 on pci0 atapci2: [ITHREAD] ata6: on atapci2 ata6: [ITHREAD] ata7: on atapci2 ata7: [ITHREAD] sio0: configured irq 4 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: configured irq 4 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 sio0: type 16550A sio0: [FILTER] cpu0: on acpi0 est0: on cpu0 est: CPU supports Enhanced Speedstep, but is not recognized. est: cpu_vendor GenuineIntel, msr 724072406000724 device_attach: est0 attach returned 6 p4tcc0: on cpu0 cpu1: on acpi0 est1: on cpu1 est: CPU supports Enhanced Speedstep, but is not recognized. est: cpu_vendor GenuineIntel, msr 724072406000724 device_attach: est1 attach returned 6 p4tcc1: on cpu1 orm0: at iomem 0xc0000-0xcb7ff,0xcb800-0xcd7ff,0xcd800-0xcffff on isa0 atkbdc0: at port 0x60,0x64 on isa0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> sio1: configured irq 3 not in bitmap of probed irqs 0 sio1: port may not be enabled vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 Timecounters tick every 1.000 msec ad8: 152587MB at ata4-master SATA300 ad9: 1430799MB at ata4-slave SATA300