From: Wilko Bulte
To: Clint Olsen
Cc: stable@freebsd.org
Date: Tue, 16 Sep 2008 20:32:31 +0200
Subject: Re: Help debugging DMA_READ errors
Message-ID: <20080916183231.GG41919@freebie.xs4all.nl>
In-Reply-To: <20080916181903.GC7540@0lsen.net>

Quoting Clint Olsen, who wrote on Tue, Sep 16, 2008 at 11:19:03AM -0700 ..
> Hi Jeremy:
>
> Thanks for your detailed response.  Here are the answers I have thus far:
>
> On Sep 16, Jeremy Chadwick wrote:
> > acd0 is a CD/DVD drive.  ad4 is a hard disk.  What exactly were you
> > doing with the system at the time these errors appeared?  Were you using
> > the CD/DVD drive?  Was there a disc in the drive that was mounted?
> > If none of these things, I'm baffled as to what would read acd0 and
> > cause what you see here.
>
> I was not at the system at the time.  I never have had a disk in the drive
> nor is /cdrom mounted currently.  I have dump backups that run in the
> middle of the night on the various filesystems.

A long shot: might the drives have cooling problems?

Wilko

>
> > Can you please provide full details of what these disks are connected
> > to?  I'd like to see dmesg output for ata0, ata2, and ata3, as well as
> > the atapci devices those ataX devices are attached to, ditto with
> > vmstat -i output.  Are there any other errors in your logs around
> > that time (e.g. watchdog timeouts of any kind on network devices, etc.?)
>
> # dmesg | grep -i ata
> atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 31.1 on pci0
> ata0: on atapci0
> ata1: on atapci0
> atapci1: port 0xeff0-0xeff7,0xefe4-0xefe7,0xefa8-0xefaf,0xefe0-0xefe3,0xef60-0xef6f irq 18 at device 31.2 on pci0
> ata2: on atapci1
> ata3: on atapci1
>
> I skipped the disks, of course.
>
> > Additionally, it would be very useful if you could install
> > ports/sysutils/smartmontools and provide the following output:
> >
> > # smartctl -a /dev/ad0
> > # smartctl -a /dev/ad4
>
> See attached.
>
> > http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting
> >
> > The bottom line is that, if the problems you're seeing are the "same
> > thing" others are seeing, then you are not alone.  As I said initially,
> > finding the source of these problems is difficult, and they are often
> > "unique" to each individual's machine.
> > For some, replacing cables, the
> > entire motherboard, disk controller, or just the PSU helped; for others,
> > the problem disappeared on its own; in other cases, the problem was
> > so severe that they ended up switching to Linux.
>
> I'll take a look at this page.
>
> Thanks,
>
> -Clint
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>
> smartctl version 5.38 [i386-portbld-freebsd6.3] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Caviar SE family
> Device Model:     WDC WD1200JB-32EVA0
> Serial Number:    WD-WMAEL1302890
> Firmware Version: 15.05R15
> User Capacity:    120,034,123,776 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   6
> ATA Standard is:  Exact ATA specification draft version not indicated
> Local Time is:    Tue Sep 16 11:11:04 2008 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x82) Offline data collection activity
>                                         was completed without error.
>                                         Auto Offline Data Collection: Enabled.
> Self-test execution status:      ( 0)   The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                 (3801) seconds.
> Offline data collection
> capabilities:                    (0x79) SMART execute Offline immediate.
>                                         No Auto Offline data collection support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         No General Purpose Logging support.
> Short self-test routine
> recommended polling time:        ( 2) minutes.
> Extended self-test routine
> recommended polling time:        ( 53) minutes.
> Conveyance self-test routine
> recommended polling time:        ( 5) minutes.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0007   162   148   021    Pre-fail  Always       -       2433
>   4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       79
>   5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       7
>   7 Seek_Error_Rate         0x000b   100   253   051    Pre-fail  Always       -       0
>   9 Power_On_Hours          0x0032   042   042   000    Old_age   Always       -       42740
>  10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
>  11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       79
> 194 Temperature_Celsius     0x0022   111   253   000    Old_age   Always       -       39
> 196 Reallocated_Event_Count 0x0032   193   193   000    Old_age   Always       -       7
> 197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
> 199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0009   200   155   051    Pre-fail  Offline      -       0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> No self-tests have been logged.  [To run self-tests, use: smartctl -t]
>
>
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
> smartctl version 5.38 [i386-portbld-freebsd6.3] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Caviar SE Serial ATA family
> Device Model:     WDC WD1200JD-00GBB0
> Serial Number:    WD-WMAET1326141
> Firmware Version: 02.05D02
> User Capacity:    120,034,123,776 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   6
> ATA Standard is:  Exact ATA specification draft version not indicated
> Local Time is:    Tue Sep 16 11:11:17 2008 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x82) Offline data collection activity
>                                         was completed without error.
>                                         Auto Offline Data Collection: Enabled.
> Self-test execution status:      ( 0)   The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                 (3801) seconds.
> Offline data collection
> capabilities:                    (0x79) SMART execute Offline immediate.
>                                         No Auto Offline data collection support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         No General Purpose Logging support.
> Short self-test routine
> recommended polling time:        ( 2) minutes.
> Extended self-test routine
> recommended polling time:        ( 53) minutes.
> Conveyance self-test routine
> recommended polling time:        ( 5) minutes.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0007   160   139   021    Pre-fail  Always       -       2508
>   4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       55
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
>   9 Power_On_Hours          0x0032   051   051   000    Old_age   Always       -       36471
>  10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
>  11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       55
> 194 Temperature_Celsius     0x0022   104   253   000    Old_age   Always       -       46
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
> 199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       2
> 200 Multi_Zone_Error_Rate   0x0009   200   155   051    Pre-fail  Offline      -       0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> No self-tests have been logged.  [To run self-tests, use: smartctl -t]
>
>
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
--- End of quoted text ---
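
For anyone who wants to check the cooling question against the quoted smartctl reports without reading the tables by eye, here is a rough sketch in Python. It pulls the RAW_VALUE column out of the attribute table and lists the values that may deserve a second look. The sample rows are copied from the second report quoted above. The function names and the 45 C cut-off are my own assumptions for illustration only; the cut-off is not a vendor limit and not something smartctl reports.

```python
# Three rows copied from the second smartctl report quoted in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   104   253   000    Old_age   Always       -       46
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       2
"""


def parse_attributes(text):
    """Map attribute name -> integer RAW_VALUE for each table row in text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                # Some drives report non-numeric raw values; skip those rows.
                continue
    return attrs


def worth_a_look(attrs, max_temp_c=45):
    """Return notes on values that may merit attention.

    max_temp_c is a hypothetical cut-off chosen for this sketch.
    """
    notes = []
    temp = attrs.get("Temperature_Celsius")
    if temp is not None and temp > max_temp_c:
        notes.append("temperature %d C is above %d C" % (temp, max_temp_c))
    if attrs.get("UDMA_CRC_Error_Count", 0) > 0:
        notes.append("UDMA CRC errors logged (cabling may be worth checking)")
    if attrs.get("Reallocated_Sector_Ct", 0) > 0:
        notes.append("reallocated sectors present")
    return notes


if __name__ == "__main__":
    for note in worth_a_look(parse_attributes(SAMPLE)):
        print(note)
```

To try it on a live system, the text returned by `smartctl -a /dev/ad4` could be passed to `parse_attributes` in place of `SAMPLE`. Nothing here replaces reading the full report.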