From owner-freebsd-questions@FreeBSD.ORG Sun Mar 16 08:52:27 2014
Date: Sun, 16 Mar 2014 03:52:27 -0500
Subject: Re: Another case of the vanishing disk
From: cruxpot
To: Erich Dollansky
Cc: freebsd-questions@freebsd.org
References: <20140316130936.3f2d18e0@X220.alogt.com>
 <20140316134309.2edc258a@X220.alogt.com>
 <20140316142213.459009dc@X220.alogt.com>
 <20140316151807.140c7ead@X220.alogt.com>

I moved the power cable from the UPS to a surge protector. Still have
the same problem. Every second I see the seek error rate increase; some
drives report more at a time than others, but all 4 are doing it.

# smartctl -a /dev/ada2 | egrep 'Error|ECC'
Error logging capability:        (0x01) Error logging supported.
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       15160
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260695
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x001a   037   004   000    Old_age   Always       -       15160
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged

# smartctl -a /dev/ada2 | egrep 'Error|ECC'
Error logging capability:        (0x01) Error logging supported.
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       15160
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260696
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x001a   037   004   000    Old_age   Always       -       15160
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged

# smartctl -a /dev/ada2 | egrep 'Error|ECC'
Error logging capability:        (0x01) Error logging supported.
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       15160
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260697
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x001a   037   004   000    Old_age   Always       -       15160
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

I will be stunned if it's yet another bad power supply, but I will have
to find another one somewhere and test this again. The drives are all
still under warranty.

On Sun, Mar 16, 2014 at 2:42 AM, cruxpot wrote:
> It's an active PFC PSU plugged into an UPS which is not. Maybe that is
> the problem. I will try isolating some things tomorrow after the scrub
> has completed to see if I can get the errors to stop incrementing.
>
> On Sun, Mar 16, 2014 at 2:18 AM, Erich Dollansky
> wrote:
>> Hi,
>>
>> On Sun, 16 Mar 2014 02:00:51 -0500
>> cruxpot wrote:
>>
>>> Seek_Error_Rate, Hardware_ECC_Recovered, Raw_Read_Error_Rate are all
>>> increasing steadily for all four disks. Does this have something to do
>>> with the recent resilver of the disk or the ongoing scrub (16.5%
>>> completed)?
>>>
>> the seek error rate could be linked to a failing power supply. The rest
>> should be just internal to the drive. Of course, also here a failing
>> power supply can be the cause.
>>
>> Can you put the drives into another machine?
>>
>> You must try to isolate the problem. It is a hardware problem on some
>> level. You must find out what it could be.
>>
>> Or just run a single disk on plain UFS. And connect it to some other
>> plug. And disconnect all other drives.
>>
>> Erich
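
For anyone who wants to compare successive smartctl samples like the ones
at the top of this message without reading them by eye, here is a rough
sketch in Python. It only parses attribute rows and subtracts raw values
between two samples. The names parse_raw_values, raw_deltas and SAMPLES are
made up for this example, the sample text is trimmed to two rows copied
from the output above, and the regular expression assumes the usual
ten-column attribute layout that smartctl -a prints.

```python
import re

# One row of the smartctl attribute table, assuming the usual ten columns:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}"
    r"\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_raw_values(smartctl_text):
    """Return {attribute_name: raw_value} for each attribute row found."""
    values = {}
    for line in smartctl_text.splitlines():
        match = ATTR_ROW.match(line)
        if match:
            values[match.group(2)] = int(match.group(3))
    return values


def raw_deltas(before, after):
    """Return the change in raw value for attributes present in both samples."""
    return {name: after[name] - before[name] for name in after if name in before}


# Hypothetical container: two rows from each of the three samples above.
SAMPLES = [
    """\
  1 Raw_Read_Error_Rate 0x000f 110 099 006 Pre-fail Always - 15160
  7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 67260695
""",
    """\
  1 Raw_Read_Error_Rate 0x000f 110 099 006 Pre-fail Always - 15160
  7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 67260696
""",
    """\
  1 Raw_Read_Error_Rate 0x000f 110 099 006 Pre-fail Always - 15160
  7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 67260697
""",
]

if __name__ == "__main__":
    parsed = [parse_raw_values(text) for text in SAMPLES]
    # Compare each sample with the one taken just before it.
    for earlier, later in zip(parsed, parsed[1:]):
        print(raw_deltas(earlier, later))
```

To use it on live data, the output of smartctl -a for one drive could be
captured to a file at intervals and each file passed through
parse_raw_values. The script only reports how much each raw value moved;
what a given raw value means is vendor-specific and is not decoded here.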