From owner-freebsd-questions@FreeBSD.ORG  Sun Mar 16 08:52:27 2014
In-Reply-To: <CAPYfQ9wtVDV=o21YRhWxpC6M5dchG1LY9zzmX_BDQujKHg6tfA@mail.gmail.com>
References: <CAPYfQ9z-YUzKDAh3=V3_m1wmDtds4NzcewTq0wLUD9LWt3VaGA@mail.gmail.com>
 <20140316130936.3f2d18e0@X220.alogt.com>
 <CAPYfQ9ycxEr+-qPBC6qY6tvLrTMqT3guU+8q+bK2_RAj=WH1tw@mail.gmail.com>
 <20140316134309.2edc258a@X220.alogt.com>
 <CAPYfQ9ztmzYWSRoNLJk2Z-mTAdDti48ZOJrKT0LEEpuWf5SqHg@mail.gmail.com>
 <20140316142213.459009dc@X220.alogt.com>
 <CAPYfQ9yUOXG7uHh120vuERZLggo3QQSguck9RcJn62h8-yugyw@mail.gmail.com>
 <20140316151807.140c7ead@X220.alogt.com>
 <CAPYfQ9wtVDV=o21YRhWxpC6M5dchG1LY9zzmX_BDQujKHg6tfA@mail.gmail.com>
Date: Sun, 16 Mar 2014 03:52:27 -0500
Message-ID: <CAPYfQ9yJNZoOiOAm8oP9jsm2umhHnLLRWqE+eBJMjAyTueExeQ@mail.gmail.com>
Subject: Re: Another case of the vanishing disk
From: cruxpot <cruxpot@gmail.com>
To: Erich Dollansky <erichsfreebsdlist@alogt.com>
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-questions@freebsd.org

I moved the power cable from the UPS to a surge protector. Still have the
same problem. Every second the Seek_Error_Rate raw value goes up; some
drives report more at a time than others, but all 4 are doing it.
# smartctl -a /dev/ada2 | egrep 'Error|ECC'
Error logging capability:        (0x01) Error logging supported.
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       15160
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260695
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x001a   037   004   000    Old_age   Always       -       15160
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
# smartctl -a /dev/ada2 | egrep 'Error|ECC'
Error logging capability:        (0x01) Error logging supported.
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       15160
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260696
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x001a   037   004   000    Old_age   Always       -       15160
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
# smartctl -a /dev/ada2 | egrep 'Error|ECC'
Error logging capability:        (0x01) Error logging supported.
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       15160
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260697
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x001a   037   004   000    Old_age   Always       -       15160
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0


I will be stunned if it's yet another bad power supply, but I will
have to find another one somewhere and test this again. The drives are
all still under warranty.
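
Comparing repeated smartctl runs by eye gets tedious, so here is a
minimal sketch of one way to see which raw values moved between two
snapshots. It is only an illustration: the helper names
(parse_raw_values, raw_deltas) are made up for this example, it assumes
each attribute row is on a single line as smartctl prints it, and the
sample text is trimmed from the output quoted in this message.

```python
import re

# Matches one row of the smartctl attribute table, for example:
#   7 Seek_Error_Rate  0x000f  078  060  030  Pre-fail  Always  -  67260695
# Captured groups: ID, name, value, worst, threshold, raw value.
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_raw_values(smartctl_text):
    """Return {attribute_name: raw_value} for every attribute row found."""
    raw = {}
    for line in smartctl_text.splitlines():
        match = ATTR_ROW.match(line)
        if match:
            raw[match.group(2)] = int(match.group(6))
    return raw


def raw_deltas(before, after):
    """Return {attribute_name: change} for raw values that differ."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


# Sample rows taken from the first run above (hypothetical trimmed input).
first = """\
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       15160
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260695
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""
# Same rows with the Seek_Error_Rate raw value from the third run.
second = first.replace("67260695", "67260697")

print(raw_deltas(parse_raw_values(first), parse_raw_values(second)))
```

Feeding it the saved output of two real runs per drive would show at a
glance whether only Seek_Error_Rate is moving or the read and ECC
counters are climbing as well.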

On Sun, Mar 16, 2014 at 2:42 AM, cruxpot <cruxpot@gmail.com> wrote:
> It's an active PFC PSU plugged into an UPS which is not. Maybe that is
> the problem. I will try isolating some things tomorrow after the scrub
> has completed to see if I can get the errors to stop incrementing.
>
> On Sun, Mar 16, 2014 at 2:18 AM, Erich Dollansky
> <erichsfreebsdlist@alogt.com> wrote:
>> Hi,
>>
>> On Sun, 16 Mar 2014 02:00:51 -0500
>> cruxpot <cruxpot@gmail.com> wrote:
>>
>>> Seek_Error_Rate, Hardware_ECC_Recovered, Raw_Read_Error_Rate are all
>>> increasing steadily for all four disks. Does this have something to do
>>> with the recent resilver of the disk or the ongoing scrub (16.5%
>>> completed)?
>>>
>> the seek error rate could be linked to a failing power supply. The rest
>> should be just internal to the drive. Of course, also here a failing
>> power supply can be the cause.
>>
>> Can you put the drives into another machine?
>>
>> You must try to isolate the problem. It is a hardware problem on some
>> level. You must find out what it could be.
>>
>> Or just run a single disk on plain UFS. And connect it to some other
>> plug. And disconnect all other drives.
>>
>> Erich