Date:      Sun, 16 Mar 2014 03:52:27 -0500
From:      cruxpot <cruxpot@gmail.com>
To:        Erich Dollansky <erichsfreebsdlist@alogt.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Another case of the vanishing disk
Message-ID:  <CAPYfQ9yJNZoOiOAm8oP9jsm2umhHnLLRWqE%2BeBJMjAyTueExeQ@mail.gmail.com>
In-Reply-To: <CAPYfQ9wtVDV=o21YRhWxpC6M5dchG1LY9zzmX_BDQujKHg6tfA@mail.gmail.com>
References:  <CAPYfQ9z-YUzKDAh3=V3_m1wmDtds4NzcewTq0wLUD9LWt3VaGA@mail.gmail.com> <20140316130936.3f2d18e0@X220.alogt.com> <CAPYfQ9ycxEr%2B-qPBC6qY6tvLrTMqT3guU%2B8q%2BbK2_RAj=WH1tw@mail.gmail.com> <20140316134309.2edc258a@X220.alogt.com> <CAPYfQ9ztmzYWSRoNLJk2Z-mTAdDti48ZOJrKT0LEEpuWf5SqHg@mail.gmail.com> <20140316142213.459009dc@X220.alogt.com> <CAPYfQ9yUOXG7uHh120vuERZLggo3QQSguck9RcJn62h8-yugyw@mail.gmail.com> <20140316151807.140c7ead@X220.alogt.com> <CAPYfQ9wtVDV=o21YRhWxpC6M5dchG1LY9zzmX_BDQujKHg6tfA@mail.gmail.com>

I moved the power cable to a surge protector instead of the UPS and
still have the same problem. Every second the seek error rate count
goes up; some drives report more at a time than others, but all 4 are
doing it.
# smartctl -a /dev/ada2 | egrep 'Error|ECC'
Error logging capability:        (0x01) Error logging supported.
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       15160
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260695
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x001a   037   004   000    Old_age   Always       -       15160
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
# smartctl -a /dev/ada2 | egrep 'Error|ECC'
Error logging capability:        (0x01) Error logging supported.
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       15160
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260696
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x001a   037   004   000    Old_age   Always       -       15160
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
# smartctl -a /dev/ada2 | egrep 'Error|ECC'
Error logging capability:        (0x01) Error logging supported.
  1 Raw_Read_Error_Rate     0x000f   110   099   006    Pre-fail  Always       -       15160
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       67260697
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x001a   037   004   000    Old_age   Always       -       15160
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
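
Rather than comparing repeated runs by eye, the change in a raw
counter between samples could be printed directly. This is only a
minimal sketch: the function names smart_raw and watch_attr are made
up for this example, and it assumes the raw value is a plain integer
in the tenth column of `smartctl -A` output (true for the attributes
shown above, not necessarily for every attribute or drive).

```shell
#!/bin/sh
# Hypothetical helpers, not part of smartmontools.

# Read `smartctl -A` style output on stdin and print the RAW_VALUE
# column (field 10) of the attribute whose name is given as $1.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Sample one attribute COUNT times, INTERVAL seconds apart, and print
# the raw value plus its change since the previous sample.
# usage: watch_attr DEVICE ATTRIBUTE INTERVAL COUNT
watch_attr() {
    dev=$1
    attr=$2
    interval=$3
    count=$4
    prev=
    i=0
    while [ "$i" -lt "$count" ]; do
        cur=$(smartctl -A "$dev" | smart_raw "$attr")
        # The first sample has nothing to compare against.
        if [ -n "$prev" ]; then
            echo "$attr raw=$cur delta=$((cur - prev))"
        fi
        prev=$cur
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Run as root, something like `watch_attr /dev/ada2 Seek_Error_Rate 1 10`
would take ten samples one second apart and print nine delta lines, one
per pair of consecutive samples.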



I will be stunned if it's yet another bad power supply, but I will
have to find another one somewhere and test this again. The drives are
all still under warranty.

On Sun, Mar 16, 2014 at 2:42 AM, cruxpot <cruxpot@gmail.com> wrote:
> It's an active PFC PSU plugged into a UPS which is not. Maybe that is
> the problem. I will try isolating some things tomorrow after the scrub
> has completed to see if I can get the errors to stop incrementing.
>
> On Sun, Mar 16, 2014 at 2:18 AM, Erich Dollansky
> <erichsfreebsdlist@alogt.com> wrote:
>> Hi,
>>
>> On Sun, 16 Mar 2014 02:00:51 -0500
>> cruxpot <cruxpot@gmail.com> wrote:
>>
>>> Seek_Error_Rate, Hardware_ECC_Recovered, Raw_Read_Error_Rate are all
>>> increasing steadily for all four disks. Does this have something to do
>>> with the recent resilver of the disk or the ongoing scrub (16.5%
>>> completed)?
>>>
>> The seek error rate could be linked to a failing power supply. The rest
>> should be internal to the drive, although a failing power supply can
>> cause those as well.
>>
>> Can you put the drives into another machine?
>>
>> You must try to isolate the problem. It is a hardware problem on some
>> level. You must find out what it could be.
>>
>> Or just run a single disk on plain UFS, connected to some other plug,
>> with all other drives disconnected.
>>
>> Erich


