Date:      Sun, 24 Jan 2010 00:51:13 -0500
From:      jhell <jhell@DataIX.net>
To:        Rich <rincebrain@gmail.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Errors on a file on a zpool: How to remove?
Message-ID:  <alpine.BSF.2.00.1001240043350.19303@pragry.qngnvk.ybpny>
In-Reply-To: <5da0588e1001232128w5a551674od0805c2ff0b884ad@mail.gmail.com>
References:  <5da0588e1001222223m773648am907267235bdcf882@mail.gmail.com> <alpine.BSF.2.00.1001231733570.2160@ibyngvyr> <5da0588e1001231541l246769eao410c5ea6ccca0de4@mail.gmail.com> <A43CB93C-06D6-406D-A8C0-4E10E85661A2@gmail.com>  <5da0588e1001231615t37c22575uedaae938be40f530@mail.gmail.com> <4B5B94B8.7070509@modulus.org> <5da0588e1001231638i349f8f17t297e970b08825441@mail.gmail.com> <alpine.BSF.2.00.1001232307590.83451@pragry.qngnvk.ybpny> <5da0588e1001232017m6c67731fwaa1d71cd86800017@mail.gmail.com> <alpine.BSF.2.00.1001232341590.19303@pragry.qngnvk.ybpny> <5da0588e1001232128w5a551674od0805c2ff0b884ad@mail.gmail.com>


On Sun, 24 Jan 2010 00:28, rincebrain@ wrote:
> On Sun, Jan 24, 2010 at 12:15 AM, jhell <jhell@dataix.net> wrote:
>> From what I see and what was already mentioned earlier in this thread is
>> meta data corruption but the checksum errors do not span across the whole
>> pool of vdevs. These are, correct me if I am wrong USB mass storage devices
>> ? SSD ?
>
> 1.5T Seagate 7200RPM drives.
>
>> In the arrangement of the devices on the system are da2,4,5 on the same hub
>> and da6,7 on another ? If this is the case you may have consolidated your
>> errors down to being a USB problem and narrowed down to where they are
>> connected to.
>
> ...no.
>
> All five are on the same SATA controller. These behaviors persist
> independent of which SATA controller they are plugged into, and I've
> tried all seven in the machine.
>
>> What happened to da1,3 ? Were these once connected to the system ? and if so
>> did you start noticing this problem occur roughly about the same period they
>> were removed ?
>
> da1,3 are being used in another disk pool, and were never a part of this pool.
>
> This is not an issue of a faulty SATA controller or SATA drives.
>
> This is an issue of "there was a single faulty stick of RAM in the machine".
>

Yeah, I read this earlier. My apologies, it slipped my mind while I was
writing (my mind went into "multi-write, single-read" mode).

> I have sixteen disks in this machine. These three are having issues
> only on these particular files, and only on these files, not on random
> portions of the disk. The disks never report read errors - the ZFS
> layer is what reports them. SMART is not reporting any difficulties in
> reading any sectors of these disks.
>
>
> I could be mistaken, but I do not believe there to be a faulty
> controller in play at this time. I've rotated the drives among the
> spares of the 24 ports on the SATA controller in question, as well as
> the on-motherboard controller, and this behavior has persisted.
>
> - Rich
>

As I was thinking earlier: you mentioned you scrubbed multiple times with
no difference. When I mentioned the attempt to remove/replace, I was
thinking that it would cause a resilvering of the drives, possibly fixing
the metadata for the affected disks if good metadata still exists somewhere.

It might be worth a shot, but I would start by replacing the devices
that are showing the errors, and keep going until you can clear the errors
without them showing up again and/or until you have replaced all the disks.
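
To make the order of operations concrete, here is a rough sketch, not a
tested procedure. It assumes a pool named "tank" and a made-up
`zpool status` listing (the names and counts below are invented, not
taken from your pool), picks out the devices with a non-zero CKSUM
column, and prints the `zpool replace` / `zpool clear` commands I would
consider running by hand. `<new-device>` is a placeholder for whatever
spare you have.

```python
# Hypothetical sample only: pool name, layout and counts are invented.
SAMPLE_STATUS = """\
  pool: tank
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0    24
          raidz1    ONLINE       0     0    24
            da2     ONLINE       0     0    12
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     7
            da6     ONLINE       0     0     5
            da7     ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
"""

# Rows that describe a grouping rather than a single disk.
GROUPING_PREFIXES = ("raidz", "mirror", "spare", "logs", "cache")


def devices_with_cksum_errors(status_text, pool):
    """Return leaf device names whose CKSUM column is not "0"."""
    bad = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True  # device table starts on the next line
            continue
        if not in_config:
            continue
        if not fields:
            break  # a blank line ends the device table
        if len(fields) < 5:
            continue
        name, cksum = fields[0], fields[4]
        if name == pool or name.startswith(GROUPING_PREFIXES):
            continue  # skip the pool row and vdev grouping rows
        if cksum != "0":
            bad.append(name)
    return bad


if __name__ == "__main__":
    # Print the commands instead of running them; review before use.
    for dev in devices_with_cksum_errors(SAMPLE_STATUS, "tank"):
        print(f"zpool replace tank {dev} <new-device>")
    print("zpool clear tank")
```

Replacing one device at a time and letting each resilver finish before
moving to the next seems the safer way to go about it; check
`zpool status -v` after each one to see whether the errors come back.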

Best of luck.

-- 

  jhell



