Date:      Tue, 01 Feb 2011 10:17:47 -0500
From:      Mike Tancsa <mike@sentex.net>
To:        Adam Vande More <amvandemore@gmail.com>, freebsd-fs@freebsd.org
Subject:   Re: ZFS help! (solved)
Message-ID:  <4D48241B.2040807@sentex.net>
In-Reply-To: <AANLkTi=3Betpki=uDkH7vc0jNOEOuT7R5pphCzUROH-O@mail.gmail.com>
References:  <4D43475D.5050008@sentex.net>	<4D44D775.50507@jrv.org>	<4D470A65.4050000@sentex.net>	<AANLkTi=Z=Onduz9uMuoRgJNXEUJeNKU%2BWw=Rgi8TP2tP@mail.gmail.com>	<4D471729.3050804@sentex.net> <AANLkTi=3Betpki=uDkH7vc0jNOEOuT7R5pphCzUROH-O@mail.gmail.com>

On 1/31/2011 3:32 PM, Adam Vande More wrote:
>>
> 
> maybe the meta data stuff is stored above it in /tank1/?  I don't know.  I'm
> pretty sure you can use a newer version of ZFS to rewind the transaction
> groups until you get back to a good state, but there's probably a lot in
> this scenario that would prevent that from being a viable solution.  If you
> do get it resolved please post the resolution.

OK, to summarize what happened, for the archives: this is RELENG_8 (from
the end of January, on AMD64, with 8 GB of RAM).

On my DR backup server, which holds backups of backups, I decided to
expand an existing pool.  I added a new eSATA cage with an integrated
port multiplier (PM):

2011-01-28.11:45:43 zpool add tank1 raidz /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3

0(offsite)# camcontrol devlist
<WDC WD1001FALS-00J7B1 05.00K05>   at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD1001FALS-00J7B1 05.00K05>   at scbus0 target 1 lun 0 (pass1,ada1)
<WDC WD1001FALS-00J7B1 05.00K05>   at scbus0 target 2 lun 0 (pass2,ada2)
<WDC WD1001FALS-00J7B1 05.00K05>   at scbus0 target 3 lun 0 (pass3,ada3)
<Port Multiplier 47261095 1f06>    at scbus0 target 15 lun 0 (pass4,pmp0)
<WDC WD2001FASS-00U0B0 01.00101>   at scbus1 target 0 lun 0 (pass5,ada4)
<WDC WD1501FASS-00W2B0 05.01D05>   at scbus1 target 1 lun 0 (pass6,ada5)
<WDC WD1501FASS-00W2B0 05.01D05>   at scbus1 target 2 lun 0 (pass7,ada6)
<WDC WD1501FASS-00W2B0 05.01D05>   at scbus1 target 3 lun 0 (pass8,ada7)
<WDC WD1501FASS-00W2B0 05.01D05>   at scbus1 target 4 lun 0 (pass9,ada8)
<Port Multiplier 47261095 1f06>    at scbus1 target 15 lun 0 (pass10,pmp1)
0(offsite)#

Controller is an Sil3134 (siis and ahci drivers)

Shortly after I brought the new set of drives online, the drive cage
failed and started presenting the drives in some odd way, such that the
ZFS labels on the drives could no longer be read:

# zdb -l /dev/ada0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
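
To check several disks quickly, the output above can be summarized by counting the labels that failed to unpack. This is only a minimal sketch: count_failed_labels is a hypothetical helper name, not something from the original report, and it assumes zdb prints "failed to unpack label N" once per unreadable label, as in the output above.

```shell
# Hypothetical helper (not part of the original post): read `zdb -l`
# output on stdin and print how many labels failed to unpack.
count_failed_labels() {
    # grep -c prints the count but exits non-zero when the count is 0,
    # so mask the exit status to keep callers running under `set -e` happy.
    grep -c '^failed to unpack label' || true
}
```

For example, `zdb -l /dev/ada0 | count_failed_labels` would report all four labels as unreadable for the output shown above, since ZFS keeps four label copies per device.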


# zpool status -v
  pool: tank1
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       UNAVAIL      0     0     0  insufficient replicas
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            ada0    UNAVAIL      0     0     0  cannot open
            ada1    UNAVAIL      0     0     0  cannot open
            ada2    UNAVAIL      0     0     0  cannot open
            ada3    UNAVAIL      0     0     0  cannot open


Pulling the drives out and putting them in a new drive cage allowed me
to see the file system as being online, albeit with errors.  The next
step was to delete the 2 problem files.

On bootup, it looked like this:

# zpool status -v
  pool: tank1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /tank1/argus-data/previous/argus-sites-radium.2011.01.28.16.00
        tank1/argus-data:<0xc6>
        /tank1/argus-data/argus-sites-radium



I killed those files via rm, after which zpool status -v showed:

errors: Permanent errors have been detected in the following files:

        tank1/argus-data:<0xc5>
        tank1/argus-data:<0xc6>
        tank1/argus-data:<0xc7>


So I started a scrub, and once it was done there were no errors and all
is clean!

0(offsite)# zpool status
  pool: tank1
 state: ONLINE
 scrub: scrub completed after 7h32m with 0 errors on Mon Jan 31 23:00:46 2011
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: No known data errors
0(offsite)#
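
For anyone finding this in the archives, the recovery steps above can be sketched roughly as follows. The pool name and file paths are the ones from this report; pool_is_clean is a hypothetical helper added for illustration, and the destructive commands are left commented out on purpose.

```shell
# Hypothetical helper (not part of the original post): succeed only if
# the `zpool status` output read on stdin reports no known data errors.
pool_is_clean() {
    grep -q '^errors: No known data errors$'
}

# Sequence as described in the post (commented out so the snippet is
# safe to paste; adjust the pool name and paths for your own system):
#   rm /tank1/argus-data/previous/argus-sites-radium.2011.01.28.16.00
#   rm /tank1/argus-data/argus-sites-radium
#   zpool scrub tank1
#   # ...wait for the scrub to finish, then:
#   zpool status -v tank1 | pool_is_clean && echo "tank1 is clean"
```

Note that zpool scrub returns right away and the scrub runs in the background (7h32m here), so the final check only means something once zpool status reports the scrub as completed.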


	---Mike


