Date:      Wed, 27 Oct 2010 18:05:04 -0700
From:      Rumen Telbizov <telbizov@gmail.com>
To:        Artem Belevich <fbsdlist@src.cx>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Degraded zpool cannot detach old/bad drive
Message-ID:  <AANLkTikPqgoxuYp7D88Dp0t5LvjXQeO3mCXdFw6onEZN@mail.gmail.com>
In-Reply-To: <AANLkTi=h6ZJtbRHeUOpKX17uOD5_XyYmu01ZTTCCKw=_@mail.gmail.com>
References:  <AANLkTi=EWfVyZjKEYe=c0x6QvsdUcHGo2-iqGr4OaVG7@mail.gmail.com> <AANLkTinjfpnHGMvzJ5Ly8_WFXGvQmQ4D0-_TgbVBi=cf@mail.gmail.com> <AANLkTi=h6ZJtbRHeUOpKX17uOD5_XyYmu01ZTTCCKw=_@mail.gmail.com>

Thanks Artem,

I am mainly concerned with fixing this immediate problem first. After that,
if I can provide more information so that the developers can look into this
problem, I'd be happy to.

I'll try the OpenSolaris live CD and see how it goes. Either way, I'll report
back here.
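
For the record, here is roughly the sequence I understand Artem to be
suggesting. It is only a sketch: it assumes the pool is still named tank,
that the live CD can see all the disks behind the controller, and that the
import has to be forced with -f because the pool was last in use on the
FreeBSD host. The script only prints the commands for each stage (the
function name recovery_plan is made up for this sketch), so nothing touches
the pool until the commands are typed in by hand.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for each stage instead of running them.
# POOL defaults to "tank", the pool name shown in the zpool status output.
POOL="${1:-tank}"

recovery_plan() {
    pool="$1"
    # Stage 1 (OpenSolaris live CD): force the import, since the pool was
    # last in use on another host, then inspect its state.
    echo "zpool import -f $pool"
    echo "zpool status -v $pool"
    # Stage 2 (still on the live CD): export the pool cleanly.
    echo "zpool export $pool"
    # Stage 3 (after booting back into FreeBSD): import it again.
    echo "zpool import $pool"
}

recovery_plan "$POOL"
```

Note that there is deliberately no zpool upgrade step: the pool is at
version 14 and has to stay at a version FreeBSD can import.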

Cheers,
Rumen Telbizov

On Wed, Oct 27, 2010 at 5:22 PM, Artem Belevich <fbsdlist@src.cx> wrote:

> Are you interested in what's wrong or in how to fix it?
>
> If fixing is the priority, I'd boot from OpenSolaris live CD and would
> try importing the array there. Just make sure you don't upgrade ZFS to
> a version that is newer than the one FreeBSD supports.
>
> OpenSolaris may be able to fix the array. Once it's done, export it,
> boot back into FreeBSD and re-import it.
>
> --Artem
>
>
>
> On Wed, Oct 27, 2010 at 4:22 PM, Rumen Telbizov <telbizov@gmail.com> wrote:
> > No ideas whatsoever?
> >
> > On Tue, Oct 26, 2010 at 1:04 PM, Rumen Telbizov <telbizov@gmail.com> wrote:
> >
> >> Hello everyone,
> >>
> >> After a few days of struggling with my degraded zpool on a backup server,
> >> I decided to ask for help here, or at least get some clues as to what
> >> might be wrong with it. Here's the current state of the zpool:
> >>
> >> # zpool status
> >>
> >>   pool: tank
> >>  state: DEGRADED
> >> status: One or more devices has experienced an error resulting in data
> >>         corruption.  Applications may be affected.
> >> action: Restore the file in question if possible.  Otherwise restore the
> >>         entire pool from backup.
> >>    see: http://www.sun.com/msg/ZFS-8000-8A
> >>  scrub: none requested
> >> config:
> >>
> >>         NAME                          STATE     READ WRITE CKSUM
> >>         tank                          DEGRADED     0     0     0
> >>           raidz1                      DEGRADED     0     0     0
> >>             spare                     DEGRADED     0     0     0
> >>               replacing               DEGRADED     0     0     0
> >>                 17307041822177798519  UNAVAIL      0   299     0  was /dev/gpt/disk-e1:s2
> >>                 gpt/newdisk-e1:s2     ONLINE       0     0     0
> >>               gpt/disk-e2:s10         ONLINE       0     0     0
> >>             gpt/disk-e1:s3            ONLINE      30     0     0
> >>             gpt/disk-e1:s4            ONLINE       0     0     0
> >>             gpt/disk-e1:s5            ONLINE       0     0     0
> >>           raidz1                      ONLINE       0     0     0
> >>             gpt/disk-e1:s6            ONLINE       0     0     0
> >>             gpt/disk-e1:s7            ONLINE       0     0     0
> >>             gpt/disk-e1:s8            ONLINE       0     0     0
> >>             gpt/disk-e1:s9            ONLINE       0     0     0
> >>           raidz1                      ONLINE       0     0     0
> >>             gpt/disk-e1:s10           ONLINE       0     0     0
> >>             gpt/disk-e1:s11           ONLINE       0     0     0
> >>             gpt/disk-e1:s12           ONLINE       0     0     0
> >>             gpt/disk-e1:s13           ONLINE       0     0     0
> >>           raidz1                      DEGRADED     0     0     0
> >>             gpt/disk-e1:s14           ONLINE       0     0     0
> >>             gpt/disk-e1:s15           ONLINE       0     0     0
> >>             gpt/disk-e1:s16           ONLINE       0     0     0
> >>             spare                     DEGRADED     0     0     0
> >>               replacing               DEGRADED     0     0     0
> >>                 15258738282880603331  UNAVAIL      0    48     0  was /dev/gpt/disk-e1:s17
> >>                 gpt/newdisk-e1:s17    ONLINE       0     0     0
> >>               gpt/disk-e2:s11         ONLINE       0     0     0
> >>           raidz1                      ONLINE       0     0     0
> >>             gpt/disk-e1:s18           ONLINE       0     0     0
> >>             gpt/disk-e1:s19           ONLINE       0     0     0
> >>             gpt/disk-e1:s20           ONLINE       0     0     0
> >>             gpt/disk-e1:s21           ONLINE       0     0     0
> >>           raidz1                      ONLINE       0     0     0
> >>             gpt/disk-e1:s22           ONLINE       0     0     0
> >>             gpt/disk-e1:s23           ONLINE       0     0     0
> >>             gpt/disk-e2:s0            ONLINE       0     0     0
> >>             gpt/disk-e2:s1            ONLINE       0     0     0
> >>           raidz1                      ONLINE       0     0     0
> >>             gpt/disk-e2:s2            ONLINE       0     0     0
> >>             gpt/disk-e2:s3            ONLINE       0     0     0
> >>             gpt/disk-e2:s4            ONLINE       0     0     0
> >>             gpt/disk-e2:s5            ONLINE       0     0     0
> >>           raidz1                      ONLINE       0     0     0
> >>             gpt/disk-e2:s6            ONLINE       0     0     0
> >>             gpt/disk-e2:s7            ONLINE       0     0     0
> >>             gpt/disk-e2:s8            ONLINE       0     0     0
> >>             gpt/disk-e2:s9            ONLINE       0     0     0
> >>         spares
> >>           gpt/disk-e2:s10             INUSE     currently in use
> >>           gpt/disk-e2:s11             INUSE     currently in use
> >>           gpt/disk-e1:s2              UNAVAIL   cannot open
> >>           gpt/newdisk-e1:s17          INUSE     currently in use
> >>
> >> errors: 4 data errors, use '-v' for a list
> >>
> >>
> >> The problem is: after replacing the bad drives and resilvering, the
> >> old/bad drives cannot be detached. The replace command didn't remove
> >> them automatically, and manual detach fails. Here are some examples:
> >>
> >> # zpool detach tank 15258738282880603331
> >> cannot detach 15258738282880603331: no valid replicas
> >> # zpool detach tank gpt/disk-e2:s11
> >> cannot detach gpt/disk-e2:s11: no valid replicas
> >> # zpool detach tank gpt/newdisk-e1:s17
> >> cannot detach gpt/newdisk-e1:s17: no valid replicas
> >> # zpool detach tank gpt/disk-e1:s17
> >> cannot detach gpt/disk-e1:s17: no valid replicas
> >>
> >>
> >> Here's more information and the history of events.
> >> This is a 36-disk SuperMicro 847 machine with 2T WD RE4 disks organized
> >> in raidz1 groups as depicted above. zpool deals only with partitions
> >> like these:
> >>
> >> =>        34  3904294845  mfid30  GPT  (1.8T)
> >>           34  3903897600       1  disk-e2:s9  (1.8T)
> >>   3903897634      397245          - free -  (194M)
> >>
> >> mfidXX devices are disks connected to a SuperMicro/LSI controller and
> >> presented as JBODs. JBODs on this adapter are actually constructed as
> >> RAID0 arrays of one disk each, but this should be irrelevant in this case.
> >>
> >> This machine had been working fine since September 6th, but two of the
> >> disks (in different raidz1 vdevs) were going pretty bad and accumulated
> >> quite a few errors until eventually they died. This is what they looked
> >> like:
> >>
> >>           raidz1             DEGRADED     0     0     0
> >>             gpt/disk-e1:s2   UNAVAIL     44 59.5K     0  experienced I/O failures
> >>             gpt/disk-e1:s3   ONLINE       0     0     0
> >>             gpt/disk-e1:s4   ONLINE       0     0     0
> >>             gpt/disk-e1:s5   ONLINE       0     0     0
> >>
> >>           raidz1             DEGRADED     0     0     0
> >>             gpt/disk-e1:s14  ONLINE       0     0     0
> >>             gpt/disk-e1:s15  ONLINE       0     0     0
> >>             gpt/disk-e1:s16  ONLINE       0     0     0
> >>             gpt/disk-e1:s17  UNAVAIL  1.56K 49.0K     0  experienced I/O failures
> >>
> >>
> >> I did have two spare disks ready to replace them. So after they died,
> >> here's what I executed:
> >>
> >> # zpool replace tank gpt/disk-e1:s2 gpt/disk-e2:s10
> >> # zpool replace tank gpt/disk-e1:s17 gpt/disk-e2:s11
> >>
> >> Resilvering started. In the middle of it, though, the kernel panicked
> >> and I had to reboot the machine. After the reboot I waited until the
> >> resilvering was complete. Once it was, I expected to see the old/bad
> >> devices removed from the vdevs, but they were still there. Attempts to
> >> detach them failed with "no valid replicas".
> >> I sent a colo technician to replace both defective drives with brand
> >> new ones. Once they were inserted, I recreated them exactly the same
> >> way as the ones I had before: JBOD and a gpart-labeled partition with
> >> the same name! Then I added them as spares:
> >>
> >> # zpool add tank spare gpt/disk-e1:s2
> >> # zpool add tank spare gpt/disk-e1:s17
> >>
> >> That actually made it worse, I think, since now I had the same device
> >> name both as a 'previous' failed device inside the raidz1 group and as
> >> a hot spare device. I couldn't do anything with it.
> >> What I did was to export the pool, fail the disk on the controller,
> >> import the pool and check that zfs could not open it anymore (as a part
> >> of the hot spares). Then I recreated that disk/partition with a new
> >> label 'newdisk-XXX' and tried to replace the device that originally
> >> failed (and was now only shown as a number). So I did this:
> >>
> >> # zpool replace tank gpt/disk-e1:s17 gpt/newdisk-e1:s17
> >> # zpool replace tank gpt/disk-e1:s2 gpt/newdisk-e1:s2
> >>
> >> Resilvering completed after 17 hours or so, and I expected the
> >> 'replacing' operation to disappear and the replaced device to go away.
> >> But it didn't! Instead, the pool is in the state shown at the beginning
> >> of this email.
> >> As for the 'errors: 4 data errors, use '-v' for a list', I suspect that
> >> it's due to another failing device (gpt/disk-e1:s3) inside the first
> >> (currently degraded) raidz1 vdev. Those 4 corrupted files can actually
> >> be read sometimes, which tells me that the disk only *sometimes* has
> >> trouble reading those bad blocks.
> >>
> >> Here's the output of zdb -l tank
> >>
> >>     version=14
> >>     name='tank'
> >>     state=0
> >>     txg=200225
> >>     pool_guid=13504509992978610301
> >>     hostid=409325918
> >>     hostname='XXXX'
> >>     vdev_tree
> >>         type='root'
> >>         id=0
> >>         guid=13504509992978610301
> >>         children[0]
> >>                 type='raidz'
> >>                 id=0
> >>                 guid=3740854890192825394
> >>                 nparity=1
> >>                 metaslab_array=33
> >>                 metaslab_shift=36
> >>                 ashift=9
> >>                 asize=7995163410432
> >>                 is_log=0
> >>                 children[0]
> >>                         type='spare'
> >>                         id=0
> >>                         guid=16171901098004278313
> >>                         whole_disk=0
> >>                         children[0]
> >>                                 type='replacing'
> >>                                 id=0
> >>                                 guid=2754550310390861576
> >>                                 whole_disk=0
> >>                                 children[0]
> >>                                         type='disk'
> >>                                         id=0
> >>                                         guid=17307041822177798519
> >>                                         path='/dev/gpt/disk-e1:s2'
> >>                                         whole_disk=0
> >>                                         not_present=1
> >>                                         DTL=246
> >>                                 children[1]
> >>                                         type='disk'
> >>                                         id=1
> >>                                         guid=1641394056824955485
> >>                                         path='/dev/gpt/newdisk-e1:s2'
> >>                                         whole_disk=0
> >>                                         DTL=55
> >>                         children[1]
> >>                                 type='disk'
> >>                                 id=1
> >>                                 guid=13150356781300468512
> >>                                 path='/dev/gpt/disk-e2:s10'
> >>                                 whole_disk=0
> >>                                 is_spare=1
> >>                                 DTL=1289
> >>                 children[1]
> >>                         type='disk'
> >>                         id=1
> >>                         guid=6047192237176807561
> >>                         path='/dev/gpt/disk-e1:s3'
> >>                         whole_disk=0
> >>                         DTL=250
> >>                 children[2]
> >>                         type='disk'
> >>                         id=2
> >>                         guid=9178318500891071208
> >>                         path='/dev/gpt/disk-e1:s4'
> >>                         whole_disk=0
> >>                         DTL=249
> >>                 children[3]
> >>                         type='disk'
> >>                         id=3
> >>                         guid=2567999855746767831
> >>                         path='/dev/gpt/disk-e1:s5'
> >>                         whole_disk=0
> >>                         DTL=248
> >>         children[1]
> >>                 type='raidz'
> >>                 id=1
> >>                 guid=17097047310177793733
> >>                 nparity=1
> >>                 metaslab_array=31
> >>                 metaslab_shift=36
> >>                 ashift=9
> >>                 asize=7995163410432
> >>                 is_log=0
> >>                 children[0]
> >>                         type='disk'
> >>                         id=0
> >>                         guid=14513380297393196654
> >>                         path='/dev/gpt/disk-e1:s6'
> >>                         whole_disk=0
> >>                         DTL=266
> >>                 children[1]
> >>                         type='disk'
> >>                         id=1
> >>                         guid=7673391645329839273
> >>                         path='/dev/gpt/disk-e1:s7'
> >>                         whole_disk=0
> >>                         DTL=265
> >>                 children[2]
> >>                         type='disk'
> >>                         id=2
> >>                         guid=15189132305590412134
> >>                         path='/dev/gpt/disk-e1:s8'
> >>                         whole_disk=0
> >>                         DTL=264
> >>                 children[3]
> >>                         type='disk'
> >>                         id=3
> >>                         guid=17171875527714022076
> >>                         path='/dev/gpt/disk-e1:s9'
> >>                         whole_disk=0
> >>                         DTL=263
> >>         children[2]
> >>                 type='raidz'
> >>                 id=2
> >>                 guid=4551002265962803186
> >>                 nparity=1
> >>                 metaslab_array=30
> >>                 metaslab_shift=36
> >>                 ashift=9
> >>                 asize=7995163410432
> >>                 is_log=0
> >>                 children[0]
> >>                         type='disk'
> >>                         id=0
> >>                         guid=12104241519484712161
> >>                         path='/dev/gpt/disk-e1:s10'
> >>                         whole_disk=0
> >>                         DTL=262
> >>                 children[1]
> >>                         type='disk'
> >>                         id=1
> >>                         guid=3950210349623142325
> >>                         path='/dev/gpt/disk-e1:s11'
> >>                         whole_disk=0
> >>                         DTL=261
> >>                 children[2]
> >>                         type='disk'
> >>                         id=2
> >>                         guid=14559903955698640085
> >>                         path='/dev/gpt/disk-e1:s12'
> >>                         whole_disk=0
> >>                         DTL=260
> >>                 children[3]
> >>                         type='disk'
> >>                         id=3
> >>                         guid=12364155114844220066
> >>                         path='/dev/gpt/disk-e1:s13'
> >>                         whole_disk=0
> >>                         DTL=259
> >>         children[3]
> >>                 type='raidz'
> >>                 id=3
> >>                 guid=12517231224568010294
> >>                 nparity=1
> >>                 metaslab_array=29
> >>                 metaslab_shift=36
> >>                 ashift=9
> >>                 asize=7995163410432
> >>                 is_log=0
> >>                 children[0]
> >>                         type='disk'
> >>                         id=0
> >>                         guid=7655789038925330983
> >>                         path='/dev/gpt/disk-e1:s14'
> >>                         whole_disk=0
> >>                         DTL=258
> >>                 children[1]
> >>                         type='disk'
> >>                         id=1
> >>                         guid=17815755378968233141
> >>                         path='/dev/gpt/disk-e1:s15'
> >>                         whole_disk=0
> >>                         DTL=257
> >>                 children[2]
> >>                         type='disk'
> >>                         id=2
> >>                         guid=9590421681925673767
> >>                         path='/dev/gpt/disk-e1:s16'
> >>                         whole_disk=0
> >>                         DTL=256
> >>                 children[3]
> >>                         type='spare'
> >>                         id=3
> >>                         guid=4015417100051235398
> >>                         whole_disk=0
> >>                         children[0]
> >>                                 type='replacing'
> >>                                 id=0
> >>                                 guid=11653429697330193176
> >>                                 whole_disk=0
> >>                                 children[0]
> >>                                         type='disk'
> >>                                         id=0
> >>                                         guid=15258738282880603331
> >>                                         path='/dev/gpt/disk-e1:s17'
> >>                                         whole_disk=0
> >>                                         not_present=1
> >>                                         DTL=255
> >>                                 children[1]
> >>                                         type='disk'
> >>                                         id=1
> >>                                         guid=908651380690954833
> >>                                         path='/dev/gpt/newdisk-e1:s17'
> >>                                         whole_disk=0
> >>                                         is_spare=1
> >>                                         DTL=52
> >>                         children[1]
> >>                                 type='disk'
> >>                                 id=1
> >>                                 guid=7250934196571906160
> >>                                 path='/dev/gpt/disk-e2:s11'
> >>                                 whole_disk=0
> >>                                 is_spare=1
> >>                                 DTL=1292
> >>         children[4]
> >>                 type='raidz'
> >>                 id=4
> >>                 guid=7622366288306613136
> >>                 nparity=1
> >>                 metaslab_array=28
> >>                 metaslab_shift=36
> >>                 ashift=9
> >>                 asize=7995163410432
> >>                 is_log=0
> >>                 children[0]
> >>                         type='disk'
> >>                         id=0
> >>                         guid=11283483106921343963
> >>                         path='/dev/gpt/disk-e1:s18'
> >>                         whole_disk=0
> >>                         DTL=254
> >>                 children[1]
> >>                         type='disk'
> >>                         id=1
> >>                         guid=14900597968455968576
> >>                         path='/dev/gpt/disk-e1:s19'
> >>                         whole_disk=0
> >>                         DTL=253
> >>                 children[2]
> >>                         type='disk'
> >>                         id=2
> >>                         guid=4140592611852504513
> >>                         path='/dev/gpt/disk-e1:s20'
> >>                         whole_disk=0
> >>                         DTL=252
> >>                 children[3]
> >>                         type='disk'
> >>                         id=3
> >>                         guid=2794215380207576975
> >>                         path='/dev/gpt/disk-e1:s21'
> >>                         whole_disk=0
> >>                         DTL=251
> >>         children[5]
> >>                 type='raidz'
> >>                 id=5
> >>                 guid=17655293908271300889
> >>                 nparity=1
> >>                 metaslab_array=27
> >>                 metaslab_shift=36
> >>                 ashift=9
> >>                 asize=7995163410432
> >>                 is_log=0
> >>                 children[0]
> >>                         type='disk'
> >>                         id=0
> >>                         guid=5274146379037055039
> >>                         path='/dev/gpt/disk-e1:s22'
> >>                         whole_disk=0
> >>                         DTL=278
> >>                 children[1]
> >>                         type='disk'
> >>                         id=1
> >>                         guid=8651755019404873686
> >>                         path='/dev/gpt/disk-e1:s23'
> >>                         whole_disk=0
> >>                         DTL=277
> >>                 children[2]
> >>                         type='disk'
> >>                         id=2
> >>                         guid=16827379661759988976
> >>                         path='/dev/gpt/disk-e2:s0'
> >>                         whole_disk=0
> >>                         DTL=276
> >>                 children[3]
> >>                         type='disk'
> >>                         id=3
> >>                         guid=2524967151333933972
> >>                         path='/dev/gpt/disk-e2:s1'
> >>                         whole_disk=0
> >>                         DTL=275
> >>         children[6]
> >>                 type='raidz'
> >>                 id=6
> >>                 guid=2413519694016115220
> >>                 nparity=1
> >>                 metaslab_array=26
> >>                 metaslab_shift=36
> >>                 ashift=9
> >>                 asize=7995163410432
> >>                 is_log=0
> >>                 children[0]
> >>                         type='disk'
> >>                         id=0
> >>                         guid=16361968944335143412
> >>                         path='/dev/gpt/disk-e2:s2'
> >>                         whole_disk=0
> >>                         DTL=274
> >>                 children[1]
> >>                         type='disk'
> >>                         id=1
> >>                         guid=10054650477559530937
> >>                         path='/dev/gpt/disk-e2:s3'
> >>                         whole_disk=0
> >>                         DTL=273
> >>                 children[2]
> >>                         type='disk'
> >>                         id=2
> >>                         guid=17105959045159531558
> >>                         path='/dev/gpt/disk-e2:s4'
> >>                         whole_disk=0
> >>                         DTL=272
> >>                 children[3]
> >>                         type='disk'
> >>                         id=3
> >>                         guid=17370453969371497663
> >>                         path='/dev/gpt/disk-e2:s5'
> >>                         whole_disk=0
> >>                         DTL=271
> >>         children[7]
> >>                 type='raidz'
> >>                 id=7
> >>                 guid=4614010953103453823
> >>                 nparity=1
> >>                 metaslab_array=24
> >>                 metaslab_shift=36
> >>                 ashift=9
> >>                 asize=7995163410432
> >>                 is_log=0
> >>                 children[0]
> >>                         type='disk'
> >>                         id=0
> >>                         guid=10090128057592036175
> >>                         path='/dev/gpt/disk-e2:s6'
> >>                         whole_disk=0
> >>                         DTL=270
> >>                 children[1]
> >>                         type='disk'
> >>                         id=1
> >>                         guid=16676544025008223925
> >>                         path='/dev/gpt/disk-e2:s7'
> >>                         whole_disk=0
> >>                         DTL=269
> >>                 children[2]
> >>                         type='disk'
> >>                         id=2
> >>                         guid=11777789246954957292
> >>                         path='/dev/gpt/disk-e2:s8'
> >>                         whole_disk=0
> >>                         DTL=268
> >>                 children[3]
> >>                         type='disk'
> >>                         id=3
> >>                         guid=3406600121427522915
> >>                         path='/dev/gpt/disk-e2:s9'
> >>                         whole_disk=0
> >>                         DTL=267
> >>
> >> OS:
> >> 8.1-STABLE FreeBSD 8.1-STABLE #0: Sun Sep  5 00:22:45 PDT 2010 amd64
> >>
> >> Hardware:
> >> Chassis:        SuperMicro 847E1 (two backplanes: 24 disks in front and 12 disks in the back)
> >> Motherboard:    X8SIL
> >> CPU:            1 x X3430  @ 2.40GHz
> >> RAM:            16G
> >> HDD Controller: SuperMicro / LSI 9260 (pciconf -lv  SAS1078 PCI-X Fusion-MPT SAS) : 2 ports
> >> Disks:          36 x 2T Western Digital RE4
> >>
> >>
> >>
> >> Any help would be appreciated. Let me know what additional information I
> >> should provide.
> >> Thank you in advance,
> >> --
> >> Rumen Telbizov
> >>
> >>
> >
> >
> > --
> > Rumen Telbizov
> > http://telbizov.com
> > _______________________________________________
> > freebsd-stable@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
> >
>



-- 
Rumen Telbizov
http://telbizov.com


