Date:      Thu, 08 Jul 2010 19:58:08 +0100
From:      Karl Pielorz <kpielorz_lst@tdx.co.uk>
To:        freebsd-geom@freebsd.org
Subject:   FreeBSD 7.3-Stable / GEOM issue with ZFS attach/replace & zvol's...
Message-ID:  <6F0C8FABB57A0A91965413C1@Octa64>


Hi All,

I posted a few days ago in -fs and -hackers, but never got any reply. I've since 
done some more digging, and have been able to reproduce the problem below on 
another machine (by sending my ZFS zvols & snapshots to it).

I'm running 7.3-STABLE on amd64, with 10 GB of RAM and two dual-core 
Opteron 285s.

In a nutshell: a zpool attach/replace (or similar) on my system results in 
GEOM iterating through all the 'drives' on the system (which is apparently 
normal). When it encounters some of my ZFS volume snapshots (which are GELI 
encrypted) it appears to 'hang', and the zpool attach/replace never completes.

If I remove the snapshot it hangs on, it just hangs on another. If I remove all 
the snapshots/volumes, the zpool command completes without issue.

At the moment this is stopping me from replacing a failing drive which is 
part of a zpool mirror set :(

e.g. With GEOM debugging turned on, I get:

host# zpool attach vol ad34 ad40

"
[GEOM complains the guid for ad40 doesn't match what it wants - and then 
starts iterating through all the disk devices one after another... The guid 
mismatch appears 'normal' - i.e. it always happens - even on working 
systems]

Jul  5 19:42:50 host kernel: g_access(0xffffff0035015380(zvol/vol2/zfs_backups/scanned), 1, 0, 0)
Jul  5 19:42:50 host kernel: open delta:[r1w0e0] old:[r0w0e0] provider:[r0w0e0] 0xffffff000e1fd000(zvol/vol2/zfs_backups/scanned)
Jul  5 19:42:50 host kernel: g_access(0xffffff0035015380(zvol/vol2/zfs_backups/scanned), -1, 0, 0)
Jul  5 19:42:50 host kernel: open delta:[r-1w0e0] old:[r1w0e0] provider:[r1w0e0] 0xffffff000e1fd000(zvol/vol2/zfs_backups/scanned)
Jul  5 19:42:50 host kernel: g_detach(0xffffff0035015380)
Jul  5 19:42:50 host kernel: g_access(0xffffff0035015380(zvol/vol/scanned@1237495449), 1, 0, 0)
Jul  5 19:42:50 host kernel: open delta:[r1w0e0] old:[r0w0e0] provider:[r0w0e0] 0xffffff000e60b300(zvol/vol/scanned@1237495449)
*** ZFS [hangs here] - as does anything that subsequently touches ZFS ***
"

ps axl at that point shows:

"
 0  2250  2004   0  -8  0 14460  2044 g_wait D+    p0    0:00.01 zpool attach vol ad34 ad40
"

So it appears to be hung in 'g_wait'.
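
(If it would help, I can try to get more detail on where it's wedged - e.g. 
break into DDB, assuming my kernel has it compiled in, and grab something like:

db> ps
db> show geom

...if I've remembered the DDB commands correctly - and post the output.)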


If I then reboot, and do:

"zfs destroy vol/scanned@1237495449"

and try the attach again, it hangs on another snapshot of 'vol/scanned' 
(e.g. 'vol/scanned@1274617895') the next time round.

If I destroy all of them:

"zfs destroy -r vol/scanned"

the attach then completes without issue. All of those snapshots can be dd'd 
from fine (or mounted after being attached via GELI, etc.) - and none of the 
snapshots or GELI volumes are mounted when I do the attach/replace.
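
(To be clear, by "dd'd from / mounted" I mean that, done on their own, things 
roughly like the following all work - filesystem type and exact options from 
memory, so treat them as illustrative:

host# dd if=/dev/zvol/vol/scanned@1237495449 of=/dev/null bs=1m count=1000
host# geli attach /dev/zvol/vol/scanned@1237495449
host# mount -o ro /dev/zvol/vol/scanned@1237495449.eli /mnt
host# umount /mnt
host# geli detach /dev/zvol/vol/scanned@1237495449.eli

It's only during the zpool attach/replace that GEOM's pass over these devices 
wedges.)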

The output of 'zpool status' and an ls of '/dev/zvol/vol' are below.

It *looks* like GEOM is seeing something it doesn't like, and hanging?

The system has worked fine with ZFS for coming up to a year - I have 
replaced/attached drives in the past - but that was under 7.2-STABLE.

Is there any additional GEOM debugging I can enable? (Or any possible 
workarounds - i.e. something I can do to get GEOM to ignore the zvols?)

-Karl


zpool status:

  pool: vol
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        vol         ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad28    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad30    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad16    ONLINE       0     0     0
            ad32    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad18    ONLINE       0     0     0
            ad34    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad20    ONLINE       0     0     0
            ad36    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad22    ONLINE       0     0     0
            ad38    ONLINE       0     0     0
        spares
          ad42      AVAIL
(ad40 is also a spare drive, but isn't currently assigned to any pool)


ls /dev/zvol/vol

crw-r-----  1 root  operator    0, 162 Jul  5 19:55 scanned
crw-r-----  1 root  operator    0, 172 Jul  5 19:55 scanned@1237495449
crw-r-----  1 root  operator    0, 164 Jul  5 19:55 scanned@1238970339
crw-r-----  1 root  operator    0, 167 Jul  5 19:55 scanned@1239143782
crw-r-----  1 root  operator    0, 165 Jul  5 19:55 scanned@1244575946
crw-r-----  1 root  operator    0, 163 Jul  5 19:55 scanned@1247670305
crw-r-----  1 root  operator    0, 168 Jul  5 19:55 scanned@1251063149
crw-r-----  1 root  operator    0, 166 Jul  5 19:55 scanned@1256072040
crw-r-----  1 root  operator    0, 169 Jul  5 19:55 scanned@1259364830
crw-r-----  1 root  operator    0, 170 Jul  5 19:55 scanned@1267226353
crw-r-----  1 root  operator    0, 171 Jul  5 19:55 scanned@1274617895
crw-r-----  1 root  operator    0, 195 Jul  5 19:55 scanned@1278362753




