Date:      Tue, 3 Jun 2014 01:29:08 +0100
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        <mike@bayphoto.com>, <freebsd-fs@freebsd.org>
Subject:   Re: ZFS Kernel Panic on 10.0-RELEASE
Message-ID:  <F445995D86AA44FB8497296E0D41AC8F@multiplay.co.uk>
References:  <5388D64D.4030400@bayphoto.com> <EC2EA442-56FC-46B4-A1E2-97523029B7B3@mail.turbofuzz.com> <5388E5B4.3030002@bayphoto.com> <538BBEB7.4070008@bayphoto.com> <782C34792E95484DBA631A96FE3BEF20@multiplay.co.uk> <538C9CF3.6070208@bayphoto.com> <16ADD4D9DC73403C9669D8F34FDBD316@multiplay.co.uk> <538CB3EA.9010807@bayphoto.com> <6C6FB182781541CEBF627998B73B1DB4@multiplay.co.uk> <538CC16A.6060207@bayphoto.com> <F959477921CD4552A94BF932A55961F4@multiplay.co.uk> <538CDB7F.2060408@bayphoto.com> <88B3A7562A5F4F9B9EEF0E83BCAD2FB0@multiplay.co.uk> <538CE2B3.8090008@bayphoto.com> <85184EB23AA84607A360E601D03E1741@multiplay.co.uk> <538D0174.6000906@bayphoto.com>


----- Original Message ----- 
From: "Mike Carlson" <mike@bayphoto.com>
To: "Steven Hartland" <killing@multiplay.co.uk>; <freebsd-fs@freebsd.org>
Sent: Monday, June 02, 2014 11:57 PM
Subject: Re: ZFS Kernel Panic on 10.0-RELEASE


> On 6/2/2014 2:15 PM, Steven Hartland wrote:
>> ----- Original Message ----- From: "Mike Carlson" <mike@bayphoto.com>
>>
>>>> That's the line I gathered it was on, but now I need to know what
>>>> the value of vd is, so what you need to do is:
>>>> print vd
>>>>
>>>> If thats valid then:
>>>> print *vd
>>>>
>>> It reports:
>>>
>>> (kgdb) print *vd
>>> No symbol "vd" in current context.
>>
>> Damn optimiser :(
>>
>>> Should I rebuild the kernel with additional options?
>>
>> Likely won't help, as a kernel built with zero optimisations tends
>> to fail to build in my experience :(
>>
>> Can you try applying the attached patch to your src e.g.
>> cd /usr/src
>> patch < zfs-dsize-dva-check.patch
>>
>> Then rebuild and install the kernel, and reproduce the issue again.
>>
>> Hopefully it will provide some more information on the cause, but
>> I suspect you might be seeing the effect of having some corruption.
>
> Well, after building the kernel with your patch, installing it and 
> booting off of it, the system does not panic.
>
> It reports this when I mount the filesystem:
>
>    Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648
>    Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648
>    Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648
>
> Here are the results; I can now mount the file system!
>
>    root@working-1:~ # zfs set canmount=on zroot/data/working
>    root@working-1:~ # zfs mount zroot/data/working
>    root@working-1:~ # df
>    Filesystem                 1K-blocks       Used      Avail Capacity  Mounted on
>    zroot                     2677363378    1207060 2676156318     0%    /
>    devfs                              1          1          0   100%    /dev
>    /dev/mfid10p1              253911544    2827824  230770800     1%    /dump
>    zroot/home                2676156506        188 2676156318     0%    /home
>    zroot/data                2676156389         71 2676156318     0%    /mnt/data
>    zroot/usr/ports/distfiles 2676246609      90291 2676156318     0%    /mnt/usr/ports/distfiles
>    zroot/usr/ports/packages  2676158702       2384 2676156318     0%    /mnt/usr/ports/packages
>    zroot/tmp                 2676156812        493 2676156318     0%    /tmp
>    zroot/usr                 2679746045    3589727 2676156318     0%    /usr
>    zroot/usr/ports           2676986896     830578 2676156318     0%    /usr/ports
>    zroot/usr/src             2676643553     487234 2676156318     0%    /usr/src
>    zroot/var                 2676650671     494353 2676156318     0%    /var
>    zroot/var/crash           2676156388         69 2676156318     0%    /var/crash
>    zroot/var/db              2677521200    1364882 2676156318     0%    /var/db
>    zroot/var/db/pkg          2676198058      41740 2676156318     0%    /var/db/pkg
>    zroot/var/empty           2676156387         68 2676156318     0%    /var/empty
>    zroot/var/log             2676168522      12203 2676156318     0%    /var/log
>    zroot/var/mail            2676157043        725 2676156318     0%    /var/mail
>    zroot/var/run             2676156508        190 2676156318     0%    /var/run
>    zroot/var/tmp             2676156389         71 2676156318     0%    /var/tmp
>    zroot/data/working        7664687468 4988531149 2676156318    65%    /mnt/data/working
>    root@working-1:~ # ls /mnt/data/working/
>    DONE_ORDERS             DP2_CMD NEW_MULTI_TESTING       PROCESS
>    RECYCLER                XML_NOTIFICATIONS       XML_REPORTS

That does indeed seem to indicate some on-disk corruption.

There are a number of cases in the code which have a similar check, but
I'm afraid I don't know the implications of the corruption you're
seeing; others may.

The attached updated patch will enforce the safe panic in this case
unless the sysctl vfs.zfs.recover is set to 1 (which can now also be
done on the fly).
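
For example, to flip it at runtime, and optionally have it set from boot
as a loader tunable, something like:

    sysctl vfs.zfs.recover=1
    echo 'vfs.zfs.recover=1' >> /boot/loader.conf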

I'd recommend backing the data up off the pool and restoring it
elsewhere.
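
A zfs send / receive to another machine or pool is one way to do that,
e.g. something along these lines (the destination host and pool names
here are just placeholders):

    zfs snapshot zroot/data/working@rescue
    zfs send zroot/data/working@rescue | ssh backuphost zfs receive backup/working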

It would be interesting to see the output of the following command
on your pool:
zdb -uuumdC <pool>
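
e.g. for the zroot pool shown in your df output, capture it to a file,
as the output can be quite large (the file name is arbitrary):

    zdb -uuumdC zroot > /tmp/zdb-zroot.txt 2>&1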

    Regards
    Steve
[Attachment: zfs-dsize-dva-check.patch]

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c	(revision 266009)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c	(working copy)
@@ -252,7 +252,7 @@ int zfs_flags = 0;
 int zfs_recover = 0;
 SYSCTL_DECL(_vfs_zfs);
 TUNABLE_INT("vfs.zfs.recover", &zfs_recover);
-SYSCTL_INT(_vfs_zfs, OID_AUTO, recover, CTLFLAG_RDTUN, &zfs_recover, 0,
+SYSCTL_INT(_vfs_zfs, OID_AUTO, recover, CTLFLAG_RWTUN, &zfs_recover, 0,
     "Try to recover from otherwise-fatal errors.");
 
 extern int zfs_txg_synctime_ms;
@@ -1631,7 +1631,13 @@ dva_get_dsize_sync(spa_t *spa, const dva_t *dva)
 	ASSERT(spa_config_held(spa, SCL_ALL, RW_READER) != 0);
 
 	if (asize != 0 && spa->spa_deflate) {
-		vdev_t *vd = vdev_lookup_top(spa, DVA_GET_VDEV(dva));
+		uint64_t vdev = DVA_GET_VDEV(dva);
+		vdev_t *vd = vdev_lookup_top(spa, vdev);
+		if (vd == NULL) {
+			zfs_panic_recover(
+			    "dva_get_dsize_sync(): bad DVA %llu:%llu",
+			    (u_longlong_t)vdev, (u_longlong_t)asize);
+		}
 		dsize = (asize >> SPA_MINBLOCKSHIFT) * vd->vdev_deflate_ratio;
 	}
 




