From: "Steven Hartland"
To: freebsd-fs@freebsd.org
Subject: Re: ZFS Kernel Panic on 10.0-RELEASE
Date: Tue, 3 Jun 2014 01:29:08 +0100

----- Original Message -----
From: "Mike Carlson"
To: "Steven Hartland"
Sent: Monday, June 02, 2014 11:57 PM
Subject: Re: ZFS Kernel Panic on 10.0-RELEASE

> On 6/2/2014 2:15 PM, Steven Hartland wrote:
>> ----- Original Message ----- From: "Mike Carlson"
>>
>>>> That's the line I gathered it was on, but now I need to know what
>>>> the value of vd is, so what you need to do is:
>>>>
>>>> print vd
>>>>
>>>> If that's valid then:
>>>>
>>>> print *vd
>>>>
>>> It reports:
>>>
>>> (kgdb) print *vd
>>> No symbol "vd" in current context.
>>
>> Damn optimiser :(
>>
>>> Should I rebuild the kernel with additional options?
>>
>> Likely won't help, as a kernel built with zero optimisations tends
>> to fail to build in my experience :(
>>
>> Can you try applying the attached patch to your src, e.g.:
>>
>> cd /usr/src
>> patch < zfs-dsize-dva-check.patch
>>
>> Then rebuild, install the kernel and reproduce the issue again.
>>
>> Hopefully it will provide some more information on the cause, but
>> I suspect you might be seeing the effects of some corruption.
> Well, after building the kernel with your patch, installing it and
> booting off of it, the system does not panic.
>
> It reports this when I mount the filesystem:
>
> Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648
> Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648
> Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648
>
> Here are the results; I can now mount the file system!
>
> root@working-1:~ # zfs set canmount=on zroot/data/working
> root@working-1:~ # zfs mount zroot/data/working
> root@working-1:~ # df
> Filesystem                 1K-blocks       Used      Avail Capacity  Mounted on
> zroot                     2677363378    1207060 2676156318     0%    /
> devfs                              1          1          0   100%    /dev
> /dev/mfid10p1              253911544    2827824  230770800     1%    /dump
> zroot/home                2676156506        188 2676156318     0%    /home
> zroot/data                2676156389         71 2676156318     0%    /mnt/data
> zroot/usr/ports/distfiles 2676246609      90291 2676156318     0%    /mnt/usr/ports/distfiles
> zroot/usr/ports/packages  2676158702       2384 2676156318     0%    /mnt/usr/ports/packages
> zroot/tmp                 2676156812        493 2676156318     0%    /tmp
> zroot/usr                 2679746045    3589727 2676156318     0%    /usr
> zroot/usr/ports           2676986896     830578 2676156318     0%    /usr/ports
> zroot/usr/src             2676643553     487234 2676156318     0%    /usr/src
> zroot/var                 2676650671     494353 2676156318     0%    /var
> zroot/var/crash           2676156388         69 2676156318     0%    /var/crash
> zroot/var/db              2677521200    1364882 2676156318     0%    /var/db
> zroot/var/db/pkg          2676198058      41740 2676156318     0%    /var/db/pkg
> zroot/var/empty           2676156387         68 2676156318     0%    /var/empty
> zroot/var/log             2676168522      12203 2676156318     0%    /var/log
> zroot/var/mail            2676157043        725 2676156318     0%    /var/mail
> zroot/var/run             2676156508        190 2676156318     0%    /var/run
> zroot/var/tmp             2676156389         71 2676156318     0%    /var/tmp
> zroot/data/working        7664687468 4988531149 2676156318    65%    /mnt/data/working
>
> root@working-1:~ # ls /mnt/data/working/
> DONE_ORDERS  DP2_CMD  NEW_MULTI_TESTING  PROCESS  RECYCLER  XML_NOTIFICATIONS  XML_REPORTS

That does indeed seem to indicate some on-disk corruption. There are a
number of places in the code which have a similar check, but I'm afraid
I don't know the implications of the corruption you're seeing; others
may.

The attached updated patch will enforce the safe panic in this case
unless the sysctl vfs.zfs.recover is set to 1 (which can now also be
done on the fly).

I'd recommend backing up the data off the pool and restoring it
elsewhere.
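As a rough sketch of how that knob could be used once the updated patch
is applied (on a stock 10.0 kernel the sysctl is a read-only boot
tunable, so only the loader.conf route applies; the commands below are
examples, not output from this system):

    # Let ZFS log otherwise-fatal errors as warnings instead of
    # panicking; with the updated patch this can be flipped at runtime.
    sysctl vfs.zfs.recover=1

    # Or set it as a loader tunable so it is already in effect when the
    # pool is imported at boot.
    echo 'vfs.zfs.recover=1' >> /boot/loader.conf

For what it's worth, the two numbers in the warning are simply the
values the new check prints: the DVA's vdev id and allocated size
(131241 and 2147483648 in this case).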
It would be interesting to see the output of the following command on
your pool:

    zdb -uuumdC

Regards
Steve

[Attachment: zfs-dsize-dva-check.patch]

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c	(revision 266009)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c	(working copy)
@@ -252,7 +252,7 @@ int zfs_flags = 0;
 int zfs_recover = 0;
 SYSCTL_DECL(_vfs_zfs);
 TUNABLE_INT("vfs.zfs.recover", &zfs_recover);
-SYSCTL_INT(_vfs_zfs, OID_AUTO, recover, CTLFLAG_RDTUN, &zfs_recover, 0,
+SYSCTL_INT(_vfs_zfs, OID_AUTO, recover, CTLFLAG_RWTUN, &zfs_recover, 0,
     "Try to recover from otherwise-fatal errors.");
 
 extern int zfs_txg_synctime_ms;
@@ -1631,7 +1631,13 @@ dva_get_dsize_sync(spa_t *spa, const dva_t *dva)
 	ASSERT(spa_config_held(spa, SCL_ALL, RW_READER) != 0);
 
 	if (asize != 0 && spa->spa_deflate) {
-		vdev_t *vd = vdev_lookup_top(spa, DVA_GET_VDEV(dva));
+		uint64_t vdev = DVA_GET_VDEV(dva);
+		vdev_t *vd = vdev_lookup_top(spa, vdev);
+		if (vd == NULL) {
+			zfs_panic_recover(
+			    "dva_get_dsize_sync(): bad DVA %llu:%llu",
+			    (u_longlong_t)vdev, (u_longlong_t)asize);
+		}
 		dsize = (asize >> SPA_MINBLOCKSHIFT) * vd->vdev_deflate_ratio;
 	}
 
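A rough sketch of running the zdb command requested above and capturing
its output (the pool name zroot is taken from the df output earlier in
the thread, and the output path is only an example):

    # Dump uberblocks (-uuu), metaslabs (-m), datasets (-d) and the
    # cached pool configuration (-C); the output can be large on a big
    # pool, so save it to a file for posting.
    zdb -uuumdC zroot > /var/tmp/zdb-zroot.out 2>&1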