Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 21 Aug 2015 11:38:03 -0700
From:      Xin Li <delphij@delphij.net>
To:        Don Lewis <truckman@FreeBSD.org>, freebsd-fs@FreeBSD.org
Cc:        "Justin T. Gibbs" <gibbs@freebsd.org>, George Wilson <george.wilson@delphix.com>
Subject:   Re: solaris assert: avl_is_empty(&dn -> dn_dbufs) panic
Message-ID:  <55D7700B.3080207@delphij.net>
In-Reply-To: <201508211748.t7LHmo96096088@gw.catspoiler.org>
References:  <201508211748.t7LHmo96096088@gw.catspoiler.org>

next in thread | previous in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--0gaAdQp5Winb1VPmqwelCaUlbW8WLnfDR
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Hi,

A quick glance at the changes suggests that Justin's changeset may be
related.  The reasoning is here:

https://reviews.csiden.org/r/131/

Related Illumos ticket:

https://www.illumos.org/issues/5056

In dnode_evict_dbufs(), remove multiple passes over dn->dn_dbufs.
This is possible now that objset eviction is asynchronously
completed in a different context once dbuf eviction completes.

In the case of objset eviction, any dbufs held by children will
be evicted via dbuf_rele_and_unlock() once their refcounts go
to zero.  Even when objset eviction is not active, the ordering
of the avl tree guarantees that children will be released before
parents, allowing the parent's refcounts to naturally drop to
zero before they are inspected in this single loop.

=3D=3D=3D=3D

So, upon return from dnode_evict_dbufs(), there could be some
DB_EVICTING buffers on the AVL pending release and thus breaks the
invariant.

Should we restore the loop where we yield briefly with the lock
released, then reacquire and recheck?

Cheers,

On 08/21/15 10:48, Don Lewis wrote:
> On 21 Aug, Don Lewis wrote:
>> On 21 Aug, Don Lewis wrote:
>>> I just started getting this panic:
>>>
>>> solaris assert: avl_is_empty(&dn -> dn_dbufs), file:
>>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c,=

>>> line 495
>>>
>>> System info:
>>> FreeBSD zipper.catspoiler.org 11.0-CURRENT FreeBSD 11.0-CURRENT #25 r=
286923: Wed Aug 19 09:28:53 PDT 2015     dl@zipper.catspoiler.org:/usr/ob=
j/usr/src/sys/GENERIC  amd64
>>>
>>> My zfs pool has one mirrored vdev.  Scrub doesn't find any problems.
>>>
>>> %zpool status
>>>   pool: zroot
>>>  state: ONLINE
>>>   scan: scrub repaired 0 in 2h58m with 0 errors on Fri Aug 21 00:44:5=
2 2015
>>> config:
>>>
>>> 	NAME        STATE     READ WRITE CKSUM
>>> 	zroot       ONLINE       0     0     0
>>> 	  mirror-0  ONLINE       0     0     0
>>> 	    ada0p3  ONLINE       0     0     0
>>> 	    ada1p3  ONLINE       0     0     0
>>>
>>> This panic is reproduceable and happens every time I use poudriere to=

>>> build ports using my 9.3-RELEASE amd64 jail and occurs at the end of =
the
>>> poudriere run when it is unmounting filesystems.
>>>
>>> [00:10:43] =3D=3D=3D=3D>> Stopping 4 builders
>>> 93amd64-default-job-01: removed
>>> 93amd64-default-job-01-n: removed
>>> 93amd64-default-job-02: removed
>>> 93amd64-default-job-02-n: removed
>>> 93amd64-default-job-03: removed
>>> 93amd64-default-job-03-n: removed
>>> 93amd64-default-job-04: removed
>>> 93amd64-default-job-04-n: removed
>>> [00:10:46] =3D=3D=3D=3D>> Creating pkgng repository
>>> Creating repository in /tmp/packages: 100%
>>> Packing files for repository: 100%
>>> [00:10:55] =3D=3D=3D=3D>> Committing packages to repository
>>> [00:10:55] =3D=3D=3D=3D>> Removing old packages
>>> [00:10:55] =3D=3D=3D=3D>> Built ports: devel/py-pymtbl net/sie-nmsg n=
et/p5-Net-Nmsg net/axa
>>> [93amd64-default] [2015-08-21_00h47m41s] [committing:] Queued: 4  Bui=
lt: 4  Failed: 0  Skipped: 0  Ignored: 0  Tobuild: 0   Time: 00:10:53
>>> [00:10:55] =3D=3D=3D=3D>> Logs: /var/poudriere/data/logs/bulk/93amd64=
-default/2015-08-21_00h47m41s
>>> [00:10:55] =3D=3D=3D=3D>> Cleaning up
>>> 93amd64-default: removed
>>> 93amd64-default-n: removed
>>> [00:10:55] =3D=3D=3D=3D>> Umounting file systems
>>> Write failed: Broken pipe
>>>
>>> Prior to that, I ran poudriere a number of times with a 10.2-STABLE
>>> amd64 jail without incident.
>>>
>>> I've kicked off a bunch of poudriere runs for other jails and
>>> will check on it in the morning.
>>
>> Died the same way after building ports on the first jail,
>> 10.1-RELEASE amd64.
>>
>> Since there have been some zfs commits since r286923, I upgraded to
>> r286998 this morning and tried again with no better luck.  I got the
>> same panic again.
>>
>> This machine has mirrored swap, and even though I've done what
>> gmirror(8) says to do in order to capture crash dumps, I've had no luc=
k
>> with that.  The dump is getting written, but savecore is unable to fin=
d
>> it.
>=20
> Not sure what is happening with savecore during boot, but I was able to=

> run it manually and collect the crash dump.
>=20
>=20
> Unread portion of the kernel message buffer:
> panic: solaris assert: avl_is_empty(&dn->dn_dbufs), file: /usr/src/sys/=
cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c, line: 495
> cpuid =3D 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe085=
9e4e4e0
> vpanic() at vpanic+0x189/frame 0xfffffe0859e4e560
> panic() at panic+0x43/frame 0xfffffe0859e4e5c0
> assfail() at assfail+0x1a/frame 0xfffffe0859e4e5d0
> dnode_sync() at dnode_sync+0x6c8/frame 0xfffffe0859e4e6b0
> dmu_objset_sync_dnodes() at dmu_objset_sync_dnodes+0x2b/frame 0xfffffe0=
859e4e6e0
> dmu_objset_sync() at dmu_objset_sync+0x29e/frame 0xfffffe0859e4e7b0
> dsl_pool_sync() at dsl_pool_sync+0x348/frame 0xfffffe0859e4e820
> spa_sync() at spa_sync+0x442/frame 0xfffffe0859e4e910
> txg_sync_thread() at txg_sync_thread+0x23d/frame 0xfffffe0859e4e9f0
> fork_exit() at fork_exit+0x84/frame 0xfffffe0859e4ea30
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0859e4ea30
> --- trap 0, rip =3D 0, rsp =3D 0, rbp =3D 0 ---
> KDB: enter: panic
>=20
>=20
>=20
> (kgdb) bt
> #0  doadump (textdump=3D0) at pcpu.h:221
> #1  0xffffffff8037bb86 in db_fncall (dummy1=3D<value optimized out>,=20
>     dummy2=3D<value optimized out>, dummy3=3D<value optimized out>,=20
>     dummy4=3D<value optimized out>) at /usr/src/sys/ddb/db_command.c:56=
8
> #2  0xffffffff8037b941 in db_command (cmd_table=3D0x0)
>     at /usr/src/sys/ddb/db_command.c:440
> #3  0xffffffff8037b5d4 in db_command_loop ()
>     at /usr/src/sys/ddb/db_command.c:493
> #4  0xffffffff8037e18b in db_trap (type=3D<value optimized out>, code=3D=
0)
>     at /usr/src/sys/ddb/db_main.c:251
> #5  0xffffffff80a5b294 in kdb_trap (type=3D3, code=3D0, tf=3D<value opt=
imized out>)
>     at /usr/src/sys/kern/subr_kdb.c:654
> #6  0xffffffff80e6a4b1 in trap (frame=3D0xfffffe0859e4e410)
>     at /usr/src/sys/amd64/amd64/trap.c:540
> #7  0xffffffff80e49f22 in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:235
> #8  0xffffffff80a5a96e in kdb_enter (why=3D0xffffffff81379010 "panic", =

>     msg=3D0xffffffff80a60b60 "UH\211\ufffdAWAVATSH\203\ufffdPI\211\ufff=
dA\211\ufffdH\213\004%Py\ufffd\201H\211E\ufffd\201<%\ufffd\210\ufffd\201"=
) at cpufunc.h:63
> #9  0xffffffff80a1e2c9 in vpanic (fmt=3D<value optimized out>,=20
>     ap=3D<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:61=
9
> #10 0xffffffff80a1e333 in panic (fmt=3D0xffffffff81aafa90 "\004")
>     at /usr/src/sys/kern/kern_shutdown.c:557
> ---Type <return> to continue, or q <return> to quit---
> #11 0xffffffff8240922a in assfail (a=3D<value optimized out>,=20
>     f=3D<value optimized out>, l=3D<value optimized out>)
>     at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:=
81
> #12 0xffffffff820d4f78 in dnode_sync (dn=3D0xfffff8040b72d3d0,=20
>     tx=3D0xfffff8001598ec00)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sy=
nc.c:495
> #13 0xffffffff820c922b in dmu_objset_sync_dnodes (list=3D0xfffff8000771=
2b90,=20
>     newlist=3D<value optimized out>, tx=3D<value optimized out>)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objs=
et.c:1045
> #14 0xffffffff820c8ede in dmu_objset_sync (os=3D0xfffff80007712800,=20
>     pio=3D<value optimized out>, tx=3D0xfffff8001598ec00)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objs=
et.c:1163
> #15 0xffffffff820e8e78 in dsl_pool_sync (dp=3D0xfffff80015676000, txg=3D=
2660975)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool=
=2Ec:536
> #16 0xffffffff8210dca2 in spa_sync (spa=3D0xfffffe00089c6000, txg=3D266=
0975)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:66=
41
> #17 0xffffffff8211843d in txg_sync_thread (arg=3D0xfffff80015676000)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:51=
7
> #18 0xffffffff809e47c4 in fork_exit (
>     callout=3D0xffffffff82118200 <txg_sync_thread>, arg=3D0xfffff800156=
76000,=20
>     frame=3D0xfffffe0859e4ea40) at /usr/src/sys/kern/kern_fork.c:1006
> ---Type <return> to continue, or q <return> to quit---
> #19 0xffffffff80e4a45e in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:610
> #20 0x0000000000000000 in ?? ()
> Current language:  auto; currently minimal
>=20
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>=20


--=20
Xin LI <delphij@delphij.net>    https://www.delphij.net/
FreeBSD - The Power to Serve!           Live free or die


--0gaAdQp5Winb1VPmqwelCaUlbW8WLnfDR
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.1.7 (FreeBSD)

iQIcBAEBCgAGBQJV13AOAAoJEJW2GBstM+nsz+YP/jkHa9U0NfBa3bD4mH7xSl+o
umRX2nHmzu0qH+D/csa0Qj427+pwPnCZdpuGgI9OI6N52eSFm8G4oBkF4lDgTLJE
tNXpxaD6kvKsN9Xo/kv1CwreNk/QhatL+i7JR5Ozc0gZ7R3sJwenbjtLVB5L8JuB
Q1lXA/bbc1gfwqdZ73NWkYNHUA2gYkUypL3O9YQC1YbFcPrNDLbXZVRlBw0J1pqL
D2Rb7d6aztut2WHK0VE8rbnheBHT5gphuqwOGhNYut0DmPxFSgO6wJ78LHlCQ3te
5Ga7TVEA15ByWzVqqoTGfDP/Mze76OvB5I6wzHbKq9lZiFh5pFoPiBXgqBJKMwE5
UEkgmxoSR+HJ+/FL07rXBf/NwHHPzx10D7cI8S7Q1ZdJI/oOrCna9yAMfyrqO1BH
taKoNJqgZqPVhlB0rQB47Cnhp4pBKf6AkrK4TG7Oqrvko2jJt1aIEBfxaauhc+Ar
tH3CGy2hzP3/mMm0MSZ63g3m9jV8fVQdKV0yg9E2V5KoCbG7d9i5MftC7oz3kM7u
pSUMBzWlDv7zkkJ8xErIA22AEhjLVXQ/5+Va5m4Dbdb6le+Jk0Wl/ncMJ+XigUPZ
/cNfVPv2JlHArPNVgjOEJ4EI6spXR0xTaOxnyGefxCd5ONIvpnLeTaurzPGLF7S5
jSyz3pBLOTWnUDWMFyBo
=AKkZ
-----END PGP SIGNATURE-----

--0gaAdQp5Winb1VPmqwelCaUlbW8WLnfDR--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?55D7700B.3080207>