Date:      Tue, 15 Dec 2015 15:34:51 +0000
From:      krad <kraduk@gmail.com>
To:        Bengt Ahlgren <bengta@sics.se>
Cc:        Steven Hartland <killing@multiplay.co.uk>, FreeBSD FS <freebsd-fs@freebsd.org>
Subject:   Re: ZFS hang in zfs_freebsd_rename
Message-ID:  <CALfReyfdq6-cZzkjgNDgD-hd=JB_EaaGE2ek9VEK+omxgN=nkw@mail.gmail.com>
In-Reply-To: <uh7io3zhi2z.fsf@P142s.sics.se>
References:  <uh7a8pbj2mo.fsf@P142s.sics.se> <567022FB.1010508@multiplay.co.uk> <uh7vb7zhihv.fsf@P142s.sics.se> <56702A9F.90702@multiplay.co.uk> <uh7io3zhi2z.fsf@P142s.sics.se>

If your situation allows it, go to stable, as there have been lots of fixes
since 10.2.  It may be worth reviewing them to see whether they are relevant.
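
Something like this should show the recent ZFS history on stable/10 (svn from
ports, or svnlite in base on a 10.x box; the mirror URL is from memory, so
double-check it):

  svn checkout https://svn.freebsd.org/base/stable/10 /usr/src
  # the in-tree ZFS code lives under this path:
  svn log -l 50 /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs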

On 15 December 2015 at 15:01, Bengt Ahlgren <bengta@sics.se> wrote:

> OK, thanks for the advice!
>
> Bengt
>
> Steven Hartland <killing@multiplay.co.uk> writes:
>
> > There have been quite a few reported issues with this, at least some of
> > which have been fixed, but as with anything the only way to be sure is to
> > test it.
> >
> > On 15/12/2015 14:52, Bengt Ahlgren wrote:
> >> Yes, that is on the todo list...
> >>
> >> So this is likely fixed then in 10.x?
> >>
> >> Bengt
> >>
> >> Steven Hartland <killing@multiplay.co.uk> writes:
> >>
> >>> Not a surprise in 9.x unfortunately; try upgrading to 10.x
> >>>
> >>> On 15/12/2015 12:51, Bengt Ahlgren wrote:
> >>>> We have a server running 9.3-REL which currently has two quite large
> >>>> zfs pools:
> >>>>
> >>>> NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> >>>> p1    18.1T  10.7T  7.38T    59%  1.00x  ONLINE  -
> >>>> p2    43.5T  29.1T  14.4T    66%  1.00x  ONLINE  -
> >>>>
> >>>> It has been running without any issues for some time now.  Just now,
> >>>> for the first time, processes are getting stuck and are impossible to
> >>>> kill when accessing a particular directory in the p2 pool.  That pool
> >>>> is a 2x6-disk raidz2.
> >>>>
> >>>> One process is stuck in zfs_freebsd_rename, and other processes
> >>>> accessing that particular directory also get stuck.  The system is now
> >>>> almost completely idle.
> >>>>
> >>>> Output from kgdb on the running system for that first process:
> >>>>
> >>>> Thread 651 (Thread 102157):
> >>>> #0  sched_switch (td=0xfffffe0b14059920, newtd=0xfffffe001633e920, flags=<value optimized out>)
> >>>>       at /usr/src/sys/kern/sched_ule.c:1904
> >>>> #1  0xffffffff808f4604 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:485
> >>>> #2  0xffffffff809308e2 in sleepq_wait (wchan=0xfffffe0135b60488, pri=96)
> >>>>       at /usr/src/sys/kern/subr_sleepqueue.c:618
> >>>> #3  0xffffffff808cf922 in __lockmgr_args (lk=0xfffffe0135b60488, flags=524544, ilk=0xfffffe0135b604b8,
> >>>>       wmesg=<value optimized out>, pri=<value optimized out>, timo=<value optimized out>,
> >>>>       file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337)
> >>>>       at /usr/src/sys/kern/kern_lock.c:221
> >>>> #4  0xffffffff80977369 in vop_stdlock (ap=<value optimized out>) at lockmgr.h:97
> >>>> #5  0xffffffff80dd4a04 in VOP_LOCK1_APV (vop=0xffffffff813e8160, a=0xffffffa07f935520) at vnode_if.c:2052
> >>>> #6  0xffffffff80998c17 in _vn_lock (vp=0xfffffe0135b603f0, flags=524288,
> >>>>       file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337) at vnode_if.h:859
> >>>> #7  0xffffffff8098b621 in vputx (vp=0xfffffe0135b603f0, func=1) at /usr/src/sys/kern/vfs_subr.c:2337
> >>>> #8  0xffffffff81ac7955 in zfs_rename_unlock (zlpp=0xffffffa07f9356b8)
> >>>>       at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3609
> >>>> #9  0xffffffff81ac8c72 in zfs_freebsd_rename (ap=<value optimized out>)
> >>>>       at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4039
> >>>> #10 0xffffffff80dd4f04 in VOP_RENAME_APV (vop=0xffffffff81b47d40, a=0xffffffa07f9358e0) at vnode_if.c:1522
> >>>> #11 0xffffffff80996bbd in kern_renameat (td=<value optimized out>, oldfd=<value optimized out>,
> >>>>       old=<value optimized out>, newfd=-100,
> >>>>       new=0x1826a9af00 <Error reading address 0x1826a9af00: Bad address>,
> >>>>       pathseg=<value optimized out>) at vnode_if.h:636
> >>>> #12 0xffffffff80cd228a in amd64_syscall (td=0xfffffe0b14059920, traced=0) at subr_syscall.c:135
> >>>> #13 0xffffffff80cbc907 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
> >>>> #14 0x0000000800cc1acc in ?? ()
> >>>> Previous frame inner to this frame (corrupt stack?)
> >>>>
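> >>>> For reference, kgdb can be run against the live kernel with something
> >>>> like the following (assuming the symbols match the running kernel):
> >>>>
> >>>>   kgdb /boot/kernel/kernel /dev/mem
> >>>>   (kgdb) info threads     # locate the stuck thread
> >>>>   (kgdb) thread 651       # switch to it
> >>>>   (kgdb) bt               # print its kernel stack
> >>>>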
> >>>> Full procstat -kk -a and kgdb "thread apply all bt" can be found here:
> >>>>
> >>>> https://www.sics.se/~bengta/ZFS-hang/
> >>>>
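> >>>> The threads of interest can be picked out of that procstat dump with
> >>>> something like (the pattern is only an example):
> >>>>
> >>>>   procstat -kk -a | egrep 'zfs_freebsd_rename|zfs_rename_unlock'
> >>>>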
> >>>> I don't know how to produce "alltrace in ddb" as the instructions in the
> >>>> wiki say.  It runs the GENERIC kernel, so perhaps it isn't possible?
> >>>>
> >>>> I checked "camcontrol tags" for all the disks in the pool - all have
> >>>> zeroes for dev_active, devq_queued and held.
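> >>>> That is, for each member disk, something like the following (the device
> >>>> name is only an example):
> >>>>
> >>>>   camcontrol tags da0 -v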
> >>>>
> >>>> Is there anything else I can check while the machine is up?  However, I
> >>>> need to restart it pretty soon.
> >>>>
> >>>> Bengt
>


