Date:      Mon, 3 Jul 2017 15:36:40 +0200
From:      Ben RUBSON <ben.rubson@gmail.com>
To:        Freebsd fs <freebsd-fs@freebsd.org>, freebsd-scsi <freebsd-scsi@freebsd.org>
Subject:   Re: I/O to pool appears to be hung, panic !
Message-ID:  <1F414ECE-1856-4EA3-A141-88B64703D4D6@gmail.com>
In-Reply-To: <CAFLM3-qpsGx=EYHxAaLsSzF22JAJx0zg8deJ3FX_ec5uDO=0Cw@mail.gmail.com>
References:  <E8CC223E-3F41-4036-84A9-FBA693AC2CAA@gmail.com> <20170629144334.1e283570@fabiankeil.de> <A1CC7D73-4196-4503-9716-52E84AA24FD3@gmail.com> <C584B1DF-AC6E-4E77-9497-3D0EED76EACF@gmail.com> <CAFLM3-qpsGx=EYHxAaLsSzF22JAJx0zg8deJ3FX_ec5uDO=0Cw@mail.gmail.com>

> On 03 Jul 2017, at 13:10, Edward Napierala <trasz@freebsd.org> wrote:
>
> 2017-07-03 10:07 GMT+01:00 Ben RUBSON <ben.rubson@gmail.com>:
>
> > On 29 Jun 2017, at 15:36, Ben RUBSON <ben.rubson@gmail.com> wrote:
> >
> >> On 29 Jun 2017, at 14:43, Fabian Keil <freebsd-listen@fabiankeil.de> wrote:
> >
> > Thank you for your feedback, Fabian.
> >
> >> Ben RUBSON <ben.rubson@gmail.com> wrote:
> >>
> >>> One of my servers had a kernel panic last night, giving the following message:
> >>> panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
> >> [...]
> >>> Here are some numbers regarding this disk, taken from the server hosting the pool:
> >>> (unfortunately not from the iSCSI target server)
> >>> https://s23.postimg.org/zd8jy9xaj/busydisk.png
> >>>
> >>> We clearly see that suddenly, the disk became 100% busy, while the CPU was almost idle.
>
> We also clearly see that 5 minutes later (02:09), the disk seemed to come back
> but became 100% busy again, and that 16 minutes later (the default
> vfs.zfs.deadman_synctime_ms), the panic occurred.
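(For reference, 16 minutes indeed matches the stock deadman settings; if I am
not mistaken, the defaults are:
# sysctl vfs.zfs.deadman_synctime_ms
vfs.zfs.deadman_synctime_ms: 1000000
# sysctl vfs.zfs.deadman_checktime_ms
vfs.zfs.deadman_checktime_ms: 5000
that is, stalled I/O is checked every 5 seconds and declared hung after
1000000 ms, about 16.6 minutes.)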
>
> >>> No error messages at all on either server.
> >> [...]
> >>> The only log I have is the following stack trace, taken from the server console:
> >>> panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
> >>> cpuid = 0
> >>> KDB: stack backtrace:
> >>> #0 0xffffffff80b240f7 at kdb_backtrace+0x67
> >>> #1 0xffffffff80ad9462 at vpanic+0x182
> >>> #2 0xffffffff80ad92d3 at panic+0x43
> >>> #3 0xffffffff82238fa7 at vdev_deadman+0x127
> >>> #4 0xffffffff82238ec0 at vdev_deadman+0x40
> >>> #5 0xffffffff82238ec0 at vdev_deadman+0x40
> >>> #6 0xffffffff8222d0a6 at spa_deadman+0x86
> >>> #7 0xffffffff80af32da at softclock_call_cc+0x18a
> >>> #8 0xffffffff80af3854 at softclock+0x94
> >>> #9 0xffffffff80a9348f at intr_event_execute_handlers+0x20f
> >>> #10 0xffffffff80a936f6 at ithread_loop+0xc6
> >>> #11 0xffffffff80a900d5 at fork_exit+0x85
> >>> #12 0xffffffff80f846fe at fork_trampoline+0xe
> >>> Uptime: 92d2h47m6s
> >>>
> >>> I would have been pleased to make a dump available.
> >>> However, despite my (correct?) configuration, the server did not dump:
> >>> (nevertheless, "sysctl debug.kdb.panic=1" does make it dump)
> >>> # grep ^dump /boot/loader.conf /etc/rc.conf
> >>> /boot/loader.conf:dumpdev="/dev/mirror/swap"
> >>> /etc/rc.conf:dumpdev="AUTO"
> >>
> >> You may want to look at the NOTES section in gmirror(8).
> >
> > Yes, I should already be OK (prefer algorithm set).
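(For the record, the balance algorithm in use can be checked with something
like:
# gmirror list swap | grep Balance
Balance: prefer
so the kernel should dump to, and later read back from, a single consistent
component.)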
> >
> >>> I use the default kernel, with a rebuilt zfs module:
> >>> # uname -v
> >>> FreeBSD 11.0-RELEASE-p8 #0: Wed Feb 22 06:12:04 UTC 2017     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
> >>>
> >>> I use the following iSCSI configuration, which disconnects the disks "as soon as" they become unavailable:
> >>> kern.iscsi.ping_timeout=5
> >>> kern.iscsi.fail_on_disconnection=1
> >>> kern.iscsi.iscsid_timeout=5
> >>>
> >>> I therefore think the disk was at least correctly reachable during these 20 busy minutes.
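(If it ever happens again, I will also check the session state from the
initiator while the disk is busy, with:
# iscsictl -L
which lists the iSCSI sessions and should show whether the session to this
target stayed in the "Connected" state.)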
> >>>
> >>> So, any idea why I could have faced this issue?
> >>
> >> Is it possible that the system was under memory pressure?
> >
> > No, I don't think it was:
> > https://s1.postimg.org/uvsebpyyn/busydisk2.png
> > More than 2GB of available memory.
> > Swap not used (624kB).
> > ARC behaviour seems correct (anon increases because ZFS can't actually write, I think).
> > Regarding the pool itself, it was receiving data at 6MB/s, sending around 30kB blocks to disks.
> > When the disk went busy, throughput fell to a few kB/s, with 128kB blocks.
> >
> >> geli's use of malloc() is known to cause deadlocks under memory pressure:
> >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209759
> >>
> >> Given that gmirror uses malloc() as well, it probably has the same issue.
> >
> > I don't use geli so I should not face this issue.
> >
> >>> I would have thought ZFS would have taken the busy device offline instead of raising a panic.
> >>> Perhaps it is already possible to make ZFS behave like this?
> >>
> >> There's a tunable for this: vfs.zfs.deadman_enabled.
> >> If the panic is just a symptom of the deadlock it's unlikely
> >> to help though.
> >
> > I think this tunable should have prevented the server from raising a panic:
> > # sysctl -d vfs.zfs.deadman_enabled
> > vfs.zfs.deadman_enabled: Kernel panic on stalled ZFS I/O
> > # sysctl vfs.zfs.deadman_enabled
> > vfs.zfs.deadman_enabled: 1
> >
> > But not sure how it would have behaved then...
> > (busy disk miraculously back to normal, memory pressure due to anon increasing...)
>
> I then think it would be nice, once vfs.zfs.deadman_synctime_ms has expired,
> to be able to take the busy device offline instead of raising a panic.
> Currently, disabling deadman will avoid the panic but will let the device slow down the pool.
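(In other words, a possible workaround for now, assuming the pool has enough
redundancy to run degraded:
# sysctl vfs.zfs.deadman_enabled=0
# zpool offline home /dev/label/G23iscsi
the second command being run by hand once a device is identified as hung.)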
>
> I still have not found the root cause of this issue, and I am not sure I will;
> it is quite difficult with only a stack trace and some performance graphs :/
>
> What exactly is the disk doing when that happens?  What does "gstat" say?
> If the iSCSI target is also FreeBSD, what does ctlstat say?

As shown on this graph, made with gstat numbers from the initiator:
https://s23.postimg.org/zd8jy9xaj/busydisk.png
the disk is continuously writing 3 MB/s before the issue happens.
When it occurs, response time increases to around 30 seconds (100% busy),
and consequently disk throughput drops to a few kB/s.
The CPU stays almost fully idle.
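(The gstat numbers behind the graph were collected in batch mode, with
something like:
# gstat -b -f 'label/G23iscsi'
the -f filter limiting the output to the relevant provider.)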

As shown here, no memory pressure:
https://s1.postimg.org/uvsebpyyn/busydisk2.png

At the end of the graphs' lines, the panic is raised.

The iSCSI target is also FreeBSD; unfortunately ctlstat was not running when
the issue occurred, so its numbers are only averages since system startup
(102 days ago).
I also do not have gstat numbers from this disk on the target side
(which would help determine whether it is a hardware issue, an iSCSI issue,
or something else).
I will think about collecting these numbers if the issue ever occurs again.
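Something as simple as the following sketch, run on the target, should be
enough (log file name and interval are arbitrary):

#!/bin/sh
# log a timestamped ctlstat / gstat snapshot once a minute
while true; do
    date
    ctlstat -c 1    # a single ctlstat report
    gstat -b        # a single batch-mode gstat snapshot
    sleep 60
done >> /var/log/iostats.log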

> If everything else fails, you might want to do a packet trace using "tcpdump -w"
> and take a look at it using Wireshark; it decodes iSCSI quite nicely.
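For the record, something like this should capture the iSCSI traffic for
Wireshark (em0 and the file name are only examples, 3260 being the default
iSCSI port):
# tcpdump -i em0 -s 0 -w iscsi.pcap port 3260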