Date:      Fri, 6 Oct 2017 12:08:55 +0200
From:      Ben RUBSON <ben.rubson@gmail.com>
To:        FreeBSD-scsi <freebsd-scsi@freebsd.org>, Edward Tomasz Napierała <trasz@FreeBSD.org>
Cc:        Freebsd fs <freebsd-fs@freebsd.org>
Subject:   Re: ZFS stalled after some mirror disks were lost
Message-ID:  <82632887-E9D4-42D0-AC05-3764ABAC6B86@gmail.com>
In-Reply-To: <DDCFAC80-2D72-4364-85B2-7F4D7D70BCEE@gmail.com>
References:  <4A0E9EB8-57EA-4E76-9D7E-3E344B2037D2@gmail.com> <DDCFAC80-2D72-4364-85B2-7F4D7D70BCEE@gmail.com>

> On 02 Oct 2017, at 20:12, Ben RUBSON <ben.rubson@gmail.com> wrote:
> 
> Hi,
> 
> On a FreeBSD 11 server, the following online/healthy zpool:
> 
> home
>   mirror-0
>     label/local1
>     label/local2
>     label/iscsi1
>     label/iscsi2
>   mirror-1
>     label/local3
>     label/local4
>     label/iscsi3
>     label/iscsi4
>   cache
>     label/local5
>     label/local6
> 
> A sustained read throughput of 180 MB/s, 45 MB/s on each iSCSI disk
> according to "zpool iostat", nothing on the local disks.
> No write IOs.
> 
> Let's disconnect all iSCSI disks:
> iscsictl -Ra
> 
> Expected behavior:
> IO activity flawlessly continues on the local disks.
> 
> What happened:
> All IOs stalled; the server only answers IOs made to its zroot pool.
> All commands related to the iSCSI disks (iscsictl), or to ZFS
> (zfs/zpool), don't return.
> 
> Questions:
> Why this behavior?
> How to know what happened? (/var/log/messages says almost nothing)
> 
> I had already disconnected the iSCSI disks several times in the past
> without any issue, but there were almost no IOs running then.
> 
> Thank you for your help!
> 
> Ben

Hello,

So first, many thanks again to Andriy; we spent almost three hours
debugging the stalled server to find the root cause of the issue.

It sounds like I need help from the iSCSI dev team (Edward perhaps?),
as the issue seems to be on that side.

Here is Andriy's conclusion after the debug session; I quote him:

> So, it seems that the root cause of all evil is this outstanding zio
> (it might be not the only one).
> In other words, it looks like the iscsi stack bailed out without
> completing all outstanding i/o requests that it had.
> It should either return success or error for every request; it cannot
> simply drop a request.
> And that appears to be what happened here.
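
(To illustrate the invariant Andriy states: when the backing store goes
away, a GEOM class has to walk whatever queue still holds its bios and
complete each one back to GEOM with an error. A rough sketch of my
understanding, not the actual iscsi initiator code; sc_outstanding is a
hypothetical softc queue:)

	struct bio *bp;

	while ((bp = TAILQ_FIRST(&sc->sc_outstanding)) != NULL) {
		TAILQ_REMOVE(&sc->sc_outstanding, bp, bio_queue);
		g_io_deliver(bp, ENXIO);	/* error back to GEOM, never drop */
	}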

> It looks like ZFS is fragile in the face of this type of error.
> Essentially, each logical i/o request obtains a configuration lock of
> type 'zio' in shared mode to prevent certain configuration changes from
> happening while there are any outstanding zio-s.
> If a zio is lost, then this lock is leaked.
> Then, the code that deals with vdev failures tries to take this lock in
> exclusive mode while holding a few other configuration locks, also in
> exclusive mode, so any other thread needing those locks would block.
> And there are code paths where a configuration lock is taken while
> spa_namespace_lock is held.
> And when spa_namespace_lock is never dropped, the system is close to
> toast, because all pool lookups would get stuck.
> I don't see how this can be fixed in ZFS.
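
To convince myself of the mechanics, I wrote a tiny userland model of
this lock leak (my own illustration with POSIX rwlocks, not ZFS code;
the names only mimic the real locks). Build with "cc -pthread"; it
wedges forever, which is the point:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* scl_zio stands in for the per-pool 'zio' configuration lock,
 * spa_namespace for the global spa_namespace_lock. */
static pthread_rwlock_t scl_zio = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t spa_namespace = PTHREAD_MUTEX_INITIALIZER;

static void *
leaked_zio(void *arg)
{
	pthread_rwlock_rdlock(&scl_zio);	/* shared hold for the i/o */
	for (;;)
		pause();			/* zio never completes: hold leaked */
}

static void *
vdev_failure_path(void *arg)
{
	pthread_mutex_lock(&spa_namespace);
	printf("vdev failure: want config lock exclusive...\n");
	pthread_rwlock_wrlock(&scl_zio);	/* blocks forever on the leak */
	printf("unreachable\n");
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, leaked_zio, NULL);
	sleep(1);				/* let the shared hold land */
	pthread_create(&t2, NULL, vdev_failure_path, NULL);
	sleep(1);
	printf("zpool/zfs command: want spa_namespace...\n");
	pthread_mutex_lock(&spa_namespace);	/* wedged, like the stalled server */
	printf("unreachable\n");
	return (0);
}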

> It seems that when the initiator is being removed, it doesn't properly
> terminate in-flight requests.
> It would be interesting to see what happens if you test other scenarios.

So I tested the following other scenarios:
1 - drop all iSCSI traffic using ipfw on the target
2 - ifdown the iSCSI NIC on the target
3 - ifdown the iSCSI NIC on the initiator
4 - stop ctld (on the target of course)
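
For reference, the commands were along these lines (the NIC name is from
my setup and purely illustrative; 3260 is the default iSCSI port):
1 - ipfw add 100 deny tcp from any to any 3260
2 - ifconfig ix0 down (on the target)
3 - ifconfig ix0 down (on the initiator)
4 - service ctld stop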

I tested all of them several times (5 or 6 times each).

I managed to panic (!) the kernel twice:
the first time in case 2, the second time in case 4.
I'm not sure the other test cases couldn't have been made to panic as
well, though.

Stack traces:
https://s1.postimg.org/2hfdpsvban/panic_case2.png
https://s1.postimg.org/2ac5ud9t0f/panic_case4.png

(kgdb) list *g_io_request+0x4a7
0xffffffff80a14dc7 is in g_io_request (/usr/src/sys/geom/geom_io.c:638).
633			g_bioq_unlock(&g_bio_run_down);
634			/* Pass it on down. */
635			if (first)
636				wakeup(&g_wait_down);
637		}
638	}
639
640	void
641	g_io_deliver(struct bio *bp, int error)
642	{

I had some kernel panics on the same servers a few months ago,
losing iSCSI targets which were used in a gmirror with local disks.
gmirror should have continued to work flawlessly (like ZFS should have)
using the local disks, but the server crashed.

Stack traces:
https://s1.postimg.org/14v4sabhv3/panic_g_destroy1.png
https://s1.postimg.org/437evsk6rz/panic_g_destroy2.png
https://s1.postimg.org/8pt1whiy5b/panic_g_destroy3.png

(kgdb) list *g_destroy_consumer+0x53
0xffffffff80a18563 is in g_destroy_consumer (geom.h:369).
364			KASSERT(g_valid_obj(ptr) == 0,
365			    ("g_free(%p) of live object, type %d", ptr,
366			    g_valid_obj(ptr)));
367		}
368	#endif
369		free(ptr, M_GEOM);
370	}
371
372	#define g_topology_lock()					\
373		do {							\

> I think that all problems that you have seen are different sides of
> the same underlying issue.  It looks like iscsi does not properly
> depart from geom and leaves behind some dangling pointers...
> 
> The panics you got today most likely occurred here:
> 	bp->bio_to->geom->start(bp);
> 
> And the most likely reason is that bio_to points to a destroyed geom
> provider.
> 
> I wonder if you'd be able to get into direct contact with a developer
> responsible for iscsi in FreeBSD.  I think that it is a relatively
> recent addition and it was under a FreeBSD Foundation project.  So, I'd
> expect that the developer should be responsive.

Feel free to contact me if you need anything, so that we can go further
on this!

Thank you very much for your help,

Ben



