Subject: Re: ZFS stalled after some mirror disks were lost
From: Ben RUBSON <ben.rubson@gmail.com>
Date: Tue, 3 Oct 2017 13:43:09 +0200
Message-Id: <63B239EB-47F0-4DDA-982A-794E5B5FC56F@gmail.com>
To: Freebsd fs, FreeBSD-scsi

> On 03 Oct 2017, at 08:19, Ben RUBSON wrote:
> 
> Hi,
> 
> Putting scsi list as it could be related.
> 
>> On 02 Oct 2017, at 20:12, Ben RUBSON wrote:
>> 
>> Hi,
>> 
>> On a FreeBSD 11 server, the following online/healthy zpool :
>> 
>> home
>>   mirror-0
>>     label/local1
>>     label/local2
>>     label/iscsi1
>>     label/iscsi2
>>   mirror-1
>>     label/local3
>>     label/local4
>>     label/iscsi3
>>     label/iscsi4
>> cache
>>   label/local5
>>   label/local6
>> 
>> A sustained read throughput of 180 MB/s, 45 MB/s on each iscsi disk
>> according to "zpool iostat", nothing on local disks (strange, but I
>> noticed that IOs always prefer iscsi disks to local disks).
>> No write IOs.
>> 
>> Let's disconnect all iSCSI disks :
>> iscsictl -Ra
>> 
>> Expected behavior :
>> IO activity flawlessly continues on local disks.
>> 
>> What happened :
>> All IOs stalled, the server only answers to IOs made to its zroot pool.
>> All commands related to the iSCSI disks (iscsictl), or to ZFS (zfs/zpool),
>> don't return.
>> 
>> Questions :
>> Why this behavior ?
>> How to know what happens ? (/var/log/messages says almost nothing)
>> 
>> I already disconnected the iSCSI disks without any issue in the past,
>> several times, but there were almost no IOs running.
>> 
>> Thank you for your help !
>> 
>> Ben
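
(For reference, the reproduction sequence boils down to roughly the following.
This is only a sketch: the dd loop merely stands in for the sustained read
load, the file name is illustrative, and /home is assumed to be the pool's
mountpoint.)

dd if=/home/some_big_file of=/dev/null bs=1M &
zpool iostat -v home 1                     # reads go to the iscsi disks
iscsictl -Ra                               # drop all iSCSI sessions under load
procstat -kk -a > /var/tmp/procstat.log    # once everything has stalled
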
> 
>> On 02 Oct 2017, at 22:55, Andriy Gapon wrote:
>> 
>>> On 02/10/2017 22:13, Ben RUBSON wrote:
>>> 
>>>> On 02 Oct 2017, at 20:45, Andriy Gapon wrote:
>>>> 
>>>>> On 02/10/2017 21:17, Ben RUBSON wrote:
>>>>> 
>>>>> Unfortunately the zpool command stalls / does not return :/
>>>> 
>>>> Try to take procstat -kk -a.
>>> 
>>> Here is the procstat output :
>>> https://benrubson.github.io/zfs/procstat01.log
>> 
>> First, it seems that there are some iscsi threads stuck on a lock like:
>>     0 100291 kernel           iscsimt          mi_switch+0xd2 sleepq_wait+0x3a
>> _sx_xlock_hard+0x592 iscsi_maintenance_thread+0x316 fork_exit+0x85
>> fork_trampoline+0xe
>> 
>> or like
>> 
>>  8580 102077 iscsictl         -                mi_switch+0xd2 sleepq_wait+0x3a
>> _sx_slock_hard+0x325 iscsi_ioctl+0x7ea devfs_ioctl_f+0x13f kern_ioctl+0x2d4
>> sys_ioctl+0x171 amd64_syscall+0x4ce Xfast_syscall+0xfb
>> 
>> Also, there is a thread in cam_sim_free():
>>     0 100986 kernel           iscsimt          mi_switch+0xd2 sleepq_wait+0x3a
>> _sleep+0x2a1 cam_sim_free+0x48 iscsi_session_cleanup+0x1bd
>> iscsi_maintenance_thread+0x388 fork_exit+0x85 fork_trampoline+0xe
>> 
>> So, it looks like there could be a problem in the iscsi teardown path.
>> 
>> Maybe that caused a domino effect in ZFS code.  I see a lot of threads waiting
>> either for spa_namespace_lock or a spa config lock (a highly specialized ZFS
>> lock).  But it is hard to untangle their inter-dependencies.
>> 
>> Some of the ZFS I/O threads are also affected, for example:
>>     0 101538 kernel           zio_write_issue_ mi_switch+0xd2 sleepq_wait+0x3a
>> _cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
>> taskqueue_run_locked+0x14a taskqueue_thread_loop+0xe8 fork_exit+0x85
>> fork_trampoline+0xe
>>  8716 101319 sshd             -                mi_switch+0xd2 sleepq_wait+0x3a
>> _cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
>> zio_nowait+0x49 arc_read+0x8e4 dbuf_read+0x6c2 dmu_buf_hold_array_by_dnode+0x1d3
>> dmu_read_uio_dnode+0x41 dmu_read_uio_dbuf+0x3b zfs_freebsd_read+0x5fc
>> VOP_READ_APV+0x89 vn_read+0x157 vn_io_fault1+0x1c2 vn_io_fault+0x197
>> dofileread+0x98
>> 71181 101141 encfs            -                mi_switch+0xd2 sleepq_wait+0x3a
>> _cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
>> zio_nowait+0x49 arc_read+0x8e4 dbuf_read+0x6c2 dmu_buf_hold+0x3d
>> zap_lockdir+0x43 zap_cursor_retrieve+0x171 zfs_freebsd_readdir+0x3f3
>> VOP_READDIR_APV+0x8f kern_getdirentries+0x21b sys_getdirentries+0x28
>> amd64_syscall+0x4ce Xfast_syscall+0xfb
>> 71181 101190 encfs            -                mi_switch+0xd2 sleepq_wait+0x3a
>> _cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
>> zio_nowait+0x49 arc_read+0x8e4 dbuf_prefetch_indirect_done+0xcc arc_read+0x425
>> dbuf_prefetch+0x4f7 dmu_zfetch+0x418 dmu_buf_hold_array_by_dnode+0x34d
>> dmu_read_uio_dnode+0x41 dmu_read_uio_dbuf+0x3b zfs_freebsd_read+0x5fc
>> VOP_READ_APV+0x89 vn_read+0x157
>> 
>> Note that the first of these threads executes a write zio.
>> 
>> It would be nice to determine an owner of spa_namespace_lock.
>> If you have debug symbols then it can be easily done in kgdb on the live system:
>> (kgdb) p spa_namespace_lock
> 
> So, as said a few minutes ago, I lost access to the server and had to recycle it.
> Thankfully I managed to reproduce the issue, re-playing exactly the same steps.
> 
> Curious line in /var/log/messages :
> kernel: g_access(918): provider da18 has error
> (da18 is the remaining iSCSI target device which did not disconnect properly)
> 
> procstat -kk -a :
> https://benrubson.github.io/zfs/procstat02.log
> 
> (kgdb) p spa_namespace_lock
> $1 = -2110867066

This time with debug symbols.

procstat -kk -a :
https://benrubson.github.io/zfs/procstat03.log

(kgdb) p spa_namespace_lock
$1 = {
  lock_object = {
    lo_name = 0xffffffff822eb986 "spa_namespace_lock", 
    lo_flags = 40960000, 
    lo_data = 0, 
    lo_witness = 0x0
  }, 
  sx_lock = 18446735285324580100
}

Easily reproducible.
No issue, however, if there is no IO load.
As soon as there is IO load, I can reproduce the issue.

Ben
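
PS: in case it helps, here is how I plan to chase the lock owner from that
sx_lock value. This is only a sketch, assuming an exclusively held sx lock
whose low bits are flag bits (0xf according to sys/sys/sx.h, to be checked
against the running kernel), the rest of the word being the owning
struct thread pointer:

(kgdb) p/x spa_namespace_lock.sx_lock
(kgdb) set $owner = (struct thread *)(spa_namespace_lock.sx_lock & ~0xf)
(kgdb) p $owner->td_tid
(kgdb) p $owner->td_proc->p_comm

The td_tid printed there can then be looked up in the procstat -kk -a output
above to see where the owning thread itself is blocked.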