Subject: Re: ZFS stalled after some mirror disks were lost
From: Ben RUBSON <ben.rubson@gmail.com>
Date: Tue, 3 Oct 2017 13:43:09 +0200
Message-Id: <63B239EB-47F0-4DDA-982A-794E5B5FC56F@gmail.com>
To: Freebsd fs, FreeBSD-scsi

> On 03 Oct 2017, at 08:19, Ben RUBSON wrote:
> 
> Hi,
> 
> Putting scsi list as it could be related.
> 
>> On 02 Oct 2017, at 20:12, Ben RUBSON wrote:
>> 
>> Hi,
>> 
>> On a FreeBSD 11 server, the following online/healthy zpool :
>> 
>> home
>>   mirror-0
>>     label/local1
>>     label/local2
>>     label/iscsi1
>>     label/iscsi2
>>   mirror-1
>>     label/local3
>>     label/local4
>>     label/iscsi3
>>     label/iscsi4
>> cache
>>   label/local5
>>   label/local6
>> 
>> A sustained read throughput of 180 MB/s, 45 MB/s on each iscsi disk
>> according to "zpool iostat", nothing on local disks (strange, but I
>> noticed that IOs always prefer iscsi disks to local disks).
>> No write IOs.
>> 
>> Let's disconnect all iSCSI disks :
>> iscsictl -Ra
>> 
>> Expected behavior :
>> IO activity flawlessly continues on local disks.
>> 
>> What happened :
>> All IOs stalled, the server only answers to IOs made to its zroot pool.
>> All commands related to the iSCSI disks (iscsictl), or to ZFS (zfs/zpool),
>> don't return.
>> 
>> Questions :
>> Why this behavior ?
>> How to know what happens ? (/var/log/messages says almost nothing)
>> 
>> I already disconnected the iSCSI disks without any issue in the past,
>> several times, but there were almost no IOs running.
>> 
>> Thank you for your help !
>> 
>> Ben
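
(For reference, the reproduction sequence boils down to roughly the following.
This is only a sketch: the dd loop merely stands in for the sustained read
load, the file name is illustrative, and /home is assumed to be the pool's
mountpoint.)

dd if=/home/some_big_file of=/dev/null bs=1M &
zpool iostat -v home 1                     # reads go to the iscsi disks
iscsictl -Ra                               # drop all iSCSI sessions under load
procstat -kk -a > /var/tmp/procstat.log    # once everything has stalled
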
> 
>> On 02 Oct 2017, at 22:55, Andriy Gapon wrote:
>> 
>>> On 02/10/2017 22:13, Ben RUBSON wrote:
>>> 
>>>> On 02 Oct 2017, at 20:45, Andriy Gapon wrote:
>>>> 
>>>>> On 02/10/2017 21:17, Ben RUBSON wrote:
>>>>> 
>>>>> Unfortunately the zpool command stalls / does not return :/
>>>> 
>>>> Try to take procstat -kk -a.
>>> 
>>> Here is the procstat output :
>>> https://benrubson.github.io/zfs/procstat01.log
>> 
>> First, it seems that there are some iscsi threads stuck on a lock like:
>>     0 100291 kernel           iscsimt          mi_switch+0xd2 sleepq_wait+0x3a
>> _sx_xlock_hard+0x592 iscsi_maintenance_thread+0x316 fork_exit+0x85
>> fork_trampoline+0xe
>> 
>> or like
>> 
>>  8580 102077 iscsictl         -                mi_switch+0xd2 sleepq_wait+0x3a
>> _sx_slock_hard+0x325 iscsi_ioctl+0x7ea devfs_ioctl_f+0x13f kern_ioctl+0x2d4
>> sys_ioctl+0x171 amd64_syscall+0x4ce Xfast_syscall+0xfb
>> 
>> Also, there is a thread in cam_sim_free():
>>     0 100986 kernel           iscsimt          mi_switch+0xd2 sleepq_wait+0x3a
>> _sleep+0x2a1 cam_sim_free+0x48 iscsi_session_cleanup+0x1bd
>> iscsi_maintenance_thread+0x388 fork_exit+0x85 fork_trampoline+0xe
>> 
>> So, it looks like there could be a problem in the iscsi teardown path.
>> 
>> Maybe that caused a domino effect in ZFS code.  I see a lot of threads waiting
>> either for spa_namespace_lock or a spa config lock (a highly specialized ZFS
>> lock).  But it is hard to untangle their inter-dependencies.
>> 
>> Some of the ZFS I/O threads are also affected, for example:
>>     0 101538 kernel           zio_write_issue_ mi_switch+0xd2 sleepq_wait+0x3a
>> _cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
>> taskqueue_run_locked+0x14a taskqueue_thread_loop+0xe8 fork_exit+0x85
>> fork_trampoline+0xe
>>  8716 101319 sshd             -                mi_switch+0xd2 sleepq_wait+0x3a
>> _cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
>> zio_nowait+0x49 arc_read+0x8e4 dbuf_read+0x6c2 dmu_buf_hold_array_by_dnode+0x1d3
>> dmu_read_uio_dnode+0x41 dmu_read_uio_dbuf+0x3b zfs_freebsd_read+0x5fc
>> VOP_READ_APV+0x89 vn_read+0x157 vn_io_fault1+0x1c2 vn_io_fault+0x197
>> dofileread+0x98
>> 71181 101141 encfs            -                mi_switch+0xd2 sleepq_wait+0x3a
>> _cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
>> zio_nowait+0x49 arc_read+0x8e4 dbuf_read+0x6c2 dmu_buf_hold+0x3d
>> zap_lockdir+0x43 zap_cursor_retrieve+0x171 zfs_freebsd_readdir+0x3f3
>> VOP_READDIR_APV+0x8f kern_getdirentries+0x21b sys_getdirentries+0x28
>> amd64_syscall+0x4ce Xfast_syscall+0xfb
>> 71181 101190 encfs            -                mi_switch+0xd2 sleepq_wait+0x3a
>> _cv_wait+0x194 spa_config_enter+0x9b zio_vdev_io_start+0x1c2 zio_execute+0x236
>> zio_nowait+0x49 arc_read+0x8e4 dbuf_prefetch_indirect_done+0xcc arc_read+0x425
>> dbuf_prefetch+0x4f7 dmu_zfetch+0x418 dmu_buf_hold_array_by_dnode+0x34d
>> dmu_read_uio_dnode+0x41 dmu_read_uio_dbuf+0x3b zfs_freebsd_read+0x5fc
>> VOP_READ_APV+0x89 vn_read+0x157
>> 
>> Note that the first of these threads executes a write zio.
>> 
>> It would be nice to determine an owner of spa_namespace_lock.
>> If you have debug symbols then it can be easily done in kgdb on the live system:
>> (kgdb) p spa_namespace_lock
> 
> So, as said a few minutes ago, I lost access to the server and had to recycle it.
> Thankfully I managed to reproduce the issue, re-playing exactly the same steps.
> 
> Curious line in /var/log/messages :
> kernel: g_access(918): provider da18 has error
> (da18 is the remaining iSCSI target device which did not disconnect properly)
> 
> procstat -kk -a :
> https://benrubson.github.io/zfs/procstat02.log
> 
> (kgdb) p spa_namespace_lock
> $1 = -2110867066

This time with debug symbols.

procstat -kk -a :
https://benrubson.github.io/zfs/procstat03.log

(kgdb) p spa_namespace_lock
$1 = {
  lock_object = {
    lo_name = 0xffffffff822eb986 "spa_namespace_lock", 
    lo_flags = 40960000, 
    lo_data = 0, 
    lo_witness = 0x0
  }, 
  sx_lock = 18446735285324580100
}

Easily reproducible.
No issue, however, if there is no IO load.
As soon as there is IO load, I can reproduce the issue.

Ben
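
PS: in case it helps, here is how I plan to chase the lock owner from that
sx_lock value. This is only a sketch, assuming an exclusively held sx lock
whose low bits are flag bits (0xf according to sys/sys/sx.h, to be checked
against the running kernel), the rest of the word being the owning
struct thread pointer:

(kgdb) p/x spa_namespace_lock.sx_lock
(kgdb) set $owner = (struct thread *)(spa_namespace_lock.sx_lock & ~0xf)
(kgdb) p $owner->td_tid
(kgdb) p $owner->td_proc->p_comm

The td_tid printed there can then be looked up in the procstat -kk -a output
above to see where the owning thread itself is blocked.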