From: Tom Curry <thomasrcurry@gmail.com>
To: David Adam
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
Date: Mon, 8 Feb 2016 09:15:31 -0500
Subject: Re: Poor ZFS+NFSv3 read/write performance and panic

On Sun, Feb 7, 2016 at 11:58 AM, David Adam wrote:

> Just wondering if anyone has any idea how to identify which devices are
> implicated in ZFS' vdev_deadman(). I have updated the firmware on the
> mps(4) card that has our disks attached, but that hasn't helped.
>
> Thanks
>
> David
>
> On Fri, 29 Jan 2016, David Adam wrote:
>
> > We have a FreeBSD 10.2 server sharing some ZFS datasets over NFSv3. It's
> > worked well until recently, but has started to routinely perform
> > exceptionally poorly, eventually panicking in vdev_deadman() (which I
> > understand is a feature).
> >
> > Initially after booting, things are fine, but performance rapidly begins
> > to degrade.
> > Both read and write performance is terrible, with many operations
> > either hanging indefinitely or timing out.
> >
> > When this happens, I can break into DDB and see lots of nfsd processes
> > stuck waiting for a lock:
> >
> >   Process 784 (nfsd) thread 0xfffff80234795000 (100455)
> >   shared lockmgr zfs (zfs) r = 0 (0xfffff8000b91f548) locked @
> >   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:2196
> >
> > and the backtrace looks like this:
> >
> >   sched_switch() at sched_switch+0x495/frame 0xfffffe04677740b0
> >   mi_switch() at mi_switch+0x179/frame 0xfffffe04677740f0
> >   turnstile_wait() at turnstile_wait+0x3b2/frame 0xfffffe0467774140
> >   __mtx_lock_sleep() at __mtx_lock_sleep+0x2c0/frame 0xfffffe04677741c0
> >   __mtx_lock_flags() at __mtx_lock_flags+0x102/frame 0xfffffe0467774210
> >   vmem_size() at vmem_size+0x5a/frame 0xfffffe0467774240
> >   arc_reclaim_needed() at arc_reclaim_needed+0xd2/frame 0xfffffe0467774260
> >   arc_get_data_buf() at arc_get_data_buf+0x157/frame 0xfffffe04677742a0
> >   arc_read() at arc_read+0x68b/frame 0xfffffe0467774350
> >   dbuf_read() at dbuf_read+0x7ed/frame 0xfffffe04677743f0
> >   dmu_tx_check_ioerr() at dmu_tx_check_ioerr+0x8b/frame 0xfffffe0467774420
> >   dmu_tx_count_write() at dmu_tx_count_write+0x17e/frame 0xfffffe0467774540
> >   dmu_tx_hold_write() at dmu_tx_hold_write+0xba/frame 0xfffffe0467774580
> >   zfs_freebsd_write() at zfs_freebsd_write+0x55d/frame 0xfffffe04677747b0
> >   VOP_WRITE_APV() at VOP_WRITE_APV+0x193/frame 0xfffffe04677748c0
> >   nfsvno_write() at nfsvno_write+0x13e/frame 0xfffffe0467774970
> >   nfsrvd_write() at nfsrvd_write+0x496/frame 0xfffffe0467774c80
> >   nfsrvd_dorpc() at nfsrvd_dorpc+0x66b/frame 0xfffffe0467774e40
> >   nfssvc_program() at nfssvc_program+0x4e6/frame 0xfffffe0467774ff0
> >   svc_run_internal() at svc_run_internal+0xbb7/frame 0xfffffe0467775180
> >   svc_run() at svc_run+0x1db/frame 0xfffffe04677751f0
> >   nfsrvd_nfsd() at nfsrvd_nfsd+0x1f0/frame 0xfffffe0467775350
> >   nfssvc_nfsd() at nfssvc_nfsd+0x124/frame 0xfffffe0467775970
> >   sys_nfssvc() at sys_nfssvc+0xb7/frame 0xfffffe04677759a0
> >   amd64_syscall() at amd64_syscall+0x278/frame 0xfffffe0467775ab0
> >   Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0467775ab0
> >
> > Is this likely to be due to bad hardware? I can't see any problems in
> > the SMART data, and `camcontrol tags da0 -v` etc. does not reveal any
> > particularly long queues. Are there other useful things to check?
> >
> > If not, do you have any other ideas? I can make the full DDB information
> > available if that would be helpful.
> >
> > The pool is configured thus:
> >
> >   NAME                   STATE     READ WRITE CKSUM
> >   space                  ONLINE       0     0     0
> >     mirror-0             ONLINE       0     0     0
> >       da0                ONLINE       0     0     0
> >       da1                ONLINE       0     0     0
> >     mirror-1             ONLINE       0     0     0
> >       da2                ONLINE       0     0     0
> >       da3                ONLINE       0     0     0
> >     mirror-2             ONLINE       0     0     0
> >       da4                ONLINE       0     0     0
> >       da6                ONLINE       0     0     0
> >     mirror-3             ONLINE       0     0     0
> >       da7                ONLINE       0     0     0
> >       da8                ONLINE       0     0     0
> >   logs
> >     mirror-4             ONLINE       0     0     0
> >       gpt/molmol-slog    ONLINE       0     0     0
> >       gpt/molmol-slog0   ONLINE       0     0     0
> >
> > where the da? devices are WD Reds and the SLOG partitions are on Samsung
> > 840s.
> >
> > Many thanks,
> >
> > David Adam
> > zanchey@ucc.gu.uwa.edu.au
> >
> > _______________________________________________
> > freebsd-fs@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
> Cheers,
>
> David Adam
> zanchey@ucc.gu.uwa.edu.au
> Ask Me About Our SLA!
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

I too ran into this problem and spent quite some time troubleshooting
hardware. For me it turned out not to be hardware at all, but software:
specifically, the ZFS ARC. Looking at your stack I see some ARC reclaim near
the top, so it's possible you're running into the same issue. There is a
monster of a PR that details this here:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594

If you would like to test this theory, the fastest way is to limit the ARC
by adding the following to /boot/loader.conf and rebooting:

  vfs.zfs.arc_max="24G"

Replace 24G with whatever makes sense for your system; aim for about 3/4 of
total memory as a starting point. If this solves the problem there are more
scientific routes to a permanent fix: one is applying the patch in the PR
above, another is a more finely tuned arc_max value.
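For what it's worth, here is a minimal sketch of how I check that the cap
took effect after rebooting and keep an eye on ARC growth against it. The
sysctl names are as on FreeBSD 10.x, and the 24G figure is only an example
value, not a recommendation for your hardware:

  # /boot/loader.conf
  vfs.zfs.arc_max="24G"    # example cap: roughly 3/4 of this host's RAM

  # after reboot, confirm the limit (reported in bytes) and compare it to
  # the current ARC size
  sysctl vfs.zfs.arc_max
  sysctl kstat.zfs.misc.arcstats.size

If arcstats.size keeps pressing up against arc_max right when the NFS stalls
appear, that is a good hint the ARC is the culprit rather than the disks.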