From owner-freebsd-questions@freebsd.org Tue Feb 2 06:11:00 2016
Subject: Re: NFS unstable with high load on server
From: Ben Woods <woodsb02@gmail.com>
Date: Tue, 2 Feb 2016 07:10:58 +0100
To: Vick Khera
Cc: freebsd-questions@freebsd.org, freebsd-fs@freebsd.org

On Monday, 1 February 2016, Vick Khera wrote:

> I have a handful of servers at my data center, all running FreeBSD 10.2.
> On one of them I have a copy of the FreeBSD sources shared via NFS. When
> this server is running a large poudriere run rebuilding all the ports I
> need, the clients' NFS mounts become unstable: the clients keep getting
> read failures. The interactive performance of the NFS server itself
> remains fine, however. The local file system is a ZFS mirror.
>
> What could be causing NFS to be unstable in this situation?
>
> Specifics:
>
> Server "lorax": FreeBSD 10.2-RELEASE-p7, kernel locally compiled, with
> the NFS server and ZFS as dynamic kernel modules. 16GB RAM, quad-core
> 3.1GHz Xeon processor.
>
> The directory /u/lorax1 is a ZFS dataset on a mirrored pool and is NFS
> exported via the ZFS exports file. I put the FreeBSD sources on this
> dataset and symlink it to /usr/src.
>
> Client "bluefish": FreeBSD 10.2-RELEASE-p5, kernel locally compiled, NFS
> client built into the kernel. 32GB RAM, quad-core 3.1GHz Xeon processor
> (basically the same hardware, but more RAM).
> The directory /n/lorax1 is NFS mounted from lorax via autofs. The NFS
> options are "intr,nolockd". /usr/src is symlinked to the sources in that
> NFS mount.
>
> What I observe:
>
> [lorax]~% cd /usr/src
> [lorax]src% svn status
> [lorax]src% w
>  9:12AM  up 12 days, 19:19, 4 users, load averages: 4.43, 4.45, 3.61
> USER   TTY    FROM                  LOGIN@  IDLE WHAT
> vivek  pts/0  vick.int.kcilink.com  8:44AM     - tmux: client (/tmp/
> vivek  pts/1  tmux(19747).%0        8:44AM    19 sed y%*+%pp%;s%[^_a
> vivek  pts/2  tmux(19747).%1        8:56AM     - w
> vivek  pts/3  tmux(19747).%2        8:56AM     - slogin bluefish-prv
> [lorax]src% pwd
> /u/lorax1/usr10/src
>
> So right now the load average is more than 1 per processor on lorax. I
> can quite easily run "svn status" on the source directory, and
> interactive performance is snappy for editing local files and navigating
> around the file system.
>
> On the client:
>
> [bluefish]~% cd /usr/src
> [bluefish]src% pwd
> /n/lorax1/usr10/src
> [bluefish]src% svn status
> svn: E070008: Can't read directory '/n/lorax1/usr10/src/contrib/sqlite3':
> Partial results are valid but processing is incomplete
> [bluefish]src% svn status
> svn: E070008: Can't read directory '/n/lorax1/usr10/src/lib/libfetch':
> Partial results are valid but processing is incomplete
> [bluefish]src% svn status
> svn: E070008: Can't read directory
> '/n/lorax1/usr10/src/release/picobsd/tinyware/msg': Partial results are
> valid but processing is incomplete
> [bluefish]src% w
>  9:14AM  up 93 days, 23:55, 1 user, load averages: 0.10, 0.15, 0.15
> USER   TTY    FROM                   LOGIN@  IDLE WHAT
> vivek  pts/0  lorax-prv.kcilink.com  8:56AM     - w
> [bluefish]src% df .
> Filesystem            1K-blocks    Used     Avail Capacity  Mounted on
> lorax-prv:/u/lorax1   932845181 6090910 926754271     1%    /n/lorax1
>
> What I see are more or less random failures to read the NFS volume. When
> the server is not busy running poudriere builds, the client never has
> any failures.
> I also observe this kind of failure when doing a buildworld or
> installworld on the client while the server is busy: I get strange
> random failures reading the files, causing the build or install to fail.
>
> My workaround is not to do builds/installs on client machines while the
> NFS server is busy with large jobs like building all packages, but there
> is definitely something wrong here that I'd like to fix. I observe this
> on all the local NFS clients. I rebooted the server to try to clear this
> up, but it did not fix it.
>
> Any help would be appreciated.

I just wanted to point out that I am experiencing exactly the same issue
in my home setup. Performing an installworld from an NFS mount works
perfectly until I start running poudriere on the NFS server; then I start
getting NFS timeouts and the installworld fails.

The NFS server is also using ZFS, but in my case the NFS export is done
via the ZFS property "sharenfs" (I am not using the /etc/exports file).

I suspect this will boil down to a ZFS tuning issue, with poudriere and
installworld both stress-testing the server. Both would obviously cause
significant memory and CPU usage, and cause the "recently used" portion of
the ARC to be constantly flushed as they access a large number of
different files.

It might be interesting if you could report the output of the heading
lines (including memory and ARC details) from the "top" command
before/after running poudriere and attempting the installworld.

Regards,
Ben

--
From: Benjamin Woods
woodsb02@gmail.com
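For anyone wanting to reproduce the sharenfs-style export described above, a minimal sketch follows. The pool/dataset name "tank/lorax1" and the network values are hypothetical placeholders, not the actual setup from this thread:

```shell
# Export a ZFS dataset over NFS via the "sharenfs" property rather than
# /etc/exports. Dataset name and network below are hypothetical examples.
zfs set sharenfs="-network 192.168.1.0 -mask 255.255.255.0" tank/lorax1

# Confirm the property took effect and that mountd picked up the export:
zfs get sharenfs tank/lorax1
showmount -e localhost
```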
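A sketch of how the requested before/after snapshot could be captured on the server, assuming a FreeBSD host: top(1) in batch mode prints the heading lines (load, memory, and ARC summaries), and the same ARC counters are exposed under the standard kstat.zfs.misc.arcstats sysctl tree:

```shell
# Print one batch-mode top(1) display and keep only the heading lines,
# which on FreeBSD include the memory and ARC summaries:
top -b -d 1 | head -n 10

# The ARC details are also available directly as sysctls:
sysctl kstat.zfs.misc.arcstats.size      # current ARC size in bytes
sysctl kstat.zfs.misc.arcstats.c_max     # maximum ARC size
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
```

Comparing arcstats.size against c_max before and during a poudriere run would show whether the ARC is being squeezed while nfsd is failing.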
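If ARC pressure does turn out to be the culprit, one common mitigation is to cap the ARC and give nfsd more worker threads. The fragments below are a sketch only; the 8G cap and 16 threads are illustrative guesses for a 16GB server, not tested recommendations:

```shell
# /boot/loader.conf -- cap the ZFS ARC so a heavy poudriere run cannot
# consume nearly all RAM (value is illustrative, tune for your workload):
vfs.zfs.arc_max="8G"

# /etc/rc.conf -- serve UDP and TCP with more nfsd threads so client
# requests queue less while the server is under load:
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 16"
```

The loader.conf tunable takes effect at boot; the nfsd flags take effect on the next `service nfsd restart`.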