From: Rainer Duffner <rainer@ultra-secure.de>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: zfs receive stalls whole system
Message-Id: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de>
Date: Tue, 17 May 2016 01:07:24 +0200
To: FreeBSD Filesystems <freebsd-fs@freebsd.org>

Hi,

I have two servers that were running FreeBSD 10.1-AMD64 for a long time,
one zfs-sending to the other (via zxfer). Both are NFS servers and MySQL
slaves. The sender is actively used as an NFS server; the recipient is
just a warm standby, in case something serious happens and we don’t want
to wait a day for a restore to finish. The MySQL slaves are actively
used as read-only servers (at the application level; Python’s SQLAlchemy
does that, apparently).

They are HP DL380G8 (one CPU, hexacore) with over 128 GB RAM (I think
one has 144, the other has 192).
While they were running 10.1, they used HP P420 RAID controllers with
12 individual RAID0 volumes that I pooled into 6-disk RAIDZ2 vdevs.
I use zfsnap to do hourly, daily and weekly snapshots.

Sending worked well, especially after updating to 10.1.

Because the storage was over 90% full (and I really hate this RAID0
business we have with the HP RAID controllers), I rebuilt the servers
with HP’s OEM H220/221 controllers (LSI 2308 in disguise), added an
external disk shelf hosting 12 additional disks, and upgraded to
FreeBSD 10.3.
Because we didn’t want to throw out the original disks but did want to
increase the available space a lot, the new disks are double the size of
the original disks (1200 vs. 600 GB SAS).
I also created GPT partitions on the disks, labeled them according to
each disk’s position in the cages/shelf, and created the pools with the
GPT partition labels instead of the daX names.
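
For illustration, the labeling scheme described above could be scripted
roughly as follows. This is only a sketch under stated assumptions, not
the commands that were actually run: the label format
("<enclosure>-slot<NN>"), the device names (da0, da1, ...), the pool name
"tank" and the 6-disk RAIDZ2 vdev width for the new layout are all
hypothetical placeholders. The script only prints gpart and zpool
commands; it does not execute them.

```python
from __future__ import annotations

# Hypothetical naming scheme: "<enclosure>-slot<NN>", e.g. shelf1-slot03.
# Device names, the pool name and the vdev width are assumptions.


def label_for(enclosure: str, slot: int) -> str:
    """Build a GPT label that encodes the disk's physical position."""
    return f"{enclosure}-slot{slot:02d}"


def partition_commands(device: str, label: str) -> list[str]:
    """Commands that create a GPT scheme and one labeled ZFS partition."""
    return [
        f"gpart create -s gpt {device}",
        f"gpart add -t freebsd-zfs -l {label} {device}",
    ]


def zpool_create_command(pool: str, labels: list[str], vdev_width: int = 6) -> str:
    """Group the labeled partitions into RAIDZ2 vdevs of vdev_width disks."""
    if not labels or len(labels) % vdev_width != 0:
        raise ValueError("number of disks must be a non-zero multiple of vdev_width")
    parts = [f"zpool create {pool}"]
    for start in range(0, len(labels), vdev_width):
        members = " ".join(f"gpt/{name}" for name in labels[start:start + vdev_width])
        parts.append(f"raidz2 {members}")
    return " ".join(parts)


if __name__ == "__main__":
    # Twelve disks in one (hypothetical) shelf, two 6-disk RAIDZ2 vdevs.
    labels = [label_for("shelf1", slot) for slot in range(12)]
    for index, label in enumerate(labels):
        for command in partition_commands(f"da{index}", label):
            print(command)
    print(zpool_create_command("tank", labels))
```

The point of referring to gpt/<label> rather than daX is that the pool
members stay identifiable by physical position even if the controller
enumerates the disks in a different order after a reboot.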

Now, when I do a zxfer, sometimes the whole system stalls while the data
is sent over, especially if the delta is large or if something else is
reading from the disk at the same time (backup agent).

I had this before on 10.0, I believe (we didn’t have it in 9.1, IIRC),
and it went away in 10.1.

It’s very difficult (well, impossible) to debug, because the system
totally hangs and doesn’t accept any keypresses.

Would a ZIL help in this case?
I always thought that NFS was the only thing that did SYNC writes…