Date: Thu, 07 Mar 2013 13:30:51 -0600
From: Karl Denninger <karl@denninger.net>
To: freebsd-stable@freebsd.org
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Message-ID: <5138EAEB.7010105@denninger.net>
In-Reply-To: <322C3648171F4BF28201350E5656372A@multiplay.co.uk>
References: <513524B2.6020600@denninger.net>
 <20130307072145.GA2923@server.rulingia.com>
 <5138A4C1.5090503@denninger.net> <5138E55F.7080107@denninger.net>
 <322C3648171F4BF28201350E5656372A@multiplay.co.uk>

On 3/7/2013 1:27 PM, Steven Hartland wrote:
>
> ----- Original Message ----- From: "Karl Denninger"
> To:
> Sent: Thursday, March 07, 2013 7:07 PM
> Subject: Re: ZFS "stalls" -- and maybe we should be talking about
> defaults?
>
>
> On 3/7/2013 12:57 PM, Steven Hartland wrote:
>>
>> ----- Original Message ----- From: "Karl Denninger"
>>> Where I am right now is this:
>>>
>>> 1. I *CANNOT* reproduce the spins on the test machine with Postgres
>>> stopped in any way. Even with multiple ZFS send/recv copies going on
>>> and the load average north of 20 (due to all the geli threads), the
>>> system doesn't stall or produce any notable pauses in throughput. Nor
>>> does the system RAM allocation get driven hard enough to force paging.
>>> This is with NO tuning hacks in /boot/loader.conf. I/O performance is
>>> both stable and solid.
>>>
>>> 2. WITH Postgres running as a connected hot spare (identical to the
>>> production machine), allocating ~1.5G of shared, wired memory, and
>>> running the same synthetic workload as in (1) above, I am getting
>>> SMALL versions of the misbehavior. However, while system RAM
>>> allocation gets driven pretty hard and reaches down toward 100MB in
>>> some instances, it doesn't get driven hard enough to allocate swap.
>>> The "burstiness" is very evident in the iostat figures, with spates
>>> getting into the single-digit MB/sec range from time to time, but
>>> it's not enough to drive the system to a full-on stall.
>>>
>>>> There's pretty clearly a bad interaction here between Postgres
>>>> wiring memory and the ARC, when the latter is left alone and allowed
>>>> to do what it wants.
>>>> I'm continuing to work on replicating this on the test machine...
>>>> just not completely there yet.
>>>
>>> Another possibility to consider is how postgres uses the FS. For
>>> example, does it request sync IO in ways not present in the system
>>> without it, causing the FS and possibly the underlying disk system
>>> to behave differently?
>>
>> That's possible but not terribly likely in this particular instance.
>> The reason is that I ran into this with the Postgres data store on a
>> UFS volume BEFORE I converted it. Now it's on the ZFS pool (with
>> recordsize=8k as recommended for that filesystem), but when I first
>> ran into this it was on a separate UFS filesystem (which is where it
>> had resided for 2+ years without incident), so unless Postgres's
>> filesystem use on a UFS volume could somehow give ZFS fits, it's
>> unlikely to be involved.
>
> I hate to say it, but that sounds very similar to something we
> experienced with a machine here which was running high numbers of rrd
> updates. Again, we had the issue on UFS and saw the same thing when we
> moved to ZFS.
>
> I'll leave it at that so as not to derail the investigation with what
> could be totally irrelevant info, but it may prove an interesting data
> point later.
>
> There are obvious common low-level points between UFS and ZFS which
> may be the cause. One area which springs to mind is device bio
> ordering and barriers, which could well be impacted by sync IO
> requests independent of the FS in use.
>
>>> One other option to test, just to rule it out, is what happens if
>>> you use the BSD scheduler instead of ULE?
>>
>> I will test that, but first I have to get the test machine to
>> reliably stall so I know I'm not chasing my tail.
>
> Very sensible.
>
> Assuming you can reproduce it, one thing that might be interesting to
> try is to eliminate all sync IO. I'm not sure if there are options in
> Postgres to do this via configuration or if it would require editing
> the code, but this could reduce the problem space.
>
> If disabling sync IO eliminated the problem, it would go a long way
> toward proving it isn't the IO volume or pattern per se but rather
> something related to the sync nature of said IO.

That can be turned off in the Postgres configuration. For obvious
reasons it's a very bad idea, but it can be disabled without actually
changing the code itself. I don't know if it shuts off ALL sync
requests, but the documentation says it does.

It's interesting that you ran into this with RRD going; the machine in
question does pull RRD data for Cacti, but it's such a small piece of
the total load profile that I considered it immaterial. It might not be.

-- 
-- Karl Denninger
/The Market Ticker ®/
Cuda Systems LLC
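
For reference, the configuration knobs in question would be roughly the
following (a sketch based on the stock postgresql.conf documentation;
worth confirming against the version actually in use, and strictly for
testing, not for data anyone cares about):

    # postgresql.conf -- test-only settings to take sync I/O out of the picture
    fsync = off                 # PostgreSQL stops issuing fsync()/sync writes entirely
    synchronous_commit = off    # commits return before the WAL is flushed to disk

fsync = off is the one that should remove essentially all sync requests
from the Postgres side; synchronous_commit = off by itself still leaves
the background WAL writer flushing periodically. If the problem turns
out to be the ARC fighting Postgres's wired memory rather than the sync
I/O pattern, the corresponding experiment on the ZFS side would be
capping the ARC with vfs.zfs.arc_max in /boot/loader.conf (value chosen
to leave headroom for the wired shared memory) and seeing whether the
bursty behaviour disappears with Postgres running.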