From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Karl Denninger", <freebsd-stable@freebsd.org>
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Date: Thu, 7 Mar 2013 19:27:15 -0000
Message-ID: <322C3648171F4BF28201350E5656372A@multiplay.co.uk>
References: <513524B2.6020600@denninger.net> <20130307072145.GA2923@server.rulingia.com> <5138A4C1.5090503@denninger.net> <5138E55F.7080107@denninger.net>

----- Original Message -----
From: "Karl Denninger"
To: <freebsd-stable@freebsd.org>
Sent: Thursday, March 07, 2013 7:07 PM
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?

On 3/7/2013 12:57 PM, Steven Hartland wrote:
>
> ----- Original Message ----- From: "Karl Denninger"
>> Where I am right now is this:
>>
>> 1. I *CANNOT* reproduce the spins on the test machine with Postgres
>> stopped in any way. Even with multiple ZFS send/recv copies going on
>> and the load average north of 20 (due to all the geli threads), the
>> system doesn't stall or produce any notable pauses in throughput. Nor
>> does the system RAM allocation get driven hard enough to force paging.
>> This is with NO tuning hacks in /boot/loader.conf. I/O performance is
>> both stable and solid.
>>
>> 2. WITH Postgres running as a connected hot spare (identical to the
>> production machine), allocating ~1.5G of shared, wired memory, running
>> the same synthetic workload in (1) above I am getting SMALL versions of
>> the misbehavior. However, while system RAM allocation gets driven
>> pretty hard and reaches down toward 100MB in some instances it doesn't
>> get driven hard enough to allocate swap. The "burstiness" is very
>> evident in the iostat figures, with spates getting into the single digit
>> MB/sec range from time to time, but it's not enough to drive the system
>> to a full-on stall.
>>
>>> There's pretty-clearly a bad interaction here between Postgres wiring
>>> memory and the ARC, when the latter is left alone and allowed to do what
>>> it wants. I'm continuing to work on replicating this on the test
>>> machine... just not completely there yet.
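
As an aside on the ARC half of that interaction: while you're chasing the
repro it may be worth pinning the ARC down so it simply can't grow into the
memory Postgres wants to wire. I can't say it will change anything for your
workload, but the usual knob is vfs.zfs.arc_max in /boot/loader.conf (the
sizes below are purely illustrative, pick values to suit the machine's RAM,
and it needs a reboot to take effect):

    # /boot/loader.conf -- illustrative sizes only
    vfs.zfs.arc_max="4G"    # cap how large the ARC may grow
    vfs.zfs.arc_min="1G"    # optional floor so it isn't squeezed to nothing

That's a diagnostic knob rather than a fix, but if capping the ARC makes the
stalls disappear it points fairly squarely at ARC sizing vs. wired memory
rather than at the IO path itself.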
>>
>> Another possibility to consider is how postgres uses the FS. For example
>> does it request sync IO in ways not present in the system without it
>> which is causing the FS and possibly underlying disk system to behave
>> differently.
>
> That's possible but not terribly-likely in this particular instance.
> The reason is that I ran into this with the Postgres data store on a UFS
> volume BEFORE I converted it. Now it's on the ZFS pool (with
> recordsize=8k as recommended for that filesystem) but when I first ran
> into this it was on a separate UFS filesystem (which is where it had
> resided for 2+ years without incident), so unless the Postgres
> filesystem use on a UFS volume would give ZFS fits it's unlikely to be
> involved.

I hate to say it, but that sounds very familiar to something we experienced
with a machine here which was running high numbers of rrd updates. Again we
had the issue on UFS and saw the same thing when we moved to ZFS.

I'll leave that there so as not to derail the investigation with what could
be totally irrelevant info, but it may prove an interesting data point later.

There are obvious common low-level points between UFS and ZFS which may be
the cause. One area which springs to mind is device bio ordering and
barriers, which could well be impacted by sync IO requests independent of
the FS in use.

>> One other option to test, just to rule it out, is what happens if you
>> use the BSD scheduler instead of ULE?
>
> I will test that but first I have to get the test machine to reliably
> stall so I know I'm not chasing my tail.

Very sensible. Assuming you can reproduce it, one thing that might be
interesting to try is to eliminate all sync IO. I'm not sure if there are
options in Postgres to do this via configuration or if it would require
editing the code, but this could reduce the problem space. If disabling
sync IO eliminated the problem it would go a long way to proving it isn't
the IO volume or pattern per se but instead related to the sync nature of
said IO.

Regards
Steve
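
PS. On the sync IO question: Postgres can be told to stop issuing sync
writes purely from postgresql.conf, no code changes needed. I haven't tried
this against your workload, so treat it as a sketch of the idea rather than
a recommendation, and only ever on a test box as it leaves the database
unsafe across crashes:

    # postgresql.conf -- test box only, takes sync IO out of the picture
    synchronous_commit = off   # commits return before the WAL is flushed
    fsync = off                # skip fsync/fdatasync on data and WAL files
    full_page_writes = off     # optional: drops the extra full-page images

A reload (pg_ctl reload) should be enough to pick those up. The same thing
can also be approximated from the ZFS side with "zfs set sync=disabled" on
the Postgres dataset, which would tell you whether it's the application's
sync calls or the pool's handling of them that matters. If the stalls
vanish with either in place, that's a strong hint it's the sync semantics
rather than the raw IO volume that's upsetting things.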