Date: Thu, 07 Mar 2013 13:30:51 -0600
From: Karl Denninger <karl@denninger.net>
To: freebsd-stable@freebsd.org
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Message-ID: <5138EAEB.7010105@denninger.net>
In-Reply-To: <322C3648171F4BF28201350E5656372A@multiplay.co.uk>
References: <513524B2.6020600@denninger.net>
 <20130307072145.GA2923@server.rulingia.com>
 <5138A4C1.5090503@denninger.net> <5138E55F.7080107@denninger.net>
 <322C3648171F4BF28201350E5656372A@multiplay.co.uk>

On 3/7/2013 1:27 PM, Steven Hartland wrote:
>
> ----- Original Message ----- From: "Karl Denninger"
> To:
> Sent: Thursday, March 07, 2013 7:07 PM
> Subject: Re: ZFS "stalls" -- and maybe we should be talking about
> defaults?
>
>
> On 3/7/2013 12:57 PM, Steven Hartland wrote:
>>
>> ----- Original Message ----- From: "Karl Denninger"
>>> Where I am right now is this:
>>>
>>> 1. I *CANNOT* reproduce the spins on the test machine with Postgres
>>> stopped in any way. Even with multiple ZFS send/recv copies going on
>>> and the load average north of 20 (due to all the geli threads), the
>>> system doesn't stall or produce any notable pauses in throughput. Nor
>>> does the system RAM allocation get driven hard enough to force paging.
>>> This is with NO tuning hacks in /boot/loader.conf. I/O performance is
>>> both stable and solid.
>>>
>>> 2. WITH Postgres running as a connected hot spare (identical to the
>>> production machine), allocating ~1.5G of shared, wired memory, and
>>> running the same synthetic workload as in (1) above, I am getting
>>> SMALL versions of the misbehavior. However, while system RAM
>>> allocation gets driven pretty hard and reaches down toward 100MB in
>>> some instances, it doesn't get driven hard enough to allocate swap.
>>> The "burstiness" is very evident in the iostat figures, with spates
>>> getting into the single-digit MB/sec range from time to time, but
>>> it's not enough to drive the system to a full-on stall.
>>>
>>>> There's pretty clearly a bad interaction here between Postgres
>>>> wiring memory and the ARC, when the latter is left alone and allowed
>>>> to do what it wants.
>>>> I'm continuing to work on replicating this on the test machine...
>>>> just not completely there yet.
>>>
>>> Another possibility to consider is how postgres uses the FS. For
>>> example, does it request sync IO in ways not present in the system
>>> without it, causing the FS and possibly the underlying disk system
>>> to behave differently?
>>
>> That's possible but not terribly likely in this particular instance.
>> The reason is that I ran into this with the Postgres data store on a
>> UFS volume BEFORE I converted it. Now it's on the ZFS pool (with
>> recordsize=8k as recommended for that filesystem), but when I first
>> ran into this it was on a separate UFS filesystem (which is where it
>> had resided for 2+ years without incident), so unless Postgres's
>> filesystem use on a UFS volume could somehow give ZFS fits, it's
>> unlikely to be involved.
>
> I hate to say it, but that sounds very similar to something we
> experienced with a machine here which was running high numbers of rrd
> updates. Again, we had the issue on UFS and saw the same thing when we
> moved to ZFS.
>
> I'll leave it at that so as not to derail the investigation with what
> could be totally irrelevant info, but it may prove an interesting data
> point later.
>
> There are obvious common low-level points between UFS and ZFS which
> may be the cause. One area which springs to mind is device bio
> ordering and barriers, which could well be impacted by sync IO
> requests independent of the FS in use.
>
>>> One other option to test, just to rule it out, is what happens if
>>> you use the BSD scheduler instead of ULE?
>>
>> I will test that, but first I have to get the test machine to
>> reliably stall so I know I'm not chasing my tail.
>
> Very sensible.
>
> Assuming you can reproduce it, one thing that might be interesting to
> try is to eliminate all sync IO. I'm not sure if there are options in
> Postgres to do this via configuration or if it would require editing
> the code, but this could reduce the problem space.
>
> If disabling sync IO eliminated the problem, it would go a long way
> toward proving it isn't the IO volume or pattern per se but rather
> something related to the sync nature of said IO.

That can be turned off in the Postgres configuration. For obvious
reasons it's a very bad idea, but it can be disabled without actually
changing the code itself. I don't know if it shuts off ALL sync
requests, but the documentation says it does.

It's interesting that you ran into this with RRD going; the machine in
question does pull RRD data for Cacti, but it's such a small piece of
the total load profile that I considered it immaterial. It might not be.

-- 
-- Karl Denninger
/The Market Ticker ®/
Cuda Systems LLC
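
For reference, the configuration knobs in question would be roughly the
following (a sketch based on the stock postgresql.conf documentation;
worth confirming against the version actually in use, and strictly for
testing, not for data anyone cares about):

    # postgresql.conf -- test-only settings to take sync I/O out of the picture
    fsync = off                 # PostgreSQL stops issuing fsync()/sync writes entirely
    synchronous_commit = off    # commits return before the WAL is flushed to disk

fsync = off is the one that should remove essentially all sync requests
from the Postgres side; synchronous_commit = off by itself still leaves
the background WAL writer flushing periodically. If the problem turns
out to be the ARC fighting Postgres's wired memory rather than the sync
I/O pattern, the corresponding experiment on the ZFS side would be
capping the ARC with vfs.zfs.arc_max in /boot/loader.conf (value chosen
to leave headroom for the wired shared memory) and seeing whether the
bursty behaviour disappears with Postgres running.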