Date: Thu, 07 Mar 2013 13:30:51 -0600
From: Karl Denninger <karl@denninger.net>
To: freebsd-stable@freebsd.org
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Message-ID: <5138EAEB.7010105@denninger.net>
In-Reply-To: <322C3648171F4BF28201350E5656372A@multiplay.co.uk>
References: <513524B2.6020600@denninger.net> <20130307072145.GA2923@server.rulingia.com> <5138A4C1.5090503@denninger.net> <F99CDA75FB2C454680C1E8AA9008E9DA@multiplay.co.uk> <5138E55F.7080107@denninger.net> <322C3648171F4BF28201350E5656372A@multiplay.co.uk>
On 3/7/2013 1:27 PM, Steven Hartland wrote:
>
> ----- Original Message ----- From: "Karl Denninger" <karl@denninger.net>
> To: <freebsd-stable@freebsd.org>
> Sent: Thursday, March 07, 2013 7:07 PM
> Subject: Re: ZFS "stalls" -- and maybe we should be talking about
> defaults?
>
>
> On 3/7/2013 12:57 PM, Steven Hartland wrote:
>>
>> ----- Original Message ----- From: "Karl Denninger" <karl@denninger.net>
>>> Where I am right now is this:
>>>
>>> 1. I *CANNOT* reproduce the spins on the test machine with Postgres
>>> stopped in any way. Even with multiple ZFS send/recv copies going on
>>> and the load average north of 20 (due to all the geli threads), the
>>> system doesn't stall or produce any notable pauses in throughput. Nor
>>> does the system RAM allocation get driven hard enough to force paging.
>>> This is with NO tuning hacks in /boot/loader.conf. I/O performance is
>>> both stable and solid.
>>>
>>> 2. WITH Postgres running as a connected hot spare (identical to the
>>> production machine), allocating ~1.5G of shared, wired memory, running
>>> the same synthetic workload in (1) above I am getting SMALL versions of
>>> the misbehavior. However, while system RAM allocation gets driven
>>> pretty hard and reaches down toward 100MB in some instances it doesn't
>>> get driven hard enough to allocate swap. The "burstiness" is very
>>> evident in the iostat figures with spates getting into the single digit
>>> MB/sec range from time to time, but it's not enough to drive the system
>>> to a full-on stall.
>>>
>>>> There's pretty-clearly a bad interaction here between Postgres wiring
>>>> memory and the ARC, when the latter is left alone and allowed to do
>>>> what it wants. I'm continuing to work on replicating this on the test
>>>> machine... just not completely there yet.
>>>
>>> Another possibility to consider is how postgres uses the FS.
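[The "/boot/loader.conf tuning hacks" referred to above usually mean capping the ARC so it cannot contend with wired application memory. A minimal sketch of what such a cap looks like; the values here are purely illustrative and are not recommended anywhere in this thread:]

```
# /boot/loader.conf -- illustrative only; pick values from your RAM budget.
# Cap the ZFS ARC so it cannot grow into memory Postgres wires down.
vfs.zfs.arc_max="4G"
# Optionally keep a floor so the ARC is not starved entirely.
vfs.zfs.arc_min="1G"
```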
>>> For example, does it request sync IO in ways not present in the system
>>> without it, which is causing the FS and possibly the underlying disk
>>> system to behave differently?
>>
>> That's possible but not terribly likely in this particular instance.
>> The reason is that I ran into this with the Postgres data store on a UFS
>> volume BEFORE I converted it. Now it's on the ZFS pool (with
>> recordsize=8k as recommended for that filesystem) but when I first ran
>> into this it was on a separate UFS filesystem (which is where it had
>> resided for 2+ years without incident), so unless Postgres' filesystem
>> use on a UFS volume would give ZFS fits it's unlikely to be involved.
>
> I hate to say it, but that sounds very familiar to something we
> experienced with a machine here which was running high numbers of rrd
> updates. Again we had the issue on UFS and saw the same thing when we
> moved to ZFS.
>
> I'll leave that there so as not to derail the investigation with what
> could be totally irrelevant info, but it may prove an interesting data
> point later.
>
> There are obvious common low-level points between UFS and ZFS which
> may be the cause. One area which springs to mind is device bio ordering
> and barriers, which could well be impacted by sync IO requests
> independent of the FS in use.
>
>>> One other option to test, just to rule it out, is what happens if you
>>> use the BSD scheduler instead of ULE?
>>
>> I will test that but first I have to get the test machine to reliably
>> stall so I know I'm not chasing my tail.
>
> Very sensible.
>
> Assuming you can reproduce it, one thing that might be interesting to
> try is to eliminate all sync IO. I'm not sure if there are options in
> Postgres to do this via configuration or if it would require editing
> the code, but this could reduce the problem space.
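[For reference, the recordsize=8k setup described above looks roughly like the following. The dataset name is hypothetical, and note that recordsize only applies to blocks written after the property is set, so it should be configured before loading the database:]

```
# Hypothetical dataset name; recordsize=8k matches Postgres' 8 kB page size.
zfs create -o recordsize=8k tank/pgdata
# Verify the property took effect.
zfs get recordsize tank/pgdata
```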
>
> If disabling sync IO eliminated the problem it would go a long way
> to proving it isn't the IO volume or pattern per se, but instead
> something related to the sync nature of said IO.

That can be turned off in the Postgres configuration. For obvious
reasons it's a very bad idea, but it can be disabled without actually
changing the code itself. I don't know if it shuts off ALL sync
requests, but the documentation says it does.

It's interesting that you ran into this with RRD going; the machine in
question does pull RRD data for Cacti, but it's such a small piece of
the total load profile that I considered it immaterial. It might not be.

--
-- Karl Denninger
/The Market Ticker ®/ <http://market-ticker.org>
Cuda Systems LLC
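[The Postgres configuration knobs alluded to above are presumably synchronous_commit and fsync; a hedged sketch of the "eliminate sync IO" test setup, safe only on a scratch machine:]

```
# postgresql.conf -- illustrative settings for the sync-IO elimination test.
# synchronous_commit=off only delays WAL flushes (a crash loses the most
# recent commits but keeps the cluster consistent).
synchronous_commit = off
# fsync=off suppresses fsync() calls entirely and can corrupt the whole
# cluster on an OS crash or power failure -- test machines only.
fsync = off
```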
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5138EAEB.7010105>