Date: Thu, 07 Mar 2013 13:07:11 -0600
From: Karl Denninger <karl@denninger.net>
To: freebsd-stable@freebsd.org
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Message-ID: <5138E55F.7080107@denninger.net>
In-Reply-To: <F99CDA75FB2C454680C1E8AA9008E9DA@multiplay.co.uk>
References: <513524B2.6020600@denninger.net> <20130307072145.GA2923@server.rulingia.com> <5138A4C1.5090503@denninger.net> <F99CDA75FB2C454680C1E8AA9008E9DA@multiplay.co.uk>
On 3/7/2013 12:57 PM, Steven Hartland wrote:
>
> ----- Original Message ----- From: "Karl Denninger" <karl@denninger.net>
>> Where I am right now is this:
>>
>> 1. I *CANNOT* reproduce the spins on the test machine with Postgres
>> stopped in any way. Even with multiple ZFS send/recv copies going on
>> and the load average north of 20 (due to all the geli threads), the
>> system doesn't stall or produce any notable pauses in throughput. Nor
>> does the system RAM allocation get driven hard enough to force paging.
>> This is with NO tuning hacks in /boot/loader.conf. I/O performance is
>> both stable and solid.
>>
>> 2. WITH Postgres running as a connected hot spare (identical to the
>> production machine), allocating ~1.5G of shared, wired memory, and
>> running the same synthetic workload as in (1) above, I am getting
>> SMALL versions of the misbehavior. While system RAM allocation gets
>> driven pretty hard and in some instances reaches down toward 100MB
>> free, it doesn't get driven hard enough to allocate swap. The
>> "burstiness" is very evident in the iostat figures, with spates
>> dropping into the single-digit MB/sec range from time to time, but
>> it's not enough to drive the system to a full-on stall.
>>
>> There's pretty clearly a bad interaction here between Postgres wiring
>> memory and the ARC when the latter is left alone and allowed to do
>> what it wants. I'm continuing to work on replicating this on the test
>> machine... just not completely there yet.
>
> Another possibility to consider is how Postgres uses the FS. For
> example, does it request sync I/O in ways not present in the system
> without it, which could cause the FS and possibly the underlying disk
> system to behave differently?

That's possible, but not terribly likely in this particular instance. The reason is that I ran into this with the Postgres data store on a UFS volume BEFORE I converted it.
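For anyone wanting to experiment along these lines: one obvious mitigation for the ARC-vs-wired-memory pressure described above is to cap the ARC in /boot/loader.conf. A minimal sketch follows; the sizes are purely illustrative (they assume a machine with enough RAM to leave headroom for Postgres's ~1.5G of wired shared memory), not a recommendation:

```
# /boot/loader.conf -- illustrative values only
# Cap the ARC so it cannot grow into memory that Postgres has wired.
vfs.zfs.arc_max="8G"
# Optionally set the floor the ARC will shrink to under pressure:
vfs.zfs.arc_min="2G"
```

These tunables take effect at boot; the right ceiling depends on total RAM and how much of it the database wires.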
Now it's on the ZFS pool (with recordsize=8k, as recommended for that filesystem), but when I first ran into this it was on a separate UFS filesystem (where it had resided for 2+ years without incident), so unless Postgres's filesystem use on a UFS volume could give ZFS fits, it's unlikely to be involved.

> One other option to test, just to rule it out: what happens if you
> use the BSD scheduler instead of ULE?
>
> Regards
> Steve

I will test that, but first I have to get the test machine to reliably stall so I know I'm not chasing my tail.

-- 
-- Karl Denninger
/The Market Ticker ®/ <http://market-ticker.org>
Cuda Systems LLC
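For reference, the two knobs mentioned above look roughly like this (the dataset name "tank/pgsql" is a placeholder, and these are illustrative commands, not a tested recipe):

```
# Verify/set the 8k recordsize on the Postgres dataset:
zfs get recordsize tank/pgsql
zfs set recordsize=8k tank/pgsql   # only affects newly written blocks

# Switching from ULE to the 4BSD scheduler requires a kernel rebuild;
# in the kernel config file, replace
#   options SCHED_ULE
# with
#   options SCHED_4BSD
# then rebuild and reinstall the kernel.
```

Note that changing recordsize does not rewrite existing data, so a send/recv or dump/restore of the dataset is needed for it to apply everywhere.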
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5138E55F.7080107>