From: Karl Denninger
To: freebsd-stable@freebsd.org
Date: Thu, 07 Mar 2013 13:07:11 -0600
Message-ID: <5138E55F.7080107@denninger.net>
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?

On 3/7/2013 12:57 PM, Steven Hartland wrote:
>
> ----- Original Message ----- From: "Karl Denninger"
>> Where I am right now is this:
>>
>> 1. I *CANNOT* reproduce the spins on the test machine with Postgres
>> stopped in any way.
>> Even with multiple ZFS send/recv copies going on
>> and the load average north of 20 (due to all the geli threads), the
>> system doesn't stall or produce any notable pauses in throughput. Nor
>> does the system RAM allocation get driven hard enough to force paging.
>> This is with NO tuning hacks in /boot/loader.conf. I/O performance is
>> both stable and solid.
>>
>> 2. WITH Postgres running as a connected hot spare (identical to the
>> production machine), allocating ~1.5G of shared, wired memory, and
>> running the same synthetic workload as in (1) above, I am getting SMALL
>> versions of the misbehavior. However, while system RAM allocation gets
>> driven pretty hard and reaches down toward 100MB in some instances, it
>> doesn't get driven hard enough to allocate swap. The "burstiness" is
>> very evident in the iostat figures, with spates getting into the
>> single-digit MB/sec range from time to time, but it's not enough to
>> drive the system to a full-on stall.
>>
>> There's pretty clearly a bad interaction here between Postgres wiring
>> memory and the ARC when the latter is left alone and allowed to do what
>> it wants. I'm continuing to work on replicating this on the test
>> machine... just not completely there yet.
>
> Another possibility to consider is how postgres uses the FS. For example,
> does it request sync IO in ways not present in the system without it,
> which is causing the FS and possibly the underlying disk system to behave
> differently?
>
That's possible but not terribly likely in this particular instance.
The reason is that I ran into this with the Postgres data store on a UFS
volume BEFORE I converted it. Now it's on the ZFS pool (with
recordsize=8k as recommended for that filesystem), but when I first ran
into this it was on a separate UFS filesystem, which is where it had
resided for 2+ years without incident. So unless Postgres's use of a
UFS volume would give ZFS fits, it's unlikely to be involved.
>
> One other option to test, just to rule it out, is what happens if you
> use the BSD scheduler instead of ULE?
>
> Regards
> Steve
>
I will test that, but first I have to get the test machine to stall
reliably so I know I'm not chasing my tail.

--
-- Karl Denninger
/The Market Ticker ®/
Cuda Systems LLC