From: Dustin Wenz
Date: Wed, 6 Feb 2013 17:15:09 -0600
To: freebsd-stable@freebsd.org
Subject: I/O hanging while hosting Postgres database

I'm seeing a condition on FreeBSD 9.1 (built October 24th) where I/O seems to hang on any local zpools after several hours of hosting a large-ish Postgres
database. The database occupies about 14TB of a 38TB zpool with a single SSD ZIL. The OS is on a ZFS boot disk. The system also has 24GB of physical memory. smartmontools reports no errors on any disks attached to the system, and IPMI reports that all temperatures, CPU voltages, and fan speeds are normal.

The database has been gradually increasing in size since it was first deployed on FreeBSD 9.1 this fall. There were no problems until last night, when the database became unresponsive. Attempts to interact with the shell would block (specifically, any interaction with the disk), and no error messages were logged to the console. I restarted the system at that time and brought the database back up. Everything seemed normal until this morning, when the database had become unresponsive again. Fortunately, I was able to grab some system statistics before the shell and console went AWOL.

The only findings that I thought were suspicious were the kmem_map numbers:

vm.kmem_map_free: 655360
vm.kmem_map_size: 17141383168

That's roughly 0.004% free (640 KiB free against about 16 GiB in use). I haven't been able to find much documentation on what to expect here, but I haven't seen anything like that on the other database servers I've monitored. Is it possible for kmem_map to become exhausted without generating a kernel panic? Could it be indicative of severe memory fragmentation?

- .Dustin
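For anyone wanting to reproduce the "0.004% free" figure from the sysctl output quoted above, here is a minimal sketch. The helper names (parse_sysctl, kmem_free_percent) are hypothetical, and it assumes vm.kmem_map_size is bytes in use and vm.kmem_map_free is bytes still free, so their sum approximates the whole map. That interpretation is an assumption; `sysctl -d` on the running kernel shows the actual descriptions, and if kmem_map_free turns out to be the largest contiguous free range rather than total free space, a tiny value would point toward fragmentation.

```python
# Hypothetical helpers: parse "name: value" lines as printed by
# `sysctl vm.kmem_map_free vm.kmem_map_size` and estimate how much of the
# kernel memory map is still free.

def parse_sysctl(text):
    """Return a dict mapping sysctl names to integer values."""
    values = {}
    for line in text.strip().splitlines():
        name, _, raw = line.partition(":")
        values[name.strip()] = int(raw.strip())
    return values


def kmem_free_percent(values):
    """Percent of kmem_map free, ASSUMING size (used) + free approximates the whole map."""
    free = values["vm.kmem_map_free"]
    used = values["vm.kmem_map_size"]
    return 100.0 * free / (used + free)


# Values taken from the report above.
sample = """
vm.kmem_map_free: 655360
vm.kmem_map_size: 17141383168
"""

stats = parse_sysctl(sample)
pct = kmem_free_percent(stats)
print(f"kmem_map free: {pct:.4f}%")                               # kmem_map free: 0.0038%
print(f"in use: {stats['vm.kmem_map_size'] / 2**30:.2f} GiB")    # in use: 15.96 GiB
```

Under that assumption, the numbers work out to about 15.96 GiB in use with only 640 KiB left.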