From owner-freebsd-stable@FreeBSD.ORG  Wed Feb  6 23:20:33 2013
From: Dustin Wenz <dustinwenz@ebureau.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Subject: I/O hanging while hosting Postgres database
Message-Id: <B0A2F9DE-EC90-441B-AF82-85545BD341EF@ebureau.com>
Date: Wed, 6 Feb 2013 17:15:09 -0600
To: freebsd-stable@freebsd.org
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
X-Mailer: Apple Mail (2.1499)

I'm seeing a condition on FreeBSD 9.1 (built October 24th) where I/O
seems to hang on all local zpools after several hours of hosting a
large-ish Postgres database. The database occupies about 14TB of a 38TB
zpool with a single SSD ZIL. The OS is on a ZFS boot disk. The system
has 24GB of physical memory. smartmontools reports no errors on any
disk attached to the system, and IPMI reports that all temperatures,
CPU voltages, and fan speeds are normal.

The database has been gradually growing since it was first deployed on
FreeBSD 9.1 this fall. There were no problems until last night, when
the database became unresponsive. Attempts to interact with the shell
would block (specifically, anything that touched the disk), and no
error messages were logged to the console. I restarted the system at
that point and brought the database back up. Everything seemed normal
until this morning, when the database became unresponsive again.
Fortunately, I was able to grab some system statistics before the shell
and console went AWOL.

The only thing I found suspicious was the kmem_map numbers:

	vm.kmem_map_free: 655360
	vm.kmem_map_size: 17141383168

That works out to roughly 0.004% free. I haven't been able to find much
documentation on what to expect here, but I haven't seen anything like
it on the other database servers I've monitored. Is it possible for
kmem_map to become exhausted without generating a kernel panic? Could
this indicate severe memory fragmentation?

	- .Dustin