From owner-freebsd-stable@FreeBSD.ORG  Wed Jul 17 18:07:10 2013
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id C7B543BB
 for <freebsd-stable@freebsd.org>; Wed, 17 Jul 2013 18:07:10 +0000 (UTC)
 (envelope-from markjdb@gmail.com)
Received: from mail-ie0-x236.google.com (mail-ie0-x236.google.com
 [IPv6:2607:f8b0:4001:c03::236])
 by mx1.freebsd.org (Postfix) with ESMTP id 9CEA715A
 for <freebsd-stable@freebsd.org>; Wed, 17 Jul 2013 18:07:10 +0000 (UTC)
Received: by mail-ie0-f182.google.com with SMTP id s9so4835826iec.27
 for <freebsd-stable@freebsd.org>; Wed, 17 Jul 2013 11:07:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:date:from:to:subject:message-id:mime-version:content-type
 :content-disposition:user-agent;
 bh=z125jsncMZtP0Zj2Z+lCvIdzUU5voF0nU+YHHXc/s/I=;
 b=iKXq1fJKhMsIGVuooniHApd4b3otF8KJkPrT3znllDFJcdkWA8YioHKL4hfaoNhlvW
 RtWh/QtDoInYe3JpNX33lK+S4lxtRyjIXvXQwO7QyV6ULlmgZXYbLhuRmQoAbUd9OXRB
 uO8yQuM9kkqqy4CfNgm7koR4UYJkILEqwAeKN8GvuNP1pV2qO1MBUpAmuygZHasklT6I
 jdKQzMNpJtYQJvb/Cam4rH53rxVos/omumQ/H/LD0BKlNf18bAAQuz4+BZmVIK0qJ89u
 LOp2Bx4UG5reWsHb1txZ3Ak/yZY/mxpp0M1+oeSpVGzyU3FAiJMKy/+BCtHezug9wq5D
 4ANA==
X-Received: by 10.50.225.66 with SMTP id ri2mr1681949igc.55.1374084430392;
 Wed, 17 Jul 2013 11:07:10 -0700 (PDT)
Received: from charmander (mail1.sandvine.com. [64.7.137.162])
 by mx.google.com with ESMTPSA id nm17sm30284167igb.5.2013.07.17.11.07.08
 for <freebsd-stable@freebsd.org>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Wed, 17 Jul 2013 11:07:09 -0700 (PDT)
Sender: Mark Johnston <markjdb@gmail.com>
Date: Wed, 17 Jul 2013 14:07:55 -0400
From: Mark Johnston <markj@freebsd.org>
To: freebsd-stable@freebsd.org
Subject: syncer causing latency spikes
Message-ID: <20130717180720.GA8289@charmander>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Jul 2013 18:07:10 -0000

Hello,

I'm trying to investigate and solve some postgres latency spikes that
I'm seeing as a result of some behaviour in the syncer. This is with
FreeBSD 8.2 (with some local modifications and backports, r231160 in
particular). The system has an LSI 9261-8i RAID controller (backed by
mfi(4)) and the database and WALs are on separate volumes, a RAID 6 and
a RAID 1 respectively. It has about 96GB of RAM installed.

What's happening is that the syncer tries to fsync a large database file
and goes to sleep in getpbuf() with the corresponding vnode lock held
and the following stack:

#3  0xffffffff805fceb5 in _sleep (ident=0xffffffff80ca8e20, lock=0xffffffff80d6bc20, priority=-2134554464, wmesg=0xffffffff80a4fe43 "wswbuf0", timo=0) at /d2/usr/src/sys/kern/kern_synch.c:234
#4  0xffffffff808780d5 in getpbuf (pfreecnt=0xffffffff80ca8e20) at /d2/usr/src/sys/vm/vm_pager.c:339
#5  0xffffffff80677a00 in cluster_wbuild (vp=0xffffff02ea3d7ce8, size=16384, start_lbn=20869, len=2) at /d2/usr/src/sys/kern/vfs_cluster.c:801
#6  0xffffffff808477ed in ffs_syncvnode (vp=0xffffff02ea3d7ce8, waitfor=Variable "waitfor" is not available.) at /d2/usr/src/sys/ufs/ffs/ffs_vnops.c:306
#7  0xffffffff808488cf in ffs_fsync (ap=0xffffff9b0cd27b00) at /d2/usr/src/sys/ufs/ffs/ffs_vnops.c:190
#8  0xffffffff8096798a in VOP_FSYNC_APV (vop=0xffffffff80ca5300, a=0xffffff9b0cd27b00) at vnode_if.c:1267
#9  0xffffffff8068bade in sync_vnode (slp=0xffffff002ab8e758, bo=0xffffff9b0cd27bc0, td=0xffffff002ac89460) at vnode_if.h:549
#10 0xffffffff8068bdcd in sched_sync () at /d2/usr/src/sys/kern/vfs_subr.c:1841

(kgdb) frame 6
#6  0xffffffff808477ed in ffs_syncvnode (vp=0xffffff02ea3d7ce8, waitfor=Variable "waitfor" is not available.) at /d2/usr/src/sys/ufs/ffs/ffs_vnops.c:306
306                             vfs_bio_awrite(bp);
(kgdb) vpath vp
0xffffff02ea3d7ce8: 18381
0xffffff02eab1dce8: 16384
0xffffff02eaaf0588: base
0xffffff01616d8b10: data
0xffffff01616d8ce8: pgsql
0xffffff002af9f588: <mount point>
0xffffff002af853b0: d3
0xffffff002abd6b10: /
(kgdb)

During such an fsync, DTrace shows me that syncer sleeps of 50-200ms are
happening up to 8 or 10 times a second. When this happens, a bunch of
postgres threads become blocked in vn_write() waiting for the vnode lock
to become free. It looks like the write-clustering code is limited to
using (nswbuf / 2) pbufs, and FreeBSD prevents one from setting nswbuf
to anything greater than 256.

Since the sleeps are happening in the write-clustering code, I tried
disabling write clustering on the mountpoint (with the noclusterw mount
option) and found that this replaces my problem with another one: the
syncer periodically generates large bursts of writes that create a
backlog in the mfi(4) bioq. Then postgres' reads take a few seconds to
return, causing more or less the same end result.

Does anyone have any suggestions on what I can do to help reduce the
impact of the syncer on my systems? I can't just move to a newer version
of FreeBSD, but I'm willing to backport changes if anyone can point me
to something that might help.

Thanks,
-Mark