Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 17 Jul 2013 22:18:52 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Mark Johnston <markj@freebsd.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: syncer causing latency spikes
Message-ID:  <20130717191852.GS5991@kib.kiev.ua>
In-Reply-To: <20130717180720.GA8289@charmander>
References:  <20130717180720.GA8289@charmander>

next in thread | previous in thread | raw e-mail | index | archive | help

--lx6+J6gHrzwEpYNy
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Jul 17, 2013 at 02:07:55PM -0400, Mark Johnston wrote:
> Hello,
>=20
> I'm trying to investigate and solve some postgres latency spikes that
> I'm seeing as a result of some behaviour in the syncer. This is with
> FreeBSD 8.2 (with some local modifications and backports, r231160 in
> particular). The system has an LSI 9261-8i RAID controller (backed by
> mfi(4)) and the database and WALs are on separate volumes, a RAID 6 and
> a RAID 1 respectively. It has about 96GB of RAM installed.
>=20
> What's happening is that the syncer tries to fsync a large database file
> and goes to sleep in getpbuf() with the corresponding vnode lock held
> and the following stack:
>=20
> #3  0xffffffff805fceb5 in _sleep (ident=3D0xffffffff80ca8e20, lock=3D0xff=
ffffff80d6bc20, priority=3D-2134554464, wmesg=3D0xffffffff80a4fe43 "wswbuf0=
", timo=3D0) at /d2/usr/src/sys/kern/kern_synch.c:234
> #4  0xffffffff808780d5 in getpbuf (pfreecnt=3D0xffffffff80ca8e20) at /d2/=
usr/src/sys/vm/vm_pager.c:339
> #5  0xffffffff80677a00 in cluster_wbuild (vp=3D0xffffff02ea3d7ce8, size=
=3D16384, start_lbn=3D20869, len=3D2) at /d2/usr/src/sys/kern/vfs_cluster.c=
:801
> #6  0xffffffff808477ed in ffs_syncvnode (vp=3D0xffffff02ea3d7ce8, waitfor=
=3DVariable "waitfor" is not available.) at /d2/usr/src/sys/ufs/ffs/ffs_vno=
ps.c:306
> #7  0xffffffff808488cf in ffs_fsync (ap=3D0xffffff9b0cd27b00) at /d2/usr/=
src/sys/ufs/ffs/ffs_vnops.c:190
> #8  0xffffffff8096798a in VOP_FSYNC_APV (vop=3D0xffffffff80ca5300, a=3D0x=
ffffff9b0cd27b00) at vnode_if.c:1267
> #9  0xffffffff8068bade in sync_vnode (slp=3D0xffffff002ab8e758, bo=3D0xff=
ffff9b0cd27bc0, td=3D0xffffff002ac89460) at vnode_if.h:549
> #10 0xffffffff8068bdcd in sched_sync () at /d2/usr/src/sys/kern/vfs_subr.=
c:1841
>=20
> (kgdb) frame 6
> #6  0xffffffff808477ed in ffs_syncvnode (vp=3D0xffffff02ea3d7ce8, waitfor=
=3DVariable "waitfor" is not available.) at /d2/usr/src/sys/ufs/ffs/ffs_vno=
ps.c:306
> 306                             vfs_bio_awrite(bp);
> (kgdb) vpath vp
> 0xffffff02ea3d7ce8: 18381
> 0xffffff02eab1dce8: 16384
> 0xffffff02eaaf0588: base
> 0xffffff01616d8b10: data
> 0xffffff01616d8ce8: pgsql
> 0xffffff002af9f588: <mount point>
> 0xffffff002af853b0: d3
> 0xffffff002abd6b10: /
> (kgdb)
>=20
> During such an fsync, DTrace shows me that syncer sleeps of 50-200ms are
> happening up to 8 or 10 times a second. When this happens, a bunch of
> postgres threads become blocked in vn_write() waiting for the vnode lock
> to become free. It looks like the write-clustering code is limited to
> using (nswbuf / 2) pbufs, and FreeBSD prevents one from setting nswbuf
> to anything greater than 256.
Syncer is probably just a victim of profiling.  Would postgres called
fsync(2), you then blame the fsync code for the pauses.

Just add a tunable to allow the user to manually-tune the nswbuf,
regardless of the buffer cache sizing.  And yes, nswbuf default max
probably should be bumped to something like 1024, at least on 64bit
architectures which do not starve for kernel memory.

>=20
> Since the sleeps are happening in the write-clustering code, I tried
> disabling write clustering on the mountpoint (with the noclusterw mount
> option) and found that this replaces my problem with another one: the
> syncer periodically generates large bursts of writes that create a
> backlog in the mfi(4) bioq. Then postgres' reads take a few seconds to
> return, causing more or less the same end result.
Well, this is exactly what clustering code is for, merge a lot of
short i/o requests into the bigger i/o ?

>=20
> Does anyone have any suggestions on what I can do to help reduce the
> impact of the syncer on my systems? I can't just move to a newer version
> of FreeBSD, but I'm willing to backport changes if anyone can point me
> to something that might help.

As a side note, syncer code for UFS was redone in HEAD and 9 to only
iterate over the active vnodes.  This is mostly irrelevant to your
problems which happen when syncer does perform the i/o, but still it
saves a lot of CPU.

--lx6+J6gHrzwEpYNy
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (FreeBSD)

iQIcBAEBAgAGBQJR5u4bAAoJEJDCuSvBvK1BIicQAIX9KpI4uwHfTpcdzry0pBjO
g5jl7r23MH2Y/34snX1oNfW0TWIP9dSypMZnmioswLLEVa3G/CFL+dr+A1tCaznc
4WctmktebGlyhjN1qZus/xLhNA1xeiXvSjiPccFTBnd53ce79+FrKvpsQICLXJwh
hzPfHFG4x3VCkbRP/tKj/F000MlYaZm2QUXsp7uSJahd/L0E1UfdRuw/oxUG9dG8
KqtUuq8i2uBZMCR8b/AsRZ9LNOymjc6DKcDEeuQi4xTp51GvMMqy/mspkb0LJDBj
YjsbdhV9fGOArq1TehknwTps0ApdK10++mv0r2QKj5IDOsYhOKDJMg9giYa81bbb
jD3qdMMPVeLqjQxqeGOx1jbKwV0NXvHonOU2b41Bo3xJL8WkdtT3RxeyPNYeLhUZ
2UYTr1k+zCDUnSckK1RH1ezLLiFxfOczpODoWecQI1pq3LNHI3LoH3zblsqzugiJ
nO/eMes2P4szPXTYy8pFI6q8m+XFeOsdJi6j+C+GC56O95Gfzp3I8gaPIKSaidoS
hXuCAXzqZRW1A6ByaMMl5eZIJdXwDWCT+OEWZh0jfwyB2IXO/FmAgqfLTnCRvdJK
uiV1Uvofm325EXldTHT5HeAZW6SHrJHZj6m9It1Tjw13JuuZkgT2w+/n9O3kVYD3
Fv8y4JZIk4/cne0TQo6Z
=8U9c
-----END PGP SIGNATURE-----

--lx6+J6gHrzwEpYNy--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130717191852.GS5991>