Date:      Sat, 26 Feb 2005 13:47:21 -0800 (PST)
From:      Don Lewis <truckman@FreeBSD.org>
To:        PeterJeremy@optushome.com.au
Cc:        freebsd-stable@FreeBSD.org
Subject:   Re: Excessive delays due to syncer kthread
Message-ID:  <200502262147.j1QLlLoo008885@gw.catspoiler.org>
In-Reply-To: <20050226071308.GN57256@cirb503493.alcatel.com.au>

On 26 Feb, Peter Jeremy wrote:
> I am trying to do some video capture and have been losing occasional
> fields.  After adding some debugging code to the kernel, I've found
> that the problem is excessive latency between the hardware interrupt
> and the driver interrupt - the hardware can handle about 1.5msec of
> latency.  Most of the time, the latency is less than 20µsec but
> I'm seeing up to 8 msec occasionally.  In virtually all cases where
> there is a problem, curproc at the time of the hardware interrupt is
> syncer.  (I had one case where there was another process, but it had
> died by the time I went looking for it).  The interrupt is marked
> INTR_TYPE_AV so it shouldn't be being delayed by other threads.  (I
> can't easily make it INTR_FAST because it needs to call psignal(9)).
> 
> The system is an Athlon XP-1800 with 512MB RAM and 2 ATA-100 disks
> running 5.3-RELEASE-p5.  It has a couple of NFS exports but doesn't
> import anything.  There's nothing much running apart from ffmpeg
> capturing the video and a process capturing my kernel debugging
> output.  Apart from 4 files being sequentially written as part of my
> capture and cron regularly waking up to go back to sleep, there
> shouldn't be any filesystem activity.  I tried copying a couple of
> large files and touching lots of files but that didn't cause any
> problems.
> 
> Can anyone suggest why syncer would be occasionally running for
> up to 8 msec at a time?  Overall, it's not clocking up a great
> deal of CPU time, it just seems to grab it in large chunks.

You're probably running into the inode timestamp update loop.  Each
mounted file system has a special "syncer vnode" that remains
permanently on the syncer worklist.  The syncer will call VOP_FSYNC() on
each of these vnodes as it encounters them in the work list, which it
traverses every 32 seconds.  This is done so that things like the
superblock and other file system metadata are periodically written to
disk.  In the case of ufs, the code that does this is in ffs_sync().
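To make the worklist traversal concrete, here is a minimal user-space
sketch of a 32-slot wheel in which one slot is drained per second and a
permanent per-mount item is requeued so that it comes around again one
full revolution later.  This is a toy model, not the kernel code:
WHEEL_SLOTS, struct work_item, wheel_add() and wheel_tick() are all
hypothetical names, and incrementing a counter stands in for the
VOP_FSYNC() call.

```c
#include <assert.h>
#include <stddef.h>

#define WHEEL_SLOTS 32  /* hypothetical: one slot per second of delay */

struct work_item {
    struct work_item *next;
    int visits;     /* times the toy syncer has processed this item */
    int permanent;  /* nonzero models a per-mount "syncer vnode" */
};

struct sync_wheel {
    struct work_item *slot[WHEEL_SLOTS];
    unsigned now;   /* index of the slot drained by the next tick */
};

/* Queue an item to be processed `delay` ticks from now. */
static void wheel_add(struct sync_wheel *w, struct work_item *it,
                      unsigned delay)
{
    assert(delay < WHEEL_SLOTS);
    unsigned idx = (w->now + delay) % WHEEL_SLOTS;
    it->next = w->slot[idx];
    w->slot[idx] = it;
}

/* One second of work: drain the current slot, then advance the wheel.
 * Returns the number of items processed during this tick. */
static int wheel_tick(struct sync_wheel *w)
{
    struct work_item *it = w->slot[w->now];
    int processed = 0;

    w->slot[w->now] = NULL;
    while (it != NULL) {
        struct work_item *next = it->next;
        it->visits++;               /* stands in for VOP_FSYNC() */
        processed++;
        if (it->permanent) {
            /* Stay in this slot: seen again after a full revolution. */
            it->next = w->slot[w->now];
            w->slot[w->now] = it;
        }
        it = next;
    }
    w->now = (w->now + 1) % WHEEL_SLOTS;
    return processed;
}
```

In this model an ordinary item is visited once and dropped, while the
permanent item is visited once per 32 ticks, which is why whatever its
handler does runs as one periodic chunk of work.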

I suspect that the problem that you are running into is that ffs_sync()
(and ext2_sync()) also handle inode timestamp updates.  Each time they
are called, they walk the list of vnodes for the file system and call
VOP_FSYNC() for any that have unwritten timestamp updates.  As the
comment in the loop in ffs_sync() says:

                /*
                 * Depend on the mntvnode_slock to keep things stable enough
                 * for a quick test.  Since there might be hundreds of
                 * thousands of vnodes, we cannot afford even a subroutine
                 * call unless there's a good chance that we have work to do.
                 */
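The pattern that comment describes can be sketched in user space as
follows: every vnode on the mount's list gets a cheap inline test, and
only the ones that look dirty pay for a function call.  This is an
illustration under assumptions, not the ffs_sync() source: struct
toy_vnode, toy_fsync() and toy_sync() are hypothetical, and the IN_*
values are chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Flag names follow the ufs inode flags; values are illustrative. */
#define IN_ACCESS   0x01
#define IN_CHANGE   0x02
#define IN_UPDATE   0x04
#define IN_MODIFIED 0x08

struct toy_vnode {
    struct toy_vnode *next;
    unsigned flags;         /* pending timestamp/inode updates */
    int dirty_buffers;      /* nonzero if data still needs writing */
};

/* Stand-in for VOP_FSYNC(): pretend everything was written out. */
static void toy_fsync(struct toy_vnode *vp)
{
    vp->flags = 0;
    vp->dirty_buffers = 0;
}

/* Walk the whole list; call toy_fsync() only where the quick test
 * says there is work.  Returns the number of vnodes flushed and
 * stores the number of vnodes examined in *scanned. */
static size_t toy_sync(struct toy_vnode *head, size_t *scanned)
{
    size_t flushed = 0;

    assert(scanned != NULL);
    *scanned = 0;
    for (struct toy_vnode *vp = head; vp != NULL; vp = vp->next) {
        (*scanned)++;
        if ((vp->flags &
            (IN_ACCESS | IN_CHANGE | IN_MODIFIED | IN_UPDATE)) == 0 &&
            !vp->dirty_buffers)
            continue;       /* quick test: nothing to do */
        toy_fsync(vp);
        flushed++;
    }
    return flushed;
}
```

Even when almost nothing is dirty, the walk itself still touches every
vnode, so its cost grows with the number of vnodes on the file system
rather than with the amount of work actually pending.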

I noticed a related performance problem a while back.  If you are doing
something that writes to a lot of files, like untarring the ports tree,
there will be large bursts of disk activity every 30 seconds and the
system gets very sluggish. Soft updates and the new syncer were supposed
to eliminate this behaviour by spreading out the write activity over
time, but this loop in ffs_sync() will cause a burst of writes every
time it is called.  This can also be observed by watching the length of
the syncer worklist.  When untarring the ports tree, the length of the
worklist should increase to a certain, high level, and stabilize.
Instead it ramps up over about thirty seconds and then takes a dramatic
drop.
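The difference between the expected and the observed worklist length
can be illustrated with a deliberately crude model.  It assumes a
constant number of files dirtied per second, a fixed per-file delay
before each one comes due, and (optionally) a bulk flush of everything
pending at a fixed period.  The function name and all parameters are
hypothetical, and the numbers are not measurements.

```c
#include <assert.h>

/* Toy model of worklist length at the end of second `t`.
 *   rate         files dirtied per second
 *   delay        seconds before a dirtied file comes due on its own
 *   flush_period if > 0, everything pending is flushed whenever
 *                t is a multiple of flush_period; 0 disables this */
static long pending_after(long t, long rate, long delay,
                          long flush_period)
{
    assert(t >= 0 && rate >= 0 && delay > 0 && flush_period >= 0);

    /* Seconds of accumulated work since the list was last emptied. */
    long age = (flush_period > 0) ? t % flush_period : t;

    /* Only items younger than `delay` are still waiting. */
    long window = (age < delay) ? age : delay;

    return rate * window;
}
```

With rate 50, delay 30 and no bulk flush, the length climbs to 1500 and
stays there.  With a bulk flush every 30 seconds it climbs toward the
same level and then falls back to zero, which is the sawtooth shape
described above.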

In the initial softupdates implementation, some of the work inside the
loop was skipped in the MNT_LAZY case, but it was found that timestamp
updates were being deferred for too long a time.

I talked to Kirk about entirely bypassing this loop in the MNT_LAZY case
and moving the timestamp updates to the syncer worklist.  Kirk sounded
positive on the idea, but I never found the time to work on the
implementation, and phk's conversion of the syncer to use bufobjs
instead of vnodes complicated things (what do you do about fifos and
sockets?).


