Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 20 Jul 2014 01:54:38 -0700
From:      Jeremy Chadwick <jdc@koitsu.org>
To:        Adrian Chadd <adrian@freebsd.org>
Cc:        FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: Consistently "high" CPU load on 10.0-STABLE
Message-ID:  <20140720085438.GA59359@icarus.home.lan>
In-Reply-To: <CAJ-VmokyYW3Rwxj40nyaOWgVxwBvCGU_4KVY-aZa6GPomYtX0g@mail.gmail.com>
References:  <20140720062413.GA56318@icarus.home.lan> <CAJ-VmokyYW3Rwxj40nyaOWgVxwBvCGU_4KVY-aZa6GPomYtX0g@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help

Okay, so I had to install FreeBSD on a VM (on a different box) to deal
with Xorg and all that jazz, then did the following on the bare-metal
system:

- Let it sit idle (barring cronjobs and existing daemons) for about 10
  minutes
- sysctl debug.ktr.mask=0
- Waited 10-15 seconds
- ktrdump -ctq > ktr.out
- sysctl debug.ktr.mask=0x20000000  (what the value was originally)
- scp'd ktr.out to the VM box
- On VM in X: python /usr/src/tools/sched/schedgraph.py ktr.out 2.8

What I end up with is an application that is a bit difficult for me to
grasp.  It seems to indicate spanning a time frame of 256.47 seconds,
and KTR had collected 247735 events from a total of 83 sources.  I can,
of course, scroll through all those sources (vertically) but I'm not
really sure what it is that's being graphed on a per-event basis.

What the graphs (individually) represent on a vertical scale I'm not
sure.  Horizontally they seem to be "sectionalised" in some manner, like
into blocks (possibly of time?) but I can't tell if that's "how long
something was running for" and if that actually correlates with CPU load
or not.

This is very hard to explain in text, quite honestly, and I can't find a
single example of how to use this tool anywhere online.  rwatson's site
has some vague information (that also seems outdated, particularly
shoving the ktrdump output through sort -n).

I've put a screenshot up of the relevant window, specifically the CPU n
load parts.  Part of me wonders if the repeated "spikes" (especially on
CPU 0?) are indicators of what I'm experiencing, but I really don't
know:

http://jdc.koitsu.org/freebsd/releng10_perf_issue/sched01.png

I'm happy to provide the ktr.out if it's of any use, by the way.  My KTR
kernel configuration settings on the bare metal box are:

options         KTR
options         KTR_ENTRIES=262144
options         KTR_COMPILE=(KTR_SCHED)
options         KTR_MASK=(KTR_SCHED)

Which based on reading the python script vs. what's on the web seem to
be what's generally desired.

I suppose I can try doing things like shutting off all the daemons I
normally run and then see if the problem goes away (and if so try to
track it down to a single daemon), but like I said I don't really see
anything userland-process-wise suddenly start taking up CPU time.
Whatever it is is "heavy" enough to cause the load to go from 0.03 to
0.24 or so within a few seconds and then settle down again.  Part of me
wonders if it's ZFS (periodic txg flushing or something).

Oh, one thing I did find manually: top -S -H -b 999999 on two different
boxes: the "swapper" thread seems interesting and I'll explain why:

RELENG_10 box:

 1:49AM  up  5:52, 1 user, load averages: 0.32, 0.16, 0.10
    0 root       -16    0     0K  4912K swapin  0   1:04   0.00% [kernel{swapper}]

RELENG_9 box:

 1:49AM  up 39 days,  8:19, 1 user, load averages: 0.04, 0.06, 0.02
    0 root       -16    0     0K   160K swapin  0   0:55   0.00% [kernel{swapper}]

I don't know what the "swapper" thread is for or doing (I assume not
related to pagedaemon?), but I find it interesting that the RELENG_10
system has already used 1:04 worth of system time (I think that's 1
minute 4 seconds worth?) for a system that's only been up 5 hours and
not using any swap, while a different RELENG_9 box that's been up for 39
days and doing a lot more I/O (receiving mail, logging things, running a
public web server with logs, etc.) has only used 0:55 (and is actually
using 104MBytes of swap).  TL;DR -- I wonder if it's "swapper" that's
doing it.

In the schedgraph, swapper shows up in little chunks of 10 second
durations, but that may be normal.  :/

On Sun, Jul 20, 2014 at 12:13:05AM -0700, Adrian Chadd wrote:
> Hi,
> 
> I don't know how to do this with dtrace, but take a look at
> tools/sched/schedgraph.py and enable KTR to get some trace records.
> 
> KTR logs the scheduler activity -and- the loadav with it.
> 
> 
> -a
> 
> 
> On 19 July 2014 23:24, Jeremy Chadwick <jdc@koitsu.org> wrote:
> > (Please keep me CC'd as I'm not subscribed to freebsd-stable@)
> >
> > Today I took the liberty of upgrading my main home server from
> > 9.3-STABLE (r268785) to 10.0-STABLE (r268894).  The upgrade consisted of
> > doing a fresh install of 10.0-STABLE on a brand new unused SSD.  Most
> > everything went as planned, barring a couple ports-related anomalies,
> > and I seemed fairly impressed by the fact that buildworld times had
> > dropped to 27 minutes and buildkernel to 4 minutes with clang (something
> > I'd been avoiding like the plague for a long while).  Kudos.
> >
> > But after an hour or so, I noticed a consistent (i.e. reproducible)
> > trend: the system load average tends to hang around 0.10 to 0.15 all the
> > time.  There are times where the load drops to 0.03 or 0.04 but then
> > something kicks it back up to 0.15 or 0.20 and then it slowly levels out
> > again (over the course of a few minutes) then repeats.
> >
> > Obviously this is normal behaviour for a system when something is going
> > on periodically.  So I figured it might have been a userland process
> > behaving differently under 10.x than 9.x.  I let top -a -S -s 1 run and
> > paid very very close attention to it for several minutes.  Nothing.  It
> > doesn't appear to be something userland -- it appears to be something
> > kernel-level, but nothing in top -S shows up as taking up any CPU time
> > other than "[idle]" so I have no idea what might be doing it.
> >
> > The box isn't doing anything like routing network traffic/NAT, it's pure
> > IPv4 (IPv6 disabled in world and kernel, and my home network does
> > basically no IPv6) and sits idle most of the time fetching mail.  It
> > does use ZFS, but not for /, swap, /var, /tmp, or /usr.
> >
> > vmstat -i doesn't particularly show anything awful.  All the cpuX:timer
> > entries tend to fluctuate in rate, usually 120-200 or so; I'd expect an
> > interrupt storm to be showing something in the 1000+ range.
> >
> > The only thing I can think of is the fact that the SSD being used has no
> > 4K quirk entry in the kernel (and its ATA IDENTIFY responds with 512
> > logical, 512 physical, even though we know it's 4K).  The partitions are
> > all 1MB-aligned regardless.
> >
> > This is all bare-metal, by the way -- no virtualisation involved.
> >
> > I do have DTrace enabled/built on this box but I have absolutely no clue
> > how to go about profiling things.  For example maybe output of this sort
> > would be helpful (but I've no idea how to get it):
> >
> > http://lists.freebsd.org/pipermail/freebsd-stable/2014-July/079276.html
> >
> > I'm certain I didn't see this behaviour in 9.x so I'd be happy to try
> > and track it down if I had a little bit of hand-holding.
> >
> > I've put all the things I can think of that might be relevant to "system
> > config/tuning bits" up here:
> >
> > http://jdc.koitsu.org/freebsd/releng10_perf_issue/
> >
> > I should note my kernel config is slightly inaccurate (I've removed some
> > stuff from the file in attempt to rebuild, but building world prior to
> > kernel failed due to r268896 breaking world, but anyone subscribed here
> > has already seen the Jenkins job of that ;-) ).
> >
> > Thanks.
> >
> > --
> > | Jeremy Chadwick                                   jdc@koitsu.org |
> > | UNIX Systems Administrator                http://jdc.koitsu.org/ |
> > | Making life hard for others since 1977.             PGP 4BD6C0CB |
> >
> > _______________________________________________
> > freebsd-stable@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140720085438.GA59359>