Date:      Thu, 9 Aug 2018 09:28:09 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        Mark Johnston <markj@freebsd.org>
Cc:        bob prohaska <fbsd@www.zefox.net>, "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject:   Re: RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]
Message-ID:  <CANCZdfpKOTBrxiNhaeHHRp-2iw5a4eXt%2Bmd_1LTD-c0%2BAE6qxg@mail.gmail.com>
In-Reply-To: <20180809152152.GC68459@raichu>
References:  <20180801034511.GA96616@www.zefox.net> <201808010405.w7145RS6086730@donotpassgo.dyslexicfish.net> <6BFE7B77-A0E2-4FAF-9C68-81951D2F6627@yahoo.com> <20180802002841.GB99523@www.zefox.net> <20180802015135.GC99523@www.zefox.net> <EC74A5A6-0DF4-48EB-88DA-543FD70FEA07@yahoo.com> <20180806155837.GA6277@raichu> <20180808153800.GF26133@www.zefox.net> <20180808204841.GA19379@raichu> <20180809065648.GB30347@www.zefox.net> <20180809152152.GC68459@raichu>

On Thu, Aug 9, 2018 at 9:21 AM, Mark Johnston <markj@freebsd.org> wrote:

> On Wed, Aug 08, 2018 at 11:56:48PM -0700, bob prohaska wrote:
> > On Wed, Aug 08, 2018 at 04:48:41PM -0400, Mark Johnston wrote:
> > > On Wed, Aug 08, 2018 at 08:38:00AM -0700, bob prohaska wrote:
> > > > The patched kernel ran longer than the default one, but the OOMA still
> > > > halted buildworld around 13 MB. That's considerably farther than a
> > > > default buildworld has run, but less than observed when setting
> > > > vm.pageout_oom_seq=120 alone. Log files are at
> > > > http://www.zefox.net/~fbsd/rpi3/swaptests/r337226M/1gbsdflash_1gbusbflash/batchqueue/
> > > >
> > > > Both changes are now in place and -j4 buildworld has been restarted.
> > >
> > > Looking through the gstat output, I'm seeing some pretty abysmal average
> > > write latencies for da0, the flash drive.  I also realized that my
> > > reference to r329882 lowering the pagedaemon sleep period was wrong:
> > > things have been this way for much longer than that.  Moreover, as you
> > > pointed out, bumping oom_seq to a much larger value wasn't quite
> > > sufficient.
> > >
> > > I'm curious as to what the worst-case swap I/O latencies are in your
> > > test, since the average latencies reported in your logs are high enough
> > > to trigger OOM kills even with the increased oom_seq value.  When the
> > > current test finishes, could you try repeating it with this patch
> > > applied on top?
> > > https://people.freebsd.org/~markj/patches/slow_swap.diff
> > > That is, keep the non-default oom_seq setting and the modification to
> > > VM_BATCHQUEUE_SIZE, and apply this patch on top.  It'll cause the kernel
> > > to print messages to the console under certain conditions, so a log of
> > > console output will be interesting.
> >
> > The run finished with a panic; I've collected the logs and terminal
> > output at
> > http://www.zefox.net/~fbsd/rpi3/swaptests/r337226M/1gbsdflash_1gbusbflash/batchqueue/pageout120/slow_swap/
> >
> > There seems to be a considerable discrepancy between the wait times
> > reported by the patch and the wait times reported by gstat in the first
> > couple of occurrences. The fun begins at timestamp
> > Wed Aug  8 21:26:03 PDT 2018 in swapscript.log.
>
> The reports of "waited for swap buffer" are especially bad: during those
> periods, the laundry thread is blocked waiting for in-flight swap writes
> to finish before sending any more.  Because the system is generally
> quite starved for clean pages that it can reuse, it's relying on swap
> I/O to clean more.  If that fails, the system eventually has no choice
> but to start killing processes (where the time period corresponding to
> "eventually" is determined by vm.pageout_oom_seq).
>
> Based on these latencies, I think the system is behaving more or less as
> expected from the VM's perspective.  I do think the default oom_seq value
> is too low and will get that addressed in 12.0.


Yeah. I think we need to take a more active role in managing latencies on
some cards. Properly managed, they won't climb that high. Since there's no
tagged queueing to these devices, there's an I/O depth of one. The default
policy is to issue requests in order (since it's flash), which means that
processes that machine-gun down requests swamp everybody else and produce
back-to-back-to-back writes, which, at least for the few drives I have
looked at in detail, tends to induce pathological behavior.

Warner


