Date:      Tue, 14 Aug 2018 17:50:11 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        bob prohaska <fbsd@www.zefox.net>
Cc:        Mark Millard <marklmi@yahoo.com>, freebsd-arm <freebsd-arm@freebsd.org>,  Mark Johnston <markj@freebsd.org>
Subject:   Re: RPI3 swap experiments (grace under pressure)
Message-ID:  <CANCZdfqFKY3Woa%2B9pVS5hika_JUAUCxAvLznSS4gaLq2kKoWtQ@mail.gmail.com>
In-Reply-To: <20180814014226.GA50013@www.zefox.net>
References:  <20180809033735.GJ30738@phouka1.phouka.net> <20180809175802.GA32974@www.zefox.net> <20180812173248.GA81324@phouka1.phouka.net> <20180812224021.GA46372@www.zefox.net> <B81E53A9-459E-4489-883B-24175B87D049@yahoo.com> <20180813021226.GA46750@www.zefox.net> <0D8B9A29-DD95-4FA3-8F7D-4B85A3BB54D7@yahoo.com> <FC0798A1-C805-4096-9EB1-15E3F854F729@yahoo.com> <20180813185350.GA47132@www.zefox.net> <FA3B8541-73E0-4796-B2AB-D55CE40B9654@yahoo.com> <20180814014226.GA50013@www.zefox.net>

On Mon, Aug 13, 2018 at 7:42 PM, bob prohaska <fbsd@www.zefox.net> wrote:

> [Altered subject, philosophical question]
> On Mon, Aug 13, 2018 at 01:05:38PM -0700, Mark Millard wrote:
> >
> > Here there is architecture choice and goals/primary
> > contexts. FreeBSD is never likely to primarily target
> > anything with a workload like buildworld buildkernel
> > on hardware like rpi3's and rpi2 V1.1's and
> > Pine64+ 2GB's and so on.
> >
>
> I understand that the RPi isn't a primary platform for FreeBSD.
> But, decent performance under overload seems like a universal
> problem that's always worth solving, whether for a computer or
> an office. The exact goals might vary, but coping with too much
> to do and not enough to do it with is humanity's oldest puzzle.
>
> Maybe I should ask what the goals of the OOMA process serve.
> I always thought an OS's goals were along the lines of:
> 1. maintain control
> 2. get the work done
> 3. remain responsive
>

Simplistically, one can view the VM system as a producer of dirty pages,
and a cleaner of dirty pages. These happen at different rates, but usually
are closely matched. We're normally able to launder enough pages to satisfy
the need for new pages from the VM system (since clean pages can just be
thrown away w/o any loss of data). The problem happens when we put a large
load onto the creation side with a build. This generates a lot of dirty
pages, and we have to flush the writes of the dirty pages quickly to keep
up. When the backing store's write rate varies substantially over time,
we run into problems. We're not able to clean enough pages
to keep up with demand. The system does what it can to slow down demand,
but at some point it just can't keep up and we trigger OOM. I'm still
firmly convinced that a combination of bugs is making the storage
system less robust.
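
To make the producer/cleaner picture concrete, here is a toy model of
it. It is only a sketch under made-up assumptions: the function name,
the tick-based timing, the page counts and the "throttle to a quarter
of demand" rule are all invented for illustration and are not how the
FreeBSD VM system is implemented.

```python
def simulate(total_pages, dirty_rate, clean_rates, throttle_below=0.1):
    """Step a toy dirty-page producer/cleaner model one tick at a time.

    total_pages    -- pages of RAM in the toy machine (assumed)
    dirty_rate     -- pages the workload wants to dirty per tick
    clean_rates    -- pages the backing store can launder on each tick
    throttle_below -- free fraction under which demand is slowed
    Returns (tick, "out of memory") for the first tick where demand
    could not be met, or (len(clean_rates), "survived").
    """
    free = total_pages
    dirty = 0
    for tick, clean_rate in enumerate(clean_rates):
        # Launder first: a cleaned page can be reused as a free page.
        cleaned = min(dirty, clean_rate)
        dirty -= cleaned
        free += cleaned

        # Hypothetical throttle: slow the producer when memory is tight.
        demand = dirty_rate
        if free < throttle_below * total_pages:
            demand = max(1, dirty_rate // 4)

        # Nothing left to hand out, even after throttling.
        if demand > free:
            return tick, "out of memory"

        free -= demand
        dirty += demand
    return len(clean_rates), "survived"


if __name__ == "__main__":
    steady = [10] * 30                          # device keeps up
    stalled = [10] * 5 + [0] * 20 + [10] * 5    # device stalls for 20 ticks
    print(simulate(100, 10, steady))
    print(simulate(100, 10, stalled))
```

With matched rates the model runs indefinitely. When the write rate
drops to zero for long enough, the throttle only delays the point where
free pages run out, which is the failure described above.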

The solution? Fix those bugs. Once you do that, however, you are still
stuck with the fact that crappy hardware is crappy. Swapping to the
ultra-low-end is still going to suck. USB sticks and SD cards are generally
geared to long stretches of sequential writes and random reads, since they
are expected to go into cameras or be used as sneakernet.

We might be able to avoid overloading the device so much via tweaks to
the swap-out code (to reduce its rate more quickly when the GC on the card
goes wonky). That might also allow for some way to write bigger,
contiguous blocks when swapping out (which would help avoid the
read-modify-write behavior on 'small' writes that grinds performance of
some USB/SD flash devices into the ground). That would help this workload
(and likely others). This is tricky because you'd want to do that as part
of a single write, which has some tricky implications for the VM system.
These can be dealt with, of course. And the code to page it out will need a
scatter/gather list so the DMA works right, so we have to be careful not to
exceed those limits. There's some clustering in the page-out code, but the
swapper looks like it could use some work... I've not studied it closely
enough to start work, though. At Netflix we've seen some workloads that
suggest some improvements there would be helpful for us, but I don't know
if that's the same problem or a different, related one.
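
As a rough illustration of why bigger contiguous writes help, here is
a sketch that groups swap slots into runs and counts how many erase
blocks the resulting writes touch. Everything in it is assumed for the
example: the function names, the max_segments cap standing in for a
scatter/gather limit, the erase block size, and the idea that the card
does one read-modify-write per erase block per write. It is not the
swapper's actual clustering code.

```python
def cluster_pages(page_indices, max_segments):
    """Group swap-slot indices into runs of consecutive slots.

    Each run could be issued as one contiguous write. max_segments is a
    hypothetical cap on pages per write, standing in for the
    scatter/gather limit of the DMA engine.
    Returns a list of (first_slot, page_count) tuples.
    """
    runs = []
    for idx in sorted(set(page_indices)):
        last = runs[-1] if runs else None
        if last and idx == last[-1] + 1 and len(last) < max_segments:
            last.append(idx)      # extend the current contiguous run
        else:
            runs.append([idx])    # gap or limit reached: start a new run
    return [(run[0], len(run)) for run in runs]


def erase_blocks_touched(writes, erase_block_pages):
    """Sum, over all writes, the number of erase blocks each one spans.

    Under the assumption of one read-modify-write cycle per erase block
    per write, this approximates how much work the card has to do.
    """
    total = 0
    for start, count in writes:
        first_block = start // erase_block_pages
        last_block = (start + count - 1) // erase_block_pages
        total += last_block - first_block + 1
    return total


if __name__ == "__main__":
    slots = [0, 1, 2, 3, 8, 9, 10, 11, 12, 20]
    one_by_one = [(s, 1) for s in slots]
    clustered = cluster_pages(slots, max_segments=4)
    print(clustered)
    print(erase_blocks_touched(one_by_one, erase_block_pages=4))
    print(erase_blocks_touched(clustered, erase_block_pages=4))
```

The cap on run length is what keeps a single write within the
scatter/gather limit; the cost is that a long contiguous stretch is
split into several writes rather than one.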

So, philosophically, I agree that the system shouldn't suck. Making it
robust against suckage for extreme events that don't match the historic
usage of BSD, though, is going to take some work.

Warner
