Date:      Wed, 5 Sep 2018 22:15:20 -0700
From:      bob prohaska <fbsd@www.zefox.net>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        freebsd-arm@freebsd.org, bob prohaska <fbsd@www.zefox.net>
Subject:   Re: RPI3 swap experiments (r338342 with vm.pageout_oom_seq="1024" and 6 GB swap)
Message-ID:  <20180906051520.GB3482@www.zefox.net>
In-Reply-To: <FB333A71-47D8-4038-9983-116DA80FC952@yahoo.com>
References:  <20180813185350.GA47132@www.zefox.net> <FA3B8541-73E0-4796-B2AB-D55CE40B9654@yahoo.com> <20180814014226.GA50013@www.zefox.net> <CANCZdfqFKY3Woa%2B9pVS5hika_JUAUCxAvLznSS4gaLq2kKoWtQ@mail.gmail.com> <20180815013612.GB51051@www.zefox.net> <CANCZdfoB_AcidFpKT_ZmZWUFnmC4Bw55krK%2BMqEmmj=f9KMQ2Q@mail.gmail.com> <20180815225504.GB59074@www.zefox.net> <20180901230233.GA42895@www.zefox.net> <20180906003829.GC818@www.zefox.net> <FB333A71-47D8-4038-9983-116DA80FC952@yahoo.com>

On Wed, Sep 05, 2018 at 07:05:11PM -0700, Mark Millard wrote:
> [I've omitted Kirk McKusick as my notes are largely off subject for
> what he asked about for testing specific to his changes.]
> 
> On 2018-Sep-5, at 5:38 PM, bob prohaska <fbsd at www.zefox.net> wrote:
> 
> > On Sat, Sep 01, 2018 at 04:02:33PM -0700, bob prohaska wrote:
> > 
> > It looks as if using all six GB of swap doesn't cause any immediate problem,
> > at least so long as swap usage stays relatively low, say 1.5 GB. In a final
> > test, TRIM was turned on without catastrophe, though it had little to do
> > given that all the busy filesystems were on USB. The penalty was about one
> > hour extra (25 vs 24 hours) to run -j4 buildworld from a clean start.
> 
> What UFS file systems with TRIM enabled were on some /dev/mmcsd0* ?
Everything _except_ /var, /tmp and /usr. Effectively, not much.

> Did you 1st use "fsck_ffs -E" on any of the file systems where
> trim would work?
No, I did not.

> 
> If I gather right, the "clean start" was on USB where TRIM during the
> clean would not be available.
> 
By "clean start" I meant running make cleandir twice and removing /usr/obj/usr/src.
That was done to make all of the -j4 buildworld tests consistent.

> The extra swap space may have contributed to the extra time? Having
> more swap uses more kernel memory for keeping track of the swap
> if I understand right. That leaves less for other things. That could
> have consequences other than outright failure.
>
There were two buildworld tests run with 6 GB of swap, the first without
TRIM being turned on and the second with TRIM turned on. The second run
took an hour longer, with TRIM being on as the only difference.

> Quoting "man 8 loader" related to kern.maxswzone :
> 
>                   Note that swap metadata can be fragmented, which means that
>                   the system can run out of space before it reaches the
>                   theoretical limit.  Therefore, care should be taken to not
>                   configure more swap than approximately half of the
>                   theoretical maximum.
> 
>                   Running out of space for swap metadata can leave the system
>                   in an unrecoverable state.
> 
> This wording suggests not allocating 6 GiBytes of swap when 3.5 GiBytes
> is approximately half the theoretical maximum --even if the system does
> still operate with 6 GiBytes.
> 
It's understood that 6 GB of swap on a Pi3 isn't a good idea. It was tried to
see if something useful might be revealed. 


> (Note: The man page's reference to "eight times the amount of physical memory"
> and such does not seem to apply to all platforms. An rpi2 V1.1 and an rpi3
> with the same amount of RAM get rather different recommended figures
> according to the messages generated.)
> 
> > One chance observation caught my attention, however. I'd always thought
> > the VM system would favor fast swap devices over slow, but the gstat log
> > recorded this, visible at
> > http://www.zefox.net/~fbsd/rpi3/swaptests/r338342/3gbsd_3gbusb/trim_on/swapscript.log
> > 
> > 
> > 
> > dT: 10.004s  w: 10.000s
> > L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
> >    3    175     91    673    4.0     84    701    4.0      0      0    0.0   24.4  mmcsd0
> >    4    173     88    693  106.6     86    723  176.5      0      0    0.0  103.4  da0
> >    1     58     30    224    4.5     28    220    4.1      0      0    0.0   14.5  mmcsd0s2b
> >    3    175     91    673    4.0     84    701    4.0      0      0    0.0   24.7  mmcsd0s2
> >    1     58     30    223    4.0     28    244    3.8      0      0    0.0   14.0  mmcsd0s2d
> >    1     59     31    227    3.7     28    237    4.3      0      0    0.0   14.9  mmcsd0s2e
> >    2     57     28    235  140.2     28    236  103.8      0      0    0.0  186.1  da0a
> >    0     56     28    224  178.4     28    222   35.9      0      0    0.0  131.5  da0b
> >    2     59     31    234    9.4     28    240   59.1      0      0    0.0   99.5  da0d
> >    0      0      0      0    0.0      0      3  15011      0      0    0.0  150.1  da0e
> >    0      1      0      0    0.0      1     22  13376      0      0    0.0  147.8  da0g
> 
> Are there any examples of "d/s kBps ms/d" being non-zero? If they are
> always zero then no TRIMing likely happened. That in turn would make
> TRIM an unlikely use of an extra hour.
> 
As near as I can tell there are no non-zero values for d/s. If d/s is tied to
TRIM, that is expected for every device except the microSD, which did have TRIM
enabled. Since the microSD wasn't particularly busy, apart from swap, that too
is unsurprising.
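
For anyone who wants to check the log rather than eyeball it, here is a minimal
sketch in Python that scans gstat output for rows with non-zero delete activity.
The column order is taken from the gstat header quoted above. The function names
and the renamed kBps columns (r_kBps, w_kBps, d_kBps) are my own labels, not
anything gstat provides.

```python
# Column order as it appears in the gstat header quoted above.
# gstat prints "kBps" three times; the r_/w_/d_ prefixes are added here
# only to tell the three apart.
COLUMNS = ["L(q)", "ops/s", "r/s", "r_kBps", "ms/r", "w/s", "w_kBps",
           "ms/w", "d/s", "d_kBps", "ms/d", "%busy", "Name"]


def parse_gstat_row(line):
    """Return a dict for one gstat data row, or None for any other line."""
    # Drop mail quoting ("> > ") and leading spaces before splitting.
    fields = line.lstrip("> ").split()
    if len(fields) != len(COLUMNS):
        return None
    try:
        values = [float(f) for f in fields[:-1]]
    except ValueError:
        return None  # header line or unrelated text
    row = dict(zip(COLUMNS[:-1], values))
    row["Name"] = fields[-1]
    return row


def rows_with_deletes(lines):
    """Return the parsed rows whose d/s or delete kBps is non-zero."""
    parsed = (parse_gstat_row(line) for line in lines)
    return [r for r in parsed if r and (r["d/s"] > 0 or r["d_kBps"] > 0)]


if __name__ == "__main__":
    sample = [
        "> > L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name",
        "> >    3    175     91    673    4.0     84    701    4.0      0      0    0.0   24.4  mmcsd0",
        "> >    4    173     88    693  106.6     86    723  176.5      0      0    0.0  103.4  da0",
    ]
    print(rows_with_deletes(sample))  # prints [] for the two rows quoted above
```

Passing an open log file instead of the sample list should work the same way,
since the function only iterates over lines.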


> > Tue Sep  4 15:07:39 PDT 2018
> > Device          1K-blocks     Used    Avail Capacity
> > /dev/da0b         1048576   236872   811704    23%
> > /dev/mmcsd0s2b    1048576   221568   827008    21%
> > /dev/da0d         1048576   218636   829940    21%
> > /dev/da0a         1048576   222028   826548    21%
> > /dev/mmcsd0s2d    1048576   221660   826916    21%
> > /dev/mmcsd0s2e    1048576   221392   827184    21%
> > Total             6291456  1342156  4949300    21%
> 
> As I understand the normal use of multiple swap partitions
> is to split the load across channels that can operate
> independently in parallel. Having 3 such partitions on
> the same channel/device may only add overhead vs. one
> full-size partition per channel/device.
> 
The multiple partitions on one device were a simple way to
vary swap amounts. I did expect that having swap on both
microSD and USB would lead to some performance gain, but
it seems not so.
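
As a cross-check on how the swap was actually spread, the swapinfo figures
quoted above can be summed per physical device. This is only a sketch: the
regular expression that maps a partition name to its device (da0b to da0,
mmcsd0s2b to mmcsd0) is my assumption about the naming, and the function name
is made up for illustration.

```python
import re
from collections import defaultdict

# swapinfo output copied from the log excerpt quoted above.
SWAPINFO = """\
Device          1K-blocks     Used    Avail Capacity
/dev/da0b         1048576   236872   811704    23%
/dev/mmcsd0s2b    1048576   221568   827008    21%
/dev/da0d         1048576   218636   829940    21%
/dev/da0a         1048576   222028   826548    21%
/dev/mmcsd0s2d    1048576   221660   826916    21%
/dev/mmcsd0s2e    1048576   221392   827184    21%
Total             6291456  1342156  4949300    21%
"""


def used_by_device(text):
    """Sum the Used column (1K-blocks) per underlying device."""
    totals = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and the Total line; keep only /dev/ entries.
        if len(fields) != 5 or not fields[0].startswith("/dev/"):
            continue
        # Assumed naming: letters plus unit number identify the device.
        match = re.match(r"/dev/([a-z]+\d+)", fields[0])
        if match is None:
            continue
        totals[match.group(1)] += int(fields[2])
    return dict(totals)


if __name__ == "__main__":
    totals = used_by_device(SWAPINFO)
    grand_total = sum(totals.values())
    for name, used in sorted(totals.items()):
        print(f"{name}: {used} KiB ({100 * used / grand_total:.1f}%)")
    # da0: 677536 KiB (50.5%)
    # mmcsd0: 664620 KiB (49.5%)
```

So the swap in use is split almost evenly between the two devices, even though
gstat shows da0 far busier than mmcsd0.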

> I also do not know if mmcsd0 and da0 can have independent,
> parallel I/O activity in the rpi3 context.
>
That is a key point; I took it for granted that they _can_
have independent, parallel I/O activity. If not, seemingly
it makes better sense, both for performance and cost, to
use a single large microSD card and skip USB devices entirely.
That seems to be where the evidence is leading me.

  
> > Sep  4 14:57:52 www sshd[41673]: error: Received disconnect from 103.207.39.197 port 64499:3: com.jcraft.jsch.JSchException: Auth cancel [preauth]
> > Sep  4 15:04:19 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2217840, size: 12288
> 
> Note: my context is very different from yours and I get no console
> messages about I/O or waits during buildworld buildkernel or other
> such build/install tests.
>
That's likely the benefit of having 2 GB of RAM, I would think.

 
> > The system has lots of fast swap available on microSD, but is seemingly choking 
> > trying to use the slow swap on da0 _and_ run traffic to /usr and /var. Buildworld
> > doesn't run any faster with less swap, so I don't think the oversupply is the problem.
> 
> If I understand right, your only 6 GiByte swap experiment was slower
> but you attributed all time variations to an (inactive? ever used?)
> TRIM enabled status. You might want to manipulate the two
> separately. For all I know something else may also have contributed.
>

The tests were 6 GB swap, TRIM off vs TRIM on. TRIM on took an extra hour,
everything else kept the same to the best of my ability. I did the TRIM
test sort of on a whim, just to see if things would go spectacularly wrong.
That they didn't is encouraging. It certainly isn't decisive.
 
> I've no clue if having so many swap partitions on the same channel/device
> has consequences that having only one per channel/device would avoid.
> 
> > Is this expected behavior?  
> 
> As I understand the approximately even split across the in-use swap
> partitions is the normal way things are split. It is the placement
> of the partitions themselves that contributes to how effective that
> split is at improving the swap/paging I/O if I understand right.
> 
The great difference in activity between da0 and mmcsd0 suggests they
do have a degree of independence. Whether that independence can be
exploited to improve swap throughput is the point I wanted to explore.

Thanks for reading!

bob prohaska



