Date:      Tue, 23 Apr 2024 06:47:03 -0600
From:      Alan Somers <asomers@freebsd.org>
To:        Alexander Leidinger <Alexander@leidinger.net>
Cc:        Karl Denninger <karl@denninger.net>, freebsd-hackers@freebsd.org
Subject:   Re: Stressing malloc(9)
Message-ID:  <CAOtMX2jnNvnZBe7UdktQggsEQO%2BfovA-g_=-fZtbhH=caSLksQ@mail.gmail.com>
In-Reply-To: <2b72c4f749e93dfec08a164d5a664ee3@Leidinger.net>
References:  <CAOtMX2jeDHS15bGgzD89AOAd1SzS_=FikorkCdv9-eAxCZ2P5w@mail.gmail.com> <ZiPaFw0q17RGE7cS@nuc> <CAOtMX2jk6%2BSvqMP7Cbmdk0KQCFZ34yWuir7n_8ewZYJF2MwPSg@mail.gmail.com> <ZiU6IZ29syVsg61p@nuc> <CAOtMX2j=yaYeE%2B-fycg2mRRC_Jb9p74cn_dcenhH2xRRxz1shg@mail.gmail.com> <CAOtMX2hDfX-T90x9Fb2Wh%2BvgLvw9fUGmaDxh-FWaYwBTPwFY6Q@mail.gmail.com> <b1e56d20-dc98-4fff-adec-3f8cfae26c05@denninger.net> <CAOtMX2irXo_hvrhQhw0eLjCBiH7hZMTR9notBn9aDEMTynQiuQ@mail.gmail.com> <2b72c4f749e93dfec08a164d5a664ee3@Leidinger.net>

On Tue, Apr 23, 2024 at 2:37 AM Alexander Leidinger
<Alexander@leidinger.net> wrote:
>
> On 2024-04-23 00:05, Alan Somers wrote:
> > On Mon, Apr 22, 2024 at 2:07 PM Karl Denninger <karl@denninger.net>
> > wrote:
> >>
> >> On 4/22/2024 12:46, Alan Somers wrote:
> >>
> >> When I said "33kiB" I meant "33 pages", or 132 kB.  And the solution
> >> turns out to be very easy.  Since I'm using ZFS on top of geli, with
> >> the default recsize of 128kB, I'll just set
> >> vfs.zfs.vdev.aggregation_limit to 128 kB.  That way geli will never
> >> need to allocate more than 128kB contiguously.  ZFS doesn't even need
> >> those big allocations to be contiguous; it's just aggregating smaller
> >> operations to reduce disk IOPS.  But aggregating up to 1MB (the
> >> default) is overkill; any rotating HDD should easily be able to max
> >> out its sequential write bandwidth with a 128kB operation size.  I'll add a
> >> read-only sysctl for g_eli_alloc_sz too.  Thanks Mark.
> >>
> >> -Alan
> >>
> >> Setting this on one of my production machines that uses ZFS behind
> >> geli drops the load average quite materially with zero impact on
> >> throughput that I can see (thus far).  I will run this for a while, but
> >> it certainly doesn't appear to have any negatives associated with it
> >> and does appear to improve efficiency quite a bit.
> >
> > Great news!  Also, FTR I should add that this advice only applies to
> > people who use HDDs.  For SSDs, ZFS uses a different aggregation limit,
> > and the default value is already low enough.
>
> You're basically saying that such large allocations are not uncommon
> with the kernels we ship (even in releases).
> Wouldn't it make sense to optimize the kernel to handle larger UMA
> allocations?
>
> Or do you expect this to be specific to ZFS, in which case it may be
> saner to ask the OpenZFS developers to reduce this default setting?

Yes, both of those things are true.  It might make sense to reduce the
setting's default value.  OTOH, the current value is probably fine for
people who don't use geli (and possibly other transforms that require
allocating data).  And it would also be good to optimize the kernel to
perform these allocations more efficiently.  My best idea is to teach
g_eli_alloc_data how to allocate scatter/gather lists of 64k buffers
instead of contiguous memory.  The memory doesn't need to be
contiguous, after all.  But that's a bigger change, and I don't know
that I have the time for it right now.
-Alan



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAOtMX2jnNvnZBe7UdktQggsEQO%2BfovA-g_=-fZtbhH=caSLksQ>