Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 14 Nov 2020 11:48:34 -0700
From:      Scott Long <scottl@samsco.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Alexander Motin <mav@freebsd.org>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: MAXPHYS bump for FreeBSD 13
Message-ID:  <634E35E9-9E2B-40CE-9C70-BB130BD9D614@samsco.org>
In-Reply-To: <X7Aj0a6ZtIQfBscC@kib.kiev.ua>
References:  <aac44f21-3a6d-c697-bb52-1a669b11aa3b@FreeBSD.org> <X7Aj0a6ZtIQfBscC@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help


> On Nov 14, 2020, at 11:37 AM, Konstantin Belousov =
<kostikbel@gmail.com> wrote:
>=20
> On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:
>> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:
>>> To put the specific numbers, for struct buf it means increase by =
1792
>>> bytes. For bio it does not, because it does not embed vm_page_t[] =
into
>>> the structure.
>>>=20
>>> Worse, typical struct buf addend for excess vm_page pointers is =
going
>>> to be unused, because normal size of the UFS block is 32K.  It is
>>> going to be only used by clusters and physbufs.
>>>=20
>>> So I object against bumping this value without reworking buffers
>>> handling of b_pages[].  Most straightforward approach is stop using
>>> MAXPHYS to size this array, and use external array for clusters.
>>> Pbufs can embed large array.
>>=20
>> I am not very familiar with struct buf usage, so I'd appreciate some
>> help there.
>>=20
>> Quickly looking on pbuf, it seems trivial to allocate external =
b_pages
>> array of any size in pbuf_init, that should easily satisfy all of =
pbuf
>> descendants.  Cluster and vnode/swap pagers code are pbuf descendants
>> also.  Vnode pager I guess may only need replacement for
>> nitems(bp->b_pages) in few places.
> I planned to look at making MAXPHYS a tunable.
>=20
> You are right, we would need:
> 1. move b_pages to the end of struct buf and declaring it as flexible.
> This would make KBI worse because struct buf depends on some debugging
> options, and than b_pages offset depends on config.
>=20
> Another option could be to change b_pages to pointer, if we are fine =
with
> one more indirection.  But in my plan, real array is always allocated =
past
> struct buf, so flexible array is more correct even.
>=20

I like this, and I was in the middle of writing up an email that =
described it.
There could be multiple malloc types or UMA zones of different sizes,
depending on the intended i/o size, or just a runtime change to the size =
of
a single allocation size.

> 2. Preallocating both normal bufs and pbufs together with the arrays.
>=20
> 3. I considered adding B_SMALLPAGES flag to b_flags and use it to =
indicate
> that buffer has 'small' b_pages.  All buffers rotated through =
getnewbuf()/
> buf_alloc() should have it set.
>=20

This would work nicely with a variable sized allocator, yes.

> 4. There could be some places which either malloc() or allocate struct =
buf
> on stack (I tend to believe that I converted all later places to =
formed).
> They would need to get special handling.
>=20

I couldn=E2=80=99t find any places that allocated a buf on the stack or =
embedded it
into another structure.

> md(4) uses pbufs.
>=20
> 4. My larger concern is, in fact, cam and drivers.
>=20

Can you describe your concern?

>>=20
>> Could you or somebody help with vfs/ffs code, where I suppose the
>> smaller page lists are used?
> Do you plan to work on this ?  I can help, sure.
>=20
> Still, I wanted to make MAXPHYS (and 'small' MAXPHYS, this is not same =
as
> DFLPHYS), a tunable, in the scope of this work.

Sounds great, thank you for looking at it.

Scott




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?634E35E9-9E2B-40CE-9C70-BB130BD9D614>