Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 14 Nov 2020 21:43:00 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Scott Long <scottl@samsco.org>
Cc:        Alexander Motin <mav@freebsd.org>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: MAXPHYS bump for FreeBSD 13
Message-ID:  <X7AzRMZ5FSRpzXqo@kib.kiev.ua>
In-Reply-To: <634E35E9-9E2B-40CE-9C70-BB130BD9D614@samsco.org>
References:  <aac44f21-3a6d-c697-bb52-1a669b11aa3b@FreeBSD.org> <X7Aj0a6ZtIQfBscC@kib.kiev.ua> <634E35E9-9E2B-40CE-9C70-BB130BD9D614@samsco.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Nov 14, 2020 at 11:48:34AM -0700, Scott Long wrote:
> 
> 
> > On Nov 14, 2020, at 11:37 AM, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > 
> > On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:
> >> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:
> >>> To put the specific numbers, for struct buf it means increase by 1792
> >>> bytes. For bio it does not, because it does not embed vm_page_t[] into
> >>> the structure.
> >>> 
> >>> Worse, typical struct buf addend for excess vm_page pointers is going
> >>> to be unused, because normal size of the UFS block is 32K.  It is
> >>> going to be only used by clusters and physbufs.
> >>> 
> >>> So I object against bumping this value without reworking buffers
> >>> handling of b_pages[].  Most straightforward approach is stop using
> >>> MAXPHYS to size this array, and use external array for clusters.
> >>> Pbufs can embed large array.
> >> 
> >> I am not very familiar with struct buf usage, so I'd appreciate some
> >> help there.
> >> 
> >> Quickly looking on pbuf, it seems trivial to allocate external b_pages
> >> array of any size in pbuf_init, that should easily satisfy all of pbuf
> >> descendants.  Cluster and vnode/swap pagers code are pbuf descendants
> >> also.  Vnode pager I guess may only need replacement for
> >> nitems(bp->b_pages) in few places.
> > I planned to look at making MAXPHYS a tunable.
> > 
> > You are right, we would need:
> > 1. move b_pages to the end of struct buf and declaring it as flexible.
> > This would make KBI worse because struct buf depends on some debugging
> > options, and than b_pages offset depends on config.
> > 
> > Another option could be to change b_pages to pointer, if we are fine with
> > one more indirection.  But in my plan, real array is always allocated past
> > struct buf, so flexible array is more correct even.
> > 
> 
> I like this, and I was in the middle of writing up an email that described it.
> There could be multiple malloc types or UMA zones of different sizes,
> depending on the intended i/o size, or just a runtime change to the size of
> a single allocation size.
I do not think we need new/many zones.

Queued (getnewbuf()) bufs come from buf_zone, and pbufs are allocated
from pbuf_zone. That should be fixed alloc size, with small b_pages[]
for buf_zone, and large (MAXPHYS) for pbuf.

Everything else, if any, would need to pre-calculate malloc size.

> 
> > 2. Preallocating both normal bufs and pbufs together with the arrays.
> > 
> > 3. I considered adding B_SMALLPAGES flag to b_flags and use it to indicate
> > that buffer has 'small' b_pages.  All buffers rotated through getnewbuf()/
> > buf_alloc() should have it set.
> > 
> 
> This would work nicely with a variable sized allocator, yes.
> 
> > 4. There could be some places which either malloc() or allocate struct buf
> > on stack (I tend to believe that I converted all later places to formed).
> > They would need to get special handling.
> > 
> 
> I couldn’t find any places that allocated a buf on the stack or embedded it
> into another structure.
As I said, I did a pass to eliminate stack allocations for bufs.
As result, for instance flushbufqueues() mallocs struct buf, but it does
not use b_pages[] of the allocated sentinel.

> 
> > md(4) uses pbufs.
> > 
> > 4. My larger concern is, in fact, cam and drivers.
> > 
> 
> Can you describe your concern?
My opinion is that during this work all uses of MAXPHYS should be reviewed,
and there are a lot of drivers that reference the constant.  From the past
experience, I expect some evil ingenious (ab)use.

Same for bufs, but apart from application of pbufs in cam_periph.c, I do not
think drivers have much use of it.

> 
> >> 
> >> Could you or somebody help with vfs/ffs code, where I suppose the
> >> smaller page lists are used?
> > Do you plan to work on this ?  I can help, sure.
> > 
> > Still, I wanted to make MAXPHYS (and 'small' MAXPHYS, this is not same as
> > DFLPHYS), a tunable, in the scope of this work.
> 
> Sounds great, thank you for looking at it.
> 
> Scott
> 



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?X7AzRMZ5FSRpzXqo>