Date:      Mon, 20 Mar 2000 10:46:15 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc:        current@FreeBSD.ORG
Subject:   Re: patches for test / review 
Message-ID:  <200003201846.KAA70820@apollo.backplane.com>
References:   <19790.953575942@critter.freebsd.dk>

:Thanks for the sketch.  It sounds really good.
:
:Is it your intention that drivers which cannot work from the b_pages[]
:array will call to map them into VM, or will a flag on the driver/dev_t/
:whatever tell the generic code that it should be mapped before calling
:the driver ?
:
:What about unaligned raw transfers, say a raw CD read of 2352 bytes
:from userland ?  I presume we will need an offset into the first 
:page for that ?

    Well, let me tell you what the fuzzy goal is first and then maybe we
    can work backwards.

    Eventually all physical I/O needs a physical address.  The quickest
    way to get to a physical address is to be given an array of vm_page_t's
    (which can be trivially translated to physical addresses).

    The buffer cache already has such an array, called b_pages[].

    Any I/O that runs through b_data or runs through a uio must eventually
    be cut up into blocks of contiguous physical addresses.
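
    As a rough illustration of that cutting-up step, here is a minimal
    userland sketch that merges physically adjacent pages from a page array
    into contiguous segments.  Every name in it (sketch_page, sketch_segment,
    sketch_pages_to_segments, SKETCH_PAGE_SIZE) is hypothetical and stands
    in for the real vm_page_t machinery; it is not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4K pages, as on IA32 */

/* Hypothetical stand-in for a vm_page_t: only the physical address. */
struct sketch_page {
	uint64_t phys_addr;     /* page-aligned physical address */
};

/* One run of physically contiguous memory. */
struct sketch_segment {
	uint64_t phys_addr;     /* start of the run */
	size_t   len;           /* length of the run in bytes */
};

/*
 * Walk a page array in order and merge physically adjacent pages into
 * segments.  Returns the number of segments written, or -1 if 'segs'
 * has room for fewer than the number of runs found.
 */
int
sketch_pages_to_segments(const struct sketch_page *pages, int npages,
    struct sketch_segment *segs, int maxsegs)
{
	int nsegs = 0;

	assert(npages >= 0 && maxsegs >= 0);
	for (int i = 0; i < npages; i++) {
		uint64_t pa = pages[i].phys_addr;

		assert(pa % SKETCH_PAGE_SIZE == 0);
		/* Extend the current run if this page follows it directly. */
		if (nsegs > 0 &&
		    segs[nsegs - 1].phys_addr + segs[nsegs - 1].len == pa) {
			segs[nsegs - 1].len += SKETCH_PAGE_SIZE;
			continue;
		}
		/* Otherwise start a new run, if there is room for one. */
		if (nsegs == maxsegs)
			return (-1);
		segs[nsegs].phys_addr = pa;
		segs[nsegs].len = SKETCH_PAGE_SIZE;
		nsegs++;
	}
	return (nsegs);
}
```

    With a page array in hand this is a single linear pass.  Starting from
    b_data or a uio, each virtual address would first have to be translated
    back to a page, which is the back-and-forth work under discussion.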

    What we want to do is to try to extend VMIO (aka the vm_page_t) all
    the way through the I/O system - both VFS and DEV I/O, in order to 
    remove all the nasty back and forth translations.

    In regards to raw devices I originally envisioned having two BUF_*()
    strategy calls - one that uses a page array, and one that uses b_data.
    But your idea below - using bio_ops[], is much better.

    In regards to odd block sizes and offsets the real question is whether
    an attempt should be made to translate UIO ops into buffer cache b_pages[]
    ops directly, maintaining offsets and odd sizes, or whether we should
    back off to a copy scheme where we allocate b_pages[] for oddly sized
    uio's and then copy the data to the uio buffer.

    My personal preference is to not pollute the VMIO page-passing mechanism
    with all sorts of fields to handle weird offsets and sizes.  Instead we
    ought to take the copy hit for the non-optimal cases, and simply fix all
    the programs doing the accesses to pass optimally aligned buffers.  For
    example, for a raw-I/O on an audio CD track you would pass a page-aligned
    buffer with a request size of at least a page (e.g. 4K on IA32) in your
    read(), and the raw device would return '2352' as the result and the
    returned data would be page-aligned.

    This would allow the system call to use the b_pages[] strategy entry
    point even for devices with odd sizes and still get optimal (zero-copy)
    operation.  If the user passes a buffer that is not page-aligned (or
    whose size is not a multiple of the page size), the system takes the
    copy hit in order to keep the lower level I/O interface clean.
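
    The alignment test itself could look something like the sketch below.
    It is only an illustration of the policy described above: the names
    (sketch_io_path, sketch_choose_io_path) are made up, and a real
    implementation would sit in the raw-device read/write path and work on
    a uio rather than a bare address and length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the two paths discussed above. */
enum sketch_io_path {
	SKETCH_IO_ZERO_COPY,    /* hand the user pages straight to b_pages[] */
	SKETCH_IO_BOUNCE_COPY   /* allocate b_pages[] and copy to/from the uio */
};

/*
 * Decide which path a user buffer qualifies for.  The buffer gets the
 * zero-copy path only if it starts on a page boundary and its size is
 * a nonzero multiple of the page size.
 */
enum sketch_io_path
sketch_choose_io_path(uintptr_t user_addr, size_t len, size_t page_size)
{
	/* page_size must be a nonzero power of two for the masks below. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	if (len == 0)
		return (SKETCH_IO_BOUNCE_COPY);
	if ((user_addr & (page_size - 1)) != 0)
		return (SKETCH_IO_BOUNCE_COPY);   /* odd starting offset */
	if ((len & (page_size - 1)) != 0)
		return (SKETCH_IO_BOUNCE_COPY);   /* odd size, e.g. 2352 */
	return (SKETCH_IO_ZERO_COPY);
}
```

    In the audio CD case, a page-aligned 4096-byte request would take the
    zero-copy path and read() would still return 2352, while a request for
    exactly 2352 bytes would take the copy path.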

:One thing I would like to see is for the buffers to know how to
:write themselves.  There is nothing which mandates that a buffer
:be backed by a disk-like device, and there are uses for buffers
:which aren't.
:
:Being able to say bp->bop_write(bp) rather than bwrite(bp) would
:allow that flexibility.  Kirk already introduced a bio_ops[] but
:made it global for now, that should be per buffer and have all the
:bufferops in it, (except for the ones which instantiate the buffer).
:
:If we had this, pseudo filesystems like DEVFS could use UFS for
:much of their naming management.  This is currently impossible.
:
:--
:Poul-Henning Kamp             FreeBSD coreteam member
:phk@FreeBSD.ORG               "Real hackers run -current on their laptop."
:FreeBSD -- It will take a long time before progress goes too far!

    I like the idea of dynamicizing bio_ops[] and using that to issue 
    struct buf based I/O.  It fits very nicely into the general idea of
    separating the VFS and DEV I/O interfaces (they are currently hopelessly
    intertwined).
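
    To make the per-buffer idea concrete, here is a small userland sketch
    of a buffer that carries its own ops vector, so that a write dispatches
    through bp->b_ops->bop_write(bp).  Only the bop_write name comes from
    this thread; the struct layouts, the two toy backends and
    sketch_bwrite() are assumptions for illustration and do not reflect
    the actual struct buf or bio_ops[] definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_buf;

/* Hypothetical per-buffer operations vector. */
struct sketch_buf_ops {
	int (*bop_write)(struct sketch_buf *bp);
};

/* Hypothetical, heavily simplified buffer. */
struct sketch_buf {
	const struct sketch_buf_ops *b_ops;  /* how this buffer writes itself */
	const char *b_data;                  /* data to be written */
	size_t      b_bcount;                /* number of valid bytes */
	void       *b_backing;               /* backend-private state */
};

/* Toy backend 1: a small in-memory store, no disk-like device involved. */
struct sketch_mem_store {
	char   bytes[64];
	size_t used;
};

int
sketch_mem_write(struct sketch_buf *bp)
{
	struct sketch_mem_store *st = bp->b_backing;

	if (bp->b_bcount > sizeof(st->bytes))
		return (-1);
	memcpy(st->bytes, bp->b_data, bp->b_bcount);
	st->used = bp->b_bcount;
	return (0);
}

/* Toy backend 2: discards the data and only counts the writes. */
struct sketch_null_store {
	unsigned writes;
};

int
sketch_null_write(struct sketch_buf *bp)
{
	struct sketch_null_store *st = bp->b_backing;

	st->writes++;
	return (0);
}

const struct sketch_buf_ops sketch_mem_ops = { .bop_write = sketch_mem_write };
const struct sketch_buf_ops sketch_null_ops = { .bop_write = sketch_null_write };

/* Generic entry point: the caller never needs to know the backend. */
int
sketch_bwrite(struct sketch_buf *bp)
{
	assert(bp != NULL && bp->b_ops != NULL && bp->b_ops->bop_write != NULL);
	return (bp->b_ops->bop_write(bp));
}
```

    The caller only ever sees sketch_bwrite(bp); which backend runs is
    decided when the buffer is instantiated and b_ops is filled in.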

    Actually, the more I think about it the more I'm willing to just say
    to hell with it and start doing all the changes all at once, in parallel,
    including the two patches you wanted reviewed earlier (though I would
    request that you not combine disparate patch functionality into a 
    single patch set).  I agree with Julian on the point about IPSEC.

    Dynamicizing bio_ops[] ought to be trivial.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

