FreeBSD Mail Archives

Date:      Mon, 4 Jan 1999 20:45:23 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Terry Lambert <tlambert@primenet.com>
Cc:        dg@root.com, hackers@FreeBSD.ORG
Subject:   Re: vfs_bio / struct buf
Message-ID:  <199901050445.UAA91309@apollo.backplane.com>
References:   <199901050428.VAA06606@usr05.primenet.com>

:> :so couldn't really be replaced by something that has 512 byte granularity
:> :without losing some performance. Granted, applications that show this
:> :behavior are probably broken, but that's another issue.
:> 
:>     Ah.  Hmmm.  I see the problem... the buf's need some sort of native block
:>     size and NFS doesn't really have a native block size.
:
:Not to contradict David, but I was under the impression that
:the reason for this code was not necessarily the read-before-write
:avoidance on small, unaligned regions, but was actually for the
:avoidance on aligned block sized or multiple of block size regions
:being written.  The theory being that if you wrote a fragment of a
:NFS buffer size and did this several times that you could just write
:it and not read at all.  Mostly for database stuff, if I recall
:correctly.
:
:There's actually a byte field that's unused as far as I can tell to
:allow page granularity down to PAGE_SIZE/8 to be bitmapped for
:validity within a given page, for similar reasons.

    It isn't unused!  The valid and dirty bits are definitely used (and
    have a DEV_BSIZE granularity).  For lots of things.  For example, the 
    MSDOS filesystem.  The problem is that that is the best granularity that 
    a vm_page_t can have.  
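
    As a rough illustration of that granularity limit, here is a user-space
    sketch (not the kernel's code) of a per-page bitmap with one bit per
    DEV_BSIZE chunk.  The 4096-byte page size, the macro names and both
    helper functions are assumptions made up for this example.  It shows
    that a write of a few bytes touches a chunk without fully covering it,
    so the bitmap alone cannot record exactly which bytes are valid.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for illustration: a 4K page and 512-byte chunks,
 * which gives 8 bits per page, i.e. one byte of bitmap. */
#define PG_SIZE    4096u
#define CHUNK_SIZE 512u

/* Hypothetical helper: one bit for every chunk that the byte range
 * [off, off + len) touches, even partially. */
static uint8_t chunks_touched(unsigned off, unsigned len)
{
    assert(off + len <= PG_SIZE);
    if (len == 0)
        return 0;
    unsigned first = off / CHUNK_SIZE;
    unsigned last = (off + len - 1) / CHUNK_SIZE;
    return (uint8_t)(((1u << (last - first + 1)) - 1) << first);
}

/* Hypothetical helper: one bit for every chunk that the byte range
 * [off, off + len) covers completely. */
static uint8_t chunks_covered(unsigned off, unsigned len)
{
    assert(off + len <= PG_SIZE);
    unsigned first = (off + CHUNK_SIZE - 1) / CHUNK_SIZE; /* round up   */
    unsigned end = (off + len) / CHUNK_SIZE;              /* round down */
    if (end <= first)
        return 0;
    return (uint8_t)(((1u << (end - first)) - 1) << first);
}
```

    For a 10-byte write at offset 100, chunks_touched() reports chunk 0 but
    chunks_covered() reports nothing: marking the chunk would claim 512
    bytes when only 10 were written.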

    The validoff/validend/dirtyoff/dirtyend stuff was thrown into the bp
    because DEV_BSIZE'd granularity isn't good enough for NFS when you might
    be reading or writing just a few bytes.  Read-before-write isn't the
    real problem, though the optimization certainly fixes that.  The real
    problem is when you have multiple machines doing an lseek()/write() on
    the same file.  The write() granularity must be correct or the machines
    will screw each other up even though they aren't writing to the same
    byte ranges (but are writing to the same block).
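
    The multi-writer problem can be sketched in user space as follows.
    This is a simplified model, not the NFS client code: struct cbuf, the
    8K block size and every function name are hypothetical, with the
    dirtyoff/dirtyend fields only loosely modeled on the ones named above.
    Two clients cache the same block, each writes a different byte range,
    and then both flush.

```c
#include <assert.h>
#include <string.h>

#define BLKSIZE 8192   /* assumed block size for the example */

/* Hypothetical client-side cached block with one dirty byte range. */
struct cbuf {
    char data[BLKSIZE];
    int  dirtyoff;     /* dirtyoff == dirtyend means clean */
    int  dirtyend;
};

/* Load the block from the "server" and mark it clean. */
static void cbuf_fill(struct cbuf *bp, const char *server)
{
    memcpy(bp->data, server, BLKSIZE);
    bp->dirtyoff = bp->dirtyend = 0;
}

/* Write into the cached copy and grow the dirty range to include it. */
static void cbuf_write(struct cbuf *bp, int off, const char *src, int len)
{
    memcpy(bp->data + off, src, (size_t)len);
    if (bp->dirtyoff == bp->dirtyend) {
        bp->dirtyoff = off;
        bp->dirtyend = off + len;
    } else {
        if (off < bp->dirtyoff)
            bp->dirtyoff = off;
        if (off + len > bp->dirtyend)
            bp->dirtyend = off + len;
    }
}

/* Block-granularity write-back: sends stale bytes along with new ones. */
static void flush_block(const struct cbuf *bp, char *server)
{
    memcpy(server, bp->data, BLKSIZE);
}

/* Byte-range write-back: sends only what this client actually wrote. */
static void flush_range(struct cbuf *bp, char *server)
{
    memcpy(server + bp->dirtyoff, bp->data + bp->dirtyoff,
           (size_t)(bp->dirtyend - bp->dirtyoff));
    bp->dirtyoff = bp->dirtyend = 0;
}

/* Returns 1 if both clients' writes survive on the server. */
static int two_writers(int use_byte_ranges)
{
    static char server[BLKSIZE];
    static struct cbuf a, b;

    memset(server, '.', BLKSIZE);
    cbuf_fill(&a, server);
    cbuf_fill(&b, server);
    cbuf_write(&a, 0, "AAAA", 4);     /* client A: bytes 0..3     */
    cbuf_write(&b, 100, "BBBB", 4);   /* client B: bytes 100..103 */
    if (use_byte_ranges) {
        flush_range(&a, server);
        flush_range(&b, server);
    } else {
        flush_block(&a, server);
        flush_block(&b, server);      /* overwrites A's bytes with stale data */
    }
    return memcmp(server, "AAAA", 4) == 0 &&
           memcmp(server + 100, "BBBB", 4) == 0;
}
```

    With whole-block flushes, client B's stale copy of bytes 0..3 wipes out
    client A's write; with byte-range flushes both writes survive.  The
    sketch keeps a single dirty range per buffer, so two disjoint writes by
    the same client would be merged into one range that also covers the gap
    between them.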

:I went looking at this code when I had an MSDOS FS that used
:1K blocks, but was not aligned on an even 1K boundary from the
:start of the device (odd cylinder size on the physical disk),
:which meant that every 4th 1K block spanned a page boundary
:(with obvious performance degradation during random access).
:
:					Terry Lambert
:					terry@lambert.org

    heh.  That should be fixed now with Luoqi's commits.  The bp system
    now understands DEV_BSIZE'd alignment properly in (hopefully) all cases.
    As long as it is at least 512-byte aligned it should work.  I wouldn't
    worry about performance degradation there too much - it's all just mapping
    already-cached pages into bp's, but I haven't looked at it with a 
    microscope so I can't say that for absolute sure.
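
    The "every 4th 1K block" arithmetic in the quoted text can be checked
    with a small sketch.  It assumes 4096-byte pages and a filesystem whose
    first block starts 512 bytes into the device; the thread gives no exact
    offset, so the 512 is an assumption, as are the macro and function
    names.

```c
#include <assert.h>

#define EX_PAGE_SIZE 4096ul   /* assumed page size */
#define EX_FS_BSIZE  1024ul   /* 1K filesystem blocks, as in the thread */

/* Nonzero if block n of a filesystem starting at byte offset `start`
 * has its first and last byte in different pages. */
static int spans_page(unsigned long start, unsigned n)
{
    unsigned long first = start + (unsigned long)n * EX_FS_BSIZE;
    unsigned long last = first + EX_FS_BSIZE - 1;
    return first / EX_PAGE_SIZE != last / EX_PAGE_SIZE;
}

/* Count how many of the first nblocks blocks span a page boundary. */
static unsigned count_spanning(unsigned long start, unsigned nblocks)
{
    unsigned count = 0;
    for (unsigned n = 0; n < nblocks; n++)
        if (spans_page(start, n))
            count++;
    return count;
}
```

    With start = 512, blocks 3, 7, 11 and 15 of the first sixteen each
    straddle a page boundary, which is one block in four; with start = 0
    none do.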

						-Matt

:---
:Any opinions in this posting are my own and not those of my present
:or previous employers.
:
:To Unsubscribe: send mail to majordomo@FreeBSD.org
:with "unsubscribe freebsd-hackers" in the body of the message
:

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet 
                    Communications & God knows what else.
    <dillon@backplane.com> (Please include original email in any response)    

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199901050445.UAA91309>