Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 13 Aug 1998 21:30:00 -0700 (PDT)
From:      Luoqi Chen <luoqi@chen.ml.org>
To:        freebsd-bugs@FreeBSD.ORG
Subject:   Re: kern/7418 (file corruption on mmap-based-read during file write())
Message-ID:  <199808140430.VAA17117@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help
The following reply was made to PR kern/7418; it has been noted by GNATS.

From: Luoqi Chen <luoqi@chen.ml.org>
To: dillon@backplane.com, luoqi@chen.ml.org
Cc: freebsd-gnats-submit@freebsd.org, luoqi@watermarkgroup.com
Subject: Re: kern/7418 (file corruption on mmap-based-read during file write())
Date: Fri, 14 Aug 1998 00:27:46 -0400 (EDT)

 >     I'm trying to track down the second PR I sent in... kern/7418 in this
 >     case.  I am concentrating on what happens when a page fault from an mmap'd
 >     file occurs in one process while a second process is blocked in
 >     ufs/ufs_readwrite() on the same file.
 >
 >     The file corruption that I am seeing is approximately this:
 >
 >     [this represents one page of memory]
 >
 > 	[data][data][data..00 00 00 00][data]
 >
 >     Where 00's replace what should have been valid data in the file.
 >     data written into the locations where the corruption occurs is being 
 >     replaced by 00.  i.e. I might see 
 >
 > 	'abcdef [00 00 00] (PAGE BOUNDRY) jklmnop'.
 >
 >     The most interesting item is that the 00 corruption always *ends* at
 >     a page boundry in the file.
 >
 >     --
 >
 >     So here is my question.  Refer to ufs/ufs_readwrite.c around line 337,
 >     as shown below.   What happens if VOP_BALLOC() blocks or uiomove()
 >     blocks during the discrete write() and, while blocked, another process
 >     has a read fault on precisely the same logical page via mmap()?
 >
 >     Is it possible for ufs_readwrite to obtain a bp, copy the write() data
 >     to it, but for the bp to then somehow be thrown away?  
 >
 Even if the buffer header is thrown away, the data should be still in
 the vm cache. But I could imagine one scenario that could lead to data loss.
 We know when we're reading multiple pages of data from disk, if some of the
 pages are already in vm cache, they are replaced with `bogus_page' in the
 buffer header page list. Now if another process tries to write into those
 data blocks before the read completes, the data will be written into the
 bogus_page. When the read completes, the true data pages will be put back in,
 unaltered, all those been written are lost. I don't know if this could
 really happen, I can't prove or disprove it, I can't reproduce what you
 had seen.
 
 >     I also don't understand why B_RELBUF is being set for the bp.  Can't 
 >     this cause the bp to be completely thrown away (aka kern/vfs_bio.c line 
 >     710)??   Makes sense for a READ, but I don't understand why B_RELBUF
 >     is being set for the bp in the WRITE.
 >
 >         for (error = 0; uio->uio_resid > 0;) {
 >                 lbn = lblkno(fs, uio->uio_offset);
 >                 blkoffset = blkoff(fs, uio->uio_offset);
 >                 xfersize = fs->fs_bsize - blkoffset;
 >                 if (uio->uio_resid < xfersize)
 >                         xfersize = uio->uio_resid;
 >
 >                 if (uio->uio_offset + xfersize > ip->i_size)
 >                         vnode_pager_setsize(vp, uio->uio_offset + xfersize);
 >
 >                 if (fs->fs_bsize > xfersize)
 >                         flags |= B_CLRBUF;
 >                 else
 >                         flags &= ~B_CLRBUF;
 > /* XXX is uio->uio_offset the right thing here? */
 >                 error = VOP_BALLOC(vp, uio->uio_offset, xfersize,
 >                     ap->a_cred, flags, &bp);
 >                 if (error != 0)
 >                         break;
 >
 >                 if (uio->uio_offset + xfersize > ip->i_size) {
 >                         ip->i_size = uio->uio_offset + xfersize;
 >                         extended = 1;
 >                 }
 >
 >                 size = BLKSIZE(fs, ip, lbn) - bp->b_resid;
 >                 if (size < xfersize)
 >                         xfersize = size;
 >
 >                 error =
 >                     uiomove((char *)bp->b_data + blkoffset, (int)xfersize, uio);
 >                 if ((ioflag & IO_VMIO) &&
 >                    (LIST_FIRST(&bp->b_dep) == NULL))
 >                         bp->b_flags |= B_RELBUF;
 >
 > 		...
 > 	}
 >
 >     Finally, I tried looking in other filesystem device code for comparable
 >     source.  The ext2fs code seems to simply copy the ufs code:
 >
 >                 error =
 >                     uiomove((char *)bp->b_data + blkoffset, (int)xfersize, uio);
 >                 if ((ioflag & IO_VMIO) &&
 >                    (LIST_FIRST(&bp->b_dep) == NULL)) /* in ext2fs? */
 >                         bp->b_flags |= B_RELBUF; 
 >
 >     The msdosfs does not set B_RELBUF in either its read or write, nor does
 >     the nfs code.   It's all very confusing.
 >
 IO_VMIO flag is only set from getpages/putpages calls, as a result of direct
 mmapped access. We don't need to keep the buffer headers around, and as I
 mentioned before, this won't result in any data loss, the data are still intact
 in vm cache.
 
 > 						-Matt
 >
 >     Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet 
 >                     Communications
 >     <dillon@backplane.com> (Please include original email in any response)    
 >
 >
 -lq

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199808140430.VAA17117>