Date:      Sun, 03 Dec 2000 19:08:55 -0800
From:      Peter Wemm <peter@netplex.com.au>
To:        "Kenneth D. Merry" <ken@kdm.org>
Cc:        arch@FreeBSD.ORG, gallatin@FreeBSD.ORG, dillon@FreeBSD.ORG
Subject:   Re: zero copy code review 
Message-ID:  <200012040308.eB438tD52326@mobile.wemm.org>
In-Reply-To: <20001129231653.A1503@panzer.kdm.org> 

"Kenneth D. Merry" wrote:
> [ -net and -current BCCed for wider coverage, this is probably best
> handled on -arch ]
> 
> I would like to request reviews of the zero copy sockets and NFS code I've
> been posting about for months:
> 
> http://people.FreeBSD.org/~ken/zero_copy

Hmm... I see one dangerous item:

"
5. Configuration and performance tuning. 

       There are a number of options that need to be turned on for various things to work: 

       options         ZERO_COPY_SOCKETS        # Turn on zero copy send code
       options         ENABLE_VFS_IOOPT         # Turn on zero copy receive
       options         NMBCLUSTERS=(512+512*32) # lots of mbuf clusters
       options         TI_JUMBO_HDRSPLIT        # Turn on Tigon header splitting

[..]
              Turn on vfs.ioopt to enable zero copy receive: 
               sysctl -w vfs.ioopt=1
"

I know Matt Dillon was intending to remove the ENABLE_VFS_IOOPT code
and vfs.ioopt, because they are fundamentally broken at present and have a
devastating impact on userland semantics.

For example, as it exists in the tree *right now*, if one does this:
  char *buf = malloc(PAGE_SIZE);	/* malloc does page alignment here */
  read(fd, buf, PAGE_SIZE);
...then the read would be eligible for ioopt treatment (page lending).

Normally, you would have a *private* copy of the page of data.  If somebody
modifies the backing file, your private copy does not change.
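
A minimal sketch of the normal semantics just described, assuming a POSIX system: read one page-aligned page from a scratch file, rewrite the file, and check that the buffer still holds the old bytes. The function name and the /tmp path are hypothetical; pread() at offset 0 stands in for the read() in the example above, and posix_memalign() replaces reliance on malloc's page alignment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper.  Returns 1 if a page read into a page-aligned
 * buffer keeps its old contents after the backing file is rewritten,
 * 0 if the buffer changed, and -1 on error.
 */
static int read_copy_is_private(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0);
    char path[] = "/tmp/read-demo-XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* storage is freed when fd is closed */

    int result = -1;
    void *buf = NULL;
    char *page = malloc((size_t)pagesz);
    if (page == NULL || posix_memalign(&buf, (size_t)pagesz, (size_t)pagesz) != 0)
        goto out;

    memset(page, 'A', (size_t)pagesz);
    if (pwrite(fd, page, (size_t)pagesz, 0) != pagesz)
        goto out;

    /* The case from the message: one whole, page-aligned page. */
    if (pread(fd, buf, (size_t)pagesz, 0) != pagesz)
        goto out;

    /* Somebody else modifies the backing file after the read. */
    memset(page, 'B', (size_t)pagesz);
    if (pwrite(fd, page, (size_t)pagesz, 0) != pagesz)
        goto out;

    result = ((char *)buf)[0] == 'A';
out:
    free(buf);
    free(page);
    close(fd);
    return result;
}
```

On a kernel that copies the data on read(), read_copy_is_private() returns 1; page lending without loan tracking is what could make it return 0.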

However, turning on ioopt causes the page to be mmapped in with MAP_PRIVATE,
and this does **NOT** give the same semantics.  Sure, if you modify the buffer
yourself, you get a copy-on-write fault and your own private page to mess with.
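
For comparison, the copy-on-write half of MAP_PRIVATE behaves as described: a write through the mapping lands on a private copy and never reaches the file. A small sketch, assuming a POSIX system with mmap(); the function name and scratch path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper.  Maps one page of a scratch file MAP_PRIVATE,
 * writes through the mapping, and returns 1 if the write stayed private
 * (the file still holds 'A'), 0 if it reached the file, -1 on error.
 */
static int private_write_is_cow(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0);
    char path[] = "/tmp/cow-demo-XXXXXX";    /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    int result = -1;
    char first;
    char *map = MAP_FAILED;
    char *page = malloc((size_t)pagesz);
    if (page == NULL)
        goto out;
    memset(page, 'A', (size_t)pagesz);
    if (pwrite(fd, page, (size_t)pagesz, 0) != pagesz)
        goto out;

    map = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    map[0] = 'Z';   /* write fault: the process gets its own copy of the page */

    if (pread(fd, &first, 1, 0) != 1)
        goto out;
    result = (first == 'A' && map[0] == 'Z');
out:
    if (map != MAP_FAILED)
        munmap(map, (size_t)pagesz);
    free(page);
    close(fd);
    return result;
}
```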

But if somebody else modifies the file before you dirty the page, then
your supposedly static private copy silently changes out from underneath you,
because you have been loaned a mapping from the vm/buffer cache.  The
infrastructure to track "loaned out" pages in the vm page cache isn't present.
To do this correctly, loaned pages must be read-only to the kernel and to DMA
engines, so that a write to them takes a fault, giving the kernel a chance to
fully donate the original page to the mapping processes and generate its own
writable version.
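
The hazard in the last paragraph can be probed the same way. POSIX leaves it unspecified whether changes made to the underlying file after a MAP_PRIVATE mapping is established are visible through the mapping, so this sketch only reports what the running kernel does. It assumes a POSIX system; the function name and scratch path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper.  Returns 1 if a pwrite() to the file, made after a
 * MAP_PRIVATE mapping was established and before this process wrote to it,
 * shows through the mapping; 0 if it does not; -1 on error.
 */
static int private_map_sees_external_write(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0);
    char path[] = "/tmp/loan-demo-XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    int result = -1;
    char first;
    char *map = MAP_FAILED;
    char *page = malloc((size_t)pagesz);
    if (page == NULL)
        goto out;
    memset(page, 'A', (size_t)pagesz);
    if (pwrite(fd, page, (size_t)pagesz, 0) != pagesz)
        goto out;

    map = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    /* Fault the page in for reading only; the process never dirties it. */
    first = ((volatile char *)map)[0];
    (void)first;

    /* "Somebody else" rewrites the file. */
    memset(page, 'B', (size_t)pagesz);
    if (pwrite(fd, page, (size_t)pagesz, 0) != pagesz)
        goto out;

    /* volatile: force a fresh load from the mapping after the pwrite(). */
    result = ((volatile char *)map)[0] == 'B';
out:
    if (map != MAP_FAILED)
        munmap(map, (size_t)pagesz);
    free(page);
    close(fd);
    return result;
}
```

A kernel that keeps the shared cache page mapped until the first write fault makes this return 1, which is exactly the silent change described above; a read() into a private buffer never shows it.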

I have not read the patch in depth, so I am not sure whether this case is
handled completely.  There are a few patches to vm_fault(), but I cannot tell
whether they handle the problem I described above or something else.  In
particular, if they are intended to handle it, they seem to depend on being
able to make pages unwritable by the kernel.  That isn't possible on the
original 386 (the CR0.WP bit that makes kernel writes honor read-only pages
exists only on the 486 and later).  I did not see any busmaster DMA checking
either, but I could have missed it.  What about drivers that DMA to pages
mapped into KVM without checking writability (and hence COW)?  
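
The bookkeeping that is missing can be illustrated with a user-space model: each page carries a reference count, lending a page bumps it, and a write by the cache to a page still on loan first makes a fresh copy for the cache, leaving the original to the borrowers. This is a hypothetical sketch with invented names, not the patch's code; a real kernel would have to enforce the same rule with hardware page protection and cover busmaster DMA as well, which is what the questions above are about.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 16   /* tiny pages keep the model readable */

/* Hypothetical model of a physical page shared by the cache and borrowers. */
struct page {
    char data[MODEL_PAGE_SIZE];
    int refs;                /* one for the cache plus one per loan */
};

/* Hypothetical model of the file's slot in the vm/buffer cache. */
struct cache_slot {
    struct page *pg;
};

static struct page *page_alloc(char fill)
{
    struct page *p = malloc(sizeof(*p));
    if (p != NULL) {
        memset(p->data, fill, MODEL_PAGE_SIZE);
        p->refs = 1;
    }
    return p;
}

/* Lend the cached page to a reader instead of copying it (ioopt-style). */
static struct page *loan_page(struct cache_slot *slot)
{
    slot->pg->refs++;
    return slot->pg;
}

/*
 * Write one byte of the file through the cache.  A page on loan is treated
 * as read-only: the cache "faults", leaves the old page to its borrowers,
 * and continues with a fresh writable copy.  Returns 0 on success, -1 on
 * a bad offset or allocation failure.
 */
static int cache_write(struct cache_slot *slot, size_t off, char c)
{
    if (off >= MODEL_PAGE_SIZE)
        return -1;
    if (slot->pg->refs > 1) {
        struct page *copy = malloc(sizeof(*copy));
        if (copy == NULL)
            return -1;
        memcpy(copy->data, slot->pg->data, MODEL_PAGE_SIZE);
        copy->refs = 1;
        slot->pg->refs--;    /* the original now belongs to the borrowers */
        slot->pg = copy;
    }
    slot->pg->data[off] = c;
    return 0;
}

/* Drop one reference; the last holder frees the page. */
static void page_release(struct page *p)
{
    if (--p->refs == 0)
        free(p);
}
```

In this model a borrower's view can only change through its own writes; the check in cache_write() is the step that, per the message, the kernel cannot enforce without write-protecting the page against itself and against DMA.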

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5


