Date:      Fri, 16 Dec 2011 23:52:50 GMT
From:      Guy Harris <guy@alum.mit.edu>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   misc/163370: zero-copy BPF is disabled by default
Message-ID:  <201112162352.pBGNqoJA073542@red.freebsd.org>
Resent-Message-ID: <201112170000.pBH00QjP019291@freefall.freebsd.org>


>Number:         163370
>Category:       misc
>Synopsis:       zero-copy BPF is disabled by default
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Dec 17 00:00:26 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator:     Guy Harris
>Release:        8.0 and later
>Organization:
>Environment:
>Description:
In revision 1.207 of sys/net/bpf.c, the initial value of the kernel variable bpf_zerocopy_enable was changed from 1 to 0.  The comment was:

    Disable zerocopy by default for now.  It's causing some problems in pcap
    consumers which fork after the shared pages have been setup.  pflogd(8)
    is an example.  The problem is understood and there is a fix coming in
    shortly.

    Folks who want to continue using it can do so by setting

    net.bpf.zerocopy_enable

    to 1.

However, as of the current top-of-trunk, it's still disabled by default; no fix has come in.
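
Whether zero-copy BPF is enabled on a given system can be checked by reading that sysctl.  What follows is only a minimal sketch: the function name bpf_zerocopy_enabled() is made up for illustration, and the code assumes the sysctl is an int and exists only on FreeBSD, returning -1 everywhere else.

```c
#include <stddef.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Return 1 if net.bpf.zerocopy_enable is non-zero, 0 if it is zero, or
 * -1 if it cannot be read (for example, on a system without that sysctl).
 * The function name is hypothetical; only sysctlbyname(3) is a real API.
 */
int
bpf_zerocopy_enabled(void)
{
#ifdef __FreeBSD__
	int value = 0;
	size_t len = sizeof(value);

	/* Read-only query: no new value is supplied. */
	if (sysctlbyname("net.bpf.zerocopy_enable", &value, &len, NULL, 0) != 0)
		return -1;
	return value != 0;
#else
	/* Assumption: the sysctl is only present on FreeBSD. */
	return -1;
#endif
}
```

Changing the value requires privilege; from a root shell, "sysctl net.bpf.zerocopy_enable=1" does what the commit comment describes.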

Back in 2009, Christian Peron sent me (as I'm one of the libpcap core developers at tcpdump.org) a mail message saying:

  Ran into a bit of an issue with zerocopy bpf.  We have enabled zerocopy
  by default in -CURRENT in hopes of shaking out some bugs. We found
  issues for processes which fork.  An example would be the privsep
  requirements in pflogd.  The problem was easy enough to fix but it
  points to an issue with it being enabled for everyone un-conditionally.

  In some cases, it makes sense to have zerocopy while in others not so
  much.  So I am thinking about adding a pcap_set_bufmode() so if the
  application writers are aware, they can specify which buffer mode they
  prefer instead of having libpcap query the operating system directly.

  I guess I have two questions:

  (1) Is there anything in the existing API I can use to specify machine
     dependent options?

  (2) If not, are you ok with having a pcap_set_bufmode() or some other
     similar function?

  I would be interested in hearing your thoughts on this.

and, in response to a reply from me, said:

  If net.bpf.zerocopy_enable is set to 1 (which on current at least it is
  by default) libpcap will make use of zerocopy.  It checks for this via
  ioctl (i.e. the kernel inspect this variable to tell pcap whether
  or not zerocopy is enabled).  The problem is when an application
  initializes the pages and then forks.  This results in the pages being
  copied into the child which breaks zerocopy.  An example of where this
  is an issue is the privsep code in pflogd.

  Disabling net.bpf.zerocopy_enable fixes the problem however it means
  things like tcpdump cant take advantage of it.

  Calling minherit(INHERIT_SHARE) will fix this problem, however I am not
  sure I want to unconditionally do this.  If a child process does not want
  these pages mmaped, it has no way of knowing which pages to un-map,
  unlike closing un-wanted file descriptors. i.e. if a process forks to do
  a dns lookup, these pages would appear in the process as an example.

  So I was thinking about introducing three buffer modes the
  application can specify:

  PCAP_BUFMODE_BUFFER       - regular old buffer mode
  PCAP_BUFMODE_ZBUF         - zerocopy buffers without page inheritance
  PCAP_BUFMODE_ZBUF_INHERIT - zerocopy calling minherit so pages can be shared
                             across forks

  The only problem is pcap_open_live() calls pcap_activate() directly, so I am
  not sure how we could process a flag from the application after the pcap
  object is created, but before we call pcap_activate().

  It would be nice if the applications were explicit, so if they use
  PCAP_BUFMODE_ZBUF_INHERIT, they are aware of it and can operate with
  caution.  Instead of libpcap operating behind the scenes and changing
  the page inheritance policy for the application.

  So I was wondering if you had any ideas on approach.
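
For what it's worth, libpcap's pcap_create()/pcap_activate() split already lets options be set on a handle that has not yet been activated, which is the window a buffer-mode option would need.  The sketch below only mocks that ordering.  The three mode names come from the proposal above, but the enum, struct capture_handle, capture_set_bufmode() and capture_activate() are hypothetical stand-ins; none of them exists in libpcap.

```c
#include <stddef.h>

/* Buffer modes as named in the proposal; libpcap has no such enum. */
enum pcap_bufmode {
	PCAP_BUFMODE_BUFFER,		/* regular old buffer mode */
	PCAP_BUFMODE_ZBUF,		/* zerocopy, no page inheritance */
	PCAP_BUFMODE_ZBUF_INHERIT	/* zerocopy, pages shared across forks */
};

/* Hypothetical stand-in for a pcap_t; not libpcap's real structure. */
struct capture_handle {
	enum pcap_bufmode bufmode;
	int activated;
};

/*
 * Hypothetical setter: record the requested mode.  Like other
 * pre-activation options, it is refused once the handle is active.
 * Returns 0 on success, -1 on failure.
 */
int
capture_set_bufmode(struct capture_handle *h, enum pcap_bufmode mode)
{
	if (h == NULL || h->activated)
		return -1;
	h->bufmode = mode;
	return 0;
}

/*
 * Hypothetical activation step: a real implementation would set up the
 * buffers here, according to h->bufmode.  Returns 0 on success, -1 on
 * failure.
 */
int
capture_activate(struct capture_handle *h)
{
	if (h == NULL || h->activated)
		return -1;
	h->activated = 1;
	return 0;
}
```

An application would create the handle, call the setter, and only then activate; a pcap_open_live()-style wrapper that activates immediately leaves no room for the setter, which is the difficulty described above.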

In response to the comment about minherit(), I said:

  Why would that be an issue?  Why would there be a problem leaving those pages in the child's address space?

  libpcap changes a bunch of state when it opens a capture device - it gets a file descriptor, it mallocs some memory, and it might memory-map some stuff from the kernel.  There's currently no way of releasing that without closing the pcap_t, but that would cause libpcap to attempt to, for example, turn monitor mode off if monitor mode was turned on.

but I never got a response.

If zero-copy BPF is more efficient than non-zero-copy BPF, I would want the default mode in libpcap to be zero-copy; applications should, by default, get the best behavior out of the packet-capture mechanism, and should not have to know about *any* of the details of how libpcap uses that mechanism.

The statements "The problem is when an application initializes the pages and then forks.  This results in the pages being copied into the child which breaks zerocopy." and "Calling minherit(INHERIT_SHARE) will fix this problem" suggest that the "best" mode for most programs involves sharing the pages across forks.  Programs that don't fork won't care.  I presume programs that fork and exec won't care either, as all the pages, including the zerocopy-buffer pages, would be unmapped by the exec.  Programs that *do* fork, and keep running without an exec, won't cause copies that will break BPF.  (I'm assuming from "being copied into the child" that the default behavior is INHERIT_COPY and that this breaks BPF, presumably because, in the child, the mapped region *isn't* shared by the kernel and userland and thus doesn't deliver packets.)

The only disadvantage appears to be that, if the child doesn't want to use the pcap_t (or only uses it for injecting packets), it still has pages mapped into its address space.  If that's a real problem, libpcap could add a pcap_close_child() function, or something such as that, which closes file descriptors etc. but does *not* do any of the manual mode cleanup, such as turning monitor mode off on *BSD and Linux without mac80211, deleting the monN device on Linux with mac80211, etc.  (Ideally, there would be ways of requesting monitor mode that work similarly to requesting promiscuous mode, so that monitor mode is on as long as at least one BPF/PF_PACKET socket/etc. descriptor that wants monitor mode is open and is turned off when the last such descriptor is closed.)
>How-To-Repeat:
Code inspection.
>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:


