Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 10 Aug 2014 18:56:08 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 192558] New: [patch] hard-to-hit crash in bpf catchpacket()
Message-ID:  <bug-192558-8@https.bugs.freebsd.org/bugzilla/>

next in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=192558

            Bug ID: 192558
           Summary: [patch] hard-to-hit crash in bpf catchpacket()
           Product: Base System
           Version: 9.2-STABLE
          Hardware: Any
                OS: Any
            Status: Needs Triage
          Severity: Affects Only Me
          Priority: Normal
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: torek@torek.net

The symptom is simply a crash that looks like a kernel
    NULL pointer bug, with the program counter in bcopy():

      Fatal trap 12: page fault while in kernel mode
      cpuid = 8; apic id = 08
      fault virtual address   = 0x0
      fault code              = supervisor write data, page not present
    [snip]

    The stack trace in ddb looks like this (I stripped out
    the frames - I don't have a full dump, we have that
    disabled for various mostly-good reasons, so the raw
    frames are not really useful here):

      bcopy() at bcopy+0x16
      bpf_mtap() at bpf_mtap+0x1d0
      ether_nh_input() at ether_nh_input+0x167
      netisr_dispatch_src() at netisr_dispatch_src+0x5e
      igb_rxeof() at igb_rxeof+0x56d
      igb_msix_que() at igb_msix_que+0x101
      ...

    although of course it will be some other rxeof() if you're
    on some other network device.

    The bpf_mtap() pc is actually in (inlined) catchpacket(),
    where we do:

        bpf_append_bytes(d, d->bd_sbuf, curlen, ...)

        where the bd_bufmode is BPF_BUFMODE_BUFFER (it must be, we
        have not used the sysctl that enables the zero copy stuff)
        so this becomes bpf_buffer_append_bytes() which becomes
        bcopy(), and it appears the whole mess of calls has been
        inlined.

    The crash clearly has a NULL d->bd_sbuf:

      bcopy+0x16:     repe movsq      (%rsi),%es:(%rdi)

      rsi  0xfffff82044dbd768
      rdi                   0

        (%es is not very interesting).  This means d->bd_sbuf was
        NULL, which was a bit of a mystery, as the code path
        starts with a lock assertion:

        BPFD_LOCK_ASSERT(d);

    and then has this bit:

        if (d->bd_fbuf == NULL) {
            /*
             * There's no room in the store buffer, and no
             * prospect of room, so drop the packet.  Notify
             * the
             * buffer model.
             */
            bpf_buffull(d);
            ++d->bd_dcount;
            return;
        }

    before doing this:

        ROTATE_BUFFERS(d);

    (the ROTATE causes bd_fbuf to move into bd_sbuf).

        But then I realized that it did this extra step in between
        these "test fbuf for NULL" and "ROTATE" steps:

        while (d->bd_hbuf_in_use)
            mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
                PRINET, "bd_hbuf", 0);

        So, if we assume that the hbuf (hold buffer) *is* in use,
        we'll sleep waiting for the user to finish with it and
        then wake us up.  While we're asleep, we'll give up
        d->bd_lock.

        It sure looks like someone else (some other bpf consumer
        using the same d->bd_* fields) snuck in while we were
        asleep and used up d->bd_fbuf, setting it to NULL.  Then
        we got the lock and did our own ROTATE_BUFFERS() and moved
        the NULL to d->bd_sbuf.

        If this analysis is correct, the fix is simply to wait for
        the hbuf to be available *before* checking d->bd_fbuf,
        i.e., move the while loop up.

Environment:
System: FreeBSD 9.2-STABLE #0: Sat Aug  9 20:01:00 PDT 2014:
root@builder:/usr/obj/usr/src/sys/XXX amd64

    Note, this applies to 9.x and 10.x -- the code is the same
    in both.

How-To-Repeat:
Difficult.  I don't have an actual reproducer.  The crash
    happened just once and I did the analysis based on stack
    trace and panic, etc.

        Clearly you need to have two or more threads/procs using
        bpf, so that one of them can have hbuf in use (during a
        copyout()).  You also need to have lots of traffic flying
        on the Ethernet (so that the buffers fill).

    I know the system has dhcpd running, not sure what other things
    are running at the time that have promiscuous mode enabled on
    the igb device (but promisc is enabled).

Fix:
I'm not 100% convinced this is the fix, or the entire fix.
        But it seems correct and is at least a step to fixing.
        Here it is in patch form...

        BTW, I note that there is a similar loop for the zero copy
        code, waiting for hbuf.  It may need to repeat its test
        after mtx_sleeping.  I did not look closely at the zero
        copy path after I verified we were not using it.

        It might also be wise to recheck curlen-vs-totlen, etc.
        Maybe we should just start the whole routine over if we
        sleep, for each hbuf sleep case.  But I think with this
        change, the not-zero-copy code will always work, albeit
    possibly in sub-optimal ways in this rare case.

            ---

bpf: check d->bd_fbuf after sleep, not before

The code in catchpacket() checks that there's a valid d->bd_fbuf
before doing a ROTATE_BUFFERS, which will move the sbuf (store
buffer) to the hbuf (hold buffer) and make the fbuf (free buffer)
become the sbuf.  OK so far, but *after* it verifies this fbuf,
it then mtx_sleep-waits for the hold buffer to be available.  If
the fbuf goes NULL during the wait when we drop the lock, there
will be no sbuf once we do the ROTATE_BUFFERS.

To fix this, simply check fbuf after possibly waiting for hbuf,
rather than before.


diff --git a/sys/net/bpf.c b/sys/net/bpf.c
--- a/sys/net/bpf.c
+++ b/sys/net/bpf.c
@@ -2352,6 +2352,9 @@ catchpacket(struct bpf_d *d, u_char *pkt
 #endif
         curlen = BPF_WORDALIGN(d->bd_slen);
     if (curlen + totlen > d->bd_bufsize || !bpf_canwritebuf(d)) {
+        while (d->bd_hbuf_in_use)
+            mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
+                PRINET, "bd_hbuf", 0);
         if (d->bd_fbuf == NULL) {
             /*
              * There's no room in the store buffer, and no
@@ -2362,9 +2365,6 @@ catchpacket(struct bpf_d *d, u_char *pkt
             ++d->bd_dcount;
             return;
         }
-        while (d->bd_hbuf_in_use)
-            mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
-                PRINET, "bd_hbuf", 0);
         ROTATE_BUFFERS(d);
         do_wakeup = 1;
         curlen = 0;

_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

-- 
You are receiving this mail because:
You are the assignee for the bug.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-192558-8>