From owner-freebsd-hackers@FreeBSD.ORG  Sat Aug  9 01:59:38 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 4E2FB8DB
 for <freebsd-hackers@freebsd.org>; Sat,  9 Aug 2014 01:59:38 +0000 (UTC)
Received: from elf.torek.net (50-73-42-1-utah.hfc.comcastbusiness.net
 [50.73.42.1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 2FCAE2972
 for <freebsd-hackers@freebsd.org>; Sat,  9 Aug 2014 01:59:37 +0000 (UTC)
Received: from elf.torek.net (localhost [127.0.0.1])
 by elf.torek.net (8.14.5/8.14.5) with ESMTP id s791VeDx049988
 for <freebsd-hackers@freebsd.org>; Fri, 8 Aug 2014 19:31:40 -0600 (MDT)
 (envelope-from torek@torek.net)
Message-Id: <201408090131.s791VeDx049988@elf.torek.net>
From: Chris Torek <torek@torek.net>
To: freebsd-hackers@freebsd.org
Subject: crash in bpf catchpacket() code
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <49986.1407547900.1@elf.torek.net>
Date: Fri, 08 Aug 2014 19:31:40 -0600
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3
 (elf.torek.net [127.0.0.1]); Fri, 08 Aug 2014 19:31:40 -0600 (MDT)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Aug 2014 01:59:38 -0000

We saw a crash with this stack trace (I snipped out the "frame"s
here as not very useful to others, plus there's no full dump, just
a text dump backtrace and some other bits):

  bcopy() at bcopy+0x16
  bpf_mtap() at bpf_mtap+0x1d0
  ether_nh_input() at ether_nh_input+0x167
  netisr_dispatch_src() at netisr_dispatch_src+0x5e
  igb_rxeof() at igb_rxeof+0x56d
  igb_msix_que() at igb_msix_que+0x101

The bpf_mtap() pc is actually in (inlined) catchpacket(), where
we do:

	bpf_append_bytes(d, d->bd_sbuf, curlen, ...)

where the bd_bufmode is BPF_BUFMODE_BUFFER so this becomes
bpf_buffer_append_bytes() which becomes bcopy(), and it appears
the whole mess of calls has been inlined.

The crash clearly has a NULL d->bd_sbuf:

  bcopy+0x16:     repe movsq      (%rsi),%es:(%rdi)

  rsi  0xfffff82044dbd768
  rdi                   0

(%es is not very interesting).  This means d->bd_sbuf was NULL,
which was a bit of a mystery, as the code path starts with a lock
assertion:

	BPFD_LOCK_ASSERT(d);

and then has this bit:

		if (d->bd_fbuf == NULL) {
			/*
			 * There's no room in the store buffer, and no
			 * prospect of room, so drop the packet.  Notify
			 * the
			 * buffer model.
			 */
			bpf_buffull(d);
			++d->bd_dcount;
			return;
		}

before doing this:

		ROTATE_BUFFERS(d);

(the ROTATE causes bd_fbuf to move into bd_sbuf).

But then I realized that it did this extra step in between these
"test fbuf for NULL" and "ROTATE" steps:

		while (d->bd_hbuf_in_use)
			mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
			    PRINET, "bd_hbuf", 0);

So, if we assume that the hbuf (hold buffer) *is* in use, we'll
sleep waiting for the user to finish with it and then wake us up.
While we're asleep, we'll give up d->bd_lock.

It sure looks like someone else (some other bpf consumer using the
same d->bd_* fields) snuck in while we were asleep and used up
d->bd_fbuf, setting it to NULL.  Then we got the lock and did our
own ROTATE_BUFFERS() and moved the NULL to d->bd_sbuf.

If this analysis is correct, the fix is simply to wait for the
hbuf to be available *before* checking d->bd_fbuf, i.e., move the
while loop up.

Because the bug is (apparently) really hard to hit (it's not like
we can reproduce this at will), I'm not 100% convinced this is the
fix (or the entire fix).  But here it is in patch form...

Chris

			---

bpf: check d->bd_fbuf after sleep, not before

The code in catchpacket() checks that there's a valid d->bd_fbuf
before doing a ROTATE_BUFFERS, which will move the sbuf (store
buffer) to the hbuf (hold buffer) and make the fbuf (free buffer)
become the sbuf.  OK so far, but *after* it verifies this fbuf,
it then mtx_sleep-waits for the hold buffer to be available.  If
the fbuf goes NULL during the wait when we drop the lock, there
will be no sbuf once we do the ROTATE_BUFFERS.

To fix this, simply check fbuf after possibly waiting for hbuf,
rather than before.


diff --git a/sys/net/bpf.c b/sys/net/bpf.c
--- a/sys/net/bpf.c
+++ b/sys/net/bpf.c
@@ -2352,6 +2352,9 @@ catchpacket(struct bpf_d *d, u_char *pkt
 #endif
 		curlen = BPF_WORDALIGN(d->bd_slen);
 	if (curlen + totlen > d->bd_bufsize || !bpf_canwritebuf(d)) {
+		while (d->bd_hbuf_in_use)
+			mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
+			    PRINET, "bd_hbuf", 0);
 		if (d->bd_fbuf == NULL) {
 			/*
 			 * There's no room in the store buffer, and no
@@ -2362,9 +2365,6 @@ catchpacket(struct bpf_d *d, u_char *pkt
 			++d->bd_dcount;
 			return;
 		}
-		while (d->bd_hbuf_in_use)
-			mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
-			    PRINET, "bd_hbuf", 0);
 		ROTATE_BUFFERS(d);
 		do_wakeup = 1;
 		curlen = 0;