From owner-freebsd-hackers@FreeBSD.ORG Sat Aug 9 01:59:38 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4E2FB8DB for ; Sat, 9 Aug 2014 01:59:38 +0000 (UTC) Received: from elf.torek.net (50-73-42-1-utah.hfc.comcastbusiness.net [50.73.42.1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2FCAE2972 for ; Sat, 9 Aug 2014 01:59:37 +0000 (UTC) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.14.5/8.14.5) with ESMTP id s791VeDx049988 for ; Fri, 8 Aug 2014 19:31:40 -0600 (MDT) (envelope-from torek@torek.net) Message-Id: <201408090131.s791VeDx049988@elf.torek.net> From: Chris Torek To: freebsd-hackers@freebsd.org Subject: crash in bpf catchpacket() code MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <49986.1407547900.1@elf.torek.net> Date: Fri, 08 Aug 2014 19:31:40 -0600 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (elf.torek.net [127.0.0.1]); Fri, 08 Aug 2014 19:31:40 -0600 (MDT) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Aug 2014 01:59:38 -0000 We saw a crash with this stack trace (I snipped out the "frame"s here as not very useful to others, plus there's no full dump, just a text dump backtrace and some other bits): bcopy() at bcopy+0x16 bpf_mtap() at bpf_mtap+0x1d0 ether_nh_input() at ether_nh_input+0x167 netisr_dispatch_src() at netisr_dispatch_src+0x5e igb_rxeof() at igb_rxeof+0x56d igb_msix_que() at igb_msix_que+0x101 The bpf_mtap() pc is actually in (inlined) catchpacket(), where we do: bpf_append_bytes(d, d->bd_sbuf, curlen, ...) where the bd_bufmode is BPF_BUFMODE_BUFFER so this becomes bpf_buffer_append_bytes() which becomes bcopy(), and it appears the whole mess of calls has been inlined. The crash clearly has a NULL d->bd_sbuf: bcopy+0x16: repe movsq (%rsi),%es:(%rdi) rsi 0xfffff82044dbd768 rdi 0 (%es is not very interesting). This means d->bd_sbuf was NULL, which was a bit of a mystery, as the code path starts with a lock assertion: BPFD_LOCK_ASSERT(d); and then has this bit: if (d->bd_fbuf == NULL) { /* * There's no room in the store buffer, and no * prospect of room, so drop the packet. Notify * the * buffer model. */ bpf_buffull(d); ++d->bd_dcount; return; } before doing this: ROTATE_BUFFERS(d); (the ROTATE causes bd_fbuf to move into bd_sbuf). But then I realized that it did this extra step in between these "test fbuf for NULL" and "ROTATE" steps: while (d->bd_hbuf_in_use) mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock, PRINET, "bd_hbuf", 0); So, if we assume that the hbuf (hold buffer) *is* in use, we'll sleep waiting for the user to finish with it and then wake us up. While we're asleep, we'll give up d->bd_lock. It sure looks like someone else (some other bpf consumer using the same d->bd_* fields) snuck in while we were asleep and used up d->bd_fbuf, setting it to NULL. Then we got the lock and did our own ROTATE_BUFFERS() and moved the NULL to d->bd_sbuf. If this analysis is correct, the fix is simply to wait for the hbuf to be available *before* checking d->bd_fbuf, i.e., move the while loop up. Because the bug is (apparently) really hard to hit (it's not like we can reproduce this at will), I'm not 100% convinced this is the fix (or the entire fix). But here it is in patch form... Chris --- bpf: check d->bd_fbuf after sleep, not before The code in catchpacket() checks that there's a valid d->bd_fbuf before doing a ROTATE_BUFFERS, which will move the sbuf (store buffer) to the hbuf (hold buffer) and make the fbuf (free buffer) become the sbuf. OK so far, but *after* it verifies this fbuf, it then mtx_sleep-waits for the hold buffer to be available. If the fbuf goes NULL during the wait when we drop the lock, there will be no sbuf once we do the ROTATE_BUFFERS. To fix this, simply check fbuf after possibly waiting for hbuf, rather than before. diff --git a/sys/net/bpf.c b/sys/net/bpf.c --- a/sys/net/bpf.c +++ b/sys/net/bpf.c @@ -2352,6 +2352,9 @@ catchpacket(struct bpf_d *d, u_char *pkt #endif curlen = BPF_WORDALIGN(d->bd_slen); if (curlen + totlen > d->bd_bufsize || !bpf_canwritebuf(d)) { + while (d->bd_hbuf_in_use) + mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock, + PRINET, "bd_hbuf", 0); if (d->bd_fbuf == NULL) { /* * There's no room in the store buffer, and no @@ -2362,9 +2365,6 @@ catchpacket(struct bpf_d *d, u_char *pkt ++d->bd_dcount; return; } - while (d->bd_hbuf_in_use) - mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock, - PRINET, "bd_hbuf", 0); ROTATE_BUFFERS(d); do_wakeup = 1; curlen = 0;