From owner-freebsd-bugs@FreeBSD.ORG Fri Oct 7 05:10:18 2005
Return-Path:
X-Original-To: freebsd-bugs@hub.freebsd.org
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6255F16A41F
	for ; Fri, 7 Oct 2005 05:10:18 +0000 (GMT)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1215E43D45
	for ; Fri, 7 Oct 2005 05:10:18 +0000 (GMT)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.13.3/8.13.3) with ESMTP id j975AHcf093094
	for ; Fri, 7 Oct 2005 05:10:17 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.13.3/8.13.1/Submit) id j975AH01093093;
	Fri, 7 Oct 2005 05:10:17 GMT
	(envelope-from gnats)
Date: Fri, 7 Oct 2005 05:10:17 GMT
Message-Id: <200510070510.j975AH01093093@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Mark Gooderum
Cc:
Subject: Re: kern/87014: BPF_MTAP/bpf_mtap are not threadsafe and cause panics on SMP systems
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Mark Gooderum
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 07 Oct 2005 05:10:18 -0000

The following reply was made to PR kern/87014; it has been noted by GNATS.

From: Mark Gooderum
To: bug-followup@FreeBSD.org, mark@verniernetworks.com
Cc:
Subject: Re: kern/87014: BPF_MTAP/bpf_mtap are not threadsafe and cause panics on SMP systems
Date: Fri, 07 Oct 2005 00:03:12 -0500

This is a multi-part message in MIME format.
--------------050606070600030406070008
Content-Type: multipart/alternative;
 boundary="------------010506050407060700010906"

--------------010506050407060700010906
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

FYI - this appears to be a duplicate of PR 73719.  I did search before but somehow missed it.

Using the attached test program (which opens and closes BPF devices in a tight loop), I can crash my system within a few seconds via this bug.  The test setup is basically:
  • FreeBSD system as router
    • 2 GigE interfaces
  • 4 Traffic Generating Systems
    • Two on one interface, running netstraind
    • Two on the second interface, running netstrain
      • Run netstrain bi-directionally (i.e., netstrain <desthost> <port> both)
    • This setup generates about 450 Mbit/sec each way (900 Mbit/sec aggregate)
  • Start the netstraind servers
  • Start the netstrain clients
  • At this point the system runs fine
  • Run the attached test program in full-spin mode on one of the active interfaces
    • bpfspin -f 100000 bge0
  • Without the fix, the system crashes within 1-2 seconds of starting bpfspin
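
For reference, here is a small sketch of how the -f value turns into sleep intervals. The arithmetic follows main() in the attached bpfspin.c; the helper name spin_timing and its struct are hypothetical and exist only for this illustration.

```c
/* Hypothetical helper mirroring the timing arithmetic in bpfspin.c's main(). */
struct spin_timing {
	int per_cycle_us;	/* length of one open/close cycle, in microseconds */
	int off_sleep_us;	/* time spent with the BPF device closed */
	int sleeps;		/* nonzero if the loop calls usleep() at all */
};

static struct spin_timing
spin_timing(int freq, int on_sleep_us)
{
	struct spin_timing t = { 0, 0, 0 };

	if (freq) {
		t.per_cycle_us = 1000000 / freq;
		t.off_sleep_us = t.per_cycle_us;
	}
	if (on_sleep_us)
		t.off_sleep_us = t.per_cycle_us - on_sleep_us;
	/* bpfspin only sleeps when an on-sleep interval was requested. */
	t.sleeps = (on_sleep_us != 0);
	return (t);
}
```

With -f 100000 and no on-sleep interval, the cycle length works out to 10 microseconds but the loop never sleeps, so the device is opened and closed as fast as the system allows.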
The SUT (system under test) was a Tyan S2882-based dual Opteron 248 system.  The motherboard has an onboard Intel 8255x-based 10/100 port and two onboard Broadcom 5704-based GigE ports.  It also had a pair of PCI-X Intel Dual GigE PRO/1000M cards (Intel 8254x-based).  The crash was reproduced on both the bge interfaces and the em interfaces.

This test must be run on a true SMP system: the race requires two threads running at the same time, since there are no other preemption points in the race window.  I am not sure about the timing on HTT systems; this testing was done on a true dual Opteron system.
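
To illustrate the kind of window involved, here is a user-space sketch. It assumes the tap macro tests the interface's BPF pointer and then reads it a second time to pass it to bpf_mtap(); that is my reading of the symptom, not a quote of the kernel source. All names (fake_ifnet, fake_bpf_if, tap_double_read, tap_single_read, detach_on_other_cpu) are hypothetical, and the "other CPU" is simulated with a callback so the interleaving is deterministic.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct bpf_if and struct ifnet. */
struct fake_bpf_if {
	int taps;				/* packets handed to this tap point */
};

struct fake_ifnet {
	struct fake_bpf_if *volatile if_bpf;	/* cleared when the last listener goes away */
};

/* Simulates the second CPU closing the last BPF descriptor. */
static void
detach_on_other_cpu(struct fake_ifnet *ifp)
{
	ifp->if_bpf = NULL;
}

/*
 * Check-then-reread: the pointer is loaded once for the test and again
 * for the call.  Returns 1 if the callee would have been handed NULL.
 */
static int
tap_double_read(struct fake_ifnet *ifp, void (*other_cpu)(struct fake_ifnet *))
{
	if (ifp->if_bpf != NULL) {
		if (other_cpu != NULL)
			other_cpu(ifp);			/* the race window */
		struct fake_bpf_if *bp = ifp->if_bpf;	/* second load */
		if (bp == NULL)
			return (1);	/* a real callee would dereference NULL */
		bp->taps++;
	}
	return (0);
}

/* Single load: the value that was tested is the value that is used. */
static int
tap_single_read(struct fake_ifnet *ifp, void (*other_cpu)(struct fake_ifnet *))
{
	struct fake_bpf_if *bp = ifp->if_bpf;

	if (bp != NULL) {
		if (other_cpu != NULL)
			other_cpu(ifp);
		bp->taps++;
	}
	return (0);
}
```

Calling tap_double_read() with detach_on_other_cpu as the callback reports the NULL hand-off, while tap_single_read() does not. The sketch does not model object lifetime, so it says nothing about whether the snapshot pointer would still be safe to use in the kernel.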

The attached patch fixes the problem and adds two debug sysctls: debug.bpf_nullhits counts how many times the workaround has fired, and debug.bpf_nullfix turns the workaround on or off (it is on by default).  With bpfspin running you can see the hit counter tick up every second or so; disable the fix and the system panics almost immediately.
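
The shape of the workaround, reduced to user space, looks roughly like this. It is only a sketch: nullfix_enabled and null_hits play the roles of the patch's bpf_donullfix and bpf_nullhits, while guarded_tap, fake_bpf_if, and the return codes are hypothetical names made up for the illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct bpf_if. */
struct fake_bpf_if {
	int ndesc;
};

static int nullfix_enabled = 1;	/* role of bpf_donullfix (debug.bpf_nullfix) */
static int null_hits;		/* role of bpf_nullhits (debug.bpf_nullhits) */

/*
 * Returns 0 if the packet would be tapped, 1 if the workaround skipped
 * it, and -1 where the unpatched kernel would dereference NULL.
 */
static int
guarded_tap(struct fake_bpf_if *bp)
{
	if (nullfix_enabled && bp == NULL) {
		null_hits++;		/* count the hit and drop the tap */
		return (1);
	}
	if (bp == NULL)
		return (-1);		/* no guard: this is the panic path */
	return (0);
}
```

With the flag set, a NULL argument bumps the counter and returns early; with the flag cleared, the same call reaches the path that panics in the kernel.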
-=-
Mark

--------------010506050407060700010906--

--------------050606070600030406070008
Content-Type: text/plain; name="Makefile"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="Makefile"

bpfspin: bpfspin.o
	gcc -g -o bpfspin bpfspin.o -lpcap

bpfspin.o: bpfspin.c
	gcc -g -c -o bpfspin.o bpfspin.c

--------------050606070600030406070008
Content-Type: text/x-csrc; name="bpfspin.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="bpfspin.c"

/*
 * Test program to open and close a BPF a _lot_.
 */

/*
 * The header names did not survive in this copy of the message; the
 * system headers listed here are the ones the code below needs (assumed).
 */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <net/bpf.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include "pcap.h"

#define CAP_LEN 100

const char *argv0;
const char *iname;

/* Default to something that won't match anything */
char *filter = "ip proto 199";

pcap_t * open_bpf(const char *ifname);
void close_bpf(pcap_t *pct);

int debug_level;
int freq = 10;
int on_sleep;
int off_sleep;
int per_cycle;
int num_cycles = -1;
int quit_flag;

void usage(int badopt);
void catchsig(int signo);

int
main(int argc, char *argv[])
{
	u_int64_t npass = 0;
	const char *estr;
	int eno;
	pcap_t *pct;
	int ch;

	argv0 = strrchr(argv[0], '/');
	if (argv0 == NULL) {
		argv0 = argv[0];
	} else {
		argv0++;
	}

	signal(SIGTERM, catchsig);
	signal(SIGHUP, catchsig);
	signal(SIGQUIT, catchsig);
	signal(SIGINT, catchsig);

	/*
	 * Args...
	 */
	while ((ch = getopt(argc, argv, "df:hn:o:")) != -1) {
		switch (ch) {
		case 'd':
			debug_level++;
			break;
		case 'f':
			freq = atoi(optarg);
			break;
		case 'h':
			usage(0);
			exit(0);
		case 'n':
			num_cycles = atoi(optarg);
			break;
		case 'o':	/* was '0', which never matched the "o:" in the option string */
			on_sleep = atoi(optarg);
			break;
		default:
			usage(optopt);
			exit(1);
		}
	}
	argc -= optind;
	argv += (optind - 1);	/* leaves the interface name at argv[1] */

	if (argc < 1) {
		fprintf(stderr, "Error: argument required.\n");
		usage(-1);
	}
	iname = argv[1];

	if (freq) {
		per_cycle = 1000000 / freq;
		off_sleep = per_cycle;
	}
	if (on_sleep) {
		off_sleep = per_cycle - on_sleep;
	}

	while (num_cycles) {
		pct = open_bpf(iname);
		if (pct == NULL) {
			eno = errno;
			estr = strerror(eno);
			if (estr == NULL) {
				estr = "";
			}
			fprintf(stderr, "Error: open_bpf(%s) failed %d/%s\n",
			    iname, eno, estr);
			exit(3);
		}
		if (on_sleep) {
			usleep(on_sleep);
		}
		close_bpf(pct);
		if (on_sleep) {
			usleep(off_sleep);
		}
		if (num_cycles > 0) {
			num_cycles--;
		}
		npass++;
		if (quit_flag) {
			break;
		}
	}
	/* Cast so that %llu matches regardless of how u_int64_t is defined. */
	printf("Open/Closed bpf on %s %llu times.\n", iname,
	    (unsigned long long)npass);
	exit(0);
}

pcap_t *
open_bpf(const char *ifname)
{
	pcap_t *pct;
	int pfd;
	u_int one = 1;
	char ebuf[PCAP_ERRBUF_SIZE];
	struct bpf_program dfilter;
	u_int32_t network = 0, netmask = 0;

	pct = pcap_open_live(ifname, CAP_LEN, 0, 1000, ebuf);
	if (pct == NULL) {
		perror("pcap_open_live failed");
		return(NULL);
	}

	pfd = pcap_get_selectable_fd(pct);
	if (ioctl(pfd, BIOCIMMEDIATE, &one) < 0) {
		perror("BIOCIMMEDIATE failed");
		pcap_close(pct);
		return(NULL);
	}

#if 0
	/* Must be needed? */
	if(pcap_lookupnet(ifname, &network, &netmask, 0) < 0) {
		perror("pcap_lookupnet failed");
		pcap_close(pct);
		return(NULL);
	}
#endif

	/* Compile the Dummy filter pcap program */
	bzero(&dfilter, sizeof(struct bpf_program));
	if (pcap_compile(pct, &dfilter, filter, 0, netmask) < 0) {
		perror("pcap_compile failed");
		pcap_close(pct);
		return(NULL);
	}

	if (pcap_setfilter(pct, &dfilter) < 0) {
		perror("pcap_setfilter failed");
		pcap_close(pct);
		return(NULL);
	}
	return(pct);
}

void
close_bpf(pcap_t *pct)
{
	pcap_close(pct);
}

void
usage(int badopt)
{
	if (badopt > 0) {
		fprintf(stderr, "%s: Bad option [-%c]\n", argv0, (char) badopt);
	}
	fprintf(stderr, "Usage: %s [-dh] [-f ] \n", argv0);
	fprintf(stderr, "\t-d\tIncrease debug level by 1\n");
	fprintf(stderr, "\t-f\tSet Flap Freq to \n");
	fprintf(stderr, "\t-h\tPrint this help\n");
	exit(badopt != 0);
}

void
catchsig(int signo)
{
	switch (signo) {
	case SIGHUP:
	case SIGTERM:
	case SIGQUIT:
	case SIGINT:
		quit_flag = 1;
		break;
	default:
		abort();
	}
}

--------------050606070600030406070008
Content-Type: text/plain; name="BPFMTAP.difftxt"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="BPFMTAP.difftxt"

--- /tmp/tmp.44835.0	Fri Oct  7 00:00:08 2005
+++ freebsd5/sys/net/bpf.c	Thu Oct  6 16:34:36 2005
@@ -81,20 +81,27 @@
 /*
  * The default read buffer size is patchable.
  */
 static int bpf_bufsize = 4096;
 SYSCTL_INT(_debug, OID_AUTO, bpf_bufsize, CTLFLAG_RW,
 	&bpf_bufsize, 0, "");
 static int bpf_maxbufsize = BPF_MAXBUFSIZE;
 SYSCTL_INT(_debug, OID_AUTO, bpf_maxbufsize, CTLFLAG_RW,
 	&bpf_maxbufsize, 0, "");
 
+static int bpf_nullhits;
+static int bpf_donullfix = 1;
+SYSCTL_INT(_debug, OID_AUTO, bpf_nullfix, CTLFLAG_RW,
+	&bpf_donullfix, 0, "Apply the BPF null BP workaround");
+SYSCTL_INT(_debug, OID_AUTO, bpf_nullhits, CTLFLAG_RW,
+	&bpf_nullhits, 0, "# of bpf_mtap/2() workarounds fired");
+
 /*
  * bpf_iflist is the list of interfaces; each corresponds to an ifnet
  */
 static LIST_HEAD(, bpf_if)	bpf_iflist;
 static struct mtx	bpf_mtx;		/* bpf global lock */
 
 static int	bpf_allocbufs(struct bpf_d *);
 static void	bpf_attachd(struct bpf_d *d, struct bpf_if *bp);
 static void	bpf_detachd(struct bpf_d *d);
 static void	bpf_freed(struct bpf_d *);
@@ -1201,20 +1208,31 @@
  */
 void
 bpf_mtap(bp, m)
 	struct bpf_if *bp;
 	struct mbuf *m;
 {
 	struct bpf_d *d;
 	u_int pktlen, slen;
 
 	/*
+	 * We can sometimes be invoked w/NULL bp due to a small race in
+	 * BPF_MTAP(), see PR#xxxxx.
+	 */
+	if (bpf_donullfix) {
+		if (!bp) {
+			bpf_nullhits++;
+			return;
+		}
+	}
+
+	/*
 	 * Lockless read to avoid cost of locking the interface if there are
 	 * no descriptors attached.
 	 */
 	if (LIST_EMPTY(&bp->bif_dlist))
 		return;
 
 	pktlen = m_length(m, NULL);
 	if (pktlen == m->m_len) {
 		bpf_tap(bp, mtod(m, u_char *), pktlen);
 		return;
@@ -1245,20 +1263,31 @@
 void
 bpf_mtap2(bp, data, dlen, m)
 	struct bpf_if *bp;
 	void *data;
 	u_int dlen;
 	struct mbuf *m;
 {
 	struct mbuf mb;
 	struct bpf_d *d;
 	u_int pktlen, slen;
+
+	/*
+	 * We can sometimes be invoked w/NULL bp due to a small race in
+	 * BPF_MTAP2(), see PR#xxxxx.
+	 */
+	if (bpf_donullfix) {
+		if (!bp) {
+			bpf_nullhits++;
+			return;
+		}
+	}
 
 	/*
 	 * Lockless read to avoid cost of locking the interface if there are
 	 * no descriptors attached.
 	 */
 	if (LIST_EMPTY(&bp->bif_dlist))
 		return;
 
 	pktlen = m_length(m, NULL);
 	/*

--------------050606070600030406070008--