Date:      Wed, 17 Oct 2012 16:55:40 -0500
From:      Guy Helmer <guy.helmer@gmail.com>
To:        "Alexander V. Chernikov" <melifaro@freebsd.org>
Cc:        freebsd-net@freebsd.org, FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Re: 8.3: kernel panic in bpf.c catchpacket()
Message-ID:  <FA1F07D4-C6F3-4F55-B084-749366C0DAE6@gmail.com>
In-Reply-To: <381E3EEC-7EDB-428B-A724-434443E51A53@gmail.com>
References:  <4B5399BF-4EE0-4182-8297-3BB97C4AA884@gmail.com> <59F9A36E-3DB2-4F6F-BB2A-A4C9DA76A70C@gmail.com> <5075C05E.9070800@FreeBSD.org> <1EDA1615-2CDE-405A-A725-AF7CC7D3E273@gmail.com> <381E3EEC-7EDB-428B-A724-434443E51A53@gmail.com>

On Oct 17, 2012, at 8:58 AM, Guy Helmer <guy.helmer@gmail.com> wrote:

> On Oct 12, 2012, at 8:54 AM, Guy Helmer <guy.helmer@gmail.com> wrote:
>
>>
>> On Oct 10, 2012, at 1:37 PM, Alexander V. Chernikov <melifaro@freebsd.org> wrote:
>>
>>> On 10.10.2012 00:36, Guy Helmer wrote:
>>>>
>>>> On Oct 8, 2012, at 8:09 AM, Guy Helmer <guy.helmer@gmail.com> wrote:
>>>>
>>>>> I'm seeing a consistent new kernel panic in FreeBSD 8.3:
>>>>> I'm not seeing how bd_sbuf would be NULL here. Any ideas?
>>>>
>>>> Since I've not had any replies, I hope nobody minds if I reply with more information.
>>>>
>>>> This panic seems to be triggered occasionally now that my userland code changes the packet filter some time after the bpf device has been opened and an initial filter has been set (previously, my code never changed the filter after setting it).
>>>>
>>>> I'm focusing on bpf_setf() since that seems to be the place that could be tickling a problem, and I see that bpf_setf() calls reset_d(d) to clear the hold buffer. I have manually verified that the BPFD lock is held during the call to reset_d(), and that the lock is held everywhere else the buffers are manipulated, so I haven't been able to find any place that seems vulnerable to losing one of the bpf buffers. Still searching, but any help would be appreciated.
>>>
>>> Can you please check this code on -current?
>>> Locking changed quite significantly some time ago, so there is a good chance that you can get rid of this panic (or discover a different one that is really "new") :).
>>
>> I'm not ready to run this app on current, so I have merged revs 229898, 233937, 233938, 233946, 235744, 235745, 235746, 235747, 236231, 236251, 236261, 236262, 236559, and 236806 to my 8.3 checkout to get code that should be virtually identical to current without the timestamp changes.
>>
>> Unfortunately, I have only been able to trigger the panic in my test lab once -- so I'm not sure whether a lack of problems with the updated code will be indicative of likely success in the field, where this has been triggered regularly at some sites...
>>
>> Thanks,
>> Guy
>>
>
>
> FWIW, I was able to trigger the panic with the original 8.3 code again in my test lab. With these changes resulting from merging the revs mentioned above, I have not seen any panics in my test lab setup in two days of load testing, and AFAIK, packet capturing seems to be working fine.

Of course, the test system panicked with the same problem in catchpacket() an hour after I wrote this.

(kgdb) where
#0  doadump () at pcpu.h:224
#1  0xffffffff804c8280 in boot (howto=260) at ../../../kern/kern_shutdown.c:441
#2  0xffffffff804c8703 in panic (fmt=0x0) at ../../../kern/kern_shutdown.c:614
#3  0xffffffff8069ffad in trap_fatal (frame=0xffffffff809edbc0, eva=Variable "eva" is not available.
)
    at ../../../amd64/amd64/trap.c:825
#4  0xffffffff806a02e1 in trap_pfault (frame=0xffffff800014a8a0, usermode=0)
    at ../../../amd64/amd64/trap.c:741
#5  0xffffffff806a06bf in trap (frame=0xffffff800014a8a0)
    at ../../../amd64/amd64/trap.c:478
#6  0xffffffff80687cd4 in calltrap () at ../../../amd64/amd64/exception.S:228
#7  0xffffffff8069dc06 in bcopy () at ../../../amd64/amd64/support.S:124
#8  0xffffffff8056f69e in catchpacket (d=0xffffff005aaaf000,
    pkt=0xffffff0001f46200 "", pktlen=522, snaplen=Variable "snaplen" is not available.
) at ../../../net/bpf.c:2240
#9  0xffffffff8056fc66 in bpf_mtap (bp=0xffffff0001be8c80,
    m=0xffffff0001f46200) at ../../../net/bpf.c:2064
#10 0xffffffff80579c15 in ether_input (ifp=0xffffff0001b73800,
    m=0xffffff0001f46200) at ../../../net/if_ethersubr.c:635
#11 0xffffffff802b694a in em_rxeof (rxr=0xffffff0001bca200, count=99, done=0x0)
    at ../../../dev/e1000/if_em.c:4404
#12 0xffffffff802b6db8 in em_handle_que (context=Variable "context" is not available.
)
    at ../../../dev/e1000/if_em.c:1494
#13 0xffffffff80506d85 in taskqueue_run_locked (queue=0xffffff0001be1580)
    at ../../../kern/subr_taskqueue.c:250
---Type <return> to continue, or q <return> to quit---q
Quit
(kgdb) frame 8
#8  0xffffffff8056f69e in catchpacket (d=0xffffff005aaaf000,
    pkt=0xffffff0001f46200 "", pktlen=522, snaplen=Variable "snaplen" is not available.
) at ../../../net/bpf.c:2240
warning: Source file is more recent than executable.

2240		bpf_append_bytes(d, d->bd_sbuf, curlen, &hdr, sizeof(hdr));
(kgdb) print *d
$1 = {bd_next = {le_next = 0xffffff0023fff400, le_prev = 0xffffff0001be8c90},
  bd_sbuf = 0x0, bd_hbuf = 0xffffff8000ffa000 "??~P", bd_fbuf = 0x0,
  bd_slen = 0, bd_hlen = 2068, bd_bufsize = 8388608,
  bd_bif = 0xffffff0001be8c80, bd_rtout = 1, bd_rfilter = 0xffffff0001e6f580,
  bd_wfilter = 0x0, bd_bfilter = 0x0, bd_rcount = 7, bd_dcount = 0,
  bd_promisc = 1 '\001', bd_state = 0 '\0', bd_immediate = 1 '\001',
  bd_writer = 0 '\0', bd_hdrcmplt = 1, bd_direction = 1, bd_feedback = 0,
  bd_async = 0, bd_sig = 23, bd_sigio = 0x0, bd_sel = {si_tdlist = {
      tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list = {
        slh_first = 0x0}, kl_lock = 0xffffffff80497920 <knlist_mtx_lock>,
      kl_unlock = 0xffffffff804978f0 <knlist_mtx_unlock>,
      kl_assert_locked = 0xffffffff804945d0 <knlist_mtx_assert_locked>,
      kl_assert_unlocked = 0xffffffff804945e0 <knlist_mtx_assert_unlocked>,
      kl_lockarg = 0xffffff005aaaf0d8}, si_mtx = 0x0}, bd_lock = {
    lock_object = {lo_name = 0xffffff0001a5fce0 "bpf", lo_flags = 16973824,
      lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446742974226712768},
  bd_callout = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0,
        tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0,
    c_lock = 0xffffff005aaaf0d8, c_flags = 0, c_cpu = 0}, bd_label = 0x0,
  bd_fcount = 7, bd_pid = 89517, bd_locked = 0, bd_bufmode = 1, bd_wcount = 0,
  bd_wfcount = 0, bd_wdcount = 0, bd_zcopy = 0, bd_compat32 = 0 '\0'}
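In the dump only one buffer is still referenced: bd_hbuf is set, while bd_sbuf and bd_fbuf are both NULL, with bd_slen = 0 and bd_immediate = 1. If a descriptor in the default buffer mode owns two buffers (my reading of bpf_buffer_alloc(), not verified here), then one buffer has dropped out of the slots, which would not require malloc() to fail. The C code below is a simplified, single-threaded model of the three slots, not kernel code. It replays one hypothetical interleaving that ends in exactly that state. It rests on assumptions to check against the 8.3 bpf.c: that bpfread() copies the hold buffer out with the descriptor lock dropped and afterwards does bd_fbuf = bd_hbuf unconditionally, and that the immediate-mode rotation in bpfread() calls ROTATE_BUFFERS() without checking bd_fbuf. The struct model_d type and all helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the buffer slots in struct bpf_d.
 * Locking, wakeups and packet copying are omitted; the helpers only
 * paraphrase the slot updates assumed in the lead-in.
 */
struct model_d {
	char	*sbuf;	/* store buffer: catchpacket() appends here */
	char	*hbuf;	/* hold buffer: waiting for read(2) */
	char	*fbuf;	/* free buffer: spare for the next rotation */
	size_t	 slen;	/* bytes in the store buffer */
	size_t	 hlen;	/* bytes in the hold buffer */
};

/* Like ROTATE_BUFFERS(): store -> hold, free -> store, free emptied. */
static void rotate_buffers(struct model_d *d)
{
	d->hbuf = d->sbuf;
	d->hlen = d->slen;
	d->sbuf = d->fbuf;
	d->slen = 0;
	d->fbuf = NULL;
}

/* Like the buffer part of reset_d(): recycle the hold buffer. */
static void reset_d(struct model_d *d)
{
	if (d->hbuf != NULL) {
		d->fbuf = d->hbuf;
		d->hbuf = NULL;
		d->hlen = 0;
	}
	d->slen = 0;
}

/* Assumed tail of bpfread(): after the copy-out, recycle whatever is
 * in the hold slot *now*, even if it changed while the lock was dropped. */
static void read_finish(struct model_d *d)
{
	d->fbuf = d->hbuf;
	d->hbuf = NULL;
	d->hlen = 0;
}

/* Number of slots that still point at a buffer. */
static int slots_in_use(const struct model_d *d)
{
	return (d->sbuf != NULL) + (d->hbuf != NULL) + (d->fbuf != NULL);
}

/* Replays one hypothetical interleaving, starting with two buffers. */
static struct model_d replay_suspected_race(void)
{
	static char a[64], b[64];
	struct model_d d = { .sbuf = a, .hbuf = b, .fbuf = NULL,
	    .slen = 0, .hlen = 1024 /* arbitrary unread hold data */ };

	assert(slots_in_use(&d) == 2);
	/* 1. read(2) drops the lock and starts copying hbuf (b) out. */
	/* 2. BIOCSETF: bpf_setf() -> reset_d() moves b to the free slot. */
	reset_d(&d);
	/* 3. The read finishes: fbuf = hbuf, but hbuf is already NULL,
	 *    so b is no longer referenced by any slot. */
	read_finish(&d);
	/* 4. A packet lands in the store buffer (a); 2068 mirrors bd_hlen. */
	d.slen = 2068;
	/* 5. An immediate-mode read rotates without checking fbuf. */
	rotate_buffers(&d);
	/* Result: sbuf == NULL, hbuf == a, fbuf == NULL, slen == 0. */
	return d;
}
```

If such an interleaving is possible, the next catchpacket() would start with curlen = bd_slen = 0 and, unless the packet overflowed the 8 MB buffer, hand the NULL bd_sbuf straight to bpf_append_bytes(), which would be consistent with the bcopy() fault in frame #7.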

Now I am thinking the malloc() of the sbuf is failing, but I'm not sure how or why -- I thought malloc(size, M_BPF, M_WAITOK) should not fail?

Guy


