Date:      Sat, 17 Jun 2017 08:44:42 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 220076] [patch] [panic] [netgraph] repeatable kernel panic due to a race in ng_iface(4)
Message-ID:  <bug-220076-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220076

            Bug ID: 220076
           Summary: [patch] [panic] [netgraph] repeatable kernel panic due
                    to a race in ng_iface(4)
           Product: Base System
           Version: 11.0-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Keywords: patch
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: eugen@freebsd.org

Created attachment 183566
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=183566&action=edit
protect ng_iface private data

I observe repeatable panics at the netgraph level while stress-testing the
net/mpd5 daemon under stable/11 r317184. The daemon connects, uses and
disconnects lots of ngXX interfaces and the corresponding netgraph nodes and
hooks.

The crash dump points to the ng_iface node, whose private data (its set of
hooks) may be modified in response to a userland request while another kernel
thread is sending data over the hook being disconnected. Here is the scenario:

1. mpd runs its BundNcpsLeave() procedure for an interface, calling
NgSendMsg(csock, path, NGM_GENERIC_COOKIE, NGM_RMHOOK, &rm, sizeof(rm)), which
leads to libnetgraph's NgDeliverMsg() and a sendto() system call for
AF_NETGRAPH.

The kernel responds with ng_findhook->ng_destroy_hook->NG_HOOK_UNREF
(_NG_HOOK_UNREF/ng_unref_hook)->NG_FREE_HOOK: free((hook), M_NETGRAPH_HOOK).
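
The chain above ends with the last reference being dropped and the hook memory
being freed. As a rough userland illustration only (this is not the netgraph
code; struct hook, hook_new(), hook_ref() and hook_unref() are hypothetical
names made up for this sketch), a reference-counted object is freed when its
count reaches zero, and any thread still holding a bare pointer without its own
reference is left with a dangling pointer:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a netgraph hook; not the kernel's struct ng_hook. */
struct hook {
    atomic_int refs;    /* number of outstanding references */
    char name[32];
};

/* Allocate a hook holding one reference; returns NULL if allocation fails. */
static struct hook *
hook_new(const char *name)
{
    struct hook *h = calloc(1, sizeof(*h));

    if (h == NULL)
        return (NULL);
    atomic_init(&h->refs, 1);
    /* calloc() zeroed the buffer, so the copy stays NUL-terminated. */
    strncpy(h->name, name, sizeof(h->name) - 1);
    return (h);
}

/* Take one additional reference. */
static void
hook_ref(struct hook *h)
{
    assert(h != NULL);
    atomic_fetch_add(&h->refs, 1);
}

/* Drop one reference; free the hook and return 1 if it was the last one. */
static int
hook_unref(struct hook *h)
{
    assert(h != NULL);
    if (atomic_fetch_sub(&h->refs, 1) == 1) {
        free(h);
        return (1);
    }
    return (0);
}
```

In this sketch a sender that called hook_ref() before using the hook would keep
it alive across a concurrent hook_unref(); a sender that only copied the
pointer would not.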

2. In parallel, a userland process such as ftpd sends some data over an IPv4
socket to the corresponding interface, which is up and running. The send path
may use the hook at the same time as another kernel thread is freeing it, which
leads to:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff8097f249
stack pointer           = 0x28:0xfffffe0239542ec0
frame pointer           = 0x28:0xfffffe0239542f00
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 28999 (ftpd)
trap number             = 9
panic: general protection fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfffffe0239542840
kdb_backtrace() at kdb_backtrace+0x53/frame 0xfffffe0239542910
vpanic() at vpanic+0x249/frame 0xfffffe02395429e0
kproc_shutdown() at kproc_shutdown/frame 0xfffffe0239542a40
trap_fatal() at trap_fatal+0x60a/frame 0xfffffe0239542b70
trap() at trap+0x97c/frame 0xfffffe0239542dd0
trap_check() at trap_check+0x15/frame 0xfffffe0239542df0
calltrap() at calltrap+0x8/frame 0xfffffe0239542df0
--- trap 0x9, rip = 0xffffffff8097f249, rsp = 0xfffffe0239542ec0, rbp =
0xfffffe0239542f00 ---
ng_address_hook() at ng_address_hook+0x59/frame 0xfffffe0239542f00
ng_iface_send() at ng_iface_send+0x108/frame 0xfffffe0239542f90
ng_iface_output() at ng_iface_output+0x447/frame 0xfffffe0239543060
ip_output() at ip_output+0x1864/frame 0xfffffe0239543300
tcp_output() at tcp_output+0x2602/frame 0xfffffe02395436a0
tcp_disconnect() at tcp_disconnect+0x18e/frame 0xfffffe02395436e0
tcp_usr_disconnect() at tcp_usr_disconnect+0xe6/frame 0xfffffe0239543710
sodisconnect() at sodisconnect+0x62/frame 0xfffffe0239543740
soclose() at soclose+0x95/frame 0xfffffe02395437b0
soo_close() at soo_close+0x4d/frame 0xfffffe02395437e0
fo_close() at fo_close+0x31/frame 0xfffffe0239543810
_fdrop() at _fdrop+0x46/frame 0xfffffe0239543840
closef() at closef+0x2d7/frame 0xfffffe02395438f0
closefp() at closefp+0xde/frame 0xfffffe0239543940
kern_close() at kern_close+0xe7/frame 0xfffffe0239543990
sys_close() at sys_close+0x1f/frame 0xfffffe02395439b0
syscallenter() at syscallenter+0x4ff/frame 0xfffffe0239543a80
amd64_syscall() at amd64_syscall+0x2a/frame 0xfffffe0239543bb0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0239543bb0
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x801a4033a, rsp =
0x7fffffffd0a8, rbp = 0x7fffffffd0d0 ---
Uptime: 2h11m5s
Dumping 544 out of 8156 MB:..3%..12%..21%..33%..42%..53%..62%..71%..83%..92%

Reading symbols from /boot/modules/geom_mirror.ko...done.
Loaded symbols for /boot/modules/geom_mirror.ko
Reading symbols from /boot/modules/accf_http.ko...done.
Loaded symbols for /boot/modules/accf_http.ko
Reading symbols from /boot/modules/nvidia.ko...done.
Loaded symbols for /boot/modules/nvidia.ko
Reading symbols from /boot/modules/vboxdrv.ko...done.
Loaded symbols for /boot/modules/vboxdrv.ko
Reading symbols from /boot/modules/mmc.ko...done.
Loaded symbols for /boot/modules/mmc.ko
Reading symbols from /boot/modules/mmcsd.ko...done.
Loaded symbols for /boot/modules/mmcsd.ko
Reading symbols from /boot/modules/sdhci.ko...done.
Loaded symbols for /boot/modules/sdhci.ko
Reading symbols from /boot/modules/h_ertt.ko...done.
Loaded symbols for /boot/modules/h_ertt.ko
Reading symbols from /boot/modules/cc_chd.ko...done.
Loaded symbols for /boot/modules/cc_chd.ko
Reading symbols from /boot/modules/geom_sched.ko...done.
Loaded symbols for /boot/modules/geom_sched.ko
Reading symbols from /boot/modules/gsched_rr.ko...done.
Loaded symbols for /boot/modules/gsched_rr.ko
Reading symbols from /boot/modules/vboxnetflt.ko...done.
Loaded symbols for /boot/modules/vboxnetflt.ko
Reading symbols from /boot/modules/vboxnetadp.ko...done.
Loaded symbols for /boot/modules/vboxnetadp.ko
Reading symbols from /boot/modules/nullfs.ko...done.
Loaded symbols for /boot/modules/nullfs.ko
Reading symbols from /usr/local/modules/rtc.ko...done.
Loaded symbols for /usr/local/modules/rtc.ko
#0  doadump (textdump=1) at /data2/src/sys/kern/kern_shutdown.c:298
298             dumptid = curthread->td_tid;
(kgdb) bt
#0  doadump (textdump=1) at /data2/src/sys/kern/kern_shutdown.c:298
#1  0xffffffff807a0828 in kern_reboot (howto=260) at
/data2/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff807a125f in vpanic (fmt=0xffffffff80cf5311 "%s",
ap=0xfffffe0239542a20)
    at /data2/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff807a12d0 in panic (fmt=0xffffffff80cf5311 "%s") at
/data2/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80c06a0a in trap_fatal (frame=0xfffffe0239542e00, eva=0) at
/data2/src/sys/amd64/amd64/trap.c:801
#5  0xffffffff80c0604c in trap (frame=0xfffffe0239542e00) at
/data2/src/sys/amd64/amd64/trap.c:549
#6  0xffffffff80c07085 in trap_check (frame=0xfffffe0239542e00) at
/data2/src/sys/amd64/amd64/trap.c:602
#7  0xffffffff80bdeba3 in calltrap () at
/data2/src/sys/amd64/amd64/exception.S:236
#8  0xffffffff8097f249 in ng_address_hook (here=0x0, item=0xfffff8011ddd1f00,
hook=0xfffff801f8232300, retaddr=0)
    at /data2/src/sys/netgraph/ng_base.c:3586
#9  0xffffffff80986548 in ng_iface_send (ifp=0xfffff801f8ef2800,
m=0xfffff801f8526100, sa=2 '\002')
    at /data2/src/sys/netgraph/ng_iface.c:451
#10 0xffffffff80985c97 in ng_iface_output (ifp=0xfffff801f8ef2800,
m=0xfffff801f8526100, dst=0xfffff8011db98720,
    ro=0xfffff8011db98700) at /data2/src/sys/netgraph/ng_iface.c:386
#11 0xffffffff809cae14 in ip_output (m=0xfffff801f8526100, opt=0x0,
ro=0xfffff8011db98700, flags=0, imo=0x0,
    inp=0xfffff8011db98570) at /data2/src/sys/netinet/ip_output.c:655
#12 0xffffffff809e08d2 in tcp_output (tp=0xfffff801f8258410) at
/data2/src/sys/netinet/tcp_output.c:1446
#13 0xffffffff809f5f4e in tcp_disconnect (tp=0xfffff801f8258410) at
/data2/src/sys/netinet/tcp_usrreq.c:1946
#14 0xffffffff809f29a6 in tcp_usr_disconnect (so=0xfffff8011d8de360) at
/data2/src/sys/netinet/tcp_usrreq.c:674
#15 0xffffffff80884072 in sodisconnect (so=0xfffff8011d8de360) at
/data2/src/sys/kern/uipc_socket.c:1051
#16 0xffffffff808839b5 in soclose (so=0xfffff8011d8de360) at
/data2/src/sys/kern/uipc_socket.c:869
#17 0xffffffff8084d67d in soo_close (fp=0xfffff8011df23b40,
td=0xfffff801f8c5d000) at /data2/src/sys/kern/sys_socket.c:334
#18 0xffffffff8072bee1 in fo_close (fp=0xfffff8011df23b40,
td=0xfffff801f8c5d000) at file.h:346
#19 0xffffffff80726b86 in _fdrop (fp=0xfffff8011df23b40, td=0xfffff801f8c5d000)
at /data2/src/sys/kern/kern_descrip.c:2849
#20 0xffffffff8072b1f7 in closef (fp=0xfffff8011df23b40, td=0xfffff801f8c5d000)
at /data2/src/sys/kern/kern_descrip.c:2430
#21 0xffffffff8072768e in closefp (fdp=0xfffff80007104000, fd=6,
fp=0xfffff8011df23b40, td=0xfffff801f8c5d000, holdleaders=0)
    at /data2/src/sys/kern/kern_descrip.c:1191
#22 0xffffffff80728417 in kern_close (td=0xfffff801f8c5d000, fd=6) at
/data2/src/sys/kern/kern_descrip.c:1239
#23 0xffffffff8072831f in sys_close (td=0xfffff801f8c5d000,
uap=0xfffffe0239543b58)
    at /data2/src/sys/kern/kern_descrip.c:1218
#24 0xffffffff80c07b7f in syscallenter (td=0xfffff801f8c5d000,
sa=0xfffffe0239543b48) at subr_syscall.c:135
#25 0xffffffff80c0741a in amd64_syscall (td=0xfffff801f8c5d000, traced=0) at
/data2/src/sys/amd64/amd64/trap.c:902
#26 0xffffffff80bdee8b in Xfast_syscall () at
/data2/src/sys/amd64/amd64/exception.S:396
#27 0x0000000801a4033a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb) frame 8
#8  0xffffffff8097f249 in ng_address_hook (here=0x0, item=0xfffff8011ddd1f00,
hook=0xfffff801f8232300, retaddr=0)
    at /data2/src/sys/netgraph/ng_base.c:3586
3586                NG_HOOK_NOT_VALID(peer = NG_HOOK_PEER(hook)) ||
(kgdb) l
3581             * that the peer is still connected (even if invalid,) we know
3582             * that the peer node is present, though maybe invalid.
3583             */
3584            TOPOLOGY_RLOCK();
3585            if ((hook == NULL) || NG_HOOK_NOT_VALID(hook) ||
3586                NG_HOOK_NOT_VALID(peer = NG_HOOK_PEER(hook)) ||
3587                NG_NODE_NOT_VALID(peernode = NG_PEER_NODE(hook))) {
3588                    NG_FREE_ITEM(item);
3589                    TRAP_ERROR();
3590                    TOPOLOGY_RUNLOCK();
(kgdb) p *hook
$1 = {
  hk_name = 0xfffff801f8232300 "\336\300\255\336\336\300\255\336\336\300\255\336"...,
  hk_private = 0xdeadc0dedeadc0de, hk_flags = -559038242, hk_type = -559038242,
hk_peer = 0xdeadc0dedeadc0de,
  hk_node = 0xdeadc0dedeadc0de, hk_hooks = {le_next = 0xdeadc0dedeadc0de,
le_prev = 0xdeadc0dedeadc0de},
  hk_rcvmsg = 0xdeadc0dedeadc0de, hk_rcvdata = 0xdeadc0dedeadc0de, hk_refs =
-559038242}
(kgdb)
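
Every field of the hook in the dump above reads as 0xdeadc0de, which as far as
I know is the fill pattern that FreeBSD debugging kernels write over freed
memory, so the dump is consistent with the hook having been freed before it was
used. The small C sketch below (as_signed32(), is_canonical_amd64() and
check_dump_values() are names made up for illustration, and 48-bit virtual
addressing is assumed) checks two things: that 0xdeadc0de read as a signed
32-bit int is the -559038242 shown for hk_flags, and that 0xdeadc0dedeadc0de is
not a canonical amd64 address, which would explain a general protection fault
rather than a page fault when it is dereferenced.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a signed int, the way kgdb prints int fields. */
static int32_t
as_signed32(uint32_t v)
{
    int32_t s;

    memcpy(&s, &v, sizeof(s));
    return (s);
}

/*
 * Assumption: 48-bit virtual addresses. An address is canonical when
 * bits 63..47 are all zeros or all ones.
 */
static int
is_canonical_amd64(uint64_t addr)
{
    uint64_t top = addr >> 47;    /* the upper 17 bits */

    return (top == 0 || top == 0x1ffff);
}

/* Check the values that appear in the kgdb output. */
static void
check_dump_values(void)
{
    assert(as_signed32(0xdeadc0deU) == -559038242);
    assert(!is_canonical_amd64(0xdeadc0dedeadc0deULL));    /* hk_peer */
    assert(is_canonical_amd64(0xfffff801f8232300ULL));     /* hook itself */
}
```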

The attached patch introduces a per-node rwlock for ng_iface to protect its
private data from being used while it is being modified. Without the patch, my
stress test for mpd produces this panic within a short time. With the patch
applied, it ran for over 11 hours non-stop with no panics.

-- 
You are receiving this mail because:
You are the assignee for the bug.


