Date: Thu, 22 Oct 2020 23:19:04 +0000
From: "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>
To: "Alexander V. Chernikov" <melifaro@ipfw.ru>
Cc: "Ryan Stone" <rysto32@gmail.com>, freebsd-net <freebsd-net@freebsd.org>
Subject: Re: Panic in in6_joingroup_locked
Message-ID: <A385FA4D-4668-430F-A308-7A22236A38C4@lists.zabbadoz.net>
In-Reply-To: <244891603404616@mail.yandex.ru>
References: <CAFMmRNwZLh8G5Yc2XPQ=zaAnZCa5UuuT9_qkGUC837vYPFd+9g@mail.gmail.com> <244891603404616@mail.yandex.ru>
On 22 Oct 2020, at 22:10, Alexander V. Chernikov wrote:

> 21.10.2020, 23:05, "Ryan Stone" <rysto32@gmail.com>:
>> Today at $WORK we saw a panic due to a race between
>> in6_joingroup_locked and if_detach_internal.  This happened on a
>> branch that's about 2 years behind head, but the relevant code in
>> head does not appear to have changed.
>>
>> The backtrace of the panic was this:
>>
>> panic: Fatal trap 9: general protection fault while in kernel mode
>> Stack: --------------------------------------------------
>> kernel:trap_fatal+0x96
>> kernel:trap+0x76
>> kernel:in6_joingroup_locked+0x2c7
>> kernel:in6_joingroup+0x46
>> kernel:in6_update_ifa+0x18e5
>> kernel:in6_ifattach+0x4d0
>> kernel:in6_if_up+0x99
>> kernel:if_up+0x7d
>> kernel:ifhwioctl+0xcea
>> kernel:ifioctl+0x2c9
>> kernel:kern_ioctl+0x29b
>> kernel:sys_ioctl+0x16d
>> kernel:amd64_syscall+0x327
>>
>> We panicked here because the memory pointed to by ifma had been
>> freed and filled with 0xdeadc0de:
>>
>> https://svnweb.freebsd.org/base/head/sys/netinet6/in6_mcast.c?revision=365071&view=markup#l421
>>
>> Another thread was in the process of trying to destroy the same
>> interface.
>> It had the following backtrace at the time of the panic:
>>
>> #0  sched_switch (td=0xfffffea654845aa0, newtd=0xfffffea266fa9aa0,
>>     flags=<optimized out>) at /b/mnt/src/sys/kern/sched_ule.c:2423
>> #1  0xffffffff80643071 in mi_switch (flags=<optimized out>, newtd=0x0)
>>     at /b/mnt/src/sys/kern/kern_synch.c:605
>> #2  0xffffffff80693234 in sleepq_switch (wchan=0xffffffff8139cc90
>>     <ifv_sx>, pri=0) at /b/mnt/src/sys/kern/subr_sleepqueue.c:612
>> #3  0xffffffff806930c3 in sleepq_wait (wchan=0xffffffff8139cc90
>>     <ifv_sx>, pri=0) at /b/mnt/src/sys/kern/subr_sleepqueue.c:691
>> #4  0xffffffff8063fcb3 in _sx_xlock_hard (sx=<optimized out>,
>>     x=<optimized out>, opts=0, timo=0, file=<optimized out>,
>>     line=<optimized out>) at /b/mnt/src/sys/kern/kern_sx.c:936
>> #5  0xffffffff8063f313 in _sx_xlock (sx=0xffffffff8139cc90 <ifv_sx>,
>>     opts=0, timo=<optimized out>, file=0xffffffff80ba6d2a
>>     "/b/mnt/src/sys/net/if_vlan.c", line=668)
>>     at /b/mnt/src/sys/kern/kern_sx.c:352
>> #6  0xffffffff807558b2 in vlan_ifdetach (arg=<optimized out>,
>>     ifp=0xfffff8049b2ce000) at /b/mnt/src/sys/net/if_vlan.c:668
>> #7  0xffffffff80747676 in if_detach_internal (vmove=0,
>>     ifp=<optimized out>, ifcp=<optimized out>)
>>     at /b/mnt/src/sys/net/if.c:1203
>> #8  if_detach (ifp=0xfffff8049b2ce000) at /b/mnt/src/sys/net/if.c:1060
>> #9  0xffffffff80756521 in vlan_clone_destroy (ifc=0xfffff802f29dbe80,
>>     ifp=0xfffff8049b2ce000) at /b/mnt/src/sys/net/if_vlan.c:1102
>> #10 0xffffffff8074dc57 in if_clone_destroyif (ifc=0xfffff802f29dbe80,
>>     ifp=0xfffff8049b2ce000) at /b/mnt/src/sys/net/if_clone.c:330
>> #11 0xffffffff8074dafe in if_clone_destroy (name=<optimized out>)
>>     at /b/mnt/src/sys/net/if_clone.c:288
>> #12 0xffffffff8074b2fd in ifioctl (so=0xfffffea6363806d0,
>>     cmd=2149607801, data=<optimized out>, td=0xfffffea654845aa0)
>>     at /b/mnt/src/sys/net/if.c:3077
>> #13 0xffffffff806aab1c in fo_ioctl (fp=<optimized out>,
>>     com=<optimized out>, active_cred=<unavailable>,
>>     td=<optimized out>, data=<optimized out>)
>>     at /b/mnt/src/sys/sys/file.h:396
>> #14 kern_ioctl (td=0xfffffea654845aa0, fd=4, com=<optimized out>,
>>     data=<unavailable>) at /b/mnt/src/sys/kern/sys_generic.c:938
>> #15 0xffffffff806aa7fe in sys_ioctl (td=0xfffffea654845aa0,
>>     uap=0xfffffea653441b30) at /b/mnt/src/sys/kern/sys_generic.c:846
>> #16 0xffffffff809ceab8 in syscallenter (td=<optimized out>)
>>     at /b/mnt/src/sys/amd64/amd64/../../kern/subr_syscall.c:187
>> #17 amd64_syscall (td=0xfffffea654845aa0, traced=0)
>>     at /b/mnt/src/sys/amd64/amd64/trap.c:1196
>> #18 fast_syscall_common ()
>>     at /b/mnt/src/sys/amd64/amd64/exception.S:505
>>
>> Frame 7 was at this point in if_detach_internal:
>>
>> https://svnweb.freebsd.org/base/head/sys/net/if.c?revision=366230&view=markup#l1206
>>
>> As you can see, a couple of lines up if_purgemaddrs() was called and
>> freed all multicast addresses assigned to the interface, which
>> destroyed the multicast address being added out from under
>> in6_joingroup_locked.
>
> [sorry, re-posting in plain text]
> I don't have a solution w.r.t. the multicast locking spaghetti, but
> from looking at the code, it looks like extending the network epoch to
> the whole of in6_getmulti() would fix this panic?  Would that introduce
> even more recursions?
>
>> I see two potential paths forward: either the wacky locking in
>> in6_getmulti() gets fixed so that we don't have to do the "drop the
>> lock to call a function that acquires that lock again" dance that
>> opens up this race condition, or we fix if_addmulti so that it adds
>> an additional reference to the address if retifma is non-NULL.
>>
>> The second option would be a KPI change with the nasty side effect of
>> leaking the address if an existing caller weren't fixed, but on the
>> other hand the current interface seems pretty useless if it can't
>> actually guarantee that the address you asked for will still exist
>> when you get around to manipulating it.
>>
>> Does anybody have any thoughts on this?
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"