Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 24 Sep 2015 08:55:27 +0200
From:      Palle Girgensohn <girgen@FreeBSD.org>
To:        Julien Charbon <jch@freebsd.org>
Cc:        George Neville-Neil <gnn@FreeBSD.org>, freebsd-net@freebsd.org
Subject:   Re: Kernel panics in tcp_twclose
Message-ID:  <30AD333B-EC8B-4EEF-8FE2-8EA8C216601E@FreeBSD.org>
In-Reply-To: <4E148E2E-F8D2-41C2-B232-9FD1548AA20B@pingpong.net>
References:  <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <F9D29C16-502B-43A1-BE2C-D2AD30F0B9EF@FreeBSD.org> <5601CF2D.9030307@freebsd.org> <E09DF89D-AAC5-48FD-8B75-EEAB937A5C32@FreeBSD.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> <5602F044.5010606@freebsd.org> <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net> <4E148E2E-F8D2-41C2-B232-9FD1548AA20B@pingpong.net>

next in thread | previous in thread | raw e-mail | index | archive | help

> 24 sep 2015 kl. 07:51 skrev Palle Girgensohn <girgen@pingpong.net>:
>=20
>=20
>> 24 sep 2015 kl. 00:05 skrev Palle Girgensohn <girgen@pingpong.net>:
>>=20
>> Hi,
>>=20
>>> 23 sep 2015 kl. 20:32 skrev Julien Charbon <jch@freebsd.org>:
>>>=20
>>>=20
>>> Hi,
>>>=20
>>> On 23/09/15 20:26, Palle Girgensohn wrote:
>>>>> 23 sep. 2015 kl. 20:01 skrev Julien Charbon <jch@freebsd.org>:
>>>>> On 23/09/15 16:36, Palle Girgensohn wrote:
>>>>>>> 22 sep 2015 kl. 23:59 skrev Julien Charbon <jch@freebsd.org>: On
>>>>>>> 22/09/15 22:58, Palle Girgensohn wrote:
>>>>>>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon <jch@freebsd.org>:=20=

>>>>>>>>> On 22/09/15 18:49, Palle Girgensohn wrote:
>>>>>>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20
>>>>>>>>>>> <girgen@FreeBSD.org>:
>>>>>>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn=20
>>>>>>>>>>>> <girgen@FreeBSD.org>:
>>>>>>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon=20
>>>>>>>>>>>>> <jch@FreeBSD.org>: On 18/09/15 18:06, Konstantin
>>>>>>>>>>>>> Belousov wrote:
>>>>>>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien
>>>>>>>>>>>>>> Charbon wrote:
>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>> - Second, if issue is still in stable/10, compile 10.2=20
>>>>>>>>>>>>> kernel with these options:
>>>>>>>>>>>>>=20
>>>>>>>>>>>>> options        DDB options        DEADLKRES options=20
>>>>>>>>>>>>> INVARIANTS options        INVARIANT_SUPPORT options
>>>>>>>>>>>>> WITNESS options        WITNESS_SKIPSPIN
>>>>>>>>>>>>>=20
>>>>>>>>>>>>> To see where the original fault is coming from.
>>>>>>>>>>>> [...]
>>>>>>>>>>>>=20
>>>>>>>>>>>> I'll try stable/10 now. Would you suggest a "clean"=20
>>>>>>>>>>>> stable/10, or could 287621 and 287780 help?
>>>>>>>>>>>>=20
>>>>>>>>>>>> I'll add the debugging suggested options right away.
>>>>>>>>>>>>=20
>>>>>>>>>>>> Palle
>>>>>>>>>>>=20
>>>>>>>>>>> I have a new core dump from ^/stable/10 with:
>>>>>>>>>>>=20
>>>>>>>>>>> options        DDB options        DEADLKRES options
>>>>>>>>>>> INVARIANTS options        INVARIANT_SUPPORT options WITNESS
>>>>>>>>>>> options WITNESS_SKIPSPIN
>>>>>>>>>>=20
>>>>>>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD]=20
>>>>>>>>>> Copyright 2004 Free Software Foundation, Inc. GDB is free=20
>>>>>>>>>> software, covered by the GNU General Public License, and you
>>>>>>>>>> are welcome to change it and/or distribute copies of it under
>>>>>>>>>> certain conditions. Type "show copying" to see the
>>>>>>>>>> conditions. There is absolutely no warranty for GDB.  Type
>>>>>>>>>> "show warranty" for details. This GDB was configured as
>>>>>>>>>> "amd64-marcel-freebsd"...
>>>>>>>>>>=20
>>>>>>>>>> Unread portion of the kernel message buffer: panic:
>>>>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =
=3D
>>>>>>>>>> 16 KDB: stack backtrace: db_trace_self_wrapper() at=20
>>>>>>>>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0=20
>>>>>>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame
>>>>>>>>>> 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame
>>>>>>>>>> 0xfffffe183d9e98d0 kassert_panic() at
>>>>>>>>>> kassert_panic+0x139/frame 0xfffffe183d9e9940
>>>>>>>>>> tcp_usr_detach() at tcp_usr_detach+0xf9/frame
>>>>>>>>>> 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame
>>>>>>>>>> 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame
>>>>>>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame
>>>>>>>>>> 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame
>>>>>>>>>> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame
>>>>>>>>>> 0xfffffe183d9e9ae0 amd64_syscall() at=20
>>>>>>>>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall()
>>>>>>>>>> at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall
>>>>>>>>>> (6, FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D=20=

>>>>>>>>>> 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic=20=

>>>>>>>>>> Uptime: 18h57m59s Dumping 23085 out of 98263=20
>>>>>>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>>>>>>>>>>=20
>>>>>>>>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.=20=

>>>>>>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading
>>>>>>>>>> symbols from /boot/kernel/zfs.ko.symbols...done. Loaded
>>>>>>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading symbols from
>>>>>>>>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols
>>>>>>>>>> for /boot/kernel/opensolaris.ko.symbols Reading symbols from
>>>>>>>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for
>>>>>>>>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from=20
>>>>>>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for=20=

>>>>>>>>>> /boot/kernel/netgraph.ko.symbols Reading symbols from=20
>>>>>>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for
>>>>>>>>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from=20
>>>>>>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for=20=

>>>>>>>>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from=20
>>>>>>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for
>>>>>>>>>> /boot/kernel/accf_data.ko.symbols Reading symbols from=20
>>>>>>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for
>>>>>>>>>> /boot/kernel/accf_http.ko.symbols Reading symbols from=20
>>>>>>>>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for=20
>>>>>>>>>> /boot/kernel/ums.ko.symbols Reading symbols from=20
>>>>>>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for
>>>>>>>>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from=20
>>>>>>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for=20
>>>>>>>>>> /boot/kernel/fdescfs.ko.symbols #0  doadump (textdump=3D1) at=20=

>>>>>>>>>> pcpu.h:219 219        __asm("movq %%gs:%1,%0" : "=3Dr" (td) =
(kgdb) bt
>>>>>>>>>> #0 doadump (textdump=3D1) at pcpu.h:219 #1  =
0xffffffff8094b337
>>>>>>>>>> in kern_reboot (howto=3D260) at
>>>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845
>>>>>>>>>> in vpanic (fmt=3D<value optimized out>, ap=3D<value optimized
>>>>>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3
>>>>>>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=3D<value optimized
>>>>>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:646 #4
>>>>>>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=3D<value optimized
>>>>>>>>>> out>) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5
>>>>>>>>>> 0xffffffff809cd291 in sofree (so=3D0xfffff801dd302000) at=20
>>>>>>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6  0xffffffff809cdb00 in
>>>>>>>>>> soclose (so=3D<value optimized out>) at=20
>>>>>>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7  0xffffffff808fe659 in
>>>>>>>>>> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8=20=

>>>>>>>>>> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40,=20
>>>>>>>>>> td=3D0xfffff80eebc894a0) at
>>>>>>>>>> /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d
>>>>>>>>>> in closefp (fdp=3D0xfffff80b20cce000, fd=3D<value optimized =
out>,
>>>>>>>>>> fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0,
>>>>>>>>>> holdleaders=3D<value optimized out>) at=20
>>>>>>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a
>>>>>>>>>> in amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at=20
>>>>>>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall ()
>>>>>>>>>> at /usr/src/sys/amd64/amd64/exception.S:396 #12
>>>>>>>>>> 0x0000000801c8d94a in ?? () Previous frame inner to this
>>>>>>>>>> frame (corrupt stack?) Current language:  auto; currently
>>>>>>>>>> minimal
>>>>>>>>>=20
>>>>>>>>> Thanks for the information.  As I suspected the initial error
>>>>>>>>> was elsewhere than tcp_twclose(), never got this assertion
>>>>>>>>> before:
>>>>>>>>>=20
>>>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL
>>>>>>>>>=20
>>>>>>>>> from here:
>>>>>>>>>=20
>>>>>>>>> static void tcp_detach(struct socket *so, struct inpcb *inp) {
>>>>>>>>> struct tcpcb *tp;
>>>>>>>>>=20
>>>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp);
>>>>>>>>>=20
>>>>>>>>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D =
inp"));=20
>>>>>>>>> KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D=

>>>>>>>>> so"));
>>>>>>>>>=20
>>>>>>>>> tp =3D intotcpcb(inp);
>>>>>>>>>=20
>>>>>>>>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags &=20
>>>>>>>>> INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: =
INP_TIMEWAIT
>>>>>>>>> && " "INP_DROPPED && tp !=3D NULL"));
>>>>>>>>>=20
>>>>>>>>> Let me check if I could find a path that could lead to this=20
>>>>>>>>> unexpected case.  Unexpected because:  INP_DROPPED and=20
>>>>>>>>> inp->inp_ppcb is set to NULL are set at same time here:
>>>>>>>>>=20
>>>>>>>>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket
>>>>>>>>> *so; struct inpcb *inp;
>>>>>>>>>=20
>>>>>>>>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT),=20=

>>>>>>>>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw,=20
>>>>>>>>> ("tcp_twclose: inp_ppcb !=3D tw"));=20
>>>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo);      /* in_pcbfree() */=20
>>>>>>>>> INP_WLOCK_ASSERT(inp);
>>>>>>>>>=20
>>>>>>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;=20
>>>>>>>>> in_pcbdrop(inp); ...
>>>>>>>>>=20
>>>>>>>>> Interesting [...]
>>>>>>=20
>>>>>> Just a quick update. Julien is pursuing this off list with a core
>>>>>> dump and we are now waiting for a new core dump with the first
>>>>>> KASSERT removed. this is on a stable/10 kernel.
>>>>>=20
>>>>> By the way Palle could you also run below Dtrace script to see =
where
>>>>> this tcp_close() in INP_TIMEWAIT comes from:
>>>>>=20
>>>>> $ cat tcp-close-tw.d
>>>>> fbt::tcp_close:entry
>>>>> /args[0]->t_inpcb->inp_flags & 0x01000000/
>>>>> {
>>>>> @s1[stack()] =3D count()
>>>>> }
>>>>>=20
>>>>> tick-1sec {
>>>>> printa(@s1);
>>>>> }
>>>>> $ sudo dtrace -s tcp-close-tw.d
>>>>>=20
>>>>> And share any backtraces reported in this dtrace script output.
>>>>>=20
>>>>> George, could you check if this dtrace script makes sense for you, =
and
>>>>> if you have any improvements to add, I am quite a rookie in Dtrace =
scripts.
>>>>=20
>>>> Shall I let the dtrace script run continuously until the machine =
crashes? Or just run it once?
>>>=20
>>> Continuously until the machine crashes.  You can report any =
backstrace
>>> outputs like:
>>>=20
>>>            kernel`tcp_usr_close+0x86
>>>            kernel`soclose+0xe4
>>>            kernel`_fdrop+0x29
>>>            kernel`closef+0x237
>>>            kernel`closefp+0x95
>>>            kernel`amd64_syscall+0x357
>>>            kernel`0xffffffff80c83c4b
>>>              1
>>>=20
>>> before the machine crashes.   But I expect the problematic case
>>> detection with Dtrace to be quickly followed by the crash.  Will =
see.
>>>=20
>>> --
>>> Julien
>>>=20
>>=20
>> Kernels and userland are updated to 10.2-p3 with the patch removing =
the suspicous KASSERT.=20
>>=20
>> dtrace running continously redirecting to a log file.
>>=20
>> now we're just waiting... :)
>>=20
>> Palle
>=20
> Is the dtrace correct?
>=20
> $ sort -u dtrace.out=20
>=20
>  0  59779                       :tick-1sec=20
> CPU     ID                    FUNCTION:NAME
> $ wc -l dtrace.out=20
>   56233 dtrace.out
>=20
>=20
> All it does is write
>=20
> 0  59779                       :tick-1sec=20
>=20
> once a second.
>=20
> Just checking... :)
>=20
> Palle


Just had a crash. Unfortunately, the kernel was stuck at the db> prompt, =
and the remote keyboard was unresponsive (HP ILO, not impressed). So I =
had to reset the power and never got a core dump... :(

panic: tcp_tw_2msl_stop: inp should not be released here                 =
      =20
cpuid =3D 0                                                              =
        =20
KDB: stack backtrace:                                                    =
      =20
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame =
0xfffffe175acd16a0 =20
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe175acd1750           =
      =20
vpanic() at vpanic+0x126/frame 0xfffffe175acd1790                        =
      =20
kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800          =
      =20
tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850              =
      =20
tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame 0xfffffe175acd1890    =
      =20
tcp_slowtimo() at tcp_slowtimo+0x68/frame 0xfffffe175acd18c0             =
      =20
pfslowtimo() at pfslowtimo+0x54/frame 0xfffffe175acd18f0                 =
      =20
softclock_call_cc() at softclock_call_cc+0x193/frame 0xfffffe175acd19d0  =
      =20
softclock() at softclock+0x47/frame 0xfffffe175acd19f0                   =
      =20
intr_event_execute_handlers() at intr_event_execute_handlers+0x93/frame =
0xfffffe
175acd1a30                                                               =
      =20
ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70             =
      =20
fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0                   =
      =20
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0        =
      =20
--- trap 0, rip =3D 0, rsp =3D 0xfffffe175acd1b70, rbp =3D 0 ---         =
            =20
KDB: enter: panic                                                        =
      =20
[ thread pid 12 tid 100043 ]                                             =
      =20
Stopped at      kdb_enter+0x3e: movq    $0,kdb_why                       =
      =20
db>                                                                      =
      =20



Is there a way to configure the kernel to get all the gory debug stuff =
without it dropping to the debug prompt on panic? I'd rather see it just =
dump core and restart automatically.

Palle







Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?30AD333B-EC8B-4EEF-8FE2-8EA8C216601E>