Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 24 Sep 2015 09:03:23 +0200
From:      Julien Charbon <jch@freebsd.org>
To:        Palle Girgensohn <girgen@FreeBSD.org>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Kernel panics in tcp_twclose
Message-ID:  <5603A03B.4060002@freebsd.org>
In-Reply-To: <30AD333B-EC8B-4EEF-8FE2-8EA8C216601E@FreeBSD.org>
References:  <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <F9D29C16-502B-43A1-BE2C-D2AD30F0B9EF@FreeBSD.org> <5601CF2D.9030307@freebsd.org> <E09DF89D-AAC5-48FD-8B75-EEAB937A5C32@FreeBSD.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> <5602F044.5010606@freebsd.org> <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net> <4E148E2E-F8D2-41C2-B232-9FD1548AA20B@pingpong.net> <30AD333B-EC8B-4EEF-8FE2-8EA8C216601E@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--B3Hv69f1vfCN4351XBs8QWLSu6hqAaUQ1
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable


 Hi Palle,

On 24/09/15 08:55, Palle Girgensohn wrote:
>> 24 sep 2015 kl. 07:51 skrev Palle Girgensohn
>> <girgen@pingpong.net>:
>>> 24 sep 2015 kl. 00:05 skrev Palle Girgensohn
>>> <girgen@pingpong.net>:
>>>> 23 sep 2015 kl. 20:32 skrev Julien Charbon <jch@freebsd.org>:=20
>>>> On 23/09/15 20:26, Palle Girgensohn wrote:
>>>>>> 23 sep. 2015 kl. 20:01 skrev Julien Charbon
>>>>>> <jch@freebsd.org>: On 23/09/15 16:36, Palle Girgensohn
>>>>>> wrote:
>>>>>>>> 22 sep 2015 kl. 23:59 skrev Julien Charbon
>>>>>>>> <jch@freebsd.org>: On 22/09/15 22:58, Palle Girgensohn
>>>>>>>> wrote:
>>>>>>>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon
>>>>>>>>>> <jch@freebsd.org>: On 22/09/15 18:49, Palle
>>>>>>>>>> Girgensohn wrote:
>>>>>>>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20
>>>>>>>>>>>> <girgen@FreeBSD.org>:
>>>>>>>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn
>>>>>>>>>>>>>  <girgen@FreeBSD.org>:
>>>>>>>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon
>>>>>>>>>>>>>>  <jch@FreeBSD.org>: On 18/09/15 18:06,
>>>>>>>>>>>>>> Konstantin Belousov wrote:
>>>>>>>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200,
>>>>>>>>>>>>>>> Julien Charbon wrote:
>>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>> - Second, if issue is still in stable/10,
>>>>>>>>>>>>>> compile 10.2 kernel with these options:
>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>> options        DDB options        DEADLKRES
>>>>>>>>>>>>>> options INVARIANTS options
>>>>>>>>>>>>>> INVARIANT_SUPPORT options WITNESS options
>>>>>>>>>>>>>> WITNESS_SKIPSPIN
>>>>>>>>>>>>>>=20
>>>>>>>>>>>>>> To see where the original fault is coming
>>>>>>>>>>>>>> from.
>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>=20
>>>>>>>>>>>>> I'll try stable/10 now. Would you suggest a
>>>>>>>>>>>>> "clean" stable/10, or could 287621 and 287780
>>>>>>>>>>>>> help?
>>>>>>>>>>>>>=20
>>>>>>>>>>>>> I'll add the debugging suggested options
>>>>>>>>>>>>> right away.
>>>>>>>>>>>>>=20
>>>>>>>>>>>>> Palle
>>>>>>>>>>>>=20
>>>>>>>>>>>> I have a new core dump from ^/stable/10 with:
>>>>>>>>>>>>=20
>>>>>>>>>>>> options        DDB options        DEADLKRES
>>>>>>>>>>>> options INVARIANTS options
>>>>>>>>>>>> INVARIANT_SUPPORT options WITNESS options
>>>>>>>>>>>> WITNESS_SKIPSPIN
>>>>>>>>>>>=20
>>>>>>>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1
>>>>>>>>>>> [FreeBSD] Copyright 2004 Free Software
>>>>>>>>>>> Foundation, Inc. GDB is free software, covered by
>>>>>>>>>>> the GNU General Public License, and you are
>>>>>>>>>>> welcome to change it and/or distribute copies of
>>>>>>>>>>> it under certain conditions. Type "show copying"
>>>>>>>>>>> to see the conditions. There is absolutely no
>>>>>>>>>>> warranty for GDB.  Type "show warranty" for
>>>>>>>>>>> details. This GDB was configured as=20
>>>>>>>>>>> "amd64-marcel-freebsd"...
>>>>>>>>>>>=20
>>>>>>>>>>> Unread portion of the kernel message buffer:
>>>>>>>>>>> panic: tcp_detach: INP_TIMEWAIT && INP_DROPPED &&
>>>>>>>>>>> tp !=3D NULL cpuid =3D 16 KDB: stack backtrace:
>>>>>>>>>>> db_trace_self_wrapper() at=20
>>>>>>>>>>> db_trace_self_wrapper+0x2b/frame
>>>>>>>>>>> 0xfffffe183d9e97e0 kdb_backtrace() at
>>>>>>>>>>> kdb_backtrace+0x39/frame 0xfffffe183d9e9890
>>>>>>>>>>> vpanic() at vpanic+0x126/frame 0xfffffe183d9e98d0
>>>>>>>>>>> kassert_panic() at kassert_panic+0x139/frame
>>>>>>>>>>> 0xfffffe183d9e9940 tcp_usr_detach() at
>>>>>>>>>>> tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970
>>>>>>>>>>> sofree() at sofree+0x1f1/frame 0xfffffe183d9e99a0
>>>>>>>>>>> soclose() at soclose+0x3a0/frame=20
>>>>>>>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame=20
>>>>>>>>>>> 0xfffffe183d9e9a10 closef() at
>>>>>>>>>>> closef+0x1e2/frame 0xfffffe183d9e9aa0 closefp()
>>>>>>>>>>> at closefp+0x9d/frame 0xfffffe183d9e9ae0
>>>>>>>>>>> amd64_syscall() at amd64_syscall+0x25a/frame
>>>>>>>>>>> 0xfffffe183d9e9bf0 Xfast_syscall() at
>>>>>>>>>>> Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 ---
>>>>>>>>>>> syscall (6, FreeBSD ELF64, sys_close), rip =3D
>>>>>>>>>>> 0x801c8d94a, rsp =3D 0x7ffff91c8668, rbp =3D
>>>>>>>>>>> 0x7ffff91c8680 --- KDB: enter: panic Uptime:
>>>>>>>>>>> 18h57m59s Dumping 23085 out of 98263=20
>>>>>>>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>=20
Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
>>>>>>>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols
>>>>>>>>>>> Reading symbols from
>>>>>>>>>>> /boot/kernel/zfs.ko.symbols...done. Loaded=20
>>>>>>>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading
>>>>>>>>>>> symbols from=20
>>>>>>>>>>> /boot/kernel/opensolaris.ko.symbols...done.
>>>>>>>>>>> Loaded symbols for
>>>>>>>>>>> /boot/kernel/opensolaris.ko.symbols Reading
>>>>>>>>>>> symbols from=20
>>>>>>>>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded
>>>>>>>>>>> symbols for /boot/kernel/ng_bridge.ko.symbols
>>>>>>>>>>> Reading symbols from=20
>>>>>>>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded
>>>>>>>>>>> symbols for /boot/kernel/netgraph.ko.symbols
>>>>>>>>>>> Reading symbols from=20
>>>>>>>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded
>>>>>>>>>>> symbols for /boot/kernel/ng_eiface.ko.symbols
>>>>>>>>>>> Reading symbols from=20
>>>>>>>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded
>>>>>>>>>>> symbols for /boot/kernel/ng_ether.ko.symbols
>>>>>>>>>>> Reading symbols from=20
>>>>>>>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded
>>>>>>>>>>> symbols for /boot/kernel/accf_data.ko.symbols
>>>>>>>>>>> Reading symbols from=20
>>>>>>>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded
>>>>>>>>>>> symbols for /boot/kernel/accf_http.ko.symbols
>>>>>>>>>>> Reading symbols from=20
>>>>>>>>>>> /boot/kernel/ums.ko.symbols...done. Loaded
>>>>>>>>>>> symbols for /boot/kernel/ums.ko.symbols Reading
>>>>>>>>>>> symbols from=20
>>>>>>>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded
>>>>>>>>>>> symbols for /boot/kernel/ng_socket.ko.symbols
>>>>>>>>>>> Reading symbols from=20
>>>>>>>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded
>>>>>>>>>>> symbols for /boot/kernel/fdescfs.ko.symbols #0
>>>>>>>>>>> doadump (textdump=3D1) at pcpu.h:219 219
>>>>>>>>>>> __asm("movq %%gs:%1,%0" : "=3Dr" (td) (kgdb) bt #0
>>>>>>>>>>> doadump (textdump=3D1) at pcpu.h:219 #1
>>>>>>>>>>> 0xffffffff8094b337 in kern_reboot (howto=3D260) at=20
>>>>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2
>>>>>>>>>>> 0xffffffff8094b845 in vpanic (fmt=3D<value
>>>>>>>>>>> optimized out>, ap=3D<value optimized out>) at
>>>>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:758 #3=20
>>>>>>>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=3D<value
>>>>>>>>>>> optimized out>) at
>>>>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:646 #4=20
>>>>>>>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=3D<value
>>>>>>>>>>> optimized out>) at
>>>>>>>>>>> /usr/src/sys/netinet/tcp_usrreq.c:202 #5=20
>>>>>>>>>>> 0xffffffff809cd291 in sofree
>>>>>>>>>>> (so=3D0xfffff801dd302000) at=20
>>>>>>>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6
>>>>>>>>>>> 0xffffffff809cdb00 in soclose (so=3D<value
>>>>>>>>>>> optimized out>) at=20
>>>>>>>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7
>>>>>>>>>>> 0xffffffff808fe659 in _fdrop
>>>>>>>>>>> (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8
>>>>>>>>>>>  0xffffffff80901092 in closef
>>>>>>>>>>> (fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0)
>>>>>>>>>>> at /usr/src/sys/kern/kern_descrip.c:2338 #9
>>>>>>>>>>> 0xffffffff808feb5d in closefp
>>>>>>>>>>> (fdp=3D0xfffff80b20cce000, fd=3D<value optimized
>>>>>>>>>>> out>, fp=3D0xfffff802a593db40,
>>>>>>>>>>> td=3D0xfffff80eebc894a0, holdleaders=3D<value
>>>>>>>>>>> optimized out>) at=20
>>>>>>>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10
>>>>>>>>>>> 0xffffffff80d7bc3a in amd64_syscall
>>>>>>>>>>> (td=3D0xfffff80eebc894a0, traced=3D0) at=20
>>>>>>>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in
>>>>>>>>>>> Xfast_syscall () at
>>>>>>>>>>> /usr/src/sys/amd64/amd64/exception.S:396 #12=20
>>>>>>>>>>> 0x0000000801c8d94a in ?? () Previous frame inner
>>>>>>>>>>> to this frame (corrupt stack?) Current language:
>>>>>>>>>>> auto; currently minimal
>>>>>>>>>>=20
>>>>>>>>>> Thanks for the information.  As I suspected the
>>>>>>>>>> initial error was elsewhere than tcp_twclose(),
>>>>>>>>>> never got this assertion before:
>>>>>>>>>>=20
>>>>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D
>>>>>>>>>> NULL
>>>>>>>>>>=20
>>>>>>>>>> from here:
>>>>>>>>>>=20
>>>>>>>>>> static void tcp_detach(struct socket *so, struct
>>>>>>>>>> inpcb *inp) { struct tcpcb *tp;
>>>>>>>>>>=20
>>>>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
>>>>>>>>>> INP_WLOCK_ASSERT(inp);
>>>>>>>>>>=20
>>>>>>>>>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D
>>>>>>>>>> inp")); KASSERT(inp->inp_socket =3D=3D so,
>>>>>>>>>> ("tcp_detach: inp_socket !=3D so"));
>>>>>>>>>>=20
>>>>>>>>>> tp =3D intotcpcb(inp);
>>>>>>>>>>=20
>>>>>>>>>> if (inp->inp_flags & INP_TIMEWAIT) { if
>>>>>>>>>> (inp->inp_flags & INP_DROPPED) { KASSERT(tp =3D=3D
>>>>>>>>>> NULL, ("tcp_detach: INP_TIMEWAIT && " "INP_DROPPED
>>>>>>>>>> && tp !=3D NULL"));
>>>>>>>>>>=20
>>>>>>>>>> Let me check if I could find a path that could lead
>>>>>>>>>> to this unexpected case.  Unexpected because:
>>>>>>>>>> INP_DROPPED and inp->inp_ppcb is set to NULL are
>>>>>>>>>> set at same time here:
>>>>>>>>>>=20
>>>>>>>>>> void tcp_twclose(struct tcptw *tw, int reuse) {
>>>>>>>>>> struct socket *so; struct inpcb *inp;
>>>>>>>>>>=20
>>>>>>>>>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags &
>>>>>>>>>> INP_TIMEWAIT), ("tcp_twclose: !timewait"));
>>>>>>>>>> KASSERT(intotw(inp) =3D=3D tw, ("tcp_twclose: inp_ppcb
>>>>>>>>>> !=3D tw")); INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
>>>>>>>>>> /* in_pcbfree() */ INP_WLOCK_ASSERT(inp);
>>>>>>>>>>=20
>>>>>>>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;
>>>>>>>>>>  in_pcbdrop(inp); ...
>>>>>>>>>>=20
>>>>>>>>>> Interesting [...]
>>>>>>>=20
>>>>>>> Just a quick update. Julien is pursuing this off list
>>>>>>> with a core dump and we are now waiting for a new core
>>>>>>> dump with the first KASSERT removed. this is on a
>>>>>>> stable/10 kernel.
>>>>>>=20
>>>>>> By the way Palle could you also run below Dtrace script to
>>>>>> see where this tcp_close() in INP_TIMEWAIT comes from:
>>>>>>=20
>>>>>> $ cat tcp-close-tw.d fbt::tcp_close:entry=20
>>>>>> /args[0]->t_inpcb->inp_flags & 0x01000000/ { @s1[stack()] =3D
>>>>>> count() }
>>>>>>=20
>>>>>> tick-1sec { printa(@s1); } $ sudo dtrace -s tcp-close-tw.d
>>>>>>=20
>>>>>> And share any backtraces reported in this dtrace script
>>>>>> output.
>>>>>>=20
>>>>>> George, could you check if this dtrace script makes sense
>>>>>> for you, and if you have any improvements to add, I am
>>>>>> quite a rookie in Dtrace scripts.
>>>>>=20
>>>>> Shall I let the dtrace script run continuously until the
>>>>> machine crashes? Or just run it once?
>>>>=20
>>>> Continuously until the machine crashes.  You can report any
>>>> backstrace outputs like:
>>>>=20
>>>> kernel`tcp_usr_close+0x86 kernel`soclose+0xe4=20
>>>> kernel`_fdrop+0x29 kernel`closef+0x237 kernel`closefp+0x95=20
>>>> kernel`amd64_syscall+0x357 kernel`0xffffffff80c83c4b 1
>>>>=20
>>>> before the machine crashes.   But I expect the problematic
>>>> case detection with Dtrace to be quickly followed by the crash.
>>>> Will see.
>>>>=20
>>>> -- Julien
>>>>=20
>>>=20
>>> Kernels and userland are updated to 10.2-p3 with the patch
>>> removing the suspicous KASSERT.
>>>=20
>>> dtrace running continously redirecting to a log file.
>>>=20
>>> now we're just waiting... :)
>>>=20
>>> Palle
>>=20
>> Is the dtrace correct?
>>=20
>> $ sort -u dtrace.out
>>=20
>> 0  59779                       :tick-1sec CPU     ID
>> FUNCTION:NAME $ wc -l dtrace.out 56233 dtrace.out
>>=20
>>=20
>> All it does is write
>>=20
>> 0  59779                       :tick-1sec
>>=20
>> once a second.

 It is right, it tries to display the guilty backstrace every second,
but this is a rare event.

0  59779                       :tick-1sec

 is printed when you have no backstrace yet.

> Just had a crash. Unfortunately, the kernel was stuck at the db>
> prompt, and the remote keyboard was unresponsive (HP ILO, not
> impressed). So I had to reset the power and never got a core dump...
>=20
> panic: tcp_tw_2msl_stop: inp should not be released here
>  cpuid =3D 0
>  KDB: stack backtrace:
>  db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfffffe175acd16a0 kdb_backtrace() at kdb_backtrace+0x39/frame
> 0xfffffe175acd1750 vpanic() at vpanic+0x126/frame 0xfffffe175acd1790
>  kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800
>  tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850
>  tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame
> 0xfffffe175acd1890 tcp_slowtimo() at tcp_slowtimo+0x68/frame
> 0xfffffe175acd18c0 pfslowtimo() at pfslowtimo+0x54/frame
> 0xfffffe175acd18f0 softclock_call_cc() at
> softclock_call_cc+0x193/frame 0xfffffe175acd19d0 softclock() at
> softclock+0x47/frame 0xfffffe175acd19f0 intr_event_execute_handlers()
> at intr_event_execute_handlers+0x93/frame 0xfffffe 175acd1a30
>  ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70
>  fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0
>  fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0
>  --- trap 0, rip =3D 0, rsp =3D 0xfffffe175acd1b70, rbp =3D 0 ---
>  KDB: enter: panic
>  [ thread pid 12 tid 100043 ]
>  Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
>  db>

 Thanks a log for this backstrace.  This is what at expected, when
tcp_close() in call in INP_TIMEWAIT case, in_pcbfree() can be called one
extra time that leads to:

tcp_tw_2msl_stop: inp should not be released here

 Let me try to come with a tentative fix for this case.

> Is there a way to configure the kernel to get all the gory debug
> stuff without it dropping to the debug prompt on panic? I'd rather
> see it just dump core and restart automatically.

 You can set:

options KDB_UNATTENDED

 or

debug.debugger_on_panic=3D0

in /etc/sysctl.conf

 both are equivalent.

 More details here:

https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-opti=
ons.html

 Thanks.

--
Julien


--B3Hv69f1vfCN4351XBs8QWLSu6hqAaUQ1
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQEcBAEBCgAGBQJWA6BBAAoJEKVlQ5Je6dhx5i4IALr7OhuK7NvVESFUK+9t7ygo
lluCrPhEA+z/t1t2C4HutWnVEucLW9Kh0lBjg2qodcvVHVl0wZHN+T1IySaIfhbf
dFAxUcWAS3xGQgsPwSuT4UU1uFmqrYtmhOmt0GJVmywLzfKN4AL9toK7uQKknHzW
HZM0K+vbzT1pz8RrD+DBSrQrWTrDk1adF94oi5Z+01MssGZ0ljcyeStsxba3Wg3N
9pYKfSSXKdld6qpjdj2di6EmevS12LVyUGt755ke9Bd1r09DXhFbZ2hXkDf+NQAN
QQswXp+/DGVBXtc8L+AiAJhuu8oPUqWc38jdNe0HGRa7naEsdgLWC4OkSchzwcE=
=msZs
-----END PGP SIGNATURE-----

--B3Hv69f1vfCN4351XBs8QWLSu6hqAaUQ1--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5603A03B.4060002>