Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 23 Sep 2015 20:01:46 +0200
From:      Julien Charbon <jch@freebsd.org>
To:        Palle Girgensohn <girgen@gmail.com>, George Neville-Neil <gnn@FreeBSD.org>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Kernel panics in tcp_twclose
Message-ID:  <5602E90A.9050504@freebsd.org>
In-Reply-To: <E09DF89D-AAC5-48FD-8B75-EEAB937A5C32@FreeBSD.org>
References:  <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <F9D29C16-502B-43A1-BE2C-D2AD30F0B9EF@FreeBSD.org> <5601CF2D.9030307@freebsd.org> <E09DF89D-AAC5-48FD-8B75-EEAB937A5C32@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--QuKFFsTUFTTrLT9EK0ha1xMF6OlutTF7c
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable


 Hi Palle, Hi George,

On 23/09/15 16:36, Palle Girgensohn wrote:
>> 22 sep 2015 kl. 23:59 skrev Julien Charbon <jch@freebsd.org>: On
>> 22/09/15 22:58, Palle Girgensohn wrote:
>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon <jch@freebsd.org>:=20
>>>> On 22/09/15 18:49, Palle Girgensohn wrote:
>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20
>>>>>> <girgen@FreeBSD.org>:
>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn=20
>>>>>>> <girgen@FreeBSD.org>:
>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon=20
>>>>>>>> <jch@FreeBSD.org>: On 18/09/15 18:06, Konstantin
>>>>>>>> Belousov wrote:
>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien
>>>>>>>>> Charbon wrote:
>>>>>>>>>> [...]
>>>>>>>> - Second, if issue is still in stable/10, compile 10.2=20
>>>>>>>> kernel with these options:
>>>>>>>>=20
>>>>>>>> options        DDB options        DEADLKRES options=20
>>>>>>>> INVARIANTS options        INVARIANT_SUPPORT options
>>>>>>>> WITNESS options        WITNESS_SKIPSPIN
>>>>>>>>=20
>>>>>>>> To see where the original fault is coming from.
>>>>>>> [...]
>>>>>>>=20
>>>>>>> I'll try stable/10 now. Would you suggest a "clean"=20
>>>>>>> stable/10, or could 287621 and 287780 help?
>>>>>>>=20
>>>>>>> I'll add the debugging suggested options right away.
>>>>>>>=20
>>>>>>> Palle
>>>>>>=20
>>>>>> I have a new core dump from ^/stable/10 with:
>>>>>>=20
>>>>>> options        DDB options        DEADLKRES options
>>>>>> INVARIANTS options        INVARIANT_SUPPORT options WITNESS
>>>>>> options WITNESS_SKIPSPIN
>>>>>=20
>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD]=20
>>>>> Copyright 2004 Free Software Foundation, Inc. GDB is free=20
>>>>> software, covered by the GNU General Public License, and you
>>>>> are welcome to change it and/or distribute copies of it under
>>>>> certain conditions. Type "show copying" to see the
>>>>> conditions. There is absolutely no warranty for GDB.  Type
>>>>> "show warranty" for details. This GDB was configured as
>>>>> "amd64-marcel-freebsd"...
>>>>>=20
>>>>> Unread portion of the kernel message buffer: panic:
>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D
>>>>> 16 KDB: stack backtrace: db_trace_self_wrapper() at=20
>>>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0=20
>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame
>>>>> 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame
>>>>> 0xfffffe183d9e98d0 kassert_panic() at
>>>>> kassert_panic+0x139/frame 0xfffffe183d9e9940
>>>>> tcp_usr_detach() at tcp_usr_detach+0xf9/frame
>>>>> 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame
>>>>> 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame
>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame
>>>>> 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame
>>>>> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame
>>>>> 0xfffffe183d9e9ae0 amd64_syscall() at=20
>>>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall()
>>>>> at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall
>>>>> (6, FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D=20
>>>>> 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic=20
>>>>> Uptime: 18h57m59s Dumping 23085 out of 98263=20
>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>>>>>=20
>>>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.=20
>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading
>>>>> symbols from /boot/kernel/zfs.ko.symbols...done. Loaded
>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading symbols from
>>>>>  /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols
>>>>> for /boot/kernel/opensolaris.ko.symbols Reading symbols from
>>>>>  /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for
>>>>>  /boot/kernel/ng_bridge.ko.symbols Reading symbols from=20
>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for=20
>>>>> /boot/kernel/netgraph.ko.symbols Reading symbols from=20
>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for
>>>>>  /boot/kernel/ng_eiface.ko.symbols Reading symbols from=20
>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for=20
>>>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from=20
>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for
>>>>>  /boot/kernel/accf_data.ko.symbols Reading symbols from=20
>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for
>>>>>  /boot/kernel/accf_http.ko.symbols Reading symbols from=20
>>>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for=20
>>>>> /boot/kernel/ums.ko.symbols Reading symbols from=20
>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for
>>>>>  /boot/kernel/ng_socket.ko.symbols Reading symbols from=20
>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for=20
>>>>> /boot/kernel/fdescfs.ko.symbols #0  doadump (textdump=3D1) at=20
>>>>> pcpu.h:219 219		__asm("movq %%gs:%1,%0" : "=3Dr" (td) (kgdb) bt
>>>>> #0 doadump (textdump=3D1) at pcpu.h:219 #1  0xffffffff8094b337
>>>>> in kern_reboot (howto=3D260) at
>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845
>>>>> in vpanic (fmt=3D<value optimized out>, ap=3D<value optimized
>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3
>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=3D<value optimized
>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:646 #4
>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=3D<value optimized
>>>>> out>) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5
>>>>> 0xffffffff809cd291 in sofree (so=3D0xfffff801dd302000) at=20
>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6  0xffffffff809cdb00 in
>>>>>  soclose (so=3D<value optimized out>) at=20
>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7  0xffffffff808fe659 in
>>>>>  _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8=20
>>>>> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40,=20
>>>>> td=3D0xfffff80eebc894a0) at
>>>>> /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d
>>>>> in closefp (fdp=3D0xfffff80b20cce000, fd=3D<value optimized out>,
>>>>> fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0,
>>>>> holdleaders=3D<value optimized out>) at=20
>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a
>>>>> in amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at=20
>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall ()
>>>>> at /usr/src/sys/amd64/amd64/exception.S:396 #12
>>>>> 0x0000000801c8d94a in ?? () Previous frame inner to this
>>>>> frame (corrupt stack?) Current language:  auto; currently
>>>>> minimal
>>>>=20
>>>> Thanks for the information.  As I suspected the initial error
>>>> was elsewhere than tcp_twclose(), never got this assertion
>>>> before:
>>>>=20
>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL
>>>>=20
>>>> from here:
>>>>=20
>>>> static void tcp_detach(struct socket *so, struct inpcb *inp) {
>>>>  struct tcpcb *tp;
>>>>=20
>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp);
>>>>=20
>>>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D inp"));=20
>>>> KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D
>>>> so"));
>>>>=20
>>>> tp =3D intotcpcb(inp);
>>>>=20
>>>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags &=20
>>>> INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: INP_TIMEWAIT
>>>> && " "INP_DROPPED && tp !=3D NULL"));
>>>>=20
>>>> Let me check if I could find a path that could lead to this=20
>>>> unexpected case.  Unexpected because:  INP_DROPPED and=20
>>>> inp->inp_ppcb is set to NULL are set at same time here:
>>>>=20
>>>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket
>>>> *so; struct inpcb *inp;
>>>>=20
>>>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT),=20
>>>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw,=20
>>>> ("tcp_twclose: inp_ppcb !=3D tw"));=20
>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo);      /* in_pcbfree() */=20
>>>> INP_WLOCK_ASSERT(inp);
>>>>=20
>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;=20
>>>> in_pcbdrop(inp); ...
>>>>=20
>>>> Interesting [...]
>=20
> Just a quick update. Julien is pursuing this off list with a core
> dump and we are now waiting for a new core dump with the first
> KASSERT removed. this is on a stable/10 kernel.

 By the way Palle could you also run below Dtrace script to see where
this tcp_close() in INP_TIMEWAIT comes from:

$ cat tcp-close-tw.d
fbt::tcp_close:entry
/args[0]->t_inpcb->inp_flags & 0x01000000/
{
  @s1[stack()] =3D count()
}

tick-1sec {
 printa(@s1);
}
$ sudo dtrace -s tcp-close-tw.d

 And share any backtraces reported in this dtrace script output.

 George, could you check if this dtrace script makes sense for you, and
if you have any improvements to add, I am quite a rookie in Dtrace script=
s.

> This is a really big problem for us now, 100k+ users use these
> systems... Would anyone suggest any paralell routes here to speed up
> the process of finding the culprit?

 I agree, this issue is a complex one, more eyeballs/propositions would
be great.

 Thanks.

--
Julien


--QuKFFsTUFTTrLT9EK0ha1xMF6OlutTF7c
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQEcBAEBCgAGBQJWAukRAAoJEKVlQ5Je6dhxcscIAMd9BXemZn8hXsSoZfzxfEgE
yKSqLz5BVjFMsVhZqrGGbHgJa7AMtLtE8sB5MFsO4RRKYY/kotepO8epvRR8sxDK
+EyLsZnl5k/mqmlCkQ533ZTW8YUvPs57XauGLVp543jWxe1IxdjnTJnH6AQeN5Tg
r7iljW8RoIvGqc/8EMsuCrHsZYPEoMJhsqx8m86oBaTJv7n2PkDyIE3RaBN8hliP
iI6Z5KAud2UUF/bH99RJdsIHtHHphrsNcZ5c0RBHxSTubf1BP97PnVHNL6EUlpRD
J12PI9P8cKFdBnrHoKb34EN/fOBEZSsy991mQbr4tkuAVpIUf/Yylzy4Dqy30is=
=vBpj
-----END PGP SIGNATURE-----

--QuKFFsTUFTTrLT9EK0ha1xMF6OlutTF7c--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5602E90A.9050504>