From owner-freebsd-net@freebsd.org Wed Sep 23 18:32:54 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 235A8A06256 for ; Wed, 23 Sep 2015 18:32:54 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mail-wi0-f176.google.com (mail-wi0-f176.google.com [209.85.212.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A71C418B2; Wed, 23 Sep 2015 18:32:53 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by wicfx3 with SMTP id fx3so82113649wic.0; Wed, 23 Sep 2015 11:32:46 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-type; bh=VV3IkMx1rlwBHuhdbztLl723fj0QJ9XOKpq0LTP+LsE=; b=TbogrIDQmgexW7YVJeCfq22aauoho2rNgV/wvkw7839bwSXRDy8nt9TbxbOezYwKcb nmiROw9d0BYuZBRYnOjaQHemxl8ekZlIE5n1kYFnT5QJdJDW49tpM0JlDVz/A4846zJx BZFG0zyupLYJSHqz7c5zttXy0IvGTcldntu1T2YBf4NAI9OEcP0L2uXGHQDqP6j/vjRM svP0WnWHv4v6yeARCye6OtaDRmcmM+lsTDaTUUKKV/69f7nNfvwrPUP+mEtiA6kFGv/R w766JeHjviKGW1MJv2Ud4HVHFyOsRm1UD9lAg6cQNcZwXl9IOxLroAHBh9MC1bBbSvhR AL6w== X-Received: by 10.180.107.130 with SMTP id hc2mr5127224wib.92.1443033166339; Wed, 23 Sep 2015 11:32:46 -0700 (PDT) Received: from FRI2JCHARBON-M1.local ([217.30.88.7]) by smtp.googlemail.com with ESMTPSA id hs5sm1655455wib.6.2015.09.23.11.32.44 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 23 Sep 2015 11:32:45 -0700 (PDT) Subject: Re: Kernel panics in tcp_twclose To: Palle Girgensohn References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> Cc: Palle Girgensohn , George Neville-Neil , freebsd-net@freebsd.org From: Julien Charbon Message-ID: <5602F044.5010606@freebsd.org> Date: Wed, 23 Sep 2015 20:32:36 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="epRRqLROkIsrJcaDOpMFvODp97niGFVa6" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 18:32:54 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --epRRqLROkIsrJcaDOpMFvODp97niGFVa6 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi, On 23/09/15 20:26, Palle Girgensohn wrote: >> 23 sep. 2015 kl. 20:01 skrev Julien Charbon : >> On 23/09/15 16:36, Palle Girgensohn wrote: >>>> 22 sep 2015 kl. 23:59 skrev Julien Charbon : On >>>> 22/09/15 22:58, Palle Girgensohn wrote: >>>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon :=20 >>>>>> On 22/09/15 18:49, Palle Girgensohn wrote: >>>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20 >>>>>>>> : >>>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn=20 >>>>>>>>> : >>>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon=20 >>>>>>>>>> : On 18/09/15 18:06, Konstantin >>>>>>>>>> Belousov wrote: >>>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien >>>>>>>>>>> Charbon wrote: >>>>>>>>>>>> [...] >>>>>>>>>> - Second, if issue is still in stable/10, compile 10.2=20 >>>>>>>>>> kernel with these options: >>>>>>>>>> >>>>>>>>>> options DDB options DEADLKRES options=20 >>>>>>>>>> INVARIANTS options INVARIANT_SUPPORT options >>>>>>>>>> WITNESS options WITNESS_SKIPSPIN >>>>>>>>>> >>>>>>>>>> To see where the original fault is coming from. >>>>>>>>> [...] >>>>>>>>> >>>>>>>>> I'll try stable/10 now. Would you suggest a "clean"=20 >>>>>>>>> stable/10, or could 287621 and 287780 help? >>>>>>>>> >>>>>>>>> I'll add the debugging suggested options right away. >>>>>>>>> >>>>>>>>> Palle >>>>>>>> >>>>>>>> I have a new core dump from ^/stable/10 with: >>>>>>>> >>>>>>>> options DDB options DEADLKRES options >>>>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>>>> options WITNESS_SKIPSPIN >>>>>>> >>>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD]=20 >>>>>>> Copyright 2004 Free Software Foundation, Inc. GDB is free=20 >>>>>>> software, covered by the GNU General Public License, and you >>>>>>> are welcome to change it and/or distribute copies of it under >>>>>>> certain conditions. Type "show copying" to see the >>>>>>> conditions. There is absolutely no warranty for GDB. Type >>>>>>> "show warranty" for details. This GDB was configured as >>>>>>> "amd64-marcel-freebsd"... >>>>>>> >>>>>>> Unread portion of the kernel message buffer: panic: >>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D= >>>>>>> 16 KDB: stack backtrace: db_trace_self_wrapper() at=20 >>>>>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0=20 >>>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame >>>>>>> 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame >>>>>>> 0xfffffe183d9e98d0 kassert_panic() at >>>>>>> kassert_panic+0x139/frame 0xfffffe183d9e9940 >>>>>>> tcp_usr_detach() at tcp_usr_detach+0xf9/frame >>>>>>> 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame >>>>>>> 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame >>>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame >>>>>>> 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame >>>>>>> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame >>>>>>> 0xfffffe183d9e9ae0 amd64_syscall() at=20 >>>>>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() >>>>>>> at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall >>>>>>> (6, FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D=20 >>>>>>> 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic=20 >>>>>>> Uptime: 18h57m59s Dumping 23085 out of 98263=20 >>>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>>>>> >>>>>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.=20 >>>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading >>>>>>> symbols from /boot/kernel/zfs.ko.symbols...done. Loaded >>>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading symbols from >>>>>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols >>>>>>> for /boot/kernel/opensolaris.ko.symbols Reading symbols from >>>>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for >>>>>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for=20 >>>>>>> /boot/kernel/netgraph.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for >>>>>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for=20 >>>>>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for >>>>>>> /boot/kernel/accf_data.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for >>>>>>> /boot/kernel/accf_http.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for=20 >>>>>>> /boot/kernel/ums.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for >>>>>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for=20 >>>>>>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at=20 >>>>>>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) (kgdb= ) bt >>>>>>> #0 doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 >>>>>>> in kern_reboot (howto=3D260) at >>>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845 >>>>>>> in vpanic (fmt=3D, ap=3D>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3 >>>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=3D>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:646 #4 >>>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=3D>>>>>> out>) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5 >>>>>>> 0xffffffff809cd291 in sofree (so=3D0xfffff801dd302000) at=20 >>>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in >>>>>>> soclose (so=3D) at=20 >>>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in >>>>>>> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8=20 >>>>>>> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40,=20 >>>>>>> td=3D0xfffff80eebc894a0) at >>>>>>> /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d >>>>>>> in closefp (fdp=3D0xfffff80b20cce000, fd=3D,= >>>>>>> fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0, >>>>>>> holdleaders=3D) at=20 >>>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a >>>>>>> in amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at=20 >>>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () >>>>>>> at /usr/src/sys/amd64/amd64/exception.S:396 #12 >>>>>>> 0x0000000801c8d94a in ?? () Previous frame inner to this >>>>>>> frame (corrupt stack?) Current language: auto; currently >>>>>>> minimal >>>>>> >>>>>> Thanks for the information. As I suspected the initial error >>>>>> was elsewhere than tcp_twclose(), never got this assertion >>>>>> before: >>>>>> >>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL >>>>>> >>>>>> from here: >>>>>> >>>>>> static void tcp_detach(struct socket *so, struct inpcb *inp) { >>>>>> struct tcpcb *tp; >>>>>> >>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >>>>>> >>>>>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D inp"));=20 >>>>>> KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D >>>>>> so")); >>>>>> >>>>>> tp =3D intotcpcb(inp); >>>>>> >>>>>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags &=20 >>>>>> INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: INP_TIMEWAIT >>>>>> && " "INP_DROPPED && tp !=3D NULL")); >>>>>> >>>>>> Let me check if I could find a path that could lead to this=20 >>>>>> unexpected case. Unexpected because: INP_DROPPED and=20 >>>>>> inp->inp_ppcb is set to NULL are set at same time here: >>>>>> >>>>>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket >>>>>> *so; struct inpcb *inp; >>>>>> >>>>>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT),=20 >>>>>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw,=20 >>>>>> ("tcp_twclose: inp_ppcb !=3D tw"));=20 >>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */=20 >>>>>> INP_WLOCK_ASSERT(inp); >>>>>> >>>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;=20 >>>>>> in_pcbdrop(inp); ... >>>>>> >>>>>> Interesting [...] >>> >>> Just a quick update. Julien is pursuing this off list with a core >>> dump and we are now waiting for a new core dump with the first >>> KASSERT removed. this is on a stable/10 kernel. >> >> By the way Palle could you also run below Dtrace script to see where >> this tcp_close() in INP_TIMEWAIT comes from: >> >> $ cat tcp-close-tw.d >> fbt::tcp_close:entry >> /args[0]->t_inpcb->inp_flags & 0x01000000/ >> { >> @s1[stack()] =3D count() >> } >> >> tick-1sec { >> printa(@s1); >> } >> $ sudo dtrace -s tcp-close-tw.d >> >> And share any backtraces reported in this dtrace script output. >> >> George, could you check if this dtrace script makes sense for you, and= >> if you have any improvements to add, I am quite a rookie in Dtrace scr= ipts. >=20 > Shall I let the dtrace script run continuously until the machine crashe= s? Or just run it once? Continuously until the machine crashes. You can report any backstrace outputs like: kernel`tcp_usr_close+0x86 kernel`soclose+0xe4 kernel`_fdrop+0x29 kernel`closef+0x237 kernel`closefp+0x95 kernel`amd64_syscall+0x357 kernel`0xffffffff80c83c4b 1 before the machine crashes. But I expect the problematic case detection with Dtrace to be quickly followed by the crash. Will see. -- Julien --epRRqLROkIsrJcaDOpMFvODp97niGFVa6 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJWAvBLAAoJEKVlQ5Je6dhxCaMH/3bDejnfXmp8vdhssczFoK7w rcjM/BZOpwLAMRFujSnIvokDC3eNhFFg2xM6ieECA0yAPSqN15jG7RBAiK6J2itS Lfo3LNgnoBfQMgXnkU1kpNZ4n5bQLVFIl2MKvOYyDjphhtDcUj0LUBICGt6t9CLN D0SU7+R6PkUSNpHXnI0K8K/cfgtkiXZhSAS6sucgOqhPiRn6gT0lFDql3LBV7QqJ w1RMjdOzO1jgIGJZl4J+2yer7rzFeciIasjfPtEk7NkvVkKh0q99jDTQGcRkwpnb BxS+ux2wJ4hiMV6F9LVAtgV4T000FnhvVFljL5Z8jY0hqJDPuzl2TIkEGttO5aE= =1EB0 -----END PGP SIGNATURE----- --epRRqLROkIsrJcaDOpMFvODp97niGFVa6--