From owner-freebsd-net@freebsd.org Thu Jul 14 20:57:38 2016
Subject: Re: panic with tcp timers
To: Gleb Smirnoff, rrs@FreeBSD.org
References: <20160617045319.GE1076@FreeBSD.org>
 <1f28844b-b4ea-b544-3892-811f2be327b9@freebsd.org>
 <20160620073917.GI1076@FreeBSD.org>
 <1d18d0e2-3e42-cb26-928c-2989d0751884@freebsd.org>
Cc: hselasky@FreeBSD.org, net@FreeBSD.org, current@FreeBSD.org
From: Julien Charbon
Date: Thu, 14 Jul 2016 19:01:11 +0200
In-Reply-To: <1d18d0e2-3e42-cb26-928c-2989d0751884@freebsd.org>
List-Id: Networking and TCP/IP with FreeBSD

 Hi,

On 6/20/16 11:55 AM, Julien Charbon wrote:
> On 6/20/16 9:39 AM, Gleb Smirnoff wrote:
>> On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote:
>> J> > Comparing stable/10 and head, I see two changes that could
>> J> > affect that:
>> J> >
>> J> > - callout_async_drain
>> J> > - switch to READ lock for inp info in tcp timers
>> J> >
>> J> > That's why you are in To, Julien and Hans :)
>> J> >
>> J> > We continue investigating, and I will keep you updated.
>> J> > However, any help is welcome. I can share cores.
>>
>> Now, spending some time with cores and adding a bunch of
>> extra CTRs, I have a sequence of events that lead to the
>> panic. In short, the bug is in the callout system. It seems
>> to be not relevant to the callout_async_drain, at least for
>> now. The transition to READ lock unmasked the problem, that's
>> why NetflixBSD 10 doesn't panic.
>>
>> The panic requires heavy contention on the TCP info lock.
>>
>> [CPU 1] the callout fires, tcp_timer_keep entered
>> [CPU 1] blocks on INP_INFO_RLOCK(&V_tcbinfo);
>> [CPU 2] schedules the callout
>> [CPU 2] tcp_discardcb called
>> [CPU 2] callout successfully canceled
>> [CPU 2] tcpcb freed
>> [CPU 1] unblocks... panic
>>
>> When the lock was WLOCK, all contenders were resumed in a
>> sequence they came to the lock. Now, that they are readers,
>> once the lock is released, readers are resumed in a "random"
>> order, and this allows tcp_discardcb to go before the old
>> running callout, and this unmasks the panic.
>
> Highly interesting. I should be able to reproduce that (will be useful
> for testing the corresponding fix).

Finally, I was able to reproduce it (without glebius fix).
The trick was to drastically lower the TCP keepalive timer values:

$ sysctl -a | grep tcp.keep
net.inet.tcp.keepidle: 7200000
net.inet.tcp.keepintvl: 75000
net.inet.tcp.keepinit: 75000
net.inet.tcp.keepcnt: 8

$ sudo bash -c "sysctl net.inet.tcp.keepidle=10 && sysctl net.inet.tcp.keepintvl=50 && sysctl net.inet.tcp.keepinit=10"
Password:
net.inet.tcp.keepidle: 7200000 -> 10
net.inet.tcp.keepintvl: 75000 -> 50
net.inet.tcp.keepinit: 75000 -> 10

 Note: This will certainly close all your ssh connections to the tested
server.

Now I will test in order:

#1. glebius fix
https://svnweb.freebsd.org/base?view=revision&revision=302350

#2. rrs extra fix
https://reviews.freebsd.org/D7135

#3. rrs TCP Timer cleanup
https://reviews.freebsd.org/D7136

My panic for reference:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 10; apic id = 28
[root@atlas-dl360-4 ~]# instruction pointer = 0x20:0xffffffff80c346f1
stack pointer           = 0x28:0xfffffe1f29b848b0
frame pointer           = 0x28:0xfffffe1f29b848e0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi4: clock (4))
trap number             = 9
panic: general protection fault
cpuid = 10
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe1f29b844a0
vpanic() at vpanic+0x182/frame 0xfffffe1f29b84520
panic() at panic+0x43/frame 0xfffffe1f29b84580
trap_fatal() at trap_fatal+0x351/frame 0xfffffe1f29b845e0
trap() at trap+0x820/frame 0xfffffe1f29b847f0
calltrap() at calltrap+0x8/frame 0xfffffe1f29b847f0
--- trap 0x9, rip = 0xffffffff80c346f1, rsp = 0xfffffe1f29b848c0, rbp = 0xfffffe1f29b848e0 ---
tcp_timer_keep() at tcp_timer_keep+0x51/frame 0xfffffe1f29b848e0
softclock_call_cc() at softclock_call_cc+0x19c/frame 0xfffffe1f29b849c0
softclock() at softclock+0x47/frame 0xfffffe1f29b849e0
intr_event_execute_handlers() at intr_event_execute_handlers+0x96/frame 0xfffffe1f29b84a20
ithread_loop() at ithread_loop+0xa6/frame 0xfffffe1f29b84a70
fork_exit() at fork_exit+0x84/frame 0xfffffe1f29b84ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe1f29b84ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

--
Julien
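
P.S. For readers following the thread, the [CPU 1]/[CPU 2] event sequence
quoted in this message can be replayed in a small single-threaded toy model.
This is only a sketch under stated assumptions: it is not the FreeBSD
callout(9) code, and every name in it (Tcpcb, ToyCallout, stop_naive,
stop_checked, replay) is hypothetical. It assumes the control block is freed
only when the stop call reports success, which is how the quoted sequence
reads. stop_checked is an illustration of the general idea of refusing to
report success while a handler is in flight; it is not the fix from r302350
or D7135.

```python
"""Toy replay of the quoted event sequence (NOT the kernel implementation)."""


class Tcpcb:
    """Stand-in for a TCP control block; 'freed' marks a released block."""

    def __init__(self):
        self.freed = False


class ToyCallout:
    """Hypothetical callout tracking 'pending' and 'running' separately."""

    def __init__(self):
        self.pending = False  # scheduled to fire in the future
        self.running = False  # handler entered and not yet returned

    def schedule(self):
        self.pending = True

    def fire(self):
        # The handler is entered: no longer pending, now running.
        self.pending = False
        self.running = True

    def stop_naive(self):
        # Assumed buggy behaviour: only the pending instance is considered,
        # so a handler that is already running goes unnoticed.
        cancelled = self.pending
        self.pending = False
        return cancelled

    def stop_checked(self):
        # Illustrative variant: report success only if no handler is running.
        cancelled = self.pending and not self.running
        self.pending = False
        return cancelled


def replay(stop_method_name):
    """Replay the sequence; return True if the handler would see a freed tcpcb."""
    tp = Tcpcb()
    co = ToyCallout()

    co.schedule()
    co.fire()      # [CPU 1] callout fires, handler entered, blocks on the lock
    co.schedule()  # [CPU 2] schedules the callout again
    # [CPU 2] discard path: free the tcpcb only if the stop reported success
    if getattr(co, stop_method_name)():
        tp.freed = True
    # [CPU 1] unblocks and would now dereference the tcpcb
    return tp.freed
```

In this model, replay("stop_naive") reports that the still-running handler
resumes with a freed control block, while replay("stop_checked") does not,
because the stop call refuses to claim success while the handler is running.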