From: Julien Charbon
To: Larry Rosenman
Cc: Gleb Smirnoff, rrs@freebsd.org, hselasky@freebsd.org, net@freebsd.org, current@freebsd.org, owner-freebsd-current@freebsd.org
Subject: Re: panic with tcp timers
Date: Thu, 21 Jul 2016 09:54:20 +0200
Message-ID: <548bf673-580d-350a-9f91-88553f3c82f1@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current
References:
 <20160617045319.GE1076@FreeBSD.org>
 <1f28844b-b4ea-b544-3892-811f2be327b9@freebsd.org>
 <20160620073917.GI1076@FreeBSD.org>
 <1d18d0e2-3e42-cb26-928c-2989d0751884@freebsd.org>

Hi,

On 7/14/16 11:02 PM, Larry Rosenman wrote:
> On 2016-07-14 12:01, Julien Charbon wrote:
>> On 6/20/16 11:55 AM, Julien Charbon wrote:
>>> On 6/20/16 9:39 AM, Gleb Smirnoff wrote:
>>>> On Fri, Jun 17, 2016 at 11:27:39AM +0200, Julien Charbon wrote:
>>>> J> > Comparing stable/10 and head, I see two changes that could
>>>> J> > affect that:
>>>> J> >
>>>> J> > - callout_async_drain
>>>> J> > - switch to READ lock for inp info in tcp timers
>>>> J> >
>>>> J> > That's why you are in To, Julien and Hans :)
>>>> J> >
>>>> J> > We continue investigating, and I will keep you updated.
>>>> J> > However, any help is welcome. I can share cores.
>>>>
>>>> Now, spending some time with cores and adding a bunch of
>>>> extra CTRs, I have a sequence of events that lead to the
>>>> panic. In short, the bug is in the callout system. It seems
>>>> to be not relevant to the callout_async_drain, at least for
>>>> now. The transition to READ lock unmasked the problem, that's
>>>> why NetflixBSD 10 doesn't panic.
>>>>
>>>> The panic requires heavy contention on the TCP info lock.
>>>>
>>>> [CPU 1] the callout fires, tcp_timer_keep entered
>>>> [CPU 1] blocks on INP_INFO_RLOCK(&V_tcbinfo);
>>>> [CPU 2] schedules the callout
>>>> [CPU 2] tcp_discardcb called
>>>> [CPU 2] callout successfully canceled
>>>> [CPU 2] tcpcb freed
>>>> [CPU 1] unblocks... panic
>>>>
>>>> When the lock was WLOCK, all contenders were resumed in a
>>>> sequence they came to the lock. Now, that they are readers,
>>>> once the lock is released, readers are resumed in a "random"
>>>> order, and this allows tcp_discardcb to go before the old
>>>> running callout, and this unmasks the panic.
>>>
>>> Highly interesting. I should be able to reproduce that (will be useful
>>> for testing the corresponding fix).
>>
>> Finally, I was able to reproduce it (without glebius fix). The trick
>> was to really lower TCP keep timer expiration:
>>
>> $ sysctl -a | grep tcp.keep
>> net.inet.tcp.keepidle: 7200000
>> net.inet.tcp.keepintvl: 75000
>> net.inet.tcp.keepinit: 75000
>> net.inet.tcp.keepcnt: 8
>> $ sudo bash -c "sysctl net.inet.tcp.keepidle=10 && sysctl
>> net.inet.tcp.keepintvl=50 && sysctl net.inet.tcp.keepinit=10"
>> Password:
>> net.inet.tcp.keepidle: 7200000 -> 10
>> net.inet.tcp.keepintvl: 75000 -> 50
>> net.inet.tcp.keepinit: 75000 -> 10
>>
>> Note: It will certainly close all your ssh connections to the tested
>> server.
>>
>> Now I will test in order:
>>
>> #1. glebius fix
>> https://svnweb.freebsd.org/base?view=revision&revision=302350
>>
>> #2. rss extra fix
>> https://reviews.freebsd.org/D7135
>>
>> #3. rrs TCP Timer cleanup
>> https://reviews.freebsd.org/D7136
>
> please see also https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210884

My test results so far:

#1. r302350: First glebius TCP timer fix:

No more TCP timer kernel panics during 48h under a load of 200k TCP
queries per second. Sadly, I was unable to reproduce the issue described
here:

panic: bogus refcnt 0 on lle 0xfffff80004608c00
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210884

#2. r303098: Picked up all kernel callout changes since r302350 (updates
to the callout code are indeed always full of surprises):

https://svnweb.freebsd.org/base/head/sys/kern/kern_timeout.c?view=log&pathrev=303098

No kernel panic either.

Still to test:

#3. rrs extra fix (if still relevant now)
https://reviews.freebsd.org/D7135

#4. rrs TCP Timer cleanup:
https://reviews.freebsd.org/D7136

My 2 cents.
--
Julien
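
For readers following the thread, the event sequence quoted from Gleb above can be replayed in a small deterministic toy model. This is only a sketch, not the kernel code and not the content of r302350: the `Callout` class, `stop_buggy`, `stop_careful`, and `replay` are hypothetical names invented for illustration, and the "careful" variant merely shows one plausible way to avoid freeing state while a handler is still in progress.

```python
class Callout:
    """Toy model (hypothetical): a callout may be scheduled and running at once."""

    def __init__(self):
        self.scheduled = False  # armed to fire in the future
        self.running = False    # handler entered but not yet finished

    def schedule(self):
        self.scheduled = True

    def fire(self):
        # The handler starts executing (and, in the quoted scenario,
        # immediately blocks on the info lock).
        self.scheduled = False
        self.running = True

    def stop_buggy(self):
        # Reports success whenever a pending instance was removed,
        # ignoring a handler that is already in progress.
        was_scheduled = self.scheduled
        self.scheduled = False
        return was_scheduled

    def stop_careful(self):
        # Reports success only when no handler is in progress.
        was_scheduled = self.scheduled
        self.scheduled = False
        return was_scheduled and not self.running


def replay(stop_name):
    """Replay the quoted event order.

    Returns True if the blocked handler ends up touching freed state.
    """
    callout = Callout()
    tcpcb = {"freed": False}

    callout.schedule()
    callout.fire()                           # [CPU 1] handler entered, blocks on lock
    callout.schedule()                       # [CPU 2] schedules the callout
    stopped = getattr(callout, stop_name)()  # [CPU 2] teardown tries to cancel
    if stopped:
        tcpcb["freed"] = True                # [CPU 2] frees right away
    use_after_free = tcpcb["freed"]          # [CPU 1] unblocks, touches tcpcb
    callout.running = False
    if not stopped:
        tcpcb["freed"] = True                # deferred free after the handler ends
    return use_after_free


if __name__ == "__main__":
    print("buggy stop   -> use after free:", replay("stop_buggy"))
    print("careful stop -> use after free:", replay("stop_careful"))
```

The model is single-threaded on purpose: the interleaving is fixed to the order given in the quoted message, so the outcome depends only on what the cancel operation reports while the handler is blocked.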