From: Randy Stewart
Date: Thu, 22 Jan 2015 06:26:53 -0500
Subject: Re: svn commit: r277213 - in head: share/man/man9 sys/kern sys/ofed/include/linux sys/sys
To: Hans Petter Selasky
Cc: Adrian Chadd, "src-committers@freebsd.org", "svn-src-all@freebsd.org", "svn-src-head@freebsd.org", Konstantin Belousov, "M. Warner Losh"
Message-Id: <04866FE0-43BF-4569-9B67-7ED5F6F4F736@lakerest.net>
In-Reply-To: <54C0B75B.9070305@selasky.org>

Hans:

We (Netflix) run in production 35% of the internet with these very
things you identify, no lock and all. We *do* have some issues we are
looking at, but so far I have *never* connected the dots the way you
were claiming that would cause a crash. I can see where TCP would do
incorrect retransmissions, but I did *not* see a crash.

Now granted, my look at this was quick, but that was due to time
constraints and the holidays. I am going to put myself full-time on
this to see if I can understand both how you got at "there is a panic
in tcp" and why it must fully be the callout subsystem, such that we
need to re-write large parts of it.

You *may* be correct that a re-write is needed; you *may* be completely
incorrect. In either case I plan to dig into this and find out.

R

> On Jan 22, 2015, at 3:39 AM, Hans Petter Selasky wrote:
>
> On 01/22/15 09:10, Konstantin Belousov wrote:
>> On Thu, Jan 22, 2015 at 08:14:26AM +0100, Hans Petter Selasky wrote:
>>> On 01/22/15 06:26, Warner Losh wrote:
>>> >
>>>>> The code simply needs an update. It is not broken in any way - right? If it is not broken, fixing it is not that urgent.
>>>>
>>>> Radically changing the performance characteristics is breaking the code. Performance regression in the TCP stack is urgent to fix.
>>
>>> Not being able to enumerate what all the consumers are that use this and
>>> provide an analysis about why they aren't important to fix is a bug in
>>> your process, and in your interaction with the project. We simply do not
>>> operate that way.
>> Right, I completely agree with this statement.
>>
>>
>>> Hi,
>>>
>>> My plan is to work out a patch for the TCP stack today, which only
>>> changes the callout_init() call or its function. This should not need any
>>> particular review. I'll let adrian test and review, because I think he
>>> is closer to me timezone wise and you're standing on my head saying it's
>>> urgent. If he is still not happy, I can back my change out. Else it
>>> remains in -current AS-IS.
>> TCP regression was noted, so it is brought in front. There is nothing
>> else which makes TCP issue different from other (hidden) issues.
>>
>> ===========================
>>> MFC to 10-stable I can delay for sure until
>>> all issues you report to me are fixed.
>> ===========================
>>
>> Sigh, you still do not understand. It is your duty to identify all pieces
>> which break after your change. After that, we can argue whether each of
>> them is critical or not to allow the migration. But this must have been
>> done before the KPI change hit the tree.
>>
>
> Hi,
>
> Are you saying that pieces of code that run completely unlocked using "volatile" as the only synchronization mechanism are better than what I would call a temporary and hopefully short TCP stack performance loss?
>
> I don't understand? How frequently do you reboot your boxes? Maybe once every day? And you don't care?
>
> --HPS
>

-----
Randall Stewart
randall@lakerest.net