From: Scott Long <scottl@samsco.org>
Subject: Re: kernel: mps0: Out of chain frames, consider increasing hw.mps.max_chains.
Date: Tue, 8 Mar 2016 12:29:56 -0800
To: Slawa Olhovchenkov
Cc: freebsd-stable@freebsd.org

> On Mar 8, 2016, at 11:02 AM, Slawa Olhovchenkov wrote:
>
> On Tue, Mar 08, 2016 at 10:56:39AM -0800, Scott Long wrote:
>
>>
>>> On Mar 8, 2016, at 10:48 AM, Slawa Olhovchenkov wrote:
>>>
>>> On Tue, Mar 08, 2016 at 10:34:23AM -0800, Scott Long wrote:
>>>
>>>>
>>>>> On Mar 8, 2016, at 10:07 AM, Slawa Olhovchenkov wrote:
>>>>>
>>>>> On Mon, Mar 07, 2016 at 02:10:12PM +0300, Slawa Olhovchenkov wrote:
>>>>>
>>>>>>>>>> Is this allocated once for all controllers, or once for every
>>>>>>>>>> controller?
>>>>>>>>>
>>>>>>>>> It's per-controller.
>>>>>>>>>
>>>>>>>>> I've thought about making the tuning dynamic at runtime. I
>>>>>>>>> implemented similar dynamic tuning for other drivers, but it
>>>>>>>>> seemed overly complex for low benefit. Implementing it for this
>>>>>>>>> driver would be possible but would require some significant code
>>>>>>>>> changes.
>>>>>>>>
>>>>>>>> What causes chain_free + io_cmds_active << max_chains?
>>>>>>>> Can one cmd use many chains?
>>>>>>>
>>>>>>> Yes. A request uses an active command, and depending on the size
>>>>>>> of the I/O, it might use several chain frames.
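[Aside, to illustrate the point above: a SAS controller describes each I/O
with a scatter/gather list, and the main request frame only has room for a
few S/G entries, so a large or fragmented I/O spills the remainder into
chain frames. Below is a minimal sketch of that arithmetic, not the actual
mps(4) code; SGES_IN_REQUEST and SGES_PER_CHAIN are made-up example values.]

#include <stdio.h>

#define SGES_IN_REQUEST  3  /* assumed: S/G entries inline in the request */
#define SGES_PER_CHAIN  14  /* assumed: S/G entries per 128-byte chain */

/* How many chain frames does an I/O with nsegs S/G segments consume? */
static int
chains_needed(int nsegs)
{
        if (nsegs <= SGES_IN_REQUEST)
                return (0);
        /* Round up: a partially filled chain still costs a whole frame. */
        return ((nsegs - SGES_IN_REQUEST + SGES_PER_CHAIN - 1) /
            SGES_PER_CHAIN);
}

int
main(void)
{
        /* A 1MB I/O built from discontiguous 4KB pages has 256 segments. */
        printf("chains for 256 segments: %d\n", chains_needed(256));
        return (0);
}

[This is why chain_free can drop much faster than io_cmds_active rises: a
single command can pin many chain frames at once.]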
>>>>>
>>>>> I have been playing with max_chains and am seeing what looks like a
>>>>> significant cost to handling max_chains: with 8192 the system
>>>>> responded badly versus 2048. Now I am trying 3192; the response is
>>>>> like with 2048.
>>>>
>>>> Hi, I'm not sure I understand what you're saying. You said that you
>>>> tried 8192, but the system still complained of being out of chain
>>>> frames? Now you are trying fewer, only 3192?
>>>
>>> With 8192 the system did not complain of being out of chain frames, but
>>> it seemed to need more CPU power to handle this chain list -- the
>>> traffic graph (this host serves HTTP via nginx) had a lot of "jerking";
>>> with 3192 the traffic graph is smoother.
>>
>> Hi,
>>
>> The CPU overhead of having more chain frames is nil. They are just
>> objects in a list, and processing the list is O(1), not O(n). What
>> you are likely seeing is other problems, with the VM and VFS-BIO
>> systems struggling to deal with the amount of I/O that you are doing.
>> Depending on what kind of I/O you are doing (buffered filesystem
>> reads/writes, memory-mapped I/O, unbuffered I/O) there are limits
>> and high/low water marks on how much I/O can be outstanding, and
>> when the limits are reached processes are put to sleep and then race
>> back in when they are woken up. This causes poor, oscillating
>> system behavior. There's some tuning you can do to increase the
>> limits, but yes, it's a problem that behaves poorly in an untuned
>> system.
>
> Sorry, I don't understand your point: how can a large number of unused
> chain frames consume CPU power?

A 'chain frame' is 128 bytes. By jumping from 2048 to 8192 chain frames
allocated, you've jumped from 256KB to 1MB of allocated memory. This sounds
like a lot, but if you're doing enough I/O to saturate the tunings then you
likely have many GB of RAM. The 1MB consumed is going to be well less than
1% of what you have, and likely 0.1 to 0.01%. So it's unlikely that the VM
is having to work much harder to deal with the missing memory. As for the
chain frames themselves, they are stored on a linked list, and that list is
never walked from head to tail. The driver adds to the head and removes
from the head, so the length of the list costs nothing.

For comparison, we use 4 'mps' controllers in our servers at Netflix, and
run 20Gbps (2.5GB/s) through them. We've done extensive profiling and
tuning of the kernel, and we've never measured a change in cost from having
different numbers of chain frames, other than the difficulties that come
from having too few. The problems lie in the VM and VFS-BIO interfaces
being poorly tuned for modern workloads.

Scott
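[To make the O(1) claim concrete, here is a minimal userland sketch of the
push/pop-at-head free list described above, using the sys/queue.h SLIST
macros. It is not the driver's actual code, and the names are invented for
the example; the point is that only the amount of preallocated memory
changes with max_chains, never the cost of an allocation.]

#include <sys/queue.h>
#include <stdio.h>
#include <stdlib.h>

struct chain {
        SLIST_ENTRY(chain) link;
        char frame[128];                /* a chain frame is 128 bytes */
};

static SLIST_HEAD(, chain) chain_free_list =
    SLIST_HEAD_INITIALIZER(chain_free_list);

static struct chain *
chain_get(void)
{
        struct chain *ch = SLIST_FIRST(&chain_free_list);

        if (ch != NULL)
                SLIST_REMOVE_HEAD(&chain_free_list, link);      /* O(1) */
        return (ch);
}

static void
chain_put(struct chain *ch)
{
        SLIST_INSERT_HEAD(&chain_free_list, ch, link);          /* O(1) */
}

int
main(void)
{
        /* Preallocate 8192 frames; only memory use grows, not op cost. */
        for (int i = 0; i < 8192; i++) {
                struct chain *ch = calloc(1, sizeof(*ch));

                if (ch != NULL)
                        chain_put(ch);
        }
        printf("first free chain: %p\n", (void *)chain_get());
        return (0);
}

[For reference, the tunable named in the subject line is set at boot time,
e.g. hw.mps.max_chains="4096" in /boot/loader.conf (the value is just an
example), and the chain_free and io_cmds_active counters discussed above
are exposed per controller under the dev.mps.N sysctl tree.]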