From: Scott Long <scottl@samsco.org>
Subject: Re: kernel: mps0: Out of chain frames, consider increasing hw.mps.max_chains.
Date: Tue, 8 Mar 2016 12:29:56 -0800
To: Slawa Olhovchenkov
Cc: freebsd-stable@freebsd.org

> On Mar 8, 2016, at 11:02 AM, Slawa Olhovchenkov wrote:
>
> On Tue, Mar 08, 2016 at 10:56:39AM -0800, Scott Long wrote:
>
>>
>>> On Mar 8, 2016, at 10:48 AM, Slawa Olhovchenkov wrote:
>>>
>>> On Tue, Mar 08, 2016 at 10:34:23AM -0800, Scott Long wrote:
>>>
>>>>
>>>>> On Mar 8, 2016, at 10:07 AM, Slawa Olhovchenkov wrote:
>>>>>
>>>>> On Mon, Mar 07, 2016 at 02:10:12PM +0300, Slawa Olhovchenkov wrote:
>>>>>
>>>>>>>>>> Is this allocated once for all controllers, or once for every
>>>>>>>>>> controller?
>>>>>>>>>
>>>>>>>>> It's per-controller.
>>>>>>>>>
>>>>>>>>> I've thought about making the tuning dynamic at runtime. I
>>>>>>>>> implemented similar dynamic tuning for other drivers, but it
>>>>>>>>> seemed overly complex for low benefit. Implementing it for this
>>>>>>>>> driver would be possible but would require some significant code
>>>>>>>>> changes.
>>>>>>>>
>>>>>>>> What causes chain_free + io_cmds_active << max_chains?
>>>>>>>> Can one cmd use many chains?
>>>>>>>
>>>>>>> Yes. A request uses an active command, and depending on the size
>>>>>>> of the I/O, it might use several chain frames.
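[Aside, to illustrate the point above: a SAS controller describes each I/O
with a scatter/gather list, and the main request frame only has room for a
few S/G entries, so a large or fragmented I/O spills the remainder into
chain frames. Below is a minimal sketch of that arithmetic, not the actual
mps(4) code; SGES_IN_REQUEST and SGES_PER_CHAIN are made-up example values.]

#include <stdio.h>

#define SGES_IN_REQUEST  3  /* assumed: S/G entries inline in the request */
#define SGES_PER_CHAIN  14  /* assumed: S/G entries per 128-byte chain */

/* How many chain frames does an I/O with nsegs S/G segments consume? */
static int
chains_needed(int nsegs)
{
        if (nsegs <= SGES_IN_REQUEST)
                return (0);
        /* Round up: a partially filled chain still costs a whole frame. */
        return ((nsegs - SGES_IN_REQUEST + SGES_PER_CHAIN - 1) /
            SGES_PER_CHAIN);
}

int
main(void)
{
        /* A 1MB I/O built from discontiguous 4KB pages has 256 segments. */
        printf("chains for 256 segments: %d\n", chains_needed(256));
        return (0);
}

[This is why chain_free can drop much faster than io_cmds_active rises: a
single command can pin many chain frames at once.]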
>>>>>
>>>>> I have been playing with max_chains and am seeing what looks like a
>>>>> significant cost to handling max_chains: with 8192 the system
>>>>> responded badly versus 2048. Now I am trying 3192; the response is
>>>>> like with 2048.
>>>>
>>>> Hi, I'm not sure I understand what you're saying. You said that you
>>>> tried 8192, but the system still complained of being out of chain
>>>> frames? Now you are trying fewer, only 3192?
>>>
>>> With 8192 the system did not complain of being out of chain frames, but
>>> it seemed to need more CPU power to handle this chain list -- the
>>> traffic graph (this host serves HTTP via nginx) had a lot of "jerking";
>>> with 3192 the traffic graph is smoother.
>>
>> Hi,
>>
>> The CPU overhead of having more chain frames is nil. They are just
>> objects in a list, and processing the list is O(1), not O(n). What
>> you are likely seeing is other problems, with the VM and VFS-BIO
>> systems struggling to deal with the amount of I/O that you are doing.
>> Depending on what kind of I/O you are doing (buffered filesystem
>> reads/writes, memory-mapped I/O, unbuffered I/O) there are limits
>> and high/low water marks on how much I/O can be outstanding, and
>> when the limits are reached processes are put to sleep and then race
>> back in when they are woken up. This causes poor, oscillating
>> system behavior. There's some tuning you can do to increase the
>> limits, but yes, it's a problem that behaves poorly in an untuned
>> system.
>
> Sorry, I don't understand your point: how can a large number of unused
> chain frames consume CPU power?

A 'chain frame' is 128 bytes. By jumping from 2048 to 8192 chain frames
allocated, you've jumped from 256KB to 1MB of allocated memory. This sounds
like a lot, but if you're doing enough I/O to saturate the tunings then you
likely have many GB of RAM. The 1MB consumed is going to be well less than
1% of what you have, and likely 0.1 to 0.01%. So it's unlikely that the VM
is having to work much harder to deal with the missing memory. As for the
chain frames themselves, they are stored on a linked list, and that list is
never walked from head to tail. The driver adds to the head and removes
from the head, so the length of the list costs nothing.

For comparison, we use 4 'mps' controllers in our servers at Netflix, and
run 20Gbps (2.5GB/s) through them. We've done extensive profiling and
tuning of the kernel, and we've never measured a change in cost from having
different numbers of chain frames, other than the difficulties that come
from having too few. The problems lie in the VM and VFS-BIO interfaces
being poorly tuned for modern workloads.

Scott
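[To make the O(1) claim concrete, here is a minimal userland sketch of the
push/pop-at-head free list described above, using the sys/queue.h SLIST
macros. It is not the driver's actual code, and the names are invented for
the example; the point is that only the amount of preallocated memory
changes with max_chains, never the cost of an allocation.]

#include <sys/queue.h>
#include <stdio.h>
#include <stdlib.h>

struct chain {
        SLIST_ENTRY(chain) link;
        char frame[128];                /* a chain frame is 128 bytes */
};

static SLIST_HEAD(, chain) chain_free_list =
    SLIST_HEAD_INITIALIZER(chain_free_list);

static struct chain *
chain_get(void)
{
        struct chain *ch = SLIST_FIRST(&chain_free_list);

        if (ch != NULL)
                SLIST_REMOVE_HEAD(&chain_free_list, link);      /* O(1) */
        return (ch);
}

static void
chain_put(struct chain *ch)
{
        SLIST_INSERT_HEAD(&chain_free_list, ch, link);          /* O(1) */
}

int
main(void)
{
        /* Preallocate 8192 frames; only memory use grows, not op cost. */
        for (int i = 0; i < 8192; i++) {
                struct chain *ch = calloc(1, sizeof(*ch));

                if (ch != NULL)
                        chain_put(ch);
        }
        printf("first free chain: %p\n", (void *)chain_get());
        return (0);
}

[For reference, the tunable named in the subject line is set at boot time,
e.g. hw.mps.max_chains="4096" in /boot/loader.conf (the value is just an
example), and the chain_free and io_cmds_active counters discussed above
are exposed per controller under the dev.mps.N sysctl tree.]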