Date:      Tue, 8 Mar 2016 12:29:56 -0800
From:      Scott Long <scottl@samsco.org>
To:        Slawa Olhovchenkov <slw@zxy.spb.ru>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: kernel: mps0: Out of chain frames, consider increasing hw.mps.max_chains.
Message-ID:  <BA751E70-A3A8-497B-9914-3AD122284311@samsco.org>
In-Reply-To: <20160308190205.GG70809@zxy.spb.ru>
References:  <0F0C78F4-6FE2-43BA-B503-AA04A79F2E70@samsco.org> <20160306212733.GJ11654@zxy.spb.ru> <DFC3C4CF-89D4-417C-AEBA-67F49F3EA1DE@samsco.org> <20160307060407.GK11654@zxy.spb.ru> <5B8DD95A-9FA0-4E16-85A1-87B54035B3F7@samsco.org> <20160307111012.GL11654@zxy.spb.ru> <20160308180746.GE70809@zxy.spb.ru> <6189E959-3489-438E-8D91-9E5E46E2D482@samsco.org> <20160308184823.GF70809@zxy.spb.ru> <CEBE95C5-9167-45AC-9671-DF2C919A1AF3@samsco.org> <20160308190205.GG70809@zxy.spb.ru>


> On Mar 8, 2016, at 11:02 AM, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:
> 
> On Tue, Mar 08, 2016 at 10:56:39AM -0800, Scott Long wrote:
> 
>> 
>>> On Mar 8, 2016, at 10:48 AM, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:
>>> 
>>> On Tue, Mar 08, 2016 at 10:34:23AM -0800, Scott Long wrote:
>>> 
>>>> 
>>>>> On Mar 8, 2016, at 10:07 AM, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:
>>>>> 
>>>>> On Mon, Mar 07, 2016 at 02:10:12PM +0300, Slawa Olhovchenkov wrote:
>>>>> 
>>>>>>>>>> Is this allocated once for all controllers, or once per
>>>>>>>>>> controller?
>>>>>>>>> 
>>>>>>>>> It’s per-controller.
>>>>>>>>> 
>>>>>>>>> I’ve thought about making the tuning dynamic at runtime.  I
>>>>>>>>> implemented similar dynamic tuning for other drivers, but it
>>>>>>>>> seemed overly complex for low benefit.  Implementing it for
>>>>>>>>> this driver would be possible but would require some
>>>>>>>>> significant code changes.
>>>>>>>> 
>>>>>>>> What causes chain_free + io_cmds_active << max_chains?
>>>>>>>> Can one cmd use many chains?
>>>>>>> 
>>>>>>> Yes.  A request uses an active command, and depending on the
>>>>>>> size of the I/O, it might use several chain frames.
>>>>> 
>>>>> I have been experimenting with max_chains and am seeing a
>>>>> significant cost to handling a larger max_chains: with 8192 the
>>>>> system responded badly vs 2048.  Now trying 3192; the response is
>>>>> like it was with 2048.
>>>> 
>>>> Hi, I’m not sure I understand what you’re saying.  You said that
>>>> you tried 8192, but the system still complained of being out of
>>>> chain frames?  Now you are trying fewer, only 3192?
>>> 
>>> With 8192 the system did not complain of being out of chain frames,
>>> but it seemed to need more CPU power to handle this chain list --
>>> the traffic graph (this host serves HTTP via nginx) showed a lot of
>>> "jerking"; with 3192 the traffic graph is smoother.
>> 
>> Hi,
>> 
>> The CPU overhead of doing more chain frames is nil.  They are just
>> objects in a list, and processing the list is O(1), not O(n).  What
>> you are likely seeing is the VM and VFS-BIO systems struggling to
>> deal with the amount of I/O that you are doing.  Depending on what
>> kind of I/O you are doing (buffered filesystem reads/writes, memory
>> mapped I/O, unbuffered I/O) there are limits and high/low water
>> marks on how much I/O can be outstanding, and when the limits are
>> reached processes are put to sleep and then race back in when they
>> are woken up.  This causes poor, oscillating system behavior.
>> There’s some tuning you can do to increase the limits, but yes, it’s
>> a problem that behaves poorly in an untuned system.
> 
> Sorry, I don’t understand your point: how can a large number of
> unused chain frames consume CPU power?

A ‘chain frame’ is 128 bytes.  By jumping from 2048 to 8192 chain frames
allocated, you’ve gone from 256KB to 1MB of allocated memory.  That
sounds like a lot, but if you’re doing enough I/O to saturate the tunings
then you likely have many GB of RAM.  The 1MB consumed will be well under
1% of what you have, and likely 0.1 to 0.01%.  So it’s likely that the VM
is not having to work much harder to deal with the missing memory.  As
for the chain frames themselves, they are stored on a linked list, and
that list is never walked from head to tail.  The driver adds at the head
and removes from the head, so the length of the list costs nothing.
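
To make that concrete, here is a minimal sketch of a head-only free list
(my illustration, not the actual mps(4) code; the real driver keeps its
frames in driver-private structures and takes a lock around these
operations, and every name below is hypothetical):

    #include <stdio.h>

    struct chain_frame {
            struct chain_frame *next;     /* free-list linkage */
            char payload[120];            /* S/G segments would live here */
    };

    static struct chain_frame *free_head; /* head of the free list */

    /* Take a frame off the head of the list: O(1). */
    static struct chain_frame *
    chain_get(void)
    {
            struct chain_frame *cf = free_head;

            if (cf != NULL)
                    free_head = cf->next;
            return (cf);                  /* NULL == out of chain frames */
    }

    /* Put a frame back on the head of the list: O(1). */
    static void
    chain_put(struct chain_frame *cf)
    {
            cf->next = free_head;
            free_head = cf;
    }

    int
    main(void)
    {
            static struct chain_frame pool[8192];
            int i;

            for (i = 0; i < 8192; i++)    /* seed the free list */
                    chain_put(&pool[i]);
            printf("first free frame: %p\n", (void *)chain_get());
            return (0);
    }

Neither operation ever looks past the head element, so a pool of 8192
frames costs exactly as much to manage as a pool of 2048.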

For comparison, we use 4 ‘mps’ controllers in our servers at Netflix, and
run 20Gbps (2.5GB/s) through them.  We’ve done extensive profiling and
tuning of the kernel, and we’ve never measured a change in cost from
having a different number of chain frames, other than the difficulties
that come from having too few.  The problems lie in the VM and VFS-BIO
interfaces being poorly tuned for modern workloads.
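
If it helps to sanity-check the memory numbers above, here’s a throwaway
calculation (128 bytes per frame is the figure from earlier in this
message; the counts are the values you’ve tried in this thread):

    #include <stdio.h>

    #define CHAIN_FRAME_SIZE 128          /* bytes per frame, per above */

    int
    main(void)
    {
            int counts[] = { 2048, 3192, 8192 };
            int i;

            for (i = 0; i < 3; i++)
                    printf("max_chains=%4d -> %4d KB\n", counts[i],
                        counts[i] * CHAIN_FRAME_SIZE / 1024);
            return (0);
    }

That prints 256 KB, 399 KB, and 1024 KB, which is consistent with the
point above: the memory footprint of even the largest setting is tiny,
so the cost you’re seeing has to come from somewhere else.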

Scott



