Date: Wed, 26 Aug 2015 08:56:27 +0300
From: Daniel Braniss <danny@cs.huji.ac.il>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Hans Petter Selasky <hps@selasky.org>, pyunyh@gmail.com,
 FreeBSD Net <freebsd-net@freebsd.org>,
 FreeBSD stable <freebsd-stable@freebsd.org>,
 Gleb Smirnoff <glebius@FreeBSD.org>
Subject: Re: ix(intel) vs mlxen(mellanox) 10Gb performance
Message-ID: <1E679659-BA50-42C3-B569-03579E322685@cs.huji.ac.il>
In-Reply-To: <2112273205.29795512.1440419111720.JavaMail.zimbra@uoguelph.ca>
References: <1D52028A-B39F-4F9B-BD38-CB1D73BF5D56@cs.huji.ac.il>
 <1153838447.28656490.1440193567940.JavaMail.zimbra@uoguelph.ca>
 <15D19823-08F7-4E55-BBD0-CE230F67D26E@cs.huji.ac.il>
 <818666007.28930310.1440244756872.JavaMail.zimbra@uoguelph.ca>
 <49173B1F-7B5E-4D59-8651-63D97B0CB5AC@cs.huji.ac.il>
 <1815942485.29539597.1440370972998.JavaMail.zimbra@uoguelph.ca>
 <55DAC623.60006@selasky.org>
 <62C7B1A3-CC6B-41A1-B254-6399F19F8FF7@cs.huji.ac.il>
 <2112273205.29795512.1440419111720.JavaMail.zimbra@uoguelph.ca>
> On Aug 24, 2015, at 3:25 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> 
> Daniel Braniss wrote:
>> 
>>> On 24 Aug 2015, at 10:22, Hans Petter Selasky <hps@selasky.org> wrote:
>>> 
>>> On 08/24/15 01:02, Rick Macklem wrote:
>>>> The other thing is the degradation seems to cut the rate by about half
>>>> each time.
>>>> 300-->150-->70 I have no idea if this helps to explain it.
>>> 
>>> Might be a NUMA binding issue for the processes involved.
>>> 
>>> man cpuset
>>> 
>>> --HPS
>> 
>> I can't see how this is relevant, given that the same host, using the
>> mellanox/mlxen, behaves much better.
> Well, the "ix" driver has a bunch of tunables for things like "number of queues",
> and although I'll admit I don't understand how these queues are used, I think
> they are related to CPUs and their caches. There is also something called IXGBE_FDIR,
> which others have recommended be disabled. (The code is #ifdef IXGBE_FDIR, but I don't
> know if it is defined for your kernel?) There are also tunables for the interrupt rate and
> something called hw.ixgbe_tx_process_limit, which appears to limit the number of packets
> to send, or something like that?
> (I suspect Hans would understand this stuff much better than I do, since I don't understand
> it at all. ;-)
> 
but how does this explain the fact that, at the same time, the throughput
to the NetApp is about 70MB/s, while to a FreeBSD server it's above 150MB/s?
(window size negotiation?)
switching off TSO evens out this difference.

> At a glance, the mellanox driver looks very different.
> 
>> I'm getting different results with the intel/ix depending on who the nfs
>> server is
>> 
> Who knows until you figure out what is actually going on. It could just be the timing of
> handling the write RPCs, or when the different servers send acks for the TCP segments, or ...
> that causes this for one server and not another.
> 
> One of the principles used when investigating airplane accidents is to "never assume anything"
> and just try to collect the facts until the pieces of the puzzle fall into place. I think the
> same principle works for this kind of stuff.
> I once had a case where a specific read of one NFS file would fail on certain machines.
> I won't bore you with the details, but after weeks we got to the point where we had a lab
> of identical machines (exactly the same hardware and exactly the same software loaded on them)
> and we could reproduce the problem on about half the machines and not the other half. We
> (myself and the guy I worked with) finally noticed that the failing machines were on network
> ports of a given switch. We moved the net cables to another switch and the problem went away.
> --> This particular network switch was broken in such a way that it would garble one specific
>     packet consistently, but worked fine for everything else.
> My point here is that, if someone had suggested "the network switch might be broken" at the
> beginning of investigating this, I would probably have dismissed it, based on "the network is
> working just fine", but in the end, that was the problem.
> --> I am not suggesting you have a broken network switch, just "don't take anything off the
>     table until you know what is actually going on".
> 
> And to be honest, you may never know, but it is fun to try and solve these puzzles.
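(Concretely, the knobs discussed above can be inspected and flipped from the
shell; a minimal sketch, assuming the 10Gb port is ix0, and noting that the
exact tunable names vary between ixgbe driver versions, so list what the
running kernel actually exposes before setting anything:)

    # list the ix/ixgbe tunables and per-device sysctls this kernel knows;
    # loader-only tunables are set in /boot/loader.conf and may not all
    # show up here
    sysctl -a | egrep 'hw\.(ix|ixgbe)\.'
    sysctl dev.ix.0

    # turn TSO off on the interface (the "switching off TSO" above);
    # reversible at run time with "ifconfig ix0 tso"
    ifconfig ix0 -tso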
one needs to find the clues … at the moment:
    when things go bad, they stay bad: ix/nfs/tcp/tso and NetApp
    when things are ok, the numbers fluctuate, which is probably due to
    loads on the system, but they are far above the 70MB/s (100 to 200)

> Beyond what I already suggested, I'd look at the "ix" driver's stats and tunables and
> see if any of the tunables has an effect. (And, yes, it will take time to work through these.)
> 
> Good luck with it, rick
> 
>> 
>> danny
>> 
>> _______________________________________________
>> freebsd-stable@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
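(A minimal sketch of the two follow-ups suggested in the thread, watching the
"ix" driver's stats and trying Hans' cpuset(1) affinity experiment; the CPU
list and pid below are illustrative, not taken from this setup:)

    # how the interrupts are spread across the ix queues
    vmstat -i | grep ix

    # driver/queue counters live under the per-device sysctl tree;
    # look for drops and errors while a transfer is running
    sysctl dev.ix.0 | grep -i -e drop -e err

    # pin a process (the benchmark, or an nfsd on the server) to a fixed
    # set of CPUs to rule out NUMA/affinity effects; 0-5 and 1234 are
    # made-up values
    cpuset -l 0-5 -p 1234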