Date:      Mon, 24 Aug 2015 08:25:11 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Daniel Braniss <danny@cs.huji.ac.il>
Cc:        Hans Petter Selasky <hps@selasky.org>, pyunyh@gmail.com,  FreeBSD Net <freebsd-net@freebsd.org>,  FreeBSD stable <freebsd-stable@freebsd.org>,  Gleb Smirnoff <glebius@FreeBSD.org>
Subject:   Re: ix(intel) vs mlxen(mellanox) 10Gb performance
Message-ID:  <2112273205.29795512.1440419111720.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <62C7B1A3-CC6B-41A1-B254-6399F19F8FF7@cs.huji.ac.il>
References:  <1D52028A-B39F-4F9B-BD38-CB1D73BF5D56@cs.huji.ac.il> <1153838447.28656490.1440193567940.JavaMail.zimbra@uoguelph.ca> <15D19823-08F7-4E55-BBD0-CE230F67D26E@cs.huji.ac.il> <818666007.28930310.1440244756872.JavaMail.zimbra@uoguelph.ca> <49173B1F-7B5E-4D59-8651-63D97B0CB5AC@cs.huji.ac.il> <1815942485.29539597.1440370972998.JavaMail.zimbra@uoguelph.ca> <55DAC623.60006@selasky.org> <62C7B1A3-CC6B-41A1-B254-6399F19F8FF7@cs.huji.ac.il>

Daniel Braniss wrote:
>
> > On 24 Aug 2015, at 10:22, Hans Petter Selasky <hps@selasky.org> wrote:
> >
> > On 08/24/15 01:02, Rick Macklem wrote:
> >> The other thing is the degradation seems to cut the rate by about half
> >> each time.
> >> 300-->150-->70 I have no idea if this helps to explain it.
> >
> > Might be a NUMA binding issue for the processes involved.
> >
> > man cpuset
> >
> > --HPS
>
> I can't see how this is relevant, given that the same host, using the
> mellanox/mlxen, behaves much better.
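For what it's worth, the programmatic equivalent of cpuset(1) is
cpuset_setaffinity(2), so pinning the nfsd (or the test client) to a fixed
set of CPUs is a cheap way to rule the binding theory in or out. A rough
sketch (the CPU range is only a placeholder; pick cores on the same
package/domain as the NIC):

#include <sys/param.h>
#include <sys/cpuset.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Pin an existing process to CPUs 0-3, roughly what
 * "cpuset -l 0-3 -p <pid>" does from the command line.
 */
int
main(int argc, char **argv)
{
	cpuset_t mask;
	pid_t pid;
	int cpu;

	if (argc != 2) {
		fprintf(stderr, "usage: %s pid\n", argv[0]);
		return (1);
	}
	pid = (pid_t)strtol(argv[1], NULL, 10);

	CPU_ZERO(&mask);
	for (cpu = 0; cpu < 4; cpu++)		/* placeholder CPU range */
		CPU_SET(cpu, &mask);

	if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, pid,
	    sizeof(mask), &mask) != 0) {
		perror("cpuset_setaffinity");
		return (1);
	}
	return (0);
}
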
Well, the "ix" driver has a bunch of tunables for things like "number of
queues", and although I'll admit I don't understand how these queues are used,
I think they are related to CPUs and their caches. There is also something
called IXGBE_FDIR, which others have recommended be disabled. (The code is
#ifdef IXGBE_FDIR, but I don't know if it is defined for your kernel?) There
are also tunables for the interrupt rate and something called
hw.ixgbe_tx_process_limit, which appears to limit the number of packets to
send, or something like that?
(I suspect Hans would understand this stuff much better than I do, since I
don't understand it at all. ;-)
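
If you want a quick look at what those knobs are set to on your box,
sysctlbyname(3) will read them. The OID names below are only my guesses
(they have moved around between driver versions), so take whatever
"sysctl -a | grep ix" reports as authoritative:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

/*
 * Print an integer sysctl if it exists; the names passed in below are
 * guesses at the ix/ixgbe tunables and may differ on your kernel.
 */
static void
show(const char *name)
{
	int val;
	size_t len = sizeof(val);

	if (sysctlbyname(name, &val, &len, NULL, 0) == 0)
		printf("%s = %d\n", name, val);
	else
		printf("%s: not readable here (different name or type?)\n",
		    name);
}

int
main(void)
{
	show("hw.ix.num_queues");		/* queue pairs (guess) */
	show("hw.ix.max_interrupt_rate");	/* interrupt moderation (guess) */
	show("hw.ix.tx_process_limit");		/* tx packets per pass (guess) */
	show("hw.ix.rx_process_limit");		/* rx packets per pass (guess) */
	return (0);
}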

At a glance, the mellanox driver looks very different.

> I'm getting different results with the intel/ix depending on who the nfs
> server is.
>
Who knows, until you figure out what is actually going on. It could just be
the timing of handling the write RPCs, or when the different servers send acks
for the TCP segments, or ... that causes this for one server and not another.

One of the principles used when investigating airplane accidents is to "never
assume anything" and just try to collect the facts until the pieces of the
puzzle fall into place. I think the same principle works for this kind of
stuff.
I once had a case where a specific read of one NFS file would fail on certain
machines. I won't bore you with the details, but after weeks we got to the
point where we had a lab of identical machines (exactly the same hardware and
exactly the same software loaded on them) and we could reproduce the problem
on about half the machines and not the other half. We (myself and the guy I
worked with) finally noticed that the failing machines were on network ports
for a given switch. We moved the net cables to another switch and the problem
went away.
--> This particular network switch was broken in such a way that it would
    garble one specific packet consistently, but worked fine for everything
    else.
My point here is that, if someone had suggested "the network switch might be
broken" at the beginning of the investigation, I would probably have dismissed
it, based on "the network is working just fine", but in the end, that was the
problem.
--> I am not suggesting you have a broken network switch, just "don't take
    anything off the table until you know what is actually going on".

And to be honest, you may never know, but it is fun to try and solve these
puzzles. Beyond what I already suggested, I'd look at the "ix" driver's stats
and tunables and see if any of the tunables has an effect. (And, yes, it will
take time to work through these.)
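
When poking at the stats, sampling one of the per-queue counters before and
after a test run will at least show whether the NFS traffic is all landing on
one queue. Again, the OID name here is just an example; use whatever
"sysctl dev.ix.0" lists on your machine:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Sample one per-queue counter twice, ten seconds apart.  The OID below
 * is only an example; the real names come from "sysctl dev.ix.0".
 */
int
main(void)
{
	const char *oid = "dev.ix.0.queue0.tx_packets";	/* example OID */
	uint64_t before = 0, after = 0;
	size_t len;

	len = sizeof(before);
	if (sysctlbyname(oid, &before, &len, NULL, 0) != 0) {
		perror(oid);
		return (1);
	}
	sleep(10);		/* run the NFS write test in another window */
	len = sizeof(after);
	if (sysctlbyname(oid, &after, &len, NULL, 0) != 0) {
		perror(oid);
		return (1);
	}
	printf("%s: %ju packets in 10 seconds\n", oid,
	    (uintmax_t)(after - before));
	return (0);
}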

Good luck with it, rick

>
> danny
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


