Date:      Sat, 2 Mar 2013 06:24:54 -0800 (PST)
From:      Barney Cordoba <barney_cordoba@yahoo.com>
To:        "Christopher D. Harrison" <harrison@biostat.wisc.edu>
Cc:        freebsd-net@freebsd.org
Subject:   Re: igb network lockups
Message-ID:  <1362234294.77730.YahooMailClassic@web121603.mail.ne1.yahoo.com>
In-Reply-To: <512BAF8D.7080308@biostat.wisc.edu>



--- On Mon, 2/25/13, Christopher D. Harrison <harrison@biostat.wisc.edu> wrote:

> From: Christopher D. Harrison <harrison@biostat.wisc.edu>
> Subject: Re: igb network lockups
> To: "Jack Vogel" <jfvogel@gmail.com>
> Cc: freebsd-net@freebsd.org
> Date: Monday, February 25, 2013, 1:38 PM
> Sure,
> The problem appears on both systems running with ALTQ and
> vanilla.
>     -C
> On 02/25/13 12:29, Jack Vogel wrote:
> > I've not heard of this problem, but I think most users do not use
> > ALTQ, and we (Intel) do not test using it. Can it be eliminated
> > from the equation?
> >
> > Jack
> >
> >
> > On Mon, Feb 25, 2013 at 10:16 AM, Christopher D. Harrison
> > <harrison@biostat.wisc.edu <mailto:harrison@biostat.wisc.edu>> wrote:
> >
> >     I recently have been experiencing network "freezes" and network
> >     "lockups" on our Freebsd 9.1 systems which are running zfs and nfs
> >     file servers.
> >     I upgraded from 9.0 to 9.1 about 2 months ago and we have been
> >     having issues with almost bi-monthly. The issue manifests in the
> >     system becomes unresponsive to any/all nfs clients. The system
> >     is not resource bound as our I/O is low to disk and our network is
> >     usually in the 20mbit/40mbit range. We do notice a correlation
> >     between temporary i/o spikes and network freezes but not enough to
> >     send our system in to "lockup" mode for the next 5min. Currently
> >     we have 4 igb nics in 2 aggr's with 8 queue's per nic and our
> >     dev.igb reports:
> >
> >     dev.igb.3.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.4
> >
> >     I am almost certain the problem is with the igb driver as a friend
> >     is also experiencing the same problem with the same intel igb nic.
> >     He has addressed the issue by restarting the network using netif
> >     on his systems. According to my friend, once the network
> >     interfaces get cleared, everything comes back and starts working
> >     as expected.
> >
> >     I have noticed an issue with the igb driver and I was looking for
> >     thoughts on how to help address this problem.
> >     http://freebsd.1045724.n5.nabble.com/em-igb-if-transmit-drbr-and-ALTQ-td5760338.html
> >
> >     Thoughts/Ideas are greatly appreciated!!!
> >
> >         -C

Do you have 32 cpus in the system? With 4 nics at 8 queues each, that is
32 queues. You've created a lock contention nightmare; frankly I'm
surprised that the system runs at all.

Try running with 1 queue per nic. The point of using queues is to spread
the load; the fact that you're even using queues with such a minuscule load
is a commentary on the blind use of "features" without any explanation or
understanding of what they do.
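
A minimal sketch of the single-queue setting suggested above, assuming the
driver in use honors the hw.igb.num_queues loader tunable (verify against
the igb(4) man page or driver source for your release before relying on
it). Tunables of this kind are read at boot, so the line would go in
/boot/loader.conf and take effect after a reboot:

```conf
# /boot/loader.conf
# Assumption: hw.igb.num_queues is the igb(4) queue-count tunable, where
# 0 means "autoconfigure" and 1 forces a single queue per nic.
hw.igb.num_queues=1
```

After rebooting, the per-queue entries in the dev.igb sysctl output and the
interrupt vectors listed by vmstat -i should show whether the change took
hold.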

Does igb still bind to CPUs without regard to whether it's a real cpu or
a hyperthread? This needs to be removed.

I wish that someone who understood this stuff would have a beer with Jack
and explain to him why this design is defective. The "default" for this
driver is almost always the wrong configuration.

You don't need to spread the load with 40Mb/s throughput, and using
multiple queues will use a lot more CPU than using just 1. Do you really
want 4 cpus using 10% instead of 1 using 14%?

You also should consider increasing your tx buffers; a property of
applications like ALTQ is that they tend to send out big bursts of
packets and they can overflow the rings. I'm not specifically familiar with
ALTQ so I'm not sure how it handles such things; nor am I sure of how it
handles multiple tx queues, if at all.
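
As a hedged illustration of the tx buffer suggestion, assuming the driver
exposes the transmit descriptor count as the hw.igb.txd loader tunable: the
value 4096 below is an example and is believed, not confirmed here, to be
the upper limit the driver accepts, so check the limits for your driver
version.

```conf
# /boot/loader.conf
# Assumption: hw.igb.txd sets the number of transmit descriptors per ring.
# 4096 is an illustrative value; the driver may reject values outside its
# supported range and fall back to its default.
hw.igb.txd=4096
```

A larger ring only absorbs longer bursts; it does not raise sustained
throughput, so it addresses the overflow case described above and nothing
else.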

BC


