Date: Thu, 20 Jun 2013 08:33:42 -0700 (PDT)
From: Barney Cordoba <barney_cordoba@yahoo.com>
To: Eugene Grosbein <eugen@grosbein.net>, Andre Oppermann <andre@freebsd.org>
Cc: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, LarsEggert <lars@netapp.com>, Jack Vogel <jfvogel@gmail.com>
Subject: Re: hw.igb.num_queues default
Message-ID: <1371742422.50315.YahooMailBasic@web121606.mail.ne1.yahoo.com>
In-Reply-To: <51C311D6.5090801@freebsd.org>

--- On Thu, 6/20/13, Andre Oppermann wrote:

> From: Andre Oppermann
> Subject: Re: hw.igb.num_queues default
> To: "Eugene Grosbein"
> Cc: "freebsd-net@freebsd.org", "Eggert, Lars", "Jack Vogel"
> Date: Thursday, June 20, 2013, 10:29 AM
>
> On 20.06.2013 15:37, Eugene Grosbein wrote:
> > On 20.06.2013 17:34, Eggert, Lars wrote:
> >
> >> real memory  = 8589934592 (8192 MB)
> >> avail memory = 8239513600 (7857 MB)
> >
> >> By default, the igb driver seems to set up one queue per detected
> >> CPU. Googling around, people seemed to suggest that limiting the
> >> number of queues makes things work better. I can confirm that
> >> setting hw.igb.num_queues=2 seems to have fixed the issue. (Two was
> >> the first value I tried, maybe other values other than 0 would
> >> work, too.)
> >>
> >> In order to uphold POLA, should the igb driver maybe default to a
> >> conservative value for hw.igb.num_queues that may not deliver
> >> optimal performance, but at least works out of the box?
> >
> > Or, better, make nmbclusters auto-tuning smarter, if any.
> > I mean, use more nmbclusters for machines with large amounts of
> > memory.
>
> That has already been done in HEAD.
>
> The other problem is the pre-filling of the large rings for all queues
> stranding large amounts of mbuf clusters. OpenBSD starts with a small
> number of filled mbufs in the RX ring and then dynamically adjusts the
> number upwards if there is enough traffic to maintain deep buffers. I
> don't know if it always quickly scales in practice though.

You're probably not running with 512MB these days, so pre-filling isn't
much of an issue. 4 queues is only 8MB of RAM with 1024 descriptors per
queue, and 4MB with 512.

Think about the # of queues issue. In order to have acceptable latency,
you need to do 6k-10k interrupts per second per queue. So with 4 queues
you have to process 40K ints/second and with 2 you only process 20k.
For a gig link 2 queues is much more efficient. "Spreading" for the sake
of spreading is more about Intel marketing than it is about practical
computing.

BC
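
The arithmetic behind the figures above can be checked with a short sketch. It assumes one 2 KB mbuf cluster is pre-filled per RX descriptor (the cluster size is not stated in the message, but it is the value that reproduces the 8MB and 4MB figures), and it takes the upper end of the 6k-10k interrupts-per-queue range. The function and constant names are made up for illustration and are not part of the igb driver.

```python
# Assumption: each RX descriptor is pre-filled with one standard 2 KB mbuf cluster.
CLUSTER_BYTES = 2048


def ring_prefill_bytes(queues, descriptors_per_queue, cluster_bytes=CLUSTER_BYTES):
    """Memory tied up by pre-filling every RX ring, in bytes."""
    return queues * descriptors_per_queue * cluster_bytes


def total_interrupt_rate(queues, per_queue_rate):
    """Aggregate interrupts per second when every queue runs at per_queue_rate."""
    return queues * per_queue_rate


if __name__ == "__main__":
    for descriptors in (1024, 512):
        mib = ring_prefill_bytes(4, descriptors) / (1024 * 1024)
        print(f"4 queues x {descriptors} descriptors: {mib:.0f} MB pre-filled")

    # Upper end of the 6k-10k interrupts/second/queue range from the message.
    for queues in (4, 2):
        rate = total_interrupt_rate(queues, 10_000)
        print(f"{queues} queues at 10k ints/s each: {rate} ints/s total")
```

At the lower end of the range (6k per queue) the same functions give 24k and 12k interrupts per second for 4 and 2 queues, so the 2:1 ratio between the two configurations holds across the whole range.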