Date:      Tue, 06 May 2014 20:06:29 +0400
From:      "Alexander V. Chernikov" <melifaro@FreeBSD.org>
To:        FreeBSD Net <net@freebsd.org>, hackers@freebsd.org
Cc:        jfv@FreeBSD.org, Adrian Chadd <adrian@freebsd.org>, wollman@freebsd.org, nparhar@gmail.com
Subject:   Use of contiguous physical memory in ixgbe driver
Message-ID:  <53690885.1010704@FreeBSD.org>

This is a multi-part message in MIME format.
--------------060607030109030205060500
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hello guys.
(CCing the people involved in the previous discussions of this topic; sorry 
for that.)

There have been several problem reports and discussions about using 9k+ mbufs 
with the current allocator:
if_em: kern/183381
cxgbe: 
http://lists.freebsd.org/pipermail/freebsd-net/2014-February/037834.html

general one: 
http://lists.freebsd.org/pipermail/freebsd-net/2014-January/037673.html

I'd like to add ixgbe (and i40e with igb) to the list. We have been facing the 
same problem for a long time.

As far as I can understand,
a) everyone (tm) is aware of the current 9k/16k allocation problems, which lead 
to sudden network failures.
b) such mbuf sizes are not absolute evil and can be useful at 40/100G 
and for TSO.
c) however, no one is able or willing to fix our allocator to 
pre-allocate a special arena for mbufs >= 4k page size.
d) so most people have written their own local hacks to disable 9k mbufs 
and use 4k ones.

e) our list is not complete; people with 
mellanox/solarflare/broadcom/emulex/etc. NICs are not on it yet (and most, if 
not all, 10G NICs support scatter/gather).
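
To put rough numbers on (a) and (c): assuming 4 KiB pages, and assuming every 
cluster larger than a page has to be backed by physically contiguous pages 
(the "contiguous physical memory" of the subject line), the per-cluster cost 
can be sketched as below. The names sk_contig_pages and SK_PAGE_SIZE are made 
up for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on amd64. */
#define SK_PAGE_SIZE 4096u

/*
 * Number of pages one cluster of the given size spans (rounded up).
 * For clusters larger than a page this is the length of the physically
 * contiguous run the allocator has to find.
 */
static unsigned
sk_contig_pages(size_t cluster_bytes)
{
	assert(cluster_bytes > 0);
	return ((unsigned)((cluster_bytes + SK_PAGE_SIZE - 1) / SK_PAGE_SIZE));
}
```

Under these assumptions a 2k or 4k cluster fits in a single page, a 9k 
cluster needs 3 contiguous pages and a 16k cluster needs 4, which is why the 
larger sizes become hard to allocate once physical memory is fragmented.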

Can we add a more generic hack that moves the default mbuf size decision from 
the NIC driver to the OS and makes it tunable by the user?

An example patch for the Intel drivers is attached.
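
The selection logic in the attached patch can be sketched in userland C 
roughly as follows. This is an illustration only: the sk_ names and 
struct sk_policy are hypothetical stand-ins for the kernel constants and for 
the three kern.ipc.* tunables, and the sizes assume 4 KiB pages.

```c
#include <assert.h>
#include <stddef.h>

/* Cluster sizes, assuming a system with 4 KiB pages. */
enum {
	SK_MCLBYTES     = 2048,		/* standard cluster */
	SK_MJUMPAGESIZE = 4096,		/* page-sized jumbo cluster */
	SK_MJUM9BYTES   = 9 * 1024,	/* 9k jumbo cluster */
	SK_MJUM16BYTES  = 16 * 1024	/* 16k jumbo cluster */
};

/* Hypothetical stand-in for the three proposed tunables. */
struct sk_policy {
	int nojumbobuf;		/* nonzero: never go above 2k */
	int nojumbo9buf;	/* nonzero: never go above 4k */
	int nojumbo16buf;	/* nonzero: never go above 9k */
};

/*
 * Pick the cluster size for a given maximum frame size: the smallest
 * size that fits the frame, capped by whichever "no..." knob is set.
 */
static int
sk_preferredsize(const struct sk_policy *p, int frame_size)
{
	assert(p != NULL && frame_size > 0);

	if (frame_size <= 2048 || p->nojumbobuf != 0)
		return (SK_MCLBYTES);
	if (frame_size <= 4096 || p->nojumbo9buf != 0)
		return (SK_MJUMPAGESIZE);
	if (frame_size <= 9216 || p->nojumbo16buf != 0)
		return (SK_MJUM9BYTES);
	return (SK_MJUM16BYTES);
}
```

With the defaults from the patch (nojumbobuf=0, nojumbo9buf=1, 
nojumbo16buf=1) a 9018-byte frame size maps to a 4k cluster; clearing 
nojumbo9buf maps it back to a 9k cluster. A frame larger than the chosen 
cluster then has to be received into several buffers, which relies on the 
scatter/gather support mentioned in (e).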

--------------060607030109030205060500
Content-Type: text/x-patch;
 name="mbuf_sizes.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="mbuf_sizes.diff"

Index: sys/kern/kern_mbuf.c
===================================================================
--- sys/kern/kern_mbuf.c	(revision 265236)
+++ sys/kern/kern_mbuf.c	(working copy)
@@ -103,6 +103,11 @@ int nmbjumbop;			/* limits number of page size jum
 int nmbjumbo9;			/* limits number of 9k jumbo clusters */
 int nmbjumbo16;			/* limits number of 16k jumbo clusters */
 
+static int nojumbobuf;		/* Nonzero: always use MCLBYTES clusters */
+static int nojumbo9buf;		/* Nonzero: use at most MJUMPAGESIZE clusters */
+static int nojumbo16buf;	/* Nonzero: use at most MJUM9BYTES clusters */
+
+
 static quad_t maxmbufmem;	/* overall real memory limit for all mbufs */
 
 SYSCTL_QUAD(_kern_ipc, OID_AUTO, maxmbufmem, CTLFLAG_RDTUN, &maxmbufmem, 0,
@@ -151,6 +156,17 @@ tunable_mbinit(void *dummy)
 	if (nmbufs < nmbclusters + nmbjumbop + nmbjumbo9 + nmbjumbo16)
 		nmbufs = lmax(maxmbufmem / MSIZE / 5,
 		    nmbclusters + nmbjumbop + nmbjumbo9 + nmbjumbo16);
+
+	/*
+	 * By default, disable 9k and 16k jumbo clusters.
+	 */
+	nojumbobuf = 0;
+	nojumbo9buf = 1;
+	nojumbo16buf = 1;
+
+	TUNABLE_INT_FETCH("kern.ipc.nojumbobuf", &nojumbobuf);
+	TUNABLE_INT_FETCH("kern.ipc.nojumbo9buf", &nojumbo9buf);
+	TUNABLE_INT_FETCH("kern.ipc.nojumbo16buf", &nojumbo16buf);
 }
 SYSINIT(tunable_mbinit, SI_SUB_KMEM, SI_ORDER_MIDDLE, tunable_mbinit, NULL);
 
@@ -261,6 +277,27 @@ SYSCTL_PROC(_kern_ipc, OID_AUTO, nmbufs, CTLTYPE_I
     "Maximum number of mbufs allowed");
 
 /*
+ * Determine the correct mbuf pool
+ * for the given MTU / maximum frame size
+ */
+int
+m_preferredsize(int mtu)
+{
+	int size;
+
+	if (mtu <= 2048 || nojumbobuf != 0)
+		size = MCLBYTES;
+	else if (mtu <= 4096 || nojumbo9buf != 0)
+		size = MJUMPAGESIZE;
+	else if (mtu <= 9216 || nojumbo16buf != 0)
+		size = MJUM9BYTES;
+	else
+		size = MJUM16BYTES;
+
+	return (size);
+}
+
+/*
  * Zones from which we allocate.
  */
 uma_zone_t	zone_mbuf;
Index: sys/dev/ixgbe/ixgbe.c
===================================================================
--- sys/dev/ixgbe/ixgbe.c	(revision 265236)
+++ sys/dev/ixgbe/ixgbe.c	(working copy)
@@ -1138,14 +1138,7 @@ ixgbe_init_locked(struct adapter *adapter)
 	** Determine the correct mbuf pool
 	** for doing jumbo frames
 	*/
-	if (adapter->max_frame_size <= 2048)
-		adapter->rx_mbuf_sz = MCLBYTES;
-	else if (adapter->max_frame_size <= 4096)
-		adapter->rx_mbuf_sz = MJUMPAGESIZE;
-	else if (adapter->max_frame_size <= 9216)
-		adapter->rx_mbuf_sz = MJUM9BYTES;
-	else
-		adapter->rx_mbuf_sz = MJUM16BYTES;
+	adapter->rx_mbuf_sz = m_preferredsize(adapter->max_frame_size);
 
 	/* Prepare receive descriptors and buffers */
 	if (ixgbe_setup_receive_structures(adapter)) {
Index: sys/dev/e1000/if_em.c
===================================================================
--- sys/dev/e1000/if_em.c	(revision 265236)
+++ sys/dev/e1000/if_em.c	(working copy)
@@ -1342,12 +1342,7 @@ em_init_locked(struct adapter *adapter)
 	** Figure out the desired mbuf
 	** pool for doing jumbos
 	*/
-	if (adapter->hw.mac.max_frame_size <= 2048)
-		adapter->rx_mbuf_sz = MCLBYTES;
-	else if (adapter->hw.mac.max_frame_size <= 4096)
-		adapter->rx_mbuf_sz = MJUMPAGESIZE;
-	else
-		adapter->rx_mbuf_sz = MJUM9BYTES;
+	adapter->rx_mbuf_sz = m_preferredsize(adapter->hw.mac.max_frame_size);
 
 	/* Prepare receive descriptors and buffers */
 	if (em_setup_receive_structures(adapter)) {
Index: sys/dev/e1000/if_igb.c
===================================================================
--- sys/dev/e1000/if_igb.c	(revision 265236)
+++ sys/dev/e1000/if_igb.c	(working copy)
@@ -1335,12 +1335,7 @@ igb_init_locked(struct adapter *adapter)
 	** Figure out the desired mbuf pool
 	** for doing jumbo/packetsplit
 	*/
-	if (adapter->max_frame_size <= 2048)
-		adapter->rx_mbuf_sz = MCLBYTES;
-	else if (adapter->max_frame_size <= 4096)
-		adapter->rx_mbuf_sz = MJUMPAGESIZE;
-	else
-		adapter->rx_mbuf_sz = MJUM9BYTES;
+	adapter->rx_mbuf_sz = m_preferredsize(adapter->max_frame_size);
 
 	/* Prepare receive descriptors and buffers */
 	if (igb_setup_receive_structures(adapter)) {

--------------060607030109030205060500--


