Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 8 Apr 2010 16:07:50 -0700
From:      Pyun YongHyeon <pyunyh@gmail.com>
To:        Jack Vogel <jfvogel@gmail.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: em driver regression
Message-ID:  <20100408230750.GR5734@michelle.cdnetworks.com>
In-Reply-To: <n2q2a41acea1004081406o969e7acbla0cef35f65f14f0a@mail.gmail.com>
References:  <201004081313.o38DD4JM041821@lava.sentex.ca> <7.1.0.9.0.20100408091756.10652be0@sentex.net> <201004081446.o38EkU7h042296@lava.sentex.ca> <20100408181741.GI5734@michelle.cdnetworks.com> <201004081831.o38IVR3s043434@lava.sentex.ca> <20100408205626.GN5734@michelle.cdnetworks.com> <201004082105.o38L5DCH044187@lava.sentex.ca> <n2q2a41acea1004081406o969e7acbla0cef35f65f14f0a@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--O3RTKUHj+75w1tg5
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Apr 08, 2010 at 02:06:09PM -0700, Jack Vogel wrote:
> Only one device support by em does multiqueue right now, and that is
> Hartwell, 82574.
> 

Thanks for the info.

Mike, here is updated patch. Now UDP bulk TX transfer performance
recovered a lot(about 890Mbps) but it still shows bad numbers
compared to other controllers. For example, bce(4) shows about
958Mbps for the same load.
During the testing I found a strong indication of packet reordering
issue of drbr interface. If I forcibly change to use single TX
queue, em(4) got 950Mbps as it used to be.

Jack, as we talked about possible drbr issue with igb(4), UDP
transfer seems to suffer from packet reordering issue here. Can we
make em(4)/igb(4) use single TX queue until we solve drbr interface
issue? Given that only one em(4) controller supports multiqueue,
dropping multiqueue support for em(4) does not look bad to me.

> Jack
> 
> 
> On Thu, Apr 8, 2010 at 2:05 PM, Mike Tancsa <mike@sentex.net> wrote:
> 
> > At 04:56 PM 4/8/2010, Pyun YongHyeon wrote:
> >
> >> On Thu, Apr 08, 2010 at 02:31:18PM -0400, Mike Tancsa wrote:
> >> > At 02:17 PM 4/8/2010, Pyun YongHyeon wrote:
> >> >
> >> > >Try this patch. It should fix the issue. It seems Jack forgot to
> >> > >strip CRC bytes as old em(4) didn't strip it, probably to
> >> > >workaround silicon bug of old em(4) controllers.
> >> >
> >> > Thanks! The attached patch does indeed fix the dhclient issue.
> >> >
> >> >
> >> > >It seems there are also TX issues here. The system load is too high
> >> > >and sometimes system is not responsive while TX is in progress.
> >> > >Because I initiated TCP bulk transfers, TSO should reduce CPU load
> >> > >a lot but it didn't so I guess it could also be related watchdog
> >> > >timeouts you've seen. I'll see what can be done.
> >> >
> >> > Thanks for looking into that as well!!
> >> >
> >> >         ---Mike
> >> >
> >>
> >> Mike,
> >>
> >> Here is patch I'm working on. This patch fixes high system load and
> >> system is very responsive as before. But it seems there is still
> >> some TX issue here. Bulk UDP performance is very poor(< 700Mbps)
> >> and I have no idea what caused this at this moment.
> >>
> >> BTW, I have trouble to reproduce watchdog timeouts. I'm not sure
> >> whether latest fix from Jack cured it. By chance does your
> >> controller support multi TX/RX queues? You can check whether em(4)
> >> uses multi-queues with "vmstat -i". If em(4) use multi-queue you
> >> may have multiple irq output for em0.
> >>
> >
> > Hi,
> >        I will give it a try later tonight!  This one does not seem to.
> >
> > 0(ich10)# vmstat -i
> > interrupt                          total       rate
> > irq16: uhci0+                         30          0
> > irq18: ehci0 uhci5                158419         17
> > irq19: fwohci0++                      86          0
> > irq21: uhci1                          17          0
> > irq23: uhci3 ehci1                     2          0
> > cpu0: timer                     18570305       1994
> > irq256: igb0                          80          0
> > irq257: igb0                         255          0
> > irq258: igb0                          66          0
> > irq259: igb0                          32          0
> > irq260: igb0                           2          0
> > irq261: igb1                        2679          0
> > irq262: igb1                         998          0
> > irq263: igb1                        2468          0
> > irq264: igb1                        6361          0
> > irq265: igb1                           2          0
> > irq266: em0                        33910          3
> > irq267: ahci1                      15317          1
> > cpu1: timer                     18557074       1993
> > cpu3: timer                     18557168       1993
> > cpu2: timer                     18557108       1993
> > Total                           74462379       7998
> > 0(ich10)#
> >

--O3RTKUHj+75w1tg5
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="em.load.patch2"

Index: sys/dev/e1000/if_em.c
===================================================================
--- sys/dev/e1000/if_em.c	(revision 206403)
+++ sys/dev/e1000/if_em.c	(working copy)
@@ -812,6 +812,10 @@
 		return (err);
 	}
 
+        /* Call cleanup if number of TX descriptors low */
+	if (txr->tx_avail <= EM_TX_CLEANUP_THRESHOLD)
+		em_txeof(txr);
+
 	enq = 0;
 	if (m == NULL) {
 		next = drbr_dequeue(ifp, txr->br);
@@ -834,11 +838,16 @@
 		ETHER_BPF_MTAP(ifp, next);
 		if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
                         break;
+        	if (txr->tx_avail < EM_MAX_SCATTER) {
+			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
+			break;
+		}
 		next = drbr_dequeue(ifp, txr->br);
 	}
 
 	if (enq > 0) {
                 /* Set the watchdog */
+		txr->watchdog_time = ticks;
                 txr->watchdog_check = TRUE;
 	}
 	return (err);
@@ -864,8 +873,7 @@
 	txr = &adapter->tx_rings[i];
 
 	if (EM_TX_TRYLOCK(txr)) {
-		if (ifp->if_drv_flags & IFF_DRV_RUNNING)
-			error = em_mq_start_locked(ifp, txr, m);
+		error = em_mq_start_locked(ifp, txr, m);
 		EM_TX_UNLOCK(txr);
 	} else 
 		error = drbr_enqueue(ifp, txr->br, m);
@@ -909,8 +917,15 @@
 	if (!adapter->link_active)
 		return;
 
+        /* Call cleanup if number of TX descriptors low */
+	if (txr->tx_avail <= EM_TX_CLEANUP_THRESHOLD)
+		em_txeof(txr);
+
 	while (!IFQ_DRV_IS_EMPTY(&ifp->if_snd)) {
-
+        	if (txr->tx_avail < EM_MAX_SCATTER) {
+			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
+			break;
+		}
                 IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
 		if (m_head == NULL)
 			break;
@@ -930,7 +945,8 @@
 		ETHER_BPF_MTAP(ifp, m_head);
 
 		/* Set timeout in case hardware has problems transmitting. */
-		txr->watchdog_check = TRUE;
+		txr->watchdog_time = ticks;
+                txr->watchdog_check = TRUE;
 	}
 
 	return;
@@ -1427,17 +1443,12 @@
 	struct ifnet	*ifp = adapter->ifp;
 	struct tx_ring	*txr = adapter->tx_rings;
 	struct rx_ring	*rxr = adapter->rx_rings;
-	u32		loop = EM_MAX_LOOP;
-	bool		more_rx, more_tx;
+	bool		more_rx;
 
-
 	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
+		more_rx = em_rxeof(rxr, adapter->rx_process_limit);
 		EM_TX_LOCK(txr);
-		do {
-			more_rx = em_rxeof(rxr, adapter->rx_process_limit);
-			more_tx = em_txeof(txr);
-		} while (loop-- && (more_rx || more_tx));
-
+		em_txeof(txr);
 #if __FreeBSD_version >= 800000
 		if (!drbr_empty(ifp, txr->br))
 			em_mq_start_locked(ifp, txr, NULL);
@@ -1445,10 +1456,9 @@
 		if (!IFQ_DRV_IS_EMPTY(&ifp->if_snd))
 			em_start_locked(ifp, txr);
 #endif
-		if (more_rx || more_tx)
+		EM_TX_UNLOCK(txr);
+		if (more_rx)
 			taskqueue_enqueue(adapter->tq, &adapter->que_task);
-
-		EM_TX_UNLOCK(txr);
 	}
 
 	em_enable_intr(adapter);
@@ -1466,18 +1476,13 @@
 {
 	struct tx_ring *txr = arg;
 	struct adapter *adapter = txr->adapter;
-	bool		more;
 
 	++txr->tx_irq;
 	EM_TX_LOCK(txr);
-	more = em_txeof(txr);
+	em_txeof(txr);
 	EM_TX_UNLOCK(txr);
-	if (more)
-		taskqueue_enqueue(txr->tq, &txr->tx_task);
-	else
-		/* Reenable this interrupt */
-		E1000_WRITE_REG(&adapter->hw, E1000_IMS, txr->ims);
-	return;
+	/* Reenable this interrupt */
+	E1000_WRITE_REG(&adapter->hw, E1000_IMS, txr->ims);
 }
 
 /*********************************************************************
@@ -1531,14 +1536,15 @@
 {
 	struct rx_ring	*rxr = context;
 	struct adapter	*adapter = rxr->adapter;
-	u32		loop = EM_MAX_LOOP;
         bool            more;
 
-        do {
-		more = em_rxeof(rxr, adapter->rx_process_limit);
-        } while (loop-- && more);
-        /* Reenable this interrupt */
-	E1000_WRITE_REG(&adapter->hw, E1000_IMS, rxr->ims);
+	more = em_rxeof(rxr, adapter->rx_process_limit);
+	if (more)
+		taskqueue_enqueue(rxr->tq, &rxr->rx_task);
+	else {
+		/* Reenable this interrupt */
+		E1000_WRITE_REG(&adapter->hw, E1000_IMS, rxr->ims);
+	}
 }
 
 static void
@@ -1547,15 +1553,10 @@
 	struct tx_ring	*txr = context;
 	struct adapter	*adapter = txr->adapter;
 	struct ifnet	*ifp = adapter->ifp;
-	u32		loop = EM_MAX_LOOP;
-        bool            more;
 
 	if (!EM_TX_TRYLOCK(txr))
 		return;
-	do {
-		more = em_txeof(txr);
-	} while (loop-- && more);
-
+	em_txeof(txr);
 #if __FreeBSD_version >= 800000
 	if (!drbr_empty(ifp, txr->br))
 		em_mq_start_locked(ifp, txr, NULL);
@@ -1912,12 +1913,7 @@
 	bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
 	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 	E1000_WRITE_REG(&adapter->hw, E1000_TDT(txr->me), i);
-	txr->watchdog_time = ticks;
 
-        /* Call cleanup if number of TX descriptors low */
-	if (txr->tx_avail <= EM_TX_CLEANUP_THRESHOLD)
-		em_txeof(txr);
-
 	return (0);
 }
 
@@ -3619,7 +3615,8 @@
 		if (last != -1) {
         		eop_desc = &txr->tx_base[last];
 			/* Get new done point */
-			if (++last == adapter->num_tx_desc) last = 0;
+			if (++last == adapter->num_tx_desc)
+				last = 0;
 			done = last;
 		} else
 			break;
@@ -4078,7 +4075,7 @@
 em_rxeof(struct rx_ring *rxr, int count)
 {
 	struct adapter		*adapter = rxr->adapter;
-	struct ifnet		*ifp = adapter->ifp;;
+	struct ifnet		*ifp = adapter->ifp;
 	struct mbuf		*mp, *sendmp;
 	u8			status;
 	u16 			len;
@@ -4088,6 +4085,7 @@
 
 	EM_RX_LOCK(rxr);
 
+	status = 0;
 	for (i = rxr->next_to_check, processed = 0; count != 0;) {
 
 		if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
@@ -4195,7 +4193,11 @@
 	rxr->next_to_check = i;
 
 	EM_RX_UNLOCK(rxr);
+#ifdef DEVICE_POLLING
 	return (rxdone);
+#else
+	return ((status & E1000_RXD_STAT_DD) ? TRUE : FALSE);
+#endif
 }
 
 #ifndef __NO_STRICT_ALIGNMENT
Index: sys/dev/e1000/if_em.h
===================================================================
--- sys/dev/e1000/if_em.h	(revision 206403)
+++ sys/dev/e1000/if_em.h	(working copy)
@@ -223,7 +223,7 @@
 #define HW_DEBUGOUT1(S, A)          if (DEBUG_HW) printf(S "\n", A)
 #define HW_DEBUGOUT2(S, A, B)       if (DEBUG_HW) printf(S "\n", A, B)
 
-#define EM_MAX_SCATTER		64
+#define EM_MAX_SCATTER		32
 #define EM_VFTA_SIZE		128
 #define EM_TSO_SIZE		(65535 + sizeof(struct ether_vlan_header))
 #define EM_TSO_SEG_SIZE		4096	/* Max dma segment size */

--O3RTKUHj+75w1tg5--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100408230750.GR5734>