From owner-freebsd-hackers@FreeBSD.ORG  Wed Jul 26 11:28:44 2006
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
X-Original-To: freebsd-hackers@freebsd.org
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3290D16A4DA
	for <freebsd-hackers@freebsd.org>; Wed, 26 Jul 2006 11:28:44 +0000 (UTC)
	(envelope-from murat@enderunix.org)
Received: from istanbul.enderunix.org (freefall.marmara.edu.tr
	[193.140.143.23])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4ECB643D49
	for <freebsd-hackers@freebsd.org>; Wed, 26 Jul 2006 11:28:43 +0000 (GMT)
	(envelope-from murat@enderunix.org)
Received: (qmail 22416 invoked by uid 1002); 26 Jul 2006 11:16:57 -0000
Date: Wed, 26 Jul 2006 14:16:57 +0300
From: Murat Balaban <murat@enderunix.org>
To: freebsd-hackers@freebsd.org
Message-ID: <20060726111657.GA22358@enderunix.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Subject: sys/dev/em/if_em.c
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 26 Jul 2006 11:28:44 -0000


Hello hackers,

I have a special-purpose setup built around an ng_hub-like kernel module (ng_lb)
that I've been writing. The box has two em(4) adapters; I've connected em0's
lower hook to ng_lb's link0, and em1's lower hook to ng_lb's link1.

The situation looks like this:

        lower            link0                 link1              lower
em0 ---------------> -------------> ng_lb --------------> ---------------> em1

Every packet received by em0 is handed to my netgraph module, and after a small
modification to the Ethernet header (changing the destination MAC address) I
NG_FWD_ITEM() the packet to em1.
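
As a rough illustration of the header modification described above, here is a
minimal userland sketch. It is not the ng_lb code: the struct eth_hdr layout and
the rewrite_dst_mac() helper are hypothetical names introduced for this example,
and it works on a flat byte buffer rather than on an mbuf chain as a real
netgraph node would.

```c
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN 6

/* Ethernet II header layout: destination, source, EtherType. */
struct eth_hdr {
	uint8_t  dst[ETHER_ADDR_LEN];
	uint8_t  src[ETHER_ADDR_LEN];
	uint16_t type;	/* network byte order; not touched here */
};

/*
 * Hypothetical helper: overwrite the destination MAC of a frame in place.
 * Returns 0 on success, -1 if the buffer cannot hold a full header.
 */
static int
rewrite_dst_mac(uint8_t *frame, size_t len,
    const uint8_t new_dst[ETHER_ADDR_LEN])
{
	if (len < sizeof(struct eth_hdr))
		return (-1);
	/* The destination address occupies the first six bytes. */
	memcpy(frame, new_dst, ETHER_ADDR_LEN);
	return (0);
}
```

In the kernel the same copy would have to be applied to the mbuf's data area,
after making sure the header bytes are contiguous.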

I'm generating traffic with a packet generator, and em0 seems to cope with around
910 Mbit/s of traffic.

However, when I write the packets to em1, em1 seems to drop 40-60 Mbit/s (of the
910 Mbit/s). I dug into the problem a bit and found that IFQ_HANDOFF, called
deep inside NG_FWD_ITEM, was returning ENOBUFS.

A little more investigation showed me that the source of the ENOBUFS error was
em1 running out of Tx descriptors. The relevant logic in dev/em/if_em.c
(em_encap) is that if the number of available Tx descriptors falls to or below
a threshold, the driver cleans the transmit interrupts once. The number of
available Tx descriptors is then checked again, and if it is still at or below
the threshold, ENOBUFS is returned.
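
The clean-once logic described above can be modelled in userland roughly as
follows. This is only a sketch under stated assumptions: struct tx_ring, its
fields, tx_clean(), tx_encap_check() and the threshold value are all made up for
the model, and each packet is assumed to consume exactly one descriptor, which
is not necessarily true of the real driver.

```c
#include <errno.h>

#define TX_CLEANUP_THRESHOLD 8	/* hypothetical value for this model */

/* Simplified stand-in for the adapter's Tx bookkeeping. */
struct tx_ring {
	int avail;	/* descriptors free for new packets */
	int completed;	/* sent by hardware, not yet reclaimed */
};

/* Reclaim every descriptor the hardware has already finished with. */
static void
tx_clean(struct tx_ring *r)
{
	r->avail += r->completed;
	r->completed = 0;
}

/*
 * Model of the check: clean once when at or below the threshold, and
 * give up with ENOBUFS if that did not lift avail above the threshold.
 */
static int
tx_encap_check(struct tx_ring *r)
{
	if (r->avail <= TX_CLEANUP_THRESHOLD) {
		tx_clean(r);
		if (r->avail <= TX_CLEANUP_THRESHOLD)
			return (ENOBUFS);
	}
	r->avail--;	/* one descriptor per packet in this model */
	return (0);
}
```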

What I'd like to ask is: instead of cleaning the transmit interrupts only once,
why not do it repeatedly until the number of available Tx descriptors rises
to a moderate level?
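
One possible shape for "clean repeatedly" is a loop with an upper bound on the
number of passes, so that it still terminates if the hardware completes nothing
in the meantime. The sketch below is a userland model only, not driver code:
struct tx_ring, the hw_step field (how many descriptors the hardware is assumed
to finish between two cleanup passes), TX_CLEAN_MAX_TRIES and both functions are
assumptions introduced for illustration.

```c
#include <errno.h>

#define TX_CLEANUP_THRESHOLD 8	/* hypothetical value for this model */
#define TX_CLEAN_MAX_TRIES   4	/* hypothetical bound on cleanup passes */

struct tx_ring {
	int avail;	/* descriptors free for new packets */
	int in_flight;	/* descriptors still owned by the hardware */
	int hw_step;	/* assumed completions per cleanup pass */
};

/* One cleanup pass: reclaim whatever the hardware finished meanwhile. */
static void
tx_clean(struct tx_ring *r)
{
	int done = r->in_flight < r->hw_step ? r->in_flight : r->hw_step;

	r->in_flight -= done;
	r->avail += done;
}

/*
 * Clean until avail rises above the threshold or the pass limit is hit.
 * Returns 0 if enough descriptors are available, ENOBUFS otherwise.
 */
static int
tx_reclaim_bounded(struct tx_ring *r)
{
	int tries;

	for (tries = 0; tries < TX_CLEAN_MAX_TRIES &&
	    r->avail <= TX_CLEANUP_THRESHOLD; tries++)
		tx_clean(r);
	return (r->avail <= TX_CLEANUP_THRESHOLD ? ENOBUFS : 0);
}
```

With hw_step set to 0 the function returns ENOBUFS after four passes, whereas a
loop without the bound would never exit in that situation.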

The following patch solved my problem, but I'd like to get your opinions on
it.

Cheers, 
Murat
http://www.enderunix.org/murat/

PS: Both cards are plugged into a 64-bit 66 MHz PCI-X bus. I have polling enabled
on both interfaces, and HZ set to 10000.


--- if_em_murat.c       Wed Jul 26 13:59:22 2006
+++ if_em.c     Wed Jul 26 14:01:11 2006
@@ -1177,11 +1177,9 @@
          * available hits the threshold
          */
         if (adapter->num_tx_desc_avail <= EM_TX_CLEANUP_THRESHOLD) {
-                em_clean_transmit_interrupts(adapter);
-                if (adapter->num_tx_desc_avail <= EM_TX_CLEANUP_THRESHOLD) {
-                        adapter->no_tx_desc_avail1++;
-                        return(ENOBUFS);
-                }
+               do {
+                       em_clean_transmit_interrupts(adapter);
+               } while (adapter->num_tx_desc_avail <= EM_TX_CLEANUP_THRESHOLD);
         }

         /*