From owner-freebsd-net@FreeBSD.ORG  Tue Jun 20 09:54:26 2006
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
X-Original-To: freebsd-net@freebsd.org
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 51FAA16A47B
	for <freebsd-net@freebsd.org>; Tue, 20 Jun 2006 09:54:26 +0000 (UTC)
	(envelope-from pyunyh@gmail.com)
Received: from nz-out-0102.google.com (nz-out-0102.google.com [64.233.162.203])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D759743D46
	for <freebsd-net@freebsd.org>; Tue, 20 Jun 2006 09:54:22 +0000 (GMT)
	(envelope-from pyunyh@gmail.com)
Received: by nz-out-0102.google.com with SMTP id m7so781059nzf
	for <freebsd-net@freebsd.org>; Tue, 20 Jun 2006 02:54:22 -0700 (PDT)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com;
	h=received:date:from:to:cc:subject:message-id:reply-to:references:mime-version:content-type:content-disposition:in-reply-to:user-agent;
	b=bZI0mHce9ahGIRymWfJ5ITYIO3cMP4JNDYBiYg+rQJXKIRghDybiqIraPJKWLmus0HtcyAiQO9J+Wh5+PxjRSMXdQJrtSCSSHdihQIcmEbcGeJDvdYqZtnTxRxpxjvsg+FSAGyw20FnP7Jip4k6NB7y1uggb8OsoSmWZpiQ7+60=
Received: by 10.36.250.42 with SMTP id x42mr8755279nzh;
	Tue, 20 Jun 2006 02:54:22 -0700 (PDT)
Received: from michelle.cdnetworks.co.kr ( [211.53.35.84])
	by mx.gmail.com with ESMTP id 39sm11143801nzk.2006.06.20.02.54.19;
	Tue, 20 Jun 2006 02:54:22 -0700 (PDT)
Received: from michelle.cdnetworks.co.kr (localhost.cdnetworks.co.kr
	[127.0.0.1])
	by michelle.cdnetworks.co.kr (8.13.5/8.13.5) with ESMTP id
	k5K9sYaM010467
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 20 Jun 2006 18:54:34 +0900 (KST)
	(envelope-from pyunyh@gmail.com)
Received: (from yongari@localhost)
	by michelle.cdnetworks.co.kr (8.13.5/8.13.5/Submit) id k5K9sVYq010466; 
	Tue, 20 Jun 2006 18:54:31 +0900 (KST)
	(envelope-from pyunyh@gmail.com)
Date: Tue, 20 Jun 2006 18:54:31 +0900
From: Pyun YongHyeon <pyunyh@gmail.com>
To: Bruce Evans <bde@zeta.org.au>
Message-ID: <20060620095431.GB8645@cdnetworks.co.kr>
References: <20060615115738.J2512@fledge.watson.org>
	<XFMail.20060615091807.jdp@polstra.com>
	<20060618194044.GC1142@funkthat.com>
	<20060619162819.F44832@delplex.bde.org>
	<20060619122753.GA5600@cdnetworks.co.kr>
	<20060620154425.Q48009@delplex.bde.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20060620154425.Q48009@delplex.bde.org>
User-Agent: Mutt/1.4.2.1i
Cc: freebsd-net@freebsd.org, John-Mark Gurney <gurney_j@resnet.uoregon.edu>,
	Robert Watson <rwatson@freebsd.org>, John Polstra <jdp@polstra.com>
Subject: Re: IF_HANDOFF vs. IFQ_HANDOFF
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: pyunyh@gmail.com
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Jun 2006 09:54:26 -0000

On Tue, Jun 20, 2006 at 05:11:18PM +1000, Bruce Evans wrote:
 > On Mon, 19 Jun 2006, Pyun YongHyeon wrote:
 > 
 > Please trim quotes.
 > 
 > >On Mon, Jun 19, 2006 at 06:04:26PM +1000, Bruce Evans wrote:
 > 
 > >> To max out the link without unmaxing CPU for other uses, you do have
 > >> to know when the tx approaches running out of packets.  This is best
 > >> done using watermark stuff.  There should be a nearly-complete interrupt
 > >> at low water, and (only after low water is reached and the interrupt
 > >> handler doesn't refill the tx ring to be above low water again) a
 > >> completion interrupt at actual completion.  My version of the sk driver
 > >> does this.  It arranges for the nearly-complete interrupt at about 32
 > >> fragments (min 128 uS) before the tx runs dry, and no other tx interrupts
 > >> unless the queue length stays below 32, while the -current driver gets
 > >> an interrupt after every packet.  It does this mainly to reduce the
 > >> tx interrupt load from 1 per packet to (under load) 1 per 480 fragments.
 > >> The correct handling of OACTIVE is obtained as a side effect almost
 > >> automatically.  ...
 > >>
 > >> I'm not very familiar with NIC hardware and don't know how other NICs
 > >> support timing of tx interrupts, but watermark stuff like the above
 > >> is routine for serial devices/drivers.  sk's support for interrupting
 > >> on any fragment is too flexible to be good (it is painful to program,
 > >> and there doesn't seem to be a good way to time out if there is no
 > >> good fragment to interrupt on or when you program the interruption on
 > >> a wrong fragment).
 > >> ...
 > 
 > >AFAIK SK GENESIS has no programming interface for a watermark.
 > >Some advanced hardware provides a way to interrupt when it reaches
 > >a programmed threshold but SK does not. It just provides a way whether
 > >hardware should raise an interrupt depending on Tx descriptor value.
 > >By tracking number of index it's possible to generate an interrupt
 > >for every N frames instead of every frame(1 <= N <= MAX Tx. Desc.).
 > 
 > I only have a Yukon, and think that's what I do, with a very variable N.
 > (Do we mean the same thing by the "Tx descriptor value"?  I mean

Yes.
 > SK_TXCTL_EOF_INTR.  Surely that's portable -- it's used in all versions
 > of sk with no ifdefs for GENESIS.).
 > 
 > My sk_start() tries to fill the tx ring (to length 512) and then put
 > an interrupt mark only on the last fragment in a packet nearest to 32
 > from the end, so in the best case N is about 480, but it is less if
 > tx is not streaming.  Cases where there is not much choice are harder
 > to program.  I had some success with removing interrupt marks and with
 > dummy packets of length 0 whose purpose is just to hold an interrupt
 > mark, but I don't trust those methods.  I didn't try putting an
 > interrupt mark on fragments in the middle of a packet.  That would be
 > simpler if it works.
 > 

I think it would take a long time to get a Tx completion interrupt
for committed frames (interrupting on the last frame instead of on
every frame). The hardware may already have some free Tx descriptors
before it generates a Tx completion interrupt. I guess it would be
more efficient to detect those free Tx descriptors and use them rather
than wait for a Tx completion interrupt. Just waiting for a completion
interrupt would add latency. Anyway, I have to experiment with it.
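
To make the idea concrete, here is a rough userland sketch of reclaiming
completed descriptors from the start routine before giving up. It is only
a model, not sk(4) code: struct tx_ring, tx_reclaim(), tx_start() and
hw_complete() are names I made up, the done[] array stands in for
whatever ownership bit the hardware clears, and 512/32 are just the ring
length and threshold mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_CNT   512   /* ring length used in this thread */
#define TX_LOW_WATER  32    /* the "magic 32" threshold (assumption) */

/* Hypothetical software view of the Tx ring. */
struct tx_ring {
    bool done[TX_RING_CNT]; /* stand-in for "hardware finished this slot" */
    int  prod;              /* next slot the driver will fill */
    int  cons;              /* oldest slot not yet reclaimed */
    int  used;              /* slots queued and not yet reclaimed */
};

/* Free every descriptor the (simulated) hardware has completed. */
static int tx_reclaim(struct tx_ring *r)
{
    int freed = 0;

    while (r->used > 0 && r->done[r->cons]) {
        r->done[r->cons] = false;
        r->cons = (r->cons + 1) % TX_RING_CNT;
        r->used--;
        freed++;
    }
    return freed;
}

/*
 * Queue one packet of nfrags fragments.  If free descriptors are below
 * the threshold, reclaim first instead of waiting for an interrupt.
 * Returns false where a real driver would set IFF_OACTIVE.
 */
static bool tx_start(struct tx_ring *r, int nfrags)
{
    assert(nfrags > 0 && nfrags <= TX_RING_CNT);

    if (TX_RING_CNT - r->used < TX_LOW_WATER)
        (void)tx_reclaim(r);
    if (TX_RING_CNT - r->used < nfrags)
        return false;

    for (int i = 0; i < nfrags; i++) {
        r->done[r->prod] = false;
        r->prod = (r->prod + 1) % TX_RING_CNT;
    }
    r->used += nfrags;
    return true;
}

/* Test helper: pretend the hardware transmitted the n oldest slots. */
static void hw_complete(struct tx_ring *r, int n)
{
    int idx = r->cons;

    for (int i = 0; i < r->used && n > 0; i++) {
        if (!r->done[idx]) {
            r->done[idx] = true;
            n--;
        }
        idx = (idx + 1) % TX_RING_CNT;
    }
}
```

With this, a start call that finds the ring nearly full still succeeds
if the hardware has quietly finished some frames, without any completion
interrupt having fired in between.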

 > >We may also need to add a routine to reclaim pending Tx descriptors
 > >before sending frames in sk_start if the number of available Tx
 > >descriptors is less than a threshold.
 > 
 > I'm not sure what you mean here.  If there are < 32 tx descriptors
 > available, AND there is an (active) descriptor with an interrupt mark,
 > then my sk_start() just sets IFF_OACTIVE and returns.  The case where
 > there are < 32 tx descriptors but no descriptor with an interrupt mark
 > is trickier: a mark must be added, and I don't trust adding it to an
 > active packet, so it must be added to a new packet, but it might be
 > impossible to add one for the following reasons:
 > - no space.  The magic 32 is hopefully enough.
 > - no packets in the ifq.  My sk_start() tries to leave a spare one when
 >   one might be needed, but I think upper layers can eat it.
 > A dummy packet of length 0 can be used to handle both cases but may be
 > bad for the network -- does the hardware send a frame with no data?

I'm not sure.
Since you know when you have to insert an interrupt mark in sk_encap,
I think you can use m_defrag and set SK_TXCTL_EOF_INTR.
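
A rough userland model of what I mean follows. None of it is real
driver code: tx_encap() is a made-up stand-in for sk_encap,
TXCTL_EOF_INTR and TXCTL_LASTFRAG are placeholder bits (not the real
register values), and the "coalesce into one fragment" step only
imitates the effect of m_defrag, which in the kernel shortens an mbuf
chain rather than summing lengths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_RING_CNT     512
#define TX_LOW_WATER    32
#define TXCTL_EOF_INTR  0x1u  /* placeholder, NOT the real SK_TXCTL_EOF_INTR */
#define TXCTL_LASTFRAG  0x2u  /* placeholder end-of-packet bit */

struct tx_desc {
    uint32_t ctl;
    uint32_t len;
};

struct tx_ring {
    struct tx_desc desc[TX_RING_CNT];
    int  prod;          /* next slot to fill */
    int  used;          /* slots currently queued */
    bool mark_pending;  /* an interrupt mark is already in the ring */
};

/*
 * Queue one packet.  Returns the index of its last descriptor, or -1
 * if the ring is full.  The interrupt mark goes on the last fragment
 * of the first packet that pushes free space below TX_LOW_WATER.
 */
static int tx_encap(struct tx_ring *r, const uint32_t *frag_len, int nfrags)
{
    int avail = TX_RING_CNT - r->used;
    uint32_t coalesced = 0;
    int last = -1;

    assert(nfrags > 0);
    if (avail == 0)
        return -1;

    if (nfrags > avail) {
        /* Imitate m_defrag: squeeze the packet into one fragment. */
        for (int i = 0; i < nfrags; i++)
            coalesced += frag_len[i];
        frag_len = &coalesced;
        nfrags = 1;
    }

    for (int i = 0; i < nfrags; i++) {
        last = r->prod;
        r->desc[last].len = frag_len[i];
        r->desc[last].ctl = 0;
        r->prod = (r->prod + 1) % TX_RING_CNT;
    }
    r->used += nfrags;
    r->desc[last].ctl |= TXCTL_LASTFRAG;

    if (!r->mark_pending && TX_RING_CNT - r->used < TX_LOW_WATER) {
        r->desc[last].ctl |= TXCTL_EOF_INTR;
        r->mark_pending = true;
    }
    return last;
}
```

The completion handler would have to clear mark_pending when it sees
the marked descriptor; that part is left out here.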

 > 
 > >However I don't know how the driver should handle transmit errors
 > >occurred between interrupt-less Tx operations. Just flushing all
 > >committed frames would result in poor TCP performance.
 > 
 > Doesn't the hardware just proceed to the next packet without interrupting
 > (except possibly for a special error interrupt), and anyway act the same
 > as if the interrupt were delayed by interrupt moderation?  Errors for
 > individual packets don't seem to be detected or reported in either case.
 > 

Yes, that is the problem. It seems there is no way to know which
packet caused a Tx error, and I think we have no choice but to flush
the entire FIFOs. SK just flushes all frames in the FIFO if it detects
a Tx FIFO underrun or an Rx FIFO overflow. But I'm not sure how Yukon
should handle this case. The flushing routine in sk is guesswork based
on the Linux skge implementation, and I don't know the internal details
of the Yukon hardware. Since Yukon uses different registers to flush
FIFOs and has its own registers related to interrupts and FIFOs, I
guess it uses a completely different approach.

 > >The difference between Yukon and SK hardware also make it hard to
 > >implement above interrupt-less Tx operations. There is no publicly
 > 
 > My version is not interruptless, but tries to use tx interrupts for
 > everything, just not many of them.
 > 

OK, I'll take your idea and try experimenting with it next week.

-- 
Regards,
Pyun YongHyeon