From owner-freebsd-net@FreeBSD.ORG Tue Jun 20 09:54:26 2006
Date: Tue, 20 Jun 2006 18:54:31 +0900
From: Pyun YongHyeon
To: Bruce Evans
Message-ID: <20060620095431.GB8645@cdnetworks.co.kr>
References: <20060615115738.J2512@fledge.watson.org> <20060618194044.GC1142@funkthat.com>
 <20060619162819.F44832@delplex.bde.org> <20060619122753.GA5600@cdnetworks.co.kr> <20060620154425.Q48009@delplex.bde.org>
In-Reply-To: <20060620154425.Q48009@delplex.bde.org>
Cc: freebsd-net@freebsd.org, John-Mark Gurney, Robert Watson, John Polstra
Subject: Re: IF_HANDOFF vs. IFQ_HANDOFF
Reply-To: pyunyh@gmail.com
List-Id: Networking and TCP/IP with FreeBSD

On Tue, Jun 20, 2006 at 05:11:18PM +1000, Bruce Evans wrote:
> On Mon, 19 Jun 2006, Pyun YongHyeon wrote:
>
> Please trim quotes.
>
> >On Mon, Jun 19, 2006 at 06:04:26PM +1000, Bruce Evans wrote:
>
> >> To max out the link without unmaxing CPU for other uses, you do have
> >> to know when the tx approaches running out of packets.  This is best
> >> done using watermark stuff.  There should be a nearly-complete interrupt
> >> at low water, and (only after low water is reached and the interrupt
> >> handler doesn't refill the tx ring to be above low water again) a
> >> completion interrupt at actual completion.  My version of the sk driver
> >> does this.  It arranges for the nearly-complete interrupt at about 32
> >> fragments (min 128 uS) before the tx runs dry, and no other tx interrupts
> >> unless the queue length stays below 32, while the -current driver gets
> >> an interrupt after every packet.  It does this mainly to reduce the
> >> tx interrupt load from 1 per packet to (under load) 1 per 480 fragments.
> >> The correct handling of OACTIVE is obtained as a side effect almost
> >> automatically. ...
> >>
> >> I'm not very familiar with NIC hardware and don't know how other NICs
> >> support timing of tx interrupts, but watermark stuff like the above
> >> is routine for serial devices/drivers.
> >> sk's support for interrupting
> >> on any fragment is too flexible to be good (it is painful to program,
> >> and there doesn't seem to be a good way to time out if there is no
> >> good fragment to interrupt on or when you program the interruption on
> >> a wrong fragment).
> >> ...
>
> >AFAIK SK GENESIS has no programming interface for a watermark.
> >Some advanced hardware provides a way to interrupt when it reaches
> >a programmed threshold but SK does not. It just provides a way whether
> >hardware should raise an interrupt depending on Tx descriptor value.
> >By tracking number of index it's possible to generate an interrupt
> >for every N frames instead of every frame (1 <= N <= MAX Tx. Desc.).
>
> I only have a Yukon, and think that's what I do, with a very variable N.
> (Do we mean the same thing by the "Tx descriptor value"?  I mean

Yes.

> SK_TXCTL_EOF_INTR.  Surely that's portable -- it's used in all versions
> of sk with no ifdefs for GENESIS.).
>
> My sk_start() tries to fill the tx ring (to length 512) and then put
> an interrupt mark only on the last fragment in a packet nearest to 32
> from the end, so in the best case N is about 480, but it is less if
> tx is not streaming.  Cases where there is not much choice are harder
> to program.  I had some success with removing interrupt marks and with
> dummy packets of length 0 whose purpose is just to hold an interrupt
> mark, but I don't trust those methods.  I didn't try putting an
> interrupt mark on fragments in the middle of a packet.  That would be
> simpler if it works.
>

I think it would take a long time before a Tx completion interrupt is
generated for committed frames (every frame vs. the last frame). The
hardware may already have some free Tx descriptors before it generates
a Tx completion interrupt. I guess it would be more efficient to check
whether there are free Tx descriptors and use them, rather than wait
for a Tx completion interrupt. Just waiting for a completion interrupt
would add latency.
Anyway, I have to experiment with it.

> >We may also need to add a routine to reclaim pending Tx descriptors
> >before sending frames in sk_start if number of available Tx descriptors
> >are less than a threshold.
>
> I'm not sure what you mean here.  If there are < 32 tx descriptors
> available, AND there is an (active) descriptor with an interrupt mark,
> then my sk_start() just sets IFF_OACTIVE and returns.  The case where
> there are < 32 tx descriptors but no descriptor with an interrupt mark
> is trickier: a mark must be added, and I don't trust adding it to an
> active packet, so it must be added to a new packet, but it might be
> impossible to add one for the following reasons:
> - no space.  The magic 32 is hopefully enough.
> - no packets in the ifq.  My sk_start() tries to leave a spare one when
>   one might be needed, but I think upper layers can eat it.
> A dummy packet of length 0 can be used to handle both cases but may be
> bad for the network -- does the hardware send a frame with no data?

I'm not sure. Since you know in sk_encap when you have to insert the
interrupt mark, I think you can use m_defrag and set SK_TXCTL_EOF_INTR.

>
> >However I don't know how the driver should handle transmit errors
> >occurred between interrupt-less Tx operations. Just flushing all
> >committed frames would result in poor TCP performance.
>
> Doesn't the hardware just proceed to the next packet without interrupting
> (except possibly for a special error interrupt), and anyway act the same
> as if the interrupt were delayed by interrupt moderation?  Errors for
> individual packets don't seem to be detected or reported in either case.
>

Yes, that is the problem. It seems that there is no way to know which
packet caused Tx errors, and I think we have no choice but to flush the
entire FIFOs. SK just flushes all frames in the FIFO if it detects a
Tx FIFO underrun or Rx FIFO overflow. But I'm not sure how Yukon should
handle this case.
The flushing routine in sk is guesswork based on the Linux skge
implementation, and I don't know the internal details of the Yukon
hardware. Since Yukon uses different registers to flush FIFOs, and has
its own registers related to interrupts and FIFOs, I guess it uses a
completely different approach.

> >The difference between Yukon and SK hardware also make it hard to
> >implement above interrupt-less Tx operations. There is no publicly
>
> My version is not interruptless, but tries to use tx interrupts for
> everything, just not many of them.
>

Ok, I'll take your idea and will try to experiment with it next week.

-- 
Regards,
Pyun YongHyeon
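
For readers following the thread, the mark-placement rule described above
(fill the ring, then put one interrupt mark on the end-of-packet fragment
nearest to 32 descriptors from the end) might be sketched roughly as below.
This is only an illustration under stated assumptions, not code from either
driver version: struct tx_desc, tx_place_intr_mark() and
tx_start_should_wait() are hypothetical names, the intr flag merely stands
in for SK_TXCTL_EOF_INTR, and the sketch assumes the descriptors have not
yet been handed to the hardware, so clearing a stale mark is safe.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_CNT	512	/* ring length mentioned in the thread */
#define TX_LOWAT	32	/* the "magic 32" low-water distance */

/* Hypothetical software view of a Tx descriptor (not the real sk layout). */
struct tx_desc {
	bool	eof;	/* last fragment of a packet */
	bool	intr;	/* stands in for SK_TXCTL_EOF_INTR */
};

/*
 * Leave exactly one interrupt mark on the queued descriptors
 * desc[0..n-1]: the end-of-packet fragment whose completion leaves a
 * number of outstanding descriptors closest to TX_LOWAT.
 * Returns the marked index, or -1 if no end-of-packet fragment exists.
 */
static int
tx_place_intr_mark(struct tx_desc *desc, int n)
{
	int best = -1, best_dist = 0;

	assert(n >= 0 && n <= TX_RING_CNT);
	for (int i = 0; i < n; i++) {
		int remaining, dist;

		desc[i].intr = false;		/* drop any stale mark */
		if (!desc[i].eof)
			continue;
		remaining = n - 1 - i;		/* still queued after i */
		dist = remaining > TX_LOWAT ?
		    remaining - TX_LOWAT : TX_LOWAT - remaining;
		if (best == -1 || dist < best_dist) {
			best = i;
			best_dist = dist;
		}
	}
	if (best != -1)
		desc[best].intr = true;
	return (best);
}

/*
 * The simple case from the thread: fewer than TX_LOWAT free descriptors
 * and a mark already pending, so the start routine can set IFF_OACTIVE
 * and return, relying on the pending interrupt to restart transmission.
 */
static bool
tx_start_should_wait(int nfree, bool mark_pending)
{
	return (nfree < TX_LOWAT && mark_pending);
}
```

With a full ring of 3-fragment packets this puts the mark on descriptor 479,
which matches the "N is about 480" figure quoted above. The harder case (few
free descriptors and no pending mark) is deliberately left out, since the
thread does not settle how it should be handled.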