Date:      Tue, 20 Jun 2006 17:11:18 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Pyun YongHyeon <pyunyh@gmail.com>
Cc:        freebsd-net@freebsd.org, John-Mark Gurney <gurney_j@resnet.uoregon.edu>, Robert Watson <rwatson@freebsd.org>, John Polstra <jdp@polstra.com>
Subject:   Re: IF_HANDOFF vs. IFQ_HANDOFF
Message-ID:  <20060620154425.Q48009@delplex.bde.org>
In-Reply-To: <20060619122753.GA5600@cdnetworks.co.kr>
References:  <20060615115738.J2512@fledge.watson.org> <XFMail.20060615091807.jdp@polstra.com> <20060618194044.GC1142@funkthat.com> <20060619162819.F44832@delplex.bde.org> <20060619122753.GA5600@cdnetworks.co.kr>

On Mon, 19 Jun 2006, Pyun YongHyeon wrote:

Please trim quotes.

> On Mon, Jun 19, 2006 at 06:04:26PM +1000, Bruce Evans wrote:

> > To max out the link without unmaxing CPU for other uses, you do have
> > to know when the tx approaches running out of packets.  This is best
> > done using watermark stuff.  There should be a nearly-complete interrupt
> > at low water, and (only after low water is reached and the interrupt
> > handler doesn't refill the tx ring to be above low water again) a
> > completion interrupt at actual completion.  My version of the sk driver
> > does this.  It arranges for the nearly-complete interrupt at about 32
> > fragments (min 128 uS) before the tx runs dry, and no other tx interrupts
> > unless the queue length stays below 32, while the -current driver gets
> > an interrupt after every packet.  It does this mainly to reduce the
> > tx interrupt load from 1 per packet to (under load) 1 per 480 fragments.
> > The correct handling of OACTIVE is obtained as a side effect almost
> > automatically.  ...
> >
> > I'm not very familiar with NIC hardware and don't know how other NICs
> > support timing of tx interrupts, but watermark stuff like the above
> > is routine for serial devices/drivers.  sk's support for interrupting
> > on any fragment is too flexible to be good (it is painful to program,
> > and there doesn't seem to be a good way to time out if there is no
> > good fragment to interrupt on or when you program the interruption on
> > a wrong fragment).
> > ...

> AFAIK SK GENESIS has no programming interface for a watermark.
> Some advanced hardware provides a way to interrupt when it reaches
> a programmed threshold but SK does not. It just provides a way to select
> whether the hardware should raise an interrupt, depending on the Tx
> descriptor value. By tracking the descriptor index it's possible to
> generate an interrupt for every N frames instead of every frame
> (1 <= N <= MAX Tx. Desc.).

I only have a Yukon, and think that's what I do, with a very variable N.
(Do we mean the same thing by the "Tx descriptor value"?  I mean
SK_TXCTL_EOF_INTR.  Surely that's portable -- it's used in all versions
of sk with no ifdefs for GENESIS.)
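For comparison, the fixed-N scheme Pyun describes (mark every Nth frame by tracking the descriptor index) might look roughly like the following. This is a minimal sketch, not code from sk(4): `struct txintr_state` and `txintr_want_mark()` are hypothetical names, and the caller is assumed to set the EOF interrupt bit (SK_TXCTL_EOF_INTR in sk) on the last descriptor of any frame for which the helper returns true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-interface state for a fixed-N tx interrupt scheme. */
struct txintr_state {
	unsigned since_mark;	/* frames queued since the last marked one */
	unsigned n;		/* request an interrupt every n frames */
};

/*
 * Call once per frame queued by the start routine.  Returns true when
 * this frame's last descriptor should carry the EOF interrupt bit.
 */
static bool
txintr_want_mark(struct txintr_state *st)
{
	assert(st->n >= 1);
	if (++st->since_mark >= st->n) {
		st->since_mark = 0;
		return true;
	}
	return false;
}
```

With n = 4, the 4th, 8th, 12th, ... frames are marked. A weakness of a fixed N is that when tx stops streaming, up to N - 1 trailing frames carry no mark, so something else (a timeout, or forcing a mark on the last frame) would be needed to get them reclaimed.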

My sk_start() tries to fill the tx ring (to length 512) and then put
an interrupt mark only on the last fragment in a packet nearest to 32
from the end, so in the best case N is about 480, but it is less if
tx is not streaming.  Cases where there is not much choice are harder
to program.  I had some success with removing interrupt marks and with
dummy packets of length 0 whose purpose is just to hold an interrupt
mark, but I don't trust those methods.  I didn't try putting an
interrupt mark on fragments in the middle of a packet.  That would be
simpler if it works.
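The mark placement described above can be sketched as a pure function over the fragment counts of the queued packets. This is an illustrative helper with hypothetical names, not the actual sk_start(); it assumes a low-water mark passed in as `lowat` (32 in the text) and ignores ring wraparound and the details of writing the descriptor.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Given the fragment counts of the packets queued in the tx ring, in ring
 * order, return the index of the packet whose last fragment should carry
 * the interrupt mark, so that about `lowat` fragments are still queued
 * when the interrupt fires.  If no more than `lowat` fragments are queued,
 * mark the final packet instead (a plain completion interrupt).
 * Returns -1 if nothing is queued.
 */
static int
pick_mark_packet(const int *nfrags, int npkts, int lowat)
{
	int total = 0, end = 0, best = -1, best_dist = INT_MAX;

	assert(lowat >= 0);
	if (npkts == 0)
		return -1;
	for (int i = 0; i < npkts; i++)
		total += nfrags[i];
	if (total <= lowat)
		return npkts - 1;

	/* Fragments consumed at the point where about lowat remain. */
	int target = total - lowat;
	for (int i = 0; i < npkts; i++) {
		end += nfrags[i];	/* fragments done once packet i is sent */
		int dist = abs(end - target);
		if (dist < best_dist) {	/* ties keep the earlier packet */
			best = i;
			best_dist = dist;
		}
	}
	return best;
}
```

With 512 single-fragment packets queued and lowat = 32 this picks index 479, i.e. one tx interrupt per 480 packets, raised when 32 fragments remain; with 32 or fewer fragments queued it degenerates to marking the last packet.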

> We may also need to add a routine to reclaim pending Tx descriptors
> before sending frames in sk_start if the number of available Tx descriptors
> is less than a threshold.

I'm not sure what you mean here.  If there are < 32 tx descriptors
available, AND there is an (active) descriptor with an interrupt mark,
then my sk_start() just sets IFF_OACTIVE and returns.  The case where
there are < 32 tx descriptors but no descriptor with an interrupt mark
is trickier: a mark must be added, and I don't trust adding it to an
active packet, so it must be added to a new packet, but it might be
impossible to add one for the following reasons:
- no space.  The magic 32 is hopefully enough.
- no packets in the ifq.  My sk_start() tries to leave a spare one when
   one might be needed, but I think upper layers can eat it.
A dummy packet of length 0 can be used to handle both cases but may be
bad for the network -- does the hardware send a frame with no data?
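The low-descriptor cases above can be summarized as a small decision function. This is a sketch under simplifying assumptions: the enum and function names are hypothetical, "no space" is modeled as the next packet needing more descriptors than are free, and the zero-length dummy packet is only reported as the remaining option, not constructed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of one start() pass. */
enum start_action {
	START_FILL,		/* at least lowat free: queue packets as usual */
	START_OACTIVE,		/* a mark is pending: set IFF_OACTIVE, return */
	START_MARK_NEW,		/* no mark pending: queue a new packet, mark it */
	START_NEED_DUMMY	/* no space or empty ifq: only a dummy could help */
};

static enum start_action
start_decide(int free_desc, bool mark_pending, int ifq_len, int next_frags,
    int lowat)
{
	assert(free_desc >= 0 && ifq_len >= 0 && next_frags >= 0);
	if (free_desc >= lowat)
		return START_FILL;
	if (mark_pending)
		return START_OACTIVE;	/* the pending interrupt restarts tx */
	if (ifq_len > 0 && next_frags <= free_desc)
		return START_MARK_NEW;
	return START_NEED_DUMMY;
}
```

Keeping the mark on a newly queued packet, rather than patching an active descriptor, matches the reluctance above to touch packets the NIC may already be processing.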

> However I don't know how the driver should handle transmit errors
> that occur between interrupt-less Tx operations. Just flushing all
> committed frames would result in poor TCP performance.

Doesn't the hardware just proceed to the next packet without interrupting
(except possibly for a special error interrupt), and anyway act the same
as if the interrupt were delayed by interrupt moderation?  Errors for
individual packets don't seem to be detected or reported in either case.
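A simplified reclaim loop shows why per-packet errors tend to go unnoticed with either scheme: it only checks whether the NIC has handed each descriptor back and never inspects a per-packet status. The single `own` flag per descriptor, the ring length of 512 and the low-water mark of 32 are assumptions for illustration, `oactive` merely stands in for IFF_OACTIVE, and `tx_reclaim()` is a hypothetical name; real sk descriptors are more complex.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_CNT	512	/* ring length used in the text */
#define TX_LOWAT	32	/* low-water mark used in the text */

/* Hypothetical descriptor: `own` is set while the NIC owns it. */
struct txdesc {
	bool own;
};

struct txring {
	struct txdesc desc[TX_RING_CNT];
	int cons;		/* oldest descriptor not yet reclaimed */
	int used;		/* descriptors currently handed to the NIC */
	bool oactive;		/* stands in for IFF_OACTIVE */
};

/*
 * Reclaim every descriptor the NIC has handed back, oldest first, then
 * clear oactive once at least TX_LOWAT descriptors are free again.
 * Returns the number of descriptors reclaimed.
 */
static int
tx_reclaim(struct txring *r)
{
	int n = 0;

	while (r->used > 0 && !r->desc[r->cons].own) {
		r->cons = (r->cons + 1) % TX_RING_CNT;
		r->used--;
		n++;
	}
	assert(r->used >= 0 && r->used <= TX_RING_CNT);
	if (TX_RING_CNT - r->used >= TX_LOWAT)
		r->oactive = false;
	return n;
}
```

Run from the (rare) tx interrupt, a loop like this frees everything completed since the previous interrupt in one pass, whether or not an individual packet failed.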

> The difference between Yukon and SK hardware also makes it hard to
> implement above interrupt-less Tx operations. There is no publicly

My version is not interrupt-less, but tries to use tx interrupts for
everything, just not many of them.

> available documentation for Yukon adapters and Yukon seems to use
> completely different registers for FIFO handling and flow control.
> This is one of the main reasons why I couldn't implement polling(4) for
> sk(4). I also know that Yukon adapters have a bug which loses
> Tx completion interrupts under certain conditions.

Anything that prevents implementation of polling is good.

> BTW, as SK adapters have no limit on the number of Tx/Rx descriptors
> how about increasing the number of Tx descriptors (e.g. 1024 or 2048)
> to reduce the chance of running out of Tx descriptors?
> It does not decrease number of interrupts generated but it would help
> to push the hardware to the limit without much overhead, I guess.

I tried this instead of increasing the ifq length (*) and seemed to hit
a limit.  I don't understand the problem of running out of tx descriptors
-- see above.

(*) It's necessary to have an enormous number of descriptors to avoid
the tx running dry after poor handling of ENOBUFS.  select() doesn't
work right for sockets, so at least test programs like ttcp retry after
a not-so-short sleep.  The old sgi ttcp tries to sleep for 18000 uS.  This
may have been long enough for 1Mbps ethernet, but for sk it's 4500
packet times with minimal packets, and for better hardware it would
be a multiple instead of a factor of 18000 packet times.  Newer versions
of ttcp use a shorter sleep or a busy-wait.  Neither works well.  The
shortest sleep length averages 1.5/HZ seconds and I use HZ = 100 so
this is almost the same as the old sgi sleep time (but 18000 gets
rounded up to an average of 25000).  Busy-waiting results in ttcp
taking 100% of the CPU and the system and driver parts of this being
harder to distinguish.  In sk, I use an ifq length scaled by hz to
ensure that the queue doesn't run dry during several 1/HZ's of sleeping
and ttcp works right.  This gives an enormous ifq with HZ = 100 (5-10
thousand descriptors).  An unchanged fxp behaves like an unchanged sk
here, but an unchanged bge behaves differently.  bge apparently never
returns ENOBUFS, but busy-waits in the kernel.
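The "ifq length scaled by hz" sizing is simple arithmetic: the queue has to cover the sleep time divided by the time per packet. A minimal sketch, assuming about 4 uS per minimal packet for sk (the figure implied above by 18000 uS being 4500 packet times) and a hypothetical function name:

```c
#include <assert.h>

/*
 * Number of ifq entries needed so that the tx does not run dry while a
 * sender sleeps for `ticks` clock ticks at `hz` ticks per second, when
 * one packet takes `pkt_us` microseconds on the wire.
 */
static unsigned
ifq_len_for_hz(unsigned hz, unsigned ticks, unsigned pkt_us)
{
	assert(hz > 0 && pkt_us > 0);
	unsigned long sleep_us = (unsigned long)ticks * 1000000UL / hz;

	return (unsigned)((sleep_us + pkt_us - 1) / pkt_us);	/* round up */
}
```

With HZ = 100 and sleeps of 2 to 4 ticks this gives 5000 to 10000 entries, in line with the 5-10 thousand above; with HZ = 1000 the same 2 ticks need only 500.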

Bruce


