Date:      Sat, 5 Jan 2013 20:53:16 -0800
From:      Adrian Chadd <adrian@freebsd.org>
To:        freebsd-wireless@freebsd.org
Subject:   [CFT] ath(4) migration to if_transmit() and a transmit tasklet, rather than direct dispatch
Message-ID:  <CAJ-Vmo=b2p_HMvSUAYFaY0ZunuB6Sjbrx8cEpDFfVBM=SGVEYg@mail.gmail.com>

Hi,

I've written up a replacement ath(4) TX path that does a bunch of things.

The patch I'd like everyone to try:

http://people.freebsd.org/~adrian/ath/20130105-if_transmit_txfrag_2.diff

What it does:

* It implements a driver staging queue for frames from if_start and
if_transmit();
* It populates that staging queue with ath_bufs, each of which holds
the mbuf and node ref;
* The actual TX occurs in a taskqueue (the default ath taskqueue for
now, but I'll move it to another one soon) in order to serialise
things;
* The tx fragment list is populated correctly (but doesn't quite work,
see below);
* the reliance on "peeking" at the next mbuf in a fragment list is
gone - instead, I now store the data length of the next fragment in
the current buffer and fire that across.
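
A minimal userspace sketch of the staging-queue idea described above, not
the code in the patch. All names here (stage_buf, stage_queue,
stage_enqueue, stage_drain, count_bytes) are hypothetical stand-ins: the
real driver uses ath_buf, mbufs, node references, a mutex and
taskqueue(9), all of which are left out. It shows frames being staged in
FIFO order, each buffer carrying the length of the next fragment instead
of peeking at the next mbuf, and a single drain routine doing the actual
"transmit".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ath_buf: one staged frame or fragment. */
struct stage_buf {
    struct stage_buf *next;
    int node_id;            /* stand-in for the node reference */
    size_t len;             /* data length of this fragment */
    size_t next_frag_len;   /* length of the following fragment, 0 if last */
};

struct stage_queue {
    struct stage_buf *head, *tail;
    int task_pending;       /* stand-in for "the TX task has been enqueued" */
};

/* Stage every fragment of one frame. Returns 0 on success, -1 on failure. */
int stage_enqueue(struct stage_queue *q, int node_id,
    const size_t *lens, size_t n)
{
    struct stage_buf *first = NULL, *last = NULL, *bf;
    size_t i;

    if (n == 0)
        return -1;
    for (i = 0; i < n; i++) {
        bf = calloc(1, sizeof(*bf));
        if (bf == NULL) {
            /* Undo the partial chain; nothing was queued yet. */
            while ((bf = first) != NULL) {
                first = bf->next;
                free(bf);
            }
            return -1;
        }
        bf->node_id = node_id;
        bf->len = lens[i];
        /* Record the next fragment's length now; no peeking later. */
        bf->next_frag_len = (i + 1 < n) ? lens[i + 1] : 0;
        if (last != NULL)
            last->next = bf;
        else
            first = bf;
        last = bf;
    }
    /* Append the whole chain at once so the fragments stay adjacent. */
    if (q->tail != NULL)
        q->tail->next = first;
    else
        q->head = first;
    q->tail = last;
    q->task_pending = 1;
    return 0;
}

/* The deferred "TX task": the only place frames leave the staging queue. */
size_t stage_drain(struct stage_queue *q,
    void (*tx)(const struct stage_buf *, void *), void *arg)
{
    struct stage_buf *bf;
    size_t sent = 0;

    q->task_pending = 0;
    while ((bf = q->head) != NULL) {
        q->head = bf->next;
        if (q->head == NULL)
            q->tail = NULL;
        tx(bf, arg);
        free(bf);
        sent++;
    }
    assert(q->tail == NULL);
    return sent;
}

/* Example tx callback: totals the bytes handed to the "hardware". */
void count_bytes(const struct stage_buf *bf, void *arg)
{
    *(size_t *)arg += bf->len;
}
```

Because only stage_drain() removes buffers, transmit order is whatever
order the frames were staged in, which is the serialisation property the
taskqueue is meant to give.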

Now, in station mode I get exactly the same throughput as before -
150-180mbit TCP iperf tests. It's great.
I need to do some more (single core) MIPS testing - if it drops in
throughput there, it'll likely be the ridiculously large number of
calls to taskqueue_enqueue() and/or some lock contention.

I'd like to commit this to -HEAD and then begin the next phase, which is:

* Figure out why TX fragments are transmitted but dropped by (some)
receivers.  FreeBSD -> FreeBSD works fine, but FreeBSD -> (some random
11g cable modem router) just plain drops the fragments in question.
Sigh;
* Tidy up some more of the locking, which likely involves separating
out the TX queue lock from the TX taskqueue lock;
* Push raw xmit frames into the same queue mechanism, so they are
queued in the same fashion and obey the same sequence number / CCMP IV
allocation ordering that data frames do (which is important as things
like EAPOL frames are encrypted and have sequence numbers, but come in
the raw xmit path. Grr.)
* Finish tidying up when things are called - specifically, I'd like to
make sure all the buffer completions occur outside of the locks being
held, so I can finally avoid a bunch of potential LORs when doing
things like transmitting BAR frames from the TX completion path.
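
A rough sketch of the "complete buffers outside the lock" pattern from
the last item, under stated assumptions. sketch_lock is a toy stand-in
for a kernel mutex that only tracks whether it is held; done_buf,
txq_sketch, complete_buf and txq_process_completions are hypothetical
names, not functions from ath(4). The idea: detach the finished buffers
while holding the lock, drop the lock, and only then run the completion
handlers, so a handler that needs to take other locks (say, to send a
BAR frame) cannot create a lock order reversal against the TX queue
lock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a mutex: it only records whether it is held. */
struct sketch_lock {
    int held;
};

static void sketch_lock_acquire(struct sketch_lock *l) { l->held = 1; }
static void sketch_lock_release(struct sketch_lock *l) { l->held = 0; }

/* Hypothetical finished TX buffer waiting for its completion handler. */
struct done_buf {
    struct done_buf *next;
    int seq;
    int completed;
};

struct txq_sketch {
    struct sketch_lock lock;
    struct done_buf *done_head;   /* protected by lock */
};

/* Completion handler: must never run with the TX queue lock held. */
static void complete_buf(struct txq_sketch *q, struct done_buf *bf)
{
    assert(!q->lock.held);
    bf->completed = 1;
}

/* Returns the number of buffers completed. */
int txq_process_completions(struct txq_sketch *q)
{
    struct done_buf *local, *bf;
    int n = 0;

    /* Step 1: under the lock, just detach the list. */
    sketch_lock_acquire(&q->lock);
    local = q->done_head;
    q->done_head = NULL;
    sketch_lock_release(&q->lock);

    /* Step 2: with the lock dropped, run the completion handlers. */
    while ((bf = local) != NULL) {
        local = bf->next;
        bf->next = NULL;
        complete_buf(q, bf);
        n++;
    }
    return n;
}
```

The assert in complete_buf() plays the role that a "lock not owned"
assertion would play in the kernel.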

I'd really appreciate any testing that can be done for this. It
doesn't matter which mode you're in - adhoc, hostap, mesh, sta, 11n or
non-11n - all ath(4) chips share the same TX path code and this all
happens before the software TX queue and aggregation handling.

Once this is all verified and working, I'll work on migrating the
net80211 TX path to actually use if_transmit() itself and use a TX
taskqueue to serialise all TX. That should fix a whole bunch of
subtle, niggling little TX side bugs that have been the bane of my
existence since I took this code on a couple years ago.

Finally - although I'd like to _fix_ TX fragment handling, it still
has its .. quirks. I'm still worried that a very active STA or AP with
multiple traffic sessions will end up with TX fragments in the
software queue that aren't kept 100% in order (ie, frames from
other sessions get interspersed with the fragments). For now it
won't happen - the TX lock is held for the duration of running the TX
queuing and so (in theory!) nothing should appear in the TX queue in
between TX fragments. But I don't trust it. Chances are the correct
fix is a lot nastier than anything the current net80211 way of "just do
fragmentation in the net80211 layer and the driver will figure it out"
lets me do cleanly. (Ie, I think the clean way is to do fragmentation
at the point where you're about to queue it to the actual hardware and
not in net80211, but I digress.)
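
The ordering worry above can be stated as a checkable invariant. This is
a minimal sketch, assuming a flattened view of the software queue;
q_entry and frags_in_order are hypothetical names and nothing like this
exists in the driver as far as this message says. A fragment with "more
fragments" set must be followed immediately by the next fragment of the
same session.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened view of one entry in the software TX queue. */
struct q_entry {
    int session;      /* stand-in for a traffic session / node + TID */
    int frag_no;      /* fragment index within its frame, starting at 0 */
    int more_frags;   /* non-zero if another fragment must follow */
};

/*
 * Returns 1 if every fragment that expects a successor is followed
 * directly by the next fragment of the same session, 0 otherwise.
 * A burst cut off at the end of the queue is treated as a violation.
 */
int frags_in_order(const struct q_entry *q, size_t n)
{
    size_t i;

    assert(q != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!q[i].more_frags)
            continue;
        if (i + 1 == n)
            return 0;
        if (q[i + 1].session != q[i].session ||
            q[i + 1].frag_no != q[i].frag_no + 1)
            return 0;
    }
    return 1;
}
```

A check like this could be run over a snapshot of the queue in a debug
build to confirm that nothing lands between fragments.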



Adrian


