Date:      Sat, 30 Dec 2006 12:20:42 -0800
From:      Sam Leffler <sam@errno.com>
To:        JoaoBR <joao@matik.com.br>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: ath0 timeout problem - again
Message-ID:  <4596CA1A.9040906@errno.com>
In-Reply-To: <200612282002.11562.joao@matik.com.br>
References:  <200612282002.11562.joao@matik.com.br>

JoaoBR wrote:
> I need some help here, this is not a single case, I get this on a several 
> machines, this is releng_6 , recent, but old problem getting ugly
> 
> 
> first I get this kind of events in messages, independent if it is client mode 
> or hostap or adhoc
> 
> Dec 28 16:50:53 ap1-cds kernel: ath0: discard oversize frame (ether type 5e4 
> flags 3 len 1522 > max 1514)
> Dec 28 16:51:01 ap1-cds kernel: ath0: discard oversize frame (ether type 5e4 
> flags 3 len 1522 > max 1514)
> Dec 28 16:58:16 ap1-cds kernel: ath0: device timeout
> ... timeout event repeats 
> 
> I really do not know what this event means (ether type 5e4), for my 
> understandings it is vague in the source, so I am lost here

Seems pretty clear: it's the type field extracted from the ethernet
header of the oversized packet.  A quick check of sys/net/ethernet.h
shows no such ETHERTYPE defined.  So something in your network is
transmitting packets that are either being rx'd incorrectly or, more
likely, corrupted in transit.
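
A minimal sketch of that check, not code from the driver: the 16-bit
type/length field of an Ethernet header is read as an IEEE 802.3 length
when it is at most 1500 (0x5dc) and as an EtherType when it is at least
1536 (0x600).  The value 0x5e4 from the log is 1508, which lands in the
undefined gap between the two.  The function name and constants below
are made up for illustration.

```python
# Illustrative helper; not part of the ath driver or sys/net/ethernet.h.
MAX_8023_LENGTH = 0x05DC   # 1500: largest value read as an 802.3 length
MIN_ETHERTYPE = 0x0600     # 1536: smallest value read as an EtherType


def classify_type_field(value: int) -> str:
    """Classify the 16-bit type/length field of an Ethernet header."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("type/length field is 16 bits")
    if value <= MAX_8023_LENGTH:
        return "length"
    if value >= MIN_ETHERTYPE:
        return "ethertype"
    return "undefined"


print(classify_type_field(0x05E4))  # the value from the log message
print(classify_type_field(0x0800))  # IPv4, for comparison
```

A field in the undefined range, combined with a frame 8 bytes over the
1514-byte limit, is consistent with a damaged header rather than a
protocol nobody has heard of.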

> 
> {
> I get continously:
> 
>  kernel: ath0: link state changed to DOWN
>  kernel: ath0: link state changed to UP
> 
> when WL client but it recovers when the AP comes back to normal
> so wl-cli mode is not the issue
> }

Sorry, this is hard to understand.  You are saying that when you see
packets discarded on the ap the client stations lose their association
to the ap?  You've said nothing about your environment but I'd guess
you've got some heavy interference like a microwave oven operating.

> 
> 
> but when the machine is running hostap the link state up/down events do not 
> come up but transmission is interrupted, or better, goes slow and stops 
> then - and stops forever until cold reboot, no chance to get this card back, 
> not even unload ath and reload the driver (that was a try but I use it 
> compiled into the kernel) 
> this is not related to any WEP settings or any rate, this problem is coming up 
> with either rate-sample or rate_onoe
> 
> 
> this is not related to the "tx stopped" problem (OACTIVE) and it is not 
> related to any [TX|RX]BUF value (whatever it is set to)
> 
> this problem is not a single case and not hardware related, here I mean MB, 
> CPU, memory but is related in a certain way to the ath drv - same machine, 
> but wi0 (prism card) and it does NOT happen this way
> 
> 
> I am with this problem since 6.0 and would be glad if somebody could convince 
> Mr. Sam L.  to attend this since it is a serious issue - any FreeBSD releng_6 
> has this problem but releng_5 does not

Well "Mr. Sam L" has other things to do that are more important to him.
If you want help I can try to provide it but this is not exactly a
problem one can diagnose from afar.  I suggest you sniff traffic from a
separate station and try to identify what is going on in the network
when you see this event occur.  It would also help to do the obvious things
like swap ath cards.  You've also said nothing about your environment
such as the mac+phy revs for the card and the computer this is operating in.

> 
> depending on the amount of traffic I get this any hour ( when 2-3Mbit/s or 
> more) or several times a day (when 1-2Mbit/s)
> 
> it get worse when I have more then one ath card installed

Sounds like you've got radio/antenna issues that manifest themselves as
noise that drives the radios into silence.  Diagnosing something like
that may require tools like a spectrum analyzer.

> 
> 
> ath stats:
> 
> 70777 data frames received
> 71551 data frames transmit
> 420 tx frames with an alternate rate
> 10821 long on-chip tx retries
> 260 tx failed 'cuz too many retries
> 11M current transmit rate
> 10489 tx management frames
> 1 tx frames discarded prior to association
> 786 tx frames with no ack marked
> 80516 tx frames with short preamble
> 54395 rx failed 'cuz of bad CRC
> 146438 rx failed 'cuz of PHY err
>     145013 CCK timing
>     1425 CCK restart
> 5295 beacons transmitted
> 19 periodic calibrations
> 42 rssi of last ack
> 31 avg recv rssi
> -98 rx noise floor
> 572 cabq frames transmitted
> 11 cabq xmit overflowed beacon interval

This should not happen.  You have stations in power save mode in your
bss and the transmission of queued multicast frames overflowed the
interval following the beacon frame.  This should be handled (I
explicitly tested it) but you might want to observe if this occurs when
you have problems.
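
For a rough sense of how little room there is, here is a
back-of-the-envelope airtime estimate.  It assumes the queued multicast
frames go out at the 1 Mb/s "mcastrate" with the 802.11b long preamble,
and it ignores interframe spacing, backoff, WEP overhead and the beacon
itself, so the result is an upper bound.  The helper names are
hypothetical.

```python
# Rough airtime estimate; all constants are assumptions for illustration.
TU_US = 1024                 # one 802.11 time unit, in microseconds
BEACON_INTERVAL_TU = 100     # "bintval 100" from the ifconfig output
MCAST_RATE_MBPS = 1          # "mcastrate 1" from the ifconfig output
LONG_PREAMBLE_US = 192       # 802.11b long PLCP preamble plus header
MAC_OVERHEAD_BYTES = 28      # 24-byte MAC header + 4-byte FCS (no WEP)


def frame_airtime_us(payload_bytes: int) -> float:
    """Time on air for one frame at the multicast rate."""
    bits = 8 * (payload_bytes + MAC_OVERHEAD_BYTES)
    return LONG_PREAMBLE_US + bits / MCAST_RATE_MBPS


def frames_per_beacon_interval(payload_bytes: int) -> int:
    """Upper bound on frames of this size that fit in one interval."""
    interval_us = BEACON_INTERVAL_TU * TU_US
    return int(interval_us // frame_airtime_us(payload_bytes))


for size in (100, 500, 1500):
    print(size, frame_airtime_us(size), frames_per_beacon_interval(size))
```

Under these assumptions only a handful of full-size multicast frames
fit between two beacons, so a modest burst is enough to spill over.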

> 1525 switched default/rx antenna
> Antenna profile:
> [1] tx    41285 rx    4

This makes no sense; you rx'd 4 frames total?  That's inconsistent with
the "data frames received" counter and makes me question whether these
numbers are meaningful.
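
Taking the quoted counters at face value anyway (which may not be
warranted), a short sketch like the following puts the error counts in
proportion.  The dictionary keys are made-up labels, not athstats field
names, and the rx error counters include anything the radio failed to
decode, not only frames meant for this bss, so treat the ratios as a
rough indicator only.

```python
# Counters copied from the quoted "ath stats" output; key names are
# illustrative labels, not real athstats identifiers.
stats = {
    "rx_data": 70777,
    "tx_data": 71551,
    "tx_long_retries": 10821,
    "tx_failed_retries": 260,
    "rx_crc_err": 54395,
    "rx_phy_err": 146438,
    "rx_phy_cck_timing": 145013,
}


def ratio(numerator: int, denominator: int) -> float:
    """Plain ratio that returns NaN instead of dividing by zero."""
    return numerator / denominator if denominator else float("nan")


rx_errors = stats["rx_crc_err"] + stats["rx_phy_err"]
print(f"rx errors per good data frame: "
      f"{ratio(rx_errors, stats['rx_data']):.2f}")
print(f"share of PHY errors that are CCK timing: "
      f"{ratio(stats['rx_phy_cck_timing'], stats['rx_phy_err']):.2%}")
print(f"long tx retries per data frame sent: "
      f"{ratio(stats['tx_long_retries'], stats['tx_data']):.2%}")
print(f"tx frames dropped after too many retries: "
      f"{ratio(stats['tx_failed_retries'], stats['tx_data']):.2%}")
```

If the counters can be trusted, receive errors outnumbering good data
frames points at the RF environment rather than the tx path.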
> 
> 
> ifconfig
> 
> ath0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
>         ether 00:13:46:8b:f1:86
>         media: IEEE 802.11 Wireless Ethernet DS/11Mbps mode 11b <hostap>
>         status: associated
>         ssid omegasul channel 1 (2412) bssid 00:13:46:8b:f1:86
>         authmode OPEN privacy ON deftxkey 1
>         wepkey 1:40-bit
>         wepkey 2:40-bit
>         wepkey 3:40-bit
>         wepkey 4:40-bit powersavemode OFF powersavesleep 100 txpowmax 36
>         txpower 63 rtsthreshold 2346 mcastrate 1 fragthreshold 2346 bmiss 7
>         -pureg protmode CTS -wme burst ssid HIDE -apbridge dtimperiod 1
>         bintval 100

Unfortunately you've not provided critical info like the mac+phy of the
card and the platform (e.g. is this a Soekris box).  As I said I can try
to _HELP_ you but I cannot fix your problem.  You need to diagnose what
is happening.

	Sam


