Date:      Mon, 23 Mar 2009 14:08:20 -0700
From:      Sam Leffler <sam@freebsd.org>
To:        Bruce M Simpson <bms@incunabulum.net>
Cc:        freebsd-net@freebsd.org, Alexei <alexei@raylab.com>
Subject:   Re: ath0 apparent silent disassociation
Message-ID:  <49C7FA44.8010801@freebsd.org>
In-Reply-To: <49C7EBD8.8010708@incunabulum.net>
References:  <20090317103650.GA6156@rebelion.Sisis.de> <49BF7DE4.4010804@incunabulum.net> <20090317104548.GB6182@rebelion.Sisis.de> <alpine.BSF.2.00.0903171148250.51892@thor.farley.org> <49BFF258.4020207@freebsd.org> <20090323182310.GA1825@rebelion.Sisis.de> <49C7D89A.6070502@incunabulum.net> <49C7DF8A.8070408@freebsd.org> <49C7EBD8.8010708@incunabulum.net>

Bruce M Simpson wrote:
> Sam Leffler wrote:
>> Bruce M Simpson wrote:
>>> ...
>>> This may be orthogonal, but:
>>>    A lab colleague and I have been seeing a sporadic problem where 
>>> the ath0 exhibits the symptoms of being disassociated from its AP. 
>>> We are running RELENG_7 on the EeePC 701 since the open source HAL 
>>> merge.
>>>    In the behaviour we're seeing, we don't see any problem with the 
>>> initial dhclient run, the ath0 just seems to get disassociated 
>>> within 5-10 minutes of associating.
>>>
>>> If we leave 'ping <ap-ip-address>' running in the background, we 
>>> don't see this problem.
>>>
>>>
>>> I'll try to get set up with 'tcpdump -y ieee802_11' from initial 
>>> boot (including dhcp and anything we bump into).
>>
>> There are many issues with the wireless code in RELENG_7.  Now that 
>> the hal is merged we can try to address them.  Unfortunately the 7.2 
>> release has just begun so it's unclear what we can get in.  I'm also 
>> limited in what I'm willing to commit given that I do not run RELENG_7.
>
> OK. We've managed to reproduce this set of symptoms now in our work area.
> I've attached some script(1) output of netstat -in being run, and a 
> pcap dump.
>
>    Timebase: beginning of the pcap is in sync with a bringup from 
> single-user mode; the tcpdump runs in the background from init whilst 
> the system is brought up.
>
>    OK, so I timed the apparent loss of connectivity as 6m 30s from 
> the point I hit the stopwatch to when I hit it again, when the AP's 
> Web GUI no longer showed the affected STA as being associated.
>    Obviously such a timing is subject to human/visual jitter, and how 
> often Netgear's firmware pulls the STA association list from the AP 
> into the web GUI.
>
>    What stands out in the pcap is that 302.291s in (almost 5m 
> exactly), the STA (ath0) sends an IEEE 802.11 NULL frame to the AP 
> with the PWR MGT bit set (I'm going to sleep!). This more or less 
> coincides with a normal beacon from the Netgear AP. It does not 
> advertise Auto Power Save Delivery (apsd), that bit is 0.
>    This is puzzling as we don't enable power management by default. As 
> I understand it, this may be an AP feature in some environments... I 
> can try reproducing this with an explicit 'ifconfig ath0 -powersave' 
> and see if it reoccurs.
>
>    You'll see that after this NULL frame is sent, there is another 
> Probe Request, and the Netgear AP does Probe Respond, but this makes 
> no difference (I ended the capture around 150s after the NULL frame 
> was sent).
>
>    At this point we can't send traffic from the ath0, or rather, the 
> AP is acting as though it never even heard the STA. The STA learns the 
> AP's IP address/MAC mapping through passive ARP -- we still see 
> broadcasts on the SSID -- but the AP has started to totally ignore the 
> STA, and seemed to have ignored its ARP requests also.
>    We are using MAC address ACL control with this AP, and the ath0 
> affected is definitely listed in its ACL table, configured up, 
> rebooted etc.
>
>    It is as though the STA is entering power saving mode when not 
> explicitly told to, and the AP is not waking up the STA as it should.
>
> If any more information is needed, or you can suggest where to look, 
> please let me know what's involved (I MFCed the change after all, so 
> I'll help where I can until I'm on holiday this week...)
>
> My lab colleague is just working around this with 'ping <ap-ip>' for 
> now, that keeps things up, as does OpenVPN...

Your STA did a background scan.  There are bugs in this area that are 
fixed in HEAD.  One was that periodic calibration in the driver might 
kick in while off channel and set up state that was wrong for the 
channel the AP was on.
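
For anyone checking their own capture: as I understand it, the NULL 
frame with the PWR MGT bit described above is what a station sends 
before going off channel for a background scan, so it does not by itself 
mean power save was enabled.  A minimal sketch of how to pick such 
frames out of raw 802.11 data follows.  The helper name is hypothetical, 
and it assumes the buffer starts at the 802.11 MAC header (any radiotap 
header already stripped).

```python
import struct

# Frame Control field: 16 bits, little-endian on the wire.
FC_TYPE_MASK = 0x000C
FC_SUBTYPE_MASK = 0x00F0
FC_TYPE_DATA = 0x0008      # type 2 (data)
FC_SUBTYPE_NULL = 0x0040   # subtype 4 (null function, no payload)
FC_PWR_MGT = 0x1000        # power management flag


def is_pwrsave_null(frame: bytes) -> bool:
    """Hypothetical helper: True for a NULL data frame with PWR MGT set.

    Assumes `frame` begins at the 802.11 MAC header.
    """
    if len(frame) < 2:
        return False
    (fc,) = struct.unpack_from("<H", frame)
    return (
        (fc & FC_TYPE_MASK) == FC_TYPE_DATA
        and (fc & FC_SUBTYPE_MASK) == FC_SUBTYPE_NULL
        and bool(fc & FC_PWR_MGT)
    )
```

Running this over each frame in the dump and printing the timestamps of 
the matches should show whether they line up with the scan interval 
rather than with any power save setting.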

As I said, now that the hal code is finally in RELENG_7 I'm willing to 
look at stuff.  You or someone else can do likewise but given things 
have sat basically untouched since 7.0RC1 I suspect that's expecting too 
much.  Of course if people don't test HEAD then once 8.0 goes out we'll 
likely have a similar situation on that branch.  I do feel more 
confident about HEAD as that code has gone through multiple product 
cycles outside the tree.

    Sam