Date:      Fri, 10 Jul 2009 13:13:35 -0700
From:      Sam Leffler <sam@freebsd.org>
To:        Marten Vijn <info@martenvijn.nl>
Cc:        mobile@freebsd.org
Subject:   Re: ath0 / wlan0 on 8.0
Message-ID:  <4A57A0EF.5060409@freebsd.org>
In-Reply-To: <1247251131.5235.147.camel@mvn-desktop>
References:  <1247090347.10461.7.camel@mvn-desktop>	<4A5601CB.3060809@freebsd.org> <1247251131.5235.147.camel@mvn-desktop>

Marten Vijn wrote:
> On Thu, 2009-07-09 at 07:42 -0700, Sam Leffler wrote: 
>   
>> Marten Vijn wrote:
>>     
>>> hi all,
>>>
>>> When benchmarking an ath0 card I get these errors.
>>> After a couple (~10) no traffic is possible anymore.
>>>   
>>>       
>> "no traffic is possible" doesn't say enough.  
>>     
>
> Retested for beta-1 
> - ping does not respond any more.
> - the client loses associations.
>
> see: http://www.youtube.com/watch?v=sfE-YskT7XM
> for a screen cast
>
>
>   
>> Do you see beacons from the ap? 
>>     
>
> - I still see beacon frames
>
>
>   
>> Are interrupts being received on the ap? Are you out of 
>> resources like mbufs? 
>>     
>
> What are the commands to produce this information?
>   

athdebug +intr sends a console msg for every interrupt but you've 
already said you are sending beacons so the stuck beacon complaints are 
irrelevant.  BTW when you enable something like +intr be sure to turn it 
off in the same cmd line as otherwise you'll likely never get control 
again on a box like this; e.g.

athdebug +intr; sleep 1; athdebug -intr

or

athdebug +intr; read x; athdebug -intr

is what I use.
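
The same pattern can be wrapped in a small sh function so the flag is also cleared if you interrupt it.  This is only a sketch: with_athdebug and the ATHDEBUG override variable are hypothetical names made up for illustration and are not part of athdebug itself; only the +intr/-intr usage comes from the commands above.

```shell
#!/bin/sh
# Sketch: enable one athdebug flag for a bounded time, then clear it.
# ATHDEBUG is a hypothetical override so the function can be dry-run
# with a stub instead of the real athdebug binary.
with_athdebug() {
    flag=$1                      # e.g. "intr"
    secs=${2:-1}                 # how long to leave the flag on
    cmd=${ATHDEBUG:-athdebug}

    # If we are interrupted while the flag is on, still turn it off.
    trap '$cmd "-$flag"; exit 1' INT TERM

    $cmd "+$flag"
    sleep "$secs"
    $cmd "-$flag"

    trap - INT TERM
}
```

Usage would be something like "with_athdebug intr 1", which is equivalent to the first one-liner.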

netstat -m shows mbufs.
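
A rough way to spot mbuf exhaustion is to look for non-zero "denied" counters in that output.  The sketch below assumes the output has lines whose first field is one or more slash-separated counts followed by text containing "denied" (e.g. "0/0/0 requests for mbufs denied ..."); mbuf_denied is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch: read `netstat -m` output on stdin and print only the
# "denied" lines where at least one leading counter is non-zero.
mbuf_denied() {
    awk '/denied/ {
        # First field looks like "a/b/c" or a single number.
        split($1, n, "/")
        for (i in n) {
            if (n[i] + 0 > 0) { print; break }
        }
    }'
}
```

Run it as "netstat -m | mbuf_denied"; no output means no allocation requests were denied.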

>   
>> I have seen, for example, things like nightly 
>> cron scripts accidentally left to run and kill operation.
>>     
>
> Not very likely. It is a NET4826, just rebooted, 
> and this is very reproducible in 1 or 2 test runs. 
>   

You said you were using an Alix board for the ap.  Now it's a Soekris 
4826?  I am not interested in anything but systems running 8.0.

>   
>>> What does this message mean?
>>>
>>> ath0: stuck beacon; resetting (bmiss count 4)
>>> ath0: stuck beacon; resetting (bmiss count 4)
>>> ath0: stuck beacon; resetting (bmiss count 4)
>>>   
>>>       
>> It means 4 consecutive beacon intervals went by w/o the ap being able to 
>> xmit a beacon frame.  When this happens the driver does a h/w reset of 
>> the chip and continues.  You can raise this threshold but if it's 
>> happening a lot you should understand why.
>>     
>
> Too much traffic, which leads to a DoS. I am aware that I create a lot
> of traffic. That's why it is a benchmark. 
>
>   
>>> # uname -a 
>>> FreeBSD  8.0-CURRENT FreeBSD 8.0-CURRENT #0: Mon Jun 29 21:44:19 CEST
>>> 2009     root@master:/usr/obj/nanobsd.node_ap_64M/usr/src/sys/KERNEL
>>> i386
>>> #
>>>
>>>   
>>>       
>> The only 8.0 log under "crashes" does not point to a system crash.
>>
>> I didn't see information on the wireless setup (e.g. ifconfig commands 
>> to setup and/or status to show final operating state).  Does this happen 
>> on all channels?  What else is running on the machine with the ap?  What 
>> is the network traffic mix (e.g. tx vs rx)?
>>     
>
> I created a new report for 8.0.beta-1 on the site with more info:
>
> http://bsd.wifisoft.org/trac/wiki/crash4  
>
> all (UDP) traffic created on the clients and flows to the server (one
> jail per client)  
>   

Please file a PR and track your information through that.

>   
>> I know several groups using the Alix board in similar configs to run 
>> production ap's with >10 users but you will need to tune the system for 
>> best operation. 
>>     
>
> I had 50~60 concurrent users on ap's at ApacheCon Amsterdam in spring.
> Ap's did fail every couple of hours/days. That is workable for 10 ap's, but
> not for > 100.
>
>   
>> Under extreme wireless network load the PCI bus becomes 
>> a bottleneck and causes the host to be unable to setup each beacon frame 
>> in real-time to satisfy NextTBTT requirements.  Look at how the SWBA 
>> mechanism works in the driver and the hw.ath.hal.sw_brt and 
>> hw.ath.hal.dma_brt tunables.
>>     
>
> Could you give me pointers to documentation/stuff to read? (It will
> take me some time to read and maybe understand it; my background is not
> technical.)
>   

sysctl hw.ath.hal,  the source code, and many many previous postings 
(findable through google) should explain things.
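
As a starting point, the two tunables mentioned above can be read back with sysctl.  This is a sketch only: show_beacon_tunables and the SYSCTL override are hypothetical names used for illustration, and it only reads the values; it does not suggest what to set them to.

```shell
#!/bin/sh
# Sketch: print the current values of the beacon-related hal tunables.
# SYSCTL is a hypothetical override so this can be dry-run with a stub.
show_beacon_tunables() {
    cmd=${SYSCTL:-sysctl}
    for name in hw.ath.hal.sw_brt hw.ath.hal.dma_brt; do
        # Report, rather than abort, if a name is missing on this kernel.
        $cmd "$name" || echo "$name: not available" >&2
    done
}
```

Plain "sysctl hw.ath.hal" lists everything under that node if you want the full set.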

>   
>> Otherwise stuck beacon conditions can be caused by the ap not getting 
>> access to the wireless medium due to it being busy.  You should sniff 
>> traffic around the time of a problem for clues.  There are also h/w 
>> registers you can observe (e.g. with athregs) to see how busy the medium 
>> is from the POV of the ap.  There have been chip bugs related to this 
>> condition but doing a reset should always restore operation. 
>>     
>
>   
>> If this 
>> isn't happening you should be able to diagnose what's going on with the 
>> existing facilities (e.g. athdebug msgs). 
>>     
>
> nice! I will rebuild the nanobsd img's to use
> /usr/src/tools/tools/ath/*
>
>   
>> Understand however that building a product ap is nontrivial and the 
>> FreeBSD ath driver can easily be optimized better for this purpose.
>>     
>
> I am not building a product; I am using the FreeBSD wifi / networking env
> as a showcase of working open source software. IMHO the benchmarking
> could help harden the wlan drivers and FreeBSD-based access points. 
>   

Possibly but more likely you will need to tune your setup to your 
hardware and that does not apply in general.

> I expect to double the clients in the benchmark in the coming weeks.
> Then I will do more testing. Please let me know how I can help create
> usable (debug) output.
>   
Learn the tools.  Isolate the problem to something specific.  Then 
provide a recipe for reproducing it or sufficient information to 
diagnose what's going on.  99% of the time I'll not be able to reproduce 
it because it depends on local conditions.

Understand however that months ago was the right time to be doing this 
kind of testing w/ HEAD; now we are in a code freeze and anything that 
comes of this will likely not make the release.

    Sam
