Date:      Fri, 18 Mar 2011 18:43:40 +0100
From:      Mats Lindberg <mats.w.lindberg@gmail.com>
To:        Mark Tinguely <marktinguely@gmail.com>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: FreeBSD 6 vs 8.1
Message-ID:  <AANLkTimje8yrzTYAdVKnkJLM0wo%2Bk66%2BkWv09wSdWknE@mail.gmail.com>
In-Reply-To: <4D837C27.4040802@gmail.com>
References:  <AANLkTi=23g1%2BKv%2B4Pmda3-75-r13GaRFu1_Mtofej3RJ@mail.gmail.com> <4D7DFC6F.80008@gmail.com> <AANLkTi=Gx=YZ%2BZr0q%2BFZ8mcbQyGhjZPSYm6de4ZVwSwx@mail.gmail.com> <4D7E0831.4060804@gmail.com> <AANLkTins89qcvAjd4_x=iZVjR3rMnGaEJUwuMpMAFKny@mail.gmail.com> <4D834F35.5030806@gmail.com> <AANLkTi=QzX9YF=G-5e4c4UWAZMaXF-Gkhq0ZbrA6e3mM@mail.gmail.com> <4D837C27.4040802@gmail.com>

2011/3/18 Mark Tinguely <marktinguely@gmail.com>

>  On 3/18/2011 10:11 AM, Mats Lindberg wrote:
>
>
>
> 2011/3/18 Mark Tinguely <marktinguely@gmail.com>
>
>> On 3/18/2011 3:35 AM, Mats Lindberg wrote:
>>
>>> So - after a while I've made some observations.
>>> My problem is actually connected to arp.
>>>
>>> My config is very static so basically I want to turn off arp requests.
>>> Somewhere in the startup scripts I did
>>> > sysctl -w net.link.ether.inet.max_age=2147483647 (max accepted value)
>>> Which on freebsd-6.x worked fine.
>>> In freebsd-8.1 this makes the kernel arp functionality go berserk -
>>> probably an integer overflow somewhere.
>>> arp requests were sent continuously from my freebsd-8.1 node to others,
>>> flooding the network.
>>> I tried to lower this value and found that 500000000s works fine
>>> 1000000000s does not. 500000000s is OK to me so I won't try to narrow it
>>> down more.
>>>
>>> The reason I was suspecting swapping problems was that after a while with
>>> the flooding going on I got a kernel panic saying 'page fault', which I
>>> would guess is another bug, but with a sensible setting on the arp
>>> timeout the kernel panic no longer shows up.
>>>
>>> I've googled for my arp-setting problem but not found anything on it. So
>>> - maybe I'm the first to see this.
>>> Should I enter a bug report somewhere?
>>> I guess this forum is not the place.
>>>
>>> /Mats
>>>
>>>
>>  Did your HZ (timer interrupts per second) increase from 100 on FreeBSD-6
>> to 1000 on FreeBSD-8.1? This must be a 32 bit computer / OS because that
>> variable is multiplied by hz:
>>
>>            canceled = callout_reset(&la->la_timer,
>>                hz * V_arpt_keep, arptimer, la);
>>
>> and:
>>
>> #define    callout_reset(c, on_tick, fn, arg)                \
>>    callout_reset_on((c), (on_tick), (fn), (arg), (c)->c_cpu)
>>
>> where:
>>
>> int callout_reset_on(struct callout *, int , void (*ftn)(void *), void *,
>> int)
>>
>> I would guess that you are wrapping with 32 bit arithmetic to a small
>> value. Both hz==100 and hz==1000 will wrap to about the same number (a
>> negative number). I did not look at the FreeBSD 6.x callout, but I think in
>> the FreeBSD 8 callout, a negative on_tick means the callout fires on the
>> next tick.
>>
>
> Yes I could imagine this is it.
>
>
>>
>> A page fault panic is a kernel access to a non-mapped VA (a bad pointer).
>> The panic message would have the VA and instruction address information.
>>
>> --Mark
>>
>
> Well,
> Both systems are i386 32bit
>
> On FreeBSD-6 I have: (GENERIC) kernel
> kern.clockrate: { hz = 1000, tick = 1000, profhz = 666, stathz = 133 }
>  On FreeBSD-8 I have:(Excluded some drivers from GENERIC kernel)
> kern.clockrate: { hz = 1000, tick = 1000, profhz = 2000, stathz = 133 }
>  kern.hz: 1000
>
> So same HZ -- seems the callout is implemented differently 6.x->8.1
>
> For the kernel panic I get
> fault virtual address:             0x8
> instruction pointer:               0x20:0xc0679ed7
> current process:                  0, (em0 taskq)
>
> I don't know what these numbers mean, or whether you even wanted to see them.
> I get the feeling that this is connected to my arp problem; it seems to
> be something in the em driver that is not handled at this high load.
>
> I'm quite happy now - my system has been up and running for the whole day -
> so I'll leave it at this - thanks
>
> /Mats
>
> Good news.
>
> After the reply, I did look at the FreeBSD 6.4 ARP code
> (sys/netinet/if_ether.c) and the code changed between FreeBSD 6 and 8.  I
> would suggest that if you set the max arp age, it be less than
> (2^31-1)/hz, since the tick count is passed as a signed int. This value is
> also added to the route timeout value, so be careful with the value.
>
> The fault VA of 0x8 is a classic NULL pointer dereference (an access at a
> small offset from a NULL structure pointer).
>
> --Mark.
>

Good, many thanks...

Just out of interest - is this a bug?
1) The sysctl accepting values it can't handle
2) The kernel/em driver panic?
In my world it would be...

/Mats
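
The wrap-around guessed at in the thread can be checked with a few lines of
userland C. This is only a sketch, not kernel code: it assumes that
hz * V_arpt_keep is evaluated in a 32-bit signed int that wraps modulo 2^32,
as suggested above. The function names wrapped_ticks and max_safe_keep are
hypothetical helpers made up for this illustration. Because on_tick is a
signed int, the sketch uses INT32_MAX / hz as the largest safe value.

```c
#include <stdint.h>

/*
 * Value a 32-bit signed int would hold after hz * keep wraps.
 * The product is computed in 64 bits and reduced modulo 2^32 by the
 * conversion to uint32_t, so the sketch itself never relies on signed
 * overflow (which is undefined behaviour in C).
 */
int32_t wrapped_ticks(int32_t hz, int32_t keep)
{
    uint32_t low = (uint32_t)((int64_t)hz * (int64_t)keep);

    /* Reinterpret the low 32 bits as a two's complement value. */
    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;
    return (int32_t)((int64_t)low - 4294967296LL);
}

/* Largest max_age (in seconds) for which hz * keep still fits in an int. */
int32_t max_safe_keep(int32_t hz)
{
    return INT32_MAX / hz;
}
```

Under these assumptions, wrapped_ticks(1000, 2147483647) is -1000 and
wrapped_ticks(100, 2147483647) is -100, which matches the remark that both
hz values wrap to a small negative number. wrapped_ticks(1000, 1000000000) is
-727379968 (negative, so the timer would fire at once), while
wrapped_ticks(1000, 500000000) is 1783793664, a positive tick count of roughly
20 days rather than the intended 500000000 seconds. That would explain why
500000000 appears to work and 1000000000 does not. max_safe_keep(1000) is
2147483 seconds, about 24.8 days.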


