Date:      Mon, 26 Feb 2018 11:21:45 +0000
From:      Joe Jones <joe@stream-technologies.com>
To:        Kristof Provost <kristof@sigsegv.be>
Cc:        freebsd-pf@freebsd.org
Subject:   Re: Kernel Panic
Message-ID:  <5A93EDC9.7020407@stream-technologies.com>
In-Reply-To: <5289570D-24E1-4292-B4D2-D2F67D7D2D4F@sigsegv.be>
References:  <5A842FC6.7020806@stream-technologies.com> <FCB6BE6F-5346-42EC-ACB2-9CD99A1A16F0@sigsegv.be> <5A8443BF.8040208@stream-technologies.com> <5289570D-24E1-4292-B4D2-D2F67D7D2D4F@sigsegv.be>

Hi Kristof,

We are not updating rules during the test, although in production we will 
reload the rule set from time to time. We are constantly adding and 
removing table addresses, though, using the DIOCRADDADDRS and DIOCRDELADDRS 
ioctls, and DIOCKILLSTATES is also being called a lot. These are all in 
response to RADIUS events. We also tried using the pfctl shell command 
rather than calling ioctl directly, to check that the problem wasn't in 
how we are calling the ioctls.

A little background. Our production system is running on 8.4 and has 
been stable for years. We are in the process of moving to 11.1 and are 
having big problems with stability when we allow customer traffic into 
the machine. At the moment we are using mirror ports on the switch to 
play live traffic into it. We're trying to work out the simplest 
configuration that causes a problem with a view to producing a good bug 
report.

I have noticed that the pfil interface 
https://www.freebsd.org/cgi/man.cgi?query=pfil&sektion=9 has locking in 
it which didn't exist in 8. I think it was introduced in 9; the locking 
functions appear in the man page in 10. I don't know whether that 
interface is used directly by pf, but I'm guessing packet processing 
needs to be thread-safe in a way it didn't in 8.


Regards

Joe Jones

On 25/02/18 10:56, Kristof Provost wrote:
> On 14 Feb 2018, at 19:57, Joe Jones wrote:
>> On 14/02/18 13:09, Kristof Provost wrote:
>>> On 14 Feb 2018, at 23:47, Joe Jones wrote:
>>>> we are running test traffic through our system, after between 1 and 
>>>> 12 hours we get a kernel panic, always in the pfr_pool_get function 
>>>> in /usr/src/sys/netpfil/pf/pf_table.c line 2140. After a bit of 
>>>> investigation I confirmed that ke2 is set to null on line 2122.
>>>>
>>> It’d probably be interesting to know what the contents of uaddr/addr 
>>> is here.
>>> From a very quick look at the code there’s supposed to be a route 
>>> lookup there, and I’d expect there to always be a result. The code 
>>> certainly expects it, because that looks to be what causes the panic.
>>>
>>
>> (kgdb) p *uaddr
>> No symbol "uaddr" in current context.
>>
>> (kgdb) p *addr
>> $1 = {
>>   pfa = {
>>     v4 = {
>>       s_addr = 2016475826
>>     },
>>     v6 = {
>>       __u6_addr = {
>>         __u6_addr8 = 0xfffffe0000310d0c "��0x0\r1",
>>         __u6_addr16 = 0xfffffe0000310d0c,
>>         __u6_addr32 = 0xfffffe0000310d0c
>>       }
>>     },
>>     addr8 = 0xfffffe0000310d0c "��0x0\r1",
>>     addr16 = 0xfffffe0000310d0c,
>>     addr32 = 0xfffffe0000310d0c
>>   }
>> }
>>
> Interesting… That looks okay, so I have no idea why that lookup 
> returned NULL.
> Are you modifying tables/rules at all during this test?
>
>> Am I right in thinking that's in network order?
>>
> I believe so, yes.
>
> Regards,
> Kristof
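
For reference, the s_addr value in the kgdb output above can be decoded 
into a dotted quad. The following is a minimal sketch, assuming the dump 
was taken on a little-endian host (e.g. amd64), so that kgdb printed the 
four network-order bytes as a little-endian integer. The helper name 
s_addr_to_dotted is hypothetical and only for illustration.

```python
import socket


def s_addr_to_dotted(s_addr: int, host_byteorder: str = "little") -> str:
    """Turn the integer kgdb prints for in_addr.s_addr into a dotted quad.

    Assumption: s_addr holds the address in network byte order, and the
    debugger rendered those four bytes as an integer in the host's byte
    order (little-endian by default here).
    """
    # Recover the four bytes exactly as they sit in memory.
    raw = s_addr.to_bytes(4, host_byteorder)
    # inet_ntoa reads the packed bytes in network order.
    return socket.inet_ntoa(raw)


if __name__ == "__main__":
    print(s_addr_to_dotted(2016475826))
```

Under that assumption the in-memory bytes are b2 fa 30 78, which is 
consistent with the "0x" characters visible in the addr8 string of the 
same dump (0x30 is '0' and 0x78 is 'x').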



