Date:      Sat, 15 Mar 2008 20:36:41 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Alex Popa <razor@dataxnet.ro>
Cc:        mlaier@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject:   Re: Lock Order Reversal on 7.0-STABLE with pf and ipfw / dummynet
Message-ID:  <20080315203121.I42065@fledge.watson.org>
In-Reply-To: <20080314192359.GA4677@dataxnet.ro>
References:  <20080314192359.GA4677@dataxnet.ro>

On Fri, 14 Mar 2008, Alex Popa wrote:

> World was cvsupped on March 6th, around 18:00 GMT.
>
> Built and installed kernel + world, with options WITNESS and 
> WITNESS_SKIPSPIN.
>
> Short background:  7.0-RELEASE had excellent performance on the machine, but 
> it would randomly lock up after some hours (usually over 10 hours). The 
> lockups were hard, meaning nothing seemed to work (NumLock didn't toggle the 
> keyboard LED, no replies to ping, no disk activity).  We changed the 
> motherboard and RAM and had the same behaviour.  6.2-REL is rock solid on 
> this machine (had over 50 days uptime), but upgrading to 6.3-REL made it 
> lock up just like 7.0 (so we put 6.2 back and accepted the lower performance 
> for the time being).
>
> The LOR messages from dmesg of 7.0-STABLE are as follows:
>
> lock order reversal:
> 1st 0xffffffffb19e0680 pf task mtx (pf task mtx) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6729
> 2nd 0xffffff00042ea0f0 radix node head (radix node head) @ /usr/src/sys/net/route.c:147
> lock order reversal:
> 1st 0xffffffff80938508 PFil hook read/write mutex (PFil hook read/write mutex) @ /usr/src/sys/net/pfil.c:73
> 2nd 0xffffffff80938c48 tcp (tcp) @ /usr/src/sys/netinet/tcp_input.c:400

Dear Alex,

Thanks for this report, and sorry about the problem.  It could well be that 
the lock order warning from WITNESS is related to the hang, and might reflect 
a recursion-related bug in the pf policy routing code.  I'm not sure to what 
extent you can tolerate further downtime, but it would be useful to gather 
some more information about the hang itself to try and confirm the involvement 
of lock order.  In particular, if it's feasible, it would be very helpful if 
you could boot back to the 7-STABLE kernel (keeping the 6.2-STABLE userspace 
should be fine, I think), and when the hang occurs, use the console debugger 
(ideally hooked up to serial or firewire) to run the following debugging 
commands:

   show pcpu
   show allpcpu
   trace
   alltrace
   show alllocks
   show witness
   show lockedvnods
   show uma
   show malloc

A shot-in-the-dark guess is that something about pf's interactions with the 
protocol stack is involved here, but unfortunately I suspect we'll need some 
more information to track it down.

Also, could you confirm if you're using any credential-related firewall rules 
with either ipfw or pf?  These would be uid/gid/jail matching rules.
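
For anyone following the thread who hasn't met WITNESS before, the kind of 
check behind a "lock order reversal" message can be sketched roughly as 
below.  This is a toy model in Python, not the kernel's implementation: the 
OrderChecker class and its methods are hypothetical, and the names "pfil" 
and "tcp" are borrowed from the report purely as labels.  The sketch assumes 
a reversal means "acquiring B while holding A, after B has already been seen 
held before A".

```python
from collections import defaultdict


class OrderChecker:
    """Toy lock-order checker (hypothetical; not how WITNESS is coded)."""

    def __init__(self):
        # after[a] = locks that have been acquired while a was held
        self.after = defaultdict(set)
        self.held = []        # locks currently held, in acquisition order
        self.reversals = []   # (1st, 2nd) pairs that contradict earlier order

    def _reachable(self, src, dst):
        # Depth-first search: was dst ever (transitively) acquired after src?
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.after[node])
        return False

    def acquire(self, name):
        for h in self.held:
            if h == name:
                continue  # recursion on the same lock is not modelled here
            if self._reachable(name, h):
                # An earlier path took `name` before `h`; now it is reversed.
                self.reversals.append((h, name))
            else:
                self.after[h].add(name)
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


chk = OrderChecker()

# First code path: "pfil" is held while "tcp" is acquired.
chk.acquire("pfil")
chk.acquire("tcp")
chk.release("tcp")
chk.release("pfil")

# Second code path takes the same two locks in the opposite order.
chk.acquire("tcp")
chk.acquire("pfil")

print(chk.reversals)  # [('tcp', 'pfil')]
```

Neither path deadlocks on its own; the warning only says that two threads 
running these paths concurrently could each end up waiting on the lock the 
other holds, which is why the debugger output is needed to confirm whether 
that is what actually happened during the hang.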

Robert N M Watson
Computer Laboratory
University of Cambridge


>
> More details about the machine in the attached dmesg.  It's an SMP system
> with 4GB of RAM, 3 gigabit cards (em0, em1 and, depending on the motherboard
> we used, either bge0 or msk0).  Only em0 is linked to a gigabit port;
> the others are 100Mbits/s.
>
> My setup has in-kernel IPFIREWALL, IPFIREWALL_VERBOSE,
> IPFIREWALL_DEFAULT_TO_ACCEPT, DUMMYNET.  I have commented out INET6,
> SCTP and the wireless interfaces.  WITNESS and WITNESS_SKIPSPIN were
> only added in the hope of figuring out what locks it up, and they did
> signal these 2 LORs.
>
> pf and pflog are loaded as modules (pf_enable and pflog_enable set to
> yes in rc.conf).
>
> - The ipfw/dummynet side:
>
> I use net.link.ether.ipfw = 1 for MAC address checking, ipfw + dummynet
> for traffic shaping (4 queues at 95Mbits/s for the 2 external interfaces
> in/out, and 4 more queues for traffic that goes outside the AS group for
> which we have fast access).  Deciding which queue traffic goes in
> depends on its source address and whether its destination is in ipfw
> tables 1, 2 or none.  These tables are synchronized from pf tables via a
> custom script in crontab, which runs every 3 minutes.  The pf tables
> used as source for these are controlled by OpenBGPD.
>
> - The pf side:
>
> Filtering is done here, as is policy routing.  Filtering also contains
> redirecting to a transparent squid proxy of traffic destined to port 80
> but not bound for networks received via BGP and saved to tables <metro>
> and <special>.  Metro and special port 80 traffic goes directly to
> the destination server.
>
> Traffic from net1 and net2 is routed via the "other" external interface,
> which doesn't contain the default route... with the exception of traffic
> to pf table <special> (from BGP, same as table 2 in ipfw).  Traffic to
> <special> is routed via fastroute in pf (meaning using the default
> route).
>
> Attached are full dmesg and the kernel config.
>
> I still have access to the hard drive with 7.0-STABLE on it, but not the
> motherboard/CPU and the network cards... they are running off the hard
> drive with 6.2 on it.
>
> -- 
> "Computer science is no more about computers
>     than astronomy is about telescopes" -- E. W. Dijkstra
>


