Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 21 Feb 2009 23:47:55 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Mike Tancsa <mike@sentex.net>
Cc:        freebsd-stable@freebsd.org, Pete French <petefrench@ticketswitch.com>
Subject:   Re: Big problems with 7.1 locking up :-(
Message-ID:  <alpine.BSF.2.00.0902212345170.98609@fledge.watson.org>
In-Reply-To: <200902180110.n1I1AaPL031693@pyroxene.sentex.ca>
References:  <E1LL6dg-0007CN-DI@dilbert.ticketswitch.com> <alpine.BSF.2.00.0901292234430.30840@fledge.watson.org> <200902180110.n1I1AaPL031693@pyroxene.sentex.ca>

next in thread | previous in thread | raw e-mail | index | archive | help

On Tue, 17 Feb 2009, Mike Tancsa wrote:

>        Do you have any other details about these issues ? Were the fixes 
> ever MFC'd

Earlier today I handed off some patches for Pete to test (attached below), 
which he's running alongside the patches in kern/130652.  When I run with the 
patches, basically an MFC of a subset of Kip's routing improvements in 8.x, I 
can no longer reproduce the lock reversal, which will hopefully mean Pete can 
no longer reproduce the hang.  I plan to merge these in a couple of days once 
(with any luck) he's confirmed that is the case.  We may want to get a subset 
of this patch on the errata note path, if we can get the ICMP redirect fix 
down to a very short patch.

Robert N M Watson
Computer Laboratory
University of Cambridge


Merge r185747, r185774, r185807, r185849, r185964, r185965, r186051,
r186052 from head to stable/7; note that only the locking fixes and
invariants checking are added from r185747, but not the move to an
rwlock which would modify the kernel binary interface, nor the move
to a non-recursible lock, which is still seeing problem reports in
head.  This corrects, among other things, a deadlock that may occur
when processing incoming ICMP redirects.

r185747:

   - convert radix node head lock from mutex to rwlock
   - make radix node head lock not recursive
   - fix LOR in rtexpunge
   - fix LOR in rtredirect

   Reviewed by:	sam

r185774:

   - avoid recursively locking the radix node head lock
   - assert that it is held if RTF_RNH_LOCKED is not passed

r185807:

   Fix a bug introduced in r185747: rather than dereferencing an
   uninitialized *rt to something undefined, use the fibnum that came in as
   function argument.

   Found with:	Coverity Prevent(tm)
   CID:		4168

r185849:

   fix a reported panic when adding a route and one hit here when deleting a
   route

   - pass RTF_RNH_LOCKED to rtalloc1_fib in 2 cases where the lock is held
   - make sure the rnh lock is held across rt_setgate and rt_getifa_fib

r185964:

   Pass RTF_RNH_LOCKED to rtalloc1 sunce the node head is locked, this avoids
   a recursive lock panic on inet6 detach.

   Reviewed by:	kmacy

r185965:

   RTF_RNH_LOCKED needs to be passed in the flags arg not report,
   apologies to thompsa

r186051:

   in6_addroute is called through rnh_addadr which is always called with the
   radix node head lock held exclusively. Pass RTF_RNH_LOCKED to rtalloc so
   that rtalloc1_fib will not try to re-acquire the lock.

r186052:

   don't acquire lock recursively

All original commits to head were by Kip Macy <kmacy>, except r185964 by
<thompsa>.

Reviewed by:	bz
Tested by:	Pete French <petefrench at ticketswitch com>



Property changes on: sys
___________________________________________________________________
Modified: svn:mergeinfo
    Merged /head/sys:r185747,185774,185807,185849,185964-185965,186051-186052

Index: sys/netinet/in_rmx.c
===================================================================
--- sys/netinet/in_rmx.c	(revision 188767)
+++ sys/netinet/in_rmx.c	(working copy)
@@ -111,7 +111,7 @@
  		 * ARP entry and delete it if so.
  		 */
  		rt2 = in_rtalloc1((struct sockaddr *)sin, 0,
-		    RTF_CLONING, rt->rt_fibnum);
+		    RTF_CLONING|RTF_RNH_LOCKED, rt->rt_fibnum);
  		if (rt2) {
  			if (rt2->rt_flags & RTF_LLINFO &&
  			    rt2->rt_flags & RTF_HOST &&

Property changes on: sys/dev/cxgb
___________________________________________________________________
Modified: svn:mergeinfo
    Merged /head/sys/dev/cxgb:r185747,185774,185807,185849,185964-185965,186051-186052


Property changes on: sys/dev/ath/ath_hal
___________________________________________________________________
Modified: svn:mergeinfo
    Merged /head/sys/dev/ath/ath_hal:r185747,185774,185807,185849,185964-185965,186051-186052

Index: sys/net/route.c
===================================================================
--- sys/net/route.c	(revision 188767)
+++ sys/net/route.c	(working copy)
@@ -277,6 +277,7 @@
  	struct rt_addrinfo info;
  	u_long nflags;
  	int err = 0, msgtype = RTM_MISS;
+	int needlock;

  	KASSERT((fibnum < rt_numfibs), ("rtalloc1_fib: bad fibnum"));
  	if (dst->sa_family != AF_INET)	/* Only INET supports > 1 fib now */
@@ -290,7 +291,13 @@
  		rtstat.rts_unreach++;
  		goto miss2;
  	}
-	RADIX_NODE_HEAD_LOCK(rnh);
+	needlock = !(ignflags & RTF_RNH_LOCKED);
+	if (needlock)
+		RADIX_NODE_HEAD_LOCK(rnh);
+#ifdef INVARIANTS
+	else
+		RADIX_NODE_HEAD_LOCK_ASSERT(rnh);
+#endif
  	if ((rn = rnh->rnh_matchaddr(dst, rnh)) &&
  	    (rn->rn_flags & RNF_ROOT) == 0) {
  		/*
@@ -343,7 +350,8 @@
  			RT_LOCK(newrt);
  			RT_ADDREF(newrt);
  		}
-		RADIX_NODE_HEAD_UNLOCK(rnh);
+		if (needlock)
+			RADIX_NODE_HEAD_UNLOCK(rnh);
  	} else {
  		/*
  		 * Either we hit the root or couldn't find any match,
@@ -352,7 +360,8 @@
  		 */
  		rtstat.rts_unreach++;
  	miss:
-		RADIX_NODE_HEAD_UNLOCK(rnh);
+		if (needlock)
+			RADIX_NODE_HEAD_UNLOCK(rnh);
  	miss2:	if (report) {
  			/*
  			 * If required, report the failure to the supervising
@@ -482,6 +491,8 @@
  	short *stat = NULL;
  	struct rt_addrinfo info;
  	struct ifaddr *ifa;
+	struct radix_node_head *rnh =
+	    rt_tables[fibnum][dst->sa_family];

  	/* verify the gateway is directly reachable */
  	if ((ifa = ifa_ifwithnet(gateway)) == NULL) {
@@ -531,6 +542,8 @@
  			info.rti_info[RTAX_NETMASK] = netmask;
  			info.rti_ifa = ifa;
  			info.rti_flags = flags;
+			if (rt0 != NULL)
+				RT_UNLOCK(rt0); /* drop lock to avoid LOR with RNH */
  			error = rtrequest1_fib(RTM_ADD, &info, &rt, fibnum);
  			if (rt != NULL) {
  				RT_LOCK(rt);
@@ -538,7 +551,7 @@
  				flags = rt->rt_flags;
  			}
  			if (rt0)
-				RTFREE_LOCKED(rt0);
+				RTFREE(rt0);

  			stat = &rtstat.rts_dynamic;
  		} else {
@@ -554,8 +567,12 @@
  			/*
  			 * add the key and gateway (in one malloc'd chunk).
  			 */
+			RT_UNLOCK(rt);
+			RADIX_NODE_HEAD_LOCK(rnh);
+			RT_LOCK(rt);
  			rt_setgate(rt, rt_key(rt), gateway);
-			gwrt = rtalloc1(gateway, 1, 0);
+			gwrt = rtalloc1(gateway, 1, RTF_RNH_LOCKED);
+			RADIX_NODE_HEAD_UNLOCK(rnh);
  			EVENTHANDLER_INVOKE(route_redirect_event, rt, gwrt, dst);
  			RTFREE_LOCKED(gwrt);
  		}
@@ -641,7 +658,7 @@
  	if (ifa == NULL)
  		ifa = ifa_ifwithnet(gateway);
  	if (ifa == NULL) {
-		struct rtentry *rt = rtalloc1_fib(gateway, 0, 0UL, fibnum);
+		struct rtentry *rt = rtalloc1_fib(gateway, 0, RTF_RNH_LOCKED, fibnum);
  		if (rt == NULL)
  			return (NULL);
  		/*
@@ -788,7 +805,9 @@
  	struct ifaddr *ifa;
  	int error = 0;

+	rnh = rt_tables[rt->rt_fibnum][rt_key(rt)->sa_family];
  	RT_LOCK_ASSERT(rt);
+	RADIX_NODE_HEAD_LOCK_ASSERT(rnh);
  #if 0
  	/*
  	 * We cannot assume anything about the reference count
@@ -804,8 +823,6 @@
  	if (rnh == NULL)
  		return (EAFNOSUPPORT);

-	RADIX_NODE_HEAD_LOCK(rnh);
-
  	/*
  	 * Remove the item from the tree; it should be there,
  	 * but when callers invoke us blindly it may not (sigh).
@@ -826,7 +843,7 @@
  	 * Now search what's left of the subtree for any cloned
  	 * routes which might have been formed from this node.
  	 */
-	if ((rt->rt_flags & RTF_CLONING) && rt_mask(rt))
+	if ((rt->rt_flags & RTF_CLONING) && rt_mask(rt))
  		rnh->rnh_walktree_from(rnh, rt_key(rt), rt_mask(rt),
  				       rt_fixdelete, rt);

@@ -860,7 +877,6 @@
  	 */
  	rttrash++;
  bad:
-	RADIX_NODE_HEAD_UNLOCK(rnh);
  	return (error);
  }

@@ -874,7 +890,7 @@
  rtrequest1_fib(int req, struct rt_addrinfo *info, struct rtentry **ret_nrt,
  				u_int fibnum)
  {
-	int error = 0;
+	int error = 0, needlock = 0;
  	register struct rtentry *rt;
  	register struct radix_node *rn;
  	register struct radix_node_head *rnh;
@@ -891,7 +907,12 @@
  	rnh = rt_tables[fibnum][dst->sa_family];
  	if (rnh == NULL)
  		return (EAFNOSUPPORT);
-	RADIX_NODE_HEAD_LOCK(rnh);
+	needlock = ((flags & RTF_RNH_LOCKED) == 0);
+	flags &= ~RTF_RNH_LOCKED;
+	if (needlock)
+		RADIX_NODE_HEAD_LOCK(rnh);
+	else
+		RADIX_NODE_HEAD_LOCK_ASSERT(rnh);
  	/*
  	 * If we are adding a host route then we don't want to put
  	 * a netmask in the tree, nor do we want to clone it.
@@ -1036,7 +1057,7 @@
  			 * then we just blow it away and retry the insertion
  			 * of the new one.
  			 */
-			rt2 = rtalloc1_fib(dst, 0, 0, fibnum);
+			rt2 = rtalloc1_fib(dst, 0, RTF_RNH_LOCKED, fibnum);
  			if (rt2 && rt2->rt_parent) {
  				rtexpunge(rt2);
  				RT_UNLOCK(rt2);
@@ -1125,7 +1146,8 @@
  		error = EOPNOTSUPP;
  	}
  bad:
-	RADIX_NODE_HEAD_UNLOCK(rnh);
+	if (needlock)
+		RADIX_NODE_HEAD_UNLOCK(rnh);
  	return (error);
  #undef senderr
  }
@@ -1153,7 +1175,7 @@
  	if (rt->rt_parent == rt0 &&
  	    !(rt->rt_flags & (RTF_PINNED | RTF_CLONING))) {
  		return rtrequest_fib(RTM_DELETE, rt_key(rt), NULL, rt_mask(rt),
-				 rt->rt_flags, NULL, rt->rt_fibnum);
+				 rt->rt_flags|RTF_RNH_LOCKED, NULL, rt->rt_fibnum);
  	}
  	return 0;
  }
@@ -1230,6 +1252,7 @@

  again:
  	RT_LOCK_ASSERT(rt);
+	RADIX_NODE_HEAD_LOCK_ASSERT(rnh);

  	/*
  	 * A host route with the destination equal to the gateway
@@ -1256,7 +1279,7 @@
  		struct rtentry *gwrt;

  		RT_UNLOCK(rt);		/* XXX workaround LOR */
-		gwrt = rtalloc1_fib(gate, 1, 0, rt->rt_fibnum);
+		gwrt = rtalloc1_fib(gate, 1, RTF_RNH_LOCKED, rt->rt_fibnum);
  		if (gwrt == rt) {
  			RT_REMREF(rt);
  			return (EADDRINUSE); /* failure */
@@ -1327,12 +1350,8 @@

  		arg.rnh = rnh;
  		arg.rt0 = rt;
-		RT_UNLOCK(rt);		/* XXX workaround LOR */
-		RADIX_NODE_HEAD_LOCK(rnh);
-		RT_LOCK(rt);
  		rnh->rnh_walktree_from(rnh, rt_key(rt), rt_mask(rt),
  				       rt_fixchange, &arg);
-		RADIX_NODE_HEAD_UNLOCK(rnh);
  	}

  	return 0;
Index: sys/net/route.h
===================================================================
--- sys/net/route.h	(revision 188767)
+++ sys/net/route.h	(working copy)
@@ -172,6 +172,7 @@
  #define	RTF_BROADCAST	0x400000	/* route represents a bcast address */
  #define	RTF_MULTICAST	0x800000	/* route represents a mcast address */
  					/* 0x1000000 and up unassigned */
+#define	RTF_RNH_LOCKED	0x40000000	/* radix node head locked by caller */

  /* Mask of RTF flags that are allowed to be modified by RTM_CHANGE. */
  #define RTF_FMASK	\
Index: sys/net/rtsock.c
===================================================================
--- sys/net/rtsock.c	(revision 188767)
+++ sys/net/rtsock.c	(working copy)
@@ -628,9 +628,11 @@
  			     !sa_equal(info.rti_info[RTAX_IFA],
  				       rt->rt_ifa->ifa_addr))) {
  				RT_UNLOCK(rt);
+				RADIX_NODE_HEAD_LOCK(rnh);
  				if ((error = rt_getifa_fib(&info,
  				    rt->rt_fibnum)) != 0)
  					senderr(error);
+				RADIX_NODE_HEAD_UNLOCK(rnh);
  				RT_LOCK(rt);
  			}
  			if (info.rti_ifa != NULL &&
@@ -642,8 +644,14 @@
  				IFAFREE(rt->rt_ifa);
  			}
  			if (info.rti_info[RTAX_GATEWAY] != NULL) {
-				if ((error = rt_setgate(rt, rt_key(rt),
-					info.rti_info[RTAX_GATEWAY])) != 0) {
+				RT_UNLOCK(rt);
+				RADIX_NODE_HEAD_LOCK(rnh);
+				RT_LOCK(rt);
+ 
+				error = rt_setgate(rt, rt_key(rt),
+				    info.rti_info[RTAX_GATEWAY]);
+				RADIX_NODE_HEAD_UNLOCK(rnh);
+				if (error != 0) {
  					RT_UNLOCK(rt);
  					senderr(error);
  				}
Index: sys/netinet6/in6_ifattach.c
===================================================================
--- sys/netinet6/in6_ifattach.c	(revision 188767)
+++ sys/netinet6/in6_ifattach.c	(working copy)
@@ -823,7 +823,7 @@
  	/* XXX grab lock first to avoid LOR */
  	if (rt_tables[0][AF_INET6] != NULL) {
  		RADIX_NODE_HEAD_LOCK(rt_tables[0][AF_INET6]);
-		rt = rtalloc1((struct sockaddr *)&sin6, 0, 0UL);
+		rt = rtalloc1((struct sockaddr *)&sin6, 0, RTF_RNH_LOCKED);
  		if (rt) {
  			if (rt->rt_ifp == ifp)
  				rtexpunge(rt);
Index: sys/netinet6/in6_rmx.c
===================================================================
--- sys/netinet6/in6_rmx.c	(revision 188767)
+++ sys/netinet6/in6_rmx.c	(working copy)
@@ -154,7 +154,7 @@
  		 * Find out if it is because of an
  		 * ARP entry and delete it if so.
  		 */
-		rt2 = rtalloc1((struct sockaddr *)sin6, 0, RTF_CLONING);
+		rt2 = rtalloc1((struct sockaddr *)sin6, 0, RTF_RNH_LOCKED|RTF_CLONING);
  		if (rt2) {
  			if (rt2->rt_flags & RTF_LLINFO &&
  				rt2->rt_flags & RTF_HOST &&
@@ -181,7 +181,7 @@
  		 *	net route entry, 3ffe:0501:: -> if0.
  		 *	This case should not raise an error.
  		 */
-		rt2 = rtalloc1((struct sockaddr *)sin6, 0, RTF_CLONING);
+		rt2 = rtalloc1((struct sockaddr *)sin6, 0, RTF_RNH_LOCKED|RTF_CLONING);
  		if (rt2) {
  			if ((rt2->rt_flags & (RTF_CLONING|RTF_HOST|RTF_GATEWAY))
  					== RTF_CLONING

Property changes on: sys/contrib/pf
___________________________________________________________________
Modified: svn:mergeinfo
    Merged /head/sys/contrib/pf:r185747,185774,185807,185849,185964-185965,186051-186052




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?alpine.BSF.2.00.0902212345170.98609>