Date:      Sat, 27 Aug 2005 18:44:51 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        "M. Warner Losh" <imp@bsdimp.com>
Cc:        bzeeb-lists@lists.zabbadoz.net, freebsd-current@FreeBSD.org, dandee@volny.cz
Subject:   Re: LOR route vr0
Message-ID:  <20050827184153.A24510@fledge.watson.org>
In-Reply-To: <20050827.114013.35047360.imp@bsdimp.com>
References:  <Pine.BSF.4.53.0508270912550.969@e0-0.zab2.int.zabbadoz.net> <20050827.104631.10908351.imp@bsdimp.com> <20050827181827.O24510@fledge.watson.org> <20050827.114013.35047360.imp@bsdimp.com>

On Sat, 27 Aug 2005, M. Warner Losh wrote:

> : Generally speaking, network interface device driver locks follow network
> : stack locks in the lock order.  However, I've not really looked much at
> : the route table locking so can't speak to whether that is the case
> : specifically for routing locks.  If it is, the below traces reflect the
> : correct order, and you might want to add a hard-coded entry to witness in
> : order to catch the reverse order.
>
> Can you post a quickie summary on how to do that? I tried last night and 
> was unsuccessful...

You need to add an entry to subr_witness.c creating a graph edge between 
the softc lock and the routing lock.  An example of an entry in 
subr_witness.c:

         /*
          * TCP/IP
          */
         { "tcp", &lock_class_mtx_sleep },
         { "tcpinp", &lock_class_mtx_sleep },
         { "so_snd", &lock_class_mtx_sleep },
         { NULL, NULL },

Note that each set of ordered entries is terminated with a double-null 
entry.  This declares that locks of type "tcp" precede "tcpinp", which in 
turn precede "so_snd".

> : Lock order reversals between the
> : network stack and device drivers tend to occur as a result of the device
> : driver calling into the network stack while holding the device driver
> : mutex.
>
> I'm as sure as I can be that no locks are held when I call INTO the 
> network layer.  As far as I can tell, I only do that when I call 
> ifp->if_input, and I drop the locks to do that.

If I had to guess, you do a media status update, which can cause routing 
socket events indicating the link went up or down.

> : Someone (tm) should work out if the right order is route locks ->
> : device driver locks, as it's likely a common class of bugs across many
> : drivers.
>
> I just discovered the problem in my code.  I'm not sure where the
> other order happens, but in my code I do the following:
>
> 	ED_LOCK(sc);
> 	ed_setrcr(sc);
> 	    ed_ds_getmcst(sc);
> 		IF_ADDR_LOCK(sc->ifp);
> 		TAILQ_FOREACH(ifma, &sc->ifp->if_multiaddrs, ifma_link) {
> 		...
> 		IF_ADDR_UNLOCK(sc->ifp);
> 	ED_UNLOCK(sc);
>
> since the lock for ED should be a leaf lock, this causes problems. I'm 
> guessing that the network layer calls into the driver with this lock 
> held.  Without hard coding the locking into witness (see above), I'm 
> unsure where this happens.  A quick grep of the code doesn't reveal 
> anything obvious...

I think this case should be OK, and we should document that as being the 
case using a hard-coded witness entry.

> When I comment out the above IF_ADDR locks, I have no more LORs, but I 
> think maybe other problems :-).

Hmmm.  I was thinking that it was a separate issue.  Could you try adding 
a graph edge to witness forcing the ifaddrmtx's to fall before the driver 
mutexes, in order to identify a path by which ifaddrmtx precedes the 
driver mutex?
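
As a rough illustration of why a forced edge helps, the toy below records "before -> after" edges and flags an acquisition that contradicts one.  This is a sketch of the idea only, not how witness is implemented: add_edge() and is_reversal() are hypothetical names, the lock names are assumed, and only direct edges are checked (no transitive paths).

```c
#include <assert.h>
#include <string.h>

#define MAX_EDGES 8

/* One hard-coded ordering edge: `before` must be taken ahead of `after`. */
struct order_edge {
	const char *before;
	const char *after;
};

static struct order_edge edges[MAX_EDGES];
static int nedges;

/* Record a forced ordering edge (hypothetical helper). */
static void
add_edge(const char *before, const char *after)
{
	assert(nedges < MAX_EDGES);
	edges[nedges].before = before;
	edges[nedges].after = after;
	nedges++;
}

/*
 * Return 1 if taking `wanted` while already holding `held` contradicts
 * a recorded edge, i.e. an edge says `wanted` belongs before `held`.
 */
static int
is_reversal(const char *held, const char *wanted)
{
	int i;

	for (i = 0; i < nedges; i++) {
		if (strcmp(edges[i].before, wanted) == 0 &&
		    strcmp(edges[i].after, held) == 0)
			return (1);
	}
	return (0);
}
```

With add_edge("if_addr_mtx", "network driver") in place, is_reversal("network driver", "if_addr_mtx") reports the driver-then-ifaddr sequence as soon as it is attempted, rather than only after both orders have been observed at run time.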

Robert N M Watson


