Date: Wed, 22 Apr 2009 00:08:22 +0300 From: Mikolaj Golub <to.my.trociny@gmail.com> To: freebsd-stable@FreeBSD.org Subject: Re: RELENG_7 crash Message-ID: <86k55demux.fsf@kopusha.onet> In-Reply-To: <alpine.BSF.2.00.0904211851040.67705@fledge.watson.org> (Robert Watson's message of "Tue\, 21 Apr 2009 18\:53\:54 %2B0100 \(BST\)") References: <200904210524.n3L5O9YS086865@lava.sentex.ca> <200904211111.57295.jhb@freebsd.org> <200904211519.n3LFJFsk090691@lava.sentex.ca> <20090421153112.GA47589@edoofus.dev.vega.ru> <200904211610.n3LGAYll090970@lava.sentex.ca> <alpine.BSF.2.00.0904211851040.67705@fledge.watson.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--=-=-= On Tue, 21 Apr 2009 18:53:54 +0100 (BST) Robert Watson wrote: RW> On Tue, 21 Apr 2009, Mike Tancsa wrote: >> At 11:31 AM 4/21/2009, Ruslan Ermilov wrote: >>> : >>> : Note that these changes simply close races around use of ifindex_table, >>> : and make no attempt to solve the probem of disappearing ifnets. Further >>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >>> : refinement of this work, including with respect to ifindex_table >>> : resizing, is still required. >>> : >>> : In a future change, the ifnet lock should be converted from a mutex to an >>> : rwlock in order to reduce contention. >> >> Thanks for the info! In the mean, time, apart from disabling >> snmpwalking, is there anything I can do to mitigate triggering this >> bug ? The box runs ospf/zebra for routing daemons and mpd53 for l2tp >> LNS termination. RW> There are several bugs here, one difficult to fix (lack of RW> refcounting), but also stuff like ifp being derived from an interface RW> number twice, but checked against NULL only the first time (line 85 RW> checked for NULL, re-queried but no check line 88). Fixing the top RW> bit of the function to only query the ifp once and check it for NULL RW> then would be a good idea. More fundamentally, we do need to refcount RW> ifnets when used from the management path, which is not all that hard RW> a change, but preferably to try the easy way first given where we are RW> in the release cycle. I was thinking a bit on this problem (the same issue was reported in kern/132734) and eliminating double call of ifnet_byindex() was the first thing I did. But I decided then that the proper fix would be to wrap all critical code in sysctl_ifdata in IFNET_RLOCK/IFNET_RUNLOCK (the patch is attached). It looks like I am wrong and my idea about how the kernel works is oversimplified? :-) Unfortunately, I didn't manage to reproduce the panic in my environment so I was not able to do some experiments and tests. -- Mikolaj Golub --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=if_mib.c.patch --- sys/net/if_mib.c.orig 2009-04-21 23:19:54.000000000 +0300 +++ sys/net/if_mib.c 2009-04-21 23:20:07.000000000 +0300 @@ -71,7 +71,7 @@ static int sysctl_ifdata(SYSCTL_HANDLER_ARGS) /* XXX bad syntax! */ { int *name = (int *)arg1; - int error; + int error = 0; u_int namelen = arg2; struct ifnet *ifp; struct ifmibdata ifmd; @@ -81,15 +81,21 @@ sysctl_ifdata(SYSCTL_HANDLER_ARGS) /* XX if (namelen != 2) return EINVAL; - if (name[0] <= 0 || name[0] > if_index || - ifnet_byindex(name[0]) == NULL) + if (name[0] <= 0 || name[0] > if_index) return ENOENT; + IFNET_RLOCK(); ifp = ifnet_byindex(name[0]); + if (ifp == NULL) { + IFNET_RUNLOCK(); + return ENOENT; + } + switch(name[1]) { default: - return ENOENT; + error = ENOENT; + break; case IFDATA_GENERAL: bzero(&ifmd, sizeof(ifmd)); @@ -105,12 +111,12 @@ sysctl_ifdata(SYSCTL_HANDLER_ARGS) /* XX ifmd.ifmd_snd_drops = ifp->if_snd.ifq_drops; error = SYSCTL_OUT(req, &ifmd, sizeof ifmd); - if (error || !req->newptr) - return error; + if (error || !req->newptr) + break; error = SYSCTL_IN(req, &ifmd, sizeof ifmd); if (error) - return error; + break; #define DONTCOPY(fld) ifmd.ifmd_data.ifi_##fld = ifp->if_data.ifi_##fld DONTCOPY(type); @@ -130,18 +136,20 @@ sysctl_ifdata(SYSCTL_HANDLER_ARGS) /* XX case IFDATA_LINKSPECIFIC: error = SYSCTL_OUT(req, ifp->if_linkmib, ifp->if_linkmiblen); - if (error || !req->newptr) - return error; + if (error || !req->newptr) + break; error = SYSCTL_IN(req, ifp->if_linkmib, ifp->if_linkmiblen); if (error) - return error; + break; case IFDATA_DRIVERNAME: /* 20 is enough for 64bit ints */ dlen = strlen(ifp->if_dname) + 20 + 1; - if ((dbuf = malloc(dlen, M_TEMP, M_NOWAIT)) == NULL) - return (ENOMEM); + if ((dbuf = malloc(dlen, M_TEMP, M_NOWAIT)) == NULL) { + error = ENOMEM; + break; + } if (ifp->if_dunit == IF_DUNIT_NONE) strcpy(dbuf, ifp->if_dname); else @@ -151,9 +159,10 @@ sysctl_ifdata(SYSCTL_HANDLER_ARGS) /* XX if (error == 0 && req->newptr != NULL) error = EPERM; free(dbuf, M_TEMP); - return (error); + break; } - return 0; + IFNET_RUNLOCK(); + return error; } SYSCTL_NODE(_net_link_generic, IFMIB_IFDATA, ifdata, CTLFLAG_RW, --=-=-=--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?86k55demux.fsf>