Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 1 Nov 2004 22:51:46 +0100
From:      Stefan =?iso-8859-1?Q?E=DFer?= <se@FreeBSD.org>
To:        audit@freebsd.org
Subject:   [Patch] ISDN+NETGRAPH+INET6 => Panic
Message-ID:  <20041101215146.GA42768@StefanEsser.FreeBSD.org>

next in thread | raw e-mail | index | archive | help
There is a problem with the initialization order of network devices
and protocols/address families, leading to a later panic because of
an incomplete initialization of a data structure, if INET6 configured.

To reproduce, build a kernel with ISDN, Netgraph and INET6. The IPv6
neighbour discovery function will panic after exactly one hour, because
of a NULL pointer dereference. The cause is the initialization of both
the ISDN (pseudo) devices and Netgraph in unspecified order due to:

	SYSINIT(... ,SI_SUB_PSEUDO, SI_ORDER_ANY);

A side effect of the initialization of Netgraph is that the "netgraph"
protocol domain is defined early. It should not happen before phase
SI_SUB_PROTO_DOMAIN:

  SI_SUB_INIT_IF		= 0x3000000,	/* prep for net interfaces */
[ ... ]
  SI_SUB_PSEUDO			= 0x7000000,	/* pseudo devices*/
  SI_SUB_EXEC			= 0x7400000,	/* execve() handlers */
  SI_SUB_PROTO_BEGIN		= 0x8000000,	/* XXX: set splimp (kludge)*/
  SI_SUB_PROTO_IF		= 0x8400000,	/* interfaces*/
  SI_SUB_PROTO_DOMAIN		= 0x8800000,	/* domains (address families?)*/
  SI_SUB_PROTO_IFATTACHDOMAIN	= 0x8800001,	/* domain dependent data init*/
  SI_SUB_PROTO_END		= 0x8ffffff,	/* XXX: set splx (kludge)*/

As soon as the netgraph domain is defined, if_attachdomain1() will
be called during if_attach(), resulting in if_afdata_initialized=1,
which will prevent the correct and required initialisation during
sysinit phase SI_SUB_PROTO_IFATTACHDOMAIN.

The least intrusive patch is attached below. It forces an order into the
initialisation of i4b and netgraph by use of SI_ORDER_MIDDLE for the i4b
init. This is not a complete fix, since it is netgraph that is violating
the rules. It should not call net_add_domain() before phase SI_PROTO_DOMAIN.
Currently only SI_ORDER_ANY is used with SI_SUB_PSEUDO, which means that
no order is enforced, the order is effectively unspecified. Changing the
order parameter to SI_ORDER_MIDDLE leads to an order that might have
resulted from the use of SI_ORDER_ANY by accident.

I'd rather leave the moving of the net_add_domain(netgraph) into phase
SI_SUB_PROTO_DOMAIN (assuming that it really belongs there) to the
Netgraph maintainers.

Any objections to adding the following work-around to RELENG_5 via
-CURRENT ? I'd really want to see it in 5.3, since it does fix a real
problem well enough to make ISDN work in kernels with NETGRAPH and INET6.
That configuration will carsh after one hour of uptime, else:



Index: i4b/include/i4b_global.h
--- i4b/include/i4b_global.h	15 Jul 2004 08:26:05 -0000	1.12
+++ i4b/include/i4b_global.h	1 Nov 2004 18:48:07 -0000
@@ -67,7 +67,8 @@
 		name ## _modevent, \
 		(void *)sym \
 	}; \
-	DECLARE_MODULE(name, name ## _mod, SI_SUB_PSEUDO, SI_ORDER_ANY)
+	/* XXX work-around: i4b must be initialized before netgraph! */ \
+	DECLARE_MODULE(name, name ## _mod, SI_SUB_PSEUDO, SI_ORDER_MIDDLE)
 #endif
 
 /*---------------*/



>From a mail threat that discussed a similar panic caused by the
neighbour discovery function I got the following patch that was
attributed to Robert Watson's netperf branch:

Index: netinet6/nd6.c
--- netinet6/nd6.c	11 Oct 2004 03:46:38 -0000	1.43.2.3
+++ netinet6/nd6.c	30 Oct 2004 21:08:17 -0000
@@ -1798,6 +1798,8 @@
 	    nd6_slowtimo, NULL);
 	IFNET_RLOCK();
 	for (ifp = TAILQ_FIRST(&ifnet); ifp; ifp = TAILQ_NEXT(ifp, if_list)) {
+		if (ifp->if_afdata[AF_INET6] == NULL)
+			continue;
 		nd6if = ND_IFINFO(ifp);
 		if (nd6if->basereachable && /* already initialized */
 		    (nd6if->recalctm -= ND6_SLOWTIMER_INTERVAL) <= 0) {

That patch did not prevent the panic for me, since there is another
similar loop in in6_tmpaddrtimer():

Index: netinet6/in6_ifattach.c
--- netinet6/in6_ifattach.c	27 Aug 2004 04:46:27 -0000	1.23.2.1
+++ netinet6/in6_ifattach.c	1 Nov 2004 00:06:30 -0000
@@ -895,6 +895,8 @@
 
 	bzero(nullbuf, sizeof(nullbuf));
 	for (ifp = TAILQ_FIRST(&ifnet); ifp; ifp = TAILQ_NEXT(ifp, if_list)) {
+		if (ifp->if_afdata[AF_INET6] == NULL)
+			continue;
 		ndi = ND_IFINFO(ifp);
 		if (bcmp(ndi->randomid, nullbuf, sizeof(nullbuf)) != 0) {
 			/*

These two NULL pointer checks are not necessary in the ISDN+NETGRAPH+
INET6 scenario, but may well be if there are other calls of if_attach()
during SI_SUB_PSEUDO/SI_ORDER_ANY. In fact, a KASSERT() may be more
appropriate, since the premature call of net_add_domain() will cause
incomplete initialisation of network device structures until after
phase SI_SUB_PROTO_DOMAIN. Or just a plain

  panic("%s: incomplete initialisation of %s!", __func__, ifp->if_xname);

instead of the "continue" in both patches above ...


I think all three places should be fixed. The first one does make
ISDN work again in a kernel with NETGRAPH and INET6 by fixing the
initialisation of i4b devices. The latter two either mask or report
the incomplete initialisation of pseudo network devices.

Any comments on the patches ?

Any objections to me commiting them to -CURRENT ?

I'd want to see the i4b SYSINIT patch in 5.3, since it is a fix
and does just enforce an order in an as of now random sequence,
thus can't have a negative impact.

Regards, STefan


Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20041101215146.GA42768>