Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 30 Nov 2011 01:20:37 +0100
From:      Daan Vreeken <Daan@vitsch.nl>
To:        Gleb Smirnoff <glebius@freebsd.org>
Cc:        FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: if_clone.c allows creating multiple interfaces with the same name
Message-ID:  <201111300120.37501.Daan@vitsch.nl>
In-Reply-To: <20111129142842.GC44498@glebius.int.ru>
References:  <201111240928.51162.Daan@vitsch.nl> <201111290107.13631.Daan@vitsch.nl> <20111129142842.GC44498@glebius.int.ru>

next in thread | previous in thread | raw e-mail | index | archive | help
Hi Glebius,

On Tuesday 29 November 2011 15:28:42 Gleb Smirnoff wrote:
>   Daan,
>
> On Tue, Nov 29, 2011 at 01:07:13AM +0100, Daan Vreeken wrote:
> D> Thanks for the looking into this and for your quick commit. I like your
> D> twist on the patch with the move from the unit bitmap to allocating unit
> D> numbers with alloc_unr(9).
> D>
> D> I do have two comments on the new code though.
> D> Before, in the !wildcard case, ifc_alloc_unit() would return ENOSPC when
> D> the user requested a unit number larger than ifc->ifc_maxunit. Now the
> D> function returns EEXIST in that case because alloc_unr_specific()
> D> returns -1 both when a number is already allocated and when the number
> D> exceeds it's limits. This leads to the somewhat confusing:
> D>
> D>         # ifconfig bridge123456 create
> D>         ifconfig: SIOCIFCREATE2: File exists
> D>
> D> Apart from that I'm a bit worried about the "/* XXXGL: yep, it's a unit
> D> leak */" part of your change. Once a unit number is leaked, there
> D> currently seems to be no way to ever get it back. In a worst case
> D> scenario, where the names of multiple interfaces in a row collides with
> D> the numbers alloc_unr() returns, an entire row of unit numbers could be
> D> leaked during the creation of a single cloned interface.
> D> We have a product where cloned interfaces come and go very frequently.
> D> I'd hate to run out of unit numbers after 'just a few' years of uptime
> D> ;-)
> D> I've created a new patch on top of your change that fixes both of these
> D> problems. Could you have a look at the attached diff?
>
> Thanks, I will work on applying it.

Great!

> D> > Considering the second part, that adds locking. Unfortunately, right
> D> > now we have numerous races in the network configuration ocde. Many
> D> > SIOCSsomething ioctls can race with each other producing unpredictable
> D> > results and kernel panics. So, running two ifconfig(8) in parallel is
> D> > a bad idea today. :(
> D> > Your patch with IFNET_NAMING_LOCK() just plumbs one race case: a race
> D> > between two SIOCSIFNAME ioctls. And it doesn't plumb a race between
> D> > SIOCSIFNAME vs SIOCIFCREATE, because IFNET_NAMING_LOCK() is dropped
> D> > after unit allocation, but prior to interface attachement to global
> D> > interface list. 
> D>
> D> Are you sure? With the patch in the PR, the relevant code in
> D> ifc_simple_create() would become :
> D>
> D> 	...
> D> 	IFNET_NAMING_LOCK();
> D> 	err = ifc_alloc_unit(ifc, &unit);
> D> 	if (err != 0) {
> D> 		IFNET_NAMING_UNLOCK();
> D> 		return (err);
> D> 	}
> D>
> D> 	err = ifcs->ifcs_create(ifc, unit, params);
> D> 	IFNET_NAMING_UNLOCK();
> D> 	if (err != 0) {
> D> 		ifc_free_unit(ifc, unit);
> D> 		return (err);
> D> 	}
> D> 	...
> D>
> D> The lock is acquired before the call to ifc_alloc_unit().
> D> In the non-error case (e.g. when creating an if_bridge interface) the
> D> call ends up calling bridge_clone_create(), which calls
> D> ether_ifattach(), which calls if_attach() -> if_attach_internal() where
> D> the ifp is added to the ifnet list.
> D> Only when that's done, the lock is dropped.
> D>
> D> Am I overlooking something obvious here?
>
> The code above isn't correct since it holds mutex when calling
> ifcs->ifcs_create(). These methods may (they actually do) call malloc()
> with M_WAITOK.

I'd noticed the malloc() too (see my comment in the PR: "We can't use 
IFNET_WLOCK() here since the code paths may sleep.")
The IFNET_WLOCK() macro is defined as:

	#define IFNET_WLOCK() do {                                              \
		sx_xlock(&ifnet_sxlock);                                        \
		rw_wlock(&ifnet_rwlock);                                        \
	} while (0)

Since rwlocks cannot be held while sleeping, I've added the IFNET_NAMING_ 
macros in the patch, defining them as :

	#define IFNET_NAMING_LOCK()	sx_xlock(&ifnet_sxlock);
	#define IFNET_NAMING_UNLOCK()	sx_unlock(&ifnet_sxlock);

This only does the first half of the work IFNET_WLOCK() normally does. My 
understanding from reading the sx(9) manual is that holding an sx lock while 
sleeping should be allowed.
(WITNESS and INVARIANTS didn't seem to disagree with me. :)
After acquiring the ifnet_sxlock, malloc() should be allowed to sleep while 
allocating memory for the new interface. When it's done sleeping, the 
interface can be added to the ifnet list in if_attach().
if_attach() will IFNET_WLOCK() which should recurse on the ifnet_sxlock and 
acquire a lock on the ifnet_rwlock.

(I'm still confused as to why this wouldn't work.)


> In general FreeBSD uses M_WAITOK on the configuration code paths, like
> any SIOCSsomething ioctl. So to do correct protection, you first need to
> allocate every kind of memory needed, then lock mutex, then run through
> configuration sequence, then release mutex and in case of error free all
> allocated memory. Sounds easy, but isn't in practice, especially when
> several network modules are involved.
>
> So I'm still thinking about...
>
> D> > From my point of view, we need a generic approach to ioctl() vs
> D> > ioctl() races, may be some global serializer of all re-configuration
> D> > requests of interfaces and addresses.
>
> ... but several developers have noted that this approach may have some
> hidden problems.  When experimenting with new CARP, I have tried it on the
> carp_ioctl() without any problems. The idea is simple:
>
> static int serializer = 0;
> static struct mtx serializer_mtx;
> MTX_SYSINIT("sermtx", &serializer_mtx, "ioctl serializer mutex", MTX_DEF);
>
> int
> foo_ioctl()
> {
> 	mtx_lock(&serializer_mtx);
> 	if (serializer != 0)
> 		msleep(&serializer, &serializer_mtx, 0, "ioctl", 0);
> 	serializer = 1;
> 	mtx_unlock(&serializer_mtx);
>
> 	... any code here, uncluding calls to different lower layers...
>
> 	mtx_lock(&serializer_mtx);
> 	serializer = 0;
> 	wakeup(&serializer);
> 	mtx_unlock(&serializer_mtx);
>
> 	return (error);
> }
>
> I have tried this for carp_ioctl() and it worked fine. You can try it for
> entire ifioctl() and all its descendants, but be aware of hidden problems
> :)

That reminds me of sem_wait() and sem_post(). I'm going to remember that 
construction. It could come in handy some time. :)


Regards,
-- 
Daan Vreeken
Vitsch Electronics
http://Vitsch.nl
tel: +31-(0)40-7113051 / +31-(0)6-46210825
KvK nr: 17174380



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201111300120.37501.Daan>