From owner-freebsd-current@freebsd.org  Mon Jul 13 14:52:14 2015
Return-Path: <owner-freebsd-current@freebsd.org>
Delivered-To: freebsd-current@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6081B999E16
 for <freebsd-current@mailman.ysv.freebsd.org>;
 Mon, 13 Jul 2015 14:52:14 +0000 (UTC)
 (envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238])
 by mx1.freebsd.org (Postfix) with ESMTP id E45F2151B;
 Mon, 13 Jul 2015 14:52:13 +0000 (UTC)
 (envelope-from luigi@onelab2.iet.unipi.it)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275)
 id 8768A7300A; Mon, 13 Jul 2015 17:00:30 +0200 (CEST)
Date: Mon, 13 Jul 2015 17:00:30 +0200
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-current <freebsd-current@freebsd.org>,
 John Baldwin <jhb@freebsd.org>
Subject: Re: protection against module unloading ?
Message-ID: <20150713150030.GA25208@onelab2.iet.unipi.it>
References: <CA+hQ2+iNWHzH=L6YC8TQvsWsz3EhMcLTv9VOP5nXtmDWD4TD0g@mail.gmail.com>
 <20150713124603.GH2404@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150713124603.GH2404@kib.kiev.ua>
User-Agent: Mutt/1.5.20 (2009-06-14)
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current/>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
 <mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Jul 2015 14:52:14 -0000

On Mon, Jul 13, 2015 at 03:46:03PM +0300, Konstantin Belousov wrote:
> On Mon, Jul 13, 2015 at 02:28:40PM +0200, Luigi Rizzo wrote:
> > Hi,
> > I am trying to understand how to protect efficiently against
> > module removals when a device driver is in use.
> > This issue came up some time ago when trying netmap
> > loaded as a module.
> > 
> > --- Summary of the current mode of operation (as i understand it): ---
> > 
> > When a device is compiled as a module, all devfs calls in
> > sys/fs/devfs/devfs_vnops.c are protected by setting
> > dev->si_threadcount (within dev*_refthread()) for the duration of the call.
> > 
> > This is implemented using devfs_fp_check()/dev_relthread() which
> > in turn, for the module case, updates the dev->si_threadcount under
> > dev_lock() which is a single shared mutex, devmtx .
> > 
> > The above is problematic for driver modules that do a lot of I/O
> > using poll, ioctl, read or write, because of the system-wide contention
> > on devmtx
> > 
> > To mitigate the contention, statically compiled modules have the
> > SI_ETERNAL attribute that prevents the lock (major contention point)
> > and si_threadcount update.
> > 
> > --- Alternative ?
> > 
> > I was hoping to make the protection mechanism cheaper by only
> > increasing  dev->si_threadcount on devfs_open() without calling
> > dev_relthread() ), and then decreasing si_threadcount on defvs_close() .
> > This way the reference is active for the lifetime of the file descriptor
> > without needing to track individual high-frequency calls.
> > 
> > This is probably not enough because there could be mmap handlers,
> > which remain active even after the device is closed.
> This is of course wrong because destroy_dev(9) would be blocked until
> all file descriptors referencing the device are closed.  The blocked
> thread which called destroy_dev() is blocked hard, i.e. you cannot
> interrupt it with a signal.  The situation is incomprehensible for
> the user issued the kldunload command.
> 
> The current si_threadcount mechanism is designed and intended to work on
> the code call granulariry, to allow the device destruction and driver
> unload when there is no code call into the driver.

thanks a lot for the clarification on the intent.
I clearly need to understand more on the architecture of the module unload.

In any case: the global contention on devmtx for I/O syscalls is
really a showstopper for making effective use of modular drivers
so we should really find a way to remove it.
Is there any other way to protect access to dev->si_threadcount ?

Eg how about the following:
- use a (leaf) lock into struct cdev to protect dev->si_threadcount, so that
  we could replace dev_lock() with mtx_lock(&dev->foo) in dev_refthread(dev)
  dev_relthread(dev) and other places that access si_threadcount

- devvn_refthread() further uses vp->v_rdev, which elsewhere is protected by
  VI_LOCK(vp), so devvn_refthread() could use the sequence
	VI_LOCK(vp);
	dev = vp->v_rdev;
	if (dev != NULL) {
		mtx_lock(&dev->foo)
		if ( ... ) {
			atomic_add_long(&dev->si_threadcount, 1);
		}
		mtx_unlock(&dev->foo);
	}
	VI_UNLOCK(vp)


  (this is almos the same as what we have in devfs_allocv() and devfs_reclaim(),
  except that they use dev_lock() instead of the device-specific lock.

cheers
luigi