From owner-freebsd-current@FreeBSD.ORG Fri Jul 31 14:59:23 2009 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B5D8E106564A for ; Fri, 31 Jul 2009 14:59:23 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-fx0-f210.google.com (mail-fx0-f210.google.com [209.85.220.210]) by mx1.freebsd.org (Postfix) with ESMTP id 131C18FC12 for ; Fri, 31 Jul 2009 14:59:22 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: by fxm6 with SMTP id 6so194088fxm.43 for ; Fri, 31 Jul 2009 07:59:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=73U4yfcJrzV+NFUhpM/f20IQ3ZPW56uV+AISipJl2R0=; b=LPBUVRvA9HeQZp60EZY8Qtw1hnePbtxKMkBDFwM6VXNsXzDe+3SUj7Syji8Mf0PS84 IdlevrYFCt7ZpcTG6rX/Er4eMvnyzi/7NDk/zTNHRw9ji3wWmziVsEYEbIkejjNA0Ce2 LIr3PgQGJjoaM3MzdaHNMVdF0Q0Uf2mNc4nI8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type:content-transfer-encoding; b=XdaFESkiyraSQpAhV3U3qAluH92dhoUKV9KV9hiFHK7FZxzJ6X200/+/bHvEhSmrCq qvhvt5rTmoXc1pVmypw1S7GO7J5w73vxTxKoJqmsr9rVEBSgBB6h92yCiPfMrsgf0b1k 4GS2B3t+FQZTH2SWXrX/wR/RH5Oq+KbFm5SVs= MIME-Version: 1.0 Sender: asmrookie@gmail.com Received: by 10.223.115.193 with SMTP id j1mr1564858faq.98.1249052361845; Fri, 31 Jul 2009 07:59:21 -0700 (PDT) Date: Fri, 31 Jul 2009 16:59:21 +0200 X-Google-Sender-Auth: 61a4962dd9430720 Message-ID: <3bbf2fe10907310759o3be1f565t4122fcd66c4531f4@mail.gmail.com> From: Attilio Rao To: freebsd-current@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Peter Holm Subject: [PATCH] Newbus locking X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 31 Jul 2009 14:59:24 -0000 I spent the last two weeks searching, coding and trying different ways to get newbus locked. In newbus we have a lot of different members in datastructures which are accessed in parallel: * A list of devclasses * Any devclass maintains a list of drivers and a table of devices (useful for accessing them through the unit number) * Any device maintain a list of children devices and makes operations on them (generally a bus) * Flags for devclasses and states for devices In order to maintain consistency on all the accesses to such datastructures I outlined the following pre-requisites: * Locking must maintain consistency between set of different operations (you can't consider to break things like devclasses/children devices lists modifications in the middle) * Locking must not change pre-requisites in the callers (aka: don't switch a normally not sleeping function into a sleeping one) * Locking must take into account that virtual functions (example: BUS_DRIVER_ADDED(), DEVICE_DETACH(), etc) can sleeep and however their content is unknown. * Caching objects within the above mentioned datastructure was not a good option because if datastructures were allowed to add/remove new objects in the while some informations could have lost In order to have a good locking scheme I tried several things. The problem is that newbus presents a lot of calls willing to access to one of the lists and calling a virtual function inside. This was giving some problems because just dropping the lock, call the function and reacquiring the lock was not a viable option. Refcounting an objects on reading of such datastructures (like the list of drivers for a given devclass, for example) was going to work but it was going to offer much pain on the write path for the datastructure: as long as we could not sleep we had to fail the operation. While this could work in place it was going to complicate things a lot and adding a lot of failure point. While it is true that in newbus operations can fail, it is also true they often don't fail (they just usually do in the case of failed malloc()s, for example). This would have given high likely of failure on simple things like device probe and attach. This method, however, was also not going to protect efficiently members like devices flags. In order to satisfy prerequisites, the better thing would have been to: * Have a lock which could have be maintained through virtual functions (so probabilly a sx lock) * Have a lock to be acquired outside of the functions itself (in order to avoid recursion in the newbus specific functions and maintain a critical path across several calls) This is not too difficult and it is basically what Giant currently does. Additively, newbus design helps us a bit here because newbus modifies happens in some well known contexts: 1) boot time, single-threaded, busses/device initialization 2) post-boot, multi-threaded, busses/device initialization via the config hooks 3) character device operations which handles new devices (it is the case of ATA, for example) 4) run-time loaded modules (referred to driver_module_handler()) 5) edge cases of adeguately dedicated threads (usb) and taskqueues (ATA) that do post-initialization probing The idea is to use a global lock to be acquired in key points, with a bit of help from the consumers (and please note that consumers alredy forsee such help in the Giant case, so we are no adding anything difficult). More in detail: 1 is adeguately protected alone on its own because it is single-threaded. 2 can be protected by locking in the config hooks themselves by each driver (as it happens now) or locking in the run_interrupt_driven_config_hooks() directly (for the patch I followed the former approach in order to minimize the overhead when it is not necessary) 3 can have sleepable context so they can handle to acquire a sx lock there. 4 can be simply locked in driver_module_handler() 5 should have its own locking as any normal thread, so they can acquire newbus lock So I prepared this patch based on the above mentioned analysis: http://www.freebsd.org/~attilio/Yahoo/newbus/newbus_locking3.diff This patch has been reviewed by all the key committers (in the newbus and device drivers area) and tested by a couple of testers, but I'd really would like to gather more testing before to commit the patch as we are very near to a new release. More specifically I would like to get more testing in the 'hotplugging devices' area like USB and pccard. Ideally, a tester, should compile a kernel with KDB, DDB, INVARIANT_SUPPORT, INVARIANTS and WITNESS and try how many different devices he can, both loading as module or directly at boot time. Ideally, testers should seek for LOR involving the newbus lock, panic for newbus lock recursion or lost assertions on newbus lock and Giant. If one of these happens, please report the backtrace along with the 'show alllocks' from ddb (if you can't collect with textdumps, console or pictures, even writing down by hand is a viable option). This patch is considered a major cornerstone for FreeBSD-8.0 RELEASE and we should consider its successfull completition an high priority task. The work has been kindly founded by Yahoo! incorporated. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein