From owner-freebsd-hackers@freebsd.org Wed Aug 2 18:49:32 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8BE2FDB10BA; Wed, 2 Aug 2017 18:49:32 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from mail.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 638A672E68; Wed, 2 Aug 2017 18:49:30 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by mail.baldwin.cx (Postfix) with ESMTPSA id 2E8BC10AB01; Wed, 2 Aug 2017 14:49:29 -0400 (EDT) From: John Baldwin To: Hans Petter Selasky Cc: freebsd-current@freebsd.org, FreeBSD Hackers , Andriy Gapon Subject: Re: order of executing MOD_LOAD and registering module sysctl-s Date: Wed, 02 Aug 2017 11:49:16 -0700 Message-ID: <2156888.F9y1Fr7Oe8@ralph.baldwin.cx> User-Agent: KMail/4.14.10 (FreeBSD/11.1-STABLE; KDE/4.14.30; amd64; ; ) In-Reply-To: <1c40d6ef-bfd2-d8ee-e057-47cd8d695544@selasky.org> References: <62e7ab4d-8956-545e-b204-4fb63cfe5fbf@FreeBSD.org> <2718016.8bPh6cqhGc@ralph.baldwin.cx> <1c40d6ef-bfd2-d8ee-e057-47cd8d695544@selasky.org> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3 (mail.baldwin.cx); Wed, 02 Aug 2017 14:49:29 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.99.2 at mail.baldwin.cx X-Virus-Status: Clean X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Aug 2017 18:49:32 -0000 On Wednesday, August 02, 2017 06:53:54 PM Hans Petter Selasky wrote: > On 08/02/17 17:49, John Baldwin wrote: > > On Wednesday, August 02, 2017 12:39:36 PM Hans Petter Selasky wrote: > >> On 08/02/17 12:13, Andriy Gapon wrote: > >>> > >>> As far as I understand a module initialization routine is executed via the > >>> sysinit mechanism. Specifically, module_register_init is set up as the sysinit > >>> function for every module and it calls MOD_EVENT(mod, MOD_LOAD) to invoke the > >>> module event handler. > >>> > >>> In linker_load_file() I see the following code: > >>> linker_file_register_sysctls(lf); > >>> linker_file_sysinit(lf); > >>> > >>> I think that this means that any statically declared sysctl-s in the module > >>> would be registered before the module receives the MOD_LOAD event. > >>> It's possible that some of the sysctl-s could have procedures as handlers and > >>> they might access data that is supposed to be initialized by the module event > >>> handler. > >>> > >>> So, for example, running sysctl -a at just the right moment during the loading > >>> of a module might end up in an expected behavior (including a crash). > >>> > >>> Is my interpretation of how the code works correct? > >>> Can the order of linker_file_sysinit and linker_file_register_sysctls be changed > >>> without a great risk? > >>> > >>> Thank you! > >>> > >>> P.S. > >>> The same applies to: > >>> linker_file_sysuninit(file); > >>> linker_file_unregister_sysctls(file); > >>> > >> > >> Hi, > >> > >> Not sure if this answers your question. > >> > > Hi, > > >> If a SYSCTL() is TUNABLE, it's procedure can be called when the sysctl > >> is created. Else the SYSCTL() procedure callback might be called right > >> after it's registered. I think there is an own subsystem in sys/kernel.h > >> which takes care of the actual SYSCTL() creation/destruction - after the > >> linker is involved. > > > > sysctl nodes are created explicitly via linker_file_register_sysctls, not via > > SYSINITs, so you can't order them with respect to other init functions. > > For GENERIC (non-modules) the SYSCTLS() are registered by > sysctl_register_all() at SYSINIT(sysctl, SI_SUB_KMEM, SI_ORDER_FIRST, > sysctl_register_all, 0); > > > > > I think Andriy's suggestion of doing sysctls "inside" sysinits (so they are > > registered last and unregistered first) is probably better than the current > > state and is a simpler fix than changing all sysctls to use SYSINITs. > > > > If the module provided SYSCTLS's could use the same SI_SUB_KMEM it would > be compatible. > > You have three cases to think about: > > 1) SYSCTLS's in modules loaded before the kernel is booted > 2) SYSCTLS's in modules after the kernel is booted > 3) SYSCTLS's in the GENERIC kernel. > > I'm not 100% sure, but I think 1) and 2) are treated differently. > Correct me if I'm wrong. 3) sysctls in the kernel are handled at SI_SUB_KMEM. 1) modules loaded by the loader are handled in linker_preload() at SI_SUB_KLD. Their sysctls are all registered when linker_preload() executes. Their SYSINITs are added to the pending sysinit list. Any SYSINITs earlier than SI_SUB_KLD will be executed in sorted order by mi_startup() after linker_preload() returns. Any SYSINITs after SI_SUB_KLD will be kept in the pending list in order and executed as if they were in the kernel. 2) All of the sysctl's are registered first, and afterwards the SYSINITs are run in sorted order. The race Andriy describes has always been present, but it was perhaps easier to fix by just inverting the order when TUNABLE_* were used explicitly as those registered explicit SI_SUB_TUNABLES sysinits. Note that it has always been the case with old TUNABLE_* that handlers could be run before locks were initialized (e.g. via MTX_SYSINIT which runs later at SI_SUB_LOCK), so if you have handlers that need to use locks, etc. you really shouldn't use RDTUN/RWTUN but instead you should use a dedicated SYSINIT to read a tunable and apply the right logic. There are a few different ways to fix Andriy's race while still solving the issue Ian notes that you may have SYSINITs that depend on the tunable fetches having been performed. One observation is that it shouldn't break anything to invert the order on unload and always unregister sysctls before invoking SYSUNINITs, so I think we should probably make that part of the change regardless. The variety comes in how to handle the module loading case: 1) Just extend the SYSCTL_XLOCK and hold it while invoking SYSINITs in modules during post-boot, and/or use some other explicit locking mechanism with sleep/wakeup or a cv to "pause" any sysctl requests until a kld is fully registered. This is a bit of a sledgehammer, but it is simple and kldload isn't a hot path. 2) Use two passes the add sysctl nodes. The first pass registers the sysctl and marks it as "new" which prevents it from being invoked, but the name can be extracted to generate tunable string, etc. SYSINITs are run after the first pass when loading post-boot, and the second pass after SYSINITs goes through and clears "new" from the sysctl nodes. I would probably lean towards 1) as it is simpler. I suspect that holding SYSCTL_XLOCK across SYSINITs won't fly as I know some SYSINIT routines can add dynamic sysctls (e.g. driver_module_handler SYSINIT for if_cxgbe.ko will invoke t4_attach -> t4_sysctls), so you'd have to use the sleep/wakeup or cv route. -- John Baldwin