Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 02 Aug 2017 11:49:16 -0700
From:      John Baldwin <jhb@freebsd.org>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        freebsd-current@freebsd.org, FreeBSD Hackers <freebsd-hackers@freebsd.org>, Andriy Gapon <avg@freebsd.org>
Subject:   Re: order of executing MOD_LOAD and registering module sysctl-s
Message-ID:  <2156888.F9y1Fr7Oe8@ralph.baldwin.cx>
In-Reply-To: <1c40d6ef-bfd2-d8ee-e057-47cd8d695544@selasky.org>
References:  <62e7ab4d-8956-545e-b204-4fb63cfe5fbf@FreeBSD.org> <2718016.8bPh6cqhGc@ralph.baldwin.cx> <1c40d6ef-bfd2-d8ee-e057-47cd8d695544@selasky.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wednesday, August 02, 2017 06:53:54 PM Hans Petter Selasky wrote:
> On 08/02/17 17:49, John Baldwin wrote:
> > On Wednesday, August 02, 2017 12:39:36 PM Hans Petter Selasky wrote:
> >> On 08/02/17 12:13, Andriy Gapon wrote:
> >>>
> >>> As far as I understand a module initialization routine is executed via the
> >>> sysinit mechanism.  Specifically, module_register_init is set up as the sysinit
> >>> function for every module and it calls MOD_EVENT(mod, MOD_LOAD) to invoke the
> >>> module event handler.
> >>>
> >>> In linker_load_file() I see the following code:
> >>>                           linker_file_register_sysctls(lf);
> >>>                           linker_file_sysinit(lf);
> >>>
> >>> I think that this means that any statically declared sysctl-s in the module
> >>> would be registered before the module receives the MOD_LOAD event.
> >>> It's possible that some of the sysctl-s could have procedures as handlers and
> >>> they might access data that is supposed to be initialized by the module event
> >>> handler.
> >>>
> >>> So, for example, running sysctl -a at just the right moment during the loading
> >>> of a module might end up in an expected behavior (including a crash).
> >>>
> >>> Is my interpretation of how the code works correct?
> >>> Can the order of linker_file_sysinit and linker_file_register_sysctls be changed
> >>> without a great risk?
> >>>
> >>> Thank you!
> >>>
> >>> P.S.
> >>> The same applies to:
> >>>                   linker_file_sysuninit(file);
> >>>                   linker_file_unregister_sysctls(file);
> >>>
> >>
> >> Hi,
> >>
> >> Not sure if this answers your question.
> >>
> 
> Hi,
> 
> >> If a SYSCTL() is TUNABLE, it's procedure can be called when the sysctl
> >> is created. Else the SYSCTL() procedure callback might be called right
> >> after it's registered. I think there is an own subsystem in sys/kernel.h
> >> which takes care of the actual SYSCTL() creation/destruction - after the
> >> linker is involved.
> > 
> > sysctl nodes are created explicitly via linker_file_register_sysctls, not via
> > SYSINITs, so you can't order them with respect to other init functions.
> 
> For GENERIC (non-modules) the SYSCTLS() are registered by 
> sysctl_register_all() at SYSINIT(sysctl, SI_SUB_KMEM, SI_ORDER_FIRST, 
> sysctl_register_all, 0);
> 
> > 
> > I think Andriy's suggestion of doing sysctls "inside" sysinits (so they are
> > registered last and unregistered first) is probably better than the current
> > state and is a simpler fix than changing all sysctls to use SYSINITs.
> > 
> 
> If the module provided SYSCTLS's could use the same SI_SUB_KMEM it would 
> be compatible.
> 
> You have three cases to think about:
> 
> 1) SYSCTLS's in modules loaded before the kernel is booted
> 2) SYSCTLS's in modules after the kernel is booted
> 3) SYSCTLS's in the GENERIC kernel.
> 
> I'm not 100% sure, but I think 1) and 2) are treated differently. 
> Correct me if I'm wrong.

3) sysctls in the kernel are handled at SI_SUB_KMEM.
1) modules loaded by the loader are handled in linker_preload() at
   SI_SUB_KLD.  Their sysctls are all registered when linker_preload()
   executes.  Their SYSINITs are added to the pending sysinit list.
   Any SYSINITs earlier than SI_SUB_KLD will be executed in sorted
   order by mi_startup() after linker_preload() returns.  Any SYSINITs
   after SI_SUB_KLD will be kept in the pending list in order and
   executed as if they were in the kernel.
2) All of the sysctl's are registered first, and afterwards the
   SYSINITs are run in sorted order.

The race Andriy describes has always been present, but it was perhaps
easier to fix by just inverting the order when TUNABLE_* were used 
explicitly as those registered explicit SI_SUB_TUNABLES sysinits.

Note that it has always been the case with old TUNABLE_* that handlers
could be run before locks were initialized (e.g. via MTX_SYSINIT which
runs later at SI_SUB_LOCK), so if you have handlers that need to use
locks, etc. you really shouldn't use RDTUN/RWTUN but instead you should
use a dedicated SYSINIT to read a tunable and apply the right logic.

There are a few different ways to fix Andriy's race while still solving
the issue Ian notes that you may have SYSINITs that depend on the tunable
fetches having been performed.  One observation is that it shouldn't
break anything to invert the order on unload and always unregister sysctls
before invoking SYSUNINITs, so I think we should probably make that part
of the change regardless.  The variety comes in how to handle the
module loading case:

1) Just extend the SYSCTL_XLOCK and hold it while invoking SYSINITs in
   modules during post-boot, and/or use some other explicit locking
   mechanism with sleep/wakeup or a cv to "pause" any sysctl requests
   until a kld is fully registered.  This is a bit of a sledgehammer, but
   it is simple and kldload isn't a hot path.

2) Use two passes the add sysctl nodes.  The first pass registers the
   sysctl and marks it as "new" which prevents it from being invoked,
   but the name can be extracted to generate tunable string, etc.
   SYSINITs are run after the first pass when loading post-boot, and
   the second pass after SYSINITs goes through and clears "new" from the
   sysctl nodes.

I would probably lean towards 1) as it is simpler.  I suspect that
holding SYSCTL_XLOCK across SYSINITs won't fly as I know some SYSINIT
routines can add dynamic sysctls (e.g. driver_module_handler SYSINIT
for if_cxgbe.ko will invoke t4_attach -> t4_sysctls), so you'd have to
use the sleep/wakeup or cv route.

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2156888.F9y1Fr7Oe8>