Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 28 Jun 2019 20:16:51 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r349459 - head/sys/sys
Message-ID:  <20190628192001.K982@besplex.bde.org>
In-Reply-To: <bdfad10f-de6d-0a83-288d-c7c79e8469f0@FreeBSD.org>
References:  <201906271507.x5RF775Q070616@repo.freebsd.org> <20190628012059.D2091@besplex.bde.org> <dec79f80-3b00-15ad-cd63-6e6ecb83e176@FreeBSD.org> <20190628032225.T2863@besplex.bde.org> <bdfad10f-de6d-0a83-288d-c7c79e8469f0@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 27 Jun 2019, Andriy Gapon wrote:

> On 27/06/2019 20:37, Bruce Evans wrote:
>> On Thu, 27 Jun 2019, Andriy Gapon wrote:
>>
>>> On 27/06/2019 18:47, Bruce Evans wrote:
>>>> On Thu, 27 Jun 2019, Andriy Gapon wrote:
>>>>
>>>>> Log:
>>>>> =C2=A0upgrade the warning printf-s in bus accessors to KASSERT-s
>>>>>
>>>>> =C2=A0After this change sys/bus.h includes sys/systm.h.
>>>>
>>>> This is further namespace pollution.=C2=A0 sys/systm.h is a prerequist=
e for
>>>> all kernel headers except sys/param.h and the headers that that includ=
es,
>>>> since any kernel header (except sys/param.hm etc.) may have inlines wh=
ich
>>>> use features in systm.h, e.g., KASSERT() or an inline function.
>>>
>>> what do you think about amending style(9) to require that if sys/systm.=
h is to
>>> be included, then it must be included before any other header except fo=
r
>>> sys/param.h (and possibly sys/cdefs.h) ?
>>
>> It is not a style matter.
>
> I know... but style(9) documents sys/param.h for similar reasons.
>
>> sys/systm.h is just a prerequisite for almost
>> all kernel code.=C2=A0 Perhaps this can be enforced using #ifdef magic e=
ven
>> for headers that don't currently use it.=C2=A0 If it is to be included n=
ested,
>> then sys/param.h is the place to include it, but I don't like that since
>> it is even less related to parameters than sys/param.h, and this would b=
e
>> a regression from minor depollution of sys/param.h.
>>
>> sys/systm.h is also kernel-only, and as you found, including it nested
>> tends to break applications, so other headers that have the bug of
>> including it tend to have _KERNEL ifdefs to avoid including it in
>> applications.=C2=A0 This is so fundamental that it doesn't have a "No
>> user-serviceable parts inside" ifdef.
>
> I think that there is a trivial fix to my commit: moving the sys/systm.h =
include
> to the _KERNEL section of sys/bus.h.  That's where its definitions are ac=
tually
> used.
> But I agree that there should be a better approach.

I did a quick check of how many .c files are missing includes of sys/systm.=
h.
In an old version of FreeBSD, my normal kernel has about 1100 object files,
and about 89 of these broke when a KASSERT() was added to sys/bus.h without
adding the pollution.  That is many more than I expected.

Many of these errors are in automatically generated bus if.c files.  E.g.,
the include list in acpi_if.c is:

#include <sys/param.h>
#include <sys/queue.h>
#include <sys/kernel.h>
#include <sys/kobj.h>
#include <sys/bus.h>
#include <sys/types.h>
#include <contrib/dev/acpica/include/acpi.h>
#include "acpi_if.h"

Here sys/bus.h and sys/types.h and the contrib include are in the .m file,
and the others are automatically generated.  Including sys/types.h is
nonsense since sys/param.h already included it and sys/bus.h already used
typedefs in it.  The generation also misorders sys/queue.h and mangles
the formatting of the contrib include (in the .m file, it is separated
by spaces).

Here the .m file should know sys/bus.h's prerequisites and add them all
except sys/param.h and sys/systm.h, or maybe the include of sys/bus.h
should be auto-generated too (my test kernel has 25 foo_if.c files and
21 of these include sys/bus.c).

Most of the generated if.c files include sys/systm.h by nested pollution
anyway.  Just not early enough to actually work.  E.g., in acpi_if.c,
the above 8 includes expand to 45 lines, 144 words and 2679 chars in
=2Edepend.acpi_if.o according to wc.  The word count of 144 is
approximately the number of includes.  sys/systm.h is first included
deeply nested in contrib/..../acpi.h.

11 of the 25 generated foo_if.c end up never including sys/systm.h.

Of the remaining 64 files that fail to compile, all except 7 including
1 maintained by me (cy_pci.c) *blush* end up including sys/systm.h.
The other 57 apparently include it either misordered in the .c file
or via nested pollution.

After removing all nested includes of sys/systm.h in <sys> headers, only
another 50 object files fail to build.  most of the extras are for
libkern functions (20 for str*() alone).  libkern was polluted.h relatively
recently as part of unimproving the implementation of bcd conversion APIs
from macros to inlines with KASSERT()s.  The KASSERT()s were a bad fix for
missing sanity checks of bcd data in callers.

Bruce
From owner-svn-src-all@freebsd.org  Fri Jun 28 10:38:58 2019
Return-Path: <owner-svn-src-all@freebsd.org>
Delivered-To: svn-src-all@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6381615DE63C;
 Fri, 28 Jun 2019 10:38:58 +0000 (UTC)
 (envelope-from hselasky@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org
 [IPv6:2610:1c1:1:606c::19:3])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (4096 bits) client-digest SHA256)
 (Client CN "mxrelay.nyi.freebsd.org",
 Issuer "Let's Encrypt Authority X3" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id EDDA26C2C8;
 Fri, 28 Jun 2019 10:38:57 +0000 (UTC)
 (envelope-from hselasky@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org
 [IPv6:2610:1c1:1:6068::e6a:0])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id C1935257FF;
 Fri, 28 Jun 2019 10:38:57 +0000 (UTC)
 (envelope-from hselasky@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37])
 by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id x5SAcvAm088281;
 Fri, 28 Jun 2019 10:38:57 GMT (envelope-from hselasky@FreeBSD.org)
Received: (from hselasky@localhost)
 by repo.freebsd.org (8.15.2/8.15.2/Submit) id x5SAcv5f088278;
 Fri, 28 Jun 2019 10:38:57 GMT (envelope-from hselasky@FreeBSD.org)
Message-Id: <201906281038.x5SAcv5f088278@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: hselasky set sender to
 hselasky@FreeBSD.org using -f
From: Hans Petter Selasky <hselasky@FreeBSD.org>
Date: Fri, 28 Jun 2019 10:38:57 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
 svn-src-head@freebsd.org
Subject: svn commit: r349506 - in head: share/man/man9 sys/kern sys/sys
X-SVN-Group: head
X-SVN-Commit-Author: hselasky
X-SVN-Commit-Paths: in head: share/man/man9 sys/kern sys/sys
X-SVN-Commit-Revision: 349506
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Rspamd-Queue-Id: EDDA26C2C8
X-Spamd-Bar: --
Authentication-Results: mx1.freebsd.org
X-Spamd-Result: default: False [-2.98 / 15.00];
 local_wl_from(0.00)[FreeBSD.org];
 NEURAL_HAM_MEDIUM(-1.00)[-0.999,0];
 NEURAL_HAM_LONG(-1.00)[-1.000,0];
 NEURAL_HAM_SHORT(-0.98)[-0.976,0];
 ASN(0.00)[asn:11403, ipnet:2610:1c1:1::/48, country:US]
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
 user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all/>;
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 28 Jun 2019 10:38:58 -0000

Author: hselasky
Date: Fri Jun 28 10:38:56 2019
New Revision: 349506
URL: https://svnweb.freebsd.org/changeset/base/349506

Log:
  Implement API for draining EPOCH(9) callbacks.
  
  The epoch_drain_callbacks() function is used to drain all pending
  callbacks which have been invoked by prior epoch_call() function calls
  on the same epoch. This function is useful when there are shared
  memory structure(s) referred to by the epoch callback(s) which are not
  refcounted and are rarely freed. The typical place for calling this
  function is right before freeing or invalidating the shared
  resource(s) used by the epoch callback(s). This function can sleep and
  is not optimized for performance.
  
  Differential Revision: https://reviews.freebsd.org/D20109
  MFC after:	1 week
  Sponsored by:	Mellanox Technologies

Modified:
  head/share/man/man9/Makefile
  head/share/man/man9/epoch.9
  head/sys/kern/subr_epoch.c
  head/sys/sys/epoch.h

Modified: head/share/man/man9/Makefile
==============================================================================
--- head/share/man/man9/Makefile	Fri Jun 28 05:11:02 2019	(r349505)
+++ head/share/man/man9/Makefile	Fri Jun 28 10:38:56 2019	(r349506)
@@ -945,6 +945,15 @@ MLINKS+=drbr.9 drbr_free.9 \
 MLINKS+=DRIVER_MODULE.9 DRIVER_MODULE_ORDERED.9 \
 	DRIVER_MODULE.9 EARLY_DRIVER_MODULE.9 \
 	DRIVER_MODULE.9 EARLY_DRIVER_MODULE_ORDERED.9
+MLINKS+=epoch.9 epoch_context.9 \
+	epoch.9 epoch_alloc.9 \
+	epoch.9 epoch_free.9 \
+	epoch.9 epoch_enter.9 \
+	epoch.9 epoch_exit.9 \
+	epoch.9 epoch_wait.9 \
+	epoch.9 epoch_call.9 \
+	epoch.9 epoch_drain_callbacks.9 \
+	epoch.9 in_epoch.9
 MLINKS+=EVENTHANDLER.9 EVENTHANDLER_DECLARE.9 \
 	EVENTHANDLER.9 EVENTHANDLER_DEFINE.9 \
 	EVENTHANDLER.9 EVENTHANDLER_DEREGISTER.9 \

Modified: head/share/man/man9/epoch.9
==============================================================================
--- head/share/man/man9/epoch.9	Fri Jun 28 05:11:02 2019	(r349505)
+++ head/share/man/man9/epoch.9	Fri Jun 28 10:38:56 2019	(r349506)
@@ -26,7 +26,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd June 25, 2018
+.Dd June 28, 2019
 .Dt EPOCH 9
 .Os
 .Sh NAME
@@ -38,6 +38,7 @@
 .Nm epoch_exit ,
 .Nm epoch_wait ,
 .Nm epoch_call ,
+.Nm epoch_drain_callbacks ,
 .Nm in_epoch ,
 .Nd kernel epoch based reclamation
 .Sh SYNOPSIS
@@ -60,6 +61,8 @@
 .Fn epoch_wait_preempt "epoch_t epoch"
 .Ft void
 .Fn epoch_call "epoch_t epoch" "epoch_context_t ctx" "void (*callback) (epoch_context_t)"
+.Ft void
+.Fn epoch_drain_callbacks "epoch_t epoch"
 .Ft int
 .Fn in_epoch "epoch_t epoch"
 .Sh DESCRIPTION
@@ -120,6 +123,18 @@ routines must be used and the caller can no longer mod
 in place.
 An item to be modified must be handled with copy on write
 and frees must be deferred until after a grace period has elapsed.
+.Pp
+The
+.Fn epoch_drain_callbacks
+function is used to drain all pending callbacks which have been invoked by prior
+.Fn epoch_call
+function calls on the same epoch.
+This function is useful when there are shared memory structure(s)
+referred to by the epoch callback(s) which are not refcounted and are
+rarely freed.
+The typical place for calling this function is right before freeing or
+invalidating the shared resource(s) used by the epoch callback(s).
+This function can sleep and is not optimized for performance.
 .Sh RETURN VALUES
 .Fn in_epoch curepoch
 will return 1 if curthread is in curepoch, 0 otherwise.

Modified: head/sys/kern/subr_epoch.c
==============================================================================
--- head/sys/kern/subr_epoch.c	Fri Jun 28 05:11:02 2019	(r349505)
+++ head/sys/kern/subr_epoch.c	Fri Jun 28 10:38:56 2019	(r349506)
@@ -43,6 +43,7 @@ __FBSDID("$FreeBSD$");
 #include <sys/pcpu.h>
 #include <sys/proc.h>
 #include <sys/sched.h>
+#include <sys/sx.h>
 #include <sys/smp.h>
 #include <sys/sysctl.h>
 #include <sys/turnstile.h>
@@ -64,6 +65,8 @@ static MALLOC_DEFINE(M_EPOCH, "epoch", "epoch based re
 TAILQ_HEAD (epoch_tdlist, epoch_tracker);
 typedef struct epoch_record {
 	ck_epoch_record_t er_record;
+	struct epoch_context er_drain_ctx;
+	struct epoch *er_parent;
 	volatile struct epoch_tdlist er_tdlist;
 	volatile uint32_t er_gen;
 	uint32_t er_cpuid;
@@ -74,6 +77,9 @@ struct epoch {
 	epoch_record_t e_pcpu_record;
 	int	e_idx;
 	int	e_flags;
+	struct sx e_drain_sx;
+	struct mtx e_drain_mtx;
+	volatile int e_drain_count;
 };
 
 /* arbitrary --- needs benchmarking */
@@ -178,6 +184,7 @@ epoch_ctor(epoch_t epoch)
 		ck_epoch_register(&epoch->e_epoch, &er->er_record, NULL);
 		TAILQ_INIT((struct threadlist *)(uintptr_t)&er->er_tdlist);
 		er->er_cpuid = cpu;
+		er->er_parent = epoch;
 	}
 }
 
@@ -203,6 +210,8 @@ epoch_alloc(int flags)
 	MPASS(epoch_count < MAX_EPOCHS - 2);
 	epoch->e_flags = flags;
 	epoch->e_idx = epoch_count;
+	sx_init(&epoch->e_drain_sx, "epoch-drain-sx");
+	mtx_init(&epoch->e_drain_mtx, "epoch-drain-mtx", NULL, MTX_DEF);
 	allepochs[epoch_count++] = epoch;
 	return (epoch);
 }
@@ -210,18 +219,13 @@ epoch_alloc(int flags)
 void
 epoch_free(epoch_t epoch)
 {
-#ifdef INVARIANTS
-	struct epoch_record *er;
-	int cpu;
 
-	CPU_FOREACH(cpu) {
-		er = zpcpu_get_cpu(epoch->e_pcpu_record, cpu);
-		MPASS(TAILQ_EMPTY(&er->er_tdlist));
-	}
-#endif
+	epoch_drain_callbacks(epoch);
 	allepochs[epoch->e_idx] = NULL;
 	epoch_wait(global_epoch);
 	uma_zfree_pcpu(pcpu_zone_record, epoch->e_pcpu_record);
+	mtx_destroy(&epoch->e_drain_mtx);
+	sx_destroy(&epoch->e_drain_sx);
 	free(epoch, M_EPOCH);
 }
 
@@ -655,6 +659,83 @@ int
 in_epoch(epoch_t epoch)
 {
 	return (in_epoch_verbose(epoch, 0));
+}
+
+static void
+epoch_drain_cb(struct epoch_context *ctx)
+{
+	struct epoch *epoch =
+	    __containerof(ctx, struct epoch_record, er_drain_ctx)->er_parent;
+
+	if (atomic_fetchadd_int(&epoch->e_drain_count, -1) == 1) {
+		mtx_lock(&epoch->e_drain_mtx);
+		wakeup(epoch);
+		mtx_unlock(&epoch->e_drain_mtx);
+	}
+}
+
+void
+epoch_drain_callbacks(epoch_t epoch)
+{
+	epoch_record_t er;
+	struct thread *td;
+	int was_bound;
+	int old_pinned;
+	int old_cpu;
+	int cpu;
+
+	WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL,
+	    "epoch_drain_callbacks() may sleep!");
+
+	/* too early in boot to have epoch set up */
+	if (__predict_false(epoch == NULL))
+		return;
+#if !defined(EARLY_AP_STARTUP)
+	if (__predict_false(inited < 2))
+		return;
+#endif
+	DROP_GIANT();
+
+	sx_xlock(&epoch->e_drain_sx);
+	mtx_lock(&epoch->e_drain_mtx);
+
+	td = curthread;
+	thread_lock(td);
+	old_cpu = PCPU_GET(cpuid);
+	old_pinned = td->td_pinned;
+	was_bound = sched_is_bound(td);
+	sched_unbind(td);
+	td->td_pinned = 0;
+
+	CPU_FOREACH(cpu)
+		epoch->e_drain_count++;
+	CPU_FOREACH(cpu) {
+		er = zpcpu_get_cpu(epoch->e_pcpu_record, cpu);
+		sched_bind(td, cpu);
+		epoch_call(epoch, &er->er_drain_ctx, &epoch_drain_cb);
+	}
+
+	/* restore CPU binding, if any */
+	if (was_bound != 0) {
+		sched_bind(td, old_cpu);
+	} else {
+		/* get thread back to initial CPU, if any */
+		if (old_pinned != 0)
+			sched_bind(td, old_cpu);
+		sched_unbind(td);
+	}
+	/* restore pinned after bind */
+	td->td_pinned = old_pinned;
+
+	thread_unlock(td);
+
+	while (epoch->e_drain_count != 0)
+		msleep(epoch, &epoch->e_drain_mtx, PZERO, "EDRAIN", 0);
+
+	mtx_unlock(&epoch->e_drain_mtx);
+	sx_xunlock(&epoch->e_drain_sx);
+
+	PICKUP_GIANT();
 }
 
 void

Modified: head/sys/sys/epoch.h
==============================================================================
--- head/sys/sys/epoch.h	Fri Jun 28 05:11:02 2019	(r349505)
+++ head/sys/sys/epoch.h	Fri Jun 28 10:38:56 2019	(r349506)
@@ -69,6 +69,7 @@ epoch_t	epoch_alloc(int flags);
 void	epoch_free(epoch_t epoch);
 void	epoch_wait(epoch_t epoch);
 void	epoch_wait_preempt(epoch_t epoch);
+void	epoch_drain_callbacks(epoch_t epoch);
 void	epoch_call(epoch_t epoch, epoch_context_t ctx, void (*callback) (epoch_context_t));
 int	in_epoch(epoch_t epoch);
 int in_epoch_verbose(epoch_t epoch, int dump_onfail);



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190628192001.K982>