Date:      Fri, 1 Jun 2012 19:53:15 +0200
From:      Giovanni Trematerra <giovanni.trematerra@gmail.com>
To:        freebsd-arch@freebsd.org
Cc:        Attilio Rao <attilio@freebsd.org>, alc@freebsd.org, Konstantin Belousov <kib@freebsd.org>, Alexander Kabaev <kan@freebsd.org>
Subject:   [RFC] Kernel shared variables
Message-ID:  <CACfq090r1tWhuDkxdSZ24fwafbVKU0yduu1yV2%2BoYo%2BwwT4ipA@mail.gmail.com>

Hello,
I'd like to discuss a mechanism for sharing some read-only data between
the kernel and user space programs. It avoids syscall overhead by
implementing some syscalls, such as gettimeofday(3) and time(3), as
ordinary user space routines.

The patch at
http://www.trematerra.net/patches/ksvar_experimental.patch

is in a very experimental stage; it's just a proof of concept.
It only works with an AMD64 kernel and only for 64-bit applications.
The idea is to place all the variables that we want to share between the
kernel and user space in one or more consecutive pages of memory that
will be mapped read-only into every running process. At the start of the
first shared page there is a table with one entry per shared variable.
Each entry is a 32-bit value holding the offset from the start of the
shared page to the start of the variable. User space processes need to
find out the address at which the shared page is mapped and use the table
to access the shared variables.

The kernel exports a variable to user space as an index, so user space
code must refer to a specific index to access a kernel shared variable.
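
To make the layout concrete, here is a minimal user space sketch of the
index-to-pointer lookup. It is not taken from the patch: fake_page,
FAKE_PAGE_SIZE, FAKE_TABLE_SIZE, ksvar_lookup and fake_page_setup are
hypothetical names, and an ordinary static buffer stands in for the page
the kernel would map.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FAKE_PAGE_SIZE  4096	/* assumption: a single 4 KB shared page */
#define FAKE_TABLE_SIZE 2	/* assumption: two shared variables */

/* Stand-in for the page the kernel would map read-only into the process. */
static unsigned char fake_page[FAKE_PAGE_SIZE];

/*
 * Turn a shared-variable index into a pointer: fetch the 32-bit offset
 * stored in the table at the start of the page and add it to the base.
 */
static const void *
ksvar_lookup(const unsigned char *base, uint32_t index)
{
	uint32_t offset;

	assert(index < FAKE_TABLE_SIZE);
	memcpy(&offset, base + index * sizeof(offset), sizeof(offset));
	return (base + offset);
}

/* Play the kernel's role: lay out two variables and record their offsets. */
static void
fake_page_setup(int32_t v0, int64_t v1)
{
	uint32_t off0 = FAKE_TABLE_SIZE * sizeof(uint32_t); /* after the table */
	uint32_t off1 = off0 + 8;			    /* next 8-byte slot */

	memcpy(fake_page, &off0, sizeof(off0));
	memcpy(fake_page + sizeof(off0), &off1, sizeof(off1));
	memcpy(fake_page + off0, &v0, sizeof(v0));
	memcpy(fake_page + off1, &v1, sizeof(v1));
}

/* Read both variables back through the table, as user space would. */
static void
ksvar_lookup_demo(void)
{
	int32_t v0;
	int64_t v1;

	fake_page_setup(42, 1234567890123LL);
	memcpy(&v0, ksvar_lookup(fake_page, 0), sizeof(v0));
	memcpy(&v1, ksvar_lookup(fake_page, 1), sizeof(v1));
	assert(v0 == 42);
	assert(v1 == 1234567890123LL);
}
```

Since user space only ever sees indexes and offsets, the kernel remains
free to move variables around inside the page as long as the table is
kept in sync.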
Let's take a quick look at the KPI/API for exporting/importing kernel
shared variables.
Say we want to export an int from the kernel.
To define the variable to be exported inside the kernel you would use

KSVAR_DEFINE(0, int, test_value);

You have just defined an int variable named "test_value" at index 0.
Inside the kernel you can read and write it as usual through the symbol
test_value. Now you likely want to add to libc a function, callable from
user processes, that returns the value of test_value. First of all you
need to import the variable.

KSVAR_IMPORT(0, int, test_value);

and to obtain a pointer to read the value you would use

KSVAR(test_value);

so your function would look something like this

int get_test_value()
{

     return (*KSVAR(test_value));
}

Then inside your process just call get_test_value() as you usually would
and you'll get a value written by the kernel without switching to kernel
mode.
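
The user space side of KSVAR_IMPORT and KSVAR is not shown in this mail,
so the following is only one way the two macros could be written, as a
sketch under stated assumptions. The macro bodies, the ksvar_table
global as used here, and the fake_page stand-in are my illustration, not
the definitions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed libc global: start of the mapped table, NULL if not exported. */
static const uint32_t *ksvar_table;

#define	KSVAR_IS_ACTIVE()	(ksvar_table != NULL)

/* Hypothetical: record index and type of an imported variable by name. */
#define	KSVAR_IMPORT(index, type, name)					\
	enum { name##_ksvar_index = (index) };				\
	typedef type name##_ksvar_type

/* Hypothetical: page base plus the offset found in the table. */
#define	KSVAR(name)							\
	((const name##_ksvar_type *)((const char *)ksvar_table +	\
	    ksvar_table[name##_ksvar_index]))

KSVAR_IMPORT(0, int, test_value);

static int
get_test_value(void)
{

	return (*KSVAR(test_value));
}

/* Fake shared page: a one-entry table followed by the variable itself. */
struct fake_page {
	uint32_t table[1];
	int	 test_value;
};
static struct fake_page fake_page = {
	.table = { offsetof(struct fake_page, test_value) },
	.test_value = 7
};

static void
ksvar_macro_demo(void)
{

	/* In libc this address would come from the kernel, not from us. */
	ksvar_table = (const uint32_t *)&fake_page;
	assert(KSVAR_IS_ACTIVE());
	assert(get_test_value() == 7);
	fake_page.test_value = 99;	/* the "kernel" updates the value */
	assert(get_test_value() == 99);
}
```

With macros of this shape the import is a pure compile-time declaration,
and each KSVAR() use costs one table load plus one pointer addition.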

Let's now see in more detail how that can be accomplished.
The shared variables are accessed as normal variables and are read/write
inside the kernel. The variables need to be in the same page(s), and
nothing but the shared variables (and the table) may be in those page(s).
To obtain that I changed the linker script in this way

--- a/sys/conf/ldscript.amd64
+++ b/sys/conf/ldscript.amd64
@@ -177,6 +177,15 @@ SECTIONS
    *(.ldata .ldata.* .gnu.linkonce.l.*)
    . = ALIGN(. != 0 ? 64 / 8 : 1);
  }
+  .ksvar ALIGN(CONSTANT (COMMONPAGESIZE)) :
+  {
+    __ksvar_set_start = .;
+    *(.ksvar_table)
+    *(.ksvar)
+
+   . = ALIGN(CONSTANT (COMMONPAGESIZE));
+   __ksvar_set_stop = .;
+  }
  . = ALIGN(64 / 8);
  _end = .; PROVIDE (end = .);
  . = DATA_SEGMENT_END (.);

When we want to define a variable in the kernel to share with user space
we have to use the KSVAR_DEFINE macro in sys/sys/ksvar.h

+struct ksvar_set {
+       uint32_t idx;
+       char *pksvar;
+};
+
+/*
+ * Declare a variable into kernel shared linker_set.
+ */
+#define        KSVAR_DEFINE(index, type, name) \
+       static type name __section(".ksvar");                   \
+       static struct ksvar_set name ## _ksvar_set = {          \
+               .idx = index,                                   \
+               .pksvar = (char *) &name                        \
+       };                                                      \
+       DATA_SET(ksvar_set, name ## _ksvar_set)

Every variable must have a unique index. The indexes must start from
zero and be consecutive. When you add an index you must bump the size of
the table (KSVAR_TABLE_SIZE, see sys/sys/ksvar.h).
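
The index rules can be checked at the point where the table is filled.
The following user space sketch shows one way to do that; it is not the
ksvarinit code from the patch. ksvar_fill_table, the fake_page struct
and the hand-built arrays standing in for the linker set are
hypothetical, and KSVAR_TABLE_SIZE is assumed to be 2 here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	KSVAR_TABLE_SIZE	2	/* assumption for this sketch */

struct ksvar_set {
	uint32_t idx;
	char *pksvar;
};

/*
 * Fill table[] with the offset of each variable from the page start.
 * Returns 0 on success, -1 if the entry count does not match the table
 * size or an index is out of range or used twice.  Together these
 * checks imply the indexes are 0..KSVAR_TABLE_SIZE-1 with no gaps.
 */
static int
ksvar_fill_table(uint32_t *table, char *page_start,
    struct ksvar_set *const *set, size_t nset)
{
	uint8_t seen[KSVAR_TABLE_SIZE] = { 0 };
	size_t i;

	if (nset != KSVAR_TABLE_SIZE)
		return (-1);
	for (i = 0; i < nset; i++) {
		uint32_t idx = set[i]->idx;

		if (idx >= KSVAR_TABLE_SIZE || seen[idx])
			return (-1);
		seen[idx] = 1;
		table[idx] = (uint32_t)(set[i]->pksvar - page_start);
	}
	return (0);
}

/* Fake shared page and a hand-built stand-in for the linker set. */
static struct fake_page {
	uint32_t table[KSVAR_TABLE_SIZE];
	int	 a;
	long	 b;
} page;
static struct ksvar_set set_a = { 0, (char *)&page.a };
static struct ksvar_set set_b = { 1, (char *)&page.b };
static struct ksvar_set *const good_set[] = { &set_b, &set_a };
static struct ksvar_set *const dup_set[] = { &set_a, &set_a };

static void
ksvar_fill_demo(void)
{

	/* Entries may arrive in any order, as in a linker set. */
	assert(ksvar_fill_table(page.table, (char *)&page, good_set, 2) == 0);
	assert(page.table[0] == offsetof(struct fake_page, a));
	assert(page.table[1] == offsetof(struct fake_page, b));
	/* Index 0 used twice (and index 1 missing) is rejected. */
	assert(ksvar_fill_table(page.table, (char *)&page, dup_set, 2) == -1);
}
```

A check of this kind would catch a forgotten KSVAR_TABLE_SIZE bump at
boot instead of leaving a stale or zero offset in the table.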

The variables live inside the static kernel image, which isn't managed
by the VM, so we need to allocate pages to map their physical addresses.
A new SYSINIT (ksvarinit) allocates a set of vm_page_t through the
vm_phys_fictitious_reg_range interface and fills the table using the
information in the ksvar_set linker set. It then creates a vm_object_t
(vm_object_ksvar), marks the fake pages as valid and puts them into it.

When a new process is created by exec(3), vm_object_ksvar is mapped
read-only into the process address space by the vm_map_fixed routine
just before the user stack is mapped. The address of the mapping is
recorded in the new p_ksvar field of struct proc. This field is exported
to user space processes through a sysctl.

In order to implement syscalls as user space routines, we have to find
out the mapped address of the kernel shared variables when libc is
mapped into the process. So I added a function marked with the
constructor attribute. It will be called before any code in the user
process and before any code inside libc.

+__attribute((constructor)) void init_kernel_shared()
+{
+       int mib[2];
+       size_t len;
+       vm_offset_t ksvar_address;
+
+       mib[0] = CTL_KERN;
+       mib[1] = KERN_KSVAR;
+       len = sizeof(vm_offset_t);
+       if (__sysctl(mib, 2, (void *) &ksvar_address, &len, NULL, 0) != -1)
+               ksvar_table = (uint32_t *) ksvar_address;
+}

Once libc knows the address of the table, it can access the shared
variables.

Just as a proof of concept I re-implemented gettimeofday(3) in user space.
First of all, I didn't remove the entry from syscalls.master; I just
renamed it to sys_gettimeofday. I need it for the fallback path.
In the kernel I introduced a struct wall_clock.

+struct wall_clock
+{
+       struct timeval  tv;
+       struct timezone tz;
+};

The struct is exported through the sys/sys/time.h header.
I defined a new kernel shared variable. To do so I added an index,
WALL_CLOCK_INDEX, in sys/sys/ksvar.h and bumped KSVAR_TABLE_SIZE to 1.
In sys/kern/kern_clocksource.c

+/* kernel shared variable for implementing gettimeofday. */
+KSVAR_DEFINE(WALL_CLOCK_INDEX, struct wall_clock, wall_clock);

Now we have defined a shared variable at index WALL_CLOCK_INDEX, of type
struct wall_clock and named wall_clock.
Inside handleevents I update the information exported by wall_clock.

+       struct timeval tv;
+
+       /* update time for userspace gettimeofday */
+       microtime(&tv);
+       wall_clock.tv = tv;
+       wall_clock.tz.tz_minuteswest = tz_minuteswest;
+       wall_clock.tz.tz_dsttime = tz_dsttime;

Now, in libc we import the shared variable

+KSVAR_IMPORT(WALL_CLOCK_INDEX, struct wall_clock, wall_clock);

Note that WALL_CLOCK_INDEX must be the same as the one defined inside
the kernel. Then we define a new function gettimeofday

+int
+gettimeofday(struct timeval *tp, struct timezone *tzp)
+{
+
+       /* fallback to syscall if kernel doesn't export ksvar */
+       if (!KSVAR_IS_ACTIVE())
+               return (sys_gettimeofday(tp, tzp));
+
+       if (tp != NULL)
+               *tp = KSVAR(wall_clock)->tv;
+       if (tzp != NULL)
+               *tzp = KSVAR(wall_clock)->tz;
+       return (0);
+}

Now when a process calls gettimeofday, it actually calls that function.
If the process makes a lot of calls to gettimeofday, we should see a
performance boost.
Note that if ksvars are not exported by the kernel (KSVAR_IS_ACTIVE() is
false), the function falls back to calling the actual syscall
(sys_gettimeofday).
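
The same pattern would carry over to time(3), mentioned at the top of
this mail. Here is a rough sketch of it, which is not part of the patch.
ksvar_time and shared_wall_clock are hypothetical names, struct
wall_clock is cut down to its tv member, and the fallback simply calls
the normal time() where real code would use the renamed syscall.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Cut-down copy of the shared struct; the tz member is left out here. */
struct wall_clock {
	struct timeval tv;
};

/* Hypothetical: the imported variable, or NULL if not exported. */
static const struct wall_clock *shared_wall_clock;

static time_t
ksvar_time(time_t *tloc)
{
	time_t now;

	if (shared_wall_clock == NULL)
		now = time(NULL);	/* fallback path */
	else
		now = shared_wall_clock->tv.tv_sec;
	if (tloc != NULL)
		*tloc = now;
	return (now);
}

static void
ksvar_time_demo(void)
{
	/* Arbitrary example timestamp standing in for a kernel update. */
	static struct wall_clock fake = {
		.tv = { .tv_sec = 1338573195, .tv_usec = 0 }
	};
	time_t t;

	shared_wall_clock = &fake;
	assert(ksvar_time(&t) == 1338573195);
	assert(t == 1338573195);

	shared_wall_clock = NULL;	/* no shared page: use the fallback */
	assert(ksvar_time(NULL) != (time_t)-1);
}
```

Keep in mind that a value read this way is only as fresh as the last
update done in handleevents, which matters less for one-second
resolution than it does for gettimeofday.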

Open tasks
- implement support for 32-bit emulated processes running in a 64-bit
environment.
- extend support to other architectures
- implement more syscalls
- benchmarks
- Test, test, test.

I'm looking forward to hearing your comments and suggestions.

--
Gianni


