From owner-svn-src-all@FreeBSD.ORG Fri Jun 22 10:46:35 2012
Date: Fri, 22 Jun 2012 13:46:26 +0300
From: Konstantin Belousov
To: Alexander Motin
Message-ID: <20120622104626.GE2337@deviant.kiev.zoral.com.ua>
References: <201206220706.q5M76fbO062751@svn.freebsd.org> <4FE42812.3050807@FreeBSD.org> <20120622082502.GB2337@deviant.kiev.zoral.com.ua> <4FE432C4.7000608@FreeBSD.org> <20120622102342.GD2337@deviant.kiev.zoral.com.ua>
In-Reply-To: <20120622102342.GD2337@deviant.kiev.zoral.com.ua>
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r237433 - in head/sys: amd64/include arm/include conf i386/include ia64/include kern mips/include pc98/include powerpc/include sparc64/include sys x86/include x86/x86

On Fri, Jun 22, 2012 at 01:23:42PM +0300, Konstantin Belousov wrote:
> On Fri, Jun 22, 2012 at 11:54:28AM +0300, Alexander Motin wrote:
> > On 22.06.2012 11:25, Konstantin Belousov wrote:
> > >On Fri, Jun 22, 2012 at 11:08:50AM +0300, Alexander Motin wrote:
> > >>On 06/22/12 10:06, Konstantin Belousov wrote:
> > >>>Author: kib
> > >>>Date: Fri Jun 22 07:06:40 2012
> > >>>New Revision: 237433
> > >>>URL: http://svn.freebsd.org/changeset/base/237433
> > >>>
> > >>>Log:
> > >>> Implement mechanism to export some kernel timekeeping data to
> > >>> usermode, using shared page. The structures and functions have vdso
> > >>> prefix, to indicate the intended location of the code in some future.
> > >>>
> > >>> The versioned per-algorithm data is exported in the format of struct
> > >>> vdso_timehands, which mostly repeats the content of in-kernel struct
> > >>> timehands. Usermode reading of the structure can be lockless.
> > >>> Compatibility export for 32bit processes on 64bit host is also
> > >>> provided. Kernel also provides usermode with indication about
> > >>> currently used timecounter, so that libc can fall back to syscall if
> > >>> configured timecounter is unknown to usermode code.
> > >>>
> > >>> The shared data updates are initiated both from the tc_windup(), where
> > >>> a fast task is queued to do the update, and from sysctl handlers which
> > >>> change timecounter. A manual override switch
> > >>> kern.timecounter.fast_gettime allows to turn off the mechanism.
> > >>>
> > >>> Only x86 architectures export the real algorithm data, and there, only
> > >>> for tsc timecounter. HPET counters page could be exported as well, but
> > >>> I prefer to not further glue the kernel and libc ABI there until
> > >>> proper vdso-based solution is developed.
> > >>>
> > >>> Minimal stubs necessary for non-x86 architectures to still compile
> > >>> are provided.
> > >>>
> > >>> Discussed with: bde
> > >>> Reviewed by: jhb
> > >>> Tested by: flo
> > >>> MFC after: 1 month
> > >>
> > >>
> > >>>@@ -1360,6 +1367,7 @@ tc_windup(void)
> > >>> #endif
> > >>>
> > >>> 	timehands = th;
> > >>>+	taskqueue_enqueue_fast(taskqueue_fast,&tc_windup_push_vdso_task);
> > >>> }
> > >>>
> > >>> /* Report or change the active timecounter hardware. */
> > >>
> > >>This taskqueue_enqueue_fast() will schedule extra thread to run each
> > >>time hardclock() fires. One thread may be not a big problem, but
> > >>together with callout swi and possible other threads woken up there it
> > >>will wake up several other CPU cores from sleep just to put them back in
> > >>few microseconds. Now davide@ and me are trying to fix that by avoiding
> > >>callout SWI use for simple tasks. Please, let's not create another
> > >>problem same time.
> > >
> > >The patch was public for quite a time. If you have some comments about
> > >it, it would be much more productive to let me know about them before
> > >the commit, not after.
> >
> > I'm sorry, I haven't seen it. My bad.
> >
> > >Anyway, what is your proposal for 'let's not create another problem
> > >same time' part of the message ? It was discussed, as a possibility,
> > >to have permanent mapping for the shared page in the KVA and to perform
> > >lock-less update of the struct vdso_timehands directly from hardclock
> > >handler. My opinion was that amount of work added by tc_windup
> > >eventhandler is not trivial, so it is better to be postponed to
> > >less critical context. It also slightly more safe to not perform
> > >lockless update for vdso_timehands, since otherwise module load which
> > >register exec handler could cause transient gettimeofday() failure
> > >in usermode.
> > >
> > >This might boil down to the fact that tc_windup function is called
> > >too often, in fact. Also, packing execution of tc_windup eventhandler
> > >together with the clock swi is fine from my POV.
> >
> > I have nothing against using shared pages. On UP system I would probably
> > have not so much against several threads. But on SMP system it will
> > cause at least one, but in many cases two extra CPUs to be woken up.
> > There are two or more threads to run on hardclock(): this taskqueue,
> > callout swi and some thread(s) woken from callouts. Scheduler has no
> > idea how heavy they are. So it will try to move each of them to separate
> > idle CPU. Does the amount of work done in event handlers worth extra
> > Watts consumed by rapidly waking CPUs? As quite rare person running
> > FreeBSD on laptop, I am not sure. I am not sure even that on
> > desktop/server this won't kill all benefit of fast clocks by limiting
> > TurboBoost.
>
> Patch below would probably work, but I cannot test it right now on real
> hardware due to ACPI issue. It worked for me in qemu.
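For reference, the lock-less read side that the discussion above relies on
can be sketched in userland C roughly as follows. This is only an
illustration under stated assumptions: the demo_* types and demo_read() are
hypothetical, simplified stand-ins (not the real struct vdso_timekeep layout
and not the libc code), and C11 atomics stand in for the kernel's atomic(9)
primitives. A reader retries while the generation of the slot is zero or
changes under it, and reports failure when the fast path is disabled so that
the caller can fall back to the syscall.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TH_NUM 4	/* plays the role of VDSO_TH_NUM */

/* Hypothetical, cut-down slot; the real structure has more fields. */
struct demo_timehands {
	_Atomic uint32_t gen;	/* 0 means "slot is being rewritten" */
	uint64_t scale;
	uint64_t offset_count;
};

/* Hypothetical stand-in for the structure placed in the shared page. */
struct demo_timekeep {
	_Atomic uint32_t enabled;
	_Atomic uint32_t current;
	struct demo_timehands th[DEMO_TH_NUM];
};

/* Plain copy of the payload handed back to the caller. */
struct demo_snapshot {
	uint64_t scale;
	uint64_t offset_count;
};

/*
 * Copy a consistent snapshot of the current slot into *out.
 * Returns false when the fast path is disabled (or the index is bogus),
 * so that the caller can fall back to the system call.
 * Note: a strictly conforming C11 version would also make the payload
 * fields atomic; they are plain here to keep the sketch short.
 */
static bool
demo_read(struct demo_timekeep *tk, struct demo_snapshot *out)
{
	struct demo_timehands *th;
	uint32_t curr, gen;

	assert(tk != NULL && out != NULL);
	do {
		if (atomic_load_explicit(&tk->enabled,
		    memory_order_acquire) == 0)
			return (false);
		curr = atomic_load_explicit(&tk->current,
		    memory_order_acquire);
		if (curr >= DEMO_TH_NUM)
			return (false);
		th = &tk->th[curr];
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		out->scale = th->scale;
		out->offset_count = th->offset_count;
		/* Order the payload reads before the re-check below. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    curr != atomic_load_explicit(&tk->current,
	    memory_order_relaxed) ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
	return (true);
}
```

A caller would try demo_read() first and issue the regular system call only
when it returns false.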
>
> commit 4f2ffd93b36d20eae61495776fc6b0855745fd7f
> Author: Konstantin Belousov
> Date: Fri Jun 22 13:19:22 2012 +0300
>
> Use persistent kernel mapping of the shared page, and update the
> vdso_timehands from hardclock, instead of scheduling task.

Slightly improved version. Since tc_fill_vdso_timehands is now called
from hardclock context, there is no need to spin waiting for valid
current generation of timehands.

diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c
index 80502e3..9365223 100644
--- a/sys/kern/kern_exec.c
+++ b/sys/kern/kern_exec.c
@@ -1517,42 +1517,13 @@ exec_unregister(execsw_arg)
 static struct sx shared_page_alloc_sx;
 static vm_object_t shared_page_obj;
 static int shared_page_free;
-
-struct sf_buf *
-shared_page_write_start(int base)
-{
-	vm_page_t m;
-	struct sf_buf *s;
-
-	VM_OBJECT_LOCK(shared_page_obj);
-	m = vm_page_grab(shared_page_obj, OFF_TO_IDX(base), VM_ALLOC_RETRY);
-	VM_OBJECT_UNLOCK(shared_page_obj);
-	s = sf_buf_alloc(m, SFB_DEFAULT);
-	return (s);
-}
-
-void
-shared_page_write_end(struct sf_buf *sf)
-{
-	vm_page_t m;
-
-	m = sf_buf_page(sf);
-	sf_buf_free(sf);
-	VM_OBJECT_LOCK(shared_page_obj);
-	vm_page_wakeup(m);
-	VM_OBJECT_UNLOCK(shared_page_obj);
-}
+char *shared_page_mapping;
 
 void
 shared_page_write(int base, int size, const void *data)
 {
-	struct sf_buf *sf;
-	vm_offset_t sk;
 
-	sf = shared_page_write_start(base);
-	sk = sf_buf_kva(sf);
-	bcopy(data, (void *)(sk + (base & PAGE_MASK)), size);
-	shared_page_write_end(sf);
+	bcopy(data, shared_page_mapping + base, size);
 }
 
 static int
@@ -1596,6 +1567,7 @@ static void
 shared_page_init(void *dummy __unused)
 {
 	vm_page_t m;
+	vm_offset_t addr;
 
 	sx_init(&shared_page_alloc_sx, "shpsx");
 	shared_page_obj = vm_pager_allocate(OBJT_PHYS, 0, PAGE_SIZE,
@@ -1605,25 +1577,24 @@ shared_page_init(void *dummy __unused)
 	    VM_ALLOC_ZERO);
 	m->valid = VM_PAGE_BITS_ALL;
 	VM_OBJECT_UNLOCK(shared_page_obj);
+	addr = kmem_alloc_nofault(kernel_map, PAGE_SIZE);
+	pmap_qenter(addr, &m, 1);
+	shared_page_mapping = (char *)addr;
 }
 
 SYSINIT(shp, SI_SUB_EXEC, SI_ORDER_FIRST, (sysinit_cfunc_t)shared_page_init,
     NULL);
 
 static void
-timehands_update(void *arg)
+timehands_update(struct sysentvec *sv)
 {
-	struct sysentvec *sv;
-	struct sf_buf *sf;
 	struct vdso_timehands th;
 	struct vdso_timekeep *tk;
 	uint32_t enabled, idx;
 
-	sv = arg;
-	sx_xlock(&shared_page_alloc_sx);
 	enabled = tc_fill_vdso_timehands(&th);
-	sf = shared_page_write_start(sv->sv_timekeep_off);
-	tk = (void *)(sf_buf_kva(sf) + (sv->sv_timekeep_off & PAGE_MASK));
+	tk = (struct vdso_timekeep *)(shared_page_mapping +
+	    sv->sv_timekeep_off);
 	idx = sv->sv_timekeep_curr;
 	atomic_store_rel_32(&tk->tk_th[idx].th_gen, 0);
 	if (++idx >= VDSO_TH_NUM)
@@ -1637,25 +1608,19 @@ timehands_update(void *arg)
 	tk->tk_enabled = enabled;
 	atomic_store_rel_32(&tk->tk_th[idx].th_gen, sv->sv_timekeep_gen);
 	tk->tk_current = idx;
-	shared_page_write_end(sf);
-	sx_xunlock(&shared_page_alloc_sx);
 }
 
 #ifdef COMPAT_FREEBSD32
 static void
-timehands_update32(void *arg)
+timehands_update32(struct sysentvec *sv)
 {
-	struct sysentvec *sv;
-	struct sf_buf *sf;
 	struct vdso_timekeep32 *tk;
 	struct vdso_timehands32 th;
 	uint32_t enabled, idx;
 
-	sv = arg;
-	sx_xlock(&shared_page_alloc_sx);
 	enabled = tc_fill_vdso_timehands32(&th);
-	sf = shared_page_write_start(sv->sv_timekeep_off);
-	tk = (void *)(sf_buf_kva(sf) + (sv->sv_timekeep_off & PAGE_MASK));
+	tk = (struct vdso_timekeep32 *)(shared_page_mapping +
+	    sv->sv_timekeep_off);
 	idx = sv->sv_timekeep_curr;
 	atomic_store_rel_32(&tk->tk_th[idx].th_gen, 0);
 	if (++idx >= VDSO_TH_NUM)
@@ -1669,11 +1634,32 @@ timehands_update32(void *arg)
 	tk->tk_enabled = enabled;
 	atomic_store_rel_32(&tk->tk_th[idx].th_gen, sv->sv_timekeep_gen);
 	tk->tk_current = idx;
-	shared_page_write_end(sf);
-	sx_xunlock(&shared_page_alloc_sx);
 }
 #endif
 
+/*
+ * This is hackish, but easiest way to avoid creating list structures
+ * that needs to be iterated over from the hardclock interrupt
+ * context.
+ */
+static struct sysentvec *host_sysentvec;
+#ifdef COMPAT_FREEBSD32
+static struct sysentvec *compat32_sysentvec;
+#endif
+
+void
+timekeep_push_vdso(void)
+{
+
+	if (host_sysentvec != NULL && host_sysentvec->sv_timekeep_base != 0)
+		timehands_update(host_sysentvec);
+#ifdef COMPAT_FREEBSD32
+	if (compat32_sysentvec != NULL &&
+	    compat32_sysentvec->sv_timekeep_base != 0)
+		timehands_update32(compat32_sysentvec);
+#endif
+}
+
 void
 exec_sysvec_init(void *param)
 {
@@ -1688,29 +1674,32 @@ exec_sysvec_init(void *param)
 	sv->sv_shared_page_obj = shared_page_obj;
 	sv->sv_sigcode_base = sv->sv_shared_page_base +
 	    shared_page_fill(*(sv->sv_szsigcode), 16, sv->sv_sigcode);
+	if ((sv->sv_flags & SV_ABI_MASK) != SV_ABI_FREEBSD)
+		return;
 	tk_ver = VDSO_TK_VER_CURR;
 #ifdef COMPAT_FREEBSD32
 	if ((sv->sv_flags & SV_ILP32) != 0) {
 		tk_base = shared_page_alloc(sizeof(struct vdso_timekeep32) +
 		    sizeof(struct vdso_timehands32) * VDSO_TH_NUM, 16);
 		KASSERT(tk_base != -1, ("tk_base -1 for 32bit"));
-		EVENTHANDLER_REGISTER(tc_windup, timehands_update32, sv,
-		    EVENTHANDLER_PRI_ANY);
 		shared_page_write(tk_base + offsetof(struct vdso_timekeep32,
 		    tk_ver), sizeof(uint32_t), &tk_ver);
+		KASSERT(compat32_sysentvec == 0,
+		    ("Native compat32 already registered"));
+		compat32_sysentvec = sv;
 	} else {
 #endif
 		tk_base = shared_page_alloc(sizeof(struct vdso_timekeep) +
 		    sizeof(struct vdso_timehands) * VDSO_TH_NUM, 16);
 		KASSERT(tk_base != -1, ("tk_base -1 for native"));
-		EVENTHANDLER_REGISTER(tc_windup, timehands_update, sv,
-		    EVENTHANDLER_PRI_ANY);
 		shared_page_write(tk_base + offsetof(struct vdso_timekeep,
 		    tk_ver), sizeof(uint32_t), &tk_ver);
+		KASSERT(host_sysentvec == 0, ("Native already registered"));
+		host_sysentvec = sv;
 #ifdef COMPAT_FREEBSD32
 	}
 #endif
 	sv->sv_timekeep_base = sv->sv_shared_page_base + tk_base;
 	sv->sv_timekeep_off = tk_base;
-	EVENTHANDLER_INVOKE(tc_windup);
+	timekeep_push_vdso();
 }
diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 0b8fefe..4a75af5 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -31,7 +31,6 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -121,12 +120,8 @@ SYSCTL_INT(_kern_timecounter, OID_AUTO, stepwarnings, CTLFLAG_RW,
     &timestepwarnings, 0, "Log time steps");
 
 static void tc_windup(void);
-static void tc_windup_push_vdso(void *ctx, int pending);
 static void cpu_tick_calibrate(int);
 
-static struct task tc_windup_push_vdso_task = TASK_INITIALIZER(0,
-    tc_windup_push_vdso, 0);
-
 static int
 sysctl_kern_boottime(SYSCTL_HANDLER_ARGS)
 {
@@ -1367,7 +1362,7 @@ tc_windup(void)
 #endif
 
 	timehands = th;
-	taskqueue_enqueue_fast(taskqueue_fast, &tc_windup_push_vdso_task);
+	timekeep_push_vdso();
 }
 
 /* Report or change the active timecounter hardware. */
@@ -1394,7 +1389,7 @@ sysctl_kern_timecounter_hardware(SYSCTL_HANDLER_ARGS)
 		(void)newtc->tc_get_timecount(newtc);
 
 		timecounter = newtc;
-		EVENTHANDLER_INVOKE(tc_windup);
+		timekeep_push_vdso();
 		return (0);
 	}
 	return (EINVAL);
@@ -1865,7 +1860,7 @@ sysctl_fast_gettime(SYSCTL_HANDLER_ARGS)
 	if (error != 0)
 		return (error);
 	vdso_th_enable = old_vdso_th_enable;
-	EVENTHANDLER_INVOKE(tc_windup);
+	timekeep_push_vdso();
 	return (0);
 }
 SYSCTL_PROC(_kern_timecounter, OID_AUTO, fast_gettime,
@@ -1877,19 +1872,15 @@ tc_fill_vdso_timehands(struct vdso_timehands *vdso_th)
 {
 	struct timehands *th;
 	uint32_t enabled;
-	int gen;
 
-	do {
-		th = timehands;
-		gen = th->th_generation;
-		vdso_th->th_algo = VDSO_TH_ALGO_1;
-		vdso_th->th_scale = th->th_scale;
-		vdso_th->th_offset_count = th->th_offset_count;
-		vdso_th->th_counter_mask = th->th_counter->tc_counter_mask;
-		vdso_th->th_offset = th->th_offset;
-		vdso_th->th_boottime = boottimebin;
-		enabled = cpu_fill_vdso_timehands(vdso_th);
-	} while (gen == 0 || timehands->th_generation != gen);
+	th = timehands;
+	vdso_th->th_algo = VDSO_TH_ALGO_1;
+	vdso_th->th_scale = th->th_scale;
+	vdso_th->th_offset_count = th->th_offset_count;
+	vdso_th->th_counter_mask = th->th_counter->tc_counter_mask;
+	vdso_th->th_offset = th->th_offset;
+	vdso_th->th_boottime = boottimebin;
+	enabled = cpu_fill_vdso_timehands(vdso_th);
 	if (!vdso_th_enable)
 		enabled = 0;
 	return (enabled);
@@ -1901,30 +1892,19 @@ tc_fill_vdso_timehands32(struct vdso_timehands32 *vdso_th32)
 {
 	struct timehands *th;
 	uint32_t enabled;
-	int gen;
 
-	do {
-		th = timehands;
-		gen = th->th_generation;
-		vdso_th32->th_algo = VDSO_TH_ALGO_1;
-		*(uint64_t *)&vdso_th32->th_scale[0] = th->th_scale;
-		vdso_th32->th_offset_count = th->th_offset_count;
-		vdso_th32->th_counter_mask = th->th_counter->tc_counter_mask;
-		vdso_th32->th_offset.sec = th->th_offset.sec;
-		*(uint64_t *)&vdso_th32->th_offset.frac[0] = th->th_offset.frac;
-		vdso_th32->th_boottime.sec = boottimebin.sec;
-		*(uint64_t *)&vdso_th32->th_boottime.frac[0] = boottimebin.frac;
-		enabled = cpu_fill_vdso_timehands32(vdso_th32);
-	} while (gen == 0 || timehands->th_generation != gen);
+	th = timehands;
+	vdso_th32->th_algo = VDSO_TH_ALGO_1;
+	*(uint64_t *)&vdso_th32->th_scale[0] = th->th_scale;
+	vdso_th32->th_offset_count = th->th_offset_count;
+	vdso_th32->th_counter_mask = th->th_counter->tc_counter_mask;
+	vdso_th32->th_offset.sec = th->th_offset.sec;
+	*(uint64_t *)&vdso_th32->th_offset.frac[0] = th->th_offset.frac;
+	vdso_th32->th_boottime.sec = boottimebin.sec;
+	*(uint64_t *)&vdso_th32->th_boottime.frac[0] = boottimebin.frac;
+	enabled = cpu_fill_vdso_timehands32(vdso_th32);
 	if (!vdso_th_enable)
 		enabled = 0;
 	return (enabled);
 }
 #endif
-
-static void
-tc_windup_push_vdso(void *ctx, int pending)
-{
-
-	EVENTHANDLER_INVOKE(tc_windup);
-}
diff --git a/sys/sys/sysent.h b/sys/sys/sysent.h
index 22769c2..6de72d9 100644
--- a/sys/sys/sysent.h
+++ b/sys/sys/sysent.h
@@ -265,8 +265,6 @@ int shared_page_alloc(int size, int align);
 int shared_page_fill(int size, int align, const void *data);
 void shared_page_write(int base, int size, const void *data);
 void exec_sysvec_init(void *param);
-struct sf_buf *shared_page_write_start(int base);
-void shared_page_write_end(struct sf_buf *sf);
 
 #define INIT_SYSENTVEC(name, sv)					\
     SYSINIT(name, SI_SUB_EXEC, SI_ORDER_ANY,				\
diff --git a/sys/sys/vdso.h b/sys/sys/vdso.h
index 9f3f3af..653a606 100644
--- a/sys/sys/vdso.h
+++ b/sys/sys/vdso.h
@@ -29,7 +29,6 @@
 #define _SYS_VDSO_H
 
 #include
-#include
 #include
 
 struct vdso_timehands {
@@ -74,6 +73,8 @@ u_int __vdso_gettc(const struct vdso_timehands *vdso_th);
 
 #ifdef _KERNEL
 
+void timekeep_push_vdso(void);
+
 uint32_t tc_fill_vdso_timehands(struct vdso_timehands *vdso_th);
 
 /*
@@ -86,9 +87,6 @@ uint32_t tc_fill_vdso_timehands(struct vdso_timehands *vdso_th);
  */
 uint32_t cpu_fill_vdso_timehands(struct vdso_timehands *vdso_th);
 
-typedef void (*tc_windup_fn)(void *);
-EVENTHANDLER_DECLARE(tc_windup, tc_windup_fn);
-
 #define VDSO_TH_NUM 4
 
 #ifdef COMPAT_FREEBSD32
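
The publish sequence that timehands_update() performs in the patch above can
be modelled in a self-contained way as follows. It is a sketch, not the
kernel code: the pub_* names are hypothetical, C11 atomics replace
atomic_store_rel_32(), and the steps that fall between the diff hunks
(wrapping the index and skipping generation 0) are an assumption about how
the elided lines behave.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define PUB_SLOT_NUM 4	/* plays the role of VDSO_TH_NUM */

/* Hypothetical, cut-down payload; the real structure has more fields. */
struct pub_slot {
	_Atomic uint32_t gen;	/* 0 marks the slot as not valid */
	uint64_t scale;
	uint64_t offset_count;
};

/* Hypothetical stand-in for the structure placed in the shared page. */
struct pub_page {
	uint32_t enabled;
	_Atomic uint32_t current;
	struct pub_slot slot[PUB_SLOT_NUM];
};

/* Writer-private bookkeeping, like sv_timekeep_curr and sv_timekeep_gen. */
struct pub_state {
	uint32_t curr;
	uint32_t gen;
};

static void
pub_update(struct pub_page *pg, struct pub_state *st, uint64_t scale,
    uint64_t offset_count, uint32_t enabled)
{
	uint32_t idx;

	assert(st->curr < PUB_SLOT_NUM);
	idx = st->curr;
	/* Invalidate the slot that readers may be using right now. */
	atomic_store_explicit(&pg->slot[idx].gen, 0, memory_order_release);
	/* Assumed from the hunk context: wrap the slot index. */
	if (++idx >= PUB_SLOT_NUM)
		idx = 0;
	st->curr = idx;
	/* Assumed: generation 0 is reserved, so skip it on overflow. */
	if (++st->gen == 0)
		st->gen = 1;
	/* Fill the new slot while its generation is still 0. */
	atomic_store_explicit(&pg->slot[idx].gen, 0, memory_order_relaxed);
	pg->slot[idx].scale = scale;
	pg->slot[idx].offset_count = offset_count;
	pg->enabled = enabled;
	/* Publish the data first, then the generation, then the index. */
	atomic_store_explicit(&pg->slot[idx].gen, st->gen,
	    memory_order_release);
	atomic_store_explicit(&pg->current, idx, memory_order_release);
}
```

Because only one context ever calls the writer, no lock is needed; readers
detect a concurrent update through the zero or changed generation and retry.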