Date:      Wed, 25 Jun 2008 08:26:11 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        FreeBSD-gnats-submit@FreeBSD.org
Subject:   kern/124963: old pagezero fixes for alc
Message-ID:  <20080625082417.C25463@besplex.bde.org>
Resent-Message-ID: <200806242230.m5OMU5AK076872@freefall.freebsd.org>

>Number:         124963
>Category:       kern
>Synopsis:       old pagezero fixes for alc
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jun 24 22:30:05 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Bruce Evans
>Release:        FreeBSD 5.2-CURRENT i386
>Organization:
>Environment:
System: FreeBSD besplex.bde.org 5.2-CURRENT FreeBSD 5.2-CURRENT #4295: Tue Mar 4 09:02:36 EST 2008 bde@besplex.bde.org:/c/obj/usr/src/sys/compile/BESPLEX i386

>Description:

alc asked me to submit these patches for the (currently disabled)
pagezero thread as a PR.

I'm getting tired of maintaining these patches -- there was a patch
conflict in them today.  Please commit some of them.  They are hard to
test since idlezero (sic) is disabled.  Disabling idlezero still loses
a few percent of performance relative to old versions of FreeBSD that
had idlezero and no vm_phys, in cases where there is enough idle time
to actually run idlezero (e.g., for a kernel build starting with
sources not cached).  In other cases, I don't see much difference for
either idlezero or vm_phys.

>How-To-Repeat:

>Fix:

% Index: vm_phys.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/vm/vm_phys.c,v
% retrieving revision 1.9
% diff -u -2 -r1.9 vm_phys.c
% --- vm_phys.c	6 Apr 2008 18:09:28 -0000	1.9
% +++ vm_phys.c	19 Apr 2008 23:09:00 -0000
% @@ -41,6 +41,8 @@
%  #include <sys/malloc.h>
%  #include <sys/mutex.h>
% +#include <sys/proc.h>
%  #include <sys/queue.h>
%  #include <sys/sbuf.h>
% +#include <sys/sched.h>
%  #include <sys/sysctl.h>
%  #include <sys/vmmeter.h>
% @@ -552,7 +554,18 @@
%  					cnt.v_free_count--;
%  					mtx_unlock(&vm_page_queue_free_mtx);
% +#ifndef PREEMPTION_AND_PREEMPTION_WORKS
% +					if (sched_runnable()) {
% +						thread_lock(curthread);
% +						critical_exit();
% +						mi_switch(SW_VOL | SWT_IDLE,
% +						    NULL);
% +						thread_unlock(curthread);
% +					} else
% +#endif
% +						critical_exit();
%  					pmap_zero_page_idle(m_tmp);
%  					m_tmp->flags |= PG_ZERO;
%  					mtx_lock(&vm_page_queue_free_mtx);
% +					critical_enter();
%  					cnt.v_free_count++;
%  					vm_phys_free_pages(m_tmp, 0);

Move stuff here, fix bugs in it, and always configure voluntary
preemption (a consolidated sketch of the patched code follows this list):
- preemption should be here since it should be considered for every page.
   It must be here since preemption while holding the mutex cannot be good,
   and the caller shouldn't drop the mutex.
- involuntary preemption while holding the mutex cannot be good either, so
   the caller prevents it using critical_enter() (patch originally by jhb).
- it may be better to keep critical_enter() in effect until after zeroing
   the page.
- I don't know why I put the voluntary preemption before the page zeroing
   instead of after.  Perhaps just to minimise the ugliness in the else
   clause.
- always configure voluntary preemption, since involuntary preemption had
   never worked at the time when these patches were written, at least for
   SMP with SCHED_4BSD and no IPI_PREEMPTION.  In at least this case,
   one CPU tended to be left running pagezero for too long, since nothing
   sent it an IPI to tell it to stop, and this was very bad for performance,
   apparently because the CPU spent its time thrashing caches when it should
   have been doing useful work.
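
For reference, here is roughly how the patched block in vm_phys.c ends
up reading once the hunk above is applied (an illustrative
reconstruction with comments added, not the literal patched file; the
#ifndef PREEMPTION_AND_PREEMPTION_WORKS guard is omitted since that
macro is never defined, i.e. the voluntary-preemption path is always
compiled in, as argued above):

	cnt.v_free_count--;
	mtx_unlock(&vm_page_queue_free_mtx);
	/*
	 * Leave the critical section before the (possibly long) page
	 * zeroing.  If anything else is runnable, also switch away
	 * voluntarily, so this CPU is not left running pagezero when
	 * involuntary preemption cannot be trusted to take it back.
	 */
	if (sched_runnable()) {
		thread_lock(curthread);
		critical_exit();
		mi_switch(SW_VOL | SWT_IDLE, NULL);
		thread_unlock(curthread);
	} else
		critical_exit();
	pmap_zero_page_idle(m_tmp);
	m_tmp->flags |= PG_ZERO;
	mtx_lock(&vm_page_queue_free_mtx);
	/* Re-enter the critical section whenever the mutex is retaken. */
	critical_enter();
	cnt.v_free_count++;
	vm_phys_free_pages(m_tmp, 0);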

% Index: vm_zeroidle.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/vm/vm_zeroidle.c,v
% retrieving revision 1.52
% diff -u -2 -r1.52 vm_zeroidle.c
% --- vm_zeroidle.c	17 Apr 2008 04:20:10 -0000	1.52
% +++ vm_zeroidle.c	19 Apr 2008 23:04:56 -0000
% @@ -122,18 +122,14 @@
%  	mtx_lock(&vm_page_queue_free_mtx);
% +	critical_enter();

See above.

%  	for (;;) {
%  		if (vm_page_zero_check()) {
%  			vm_page_zero_idle();
% -#ifndef PREEMPTION
% -			if (sched_runnable()) {
% -				thread_lock(curthread);
% -				mi_switch(SW_VOL | SWT_IDLE, NULL);
% -				thread_unlock(curthread);
% -			}
% -#endif

This code was moved into vm_phys.c (see the hunk above) after fixing it.

%  		} else {
%  			wakeup_needed = TRUE;
% +			critical_exit();
%  			msleep(&zero_state, &vm_page_queue_free_mtx, 0,
%  			    "pgzero", hz * 300);
% +			critical_enter();
%  		}
%  	}

Just always exit the critical section when releasing the mutex.  I think
we would prefer to enter the critical section before acquiring the mutex
and exit it after releasing the mutex (which we half do in
vm_page_zero_idle()), but msleep() doesn't support that.  Races from this
of course only cause extra context switches, and I haven't noticed any
cases where even the extra context switches prevented by critical_enter()
make a difference.
Another thing I tried here that didn't seem to make much difference was
to run pagezero at a high priority except transiently when it gives up
control.  critical_enter() does essentially the same thing but more
hackishly.
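
For reference, the main loop of the pagezero thread in vm_zeroidle.c
ends up shaped roughly like this after the hunk above (an illustrative
reconstruction with comments added, not the literal patched file):

	mtx_lock(&vm_page_queue_free_mtx);
	/* Stay in a critical section whenever the free-queue mutex is held. */
	critical_enter();
	for (;;) {
		if (vm_page_zero_check()) {
			/*
			 * Drops and retakes both the mutex and the
			 * critical section around each page that it
			 * zeroes (see the vm_phys.c hunk above).
			 */
			vm_page_zero_idle();
		} else {
			wakeup_needed = TRUE;
			/*
			 * msleep() drops and reacquires the mutex
			 * itself, but it cannot be called from inside
			 * a critical section, so the critical section
			 * has to be released first and re-entered
			 * afterwards.  The window this opens only
			 * costs extra context switches.
			 */
			critical_exit();
			msleep(&zero_state, &vm_page_queue_free_mtx, 0,
			    "pgzero", hz * 300);
			critical_enter();
		}
	}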

>Release-Note:
>Audit-Trail:
>Unformatted:


