Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 14 Aug 1998 02:27:05 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        dg@root.com
Cc:        mike@smith.net.au, peter@sirius.com, mrcpu@internetcds.com, hackers@FreeBSD.ORG, stable@FreeBSD.ORG
Subject:   Re: vmopar state in 2.2.7?
Message-ID:  <199808140227.TAA25476@usr02.primenet.com>
In-Reply-To: <199808140023.RAA17790@implode.root.com> from "David Greenman" at Aug 13, 98 05:23:38 pm

next in thread | previous in thread | raw e-mail | index | archive | help
> >> Try changing lines 1226 and 1261 to something like:
> >> 	tsleep(p, PVM, "vmopar", 5 * hz);
> >...
> >> This function would return "EWOULDBLOCK" after the time-out expires then, 
> >> no clue what that will do to your system or apps ;) -- I would expect the
> >> blocked process to go away within 5 seconds...
> >
> >I dont' have 2.2 sources to hand, and the above is now just a call to 
> >vm_page_sleep, but if the timeout expires, the entire operation is 
> >retried, so it should be harmless (although it is masking a legitimate 
> >bug).
> 
>    I said in my last reply on this that it would do bad things, but you're
> probably right that it should be harmless. There are other parts of the code
> where premature wakeups would be bad, so one must be careful about making
> this assumption.
>    If that "fixes" the problem, then this probably indicates a problem with
> a missing wakeup somewhere.

It will do bad things.  Or rather, it will increase the frequency of
bad things; the code itself is broken.  I have sent them patches, which
are wrong because I neglected the non-increment case.  If it sleeps on
the first iteration, the page will be bogusly freed twice.  8-(.


This may very well be the nasty VM bug no one has fixed...


Here is the analysis I sent them, which is correct:

-----------------------------------------------------------------------------
> But what ensures that the world did not change between lines 1224 and
> 1225? Could the wakeup() happen after 1224 has determined to issue
> the tsleep() but before the flagging in 1225 was registered? Then it
> would be missed. Is this a race condition biting heavily hit machines?

There is a race condition here, but only for SMP systems.  Since this
is vm_object.c from the 2.2.x stream, it will never hit.

If the process is in the kernel, the only way for the kernel to be
reentered is via interrupt.  Since this code is never called at
interrupt level (it's in vm_object_page_remove()), it will never
be reentered.  FreeBSD, unfortunately, does not support kernel
preemption (it's necessary for hard realtime, which is why it's
unfortunate).

Basically, the race that would exist in the SMP case, if the code
was even vaguely similar under 3.x, is between 1222 and 1223; there
is no race between 1224 and 1225, since the vm_page object p is
present and buys, and the flags modification occurs at splvm.

Timing out the tsleep would certianly be harmless; I would expect
it to do nothing in the 1226 case.  That is, if you ever hit a
condition where this was an issue, you would merely increase the
size of your infinite loop.

In the second case, 1261, the consequences could be bad.  In fact,
the consequences of sleeping at all could be bad.  The reason it
seems to work around the problem for you is that the page causing
the problem is not the first page on the list and start has been
incremented.  Because start is not decremented before the "goto again",
the calculation on line 1207:

	again:                  
		size = end - start;
		if (size > 4 || size >= object->size / 4) {

Bogusly omits the page which was just obtained.  This means that

	vm_page_protect(p, VM_PROT_NONE);
	PAGE_WAKEUP(p);
	vm_page_free(p);

is never called.

The net result is that pages are left hanging out in space, but the system
thinks they have been reclaimed.
-----------------------------------------------------------------------------


Here is a patch for 3.1, which has the same bug; I didn't change the
while to a for because I still need to know if start was incremented:
-----------------------------------------------------------------------------
*** vm_object.c	Thu May 21 07:47:58 1998
--- vm_object.c.good	Fri Aug 14 02:08:15 1998
***************
*** 1348,1353 ****
--- 1348,1354 ----
  			}
  		}
  	} else {
+ 		s = 0;
  		while (size > 0) {
  			if ((p = vm_page_lookup(object, start)) != 0) {
  
***************
*** 1356,1361 ****
--- 1357,1363 ----
  					vm_page_protect(p, VM_PROT_NONE);
  					start += 1;
  					size -= 1;
+ 					s = 1;
  					continue;
  				}
  
***************
*** 1363,1376 ****
  				 * The busy flags are only cleared at
  				 * interrupt -- minimize the spl transitions
  				 */
!  				if (vm_page_sleep(p, "vmopar", &p->busy))
  					goto again;
  
  				if (clean_only && p->valid) {
  					vm_page_test_dirty(p);
  					if (p->valid & p->dirty) {
  						start += 1;
  						size -= 1;
  						continue;
  					}
  				}
--- 1365,1381 ----
  				 * The busy flags are only cleared at
  				 * interrupt -- minimize the spl transitions
  				 */
!  				if (vm_page_sleep(p, "vmopar", &p->busy)) {
! 					start -= s;
  					goto again;
+ 				}
  
  				if (clean_only && p->valid) {
  					vm_page_test_dirty(p);
  					if (p->valid & p->dirty) {
  						start += 1;
  						size -= 1;
+ 						s = 1;
  						continue;
  					}
  				}
***************
*** 1381,1386 ****
--- 1386,1392 ----
  			}
  			start += 1;
  			size -= 1;
+ 			s = 1;
  		}
  	}
  	vm_object_pip_wakeup(object);
-----------------------------------------------------------------------------


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199808140227.TAA25476>