Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 16 Aug 1998 14:14:05 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        dg@root.com
Cc:        tlambert@primenet.com, current@FreeBSD.ORG, karl@mcs.net
Subject:   Re: Better VM patches (was Tentative fix for VM bug)
Message-ID:  <199808161414.HAA16245@usr08.primenet.com>
In-Reply-To: <199808161315.GAA00184@implode.root.com> from "David Greenman" at Aug 16, 98 06:15:30 am

next in thread | previous in thread | raw e-mail | index | archive | help
> >This may be the source of the:
> >
> >         * XXX: sometimes we get pages which aren't wired down or on any queue -
> >         * we need to put them on the inactive queue also, otherwise we lose 
> >         * track of them. Paul Mackerras (paulus@cs.anu.edu.au) 9-Jan-93.  
> >
> >comment in vm_page.c, and it may be the source of the zero page
> >phenomenon.
> 
>    That comment simply means what it says and must be taken in the context
> of the function where it appears (vm_page_deactivate). The system sometimes
> needs to put a page on a queue - any queue, and vm_page_deactivate needs to
> be able to handle the case of not-on-any-queue. Really, the comment is
> obsolete and dates back to a time when the VM system didn't manage the page
> queues properly.

Then it should be replaced with a DIAGNOSTIC panic.  The issue that's
being addressed here is the "XXX", more than anything else.  I really
think there are three cases in the code where pages can be orphaned;
I think the object code only deals with one of these; the other two
can be implemented with late-binding reaps without a problem (ie: the
code is not technically correct, except in reduction).


> >Another problem in the mmap code is that when the vnode_pager object
> >is created, it is set to be the number of pages times the page size
> >in length; but also in vnode_pager, the setsize on the object sets
> >the length to the actual file length.
> >
> >This discrepancy opens a window in which some changes may be lost.
> 
>    My reading of the code shows that the object size is set to the rounded-up
> index. vnp_size is set to the unrounded new size, but that's expected and
> I don't see any bugs related to doing that; rounding it will likely break the
> proper reading of partial pages. Am I missing something? vnp_size is mostly
> private to the vnode_pager. The rounded-up object->size is used outside of
> the vnode pager.

OK, consider the boundry cases:

1)	The page is less than the boundary, but is extended to less
	than the boundary.  This is seen as a truncation by setsize.

2)	The page is less than the boundary, but is extended to the
	boundary.  This is seen as a NOP by the first compare in the
	setsize code.

3)	The mapped page is written (an implicit extend without a write
	fault -- this is, I believe, Karl Denniger's case when the
	code provided failed -- between the mapping and the actual size).

There are actually two more failure modes, but they are hellaciously
complex to describe.  8-(.


> >John also suggested another change for detecting a case where page
> >orphans can be created.  in vm_page.c, it's possible that an object
> >entry in the page insert function will overwrite an existing entry;
> >I added a DIAGNOSTIC panic to catch this when it happens.
> 
>    You mean that there can be multiple pages at the same offset? It would
> be bad if that happend, but I'm skeptical that it actually does.

So was I, when John suggested it.  John's suggested change would
detect an orphaning event.  Since I believe that what is happening is
actually an alias event (for the bug that is at the top of my list),
the DIAGNOSTIC change is a NOP, as far as I'm concerned.  But it's a
nice thing to have to detect the XXX comment case, which, in theory,
should never occur (ie: a page should be on a queue, and this recovery
mechanism was a kludge which covers the real problem, which is orphaned
pages).


> >+ 		/*
> >+ 		 * XXX we are in a race here.  If the file is extended
> >+ 		 * between the time we get the object and the time
> >+ 		 * we set the size, then we lose...
> >+ 		 */
> >+ 		if (!(flags & MAP_ANON) && vp->v_type != VCHR)
> >+ 			object->un_pager.vnp.vnp_size = vat.va_size;
> 
>    Hmmm.

This is, in fact, the most interesting change, as far as Karl Denniger is
concerned.  I would like to close the race window here (by adding an
"actual_length" parameter to the various "allocate" routines) to see
if this resolves Karl's remaining bugs (per the SPLVM bug noted in the
comment here)... I suspect that Karl has two bugs; the one I hit, and
a sepoerate one, and the remaining window accounts for a tiny fraction
of the misbehaviour that he is still experiencing.

The real way out of this mes is to reference cont everything and to note
1->2 transitions for page references.  This takes a significant amount of
code and structure changes to accomplish; it would be nice if the problem
were a bit easier to duplicate, but it's not.  I have to make changes
that narrow down the problem instead of resolve it; A/B tests.  8-(.


> >+ 		/*
> >+ 		 * The object is allocated to a page boundary.  This is
> >+ 		 * incorrect if page is only partially backed by the vp;
> >+ 		 * we must fix this up in the caller.
> >+ 		 */
> >  		object->un_pager.vnp.vnp_size = (vm_ooffset_t) size * PAGE_SIZE;
> 
>    I think the above change is wrong.

It's only a comment.  The actual change is:

 		if (!(flags & MAP_ANON) && vp->v_type != VCHR)
			object->un_pager.vnp.vnp_size = vat.va_size;

But me too, since it allows a race window.  A correct fix would get rid
of the window by passing an additional parameter to those alloc routines
that need to care about actual backing length (ie: vnode_pager).  It
was more of a test case than anything else... if the problem decreased,
it points to the race window that was partially closed.  If it didn't,
then the operation was a boundary condition, which, while nice to
address, did nothing to address reported bugs.



For non-page-aligned pagers, this would be important; for all others,
I expect that round_page() would be used to assure page alignment.

This is very tricky; I nearly posted code to page align setsize instead,
but that would have been a mistake, since the real issue is the bzero
that occurs in the setsize code.  The point is to set the size in such
a way that it is not misinterpreted as a truncate between the set event
and the truncation event handler.  This is the remaining race.

A very real problem here is that the code is hideously poorly commented,
which makes fixes problematic.  I haven't investigated what it would take
to put asserts of locking state in there; from a naieve examination of the
code, there would need to be a parent map pointer in vm_map_t that was
used to see if the parent was locked (this may exist; I haven't looked
for it).  The PITA is that there is an implicit rather than an explicit
hierarchical lock relationship.  8-(.

At a minimum, it would be nice if each function asserted its assumptions
with regard to locks based on its parameters... at least in the
DIAGNOSTIC case, so that runtime checks could be enabled.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199808161414.HAA16245>