From owner-freebsd-hackers@FreeBSD.ORG  Sat Apr 12 20:50:42 2008
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7587B1065684;
	Sat, 12 Apr 2008 20:50:42 +0000 (UTC)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by mx1.freebsd.org (Postfix) with ESMTP id 3404D8FC12;
	Sat, 12 Apr 2008 20:50:42 +0000 (UTC)
	(envelope-from dillon@apollo.backplane.com)
Received: from apollo.backplane.com (localhost [127.0.0.1])
	by apollo.backplane.com (8.14.1/8.14.1) with ESMTP id m3CKdrSP065293;
	Sat, 12 Apr 2008 13:39:53 -0700 (PDT)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.14.1/8.13.4/Submit) id m3CKdrPJ065292;
	Sat, 12 Apr 2008 13:39:53 -0700 (PDT)
Date: Sat, 12 Apr 2008 13:39:53 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200804122039.m3CKdrPJ065292@apollo.backplane.com>
To: Jonathan Chen <jon@freebsd.org>
References: <20080309212441.GA56523@porthos.spock.org>
Cc: hackers@freebsd.org
Subject: Re: mlock & COW
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 12 Apr 2008 20:50:42 -0000


:I've been battling with a bug related to mlock and COW pages.  Since I'm 
:basically clueless when it comes to the VM subsystem, I was hoping someone 
:with more clue can look at my fix and let me know if I'm doing the right 
:thing here.
:
:The problem: User programs will crash (SEGV/BUS) if COW pages become 
:writable after the pages are mlocked.  This happens if shared libraries is 
:loaded in a program after a call to mlockall(MCL_FUTURE), and can be seen 
:with amd and ntpd when nss_ldap is used in the system.  Included at the end 
:of this message is a sample program that demonstrates the problem.
:
:The "solution": Forcibly wire the page via vm_fault_wire() when the page 
:protection bits are changed.  This seems to make the problem disappear, but 
:I'm not sure if causing a fault on vm_map_protect() is a good thing to do.  
:Am I correct in thinking vm_fault_wire() will cause the COW page to be 
:copied immediately?  I think this is the right thing to do, given the page 
:is supposed to be wired, but I'm not sure if somewhere in the vm fault 
:routines might be a better place to put the fix.  I'm also not sure what 
:the last argument to vm_fault_wire() (fictitious) means, I just copied the 
:argument from another invocation.  My patch (against 7-STABLE) is included 
:below.
:
:I'll commit this if someone blesses the fix as the right thing.
:
:
:Index: sys/vm/vm_map.c
:===================================================================
:RCS file: /home/ncvs/src/sys/vm/vm_map.c,v
:retrieving revision 1.388.2.3
:diff -u -p -r1.388.2.3 vm_map.c
:--- sys/vm/vm_map.c	18 Jan 2008 10:02:53 -0000	1.388.2.3
:+++ sys/vm/vm_map.c	9 Mar 2008 20:55:50 -0000
:@@ -1621,6 +1621,15 @@ vm_map_protect(vm_map_t map, vm_offset_t
: 			    current->end,
: 			    current->protection & MASK(current));
: #undef	MASK
:+			if ((entry->eflags & 
:+			    (MAP_ENTRY_USER_WIRED|MAP_ENTRY_COW)) ==
:+			    (MAP_ENTRY_USER_WIRED|MAP_ENTRY_COW)) {
:+				vm_fault_wire(map, current->start,
:+				    current->end, TRUE,
:+				    entry->object.vm_object != NULL &&
:+				    entry->object.vm_object->type ==
:+				    OBJT_DEVICE);
:+			}
: 		}
: 		vm_map_simplify_entry(map, current);
: 		current = current->next;

    Hmm.  mlock() will COW pages in writeable maps before locking the pages,
    so you are on the right track with regards to what to do if a read-only
    map is locked first, and then made writeable later.  Here's mlock's
    flow:

    mlock() calls vm_map_wire():

	* lookup entry
	* mark entry in transition
	* bump entry wired count
	* unlock map
	* vm_fault_wire	<--- this does the COW
	* relock map

    vm_map_protect's flow:

	* lookup entry
	  [ entry not marked in transition ]
	* adjust pmap protections (if COW forces read-only)
	  [ This is where your patch currently is ]

    I don't think your patch is going to work as-is.  The pages are already
    wired read-only at this point in your bug reproduction sequence:

	* mmap
	* protect read-only
	* mlock
	* protect read-write  <----- this point pages already wired RO
	* modify memory

    Here are some likely issues:

    * You can't issue the vm_fault_wire() unless the entry is marked
      in-transition, otherwise the entry can get ripped out from under you.

    * When you call vm_fault_wire() it will COW the page, but the *ORIGINAL*
      read-only page will still have been marked wired and will probably
      wind up being out of sync with the related map entries.

      I don't think vm_fault_wire() was designed to deal with that situation
      so its likely you will get a wiring leak.

    I think your only choice is to first do a wholesale unwiring of the 
    pages (equivalent to munlock()), and then re-lock them after the protection
    change.  It should be possible to do this from inside vm_map_protect()
    as long as you properly handle the vm_map_entry.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>