Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 13 Sep 2010 12:30:29 -0500
From:      Alan Cox <alc@rice.edu>
To:        Kostik Belousov <kostikbel@gmail.com>,  Ryan Stone <rstone@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r212281 - head/sys/vm
Message-ID:  <4C8E5FB5.9070009@rice.edu>
In-Reply-To: <20100907080446.GA2853@deviant.kiev.zoral.com.ua>
References:  <201009070023.o870Njtg072607@svn.freebsd.org> <20100907080446.GA2853@deviant.kiev.zoral.com.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
Kostik Belousov wrote:
> On Tue, Sep 07, 2010 at 12:23:45AM +0000, Ryan Stone wrote:
>   
>> Author: rstone
>> Date: Tue Sep  7 00:23:45 2010
>> New Revision: 212281
>> URL: http://svn.freebsd.org/changeset/base/212281
>>
>> Log:
>>   In munmap() downgrade the vm_map_lock to a read lock before taking a read
>>   lock on the pmc-sx lock.  This prevents a deadlock with
>>   pmc_log_process_mappings, which has an exclusive lock on pmc-sx and tries
>>   to get a read lock on a vm_map.  Downgrading the vm_map_lock in munmap
>>   allows pmc_log_process_mappings to continue, preventing the deadlock.
>>   
>>   Without this change I could cause a deadlock on a multicore 8.1-RELEASE
>>   system by having one thread constantly mmap'ing and then munmap'ing a
>>   PROT_EXEC mapping in a loop while I repeatedly invoked and stopped pmcstat
>>   in system-wide sampling mode.
>>   
>>   Reviewed by:	fabient
>>   Approved by:	emaste (mentor)
>>   MFC after:	2 weeks
>>
>> Modified:
>>   head/sys/vm/vm_mmap.c
>>
>> Modified: head/sys/vm/vm_mmap.c
>> ==============================================================================
>> --- head/sys/vm/vm_mmap.c	Mon Sep  6 23:52:04 2010	(r212280)
>> +++ head/sys/vm/vm_mmap.c	Tue Sep  7 00:23:45 2010	(r212281)
>> @@ -579,6 +579,7 @@ munmap(td, uap)
>>  	 * Inform hwpmc if the address range being unmapped contains
>>  	 * an executable region.
>>  	 */
>> +	pkm.pm_address = (uintptr_t) NULL;
>>  	if (vm_map_lookup_entry(map, addr, &entry)) {
>>  		for (;
>>  		     entry != &map->header && entry->start < addr + size;
>> @@ -587,16 +588,23 @@ munmap(td, uap)
>>  				entry->end, VM_PROT_EXECUTE) == TRUE) {
>>  				pkm.pm_address = (uintptr_t) addr;
>>  				pkm.pm_size = (size_t) size;
>> -				PMC_CALL_HOOK(td, PMC_FN_MUNMAP,
>> -				    (void *) &pkm);
>>  				break;
>>  			}
>>  		}
>>  	}
>>  #endif
>> -	/* returns nothing but KERN_SUCCESS anyway */
>>  	vm_map_delete(map, addr, addr + size);
>> +
>> +#ifdef HWPMC_HOOKS
>> +	/* downgrade the lock to prevent a LOR with the pmc-sx lock */
>> +	vm_map_lock_downgrade(map);
>> +	if (pkm.pm_address != (uintptr) NULL)
>> +		PMC_CALL_HOOK(td, PMC_FN_MUNMAP, (void *) &pkm);
>> +	vm_map_unlock_read(map);
>> +#else
>>  	vm_map_unlock(map);
>> +#endif
>> +	/* vm_map_delete returns nothing but KERN_SUCCESS anyway */
>>  	return (0);
>>  }
>>  
>>     
> Note that vm_map_unlock() is more then just dropping the lock on the map.
> Due to ordering of the vnode lock before vm map lock, vm_map_unlock()
> processes the deferred free entries after map lock is dropped. After your
> change, the deferred list might keep entries for some time until next
> unlock is performed.
>
>   

I'm afraid that this understates the effect.  Over the weekend, when I 
updated one of my amd64 machines to include this change, I found that 
the delay in object and page deallocation is leading to severe 
fragmentation within the physical memory allocator.  As a result, the 
time spent in the kernel during a "buildworld" increased by about 8%.  
Moreover, superpage promotion by applications effectively stopped.

For now, I think it would be best to back out r212281 and r212282.  
Ultimately, the fix may be to change the vm map synchronization 
primitives, and simply reinstate r212281 and r212282, but I'd like some 
time to consider the options.

Regards,
Alan





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4C8E5FB5.9070009>