From owner-freebsd-current@FreeBSD.ORG Sun Aug 7 03:45:33 2005
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A2E6F16A422
	for ; Sun, 7 Aug 2005 03:45:33 +0000 (GMT)
	(envelope-from julian@elischer.org)
Received: from delight.idiom.com (delight.idiom.com [216.240.32.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0B4054430B
	for ; Sun, 7 Aug 2005 03:12:59 +0000 (GMT)
	(envelope-from julian@elischer.org)
Received: from idiom.com (idiom.com [216.240.32.1])
	by delight.idiom.com (Postfix) with ESMTP id AB64720067E;
	Sat, 6 Aug 2005 20:12:58 -0700 (PDT)
Received: from [192.168.2.3] (home.elischer.org [216.240.48.38])
	by idiom.com (8.12.11/8.12.11) with ESMTP id j773CtDP056997;
	Sat, 6 Aug 2005 20:12:56 -0700 (PDT)
	(envelope-from julian@elischer.org)
Message-ID: <42F57C37.3080900@elischer.org>
Date: Sat, 06 Aug 2005 20:12:55 -0700
From: Julian Elischer
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.7) Gecko/20050424
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Martin Nilsson
References: <42F45923.2080401@gneto.com> <42F4659F.5030407@gneto.com>
	<42F46942.7030005@elischer.org> <42F4730C.6040204@gneto.com>
In-Reply-To: <42F4730C.6040204@gneto.com>
Content-Type: multipart/mixed;
	boundary="------------040706020200020108010809"
Cc: freebsd-current@freebsd.org
Subject: Re: Something is very wrong with disk caching in 7.0
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Sun, 07 Aug 2005 03:45:33 -0000

This is a multi-part message in MIME format.
--------------040706020200020108010809
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Martin Nilsson wrote:
> Julian Elischer wrote:
> try this patch

--------------040706020200020108010809
Content-Type: text/plain; name="pageout.diff2"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="pageout.diff2"

Index: sys/vm/vm_pageout.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/vm_pageout.c,v
retrieving revision 1.268
diff -u -r1.268 vm_pageout.c
--- sys/vm/vm_pageout.c	7 Jan 2005 02:29:27 -0000	1.268
+++ sys/vm/vm_pageout.c	30 Jul 2005 03:12:37 -0000
@@ -5,6 +5,8 @@
  * All rights reserved.
  * Copyright (c) 1994 David Greenman
  * All rights reserved.
+ * Copyright (c) 2005 Yahoo! Technologies Norway AS
+ * All rights reserved.
  *
  * This code is derived from software contributed to Berkeley by
  * The Mach Operating System project at Carnegie-Mellon University.
@@ -210,6 +212,16 @@
 static void vm_pageout_page_stats(void);
 
 /*
+ * Experimental VM_PAGEOUT_FORCE_BLOCKING_OBJLOCK option, which will cause
+ * pagedaemon to fall back to blocking locking of vm objects if nonblocking
+ * lock attempt failed. Lock order violation is avoided by unlocking
+ * the page queues before locking the object. marker pages are used
+ * to detect changes and allow for continued page queue traversal even
+ * when changes had occurred.
+ */
+#define VM_PAGEOUT_FORCE_BLOCKING_OBJLOCK
+
+/*
  * vm_pageout_clean:
  *
  * Clean the page and remove it from the laundry.
@@ -750,8 +762,37 @@
 		 * queue, most likely are being paged out.
 		 */
 		if (!VM_OBJECT_TRYLOCK(object)) {
+#ifdef VM_PAGEOUT_FORCE_BLOCKING_OBJLOCK
+			/*
+			 * Cannot lock object while holding page queue lock.
+			 * Depend on both struct vm_object and normal
+			 * struct vm_page being type stable and sanity
+			 * check after reobtaining page queue lock.
+			 */
+			TAILQ_INSERT_AFTER(&vm_page_queues[PQ_INACTIVE].pl,
+			    m, &marker, pageq);
+			vm_page_unlock_queues();
+			VM_OBJECT_LOCK(object);
+			vm_page_lock_queues();
+			/* Page queue might have changed. */
+			next = TAILQ_NEXT(&marker, pageq);
+			if (m->queue != PQ_INACTIVE ||
+			    m->object != object ||
+			    m->hold_count != 0 ||
+			    &marker != TAILQ_NEXT(m, pageq)) {
+				/* Page changed. */
+				VM_OBJECT_UNLOCK(object);
+				TAILQ_REMOVE(&vm_page_queues[PQ_INACTIVE].pl,
+				    &marker, pageq);
+				addl_page_shortage++;
+				continue;
+			}
+			TAILQ_REMOVE(&vm_page_queues[PQ_INACTIVE].pl,
+			    &marker, pageq);
+#else
 			addl_page_shortage++;
 			continue;
+#endif
 		}
 		if (m->busy || (m->flags & PG_BUSY)) {
 			VM_OBJECT_UNLOCK(object);
@@ -1024,10 +1065,44 @@
 
 		next = TAILQ_NEXT(m, pageq);
 		object = m->object;
+#ifdef VM_PAGEOUT_FORCE_BLOCKING_OBJLOCK
+		if ((m->flags & PG_MARKER) != 0) {
+			m = next;
+			continue;
+		}
+#endif
 		if (!VM_OBJECT_TRYLOCK(object)) {
+#ifdef VM_PAGEOUT_FORCE_BLOCKING_OBJLOCK
+			/*
+			 * Cannot lock object while holding page queue lock.
+			 * Depend on both struct vm_object and normal
+			 * struct vm_page being type stable and sanity
+			 * check after reobtaining page queue lock.
+			 */
+			TAILQ_INSERT_AFTER(&vm_page_queues[PQ_ACTIVE].pl,
+			    m, &marker, pageq);
+			vm_page_unlock_queues();
+			VM_OBJECT_LOCK(object);
+			vm_page_lock_queues();
+			/* Page queue might have changed. */
+			next = TAILQ_NEXT(&marker, pageq);
+			if (m->queue != PQ_ACTIVE ||
+			    m->object != object ||
+			    &marker != TAILQ_NEXT(m, pageq)) {
+				/* Page changed. */
+				VM_OBJECT_UNLOCK(object);
+				TAILQ_REMOVE(&vm_page_queues[PQ_ACTIVE].pl,
+				    &marker, pageq);
+				m = next;
+				continue;
+			}
+			TAILQ_REMOVE(&vm_page_queues[PQ_ACTIVE].pl,
+			    &marker, pageq);
+#else
 			vm_pageq_requeue(m);
 			m = next;
 			continue;
+#endif
 		}
 
 		/*
@@ -1264,6 +1339,9 @@
 vm_pageout_page_stats()
 {
 	vm_object_t object;
+#ifdef VM_PAGEOUT_FORCE_BLOCKING_OBJLOCK
+	struct vm_page marker;
+#endif
 	vm_page_t m,next;
 	int pcount,tpcount;		/* Number of pages to check */
 	static int fullintervalcount = 0;
@@ -1287,6 +1365,16 @@
 		fullintervalcount = 0;
 	}
 
+#ifdef VM_PAGEOUT_FORCE_BLOCKING_OBJLOCK
+	/*
+	 * Initialize our marker
+	 */
+	bzero(&marker, sizeof(marker));
+	marker.flags = PG_BUSY | PG_FICTITIOUS | PG_MARKER;
+	marker.queue = PQ_INACTIVE;
+	marker.wire_count = 1;
+#endif
+
 	m = TAILQ_FIRST(&vm_page_queues[PQ_ACTIVE].pl);
 	while ((m != NULL) && (pcount-- > 0)) {
 		int actcount;
@@ -1296,10 +1384,45 @@
 
 		next = TAILQ_NEXT(m, pageq);
 		object = m->object;
+
+#ifdef VM_PAGEOUT_FORCE_BLOCKING_OBJLOCK
+		if ((m->flags & PG_MARKER) != 0) {
+			m = next;
+			continue;
+		}
+#endif
 		if (!VM_OBJECT_TRYLOCK(object)) {
+#ifdef VM_PAGEOUT_FORCE_BLOCKING_OBJLOCK
+			/*
+			 * Cannot lock object while holding page queue lock.
+			 * Depend on both struct vm_object and normal
+			 * struct vm_page being type stable and sanity
+			 * check after reobtaining page queue lock.
+			 */
+			TAILQ_INSERT_AFTER(&vm_page_queues[PQ_ACTIVE].pl,
+			    m, &marker, pageq);
+			vm_page_unlock_queues();
+			VM_OBJECT_LOCK(object);
+			vm_page_lock_queues();
+			/* Page queue might have changed. */
+			next = TAILQ_NEXT(&marker, pageq);
+			if (m->queue != PQ_ACTIVE ||
+			    m->object != object ||
+			    &marker != TAILQ_NEXT(m, pageq)) {
+				/* Page changed. */
+				VM_OBJECT_UNLOCK(object);
+				TAILQ_REMOVE(&vm_page_queues[PQ_ACTIVE].pl,
+				    &marker, pageq);
+				m = next;
+				continue;
+			}
+			TAILQ_REMOVE(&vm_page_queues[PQ_ACTIVE].pl,
+			    &marker, pageq);
+#else
 			vm_pageq_requeue(m);
 			m = next;
 			continue;
+#endif
 		}
 
 		/*

--------------040706020200020108010809--
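
For readers who want to see the marker technique from the patch outside the kernel, here is a rough single-threaded userland sketch. It is not the kernel code: struct object, struct page, object_trylock(), object_lock_blocking(), scan() and run_demo() are all hypothetical stand-ins, and the "blocking" lock only simulates what another thread might do to the queue while the scanner sleeps (it optionally moves the page to the tail). Only the TAILQ macros from <sys/queue.h> are real.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for struct vm_object and struct vm_page. */
struct object {
	bool held_elsewhere;	/* simulates another thread owning the lock */
	bool locked;		/* true while the scanner owns the lock */
};

struct page {
	TAILQ_ENTRY(page) pageq;
	struct object *object;
	bool is_marker;		/* plays the role of PG_MARKER */
	bool requeue_while_waiting;	/* demo knob: page moves while we wait */
	int id;
};

TAILQ_HEAD(pagelist, page);

static bool
object_trylock(struct object *o)
{
	if (o->held_elsewhere || o->locked)
		return (false);
	o->locked = true;
	return (true);
}

static void
object_unlock(struct object *o)
{
	o->locked = false;
}

/*
 * Simulated blocking lock.  In real code the queue lock would be dropped
 * around this call, so the queue may change; here that is modelled by
 * moving the page to the tail when the demo knob is set.
 */
static void
object_lock_blocking(struct pagelist *q, struct page *m)
{
	if (m->requeue_while_waiting) {
		TAILQ_REMOVE(q, m, pageq);
		TAILQ_INSERT_TAIL(q, m, pageq);
		m->requeue_while_waiting = false;
	}
	m->object->held_elsewhere = false;
	m->object->locked = true;
}

/* Walk the queue; record processed page ids in order[], count skips. */
static int
scan(struct pagelist *q, int *order, int *skipped)
{
	struct page marker = { .is_marker = true };
	struct page *m, *next;
	int n = 0;

	for (m = TAILQ_FIRST(q); m != NULL; m = next) {
		next = TAILQ_NEXT(m, pageq);
		if (m->is_marker)
			continue;	/* some other scanner's placeholder */
		struct object *o = m->object;
		if (!object_trylock(o)) {
			/* Remember our position, then take the slow path. */
			TAILQ_INSERT_AFTER(q, m, &marker, pageq);
			object_lock_blocking(q, m);
			/* The queue may have changed: resume after the marker. */
			next = TAILQ_NEXT(&marker, pageq);
			bool changed = (m->object != o ||
			    TAILQ_NEXT(m, pageq) != &marker);
			TAILQ_REMOVE(q, &marker, pageq);
			if (changed) {
				object_unlock(o);
				(*skipped)++;
				continue;
			}
		}
		order[n++] = m->id;	/* "process" the page */
		object_unlock(o);
	}
	return (n);
}

/* Pages 1..4; objects of pages 2 and 3 are busy; page 3 also moves. */
static int
run_demo(int order[4], int *skipped)
{
	static struct object objs[4];
	static struct page pages[4];
	struct pagelist q;

	TAILQ_INIT(&q);
	for (int i = 0; i < 4; i++) {
		objs[i] = (struct object){ 0 };
		pages[i] = (struct page){ .id = i + 1, .object = &objs[i] };
		TAILQ_INSERT_TAIL(&q, &pages[i], pageq);
	}
	objs[1].held_elsewhere = true;
	objs[2].held_elsewhere = true;
	pages[2].requeue_while_waiting = true;
	*skipped = 0;
	return (scan(&q, order, skipped));
}
```

Calling run_demo() from a main() of your own exercises both outcomes of the sanity check: page 2 is still directly in front of the marker after the wait, so it is processed in place, while page 3 has moved, so it is skipped and the walk resumes from the marker. Page 3 is then picked up again when the walk reaches the tail.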