From owner-cvs-all Mon Oct 19 00:36:55 1998
Return-Path:
Received: (from majordom@localhost)
        by hub.freebsd.org (8.8.8/8.8.8) id AAA29694
        for cvs-all-outgoing; Mon, 19 Oct 1998 00:36:55 -0700 (PDT)
        (envelope-from owner-cvs-all@FreeBSD.ORG)
Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2])
        by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id AAA29678;
        Mon, 19 Oct 1998 00:36:53 -0700 (PDT)
        (envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
        by apollo.backplane.com (8.9.1/8.9.1) id AAA12327;
        Mon, 19 Oct 1998 00:36:30 -0700 (PDT)
        (envelope-from dillon)
Date: Mon, 19 Oct 1998 00:36:30 -0700 (PDT)
From: Matthew Dillon
Message-Id: <199810190736.AAA12327@apollo.backplane.com>
To: cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject: Anyone got any ideas? getblk/pgtblk/inode lockup
Sender: owner-cvs-all@FreeBSD.ORG
Precedence: bulk

    I've been trying to track this down for a month or two now... it's
    the only regular crash I get under -current these days.

    What happens is that a process gets stuck in pgtblk (kern/vfs_bio.c
    line 1800 or so).  This causes a cascade of processes locking up until
    the system goes crunch.

    The lockup occurs about once a week on a heavily loaded machine with
    lots of processes that use mmap() heavily (i.e. running Diablo).

    The code in question is this:

                } else if (m->flags & PG_BUSY) {
                        s = splvm();
                        if (m->flags & PG_BUSY) {
                                vm_page_flag_set(m, PG_WANTED);
                                tsleep(m, PVM, "pgtblk", 0);
                        }
                        splx(s);
                        goto doretry;
                }
                ...

    Looking at the kernel dump, the page structure 'm' in the process
    locked up in pgtblk has PG_WANTED and PG_TABLED set, but PG_BUSY
    cleared.  This combined with the code fragment above leads me to
    believe that PG_BUSY is somehow being cleared on the page without
    waking up the process.

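    To make the suspected failure concrete, here is a minimal user-space
    sketch of the lost-wakeup pattern described above.  It is not kernel
    code: struct page, the sleepers counter, and the functions
    wait_if_busy(), release_with_wakeup(), release_without_wakeup() and
    is_stuck() are hypothetical names invented for illustration, and the
    flag values are assumptions.  Sleeping and waking are modeled with a
    simple counter rather than tsleep()/wakeup().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed flag values, for illustration only. */
#define PG_BUSY   0x0001u
#define PG_WANTED 0x0002u

/* Hypothetical stand-in for the kernel's page structure. */
struct page {
    unsigned flags;
    int sleepers;   /* models processes blocked in tsleep() on this page */
};

/* Waiter side, mirroring the quoted fragment: mark the page wanted, sleep. */
static void wait_if_busy(struct page *m)
{
    assert(m != NULL);
    if (m->flags & PG_BUSY) {
        m->flags |= PG_WANTED;
        m->sleepers++;              /* stands in for tsleep(m, ...) */
    }
}

/* Release path that honors PG_WANTED: clear busy, then wake the sleepers. */
static void release_with_wakeup(struct page *m)
{
    m->flags &= ~PG_BUSY;
    if (m->flags & PG_WANTED) {
        m->flags &= ~PG_WANTED;
        m->sleepers = 0;            /* stands in for wakeup(m) */
    }
}

/* Suspected faulty path: clears busy but never looks at PG_WANTED. */
static void release_without_wakeup(struct page *m)
{
    m->flags &= ~PG_BUSY;
}

/* The state seen in the dump: PG_WANTED set, PG_BUSY clear, still asleep. */
static int is_stuck(const struct page *m)
{
    return m->sleepers > 0 &&
           (m->flags & PG_WANTED) != 0 &&
           (m->flags & PG_BUSY) == 0;
}
```

    In this model, a page released through release_without_wakeup() after
    a waiter has gone to sleep ends up with PG_WANTED set and PG_BUSY
    clear while the sleeper count stays nonzero, which is the same flag
    combination reported from the dump.  Which kernel path, if any,
    behaves this way is exactly the open question.
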
(kgdb) print *m
$3 = {
  pageq = {
    tqe_next = 0xf0f46024,
    tqe_prev = 0xf0ee7438
  },
  hashq = {
    tqe_next = 0xf0f46024,
    tqe_prev = 0xf0c3df50
  },
  listq = {
    tqe_next = 0xf0f184e4,
    tqe_prev = 0xf0eb0d14
  },
  object = 0xfc4e3440,
  pindex = 0x90,
  phys_addr = 0x35ed000,
  queue = 0x82,
  flags = 0x6,                  PG_WANTED, PG_TABLED
  pc = 0x2d,
  wire_count = 0x0,
  hold_count = 0x0,
  act_count = 0x5,
  busy = 0x1,
  valid = 0x0,
  dirty = 0x0
}

    I have other kernel dumps that show the same thing.  The page flags
    are similar, except I see 0x46 sometimes (PG_ZERO|PG_TABLED|PG_WANTED).

    It's virtually the only crash I've gotten with FreeBSD-current so far.
    If anyone has any ideas, I'm all ears.  I've been chasing this one for
    a few months.

                                        -Matt

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet
                    Communications & God knows what else.

    (Please include original email in any response)