From: "David C. Kulp"
To: hackers@freebsd.org
Date: Sun, 13 Oct 2002 11:45:05 -0700 (PDT)
Subject: panic: vm_page_remove(): page not found in hash

I've been experiencing frequent panics on a new installation of 4.7-RC. I recompiled the kernel with -g, got another panic, obtained a core dump, and plopped myself into kgdb. I'd like to provide some insight or solicit some help as to the cause or cure of the crash, but my stack trace leaves a lot to be desired. Here I've cleaned it up for readability. I lose the source in kgdb prior to the syscall2.

#0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:487
#1  in boot () at /usr/src/sys/kern/kern_shutdown.c:316
#2  in panic () at /usr/src/sys/kern/kern_shutdown.c:595
#3  in db_panic () at /usr/src/sys/ddb/db_command.c:435
#4  in db_command () at /usr/src/sys/ddb/db_command.c:333
#5  in db_command_loop () at /usr/src/sys/ddb/db_command.c:457
#6  in db_trap () at /usr/src/sys/ddb/db_trap.c:71
#7  in kdb_trap () at /usr/src/sys/i386/i386/db_interface.c:158
#8  in trap () at /usr/src/sys/i386/i386/trap.c:592
#9  in Debugger () at machine/cpufunc.h:67
#10 in panic ("vm_page_remove(): page not found in hash") at /usr/src/sys/kern/kern_shutdown.c:593
#11 in vm_page_remove () at /usr/src/sys/vm/vm_page.c:460
#12 in vm_page_free_toq () at /usr/src/sys/vm/vm_page.c:1103
#13 in vm_page_alloc () at /usr/src/sys/vm/vm_page.h:514
#14 in allocbuf () at /usr/src/sys/kern/vfs_bio.c:2517
#15 in getblk () at /usr/src/sys/kern/vfs_bio.c:2292
#16 in ffs_balloc () at /usr/src/sys/ufs/ffs/ffs_balloc.c:303
#17 in ffs_write () at vnode_if.h:1056
#18 in vn_write () at vnode_if.h:363
#19 in dofilewrite () at /usr/src/sys/sys/file.h:162
#20 in write () at /usr/src/sys/kern/sys_generic.c:329
#21 in syscall2 () at /usr/src/sys/i386/i386/trap.c:1175
#22 in Xint0x80_syscall ()
#23 in ?? ()
#24 ...

It looks to me like a memory or disk failure. Possibly relevant clues: I've got a single 512MB chip on a mainboard with a VIA Apollo KTE 333/686B chipset. I've got an old, small UDMA2 disk for the root file system and a second new, large disk that could be running faster but it's on the same cable as the slower disk. Most of the I/O is on the larger slave.

I'm running a wireless access point (wi driver), firewall, and server. The panics seem to be associated with large network usage and file I/O, but I believe the problem is not associated with the wireless because panics have occurred during heavy traffic over either the wi or fxp interfaces.

dmesg is attached below. Any guidance?

Thanks in advance,
-d

ps.
BTW, I found that using "-z" for savecore fails with

savecore: writing compressed core to /opt/data/crash/vmcore.0.gz
savecore: /opt/data/crash/vmcore.0.gz: Illegal seek
savecore: WARNING: vmcore may be incomplete

but removing the -z works fine, albeit using much more space.

Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.7-STABLE #6: Sat Oct 12 00:17:11 PDT 2002
    root@XXX:/opt/data/usr/src/obj/opt/data/usr/src/sys/FW
Timecounter "i8254" frequency 1193182 Hz
CPU: AMD Duron(tm) processor (901.60-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x631  Stepping = 1
  Features=0x183f9ff
  AMD Features=0xc0440000<,AMIE,DSP,3DNow!>
real memory  = 536805376 (524224K bytes)
avail memory = 516956160 (504840K bytes)
Preloaded elf kernel "kernel" at 0xc0539000.
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 8 entries at 0xc00fdba0
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
pci0: on pcib0
pcib1: at device 1.0 on pci0
pci1: on pcib1
isab0: at device 7.0 on pci0
isa0: on isab0
atapci0: port 0xd000-0xd00f at device 7.1 on pci0
atapci0: Correcting VIA config for southbridge data corruption bug
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
uhci0: port 0xd400-0xd41f irq 11 at device 7.2 on pci0
usb0: on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0xd800-0xd81f irq 11 at device 7.3 on pci0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
pci0: (vendor=0x1106, dev=0x3057) at 7.4
pci0: at 14.0 irq 12
fxp0: port 0xdc00-0xdc3f mem 0xd7000000-0xd70fffff,0xd7100000-0xd7100fff irq 10 at device 15.0 on pci0
fxp0: Ethernet address 00:02:b3:30:28:93
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ed0: port 0xe000-0xe01f irq 11 at device 16.0 on pci0
ed0: address 00:80:ad:b6:ee:4d, type NE2000 (16 bit)
orm0: