Message-ID: <4038995A.4080503@ant.uni-bremen.de>
Date: Sun, 22 Feb 2004 11:58:18 +0000
From: Heinrich Rebehn
To: Lowell Gilbert
cc: freebsd-questions@freebsd.org
Subject: Re: Hardware or software error ?
References: <4037920D.3060702@ant.uni-bremen.de> <44r7wo85rw.fsf@be-well.ilk.org>
In-Reply-To: <44r7wo85rw.fsf@be-well.ilk.org>

Lowell Gilbert wrote:
> Heinrich Rebehn writes:
>
>
>>Hi list,
>>
>>does anybody have a clue, if the following is a hard or software error?
>>
>>#########################################################################
>> syslogd: kernel boot file is /boot/kernel/kernel
>> kernel:
>> kernel:
>> kernel: Fatal trap 12: page fault while in kernel mode
>> kernel: cpuid = 0; apic id = 00
>> kernel: fault virtual address = 0x4
>> kernel: fault code = supervisor read, page not present
>> kernel: instruction pointer = 0x8:0xc0533f98
>> kernel: stack pointer = 0x10:0xe11f6b3c
>> kernel: frame pointer = 0x10:0xe11f6b64
>> kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
>> kernel: = DPL 0, pres 1, def32 1, gran 1
>> kernel: processor eflags = interrupt enabled, resume, IOPL = 0
>> kernel: current process = 29 (swi1: net)
>> kernel: trap number = 12
>> kernel: panic: page fault
>> kernel: cpuid = 0;
>> kernel:
>> kernel: syncing disks, buffers remaining...
>> kernel:
>> kernel: Fatal trap 12: page fault while in kernel mode
>> kernel: cpuid = 0; apic id = 00
>> kernel: fault virtual address = 0x4
>> kernel: fault code = supervisor read, page not present
>> kernel: instruction pointer = 0x8:0xc0533f98
>> kernel: stack pointer = 0x10:0xe124bbcc
>> kernel: frame pointer = 0x10:0xe124bbf4
>> kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
>> kernel: = DPL 0, pres 1, def32 1, gran 1
>> kernel: processor eflags = interrupt enabled, resume, IOPL = 0
>> kernel: current process = 26 (irq15: xl0 ata1+)
>> kernel: trap number = 12
>> kernel: panic: page fault
>> kernel: cpuid = 0;
>> kernel: Uptime: 3d10h11m49s
>>#########################################################################
>>
>>The system is running FreeBSD 5.2.1-RC2
>
>
> More likely to be a software problem, although it could be either.
> Could you take a kernel dump to the -CURRENT list?

I'm afraid my system is not set up to enable crash dumps. I must admit
that I never cared about this, and unfortunately the default install
does not seem to enable them either. If I am wrong: where would I find
the dump?
Also, since this is our main server, I would prefer to go back to 4.9,
which seems to be more solid than 5.2 :-(. I simply don't have time to
do more experiments.

Update: This morning's crash (which I forgot to include in my previous
mail):

#####################################################################
ntpd[470]: too many recvbufs allocated (40)
cron[8309]: login_getclass: unknown class 'des_users'
syslogd: kernel boot file is /boot/kernel/kernel
kernel:
kernel:
kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 0; apic id = 00
kernel: fault virtual address = 0x4c
kernel: fault code = supervisor read, page not present
kernel: instruction pointer = 0x8:0xc04cc807
kernel: stack pointer = 0x10:0xe2cccca8
kernel: frame pointer = 0x10:0xe2ccccc8
kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
kernel: = DPL 0, pres 1, def32 1, gran 1
kernel: processor eflags = interrupt enabled, resume, IOPL = 0
kernel: current process = 38 (usbtask)
kernel: trap number = 12
kernel: panic: page fault
kernel: cpuid = 0;
kernel:
kernel: syncing disks, buffers remaining... 7137 7117 7117 7117 7117 7117 7117 7117 7117 7117 7117 7117 7117 7117 7117 7117 7117 7117 7117 7117 7117
kernel: giving up on 4591 buffers
kernel: Uptime: 14h28m36s
kernel: (da0:umass-sim0:0:0:0): Synchronize cache failed, status == 0x34, scsi status == 0x88
########################################################################

The "giving up on nnnn buffers" message is also typical of 5.2; I rarely
saw it with 4.9. Unfortunately, all disks are then marked dirty, even if
only 1 buffer is left.

Update: I changed my mind and will continue to try 5.2. I disabled
softupdates to get rid of the "softupdates inconsistency error" and set
up the machine to enable crash dumps, as described in
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

I also clocked the AMD XP1800+ CPU down to a 100 MHz bus clock to reduce
possible hardware instability.
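
A minimal sketch of the rc.conf side of a crash dump setup, for anyone
who wants to do the same. The device name /dev/ad0s1b is only a
placeholder (an assumption, not taken from this machine); it has to be
replaced with the system's real swap partition, which must be at least
as large as physical RAM:

```shell
# /etc/rc.conf fragment (sh syntax) enabling kernel crash dumps.
# ASSUMPTION: /dev/ad0s1b is a placeholder for the real swap partition.
dumpdev="/dev/ad0s1b"   # swap device the kernel writes the dump to on panic
dumpdir="/var/crash"    # directory where savecore(8) stores the dump at boot
```

With these set, savecore(8) runs on the next boot after a panic and
copies the dump from the swap device into dumpdir as vmcore.N, so that
is where the dump would be found.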
All we have to do now is wait for the next crash ;-)

Heinrich