Date:      Thu, 13 Jan 2000 23:35:22 +1100
From:      Tony Frank <tfrank@eric.net.au>
To:        freebsd-questions@freebsd.org
Subject:   3.4-STABLE crashes intermittently - need suggestions
Message-ID:  <20000113233522.A1035@random.n2-au>

Hi,

I have a system running 3.4-STABLE that occasionally freezes or hangs for no obvious reason.

I'm after some pointers on how to isolate the problem, hopefully ending up with a system that stays up for more than about 3 days at a time.  I describe the symptoms and my configuration below, and also include my attempts at a post mortem...

The problem:
Most of the time the system just appears to freeze, with no error message on the console.  When this happens the IDE HD LED is usually on solid (but with no audible disk activity), the keyboard does not respond, and the system no longer responds to any form of network probing (arp, ping etc).  Recovery requires physical intervention in the form of a power off/on (there is no reset button).

This seems to occur most often when the system is under high load, but it also
occurs when the system is mostly idle.

I can do a make world (and also make -j4 world) with no problems, which suggests to me that the hardware is pretty much ok.  I have not swapped the various components around or tried parts from other systems, but I am reasonably confident in the hardware: the hard disk and ethernet cards certainly worked in my other PC for 6+ months with no problems.

The most recent time it occurred, I had some moderate NFS traffic (cvs update across 10Mbps ethernet, with the frozen system as the NFS server), some fairly heavy local disk activity at the same time (local cvs update from a local repository), and the system was also routing light http traffic to/from the ISP (multilink userppp, 2x56k modems at about 8k/s sustained traffic).
After the reboot, there was no coredump generated.

I have the following output from an attempt to debug a panic from several days ago.  This was my first time using gdb, so I didn't get very far: I basically followed the example in the faq/handbook but didn't know what to make of the output...

kernel post mortem output:

17:08:18 tony@random (/usr/src/sys/compile/RAND)$ gdb -kernel kernel.debug /var/crash/vmcore.5
17:10:19 tony@random (/usr/src/sys/compile/RAND)$ gdb -kernel
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd".
(kgdb) symbol-file kernel.debug
Reading symbols from kernel.debug...done.
(kgdb) exec-file /var/crash/kernel.5
(kgdb) core-file /var/crash/vmcore.5
IdlePTD 2670592
initial pcb at 222250
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address	= 0x0
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0x0
stack pointer	        = 0x10:0xc33b2acc
frame pointer	        = 0x10:0xc33b2ad4
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 167 (nfsd)
interrupt mask		= 
trap number		= 12
panic: page fault

syncing disks... 

Fatal trap 12: page fault while in kernel mode
fault virtual address	= 0x0
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0x0
stack pointer	        = 0x10:0xc33b28d0
frame pointer	        = 0x10:0xc33b28d8
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 167 (nfsd)
interrupt mask		= bio 
trap number		= 12
panic: page fault

dumping to dev 20001, offset 88064
dump 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 
---
#0  boot (howto=260) at ../../kern/kern_shutdown.c:285
285			dumppcb.pcb_cr3 = rcr3();
(kgdb) where
#0  boot (howto=260) at ../../kern/kern_shutdown.c:285
#1  0xc012c1b8 in at_shutdown (
    function=0xc020a6de <__set_sysinit_set_sym_memdev_sys_init+1050>, 
    arg=0xc3376c80, queue=-1019762752) at ../../kern/kern_shutdown.c:446
#2  0xc01e6099 in trap_fatal (frame=0xc33b2894, eva=0)
    at ../../i386/i386/trap.c:942
#3  0xc01e5d77 in trap_pfault (frame=0xc33b2894, usermode=0, eva=0)
    at ../../i386/i386/trap.c:835
#4  0xc01e59ee in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -1053105680, 
      tf_esi = -1019799296, tf_ebp = -1019533096, tf_isp = -1019533124, 
      tf_ebx = 6144, tf_edx = -1019533040, tf_ecx = 40, tf_eax = 0, 
      tf_trapno = 12, tf_err = 0, tf_eip = 0, tf_cs = 8, tf_eflags = 66198, 
      tf_esp = -1072316847, tf_ss = -1019533040}) at ../../i386/i386/trap.c:437
#5  0x0 in ?? ()
(kgdb) up 4
#4  0xc01e59ee in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -1053105680, 
      tf_esi = -1019799296, tf_ebp = -1019533096, tf_isp = -1019533124, 
      tf_ebx = 6144, tf_edx = -1019533040, tf_ecx = 40, tf_eax = 0, 
      tf_trapno = 12, tf_err = 0, tf_eip = 0, tf_cs = 8, tf_eflags = 66198, 
      tf_esp = -1072316847, tf_ss = -1019533040}) at ../../i386/i386/trap.c:437
437				(void) trap_pfault(&frame, FALSE, eva);
(kgdb) frame frame->tf_ebp frame->tf_eip
#0  0x0 in ?? ()
(kgdb) list
432	#endif
433			/* kernel trap */
434	
435			switch (type) {
436			case T_PAGEFLT:			/* page fault */
437				(void) trap_pfault(&frame, FALSE, eva);
438				return;
439	
440			case T_DNA:
441	#if NNPX > 0
(kgdb) up
#1  0xc012bf42 in boot (howto=-1019533040) at ../../kern/kern_shutdown.c:287
287			dumpsys();
(kgdb) up
#2  0xc01be691 in ufs_vnoperatespec (ap=0xc33b2910)
    at ../../ufs/ufs/ufs_vnops.c:2318
2318		return (VOCALL(ufs_specop_p, ap->a_desc->vdesc_offset, ap));
(kgdb) up
#3  0xc014a3a3 in vfs_bio_awrite (bp=0xc13ae1f0) at vnode_if.h:1145
1145		return (VCALL((bp)->b_vp, VOFFSET(vop_bwrite), &a));
(kgdb) up
#4  0xc01b85ea in ffs_fsync (ap=0xc33b2998) at ../../ufs/ffs/ffs_vnops.c:205
205				vfs_bio_awrite(bp);
(kgdb) up
#5  0xc01b6a93 in ffs_sync (mp=0xc0772600, waitfor=2, cred=0xc04da680, 
    p=0xc0236f94) at vnode_if.h:499
499		return (VCALL(vp, VOFFSET(vop_fsync), &a));
(kgdb) up
#6  0xc015291f in sync (p=0xc0236f94, uap=0x0) at ../../kern/vfs_syscalls.c:549
549				VFS_SYNC(mp, MNT_NOWAIT,
(kgdb) up
#7  0xc012bd79 in boot (howto=256) at ../../kern/kern_shutdown.c:203
203			sync(&proc0, NULL);
(kgdb) up
#8  0xc012c1b8 in at_shutdown (
    function=0xc020a6de <__set_sysinit_set_sym_memdev_sys_init+1050>, 
    arg=0xc3376c80, queue=-1019762752) at ../../kern/kern_shutdown.c:446
446		boot(bootopt);
(kgdb) up
#9  0xc01e6099 in trap_fatal (frame=0xc33b2a90, eva=0)
    at ../../i386/i386/trap.c:942
942			panic(trap_msg[type]);
(kgdb) up
#10 0xc01e5d77 in trap_pfault (frame=0xc33b2a90, usermode=0, eva=0)
    at ../../i386/i386/trap.c:835
835			trap_fatal(frame, eva);
(kgdb) up
#11 0xc01e59ee in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -1063922432, 
      tf_esi = 0, tf_ebp = -1019532588, tf_isp = -1019532616, 
      tf_ebx = -1053129064, tf_edx = -1019532548, tf_ecx = 34, tf_eax = 0, 
      tf_trapno = 12, tf_err = 0, tf_eip = 0, tf_cs = 8, tf_eflags = 66071, 
      tf_esp = -1072316847, tf_ss = -1019532548}) at ../../i386/i386/trap.c:437
437				(void) trap_pfault(&frame, FALSE, eva);
(kgdb) up
#12 0x0 in ?? ()
(kgdb) up
Initial frame selected; you cannot go up.
(kgdb) list
432	#endif
433			/* kernel trap */
434	
435			switch (type) {
436			case T_PAGEFLT:			/* page fault */
437				(void) trap_pfault(&frame, FALSE, eva);
438				return;
439	
440			case T_DNA:
441	#if NNPX > 0
(kgdb) quit

System details:
Hardware: IBM PC340 (Intel P100, 32MB RAM, 3GB IDE HDD)
Also a Netgear FA310TX (pn0), an NE2000 (ed0), and an IBM Auto 16/4 Token Ring card (not used or probed)
Software: 3.4-STABLE, kernel based on GENERIC with everything but pn0 and ed0 removed and BRIDGE+SOFTUPDATES added (see included file at end)
Also, I patched the pn0 driver to support bridging, but the problem appears to exist whether or not I apply this patch.  So far the bridging seems to work as I would expect, but I may be missing something here too...

Dmesg output and kernel config are included below, along with the patch for pn0.

*** start
--- if_pn.c.original	Thu Jan 13 22:41:52 2000
+++ if_pn.c	Thu Jan 13 22:43:12 2000
@@ -77,6 +77,11 @@
 #include <net/bpf.h>
 #endif
 
+#include "opt_bdg.h"
+#ifdef BRIDGE
+#include <net/bridge.h>
+#endif
+
 #include <vm/vm.h>              /* for vtophys */
 #include <vm/pmap.h>            /* for vtophys */
 #include <machine/clock.h>      /* for DELAY */
@@ -1586,6 +1591,24 @@
 			}
 		}
 #endif
+
+#ifdef BRIDGE
+
+ 	/* Copied from if_xl.c and placed in about the same spot */
+
+ 	if (do_bridge) {
+ 		struct ifnet *bdg_ifp;
+ 		bdg_ifp = bridge_in(m);
+ 		if (bdg_ifp != BDG_LOCAL && bdg_ifp != BDG_DROP)
+			bdg_forward(&m, bdg_ifp);
+		if (((bdg_ifp != BDG_LOCAL) && (bdg_ifp != BDG_BCAST) &&
+ 			(bdg_ifp != BDG_MCAST)) || bdg_ifp == BDG_DROP) {
+			m_freem(m);
+ 			continue;
+		}
+	}
+#endif
+
 		/* Remove header from mbuf and pass it on. */
 		m_adj(m, sizeof(struct ether_header));
 		ether_input(ifp, eh, m);
*** end

dmesg output:

Copyright (c) 1992-1999 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
	The Regents of the University of California. All rights reserved.
FreeBSD 3.4-STABLE #0: Thu Jan 13 21:18:29 EST 2000
    tony@random.n2-au:/usr1/src/sys/compile/RAND
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium/P54C (99.47-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x525  Stepping = 5
  Features=0x1bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8>
real memory  = 33554432 (32768K bytes)
avail memory = 30281728 (29572K bytes)
Preloaded elf kernel "kernel" at 0xc0278000.
Probing for devices on PCI bus 0:
chip0: <Host to PCI bridge (vendor=1039 device=5511)> rev 0x00 on pci0.0.0
chip1: <SiS 85c503> rev 0x01 on pci0.1.0
ide_pci0: <PCI IDE controller (busmaster capable)> rev 0x08 int a irq 0 on pci0.1.1
pn0: <82c169 PNIC 10/100BaseTX> rev 0x21 int a irq 10 on pci0.14.0
pn0: Ethernet address: 00:a0:cc:3c:d1:bf
pn0: autoneg complete, link status good (half-duplex, 10Mbps)
vga0: <Cirrus Logic GD5436 SVGA controller> rev 0x00 on pci0.20.0
Probing for devices on the ISA bus:
sc0 on isa
sc0: VGA color <16 virtual consoles, flags=0x0>
ed0 at 0x340-0x35f irq 5 on isa
ed0: address 00:40:c7:11:c5:62, type NE2000 (16 bit) 
atkbdc0 at 0x60-0x6f on motherboard
atkbd0 irq 1 on isa
psm0 not found
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
wdc0 at 0x1f0-0x1f7 irq 14 flags 0xa0ffa0ff on isa
wdc0: unit 0 (wd0): <FUJITSU MPC3032AT>, 32-bit, multi-block-16
wd0: 3093MB (6335280 sectors), 6704 cyls, 15 heads, 63 S/T, 512 B/S
ppc0 at 0x378 irq 7 flags 0x40 on isa
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/15 bytes threshold
lpt0: <generic printer> on ppbus 0
lpt0: Interrupt-driven port
ppi0: <generic parallel i/o> on ppbus 0
plip0: <PLIP network interface> on ppbus 0
vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
npx0 on motherboard
npx0: INT 16 interface
apm0 flags 0x31 on isa
apm: found APM BIOS version 1.2
Intel Pentium detected, installing workaround for F00F bug
BRIDGE 990810, have 6 interfaces
-- index 1  type 6 phy 0 addrl 6 addr 00.a0.cc.3c.d1.bf
-- index 2  type 6 phy 0 addrl 6 addr 00.40.c7.11.c5.62
changing root device to wd0s1a
WARNING: / was not properly dismounted

Kernel Config:

machine		"i386"
cpu		"I486_CPU"
cpu		"I586_CPU"
ident		RAND
maxusers	32

makeoptions 	DEBUG="-g"

options 	SOFTUPDATES
options 	INET			#InterNETworking
options 	FFS			#Berkeley Fast Filesystem
options 	FFS_ROOT		#FFS usable as root device [keep this!]
options 	NFS			#Network Filesystem
options 	PROCFS			#Process filesystem
options 	"COMPAT_43"		#Compatible with BSD 4.3 [KEEP THIS!]
options 	UCONSOLE		#Allow users to grab the console
options 	FAILSAFE		#Be conservative
options 	USERCONFIG		#boot -c editor
options 	KTRACE			#ktrace(1) syscall trace support
options 	SYSVSHM			#SYSV-style shared memory
options 	SYSVMSG			#SYSV-style message queues
options 	SYSVSEM			#SYSV-style semaphores

config		kernel	root on wd0

controller	isa0
controller	pci0

# IDE controller and disks
controller	wdc0	at isa? flags 0xa0ffa0ff port "IO_WD1" bio irq 14
disk		wd0	at wdc0 drive 0

# atkbdc0 controls both the keyboard and the PS/2 mouse
controller	atkbdc0	at isa? port IO_KBD tty
device		atkbd0	at isa? tty irq 1
device		psm0	at isa? tty irq 12

device		vga0	at isa? port ? conflicts

# splash screen/screen saver
pseudo-device	splash

# syscons is the default console driver, resembling an SCO console
device		sc0	at isa? tty

# Floating point support - do not disable.
device		npx0	at isa? port IO_NPX irq 13

# Power management support (see LINT for more options)
device		apm0    at isa?	flags 0x31 # Advanced Power Management

# Serial (COM) ports
device		sio0	at isa? port "IO_COM1" flags 0x10 tty irq 4
device		sio1	at isa? port "IO_COM2" tty irq 3

# Parallel port
device		ppc0	at isa? port? flags 0x40 net irq 7
controller	ppbus0			# Parallel port bus (required)
device		lpt0	at ppbus?	# Printer
device		plip0	at ppbus?	# TCP/IP over parallel
device		ppi0	at ppbus?	# Parallel port interface device

options 	BRIDGE

# ISA Ethernet NICs.
device		ed0	at isa? port 0x340 net irq 5 iomem 0xd8000

# PCI Ethernet NICs.
device 	pn0

# Pseudo devices - the number indicates how many units to allocate.
pseudo-device	loop		# Network loopback
pseudo-device	ether		# Ethernet support
pseudo-device	tun	2	# User-PPP
pseudo-device	pty	16	# Pseudo-ttys (telnet etc)
pseudo-device	gzip		# Exec gzipped a.out's
pseudo-device 	vn

pseudo-device	bpfilter 4	#Berkeley packet filter

-- 
 Tony Frank                                               _ __  ___ ___ ___
 tfrank@eric.net.au                                   _ __ ___ | _ ) __|   \
                      http://www.freebsd.org/    _ __ ___ ____ | _ \__ \ |) |
 FreeBSD: The Power to Serve!              _ __ ___ ____ _____ |___/___/___/


