Date:      Thu, 8 Feb 1996 19:16:26 -0500
From:      Esa Ahola <esa@mindspring.com>
To:        FreeBSD-gnats-submit@freebsd.org
Subject:   kern/1008: Daily crash while writing network backups to local tape 
Message-ID:  <199602090016.TAA14805@firebrick.mindspring.com>
Resent-Message-ID: <199602090020.QAA11743@freefall.freebsd.org>

>Number:         1008
>Category:       kern
>Synopsis:       Daily crash while writing network backups to local tape
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Feb  8 16:20:01 PST 1996
>Last-Modified:
>Originator:     Esa Ahola <esa@mindspring.com>
>Organization:
MindSpring Enterprises, Inc.
>Release:        FreeBSD 2.1-STABLE i386 (sup date 1/29)
>Environment:

	Differences from 2.1R:

	--- GENERIC	Wed Oct 25 13:29:51 1995
	+++ FIREBRICK	Thu Feb  8 17:31:53 1996
	@@ -11,2 +11,2 @@
	-ident		GENERIC
	-maxusers	10
	+ident		FIREBRICK
	+maxusers	254
	@@ -22 +22 @@
	-options		"SCSI_DELAY=15"		#Be pessimistic about Joe SCSI device
	+options		"SCSI_DELAY=5"		#Be pessimistic about Joe SCSI device
	@@ -24,0 +25,6 @@
	+options		COMCONSOLE		#prefer serial console to video console
	+options		KTRACE			#kernel tracing
	+options		"CHILD_MAX=256"
	+options		"OPEN_MAX=512"
	+options 	"MAXMEM=131072"         
	+options 	"NMBCLUSTERS=4096"
	@@ -30 +36 @@
	-config		kernel	root on wd0 
	+config		kernel	root on sd0 
	@@ -40,7 +46,7 @@
	-controller	wdc0	at isa? port "IO_WD1" bio irq 14 vector wdintr
	-disk		wd0	at wdc0 drive 0
	-disk		wd1	at wdc0 drive 1
	-
	-controller	wdc1	at isa? port "IO_WD2" bio irq 15 vector wdintr
	-disk		wd2	at wdc1 drive 0
	-disk		wd3	at wdc1 drive 1
	+# controller	wdc0	at isa? port "IO_WD1" bio irq 14 vector wdintr
	+# disk		wd0	at wdc0 drive 0
	+# disk		wd1	at wdc0 drive 1
	+
	+# controller	wdc1	at isa? port "IO_WD2" bio irq 15 vector wdintr
	+# disk		wd2	at wdc1 drive 0
	+# disk		wd3	at wdc1 drive 1
	@@ -122 +128 @@
	-pseudo-device	pty	16
	+pseudo-device	pty	64
	@@ -123,0 +130 @@
	+pseudo-device	bpfilter 4

	Hardware:
	- ASUS P55TP4XE P133
	- ASUS SC-2000 SCSI (2)
	- ZNYX fast ethernet

	dmesg:

	FreeBSD 2.1-STABLE #0: Wed Jan 31 01:49:20 EST 1996
	    root@firebrick.mindspring.com:/usr/src-stable/sys/compile/FIREBRICK
	CPU: 133-MHz Pentium 735\90 or 815\100 (Pentium-class CPU)
	  Origin = "GenuineIntel"  Id = 0x52b  Stepping=11
	  Features=0x1bf<FPU,VME,PSE,MCE,CX8,APIC>
	real memory  = 134217728 (131072K bytes)
	avail memory = 127660032 (124668K bytes)
	Probing for devices on PCI bus 0:
	chip0 <Intel 82437 (Triton)> rev 2 on pci0:0
	chip1 <Intel 82371 (Triton)> rev 2 on pci0:7
	ncr0 <ncr 53c810 scsi> rev 2 int a irq 12 on pci0:10
	ncr0 waiting for scsi devices to settle
	(ncr0:0:0): "SEAGATE ST32550N 0014" type 0 fixed SCSI 2
	sd0(ncr0:0:0): Direct-Access 
	sd0(ncr0:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
	2047MB (4194058 512 byte sectors)
	(ncr0:1:0): "SEAGATE ST32550N 0014" type 0 fixed SCSI 2
	sd1(ncr0:1:0): Direct-Access 
	sd1(ncr0:1:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
	2047MB (4194058 512 byte sectors)
	(ncr0:4:0): "DEC DLT2000 8202" type 1 removable SCSI 2
	st0(ncr0:4:0): Sequential-Access 
	st0(ncr0:4:0): 200ns (5 Mb/sec) offset 8.
	density code 0x19,  drive empty
	ncr1 <ncr 53c810 scsi> rev 2 int a irq 10 on pci0:11
	ncr1 waiting for scsi devices to settle
	(ncr1:0:0): "SEAGATE ST15150N 0020" type 0 fixed SCSI 2
	sd2(ncr1:0:0): Direct-Access 
	sd2(ncr1:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
	4095MB (8388315 512 byte sectors)
	(ncr1:1:0): "SEAGATE ST15150N 0020" type 0 fixed SCSI 2
	sd3(ncr1:1:0): Direct-Access 
	sd3(ncr1:1:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
	4095MB (8388315 512 byte sectors)
	(ncr1:2:0): "SEAGATE ST15150N 0020" type 0 fixed SCSI 2
	sd4(ncr1:2:0): Direct-Access 
	sd4(ncr1:2:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
	4095MB (8388315 512 byte sectors)
	(ncr1:3:0): "SEAGATE ST15150N 0020" type 0 fixed SCSI 2
	sd5(ncr1:3:0): Direct-Access 
	sd5(ncr1:3:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
	4095MB (8388315 512 byte sectors)
	de0 <Digital DC21140 Fast Ethernet> rev 17 int a irq 11 on pci0:12
	de0: ZNYX ZX34X DC21140 [10-100Mb/s] pass 1.1 Ethernet address 00:c0:95:f8:05:d8
	de0: enabling 100baseTX UTP port
	Probing for devices on the ISA bus:
	scprobe: keyboard RESET failed fe
	sc0 at 0x60-0x6f irq 1 on motherboard
	sc0: VGA color <16 virtual consoles, flags=0x0>
	ed0 not found at 0x280
	ed1 not found at 0x300
	sio0 at 0x3f8-0x3ff irq 4 on isa
	sio0: type 16550A
	sio1 at 0x2f8-0x2ff irq 3 on isa
	sio1: type 16550A
	sio2 not found at 0x3e8
	sio3 not found at 0x2e8
	lpt0 at 0x378-0x37f irq 7 on isa
	lpt0: Interrupt-driven port
	lp0: TCP/IP capable interface
	lpt1 not found at 0xffffffff
	lpt2 not found at 0xffffffff
	mse0: wrong signature ff
	mse0 not found at 0x23c
	fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
	fdc0: NEC 72065B
	fd0: 1.44MB 3.5in
	uha0 not found at 0x330
	aha0 not found at 0x330
	aic0 not found at 0x340
	nca0 not found at 0x1f88
	nca1 not found at 0x350
	sea0 not found
	wt0 not found at 0x300
	mcd0: timeout getting status
	mcd0 not found at 0x300
	mcd1: timeout getting status
	mcd1 not found at 0x340
	matcdc0 not found at 0x230
	scd0 not found at 0x230
	ie0 not found at 0x360
	ep0 not found at 0x300
	ix0 not found at 0x300
	le0: no board found at 0x300
	le0 not found at 0x300
	lnc0 not found at 0x280
	lnc1 not found at 0x300
	ze0 not found at 0x300
	zp0 not found at 0x300
	npx0 on motherboard
	npx0: INT 16 interface

>Description:

	A lightly loaded news server (carrying a full newsfeed) crashes
	most nights while backups from remote machines are being written
	to the local DLT tape drive.  Crashes have occurred only during
	tape I/O.  Newsfeed activity and newsreader load don't seem to
	matter.

	crash.4:
	IdlePTD 208000
	current pcb at 1f58c0
	panic: page fault
	#0  boot (howto=256) at ../../i386/i386/machdep.c:894
	894                                     dumppcb.pcb_ptd = rcr3();
	(kgdb) bt
	#0  boot (howto=256) at ../../i386/i386/machdep.c:894
	#1  0xf01134c3 in panic (fmt=0xf01a2ecc "page fault")
	    at ../../kern/subr_prf.c:124
	#2  0xf01a39ce in trap_fatal (frame=0xefbffd04) at ../../i386/i386/trap.c:746
	#3  0xf01a3540 in trap_pfault (frame=0xefbffd04, usermode=0)
	    at ../../i386/i386/trap.c:668
	#4  0xf01a31df in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -230151168, 
	      tf_esi = -221137920, tf_ebp = -272630412, tf_isp = -266940001, 
	      tf_ebx = -1073610748, tf_edx = -266940004, tf_ecx = -1073543014, 
	      tf_eax = 1, tf_trapno = 12, tf_err = 2, tf_eip = -266940001, tf_cs = 8, 
	      tf_eflags = 66178, tf_esp = -267151551, tf_ss = -230151168})
	    at ../../i386/i386/trap.c:308
	#5  0xf019937d in calltrap ()
	#6  0xf016d19f in tulip_addr_filter (sc=0xf2482c00) at ../../pci/if_de.c:1847
	#7  0xf01455d6 in ip_output (m0=0xf2d1b400, opt=0x0, ro=0xf2d5e9ac, flags=0, 
	    imo=0x0) at ../../netinet/ip_output.c:324
	#8  0xf01494ee in tcp_output (tp=0xf2abe400) at ../../netinet/tcp_output.c:668
	#9  0xf014a2e2 in tcp_usrreq (so=0xf2a1ba00, req=8, m=0x0, nam=0x0, 
	    control=0x0) at ../../netinet/tcp_usrreq.c:272
	#10 0xf01207c7 in soreceive (so=0xf2a1ba00, paddr=0x0, uio=0xefbfff2c, 
	    mp0=0x0, controlp=0x0, flagsp=0x0) at ../../kern/uipc_socket.c:786
	#11 0xf01158a9 in soo_read (fp=0xf2e29880, uio=0xefbfff2c, cred=0xf1c75000)
	    at ../../kern/sys_socket.c:63
	#12 0xf01146e7 in read (p=0xf2a74200, uap=0xefbfff94, retval=0xefbfff8c)
	    at ../../kern/sys_generic.c:112
	#13 0xf01a3c9b in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 9, 
	      tf_esi = -272639912, tf_ebp = -272640008, tf_isp = -272629788, 
	      tf_ebx = 5, tf_edx = 0, tf_ecx = 5, tf_eax = 3, tf_trapno = 514, 
	      tf_err = 514, tf_eip = 134865573, tf_cs = 31, tf_eflags = 514, 
	      tf_esp = -272656416, tf_ss = 39}) at ../../i386/i386/trap.c:906
	#14 0xf01993cb in Xsyscall ()
	#15 0x9aa3 in ?? ()
	#16 0x4139 in ?? ()
	#17 0x3e9b in ?? ()
	#18 0x3896 in ?? ()
	#19 0x3142 in ?? ()
	#20 0x2d22 in ?? ()
	#21 0x10d3 in ?? ()
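
	kgdb prints the trap frame registers as signed decimal integers,
	which makes them hard to compare with the hex addresses in the
	backtrace.  The following is a minimal sketch of the conversion,
	not part of the original crash analysis.  The helper names
	(to_u32, hex_fields) are made up for illustration; the values are
	copied from frame #4 of crash.4 above.

```python
def to_u32(value: int) -> int:
    """Reinterpret a signed 32-bit value (as kgdb prints it) as unsigned."""
    return value & 0xFFFFFFFF


# Selected fields copied from the crash.4 trap frame (frame #4).
crash4_frame = {
    "tf_eip": -266940001,
    "tf_isp": -266940001,
    "tf_edx": -266940004,
    "tf_ebp": -272630412,
}


def hex_fields(frame: dict) -> dict:
    """Return the same fields rendered as 8-digit hex strings."""
    return {name: f"0x{to_u32(value):08x}" for name, value in frame.items()}


if __name__ == "__main__":
    for name, text in hex_fields(crash4_frame).items():
        print(f"{name} = {text}")
```

	Decoded this way, tf_eip is 0xf016d19f, the same address kgdb
	reports for frame #6 (tulip_addr_filter), so that frame comes
	straight from the faulting instruction pointer.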


	crash.5:
	IdlePTD 208000
	current pcb at 1f58c0
	panic: page fault
	#0  boot (howto=256) at ../../i386/i386/machdep.c:894
	894                                     dumppcb.pcb_ptd = rcr3();
	(kgdb) bt
	#0  boot (howto=256) at ../../i386/i386/machdep.c:894
	#1  0xf01134c3 in panic (fmt=0xf01a2ecc "page fault")
	    at ../../kern/subr_prf.c:124
	#2  0xf01a39ce in trap_fatal (frame=0xefbffcd8) at ../../i386/i386/trap.c:746
	#3  0xf01a3540 in trap_pfault (frame=0xefbffcd8, usermode=0)
	    at ../../i386/i386/trap.c:668
	#4  0xf01a31df in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -230151168, 
	      tf_esi = -238595840, tf_ebp = -272630488, tf_isp = 1963065219, 
	      tf_ebx = -230151168, tf_edx = -266932336, tf_ecx = 1017, 
	      tf_eax = 1963065219, tf_trapno = 12, tf_err = 0, tf_eip = 1963065219, 
	      tf_cs = 8, tf_eflags = 66050, tf_esp = -266902197, tf_ss = -230151168})
	    at ../../i386/i386/trap.c:308
	#5  0xf019937d in calltrap ()
	#6  0x7501ff83 in ?? ()
	#7  0xf016f2c6 in ncr_complete (np=0xf2482c00, cp=0xf2d35680)
	    at ../../pci/ncr.c:4317
	#8  0xf01455d6 in ip_output (m0=0xf2d35680, opt=0x0, ro=0xf28ff82c, flags=0, 
	    imo=0x0) at ../../netinet/ip_output.c:324
	#9  0xf01494ee in tcp_output (tp=0xf289c100) at ../../netinet/tcp_output.c:668
	#10 0xf014a2e2 in tcp_usrreq (so=0xf2884300, req=8, m=0x0, nam=0x0, 
	    control=0x0) at ../../netinet/tcp_usrreq.c:272
	#11 0xf01207c7 in soreceive (so=0xf2884300, paddr=0x0, uio=0xefbfff2c, 
	    mp0=0x0, controlp=0x0, flagsp=0x0) at ../../kern/uipc_socket.c:786
	#12 0xf01158a9 in soo_read (fp=0xf2e12dc0, uio=0xefbfff2c, cred=0xf1c75000)
	    at ../../kern/sys_socket.c:63
	#13 0xf01146e7 in read (p=0xf2b25000, uap=0xefbfff94, retval=0xefbfff8c)
	    at ../../kern/sys_generic.c:112
	#14 0xf01a3c9b in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 9, 
	      tf_esi = -272639912, tf_ebp = -272640008, tf_isp = -272629788, 
	      tf_ebx = 5, tf_edx = 0, tf_ecx = 5, tf_eax = 3, tf_trapno = 514, 
	      tf_err = 514, tf_eip = 134865573, tf_cs = 31, tf_eflags = 514, 
	      tf_esp = -272656416, tf_ss = 39}) at ../../i386/i386/trap.c:906
	#15 0xf01993cb in Xsyscall ()
	#16 0x9aa3 in ?? ()
	#17 0x4139 in ?? ()
	#18 0x3e9b in ?? ()
	#19 0x3896 in ?? ()
	#20 0x3142 in ?? ()
	#21 0x2d22 in ?? ()
	#22 0x10d3 in ?? ()
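
	The two trap frames differ in tf_err and tf_eip, and both can be
	decoded mechanically.  The sketch below applies the standard i386
	page-fault error code bits (bit 0: protection violation rather
	than not-present page, bit 1: write, bit 2: user mode).
	KERNEL_BASE is an assumption inferred from the 0xf01xxxxx
	addresses in the backtraces, and the function names are
	hypothetical.

```python
KERNEL_BASE = 0xF0000000  # assumption: inferred from the backtrace addresses


def to_u32(value: int) -> int:
    """Reinterpret a signed 32-bit value (as kgdb prints it) as unsigned."""
    return value & 0xFFFFFFFF


def decode_pf_err(err: int) -> dict:
    """Split an i386 page-fault error code into its three low bits."""
    return {
        "protection_violation": bool(err & 0x1),  # False: page not present
        "write": bool(err & 0x2),                 # False: read access
        "user_mode": bool(err & 0x4),             # False: fault in kernel mode
    }


def eip_in_kernel(tf_eip: int) -> bool:
    """True if the faulting instruction pointer is at or above KERNEL_BASE."""
    return to_u32(tf_eip) >= KERNEL_BASE


# (tf_err, tf_eip) copied from the trap frames in crash.4 and crash.5.
crashes = {"crash.4": (2, -266940001), "crash.5": (0, 1963065219)}

if __name__ == "__main__":
    for name, (tf_err, tf_eip) in crashes.items():
        bits = decode_pf_err(tf_err)
        access = "write" if bits["write"] else "read"
        where = "inside" if eip_in_kernel(tf_eip) else "outside"
        print(f"{name}: {access} fault, eip=0x{to_u32(tf_eip):08x} "
              f"({where} kernel range)")
```

	Under these assumptions crash.4 is a kernel-mode write to a
	not-present page with the instruction pointer inside the kernel,
	while crash.5 is a kernel-mode read fault with the instruction
	pointer at 0x7501ff83, well outside the kernel.  The latter
	matches frame #6 of crash.5 and would fit a jump through a bad
	pointer, though it does not show where the bad value came from.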

	A colleague commented:

	> Notice that both have this in common:

	> #7  0xf01455d6 in ip_output (m0=0xf2d1b400, opt=0x0, ro=0xf2d5e9ac, flags=0, 
	>     imo=0x0) at ../../netinet/ip_output.c:324

	> In both traces the frame above it (the apparent callee) is a
	> completely unrelated function (ncr_complete or tulip_addr_filter).
	> Either they're interrupt handlers or random jumps.
	> tulip_addr_filter is called in two places:
	> to add or delete multicast addresses to the DEC21140 board's filter
	> list, and when resetting the board (which happens when initializing
	> it, when recovering from certain errors in the interrupt handler, when
	> changing the physical port, etc.).  I would expect to see a function
	> on the call stack before it though, because tulip_addr_filter doesn't
	> seem to be an interrupt handler itself.

	> Here's the relevant part of ip_output.c:

	> sendit: 
	> 	/*
	> 	 * If small enough for interface, can just send directly.
	> 	 */
	> 	if ((u_short)ip->ip_len <= ifp->if_mtu) {
	> 		ip->ip_len = htons((u_short)ip->ip_len);
	> 		ip->ip_off = htons((u_short)ip->ip_off);
	> 		ip->ip_sum = 0; 
	> 		ip->ip_sum = in_cksum(m, hlen);
	> 324:            error = (*ifp->if_output)(ifp, m,
	> 				(struct sockaddr *)dst, ro->ro_rt);
	> 		goto done;
	> 	}

	> Perhaps ifp->if_output has been corrupted somehow?
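
	One further check that may bear on this: the argument values kgdb
	prints for the two "unrelated" frames can be compared with the
	pointers in the neighbouring frames.  The sketch below is only an
	illustration; the regular expressions and the shared_pointers
	helper are made up for this purpose, and the frame lines are
	copied from the two backtraces with continuation lines joined.

```python
import re
from collections import defaultdict

FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-f]+ in (\w+) \((.*)\)")
ARG_RE = re.compile(r"(\w+)=(0x[0-9a-f]+)")


def shared_pointers(frames):
    """Map each non-null hex argument value to the func:arg names using it,
    keeping only values that appear more than once."""
    seen = defaultdict(list)
    for line in frames:
        match = FRAME_RE.search(line)
        if not match:
            continue
        func, args = match.group(2), match.group(3)
        for name, value in ARG_RE.findall(args):
            if int(value, 16) != 0:  # skip NULL pointers such as opt=0x0
                seen[value].append(f"{func}:{name}")
    return {value: users for value, users in seen.items() if len(users) > 1}


frames = [
    # crash.4, frame #6
    "#6  0xf016d19f in tulip_addr_filter (sc=0xf2482c00)",
    # crash.5, frames #7 and #8
    "#7  0xf016f2c6 in ncr_complete (np=0xf2482c00, cp=0xf2d35680)",
    "#8  0xf01455d6 in ip_output (m0=0xf2d35680, opt=0x0, ro=0xf28ff82c, "
    "flags=0, imo=0x0)",
]

if __name__ == "__main__":
    for value, users in shared_pointers(frames).items():
        print(value, users)
```

	The same pointer, 0xf2482c00, shows up as tulip_addr_filter's sc in
	crash.4 and as ncr_complete's np in crash.5, and ncr_complete's cp
	equals the mbuf passed to ip_output.  That would be consistent with
	kgdb reading the (ifp, m) arguments of the if_output call and
	attaching them to whatever symbol a bogus address happens to fall
	in, but it is not proof of that.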


>How-To-Repeat:

	Run backups. :-\

>Fix:
	
	None.

>Audit-Trail:
>Unformatted:


