Date:      Sun, 3 Mar 2002 12:01:21 +0700
From:      Eugene Grosbein <eugen@www.svzserv.kemerovo.su>
To:        stable@freebsd.org
Subject:   4.5-STABLE softupdates brokeness: repeated panics and lockups
Message-ID:  <20020303120121.A2197@svzserv.kemerovo.su>

Urgent! Please help!

My somewhat old 4.5-STABLE system suffered from hanging network connections.
Turning off syncookies helped, but I had read that this was already fixed
in -STABLE, so on 1 March 2002 I ran cvsup and rebuilt the kernel and world
as usual. Now I can state that the soft updates code is BROKEN for me.

That night my server crashed hard. I have options DDB and DDB_UNATTENDED,
my kernel is built with debugging symbols, savecore is enabled
in /etc/rc.conf, and there is enough swap space and disk space in /var,
so the server should have saved a core dump and restarted after the panic.
It failed to do that. I usually lock the console with vlock, and this
prevented me from escaping to DDB, so I was forced to power-cycle the
machine the next morning. There was nothing suspicious in the logs besides this:

Mar  1 22:35:38 <kern.crit> www /kernel: z_decompress0: inflate returned -2 ()

That was the last record before crash.
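For anyone reproducing the crash-dump setup described above: on FreeBSD 4.x the /etc/rc.conf part comes down to a couple of variables read at boot, while the kernel part is the DDB, DDB_UNATTENDED and `makeoptions DEBUG=-g` lines in the config at the end of this message. A minimal sketch, assuming the dump goes to the swap partition /dev/ad0s1b; that device is inferred from the "dumping to dev #ad/0x20001" line in the gdb session below and is not stated in the report:

```shell
# /etc/rc.conf fragment (FreeBSD 4.x) for saving kernel crash dumps.
# Assumption: the dump device is the swap partition /dev/ad0s1b, inferred
# from "dumping to dev #ad/0x20001" in the kgdb output, not from the report.
dumpdev="/dev/ad0s1b"   # partition the kernel writes its memory image to on panic
dumpdir="/var/crash"    # directory savecore(8) copies the dump into at boot
```

/var needs at least as much free space as physical memory (256M here) for savecore to succeed.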

So I left the console unlocked on 2 March, and today it crashed again.
This time it was a kernel panic, and the system locked up after the
'syncing disks...' message; not a single character was printed after the '...'.
The panic reason was 'panic: softdep_setup_allocdirect: lost block'.

I was able to escape to DDB and type 'trace' and 'panic', so
I got a crash dump. Again, the last message in the log was:

Mar  3 09:57:33 <kern.crit> www /kernel: z_decompress0: inflate returned -2 ()

After the reboot I started to investigate, and suddenly it crashed again!
And once more the last message in the log was:

Mar  3 10:33:51 <kern.crit> www /kernel: z_decompress0: inflate returned -2 ()

Uptime was only half an hour, eh?

So I decided to turn soft updates off with tunefs on all of my filesystems.
The root filesystem already had soft updates turned off.
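tunefs(8) refuses to change a filesystem that is mounted read-write, so switching soft updates off has to be done from single-user mode (or with the filesystems unmounted). A minimal dry-run sketch that only prints the commands; the helper name `print_tunefs_cmds` is hypothetical, and the device list is taken from the df output later in this message, with / (ad0s1a) left out because soft updates were already off there:

```shell
#!/bin/sh
# Dry run only: print one tunefs(8) command per device that clears the
# soft updates flag. Nothing is modified; run the printed commands from
# single-user mode. print_tunefs_cmds is a hypothetical helper name.
print_tunefs_cmds() {
    for dev in "$@"; do
        printf 'tunefs -n disable /dev/%s\n' "$dev"
    done
}

# Non-root filesystems from the df listing in this report.
print_tunefs_cmds ad0s1g ad0s1h ad0s1e ad0s1f ad1s1e ad2s1e
```

Piping the output to sh from single-user mode applies the change; `tunefs -p /dev/ad0s1g` (and so on) then shows the current flag for each filesystem.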

Here are some details from gdb:

Script started on Sun Mar  3 11:04:49 2002
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
IdlePTD at phsyical address 0x00382000
initial pcb at physical address 0x002e8c60
panicstr: from debugger
panic messages:
---
panic: softdep_setup_allocdirect: lost block

syncing disks... panic: from debugger
Uptime: 22h19m56s

dumping to dev #ad/0x20001, offset 2560
dump ata0: resetting devices .. done
254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235 234 233 232 231 230 229 228 227 226 225 224 223 222 221 220 219 218 217 216 215 214 213 212 211 210 209 208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 
---
#0  dumpsys () at /home3/src/sys/kern/kern_shutdown.c:487
487		if (dumping++) {
(kgdb) where
#0  dumpsys () at /home3/src/sys/kern/kern_shutdown.c:487
#1  0xc0149cfc in boot (howto=260) at /home3/src/sys/kern/kern_shutdown.c:316
#2  0xc014a149 in panic (fmt=0xc027dfc4 "from debugger")
    at /home3/src/sys/kern/kern_shutdown.c:595
#3  0xc0121379 in db_panic (addr=-1071285907, have_addr=0, count=-1, 
    modif=0xcef857d0 "") at /home3/src/sys/ddb/db_command.c:435
#4  0xc0121317 in db_command (last_cmdp=0xc02af1b4, cmd_table=0xc02aeff4, 
    aux_cmd_tablep=0xc02e3c78) at /home3/src/sys/ddb/db_command.c:333
#5  0xc01213de in db_command_loop () at /home3/src/sys/ddb/db_command.c:457
#6  0xc012358f in db_trap (type=3, code=0) at /home3/src/sys/ddb/db_trap.c:71
#7  0xc025770c in kdb_trap (type=3, code=0, regs=0xcef858d8)
    at /home3/src/sys/i386/i386/db_interface.c:158
#8  0xc0264c08 in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = -822607856, 
      tf_edi = 0, tf_esi = -1070620608, tf_ebp = -822585056, 
      tf_isp = -822585084, tf_ebx = 134, tf_edx = -1070974097, tf_ecx = 32, 
      tf_eax = 38, tf_trapno = 3, tf_err = 0, tf_eip = -1071285907, tf_cs = 8, 
      tf_eflags = 582, tf_esp = -1070974113, tf_ss = -1070986775})
    at /home3/src/sys/i386/i386/trap.c:584
#9  0xc025796d in Debugger (msg=0xc02a09e9 "manual escape to debugger")
    at machine/cpufunc.h:67
#10 0xc025486a in scgetc (sc=0xc02ffca0, flags=2)
    at /home3/src/sys/dev/syscons/syscons.c:3148
#11 0xc0250fa5 in sckbdevent (thiskbd=0xc02f8740, event=0, arg=0xc02ffca0)
    at /home3/src/sys/dev/syscons/syscons.c:616
#12 0xc0248567 in atkbd_intr (kbd=0xc02f8740, arg=0x0)
    at /home3/src/sys/dev/kbd/atkbd.c:462
#13 0xc0277c24 in atkbd_isa_intr (arg=0xc02f8740)
    at /home3/src/sys/isa/atkbd_isa.c:140
#14 0xc0147fe3 in add_interrupt_randomness (vsc=0xc02ff2ac)
    at /home3/src/sys/kern/kern_random.c:247
#15 0xc01f2798 in softdep_disk_write_complete (bp=0xc75b3c38)
    at /home3/src/sys/ufs/ffs/ffs_softdep.c:3248
#16 0xc01701f2 in vfs_backgroundwritedone (bp=0xc75b3c38)
    at /home3/src/sys/kern/vfs_bio.c:742
#17 0xc01725c4 in biodone (bp=0xc75b3c38) at /home3/src/sys/kern/vfs_bio.c:2701
#18 0xc023e522 in ad_interrupt (request=0xc2892200)
    at /home3/src/sys/dev/ata/ata-disk.c:703
#19 0xc0238aff in ata_intr (data=0xc229cb80)
    at /home3/src/sys/dev/ata/ata-all.c:1231
#20 0xc0147fe3 in add_interrupt_randomness (vsc=0xc02ff348)
    at /home3/src/sys/kern/kern_random.c:247
#21 0xc0259ab2 in vec14 ()
#22 0xc01ef086 in interlocked_sleep (lk=0xc02bfe7c, op=1, ident=0xce1a6884, 
    flags=17, wmesg=0xc029301f "drainvp", timo=0)
    at /home3/src/sys/ufs/ffs/ffs_softdep.c:329
#23 0xc01f4a4e in drain_output (vp=0xce1a6840, islocked=1)
    at /home3/src/sys/ufs/ffs/ffs_softdep.c:4913
#24 0xc01f3812 in softdep_fsync_mountdev (vp=0xce1a6840)
    at /home3/src/sys/ufs/ffs/ffs_softdep.c:4056
#25 0xc01f7b7a in ffs_fsync (ap=0xcef85c04)
    at /home3/src/sys/ufs/ffs/ffs_vnops.c:134
#26 0xc01f67cc in ffs_sync (mp=0xc234dc00, waitfor=2, cred=0xc0a78680, 
    p=0xc03003a0) at vnode_if.h:558
#27 0xc017aa47 in sync (p=0xc03003a0, uap=0x0)
    at /home3/src/sys/kern/vfs_syscalls.c:554
#28 0xc0149ab7 in boot (howto=256) at /home3/src/sys/kern/kern_shutdown.c:235
#29 0xc014a149 in panic (
    fmt=0xc0291d60 "softdep_setup_allocdirect: lost block")
    at /home3/src/sys/kern/kern_shutdown.c:595
#30 0xc01f0150 in softdep_setup_allocdirect (ip=0xc2a21900, lbn=0, 
    newblkno=398160, oldblkno=394920, newsize=8192, oldsize=8192, 
    bp=0xc758021c) at /home3/src/sys/ufs/ffs/ffs_softdep.c:1326
#31 0xc01eb0b3 in ffs_reallocblks (ap=0xcef85dd0)
    at /home3/src/sys/ufs/ffs/ffs_alloc.c:476
#32 0xc0174992 in cluster_write (bp=0xc758e7a8, filesize=65536, seqcount=10)
    at vnode_if.h:1077
#33 0xc01f765f in ffs_write (ap=0xcef85e74)
    at /home3/src/sys/ufs/ufs/ufs_readwrite.c:537
#34 0xc017f972 in vn_write (fp=0xc2a31a40, uio=0xcef85ee0, cred=0xc2df7b80, 
    flags=0, p=0xced96040) at vnode_if.h:363
#35 0xc015908e in dofilewrite (p=0xced96040, fp=0xc2a31a40, fd=4, 
    buf=0x8058000, nbyte=8192, offset=-1, flags=0)
    at /home3/src/sys/sys/file.h:162
#36 0xc0158f3f in write (p=0xced96040, uap=0xcef85f80)
    at /home3/src/sys/kern/sys_generic.c:329
#37 0xc0265551 in syscall2 (frame={tf_fs = -1072431057, tf_es = -1070727121, 
      tf_ds = -1070727121, tf_edi = 134578176, tf_esi = 403821508, 
      tf_ebp = -1077947140, tf_isp = -822583340, tf_ebx = 403764804, 
      tf_edx = 403821508, tf_ecx = 403821508, tf_eax = 4, tf_trapno = 7, 
      tf_err = 2, tf_eip = 403517864, tf_cs = 31, tf_eflags = 514, 
      tf_esp = -1077947164, tf_ss = 47})
    at /home3/src/sys/i386/i386/trap.c:1167
#38 0xc0258615 in Xint0x80_syscall ()
#39 0x18104bd9 in ?? ()
#40 0x18104b56 in ?? ()
#41 0x18101946 in ?? ()
#42 0x180eb05a in ?? ()
#43 0x804a67a in ?? ()
#44 0x804affc in ?? ()
#45 0x804bf4e in ?? ()
#46 0x804d7a3 in ?? ()
#47 0x80499f5 in ?? ()
(kgdb) quit

Script done on Sun Mar  3 11:07:41 2002

Again, I have my 256M crash dump and will answer any questions, but
I cannot investigate this more deeply myself; I'm not a kernel hacker.

Here are my disks:

/dev/ad0s1a                          49583    35145    10472    77%    /
/dev/ad0s1g                         992239   290185   622675    32%    /home
/dev/ad0s1h                        2822646  2072484   524351    80%    /home2
/dev/ad0s1e                        1488663  1195741   173829    87%    /usr
/dev/ad0s1f                         496111   361090    95333    79%    /var
/dev/ad1s1e                        9880414  5920704  3169278    65%    /home4
/dev/ad2s1e                        9807006  8191954   830492    91%    /home3

Here is my /etc/sysctl.conf:

kern.ipc.somaxconn=1024
kern.maxfiles=10000
net.inet.ip.portrange.hifirst=49152
net.inet.ip.portrange.hilast=49600
net.inet.tcp.always_keepalive=1
net.inet.tcp.sendspace=32768
net.inet.tcp.recvspace=32768
net.inet.tcp.rfc1644=1
vfs.vmiodirenable=1

I have CPUTYPE=i686 in /etc/make.conf and no other optimizations.

Finally, here is my kernel config:

# WWW kernel config
# 2 Nov 2001

machine		i386
#cpu		I386_CPU
#cpu		I486_CPU
cpu		I586_CPU
cpu		I686_CPU
ident		WWW
maxusers	128
options		MAXDSIZ=(256*1024*1024)
options		DFLDSIZ=(256*1024*1024)

makeoptions	DEBUG=-g		#Build kernel with gdb(1) debug symbols

#options 	MATH_EMULATE		#Support for x87 emulation
options		CLK_CALIBRATION_LOOP
options		CLK_USE_I8254_CALIBRATION
options		CLK_USE_TSC_CALIBRATION

options 	INET			#InterNETworking
#options 	INET6			#IPv6 communications protocols
options 	FFS			#Berkeley Fast Filesystem
options 	FFS_ROOT		#FFS usable as root device [keep this!]
options 	SOFTUPDATES		#Enable FFS soft updates support
options 	MFS			#Memory Filesystem
#options 	MD_ROOT			#MD is a potential root device
options 	NFS			#Network Filesystem
#options 	NFS_ROOT		#NFS usable as root device, NFS required
#options 	MSDOSFS			#MSDOS Filesystem
options 	CD9660			#ISO 9660 Filesystem
options 	CD9660_ROOT		#CD-ROM usable as root, CD9660 required
#options 	PROCFS			#Process filesystem
options 	COMPAT_43		#Compatible with BSD 4.3 [KEEP THIS!]
options 	SCSI_DELAY=15000	#Delay (in ms) before probing SCSI
options 	UCONSOLE		#Allow users to grab the console
options 	USERCONFIG		#boot -c editor
options 	VISUAL_USERCONFIG	#visual boot -c editor
options 	KTRACE			#ktrace(1) support
options 	SYSVSHM			#SYSV-style shared memory
options 	SYSVMSG			#SYSV-style message queues
options 	SYSVSEM			#SYSV-style semaphores
options		SHMMAXPGS=4096
options 	P1003_1B		#Posix P1003_1B real-time extensions
options 	_KPOSIX_PRIORITY_SCHEDULING
options		ICMP_BANDLIM		#Rate limit bad replies
options 	KBD_INSTALL_CDEV	# install a CDEV entry in /dev
options		PPP_BSDCOMP
options		PPP_DEFLATE
options		PPP_FILTER
options		NSWAPDEV=4
options		MSGBUF_SIZE=140960

device		isa
options		"AUTO_EOI_1"

device		eisa
device		pci

# Floppy drives
device		fdc0	at isa? port IO_FD1 irq 6 drq 2
device		fd0	at fdc0 drive 0
#device		fd1	at fdc0 drive 1
#
# If you have a Toshiba Libretto with its Y-E Data PCMCIA floppy,
# don't use the above line for fdc0 but the following one:
#device		fdc0

# ATA and ATAPI devices
#device		ata0	at isa? port IO_WD1 irq 14
#device		ata1	at isa? port IO_WD2 irq 15
device		ata
device		atadisk			# ATA disk drives
device		atapicd			# ATAPI CDROM drives
#device		atapifd			# ATAPI floppy drives
#device		atapist			# ATAPI tape drives
options 	ATA_STATIC_ID		#Static device numbering

# atkbdc0 controls both the keyboard and the PS/2 mouse
device		atkbdc0	at isa? port IO_KBD
device		atkbd0	at atkbdc? irq 1 flags 0x1
#device		psm0	at atkbdc? irq 12

device		vga0	at isa?
options		VESA

# splash screen/screen saver
pseudo-device	splash

# syscons is the default console driver, resembling an SCO console
device		sc0	at isa? flags 0x100
options		MAXCONS=16
options		SC_HISTORY_SIZE=1000

# Floating point support - do not disable.
device		npx0	at nexus? port IO_NPX irq 13

# Power management support (see LINT for more options)
#device		apm0    at nexus? disable flags 0x20 # Advanced Power Management

# Serial (COM) ports
device		sio0	at isa? port IO_COM1 flags 0x10 irq 4
device		sio1	at isa? port IO_COM2 irq 3
#device		sio2	at isa? disable port IO_COM3 irq 5
#device		sio3	at isa? disable port IO_COM4 irq 9

# Parallel port
device		ppc0	at isa? irq 7
device		ppbus		# Parallel port bus (required)
device		lpt		# Printer
#device		plip		# TCP/IP over parallel
device		ppi		# Parallel port interface device
#device		vpo		# Requires scbus and da

# PCI Ethernet NICs.
#device		de		# DEC/Intel DC21x4x (``Tulip'')
#device		txp		# 3Com 3cR990 (``Typhoon'')
#device		vx		# 3Com 3c590, 3c595 (``Vortex'')

# PCI Ethernet NICs that use the common MII bus controller code.
# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
device		miibus		# MII bus support
#device		dc		# DEC/Intel 21143 and various workalikes
device		fxp		# Intel EtherExpress PRO/100B (82557, 82558)
#device		pcn		# AMD Am79C97x PCI 10/100 NICs
#device		rl		# RealTek 8129/8139
#device		sf		# Adaptec AIC-6915 (``Starfire'')
#device		sis		# Silicon Integrated Systems SiS 900/SiS 7016
#device		ste		# Sundance ST201 (D-Link DFE-550TX)
#device		tl		# Texas Instruments ThunderLAN
#device		tx		# SMC EtherPower II (83c170 ``EPIC'')
#device		vr		# VIA Rhine, Rhine II
#device		wb		# Winbond W89C840F
#device		wx		# Intel Gigabit Ethernet Card (``Wiseman'')
#device		xl		# 3Com 3c90x (``Boomerang'', ``Cyclone'')

device pcm0 at isa? port ? irq 5 drq 1

# Pseudo devices - the number indicates how many units to allocate.
pseudo-device	loop		# Network loopback
pseudo-device	ether		# Ethernet support
#pseudo-device	sl	1	# Kernel SLIP
pseudo-device	ppp	3	# Kernel PPP
pseudo-device	tun		# Packet tunnel.
pseudo-device	pty	64	# Pseudo-ttys (telnet etc)
pseudo-device	snp	8
pseudo-device	vn
pseudo-device	gzip
pseudo-device	speaker
#pseudo-device	md		# Memory "disks"
pseudo-device	gif		# IPv6 and IPv4 tunneling
#pseudo-device	faith	1	# IPv6-to-IPv4 relaying (translation)

# The `bpf' pseudo-device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
pseudo-device	bpf		#Berkeley packet filter

options		QUOTA
options 	IPFIREWALL
options 	IPFIREWALL_VERBOSE
#options 	IPFIREWALL_VERBOSE_LIMIT=100
options 	IPDIVERT
options 	IPFIREWALL_FORWARD
options 	TCP_DROP_SYNFIN		#drop TCP packets with SYN+FIN
options 	DUMMYNET
options		NMBCLUSTERS=8192
options 	IBCS2
options		DDB
options		DDB_UNATTENDED
options		RANDOM_IP_ID
options		UFS_DIRHASH
options		USER_LDT
options		UCONSOLE

#end of file

Feel free to request any information.
I'd like to help resolve this ASAP.

Eugene Grosbein
