Date:      Wed, 28 Jan 1998 09:32:31 -0800 (PST)
From:      Matt Dillon <dillon@best.net>
To:        FreeBSD-gnats-submit@FreeBSD.ORG
Subject:   kern/5592: Kernel crash due to ufslk2/ffs_vget deadlock
Message-ID:  <199801281732.JAA14306@flea.best.net>

>Number:         5592
>Category:       kern
>Synopsis:       ffs_inode_hash_lock can get permanently locked, causing the filesystem to lock up
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jan 28 09:40:02 PST 1998
>Last-Modified:
>Originator:     Matt Dillon
>Organization:
Best Internet Communications
>Release:        FreeBSD 2.2.5-STABLE i386
>Environment:

	PPro 200s running moderately to heavily loaded shell environments.
	Lots of RAM, moderate paging.

>Description:

	I tracked down a crash of one of our shell machines.  The crash
	occurred in the socket code, but was due to processes getting stuck
	in ufslk2 (inetd then kept forking on new connections until the
	system ran out of network bufs).

	Tracking the bug down, I found the following situation:

	* most processes stuck in ufslk2

	* the ufslk2 chain terminated with a process that had the vnode locked
	  but was stuck in ffs_vget()

	* the process was stuck in ffs_vget() attempting to get
	  ffs_inode_hash_lock and being unable to.

	* I found a second process which HAD ffs_inode_hash_lock but which was
	  stuck as follows:

(kgdb) #0  mi_switch () at ../../kern/kern_synch.c:635
#1  0xf0114eda in tsleep (ident=0xf26e2b00, priority=0x8, 
    wmesg=0xf01a1071 "ufslk2", timo=0x0) at ../../kern/kern_synch.c:398
#2  0xf01a10a1 in ufs_lock (ap=0xefbffc90) at ../../ufs/ufs/ufs_vnops.c:1707
#3  0xf0132a27 in vclean (vp=0xf24fd600, flags=0x8) at vnode_if.h:731
#4  0xf0132c3b in vgone (vp=0xf24fd600) at ../../kern/vfs_subr.c:1167
#5  0xf0131e52 in getnewvnode (tag=VT_UFS, mp=0xf21d3a00, vops=0xf2196800, 
    vpp=0xefbffd2c) at ../../kern/vfs_subr.c:380
#6  0xf019a25c in ffs_vget (mp=0xf21d3a00, ino=0x67205, vpp=0xefbffda8)
    at ../../ufs/ffs/ffs_vfsops.c:896
#7  0xf019d034 in ufs_lookup (ap=0xefbffe18) at ../../ufs/ufs/ufs_lookup.c:561
#8  0xf0131339 in lookup (ndp=0xefbffeac) at vnode_if.h:31
#9  0xf0130e7b in namei (ndp=0xefbffeac) at ../../kern/vfs_lookup.c:156
#10 0xf0135050 in lstat (p=0xf2764800, uap=0xefbfff94, retval=0xefbfff84)
    at ../../kern/vfs_syscalls.c:1324
#11 0xf01bf437 in syscall (frame={tf_es = 0x27, tf_ds = 0x27, 
      tf_edi = 0xffffffff, tf_esi = 0x35a00, tf_ebp = 0xefbfd758, 
      tf_isp = 0xefbfffe4, tf_ebx = 0x35a50, tf_edx = 0x33000, 
      tf_ecx = 0x35a40, tf_eax = 0xbe, tf_trapno = 0x7, tf_err = 0x7, 
      tf_eip = 0x18a85, tf_cs = 0x1f, tf_eflags = 0x246, tf_esp = 0xefbfd6e0, 
      tf_ss = 0x27}) at ../../i386/i386/trap.c:914
#12 0x18a85 in ?? ()
#13 0x7a35 in ?? ()
#14 0x7742 in ?? ()
#15 0x1e99 in ?? ()
#16 0x1d11 in ?? ()
#17 0x107e in ?? ()
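
	The wait-for relationship described above can be sketched as a small
	user-space model.  This is only an illustration, not kernel code:
	the struct, the enum values and the function names are hypothetical,
	and it assumes the vnode being recycled in frame #3 is locked by a
	process that is itself waiting for ffs_inode_hash_lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum { NPROC = 2, NLOCK = 2 };
enum { NO_LOCK = -1, VNODE_LOCK = 0, HASH_LOCK = 1 };

struct model {
    int owner[NLOCK];       /* process holding each lock, or -1 if free */
    int waiting_for[NPROC]; /* lock each process sleeps on, or NO_LOCK  */
};

/*
 * Follow the chain "process -> lock it waits for -> owner of that lock".
 * Returns 1 if the chain leads back to the starting process (a cycle,
 * i.e. a deadlock), 0 if it ends at a free lock or a runnable process.
 */
int deadlocked(const struct model *m, int start)
{
    assert(start >= 0 && start < NPROC);

    int p = start;
    for (int steps = 0; steps < NPROC; steps++) {
        int l = m->waiting_for[p];
        if (l == NO_LOCK)
            return 0;       /* p is not blocked, chain ends here */
        p = m->owner[l];
        if (p < 0)
            return 0;       /* lock is free, chain ends here */
        if (p == start)
            return 1;       /* came back around: cycle */
    }
    return 0;
}

/*
 * Assumed situation from this report:
 *   process 0 holds the vnode lock and sleeps in ffs_vget() waiting
 *             for ffs_inode_hash_lock;
 *   process 1 holds ffs_inode_hash_lock and sleeps in ufs_lock()
 *             ("ufslk2") via getnewvnode() -> vgone() -> vclean().
 */
struct model pr5592_model(void)
{
    struct model m;
    m.owner[VNODE_LOCK] = 0;
    m.owner[HASH_LOCK] = 1;
    m.waiting_for[0] = HASH_LOCK;
    m.waiting_for[1] = VNODE_LOCK;
    return m;
}
```

	Under that assumption, deadlocked() reports a cycle starting from
	either process; the cycle disappears as soon as the hash-lock holder
	no longer has to wait for the vnode.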




>How-To-Repeat:

	

>Fix:
	
	I submit that calling vgone() on what is essentially a random 
	vnode within getnewvnode() can lead to deadlock situations in the
	filesystem, especially when called from other critical filesystem
	routines that hold critical global locks.

	The correct solution, I believe, is to NOT have getnewvnode() attempt
	to vgone/vclean the vnode it wishes to allocate if said vnode's inode
	is locked at the time.
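
	A rough sketch of that idea, as a user-space model rather than a
	patch: struct fake_vnode and pick_reusable() are hypothetical names,
	and the real free list and inode lock test in getnewvnode() would
	look different.  The point is only that a candidate whose inode is
	locked is skipped instead of being handed to vgone().

```c
#include <stddef.h>

/* Hypothetical stand-in for a vnode on the free list; this is not the
 * kernel's struct vnode. */
struct fake_vnode {
    int inode_locked;             /* nonzero if the inode lock is held */
    struct fake_vnode *next_free; /* next candidate on the free list   */
};

/*
 * Walk the free list and return the first candidate whose inode is not
 * locked.  Returns NULL if every candidate is locked, in which case the
 * caller would have to allocate a fresh vnode or fail, rather than
 * sleeping on a lock while holding a global lock.
 */
struct fake_vnode *pick_reusable(struct fake_vnode *free_list)
{
    for (struct fake_vnode *vp = free_list; vp != NULL; vp = vp->next_free) {
        if (!vp->inode_locked)
            return vp;
    }
    return NULL;
}
```

	With a check of this kind, the caller holding ffs_inode_hash_lock
	would never block in ufs_lock() on a vnode chosen for recycling.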

>Audit-Trail:
>Unformatted:


