Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 15 Jun 2010 20:38:38 GMT
From:      Dmitry Pryanishnikov <lynx.ripe@gmail.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/147890: [regression] ufs-related lock problem in RELENG_8 (18.04.2010 -> 20.04.2010)
Message-ID:  <201006152038.o5FKccvn024098@www.freebsd.org>
Resent-Message-ID: <201006152040.o5FKe1iW046900@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         147890
>Category:       kern
>Synopsis:       [regression] ufs-related lock problem in RELENG_8 (18.04.2010 -> 20.04.2010)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jun 15 20:40:01 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Dmitry Pryanishnikov
>Release:        RELENG_8
>Organization:
Home
>Environment:
FreeBSD lynx.homenet 8.0-STABLE FreeBSD 8.0-STABLE #0: Tue Jun 15 15:05:05 EEST 2010     root@lynx.homenet:/databig/obj/databig/ftp/RELENG_8/src.100418/sys/lynx  i386

>Description:

After upgrade RELENG_8 as of 10.04.2010 -> 14.06.2010 my desktop (previously running w/o problems for days) started to lock pretty fast (within 1-2 hours of uptime). I've narrowed down the time window of the problem appearance: kernel built from the sources as of date=2010.04.18.00.00.00 runs OK, one as of date=2010.04.20.00.00.00 already locks. Nothing "special" is required in order to trigger the condition: after boot I just log in as a non-root (via xdm - it starts xfce4 environment), then start the Thunar File Manager and GQview. Then I switch to text console (crash dump often doesn't complete when console is running X) and just do 'ps axww|grep D' from time to time. Eventually processes thunar-vfs-mime-cleaner-1 (stuck in uninterruptible wait state) start to accumulate. In this state, simple 'sync' also hangs the issuing shell, and (the most painful result) file operations (file creations/moves/...) don't get synched to the disk. Then deadlock resolver (after debug.deadl
 kres.slptime_threshold seconds) detects the deadlock, which allows to get crash dump. I've got valid dumps both for kernel built from 14.06.2010 and from 20.04.2010 sources, they look similar. Here is the crash of the latter:

panic: deadlkres: possible deadlock detected for 0xcb52e4a0, blocked for 294235 ticks

(I've lowered debug.deadlkres.slptime_threshold manually after detecting hung thunar-vfs-mime-cleaner-1 processes)

(kgdb) print *(struct thread *)0xcb52e4a0
$1 = {td_lock = 0xc07db49c, td_proc = 0xcb5002a8, td_plist = {tqe_next = 0x0, tqe_prev = 0xcb5002b0}, td_runq = {tqe_next = 0x0, tqe_prev = 0xc07ce834}, td_slpq = {tqe_next = 0xcb52db90, 
    tqe_prev = 0xc7383018}, td_lockq = {tqe_next = 0x0, tqe_prev = 0x0}, td_cpuset = 0xc702ce38, td_sel = 0x0, td_sleepqueue = 0x0, td_turnstile = 0xc7ca2c00, td_umtxq = 0xcabc3940, td_tid = 100186, 
  td_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xcb52e500}, sq_proc = 0xcb5002a8, sq_flags = 1}, td_flags = 4, 
  td_inhibitors = 2, td_pflags = 0, td_dupfd = -1, td_sqqueue = 0, td_wchan = 0xcb3a9df4, td_wmesg = 0xc0735016 "ufs", td_lastcpu = 2 '\002', td_oncpu = 255 '&#1066;', td_owepreempt = 0 '\0', td_tsqueue = 0 '\0', 
  td_locks = 0, td_rw_rlocks = 0, td_lk_slocks = 0, td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0xc0908618, td_intr_nesting_level = 0, td_pinned = 0, 
  td_ucred = 0xc7668580, td_estcpu = 0, td_slptick = 703512, td_blktick = 0, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 224, ru_ixrss = 12, ru_idrss = 4, 
    ru_isrss = 128, ru_minflt = 145, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 1, ru_nivcsw = 10}, td_incruntime = 0, 
  td_runtime = 23985873, td_pticks = 0, td_sticks = 0, td_iticks = 0, td_uticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_sigmask = {__bits = {0, 0, 0, 0}}, td_generation = 11, 
  td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0, td_name = "initial thread\000-cle", td_fpop = 0x0, td_dbgflags = 0, td_ng_outbound = 0, td_osd = {
    osd_nslots = 0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, td_rqindex = 32 ' ', td_base_pri = 128 '\200', td_priority = 128 '\200', td_pri_class = 3 '\003', td_user_pri = 128 '\200', 
  td_base_user_pri = 128 '\200', td_pcb = 0xeb6a3d90, td_state = TDS_INHIBITED, td_retval = {0, 1}, td_slpcallout = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, 
    c_arg = 0x0, c_func = 0, c_lock = 0x0, c_flags = 16, c_cpu = 0}, td_frame = 0xeb6a3d38, td_kstack_obj = 0xcb4ca3b8, td_kstack = 3949600768, td_kstack_pages = 2, td_unused1 = 0x0, td_unused2 = 0, 
  td_unused3 = 0, td_critnest = 1, td_md = {md_spinlock_count = 1, md_saved_flags = 582}, td_sched = 0xcb52e6bc, td_ar = 0x0, td_syscalls = 11802, td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, 
  td_dtrace = 0x0, td_errno = 0, td_vnet = 0x0, td_vnet_lpush = 0x0}


(kgdb) print *(struct proc *)0xcb5002a8
$2 = {p_list = {le_next = 0xcabb3d48, le_prev = 0xcabb1000}, p_threads = {tqh_first = 0xcb52e4a0, tqh_last = 0xcb52e4a8}, p_slock = {lock_object = {lo_name = 0xc073b098 "process slock", lo_flags = 720896, 
      lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, p_ucred = 0xc7668580, p_fd = 0xc781b900, p_fdtol = 0x0, p_stats = 0xc781ce00, p_limit = 0xc7826500, p_limco = {c_links = {sle = {sle_next = 0x0}, tqe = {
        tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_lock = 0xcb500330, c_flags = 0, c_cpu = 0}, p_sigacts = 0xcabea000, p_flag = 268451840, p_state = PRS_NORMAL, p_pid = 2044, 
  p_hash = {le_next = 0x0, le_prev = 0xc6fe9ff0}, p_pglist = {le_next = 0xcabb3d48, le_prev = 0xc7800db8}, p_pptr = 0xc7043000, p_sibling = {le_next = 0xcabb3d48, le_prev = 0xc7043084}, p_children = {
    lh_first = 0x0}, p_mtx = {lock_object = {lo_name = 0xc073b08b "process lock", lo_flags = 21168128, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, p_ksi = 0xc750f410, p_sigqueue = {sq_signals = {__bits = {
        0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xcb500368}, sq_proc = 0xcb5002a8, sq_flags = 1}, p_oppid = 0, p_vmspace = 0xc76febc8, p_swtick = 703502, 
  p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}, p_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 0, 
    ru_ixrss = 0, ru_idrss = 0, ru_isrss = 0, ru_minflt = 0, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 0, ru_nivcsw = 0}, p_rux = {
    rux_runtime = 23985873, rux_uticks = 0, rux_sticks = 1, rux_iticks = 0, rux_uu = 0, rux_su = 9993, rux_tu = 9993}, p_crux = {rux_runtime = 0, rux_uticks = 0, rux_sticks = 0, rux_iticks = 0, rux_uu = 0, 
    rux_su = 0, rux_tu = 0}, p_profthreads = 0, p_exitthreads = 0, p_traceflag = 0, p_tracevp = 0x0, p_tracecred = 0x0, p_textvp = 0xcb524218, p_lock = 0, p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, 
  p_sig = 0, p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, p_xthread = 0x0, p_boundary_count = 0, 
  p_pendingcnt = 0, p_itimers = 0x0, p_magic = 3203398350, p_osrel = 800504, p_comm = "thunar-vfs-mime-cle", p_pgrp = 0xc7bd7040, p_sysent = 0xc07a4220, p_args = 0xc7bd6d40, p_cpulimit = 9223372036854775807, 
  p_nice = 0 '\0', p_fibnum = 0, p_xstat = 0, p_klist = {kl_list = {slh_first = 0x0}, kl_lock = 0xc04d47e0 <knlist_mtx_lock>, kl_unlock = 0xc04d4810 <knlist_mtx_unlock>, 
    kl_assert_locked = 0xc04d4b60 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xc04d4b30 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xcb500330}, p_numthreads = 1, p_md = {md_ldt = 0x0}, p_itcallout = {
    c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_lock = 0x0, c_flags = 16, c_cpu = 0}, p_acflag = 0, p_peers = 0x0, p_leader = 0xcb5002a8, 
  p_emuldata = 0x0, p_label = 0x0, p_sched = 0xcb500550, p_ktr = {stqh_first = 0x0, stqh_last = 0xcb500538}, p_mqnotifier = {lh_first = 0x0}, p_dtrace = 0x0, p_pwait = {cv_description = 0xc073b8cc "ppwait", 
    cv_waiters = 0}}


I'm ready to run any additional kgdb commands against those crash dumps, to try patches on this desktop etc...

Here is the kernel config:

ident		LYNX
makeoptions	DEBUG=-g		#Build kernel with gdb(1) debug symbols
options 	INCLUDE_CONFIG_FILE     # Include this file in kernel
cpu		I686_CPU		# aka Pentium Pro(tm)
options		SCHED_ULE
options		SMP			# Symmetric MultiProcessor Kernel
options 	PREEMPTION		# Enable kernel thread preemption
options		MUTEX_DEBUG
options 	WITNESS
device		apic			# I/O apic
options 	COMPAT_FREEBSD4
options 	COMPAT_FREEBSD5
options 	COMPAT_FREEBSD6
options 	COMPAT_FREEBSD7
options 	SYSVSHM
options 	SYSVSEM
options 	SYSVMSG
options 	KDB
options 	KDB_TRACE
options 	DDB
options 	DDB_NUMSYM
options 	BREAK_TO_DEBUGGER	# BREAK on a serial console goes to DDB
options 	SYSCTL_DEBUG
options 	KTRACE			#kernel tracing
options 	INVARIANTS
options 	INVARIANT_SUPPORT
options		DEADLKRES		#software deadlock resolver thread
options 	INET			#Internet communications protocols
options 	INET6			#IPv6 communications protocols
options		IPSEC			#IP security (requires device crypto)
options		IPSEC_NAT_T		#NAT-T support, UDP encap of ESP
device		ether			#Generic Ethernet
device		loop			#Network loopback device
device		bpf			#Berkeley packet filter
device		enc			#IPsec interface
options 	IPFIREWALL		#firewall
options 	IPFIREWALL_VERBOSE	#enable logging to syslogd(8)
options 	IPFIREWALL_VERBOSE_LIMIT=100	#limit verbosity
options 	IPFIREWALL_FORWARD	#packet destination changes
options		IPFIREWALL_NAT		#ipfw kernel nat support
options		LIBALIAS		# libalias library, performing NAT
options 	BPF_JITTER		#support for BPF just-in-time compiler
options 	FFS			#Fast filesystem
options 	SOFTUPDATES
options 	UFS_EXTATTR
options 	UFS_ACL
options 	UFS_DIRHASH
options		UFS_GJOURNAL		# Gjournal-based UFS journaling support.
options 	QUOTA			#enable disk quotas
options 	SUIDDIR			#The suiddir hack
options 	_KPOSIX_PRIORITY_SCHEDULING
options 	P1003_1B_SEMAPHORES
device		pty		#Pseudo ttys
options 	MSGBUF_SIZE=131072
device		crypto		# core crypto support
device		pci
device		atkbdc		# AT keyboard controller
device		atkbd		# AT keyboard
device		psm		# PS/2 mouse
options 	KBD_INSTALL_CDEV	# install a CDEV entry in /dev
device		vga		# VGA video card driver
device		splash		# Splash screen and screen saver support
options		X86BIOS		# x86 real mode BIOS emulator, required by atkbdc/dpms/vesa
options 	PRINTF_BUFR_SIZE=128	# Prevent printf output being interspersed.
device		sc
options 	SC_HISTORY_SIZE=20000
options 	SC_TWOBUTTON_MOUSE
device		uart			# Generic UART driver
device		ata
device		atadisk		# ATA disk drives
options 	ATA_STATIC_ID	# Static device numbering
device		scbus		# SCSI bus (required for SCSI)
device		da		# Direct Access (disks)
device		cd		# CD
device		pass		# Passthrough device (direct SCSI access)
options		SND_PCM_64	# Process 32bit samples through 64bit integer

And the loader.conf:

autoboot_delay="5"		# Delay in seconds before autobooting
beastie_disable="YES"		# Turn the beastie boot menu on and off
kernel="kernel"			# /boot sub-directory containing kernel and modules
kern.hz="1000"			# Set the kernel interval timer rate
kern.maxdsiz="1G"		# Set the max data size
kern.maxssiz="1G"		# Set the max stack size
kern.maxtsiz="1G"		# Set the max text size
kern.ipc.semmni=80		# Number of semaphore identifiers
kern.ipc.semmns=480		# Maximum number of semaphores in the system
hw.ata.atapi_dma=1		# Using DMA for ATAPI devices
hw.snd.verbose=2		# Make /dev/sndstat more verbose
hw.snd.maxautovchans=4		# Enable software sound mixer
hint.uart.0.flags="0x10"	# device is potential system console
debug.minidump=1		# Use minidumps
debug.witness.watch=0		# Turn off the witness
ipdivert_load="YES"
accf_data_load="YES"		# Wait for data accept filter
accf_dns_load="YES"		# Wait for full DNS request accept filter
accf_http_load="YES"		# Wait for full HTTP request accept filter
if_tap_load="YES"		# Ethernet tunnel software network interface
if_tun_load="YES"		# Tunnel driver (user process ppp)
if_vlan_load="YES"		# IEEE 802.1Q VLAN network interface
miibus_load="YES"		# miibus support, needed for some drivers
if_em_load="YES"		# Intel(R) PRO/1000 Gigabit Ethernet
snd_hda_load="YES"		# Intel High Definition Audio
usb_load="YES"			# USB subsystem
uhci_load="YES"			# UHCI USB Host Controller driver
ehci_load="YES"			# USB Enhanced Host Controller driver
ugen_load="YES"			# USB generic device, if all else fails ...
uhid_load="YES"			# Human Interface Devices
ukbd_load="YES"			# Keyboard
ulpt_load="YES"			# Printer
ums_load="YES"			# Mouse
umass_load="YES"		# Mass Storage Devices
ucom_load="YES"			# USB tty support:
uplcom_load="YES"		#  Prolific PL-2303/2303X/2303HX serial adapters
uftdi_load="YES"		#  FTDI FT2232C, FT8U100AX and FT8U232AM chips
uark_load="YES"			#  Arkmicro Technologies ARK3116 based USB serial adapter
snd_uaudio_load="YES"		# USB audio
umodem_load="YES"		# Modems
random_load="YES"
cpufreq_load="YES"
coretemp_load="YES"
ichwd_load="YES"
smbus_load="YES"
smb_load="YES"
ichsmb_load="YES"
speaker_load="YES"
atapicam_load="YES"
fdc_load="YES"
drm_load="YES"
radeon_load="YES"
vesa_load="YES"
acpi_video_load="YES"
firewire_load="YES"



>How-To-Repeat:

>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201006152038.o5FKccvn024098>