Date:      Wed, 20 Feb 2002 00:13:18 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Kirk McKusick <mckusick@mckusick.com>
Cc:        Mike Silbersack <silby@silby.com>, Valentin Nechayev <netch@iv.nn.kiev.ua>, "David W. Chapman Jr." <dwcjr@inethouston.net>, <stable@FreeBSD.ORG>
Subject:   Softupdates failure during buffer syncing at shutdown (was Re: cvs commit: src/sys/ufs/ffs ffs_softdep.c)
Message-ID:  <200202200813.g1K8DIl85685@apollo.backplane.com>
References:   <20020211010801.K8897-100000@patrocles.silby.com>

    Ok, I finally tracked down the buffers that 'syncing disks...'
    could not sync.  They appear to be indirect blocks.  The 'syncing
    disks...' loop counts them as busy because they are exclusively
    locked by softupdates.
    All the buffers in question are locked by setup_allocindir_phase2()
    in ffs_softdep.c line 1698 (in stable).  The buffers themselves are
    marked clean.

    For some reason, softupdates never releases its lock on these
    buffers, though it looks like it should have by this point
    (ir_deplisthd is empty).

    I don't know why, so I am adding Kirk to the list.  Kirk, I've included
    a gdb dump of one of the buffers and the item on its worklist.  The
    problem occurs when you 'shutdown -r now' a machine immediately after
    doing something major to the filesystem.  I was able to reproduce it
    on test1 by installing the kernel to /usr/fubar twice and doing 
    a shutdown -r now immediately.  It sometimes took two or three
    reboots before 'Syncing disks...' would fail on a number of buffers,
    i.e. it would say:

syncing disks... 86 18 15 13 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 

    The buffers it is unable to sync are clean and exclusively locked
    by softupdates (-stable ffs_softdep.c line 1698).  The lock is never
    released.  I think there may be some kind of cleanup that the
    'syncing disks...' code never triggers when it tries to flush
    the buffers.
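
    For anyone comparing their own shutdown output against the line
    above, here is a minimal sketch (Python, not kernel code) that
    finds where the count stops dropping.  It assumes each number is
    the busy-buffer count printed once per pass of the sync loop; the
    helper names parse_counts and stall_point are made up for
    illustration.

```python
def parse_counts(line):
    """Extract the integer counts from a 'syncing disks...' console line."""
    text = line.strip()
    prefix = "syncing disks..."
    if text.lower().startswith(prefix):
        text = text[len(prefix):]
    # Ignore any non-numeric tokens (for example a trailing status word).
    return [int(tok) for tok in text.split() if tok.isdigit()]


def stall_point(counts):
    """Return (index, value) of the first count that is not followed by a
    smaller one, or None if the counts decrease all the way through."""
    for i in range(1, len(counts)):
        if counts[i] >= counts[i - 1]:
            return i - 1, counts[i - 1]
    return None


if __name__ == "__main__":
    sample = "syncing disks... 86 18 15 13 12 12 12 12"
    counts = parse_counts(sample)
    print("counts:", counts)
    print("stalls at (index, value):", stall_point(counts))
```

    A stall at a nonzero value is the symptom described here: the
    remaining buffers stay counted as busy no matter how many more
    passes are made.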

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

(kgdb) print &$8
$19 = (struct buf *) 0xcf456388
(kgdb) print $8
$15 = {b_hash = {le_next = 0x0, le_prev = 0xcf3abfc0}, b_vnbufs = {
    tqe_next = 0xcf467700, tqe_prev = 0xcf451448}, b_freelist = {
    tqe_next = 0xcf4564e0, tqe_prev = 0xc02f75d0}, b_act = {tqe_next = 0x0, 
    tqe_prev = 0x0}, b_flags = 536870912, b_qindex = 0, b_xflags = 2 '\002', 
  b_lock = {lk_interlock = {lock_data = 0}, lk_flags = 1024, 
    lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 1, lk_prio = 20, 
    lk_wmesg = 0xc02bcc50 "bufwait", lk_timo = 0, lk_lockholder = -2}, 
  b_error = 0, b_bufsize = 8192, b_runningbufspace = 0, b_bcount = 8192, 
  b_resid = 0, b_dev = 0xc2bfb300, b_data = 0xd2b25000 "\020O\037", 
  b_kvabase = 0xd2b25000 "\020O\037", b_kvasize = 16384, b_lblkno = 3988624, 
  b_blkno = 3988624, b_offset = 2042175488, b_iodone = 0, 
  b_iodone_chain = 0x0, b_vp = 0xdc8fcb40, b_dirtyoff = 0, b_dirtyend = 0, 
  b_rcred = 0x0, b_wcred = 0x0, b_pblkno = 0, b_saveaddr = 0x0, 
  b_driver1 = 0x0, b_driver2 = 0x0, b_caller1 = 0x0, b_caller2 = 0x0, 
  b_pager = {pg_spc = 0x0, pg_reqpage = 0}, b_cluster = {cluster_head = {
      tqh_first = 0xcf4564e0, tqh_last = 0xcf4562e4}, cluster_entry = {
      tqe_next = 0xcf4564e0, tqe_prev = 0xcf4562e4}}, b_pages = {0xc093d220, 
    0xc093785c, 0x0 <repeats 30 times>}, b_npages = 2, b_dep = {
    lh_first = 0xc2cf17a0}, b_chain = {parent = 0x0, count = 0}, 
  b_olockholder = 519, b_ofile = 0xc02ca4ff "../../ufs/ffs/ffs_softdep.c", 
  b_oline = 1698}
(kgdb) print *$8.b_dep.lh_first
$16 = {wk_list = {le_next = 0x0, le_prev = 0xcf4564c8}, wk_type = 5, 
  wk_state = 33025}
(kgdb) print (struct indirdep)$16
$18 = {ir_list = {wk_list = {le_next = 0x0, le_prev = 0xcf4564c8}, 
    wk_type = 5, wk_state = 33025}, ir_saveddata = 0x0, 
  ir_savebp = 0xcf456388, ir_donehd = {lh_first = 0x0}, ir_deplisthd = {
    lh_first = 0x0}}
(kgdb) 
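
    gdb prints the flag words in the dump above (b_flags, lk_flags,
    wk_state) in decimal, which makes them hard to match against the
    headers.  The sketch below (Python, helper name set_bits is
    hypothetical) only splits each value into its individual bit
    masks; which symbolic flag each mask corresponds to still has to
    be looked up in the buf, lock and softdep headers of the -stable
    tree in question, and is deliberately not asserted here.

```python
def set_bits(value):
    """Return the individual bit masks set in a non-negative integer,
    lowest bit first."""
    masks = []
    bit = 1
    while bit <= value:
        if value & bit:
            masks.append(bit)
        bit <<= 1
    return masks


# Decimal values copied from the kgdb dump.
DUMP_FIELDS = {
    "b_flags": 536870912,
    "lk_flags": 1024,
    "wk_state": 33025,
}

if __name__ == "__main__":
    for name, value in DUMP_FIELDS.items():
        masks = ", ".join(hex(m) for m in set_bits(value))
        print(f"{name} = {hex(value)} -> {masks}")
```

    The same approach works for any other flag field in the dump;
    lk_lockholder is negative and is not a bit field, so it is left
    out.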

					-Matt

:> :Matt, can you reproduce the problem over by you?  It seems that doing
:> :anything disk intensive and then shutting down immediately will trigger
:> :it.
:> :
:> :Mike "Silby" Silbersack
:>
:>     Hmm.  I will attempt to reproduce the problem.  How much activity is
:>     'significant' ?  e.g. equivalent of an rm -rf /usr/ports or something
:>     smaller?  Do the directories have to be deeply nested for the problem
:>     to occur?
:> 					-Matt
:> 					Matthew Dillon
:
:I was seeing the problem by just making a kernel (just a few files
:changed with no config or clean steps), installing the kernel, and doing a
:shutdown -r now.  So, only a few files were active at most.  The system in
:question only has a /, /usr, and /var partition, if that matters.  Only
:/usr was mounted softupdates.
:
:Mike "Silby" Silbersack

