Date:      Tue, 23 Dec 2003 13:42:05 +1100
From:      Peter Jeremy <peter.jeremy@alcatel.com.au>
To:        freebsd-stable@freebsd.org
Subject:   4.9p1 deadlock on "inode"
Message-ID:  <20031223024205.GA45693@gsmx07.alcatel.com.au>

This morning I found that one of my systems would not let me log in or
issue commands, though it still seemed to be running.  ddb showed that
a large number of processes were waiting on "inode".  I forced a crash
dump and found 166 processes in total, 95 of them waiting on "inode"
and 94 of those on the same wchan:

(kgdb) p *(struct lock *)0xc133eb00
$9 = {lk_interlock = {lock_data = 0}, lk_flags = 0x200440, lk_sharecount = 0, 
  lk_waitcount = 94, lk_exclusivecount = 1, lk_prio = 8, 
  lk_wmesg = 0xc02b0a8a "inode", lk_timo = 101, lk_lockholder = 304}
(kgdb) 

The lockholder (pid 304) is a cron process, itself waiting on "inode"
via a different lock:
(kgdb) p *(struct lock *)0xc1901a00
$10 = {lk_interlock = {lock_data = 0}, lk_flags = 0x200440, lk_sharecount = 0, 
  lk_waitcount = 1, lk_exclusivecount = 1, lk_prio = 8, 
  lk_wmesg = 0xc02b0a8a "inode", lk_timo = 101, lk_lockholder = 15123}
(kgdb) 

Pid 15123 is another cron process waiting on "vlruwk" because there are
too many vnodes in use:
(kgdb) p numvnodes
$12 = 8904
(kgdb) p freevnodes
$13 = 24
(kgdb) p desiredvnodes
$14 = 8879

Process vnlru is waiting on "vlrup" with vnlru_nowhere = 18209.
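
In other words, numvnodes (8904) exceeds desiredvnodes (8879) with
only 24 vnodes free, so getnewvnode() blocks waiting for vnlru, while
vnlru itself can find nothing to reclaim.  As far as I can tell from
4.x vfs_subr.c, the handshake is roughly the following (paraphrased
from memory, so names and thresholds may be slightly off):

    /* getnewvnode(): sleep on "vlruwk" while over the limit. */
    while (numvnodes - freevnodes > desiredvnodes) {
            if (vnlruproc_sig == 0) {
                    vnlruproc_sig = 1;   /* wake the vnlru kthread */
                    wakeup(vnlruproc);
            }
            tsleep(&vnlruproc_sig, PVFS, "vlruwk", hz);
    }

    /* vnlru_proc(): wake the sleepers again only once usage drops
     * below a ~90% hysteresis point ... */
    if (numvnodes - freevnodes <= desiredvnodes * 9 / 10) {
            vnlruproc_sig = 0;
            wakeup(&vnlruproc_sig);
            tsleep(vnlruproc, PVFS, "vlruwt", hz);
            continue;
    }
    /* ... and if vlrureclaim() freed nothing on any mount, note
     * the lack of progress and sleep on "vlrup". */
    if (done == 0) {
            vnlru_nowhere++;
            tsleep(vnlruproc, PPAUSE, "vlrup", hz * 3);
    }

With vnlru_nowhere at 18209, vnlru has evidently been going through
that no-progress path for quite a while.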

Looking through the mountlist, mnt_nvnodelistsize was sane on all
filesystems except one (/mnt), where it was 8613 (97% of all vnodes).
Only one process was actively using files in /mnt, though some other
processes may have been using it for $PWD or similar.  This process
was scanning most of the files in /mnt (about 750,000), checking for
files with identical content: all files that could potentially be
identical (e.g. same length) are mmap()'d and compared.  This process
had 2816 entries in its vm_map.  (It has just occurred to me that one
set of data appears in a large number of files (~30000), but I would
have expected that to produce an error from mmap(), not a deadlock.)
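
For what it's worth, the comparison itself boils down to something
like this (an illustrative sketch rather than the real program; error
handling trimmed):

    #include <sys/mman.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* Compare two files of known equal length by mapping both. */
    static int
    same_content(const char *f1, const char *f2, size_t len)
    {
            void *p1 = MAP_FAILED, *p2 = MAP_FAILED;
            int fd1, fd2, same = 0;

            fd1 = open(f1, O_RDONLY);
            fd2 = open(f2, O_RDONLY);
            if (fd1 >= 0 && fd2 >= 0) {
                    p1 = mmap(NULL, len, PROT_READ, MAP_SHARED, fd1, 0);
                    p2 = mmap(NULL, len, PROT_READ, MAP_SHARED, fd2, 0);
                    if (p1 != MAP_FAILED && p2 != MAP_FAILED)
                            same = (memcmp(p1, p2, len) == 0);
            }
            if (p1 != MAP_FAILED)
                    munmap(p1, len);
            if (p2 != MAP_FAILED)
                    munmap(p2, len);
            if (fd1 >= 0)
                    close(fd1);
            if (fd2 >= 0)
                    close(fd2);
            return (same);
    }

The point for this discussion is that each mapping holds a reference
on the file's vnode (and hence a usecount) until it is munmap()'d.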

Scanning through the mnt_nvnodelist on /mnt:
5797 entries were directories with entries in v_cache_src
2804 entries were files with a usecount > 0
  11 entries were directories flagged VFREE|VDOOMED|VXLOCK
   1 entry was VNON
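
In C terms, the classification amounted to something like this (a
sketch; the order of the tests matters because the categories can
overlap):

    int nvnon = 0, nflagged = 0, nheld = 0, ncached = 0;
    struct vnode *vp;

    TAILQ_FOREACH(vp, &mp->mnt_nvnodelist, v_nmntvnodes) {
            if (vp->v_type == VNON)
                    nvnon++;
            else if (vp->v_flag & (VFREE | VDOOMED | VXLOCK))
                    nflagged++;     /* already free or dying */
            else if (vp->v_usecount > 0)
                    nheld++;        /* mostly mmap'd files */
            else if (!LIST_EMPTY(&vp->v_cache_src))
                    ncached++;      /* mostly directories */
    }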

This means that none of the vnodes in /mnt were available for
recycling (and the total number of vnodes on the other filesystems
would not be enough to reach the hysteresis point that unblocks vnode
allocation).  I can understand that an mmap'd file holds a usecount on
the file's vnode, but my understanding is that vnodes with v_cache_src
entries should be recyclable (though this will slow down namei()).  If
so, should vnlru grow a "try harder" loop that recycles these vnodes
when it winds up stuck like this?
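
If I'm reading 4.x vfs_subr.c correctly, the reason they are skipped
is that vlrureclaim() only considers vnodes passing VMIGHTFREE(),
which explicitly requires an empty v_cache_src list (paraphrased):

    /* A vnode is only a reclaim candidate if it is not already
     * free or dying, holds no references, and has no child
     * namecache entries. */
    #define VMIGHTFREE(vp) \
            (((vp)->v_flag & (VFREE | VDOOMED)) == 0 && \
             LIST_EMPTY(&(vp)->v_cache_src) && \
             (vp)->v_usecount == 0)

The "try harder" pass I have in mind would be something like this
(hypothetical and untested, interlock handling elided; cache_purge()
empties v_cache_src at the cost of slower namei() lookups afterwards):

    /* Second pass, only if the normal scan freed nothing: give up
     * namecache entries to make directory vnodes reclaimable. */
    if ((vp->v_flag & (VFREE | VDOOMED | VXLOCK)) == 0 &&
        vp->v_usecount == 0 && !LIST_EMPTY(&vp->v_cache_src)) {
            cache_purge(vp);
            vgonel(vp, p);
            done++;
    }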

I notice vlrureclaim() contains the comment "don't set kern.maxvnodes
too low".  In this case, kern.maxvnodes is auto-tuned based on 128MB
of RAM and "maxusers=0".  Maybe that is too low for my workload, but
it would be much nicer if the system handled this situation gracefully
rather than by deadlocking.

And finally, a question on vlrureclaim(): why does it scan through
mnt_nvnodelist performing a TAILQ_REMOVE() and TAILQ_INSERT_TAIL() on
each vnode?  Wouldn't it be cheaper to simply walk the list, rather
than moving every vnode to the tail?
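
For reference, the pattern in question looks like this (paraphrased);
my guess is that the rotation exists because mntvnode_slock is dropped
across vgonel(), so a saved TAILQ_NEXT() pointer could go stale,
whereas restarting from TAILQ_FIRST() after moving each examined vnode
to the tail is always safe and still terminates:

    count = mp->mnt_nvnodelistsize / 10 + 1;
    while (count > 0 &&
        (vp = TAILQ_FIRST(&mp->mnt_nvnodelist)) != NULL) {
            /* Move the head to the tail before examining it, so the
             * next iteration's TAILQ_FIRST() is a fresh vnode even
             * if the list changed while the lock was dropped. */
            TAILQ_REMOVE(&mp->mnt_nvnodelist, vp, v_nmntvnodes);
            TAILQ_INSERT_TAIL(&mp->mnt_nvnodelist, vp, v_nmntvnodes);
            if (VMIGHTFREE(vp) &&
                simple_lock_try(&vp->v_interlock)) {
                    simple_unlock(&mntvnode_slock);
                    vgonel(vp, p);
                    done++;
                    simple_lock(&mntvnode_slock);
            }
            count--;
    }

Even so, rotating every examined vnode seems expensive when, as here,
nothing can be freed at all.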

Peter


