Date: Tue, 23 Dec 2003 13:42:05 +1100
From: Peter Jeremy <peter.jeremy@alcatel.com.au>
To: freebsd-stable@freebsd.org
Subject: 4.9p1 deadlock on "inode"
Message-ID: <20031223024205.GA45693@gsmx07.alcatel.com.au>

This morning I found one of my systems would not let me log in or issue
commands, though it still seemed to be running.  ddb showed that lots of
processes were waiting on "inode".  I forced a crash dump and found 166
processes in total, 95 of them waiting on "inode" and 94 of those on the
same wchan:

(kgdb) p *(struct lock *)0xc133eb00
$9 = {lk_interlock = {lock_data = 0}, lk_flags = 0x200440, lk_sharecount = 0,
  lk_waitcount = 94, lk_exclusivecount = 1, lk_prio = 8,
  lk_wmesg = 0xc02b0a8a "inode", lk_timo = 101, lk_lockholder = 304}
(kgdb)

The lock holder, pid 304, is cron, and it is itself waiting on "inode",
but on a different lock:

(kgdb) p *(struct lock *)0xc1901a00
$10 = {lk_interlock = {lock_data = 0}, lk_flags = 0x200440, lk_sharecount = 0,
  lk_waitcount = 1, lk_exclusivecount = 1, lk_prio = 8,
  lk_wmesg = 0xc02b0a8a "inode", lk_timo = 101, lk_lockholder = 15123}
(kgdb)

That lock's holder, pid 15123, is another cron process, and it is waiting
on "vlruwk" because there are too many vnodes in use (see the getnewvnode()
excerpt below my sig):

(kgdb) p numvnodes
$12 = 8904
(kgdb) p freevnodes
$13 = 24
(kgdb) p desiredvnodes
$14 = 8879

The vnlru process itself is waiting on "vlrup", with vnlru_nowhere = 18209.

Looking through the mountlist, mnt_nvnodelistsize was sane on every
filesystem except one (/mnt), where it was 8613 - 97% of all vnodes in
the system.  Only one process was actively using files on /mnt, though
some other processes may have been using it for $PWD or similar.  That
process was scanning most of the files on /mnt (about 750,000 of them),
checking for files with identical content: all the files that could
potentially be identical (eg the same length) are mmap'd and compared
(the comparison is sketched below my sig).  The process had 2816 entries
in its vm_map.  (It has just occurred to me that one set of data appears
in a very large number of files (~30000), so the program would have tried
to map all of those at once - but I would have expected that to produce
an error from mmap(), not a deadlock.)

Scanning through the mnt_nvnodelist on /mnt:

  5797 entries were directories with entries in v_cache_src
  2804 entries were files with a usecount > 0
    11 entries were directories with VFREE|VDOOMED|VXLOCK set
     1 entry was VNON

This means that none of the vnodes on /mnt were available for recycling,
and the vnodes on all the other filesystems together are not enough to
get back under the hysteresis point that would unblock vnode allocation.

I can understand that an mmap'd file holds a usecount reference on its
vnode, but my understanding is that vnodes whose only references are
v_cache_src entries should be recyclable (though recycling them will slow
down namei()).  If so, should vnlru grow a "try harder" pass that recycles
those vnodes when a normal scan gets nowhere?  (A hypothetical sketch is
below.)

I notice vlrureclaim() contains the comment "don't set kern.maxvnodes too
low".  In this case it is auto-tuned, based on 128MB of RAM and maxusers=0.
Maybe that is too low for my workload, but it would be much nicer if the
system handled this situation gracefully instead of deadlocking.

And finally, a question on vlrureclaim(): why does it do a TAILQ_REMOVE()
and TAILQ_INSERT_TAIL() on every vnode it examines in mnt_nvnodelist (the
loop is quoted below)?  Wouldn't it be cheaper to just walk the list,
rather than moving every node to its end?

Peter
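
P.S. A few code sketches to make the above concrete.

First, the comparison program.  Per candidate pair it does essentially the
following (simplified, error reporting trimmed - the real program works
through whole sets of same-length candidates and keeps all their mappings
live at once, which is where the 2816 vm_map entries come from):

#include <sys/mman.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Compare two files of equal length "len" by mapping both. */
static int
same_content(const char *f1, const char *f2, size_t len)
{
	void *p1 = MAP_FAILED, *p2 = MAP_FAILED;
	int fd1, fd2, same = 0;

	fd1 = open(f1, O_RDONLY);
	fd2 = open(f2, O_RDONLY);
	if (fd1 >= 0 && fd2 >= 0) {
		p1 = mmap(NULL, len, PROT_READ, MAP_SHARED, fd1, 0);
		p2 = mmap(NULL, len, PROT_READ, MAP_SHARED, fd2, 0);
		if (p1 != MAP_FAILED && p2 != MAP_FAILED)
			same = (memcmp(p1, p2, len) == 0);
	}
	if (p1 != MAP_FAILED)
		munmap(p1, len);
	if (p2 != MAP_FAILED)
		munmap(p2, len);
	if (fd1 >= 0)
		close(fd1);
	if (fd2 >= 0)
		close(fd2);
	return (same);
}

The relevant point is that every mapping holds a reference on the mapped
file's vnode until munmap(), even after the descriptor is closed - hence
the 2804 file vnodes with a usecount.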
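
Second, why everything piles up behind "vlruwk".  As far as I can tell,
getnewvnode() throttles allocation like this (paraphrased from RELENG_4
vfs_subr.c - I may not have it character for character):

	/*
	 * In getnewvnode(): block while the system is over its vnode
	 * budget and keep kicking the vnlru kthread.
	 */
	while (numvnodes - freevnodes > desiredvnodes) {
		if (vnlruproc_sig == 0) {
			vnlruproc_sig = 1;	/* avoid extra wakeups */
			wakeup(vnlruproc);
		}
		tsleep(&vnlruproc_sig, PVFS, "vlruwk", hz);
	}

With the numbers from my dump, numvnodes - freevnodes = 8904 - 24 = 8880,
just above desiredvnodes = 8879, so the condition never becomes false.
Meanwhile vnlru_proc(), whenever vlrureclaim() frees nothing on any mount,
just bumps vnlru_nowhere and sleeps on "vlrup" for hz * 3; if I am reading
that right, 18209 fruitless passes at 3 seconds apiece means the box had
been wedged for roughly 15 hours before I noticed.  And since cron pid
15123 blocked in "vlruwk" while holding an inode lock, everything else
queued up behind it.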
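
Third, the "try harder" pass I have in mind.  Something like the
following, run only when the normal vlrureclaim() scan comes back
empty-handed.  This is hypothetical and untested - interlock handling is
elided and the exact type/flag tests are a guess - though cache_purge()
and vgonel() are the existing routines:

	/*
	 * Hypothetical second pass for vlrureclaim(): vnodes that are
	 * idle apart from namecache references can be made recyclable
	 * by purging those references first.
	 */
	count = mp->mnt_nvnodelistsize;
	while (count-- && (vp = TAILQ_FIRST(&mp->mnt_nvnodelist)) != NULL) {
		TAILQ_REMOVE(&mp->mnt_nvnodelist, vp, v_nmntvnodes);
		TAILQ_INSERT_TAIL(&mp->mnt_nvnodelist, vp, v_nmntvnodes);
		if (vp->v_type == VDIR && vp->v_usecount == 0 &&
		    !LIST_EMPTY(&vp->v_cache_src)) {
			cache_purge(vp);	/* drop the namecache refs */
			vgonel(vp, curproc);	/* now recyclable */
			done++;
		}
	}

On the dump above this would have made the 5797 directory vnodes
reclaimable, at the cost of slower namei() lookups afterwards - which
seems a much better outcome than a deadlock.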
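
Finally, the loop my last question is about.  vlrureclaim() does roughly
this (again paraphrased):

	count = mp->mnt_nvnodelistsize / 10 + 1;  /* ~10% of list per call */
	while (count-- && (vp = TAILQ_FIRST(&mp->mnt_nvnodelist)) != NULL) {
		/* Every vnode examined is rotated to the tail, */
		/* recyclable or not. */
		TAILQ_REMOVE(&mp->mnt_nvnodelist, vp, v_nmntvnodes);
		TAILQ_INSERT_TAIL(&mp->mnt_nvnodelist, vp, v_nmntvnodes);
		if (vp->v_type != VNON && vp->v_type != VBAD &&
		    VMIGHTFREE(vp)) {
			vgonel(vp, curproc);	/* interlock handling elided */
			done++;
		}
	}

My best guess is that the rotation doubles as a cursor: each call only
examines about a tenth of the list, and moving examined vnodes to the
tail means the next call's TAILQ_FIRST() picks up where this one left
off.  It also keeps the iteration safe when vgonel() pulls vp off the
list.  But both effects could be had with a saved marker instead of two
pointer updates per vnode, hence the question.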