Date:      Sun, 10 Jan 1999 13:28:25 -0500 (EST)
From:      Luoqi Chen <luoqi@watermarkgroup.com>
To:        dg@FreeBSD.ORG, syssgm@dtir.qld.gov.au
Cc:        freebsd-current@FreeBSD.ORG
Subject:   Re:  Hangs on "inode" and "thrd_sleep"
Message-ID:  <199901101828.NAA07251@lor.watermarkgroup.com>

I ran into the same problem before. If you search the -current archive
for "deadlock", you will find a patch I posted to solve this problem (I
should have filed a PR for it). There are more serious problems with locking
in vm_fault(), which are more difficult to fix; see PR 8416 by Tor Egge.

-lq

> My test machine hung last night during 'make -j5 buildworld' with 7 processes
> in "thrd_sleep" and 2 in "inode".  Thus began a marathon DDB session
> (punctuated by some reluctant sleep).
> 
> The machine is a 486DX2/66 with 16MB RAM, AHA1542CF, 1GB hard disk, kernel
> from 29/12/98, compiling current from yesterday, ELF binaries, ELF kernel,
> softupdates.  No NFS involved.  Plenty of swap, and with only 16MB RAM and
> parallel builds it does an awful lot of paging.
> 
> The last visible bit of the compilation log went like this:
> 
>     cc -fpic -DPIC ... alias_util.so
>     building profiled alias library
>     building standard alias library
>     building shared alias library (version 2)
> 
> Since it was a parallel make, possibly all 3 library builds are running in
> parallel.  Certainly there are 3 tsort and 3 nm processes active (well, they
> would be if the whole thing weren't wedged).
> 
> The processes in "thrd_sleep" are trying to lock exec_map.  Exec_map has
> 1 shared lock, 7 waiting, and LK_NOPAUSE LK_SHARE_NON_ZERO LK_WAIT_NON_ZERO
> and LK_WANT_EXCL set.  Where's the missing process with the shared lock?
> 
> The processes in "inode" are trying to lock the inode that refers to the
> vnode that is "/usr/obj/elf/usr/src/tmp/usr/bin/sed".  There is 1 shared lock
> and 2 waiting, and LK_SHARE_NON_ZERO LK_WAIT_NON_ZERO and LK_WANT_EXCL set.
> Similarly, where is the missing process with the shared lock?
> 
> Well, the exec_map contains 6 entries.  Three are largish and must be from
> argument copying.  The other 3 are single pages, and must come from that
> peculiar double-mapping-of-the-text-data-boundary bit in elf_load_section().
> Two of these pages are from the same "sed" vnode that the processes stuck
> in "inode" want.  Of course, what I really should be saying is that the
> same page is in exec_map in two places.
> 
> The problem was not lack of free pages.  The free list has hundreds of
> free pages.
> 
> I'd like to say I've got to the bottom of all this and add another one
> line patch to the kernel, but I've run out of puff.  I'll be leaving the
> machine on (and stuck) for a while and will try again to determine the
> root cause.
> 
> But I will ask:  What is likely to happen if two processes attempt to
> exec the same binary at the same time and the binary is not in core?
> 
> The only place I can find that issues a shared lock on exec_map is the
> vm_fault() (via vm_map_lookup()) to fill that double mapped text/data page.
> Everything else seems to want an exclusive lock.  Thus I point my finger
> vaguely in the direction of the elf exec code and yell "Witch!  Burn her!"
> 
> What else can I discover from my hung 486 that could help diagnose this?
> I've only got DDB and stupidly disconnected my serial console setup.
> 
> Stephen.
> 
> PS Finding the name of a vnode from the name cache using ddb is slow and
> painful.  What's the easy way?
> 
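
The lock state in the quoted report (one shared holder, several waiters,
and LK_WANT_EXCL set) can be reproduced in a toy userland model. This is
only a sketch under stated assumptions, not the kernel's lockmgr code: the
TOY_* flag values are arbitrary, every toy_* name is hypothetical, and the
policy that new shared requests queue behind a pending exclusive request is
assumed rather than taken from the report. Under that assumption, a process
that already holds the lock shared and asks for it again (for example via a
page fault) ends up asleep behind the exclusive waiter, so the "missing"
shared holder would be one of the sleepers.

```c
#include <assert.h>
#include <stdio.h>

/* Toy flag bits.  The names echo the report; the values are arbitrary
 * and are NOT the kernel's. */
#define TOY_HAVE_EXCL      0x1
#define TOY_WANT_EXCL      0x2
#define TOY_SHARE_NON_ZERO 0x4
#define TOY_WAIT_NON_ZERO  0x8

struct toy_lock {
	int flags;
	int sharecount;	/* current shared holders */
	int waitcount;	/* requests that would be asleep */
};

/* Record a request that has to sleep; always returns 0. */
static int
toy_sleep(struct toy_lock *lk)
{
	lk->waitcount++;
	lk->flags |= TOY_WAIT_NON_ZERO;
	return (0);
}

/* Shared request: returns 1 if granted, 0 if the caller would sleep. */
static int
toy_lock_shared(struct toy_lock *lk)
{
	/* Assumed policy: readers queue behind a pending exclusive request. */
	if (lk->flags & (TOY_HAVE_EXCL | TOY_WANT_EXCL))
		return (toy_sleep(lk));
	lk->sharecount++;
	lk->flags |= TOY_SHARE_NON_ZERO;
	return (1);
}

/* Exclusive request: returns 1 if granted, 0 if the caller would sleep. */
static int
toy_lock_exclusive(struct toy_lock *lk)
{
	if (lk->sharecount > 0 || (lk->flags & TOY_HAVE_EXCL)) {
		lk->flags |= TOY_WANT_EXCL;
		return (toy_sleep(lk));
	}
	lk->flags |= TOY_HAVE_EXCL;
	return (1);
}

/* Replay: A takes shared, B wants exclusive, A asks for shared again. */
static struct toy_lock
toy_replay(void)
{
	struct toy_lock lk = { 0, 0, 0 };
	int ok;

	ok = toy_lock_shared(&lk);	/* A: granted */
	assert(ok == 1);
	ok = toy_lock_exclusive(&lk);	/* B: sleeps, sets TOY_WANT_EXCL */
	assert(ok == 0);
	ok = toy_lock_shared(&lk);	/* A again: sleeps behind B */
	assert(ok == 0);
	printf("shared=%d waiting=%d flags=0x%x\n",
	    lk.sharecount, lk.waitcount, lk.flags);
	return (lk);
}
```

Calling toy_replay() from a main() leaves one shared holder and two
sleepers, with the share, wait and want-exclusive bits all set. Nothing can
make progress, because the only process able to drop the shared lock is
itself asleep on it.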
