Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 5 Jan 2006 10:45:16 -0800 (PST)
From:      Don Lewis <truckman@FreeBSD.org>
To:        dsh@vlink.ru
Cc:        gcr+freebsd-stable@tharned.org, freebsd@McKusick.COM, freebsd-stable@FreeBSD.org, kris@obsecurity.org
Subject:   Re: Recurring problem: processes block accessing UFS file system
Message-ID:  <200601051845.k05IjGCI015543@gw.catspoiler.org>
In-Reply-To: <8764oyinh7.fsf@neva.vlink.ru>

next in thread | previous in thread | raw e-mail | index | archive | help
On  5 Jan, Denis Shaposhnikov wrote:
> Hi!
> 
>>>>>> "Greg" == Greg Rivers <gcr+freebsd-stable@tharned.org> writes:
> 
>  Greg> It's taken more than a month, but the problem has recurred
>  Greg> without snapshots ever having been run.  I've got a good trace
> 
> I think that I have the same problem on a fresh CURRENT. For some
> processes I see MWCHAN = ufs and "D" in the STAT. And I can't kill
> such processes even with -9. And system can't kill them too on
> shutdown. So, system can't do shutdown and wait forever after "All
> buffers synced" message. At this moment I've entered to KDB do "show
> lockedvnods":
> 
> Locked vnodes
> 
> 0xc687cb58: tag ufs, type VDIR
>     usecount 1, writecount 0, refcount 2 mountedhere 0
>     flags ()
>     v_object 0xcb5b1934 ref 0 pages 0
>      lock type ufs: EXCL (count 1) by thread 0xc795d600 (pid 74686) with 1 pending
>         ino 2072602, on dev ad4s1g
> 
> 0xc687ca50: tag ufs, type VDIR
>     usecount 31, writecount 0, refcount 32 mountedhere 0
>     flags ()
>     v_object 0xc85d2744 ref 0 pages 1
>      lock type ufs: EXCL (count 1) by thread 0xc7683d80 (pid 74178) with 6 pending
>         ino 2072603, on dev ad4s1g
> 
> 0xc687c948: tag ufs, type VDIR
>     usecount 2, writecount 0, refcount 3 mountedhere 0
>     flags ()
>     v_object 0xc875d000 ref 0 pages 1
>      lock type ufs: EXCL (count 1) by thread 0xc91f3300 (pid 65610) with 1 pending
>         ino 2072615, on dev ad4s1g
> 
> 0xc691f420: tag ufs, type VDIR
>     usecount 2, writecount 0, refcount 3 mountedhere 0
>     flags ()
>     v_object 0xc8a773e0 ref 0 pages 1
>      lock type ufs: EXCL (count 1) by thread 0xc68e5780 (pid 519) with 1 pending
>         ino 2072680, on dev ad4s1g
> 
> 0xc691f318: tag ufs, type VDIR
>     usecount 3, writecount 0, refcount 4 mountedhere 0
>     flags ()
>     v_object 0xc8a7b2e8 ref 0 pages 1
>      lock type ufs: EXCL (count 1) by thread 0xc7019780 (pid 74103) with 2 pending
>         ino 2072795, on dev ad4s1g
> 
> 0xc69bb528: tag ufs, type VDIR
>     usecount 2, writecount 0, refcount 3 mountedhere 0
>     flags ()
>     v_object 0xc7890744 ref 0 pages 1
>      lock type ufs: EXCL (count 1) by thread 0xc91f4600 (pid 74129) with 1 pending
>         ino 2072767, on dev ad4s1g
> Locked vnodes
> 
> 0xc687cb58: tag ufs, type VDIR
>     usecount 1, writecount 0, refcount 2 mountedhere 0
>     flags ()
>     v_object 0xcb5b1934 ref 0 pages 0
>      lock type ufs: EXCL (count 1) by thread 0xc795d600 (pid 74686) with 1 pending
>         ino 2072602, on dev ad4s1g
> 
> 0xc687ca50: tag ufs, type VDIR
>     usecount 31, writecount 0, refcount 32 mountedhere 0
>     flags ()
>     v_object 0xc85d2744 ref 0 pages 1
>      lock type ufs: EXCL (count 1) by thread 0xc7683d80 (pid 74178) with 6 pending
>         ino 2072603, on dev ad4s1g
> 
> 0xc687c948: tag ufs, type VDIR
>     usecount 2, writecount 0, refcount 3 mountedhere 0
>     flags ()
>     v_object 0xc875d000 ref 0 pages 1
>      lock type ufs: EXCL (count 1) by thread 0xc91f3300 (pid 65610) with 1 pending
>         ino 2072615, on dev ad4s1g
> 
> 0xc691f420: tag ufs, type VDIR
>     usecount 2, writecount 0, refcount 3 mountedhere 0
>     flags ()
>     v_object 0xc8a773e0 ref 0 pages 1
>      lock type ufs: EXCL (count 1) by thread 0xc68e5780 (pid 519) with 1 pending
>         ino 2072680, on dev ad4s1g
> 
> 0xc691f318: tag ufs, type VDIR
>     usecount 3, writecount 0, refcount 4 mountedhere 0
>     flags ()
>     v_object 0xc8a7b2e8 ref 0 pages 1
>      lock type ufs: EXCL (count 1) by t(kgdb) 
> 
> After that I've done "call doadump" and got vmcore.
> 
> ps show me:
> 
> (kgdb) ps
> During symbol reading, Incomplete CFI data; unspecified registers at 0xc04d97eb.
>   pid    proc   uid  ppid  pgrp   flag stat comm         wchan
> 74686 c9464000    0     1     1  000000  1  sh           ufs c687caa8
> 74195 c970d000    0  3074 74195  4000100  1  sshd         ufs c687caa8
> 74178 c7682adc    0  3074 74178  004000  1  sshd         ufs c687c9a0
> 74129 c9b82adc 1008     1  5504  004000  1  parser3.cgi   ufs c691f370
> 74103 c70b5458 1008     1  5504  000000  1  httpd        ufs c69bb580
> 65610 c92c0458 1005     1 65610  004000  1  sftp-server   ufs c691f478
>  5518 c6247458 1008     1  5516  004002  1  perl5.8.7    ufs c687caa8
>  3081 c7523d08    0     1  3081  000000  1  cron         ufs c687caa8
>  3074 c7682d08    0     1  3074  000100  1  sshd         ufs c687caa8
>  3016 c7523adc    0     1  3016  000000  1  syslogd      ufs c687caa8
>   519 c68e4d08   80     1   518  000100  1  nginx        ufs c691f370
>    34 c6260000    0     0     0  000204  1  schedcpu     - e88b3cf0
>    33 c62438b0    0     0     0  000204  1  syncer       ktsusp c6243938
>    32 c6243adc    0     0     0  000204  1  vnlru        ktsusp c6243b64
>    31 c6243d08    0     0     0  000204  1  bufdaemon    ktsusp c6243d90
>    30 c6244000    0     0     0  00020c  1  pagezero     pgzero c06c21a0
>    29 c624422c    0     0     0  000204  1  vmdaemon     psleep c06c1d08
>    28 c6244458    0     0     0  000204  1  pagedaemon   psleep c06c1cc8
>    27 c602e684    0     0     0  000204  1  irq1: atkbd0   
>    26 c602e8b0    0     0     0  000204  1  swi0: sio    
>    25 c602eadc    0     0     0  000204  1  irq18: atapci1   
>    24 c602ed08    0     0     0  000204  1  irq15: ata1   
>    23 c6074000    0     0     0  000204  1  irq14: ata0   
>    22 c607422c    0     0     0  000204  1  irq27: em1   
>    21 c6074458    0     0     0  000204  1  irq26: em0   
>    20 c6074684    0     0     0  000204  1  irq9: acpi0   
>    19 c60748b0    0     0     0  000204  1  swi2: cambio   
>    18 c6074adc    0     0     0  000204  1  swi6: task queue   
>     9 c5fd822c    0     0     0  000204  1  acpi_task2   - c6061e40
>     8 c5fd8458    0     0     0  000204  1  acpi_task1   - c6061e40
>     7 c5fd8684    0     0     0  000204  1  acpi_task0   - c6061e40
>    17 c5fd88b0    0     0     0  000204  1  swi6: Giant taskq   
>     6 c5fd8adc    0     0     0  000204  1  thread taskq   - c6062040
>    16 c5fd8d08    0     0     0  000204  1  swi5: Fast taskq   
>     5 c602e000    0     0     0  000204  1  kqueue taskq   - c6062100
>    15 c602e22c    0     0     0  000204  1  yarrow       - c06b1720
>     4 c602e458    0     0     0  000204  1  g_down       - c06b1fc0
>     3 c5fd3000    0     0     0  000204  1  g_up         - c06b1fbc
>     2 c5fd322c    0     0     0  000204  1  g_event      - c06b1fb4
>    14 c5fd3458    0     0     0  000204  1  swi3: vm     
>    13 c5fd3684    0     0     0  00020c  1  swi4: clock sio   
>    12 c5fd38b0    0     0     0  000204  1  swi1: net    
>    11 c5fd3adc    0     0     0  00020c  1  idle: cpu0   
>    10 c5fd3d08    0     0     0  00020c  1  idle: cpu1   
>     1 c5fd8000    0     0     1  004200  1  init         ufs c687cbb0
>     0 c06b20c0    0     0     0  000200  1  swapper      
> There is no member named p_pptr.

This is different because only vnode locks appear to be involved.  One
thread holding a vnode lock is getting stuck, and then all the other
threads pile up behind it.

I traced the list of locked vnodes and get stuck here:

> 0xc691f318: tag ufs, type VDIR
>     usecount 3, writecount 0, refcount 4 mountedhere 0
>     flags ()
>     v_object 0xc8a7b2e8 ref 0 pages 1
>      lock type ufs: EXCL (count 1) by t(kgdb) 

pid 519 wants to lock this vnode but some other thread is holding the
vnode lock.  Unfortunately we don't know who the lock holder is because
the message is truncated.

This might just be a vnode lock leak.  Build a debug kernel with the
DEBUG_VFS_LOCKS and DEBUG_LOCKS options and see if anything shows up.




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200601051845.k05IjGCI015543>