Date:      Sat, 30 Nov 2002 11:15:32 -0800
From:      Kirk McKusick <mckusick@beastie.mckusick.com>
To:        Sean Kelly <smkelly@zombie.org>
Cc:        current@FreeBSD.ORG
Subject:   Re: UFS Snapshot deadlock 
Message-ID:  <200211301915.gAUJFW59081565@beastie.mckusick.com>
In-Reply-To: Your message of "Wed, 30 Oct 2002 03:57:52 CST." <20021030095752.GA1868@edgemaster.zombie.org> 

Your deadlock should now be fixed.

	Kirk McKusick

=-=-=-=-=

From: Kirk McKusick <mckusick@FreeBSD.org>
Date: Fri, 29 Nov 2002 23:27:12 -0800 (PST)
To: cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject: cvs commit: src/sys/ufs/ffs ffs_snapshot.c
X-FreeBSD-CVS-Branch: HEAD

mckusick    2002/11/29 23:27:12 PST

  Modified files:
    sys/ufs/ffs          ffs_snapshot.c 
  Log:
  Fix two deadlocks in snapshots:
  
  1) Release the snapshot file lock while suspending the system. Otherwise
     a process trying to lock the snapshot file may block while holding the
     lock on its containing directory, preventing the suspension from
     completing. Thanks to Sean Kelly <smkelly@zombie.org> for finding
     this deadlock.
  
  2) Replace some bdwrite's with bawrite's so as not to fill all the
     buffers with dirty data. The buffers could not be cleaned while the
     snapshot vnode was locked, so the system could deadlock when
     making snapshots of very large filesystems. Thanks to
     Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp> for figuring
     this out.
  
  Sponsored by:   DARPA & NAI Labs.
  
  Revision  Changes    Path
  1.51      +7 -2      src/sys/ufs/ffs/ffs_snapshot.c
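
The lock ordering behind deadlock 1 can be sketched as a toy model. This is
not the kernel's code: the struct, the process names, and the functions
below are all hypothetical, and the scenario is a simplified reading of the
traces in the original report (mount holds the snapshot file lock and waits
for the directory; ls holds the directory lock and waits for the snapshot
file).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two processes seen in the traces. */
enum proc { NOBODY, MOUNT_PROC, LS_PROC };

/* A toy exclusive lock: just records its owner. Not the kernel lockmgr. */
struct vlock { enum proc owner; };

/* Take the lock if it is free; report whether the caller now holds it. */
static bool vlock_try(struct vlock *l, enum proc who)
{
    if (l->owner != NOBODY)
        return false;
    l->owner = who;
    return true;
}

static void vlock_release(struct vlock *l)
{
    l->owner = NOBODY;
}

/*
 * Replay the scenario once, in a fixed order, and report whether both
 * processes end up waiting on each other.
 */
bool snapshot_deadlocks(bool release_before_suspend)
{
    struct vlock dir = { NOBODY }, snap = { NOBODY };

    /* mount creates the snapshot file and holds its lock. */
    vlock_try(&snap, MOUNT_PROC);

    /* The fix modeled here: drop that lock before suspending. */
    if (release_before_suspend)
        vlock_release(&snap);

    /* ls locks the directory, then looks up the snapshot file in it. */
    vlock_try(&dir, LS_PROC);
    bool ls_done = vlock_try(&snap, LS_PROC);
    if (ls_done) {
        vlock_release(&snap);
        vlock_release(&dir);
    }

    /* Suspending syncs the filesystem, which must lock the directory. */
    bool suspended = vlock_try(&dir, MOUNT_PROC);
    if (suspended)
        vlock_release(&dir);

    /* Cycle: ls waits for snap (mount), mount waits for dir (ls). */
    return !ls_done && !suspended;
}
```

In this model, snapshot_deadlocks(false) replays the ordering from the
report and returns true, while snapshot_deadlocks(true) drops the snapshot
file lock first and returns false. After a real suspension the snapshot
code would have to take the lock again; the model leaves that step out.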

=-=-=-=-=-=

Date: Wed, 30 Oct 2002 03:57:52 -0600
From: Sean Kelly <smkelly@zombie.org>
To: current@FreeBSD.ORG
Subject: UFS Snapshot deadlock

While playing with UFS snapshots on a UFS2 filesystem I mounted
specifically for this purpose, I encountered a little problem. It seems I
have processes deadlocked on each other.

Steps to repeat:
/# mount /dev/ad2a /mnt ; cd /mnt
/dev/ad2a on /mnt (ufs, local, soft-updates, multilabel) # UFS2
/mnt# cd /mnt; mount -u -o snapshot /mnt/snapshot /mnt

*switch vtys*

/# cd /mnt; ls -l
*ls deadlocks*
*I get bored and ^C the mount on the other vty about 30 minutes later*
/mnt# ls 
*this ls deadlocks too*

For the record, /mnt was a new filesystem. It had *nothing* in it. No
directories or anything.

So now, I've got these:
  UID   PID  PPID CPU PRI NI   VSZ  RSS MWCHAN STAT  TT       TIME COMMAND
    0  1133   669   0  -4  0   692  548 ufs    D+    v1    0:00.00 ls
 1001   939   856   0  -4  0   696  560 ufs    D+    v2    0:00.00 ls -l
    0   937     1   0  -4  0   560  336 ufs    D     v1    0:00.65 mount -u -o snapshot /mnt/snapshot /mnt


Now for some numbers.

db> trace 937
mi_switch(c71aab60,50,c03375c6,c7,c03ad2f8) at mi_switch+0x158
msleep(c75098dc,c03a9358,50,c034f732,0) at msleep+0x3b4
acquire(c75098dc,1000040,600,e6,3a9) at acquire+0xa7
lockmgr(c75098dc,1010002,c7509818,c71aab60,e5b076a8) at lockmgr+0x2f7
vop_stdlock(e5b076c4,e5b076e0,c021e306,e5b076c4,0) at vop_stdlock+0x2c
ufs_vnoperate(e5b076c4,0,c033dd28,e5b076e0,c01ba4a5) at ufs_vnoperate+0x18
vn_lock(c7509818,10002,c71aab60,815,c7509818) at vn_lock+0xd6
vget(c7509818,2,c71aab60,470,0) at vget+0xd6
ffs_sync(c74c5400,1,c726a780,c71aab60,c74f1000) at ffs_sync+0x126
vfs_write_suspend(c74c5400,c74ffcb8,d351f08c,1,c2c06e80) at vfs_write_suspend+0x70
ffs_snapshot(c74c5400,bfbffd1d,70,c033990d,252) at ffs_snapshot+0xa48
ffs_mount(c74c5400,c745ce80,bfbff000,e5b07bf0,c71aab60) at ffs_mount+0x548
vfs_mount(c71aab60,c6d2b780,c745ce80,1010000,bfbff000) at vfs_mount+0x85e
mount(c71aab60,e5b07d14,c03590ba,409,4) at mount+0xb8
syscall(2f,2f,2f,bfbfeffc,bfbff9f4) at syscall+0x22e
Xint0x80_syscall() at Xint0x80_syscall+0x1d

db> trace 939
mi_switch(c74260d0,50,c03375c6,c7,1cc) at mi_switch+0x158
msleep(c74ffd7c,c03a9688,50,c034f732,0) at msleep+0x3b4
acquire(c74ffd7c,1000040,600,e6,3ab) at acquire+0xa7
lockmgr(c74ffd7c,1010002,c74ffcb8,c74260d0,e5bfd83c) at lockmgr+0x2f7
vop_stdlock(e5bfd858,e5bfd874,c021e306,e5bfd858,246) at vop_stdlock+0x2c
ufs_vnoperate(e5bfd858,246,0,c74f1000,0) at ufs_vnoperate+0x18
vn_lock(c74ffcb8,10002,c74260d0,7f,3) at vn_lock+0xd6
vget(c74ffcb8,10002,c74260d0,7f,c74260d0) at vget+0xd6
ufs_ihashget(c74cce00,3,2,e5bfd98c,e5bfd8f0) at ufs_ihashget+0xd2
ffs_vget(c74c5400,3,2,e5bfd98c,e5bfd994) at ffs_vget+0x44
ufs_lookup(e5bfdac0,e5bfdafc,c0207a24,e5bfdac0,e5bfdc3c) at ufs_lookup+0xdae
ufs_vnoperate(e5bfdac0,e5bfdc3c,e5bfdc50,3ab,c74260d0) at ufs_vnoperate+0x18
vfs_cache_lookup(e5bfdb70,e5bfdb9c,c020bd39,e5bfdb70,c7509818) at vfs_cache_lookup+0x2e4
ufs_vnoperate(e5bfdb70,c7509818,e5bfdc50,e5bfdb5c,c74260d0) at ufs_vnoperate+0x18
lookup(e5bfdc28,0,c033d6ad,a4,c74260d0) at lookup+0x309
namei(e5bfdc28,c03ade38,c03ade10,c03b42a0,0) at namei+0x1e0
lstat(c74260d0,e5bfdd14,c03590ba,409,2) at lstat+0x52
syscall(2f,2f,2f,80d3200,80d1040) at syscall+0x22e
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (190, FreeBSD ELF32, lstat), eip = 0x805838b, esp = 0xbfbff3dc, ebp = 0xbfbff468 ---

db> trace 1133
mi_switch(c6d31680,50,c03375c6,c7,2) at mi_switch+0x158
msleep(c75098dc,c03a9358,50,c034f732,0) at msleep+0x3b4
acquire(c75098dc,1000040,600,e6,46d) at acquire+0xa7
lockmgr(c75098dc,1030002,c7509818,c6d31680,e3887ad0) at lockmgr+0x2f7
vop_stdlock(e3887aec,e3887b08,c021e306,e3887aec,0) at vop_stdlock+0x2c
ufs_vnoperate(e3887aec,0,c033e1ac,360,c01e3af0) at ufs_vnoperate+0x18
vn_lock(c7509818,20002,c6d31680,e3887b5c,c6d31680) at vn_lock+0xd6
lookup(e3887c28,0,c033d6ad,a4,c6d31680) at lookup+0x8e
namei(e3887c28,c03ade38,c03ade10,c03b42a0,0) at namei+0x1e0
stat(c6d31680,e3887d14,c03590ba,409,2) at stat+0x52
syscall(2f,2f,2f,80d3080,80d1000) at syscall+0x22e
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (188, FreeBSD ELF32, stat), eip = 0x80583b3, esp = 0xbfbff4dc, ebp = 0xbfbff568 ---

db> x/x 0xc74ffd7c, 20
0xc74ffd7c:     c03a9688        1200440         0               1
0xc74ffd8c:     500001          c034f732        6               3a9
0xc74ffd9c:     c74ffd7c        c6be9500        c74c5400        0
0xc74ffdac:     0               c74ffdac        968             c74ffcb8
0xc74ffdbc:     0               0               1               0
0xc74ffdcc:     0               0               0               0
0xc74ffddc:     ffffffff        c0370e80        c033dd98        c033dd98
0xc74ffdec:     30000           c74cb734        c7508010        c03acfd8

db> x/x 0xc75098dc, 10
0xc75098dc:     c03a9358        1200440         0               3
0xc75098ec:     500001          c034f732        6               3ab
0xc75098fc:     c75098dc        c6be9500        c74c5400        0
0xc750990c:     0               c750990c        93c             c7509818

(gdb) list *( ufs_lookup+0xdae)
0xc02bd86e is in ufs_lookup (/usr/src/sys/ufs/ufs/ufs_lookup.c:602).
597             } else if (dp->i_number == dp->i_ino) {
598                     VREF(vdp);      /* we want ourself, ie "." */
599                     *vpp = vdp;
600             } else {
601                     error = VFS_VGET(pdp->v_mount, dp->i_ino, LK_EXCLUSIVE, &tdp);
602                     if (error)
603                             return (error);
604                     if (!lockparent || !(flags & ISLASTCN)) {
605                             VOP_UNLOCK(pdp, 0, td);
606                             cnp->cn_flags |= PDIRUNLOCK;

-- 
Sean Kelly         | PGP KeyID: 77042C7B
smkelly@zombie.org | http://www.zombie.org

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
