Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 12 Dec 2008 15:41:29 +0200
From:      Kostik Belousov <kostikbel@gmail.com>
To:        David Wolfskill <david@catwhisker.org>, hackers@freebsd.org, current@freebsd.org
Subject:   Re: NFS (& amd?) dysfunction descending a hierarchy
Message-ID:  <20081212134129.GD2038@deviant.kiev.zoral.com.ua>
In-Reply-To: <20081211225349.GB5597@albert.catwhisker.org>
References:  <20081203001538.GC96383@bunrab.catwhisker.org> <20081209190110.GW60731@albert.catwhisker.org> <Pine.GSO.4.63.0812101124430.24743@muncher.cs.uoguelph.ca> <20081210165022.GJ60731@albert.catwhisker.org> <20081210170620.GS2038@deviant.kiev.zoral.com.ua> <20081211225349.GB5597@albert.catwhisker.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--ZPyQn7mfIBXOql8K
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Dec 11, 2008 at 02:53:49PM -0800, David Wolfskill wrote:
> On Wed, Dec 10, 2008 at 07:06:20PM +0200, Kostik Belousov wrote:
> > ...
> > > What concerns me is that even if the attempted unmount gets EBUSY, the
> > > user-level process descending the directory hierarchy is getting ENOE=
NT
> > > trying to issue fstatfs() against an open file descriptor.
> > >=20
> > > I'm having trouble figuring out any way that makes any sense.
> >=20
> > Basically, the problem is that NFS uses shared lookup, and this allows
> > for the bug where several negative namecache entries are created for
> > non-existent node. Then this node gets created, removing only the first
> > negative namecache entry. For some reasons, vnode is reclaimed; amd'
> > tasting of unmount is a good reason for vnode to be reclaimed.
> >=20
> > Now, you have existing path and a negative cache entry. This was
> > reported by Peter Holm first, I listed relevant revisions that
> > should fix this in previous mail.
>=20
> Well, I messed up the machine I had been using for testing, and needed
> to wait for IT to do something to it since I don't have physical or
> console access to it.
>=20
> So after I happened to demonstrate the effect using my desktop  -- which
> had been running RELENG_7_1, sources updated as of around 0400 hrs.
> US/Pacific -- I decided to go ahead and update the desktop to RELENG_7_1
> as of this morning (which had the commit to sys/kern/vfs_cache.c), then
> test.
>=20
> It still failed, apparently in the same way; details below.
>=20
> First, here's a list of the files that were changed:
>=20
> U lib/libarchive/archive_read_support_format_iso9660.c
> U lib/libarchive/archive_string.c
> U lib/libarchive/archive_string.h
> U lib/libc/gen/times.3
> U lib/libc/i386/sys/pipe.S
> U lib/libc/i386/sys/reboot.S
> U lib/libc/i386/sys/setlogin.S
> U lib/libutil/Makefile
> U lib/libutil/kinfo_getfile.c
> U lib/libutil/kinfo_getvmmap.c
> U lib/libutil/libutil.h
> U share/man/man4/bce.4
> U share/man/man5/Makefile
> U share/man/man5/fstab.5
> U share/man/man5/nullfs.5
> U sys/amd64/Makefile
> U sys/boot/forth/loader.conf.5
> U sys/dev/ale/if_ale.c
> U sys/dev/bce/if_bce.c
> U sys/dev/cxgb/cxgb_main.c
> U sys/dev/cxgb/common/cxgb_ael1002.c
> U sys/dev/cxgb/common/cxgb_t3_hw.c
> U sys/dev/cxgb/common/cxgb_xgmac.c
> U sys/dev/re/if_re.c
> U sys/fs/nullfs/null_vnops.c
> U sys/kern/Make.tags.inc
> U sys/kern/kern_descrip.c
> U sys/kern/kern_proc.c
> U sys/kern/vfs_cache.c
> U sys/netinet/in_pcb.h
> U sys/pci/if_rlreg.h
> U sys/sys/sysctl.h
> U sys/sys/user.h
> U sys/ufs/ufs/ufs_quota.c
> U usr.bin/procstat/Makefile
> U usr.bin/procstat/procstat_files.c
> U usr.bin/procstat/procstat_vm.c
> U usr.bin/tar/util.c
> U usr.bin/tar/test/Makefile
> U usr.bin/tar/test/test_strip_components.c
> U usr.bin/tar/test/test_symlink_dir.c
> U usr.bin/xargs/xargs.1
> U usr.sbin/mtree/mtree.c
>=20
> We see that sys/kern/vfs_cache.c is, indeed, among them.  And:
>=20
> dwolf-bsd(7.1-P)[5] grep '\$FreeBSD' /sys/kern/vfs_cache.c
> __FBSDID("$FreeBSD: src/sys/kern/vfs_cache.c,v 1.114.2.3 2008/12/09 16:20=
:58 kib Exp $");
> dwolf-bsd(7.1-P)[6]=20
>=20
> That should correspond to the desired version of the file.
>=20
> Here we see an excerpt from the ktrace output for the amd(8) process and
> its children; this is a point when amd(8) is trying an unmount() to see
> if it can get away with it:
>=20
>    977 amd      1229033597.269612 CALL  gettimeofday(0x807ad48,0)
>    977 amd      1229033597.269620 RET   gettimeofday 0
>    977 amd      1229033597.269630 CALL  sigprocmask(SIG_BLOCK,0xbfbfeaec,=
0xbfbfeadc)
>    977 amd      1229033597.269637 RET   sigprocmask 0
>    977 amd      1229033597.269645 CALL  fork
>    977 amd      1229033597.273810 RET   fork 1712/0x6b0
>   1712 amd      1229033597.273811 RET   fork 0
>    977 amd      1229033597.273836 CALL  sigprocmask(SIG_SETMASK,0xbfbfead=
c,0)
>   1712 amd      1229033597.273845 CALL  getpid
>    977 amd      1229033597.273850 RET   sigprocmask 0
>   1712 amd      1229033597.273855 RET   getpid 1712/0x6b0
>    977 amd      1229033597.273864 CALL  gettimeofday(0x807ad48,0)
>    977 amd      1229033597.273874 RET   gettimeofday 0
>   1712 amd      1229033597.273878 CALL  unmount(0x2832c610,<invalid>0)
> ...
>   1712 amd      1229033597.352643 RET   unmount -1 errno 16 Device busy
>   1712 amd      1229033597.352695 CALL  sigprocmask(SIG_BLOCK,0x28097c00,=
0xbfbfea0c)
>   1712 amd      1229033597.352728 RET   sigprocmask 0
>   1712 amd      1229033597.352751 CALL  sigprocmask(SIG_SETMASK,0x28097c1=
0,0)
>   1712 amd      1229033597.352769 RET   sigprocmask 0
>   1712 amd      1229033597.352781 CALL  sigprocmask(SIG_BLOCK,0x28097c00,=
0xbfbfe9dc)
>   1712 amd      1229033597.352790 RET   sigprocmask 0
>   1712 amd      1229033597.352801 CALL  sigprocmask(SIG_SETMASK,0x28097c1=
0,0)
>   1712 amd      1229033597.352805 RET   sigprocmask 0
>   1712 amd      1229033597.352815 CALL  exit(0x10)
>    977 amd      1229033597.353085 RET   select -1 errno 4 Interrupted sys=
tem call
>    977 amd      1229033597.353093 PSIG  SIGCHLD caught handler=3D0x805de5=
0 mask=3D0x0 code=3D0x0
>    977 amd      1229033597.353103 CALL  wait4(0xffffffff,0xbfbfe83c,WNOHA=
NG,0)
>    977 amd      1229033597.353116 RET   wait4 1712/0x6b0
>    977 amd      1229033597.353122 CALL  wait4(0xffffffff,0xbfbfe83c,WNOHA=
NG,0)
>    977 amd      1229033597.353127 RET   wait4 -1 errno 10 No child proces=
ses
>=20
>=20
> So amd(8) master process (pid 977) jorks off a child (pid 1712) to
> try an umount(), which it initiates at 1229033597.273878.  At
> 1229033597.352643 the child gets control back, as well as an EBUSY,
> which I would expect to mean that the attempt failed.  The child
> exits at 1229033597.352815 with a status code of 16.
>=20
> Armed with that, we look at a ktrace excerpt from "rm -fr":
>=20
>   1660 rm       1229033597.283277 CALL  rmdir(0x2822b388)
>   1660 rm       1229033597.283283 NAMI  "stvef-paks"
>   1660 rm       1229033597.285599 RET   rmdir 0
>   1660 rm       1229033597.285620 CALL  stat(0x2822b3e8,0xbfbfe8dc)
>   1660 rm       1229033597.285626 NAMI  "stvef-server"
>   1660 rm       1229033597.286071 STRU  struct stat {dev=3D83951372, ino=
=3D20124614, mode=3Ddrwxr-xr-x , nlink=3D3, uid=3D9874, gid=3D929, rdev=3D0=
, atime=3D1228844788, stime=3D1227555769, ctime=3D1228845828.326650000, bir=
thtime=3D0, size=3D4096, blksize=3D4096, blocks=3D8, flags=3D0x0 }
>   1660 rm       1229033597.286078 RET   stat 0
>   1660 rm       1229033597.286084 CALL  open(0x2822b3e8,O_NONBLOCK,<unuse=
d>0x1)
>   1660 rm       1229033597.286091 NAMI  "stvef-server"
>   1660 rm       1229033597.287145 RET   open 4
>   1660 rm       1229033597.287154 CALL  fstat(0x4,0xbfbfe8dc)
>   1660 rm       1229033597.287161 STRU  struct stat {dev=3D83951372, ino=
=3D20124614, mode=3Ddrwxr-xr-x , nlink=3D3, uid=3D9874, gid=3D929, rdev=3D0=
, atime=3D1228844788, stime=3D1227555769, ctime=3D1228845828.326650000, bir=
thtime=3D0, size=3D4096, blksize=3D4096, blocks=3D8, flags=3D0x0 }
>   1660 rm       1229033597.287166 RET   fstat 0
>   1660 rm       1229033597.287171 CALL  fcntl(0x4,F_SETFD,FD_CLOEXEC)
>   1660 rm       1229033597.287177 RET   fcntl 0
>   1660 rm       1229033597.287187 CALL  fstatfs(0x4,0xbfbfe704)
>   1660 rm       1229033597.287195 RET   fstatfs -1 errno 2 No such file o=
r directory
>   1660 rm       1229033597.287202 CALL  close(0x4)
>   1660 rm       1229033597.287211 RET   close 0
>=20
> [Sorry for the long lines....]
>=20
> Here we see that the "rm" process (pid 1660) removed a directory
> named stvef-paks sucessfully in the interval between 1229033597.283277
> (when the request was made) and 1229033597.285599 (when the 0 return
> occurred).  The "rm" process proceeds to process a directory named
> stvef-server:
>=20
> * At 1229033597.285620 it issues a stat(); the successful return
>   is at 1229033597.286078.
>=20
> * At 1229033597.286084 it issues an open(); the successful return
>   is at 1229033597.287145.  The FD is 4.
>=20
> * At 1229033597.287154 it issues an fstat() against FD 4; the successful
>   return is at 1229033597.287166.
>=20
> * At 1229033597.287171 it issues an fcntl() against FD 4; the
>   successful return is at 1229033597.287177.
>=20
> * At 1229033597.287187 it issues an fstatfs() against FD 4; the
>   unsuccessful return is at 1229033597.287195, claiming ENOENT.
>=20
> Say WHAT??!?
>=20
> I expect to be able to test a bit more promptly now.

But is this error transient or permanent ? I.e., would restart of rm
successful or failing ?

Anyway, this error looks different too.

--ZPyQn7mfIBXOql8K
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (FreeBSD)

iEYEARECAAYFAklCagkACgkQC3+MBN1Mb4jOBgCeJtuwzLJH5kX0gUBlu2VqG/yD
lzAAoK6FmV53wAKwPlQDbf0Ryd4LXsbZ
=pDvh
-----END PGP SIGNATURE-----

--ZPyQn7mfIBXOql8K--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20081212134129.GD2038>