Date:      Sat, 22 Oct 2011 00:08:54 +0400
From:      Subbsd <subbsd@gmail.com>
To:        freebsd-stable@freebsd.org
Cc:        freebsd-fs@freebsd.org
Subject:   VFS problem with "fcntl SETLK" and nullfs
Message-ID:  <CAFt_eMqJVuzjzcAf_4Hdxhu2cLqPTY%2Bww==GuVH1AE7Obs2S6Q@mail.gmail.com>

Hi

I have found a serious issue with nullfs mounts in FreeBSD that shows up
at random.
I first ran into it on FreeBSD-CURRENT, on a host running a large number
of jails, at the moment the jails were being started. The setup follows
the Handbook scenario:

1) a read-only base (for example /usr/jails/base)
2) a writable area for each jail's own data (for example
/usr/jails/j1data/{home,var,local,...})
3) nullfs-mount the RO base at the new jail location, then mount the RW
data parts on top of it
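The layout in steps 1-3 can be sketched as a sequence of nullfs mounts.
This is a minimal dry-run sketch that only prints the mount(8) commands
instead of running them. The jail root /usr/jails/j1, the choice of home
and var as the data directories, and mounting each one on the
same-named directory under the jail root are assumptions for
illustration; a real jail may place them elsewhere (e.g. usr/local).

```sh
#!/bin/sh
# Example paths (assumptions): RO base, per-jail RW data, jail root.
BASE="/usr/jails/base"
DATA="/usr/jails/j1data"
JAIL="/usr/jails/j1"

# Print, rather than execute, the nullfs mounts for one jail:
# first the read-only base, then each writable data directory on top.
jail_mounts()
{
    echo "mount -t nullfs -o ro ${BASE} ${JAIL}"
    # Hypothetical mapping: ${DATA}/<dir> is mounted on ${JAIL}/<dir>.
    for d in home var; do
        echo "mount -t nullfs -o rw ${DATA}/${d} ${JAIL}/${d}"
    done
}

jail_mounts
```

Piping the output to sh (as root) would perform the mounts; the RO base
has to be mounted first so that the RW mounts land on top of it.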

In some cases I saw the system freeze while nullfs mounts were being
done, but I could not find the cause.

In a test environment I tried to reproduce the problem by running
mount_nullfs while doing different kinds of activity in the source
directory:

- a heavy read load with dd(1): no effect
- a heavy write load with dd(1): no effect
- a script deleting and creating large numbers of random files: no
effect

But now I can reproduce the problem reliably (100% of the time) with
"svn cleanup". For example, take a /usr/src checked out from SVN. If
you run
svn cleanup
in /usr/src and at the same time run mount_nullfs, the problem appears.
As far as I can see, cleanup takes file locks very frequently. It seems
to me that one of these locks is handled incorrectly and ends up in a
deadlock.

I wrote a sample script that simulates the problem. It deliberately
alternates RO and RW mounts, repeating the procedure described in the
jail section of the Handbook.
Since the problem can appear at a random moment, the script runs in an
infinite loop, but I usually hit the problem on the first pass. Here it
is:

-------/cut/-----
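#!/bin/sh
# Tree that "svn cleanup" works on, and where the nullfs copies go.
SRCROOT="/usr/src"
DSTROOT="/usr/nullfstest"
# Number of nullfs mounts of SRCROOT made per pass.
ITER=`seq 100`
# Names of the top-level directories of SRCROOT (used as RW mount points).
MOUNTO=`find ${SRCROOT} -maxdepth 1 -type d -exec basename {} \;`

[ -d "${DSTROOT}" ] || mkdir $DSTROOT

# Mount /bin read-write on top of every top-level directory inside $1,
# imitating the RW data layer placed over the RO base.
mount_subdir()
{
    for mto in ${MOUNTO}; do
        if [ -d "${1}/$mto" ]; then
            mount -orw -t nullfs /bin ${1}/${mto}
        fi
    done
}

cd ${SRCROOT}

while [ 1 ]; do
    echo "Mount phase"
    # Run svn cleanup in the background. lockf -t0 gives up at once if a
    # previous cleanup still holds /tmp/svn.lock (-s silences the error),
    # so at most one cleanup runs at a time.
    lockf -s -t0 /tmp/svn.lock svn cleanup &

    for iter in $ITER; do
        DST="${DSTROOT}/${iter}"
        [ -d "${DST}" ] || mkdir ${DST}
        # RO mount of the whole tree, then RW mounts over its subdirectories.
        mount -oro -t nullfs ${SRCROOT} ${DST}
        mount_subdir ${DST}
    done

    echo "Unmount phase"
    # Force-unmount every nullfs mount point (field 3 of mount(8) output).
    mount -t nullfs | awk '{print "umount -f " $3}' | sh
done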
-------/end of cut/-----

The last system call I see from svn cleanup is:
fcntl(3,F_SETLK,0x7fffffffc9b0)

where 3 is the descriptor of a file inside .svn/.

In this state the system (kernel) still works. But if any process or
your shell session touches the source directory (/usr/src in this
example), for example:

cd /usr/src
fstat /usr/src/*
ls /usr/src/

you get a filesystem deadlock. Moreover, in this state the system cannot
reboot unassisted: it never gets past the stage of flushing buffers to
storage.

The bug also exists in FreeBSD 9.0-RC1.
PS: An important detail: I could not reproduce the problem on FreeBSD
running in a virtual machine (VirtualBox). Maybe this is related to the
timer tick rate (kern.hz)?
PS2: The underlying file system does not matter; I get the problem on
ZFS as well as on UFS.

Please look into this; it seems to be serious.

Thanks.


