Date:      Fri, 6 Feb 1998 19:26:08 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        kato@migmatite.eps.nagoya-u.ac.jp (KATO Takenori)
Cc:        current@FreeBSD.ORG
Subject:   Re: unionfs clobbers a file
Message-ID:  <199802061926.MAA15103@usr01.primenet.com>
In-Reply-To: <19980206210958N.kato@gneiss.eps.nagoya-u.ac.jp> from "KATO Takenori" at Feb 6, 98 09:09:58 pm

> Current major problem of unionfs is:
> 	Writing a file via unionfs sometimes clobbers the file.
> 
> When new file is created and modified on unionfs, a part of the file
> is filled by zero.  The size of zero-filled part is always multiple of
> 4096 bytes.  Easy way to reproduce the problem is:
> 
> 	# mount -t union /foo /usr/obj
> 	# cd /usr/src
> 	# make world
> 
> When you got signal 11 or other error, please see
> /usr/obj/usr/src/tmp/usr/bin/make and
> /usr/obj/usr/src/usr.bin/make/.depend.  One of them contains zero-
> filled field.
> 
> Do you have any idea to solve it?

4096 bytes is a page.

Pages are hung off the vnode, when the vnode pager is using the file
for backing store.

Aliases are created by a combination of factors: the way vnodes
stack; the lack of a general mechanism for obtaining the backing
vnode for a given vnode at the top of a stack; the lack of general
support for VOP_GETPAGES and VOP_PUTPAGES in all local media FS
implementations; and the vnode pager's lack of knowledge of whether
a given FS is implemented on local media.

If I have a vnode that is the local media vnode, and it has pages
in place, and I create an overlay vnode, and it has aliases for
those pages, then I can get into a situation where the overlay vnode
and the local media vnode have the same pages referenced as existing,
but only one has copies of the disk pages.

When this happens, and you reference the page from the wrong vnode,
you get a zero filled page instead, just as you would when extending a
file or accessing a page in a sparse file.
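
A deliberately simplified userland sketch of that failure mode, not
kernel code: struct toy_vnode, toy_write() and toy_read() are
hypothetical names invented for illustration, and the model only
assumes that each vnode keeps its own idea of which pages hold data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_NPAGES    4

/* Hypothetical toy vnode with its own page cache; not the kernel's struct vnode. */
struct toy_vnode {
    unsigned char pages[TOY_NPAGES][TOY_PAGE_SIZE]; /* cached page contents */
    int           has_data[TOY_NPAGES];             /* 1 if this vnode holds the real data */
};

/* Write data into a page through one particular vnode. */
static void toy_write(struct toy_vnode *vp, size_t pg, const void *buf, size_t len)
{
    assert(pg < TOY_NPAGES && len <= TOY_PAGE_SIZE);
    memcpy(vp->pages[pg], buf, len);
    vp->has_data[pg] = 1;
}

/*
 * Read a page through a vnode.  A page this vnode has no data for
 * reads back zero filled, like a hole in a sparse file.
 */
static void toy_read(const struct toy_vnode *vp, size_t pg, unsigned char *out)
{
    assert(pg < TOY_NPAGES);
    if (vp->has_data[pg])
        memcpy(out, vp->pages[pg], TOY_PAGE_SIZE);
    else
        memset(out, 0, TOY_PAGE_SIZE);
}
```

Writing page 1 through one toy vnode and reading page 1 through a
second one that aliases the same file returns 4096 zero bytes, which
matches the size of the clobbered regions reported above.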

The easy fix is to modify the vnode pager to not know about where
the pages are located on the vnode.  This will have two consequences:

1)	You *must* support VOP_GETPAGES/VOP_PUTPAGES in local media
	filesystems for them to continue to work.  If you do this,
	then the "bypass" mechanism of the stacking vnode architecture
	will "do the right thing" for FS's that do not have these
	functions in their vnops structure, and the aliases will go
	away.

2)	Most stacking FS's will start to work, except where they've
	been modified, like the commits that have been threatened
	to the umapfs.


The easy fix is *WRONG*.  The unionfs will, in fact, still not work
(I think it won't; you can probably kludge it) because of VOP_LOCK
and VOP_ADVLOCK.  There are deadlocks and recursion panics.

The harder fix is to add a VOP_FINALVP to all local media filesystems.
Adding a VOP_FINALVP will allow an upper layer to get the backing
vnode for a VM object, no matter how buried by other stacks it becomes.
This fix will have three consequences:

1)	The vnode pager *must* be modified to call VOP_FINALVP to get
	the backing object on which it is going to operate, instead of
	using page aliases from a random vnode in the stack.  If you
	do this, then the "bypass" mechanism of the stacking vnode
	architecture will "do the right thing" for FS's that do not
	have this function in their vnops structure, and the aliases
	will go away.

2)	The advisory locking will need to be hung off a pointer in
	the generic vnode, instead of off a pointer in the FS
	specific inode.  All advisory locks should be asserted in
	upper level code instead of in FS code, and should be veto
	based.  The upper level code will use the VOP_FINALVP to
	get the backing node(s) for the lock range.  The locks
	will then be associated with the data they are locking.

3)	Most stacking FS's will start to work, except where they've
	been modified, like the commits that have been threatened
	to the umapfs.
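
A rough userland sketch of how a VOP_FINALVP lookup with bypass could
behave, under stated assumptions: struct toy_vnode, local_finalvp()
and toy_finalvp() are hypothetical names, and each layer is modelled
with a single lower vnode (a real unionfs vnode has more than one).

```c
#include <assert.h>
#include <stddef.h>

struct toy_vnode;
typedef struct toy_vnode *(*finalvp_fn)(struct toy_vnode *);

/* Hypothetical toy vnode; not the kernel's struct vnode. */
struct toy_vnode {
    const char       *name;
    struct toy_vnode *lower;   /* next vnode down the stack, NULL at the bottom */
    finalvp_fn        finalvp; /* NULL: this layer does not implement the op */
};

/* Local media FS: the vnode is its own backing object. */
static struct toy_vnode *local_finalvp(struct toy_vnode *vp)
{
    assert(vp != NULL);
    return vp;
}

/*
 * Dispatch with bypass: a layer without the op hands the request to
 * the layer below it, until some layer answers.  Returns NULL if no
 * layer in the stack implements the op.
 */
static struct toy_vnode *toy_finalvp(struct toy_vnode *vp)
{
    while (vp != NULL && vp->finalvp == NULL)
        vp = vp->lower;
    return vp != NULL ? vp->finalvp(vp) : NULL;
}
```

In this model a pager that always calls toy_finalvp() before touching
pages ends up on the same bottom vnode regardless of which layer it
was handed, so there is only one place for the pages to live.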

The unionfs, as a multiplexer, *will* work for VOP_ADVLOCK, since
it implements the bypass and no longer has to assert sub-locks on
per FS objects.
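
A rough sketch of veto-based advisory locking hung off the generic
vnode, again as a userland toy and not the code posted to the list:
struct toy_vnode, struct toy_range, toy_advlock() and the
advlock_veto hook are hypothetical names, and byte ranges are treated
as inclusive.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LOCKS 8

struct toy_range { long start, end; int owner; };

/* Hypothetical toy vnode; the lock list lives here, not in an FS specific inode. */
struct toy_vnode {
    struct toy_vnode *lower;                /* NULL at the bottom of the stack */
    struct toy_range  locks[TOY_MAX_LOCKS]; /* advisory locks held on this vnode */
    int               nlocks;
    /* Optional FS hook: return nonzero to veto the lock.  NULL means no veto. */
    int (*advlock_veto)(struct toy_vnode *, const struct toy_range *);
};

/* Stand-in for VOP_FINALVP: walk to the bottom of the stack. */
static struct toy_vnode *toy_finalvp(struct toy_vnode *vp)
{
    while (vp->lower != NULL)
        vp = vp->lower;
    return vp;
}

/*
 * Upper-level lock assert.  Returns 0 on success, -1 on a conflict,
 * an FS veto, or a full lock table.
 */
static int toy_advlock(struct toy_vnode *vp, long start, long end, int owner)
{
    assert(vp != NULL && start <= end);
    struct toy_vnode *fvp = toy_finalvp(vp);
    struct toy_range want = { start, end, owner };

    /* Conflict check is done here, in generic code, against the backing vnode. */
    for (int i = 0; i < fvp->nlocks; i++) {
        const struct toy_range *r = &fvp->locks[i];
        if (r->owner != owner && start <= r->end && r->start <= end)
            return -1;
    }
    /* The FS only gets a chance to say no. */
    if (fvp->advlock_veto != NULL && fvp->advlock_veto(fvp, &want) != 0)
        return -1;
    if (fvp->nlocks == TOY_MAX_LOCKS)
        return -1;
    fvp->locks[fvp->nlocks++] = want;
    return 0;
}
```

Because every lock lands on the bottom vnode in this model, a lock
taken through an upper layer conflicts with one taken directly on the
lower layer, and the stacking layer itself keeps no lock state.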

An FS which aggregates multiple vp's into a single vp will still need
to maintain alias coherency.  This is a much smaller problem; the
upper level code will assert the VOP_ADVLOCK against the alias vp,
and the VOP_ADVLOCK, instead of being a null "non-veto" of the
assert, will have to do the assert into the lower layers.  This will
generally be a non-problem.  There are currently no FS's which do
this, and the places where it *is* done are handled
as drivers (the vnconfig and ccd code), which is probably the
correct way to do it anyway.

The unionfs may still fail because of VOP_LOCK, depending on how
it is implemented this week.  If it's still using the lockmgr code,
it will definitely fail, because that code projects a three dimensional
geodesic into a two dimensional space.  I can explain how to fix
this, if you are interested.  Generally, allowing the lock to recurse
could make it run, but would leave a race condition in the case
where the projected image of the lock relationship could have come
from the shadow of more than one possible geodesic (make a triangle
out of straws and hold it up to a projection screen until you only
see a line and you will approximate the problem).

I have, at various times, posted the code to implement the second
fix to the -current mailing list; the code should be in the archives
(the VOP_ADVLOCK/veto code will be listed under "NFS Client locking").


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


