Date:      Fri, 18 Dec 1998 12:53:27 -0500 (EST)
From:      Erez Zadok <ezk@cs.columbia.edu>
To:        freebsd-fs@FreeBSD.ORG
Subject:   nullfs bugs
Message-ID:  <199812181753.MAA05461@shekel.mcl.cs.columbia.edu>

Hello all.  As this message is my first on this list, it unfortunately has
to be long.  My apologies in advance.  Before I go into details, I'll give a
quick overview.

* Brief overview:

My research involves stackable file systems.  I've written several stackable
file systems for a few unix platforms (freebsd, linux, and solaris).  I
fixed nullfs in freebsd 3.0, but the fixes are only workarounds to more
serious bugs.  I'm seeking help from this list in finding the real bugs in
freebsd and solving them correctly, to eventually include in an official
freebsd distribution.


Now on to the details.

* Introduction

My name is Erez Zadok, and I'm a PhD student at Columbia University,
studying Comp. Sci.  You may have heard my name as the maintainer of
am-utils (aka amd.)  I've worked with file systems for 9 years now.  I've
worked with freebsd kernels for 3+ years, but have only recently joined
freebsd-{fs,announce}.

My research involves generating stackable file systems out of a higher level
description language.  One key component is a template file system I call
wrapfs (wrapper file system).  Wrapfs includes hooks users can use to modify
file data, names, and their attributes.  Wrapfs is similar to lofs/nullfs,
but it also copies data/pages/names between the upper and lower layers,
includes hooks for a code generator, and more.

I started writing Wrapfs in Solaris 2.x, based on their lofs.  Then I moved
on to Linux 2.0 using a reference implementation of an lofs someone had
written.  After that I ported wrapfs to freebsd 3.0 using nullfs as a
starting point, and finally ported wrapfs to Linux 2.1.  Once I had wrapfs
for each platform, I wrote actual file systems using it.  I wrote a simple
encryption f/s called rot13fs, and then a stronger one called cryptfs (using
Blowfish.)  I wrote a few other file systems based on wrapfs, all of
which are described in a few papers I've written and the sources I've
released (see below for URLs).

* nullfs for FreeBSD 3.0

When I started with nullfs on freebsd 3.0 (the May 98 snapshot) I found out
that it was not a complete file system.  Some VFS operations were left
unimplemented, most notably the MMAP ones.  I could mount nullfs, but any
MMAP operation (such as executing a binary) would panic the kernel.

So I added the missing functionality to a point where you could do all
operations.  As a test I usually configure and build am-utils inside the new
f/s (those who've built am-utils know it has a rather lengthy configure and
build process, which makes it a good file system exerciser.)

** Bugs in Nullfs

I fixed two major bugs in nullfs:

(1) Asynchronous writes:

The vanilla nullfs has a serious bug where if you write a large file (3MB or
more) through it, several pages of the file are written as zeros to the
lower f/s.  I tried various machines running freebsd 3.0, and different
disks and CPU speeds.  In all cases I got the same data corruption.
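
A minimal user-space sketch of a check for the corruption described above,
assuming POSIX I/O and a 4096-byte page.  The function names, the CHUNK
constant, and the 0xA5 fill pattern are illustrative assumptions and are not
from the original test setup.  It writes a pattern with no zero bytes
through one path (e.g. a file under the nullfs mount) and reads it back
through another (e.g. the same file on the lower f/s), counting blocks that
come back all zero.  Whether the corruption shows up immediately may depend
on caching, so a remount before reading back may be needed.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 4096  /* assumed page size; a real test could use getpagesize() */

/* Return 1 if the block consists entirely of zero bytes, 0 otherwise. */
static int chunk_is_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/*
 * Write `total` bytes of a non-zero pattern to `write_path`, then read
 * `total` bytes from `read_path` and return how many CHUNK-sized blocks
 * came back all zero.  Returns -1 on any I/O error.
 * `total` must be a multiple of CHUNK.
 */
static long count_zero_chunks(const char *write_path, const char *read_path,
                              size_t total)
{
    unsigned char buf[CHUNK];
    long zero_chunks = 0;

    assert(total % CHUNK == 0);

    int fd = open(write_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    memset(buf, 0xA5, sizeof buf);      /* pattern contains no zero bytes */
    for (size_t done = 0; done < total; done += CHUNK) {
        if (write(fd, buf, CHUNK) != CHUNK) {
            close(fd);
            return -1;
        }
    }
    if (close(fd) != 0)
        return -1;

    fd = open(read_path, O_RDONLY);
    if (fd < 0)
        return -1;
    for (size_t done = 0; done < total; done += CHUNK) {
        if (read(fd, buf, CHUNK) != CHUNK) {
            close(fd);
            return -1;
        }
        if (chunk_is_zero(buf, CHUNK))
            zero_chunks++;
    }
    close(fd);
    return zero_chunks;
}
```

With hypothetical mount points, a call such as
count_zero_chunks("/mnt/null/big", "/lower/big", 1024 * CHUNK) writes 4MB
through the upper layer; any result above zero would indicate blocks of
zeros like those described above.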

The best "fix" I could find was to force the underlying write to happen
synchronously:

	error = VOP_WRITE(lower_vp, &temp_uio, (ioflag | IO_SYNC), cr);

That solved the problem, but obviously it hurts write performance since now
all writes through nullfs have to be done synchronously, even for writing
one byte.

My best guess for the reason for this bug is that there might be a race
condition b/t the file system and the buffer cache or even the MMU, and that
some sort of locking/synchronization is needed to avoid the race.

I'm familiar with the f/s code in freebsd, and have become very familiar
with the vfs/fs code in linux and solaris --- enough to know that this
freebsd bug is likely not the fault of my code.  Alas, there are vast areas
of the rest of the kernel I'm not familiar with.  I want to fix the bug
correctly if possible, and allow nullfs to write asynchronously, but I'm not
sure where to look.

If anyone has any ideas how to go about finding and fixing the bug, I'll be
happy to work w/ them to fix the problem and eventually submit it for
inclusion in a future freebsd release.


(2) Getpages/Putpages:

The second bug is even stranger.  Initially, I had the implementation of
getpages and putpages call the same VOP on lowervp, with newly allocated
pages.  But then under heavy loads I got obscure problems that seemed to
come from deep inside UFS.  ffs_getpages() (in ufs_readwrite.c) would
sometimes return an invalid page, or one that's marked as deadc0de.  I
tried to make sense of that ufs/ffs code, and I think that somewhere either
nullfs or the higher level vfs isn't locking or synchronizing something
it should be.

I "fixed" the problem with getpages, by implementing it using read(), so now
it works reliably, but with a suboptimal data access interface.

Having implemented getpages() using read() forced me to implement
putpages() using write(), b/c otherwise getpages and putpages didn't
seem to work well together (possibly b/c of interaction b/t [buffer] caches,
MMU, etc.)  But recall that in order to solve bug #1, I made write()
synchronous.  So now all putpages() have become synchronous as well.
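
A minimal user-space sketch of one way to exercise the paging path against
the read path, assuming POSIX mmap() and msync().  The function name, buffer
size, and return codes are illustrative assumptions, and this is not the
kernel-side fix described above.  It stores data through a shared mapping,
flushes it, and then reads the file back with read() to see whether the two
views agree.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Store `len` bytes of `fill` through a shared mapping of `path`, flush
 * with msync(), then read the file back with read() and compare.
 * Returns 0 if every byte matches, 1 on a mismatch, -1 on an I/O error.
 */
static int mmap_write_then_read(const char *path, size_t len,
                                unsigned char fill)
{
    assert(len > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {   /* size the file before mapping */
        close(fd);
        return -1;
    }

    unsigned char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memset(map, fill, len);                 /* dirty the mapped pages */
    int rc = msync(map, len, MS_SYNC);      /* push dirty pages to the file */
    munmap(map, len);
    if (rc != 0) {
        close(fd);
        return -1;
    }

    /* Read back through the ordinary read() interface and compare. */
    int result = 0;
    unsigned char buf[4096];
    size_t done = 0;
    while (done < len) {
        size_t want = len - done < sizeof buf ? len - done : sizeof buf;
        ssize_t got = read(fd, buf, want);
        if (got <= 0) {
            result = -1;
            break;
        }
        for (ssize_t i = 0; i < got; i++)
            if (buf[i] != fill)
                result = 1;
        done += (size_t)got;
    }
    close(fd);
    return result;
}
```

Run against a file under the stacked mount, a return value of 1 would
suggest that pages written through the mapping and data seen by read()
disagree, which is the kind of cache interaction suspected above.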

Like I said before, these fixes of mine are but workarounds.  Some might
consider them hacks.  But they do make nullfs fully functional at least.  If
anyone has any idea how to fix this MMAP related bug, please let me know.

Frankly, I have a feeling that the two bugs I'm reporting here may be
related, and that fixing bug #1 would be easier and may impact the solution
to bug #2.

* URLs

Here's some info for those who want to read more about the subject.

Stackable f/s software for freebsd, solaris, and linux:

	http://www.cs.columbia.edu/~ezk/research/software/

Papers I've written about some of the f/s in the s/w page:

	http://www.cs.columbia.edu/~ezk/research/wip.html

Thanks,
Erez Zadok.
---
Columbia University Department of Computer Science.
EMail: ezk@cs.columbia.edu           Web: http://www.cs.columbia.edu/~ezk

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message