Date:      Fri, 18 Dec 1998 20:19:42 -0800
From:      Don Lewis <Don.Lewis@tsc.tdk.com>
To:        Terry Lambert <tlambert@primenet.com>, ezk@cs.columbia.edu (Erez Zadok)
Cc:        freebsd-fs@FreeBSD.ORG
Subject:   Re: nullfs bugs
Message-ID:  <199812190419.UAA11408@salsa.gv.tsc.tdk.com>
In-Reply-To: Terry Lambert <tlambert@primenet.com> "Re: nullfs bugs" (Dec 18,  9:41pm)

On Dec 18,  9:41pm, Terry Lambert wrote:
} Subject: Re: nullfs bugs

} Right now in FreeBSD, a vnode is treated as a backing object, and a
} backing object is a mapping.
} 
} This is a consequence of a unified VM and buffer cache.

} > I "fixed" the problem with getpages, by implementing it using read(), so now
} > it works reliably, but with a suboptimal data access interface.
} > 
} > Having implemented getpages() using read() forced me to implement
} > putpages() using write(), b/c otherwise the getpages and putpages didn't
} > seem to work well together (possibly b/c of interaction b/t [buffer] caches,
} > MMU, etc.)  But recall that in order to solve bug #1, I made write()
} > synchronous.  So now all putpages() have become synchronous as well.
} > 
} > Like I said before, these fixes of mine are but workarounds.  Some might
} > consider them hacks.  But they do make nullfs fully functional at least.  If
} > anyone has any idea how to fix this MMAP related bug, please let me know.
} 
} These fixes will actually only work for a stack that is exactly one
} layer deep.  This is because the lower_vp is the object off of which
} the pages are actually hung.
} 
} If you were to use this on a nullfs on top of a nullfs, then you
} would probably see some errors (unless you implemented read in
} terms of VOP_GETPAGES).
} 
} The reason for this is that your read is creating a copy of the data
} that is hung off the lower_vp, and then returning it to a user buffer.

I did something similar when I was hacking nullfs to somewhat work
in a private version of 2.1.x.  It worked to some extent, but had
cache coherence problems.
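
To make the coherence problem concrete, here is a toy user-space model of
it.  This is only a sketch under my own assumptions, not the 2.1.x code:
toy_layer, toy_lower_write() and toy_upper_getpage() are made-up names,
and a "page" is just a small char array.  The upper layer fills its own
copy by reading from the lower layer, so a later write to the lower
layer leaves the upper copy stale.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE 16

/* Toy stand-in for one layer's cached page; NOT a kernel structure. */
struct toy_layer {
    char page[TOY_PAGE];
    int  valid;             /* nonzero once page[] holds data */
};

/* Write new contents directly into the lower layer's page. */
static void
toy_lower_write(struct toy_layer *lower, const char *s)
{
    assert(lower != NULL && s != NULL);
    strncpy(lower->page, s, TOY_PAGE - 1);
    lower->page[TOY_PAGE - 1] = '\0';
    lower->valid = 1;
}

/*
 * "getpages implemented with read": the upper layer copies the lower
 * page once and afterwards serves its own private copy.
 */
static const char *
toy_upper_getpage(struct toy_layer *upper, const struct toy_layer *lower)
{
    assert(upper != NULL && lower != NULL);
    if (!upper->valid) {
        memcpy(upper->page, lower->page, TOY_PAGE);
        upper->valid = 1;
    }
    return upper->page;
}
```

Nothing in this model tells the upper layer that the lower page changed,
so once the copy exists the two layers can disagree until something
invalidates the copy by hand.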

} The problem here is that the top layer is going to issue a similar
} read to the middle layer, and it's going to fail because there is
} no backing object in the middle layer (only in the bottom layer).
} 
} This can be brute-forced to work (I believe Tor Egge is the one who
} did this at one time?) by instancing a backing object in the intermediate
} layers.

Eivind has some patches that work something like this.


} The general solution to this, which has been discussed by John
} Heidemann, John Dyson, Michael Hancock, Eivind Ecklund, Kirk McKusick,
} and myself at various times in the past is to get rid of the aliases.
} 
} 
} The only way to effectively do that is to provide a mechanism for
} an upper layer to ask for the vp of the backing object that's
} actually backing the vm, instead of the top level object.  The
} main one that has been discussed is called VOP_GETFINALVP, or, more
} correctly, VOP_GETBACKINGVP.

I implemented one of these a while back (though I don't even recall
which name I used).  The problem I ran into was that there are
a number of references to vp->v_object scattered about.  Eivind's
patches fix those by turning them into a VOP_ (I would have used
a function call that called VOP_GETwhateverVP).
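
Roughly, the function-call variant I have in mind looks like the sketch
below.  It is a user-space toy, not code from my patches or Eivind's:
toy_vnode, toy_object, toy_getbackingvp() and toy_vnode_object() are all
hypothetical names, and the real thing would go through the VOP
mechanism rather than walking a pointer chain directly.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the FreeBSD kernel structures. */
struct toy_object {
    int id;
};

struct toy_vnode {
    struct toy_vnode  *lower;   /* next layer down, NULL at the bottom */
    struct toy_object *object;  /* VM object; set only on the bottom layer */
};

/*
 * Hypothetical analogue of VOP_GETBACKINGVP: walk down the stack to
 * the vnode that the pages are actually hung off.
 */
static struct toy_vnode *
toy_getbackingvp(struct toy_vnode *vp)
{
    assert(vp != NULL);
    while (vp->lower != NULL)
        vp = vp->lower;
    return vp;
}

/*
 * Wrapper to call in place of a direct vp->object reference, so every
 * layer sees the same backing object and no aliases are created.
 */
static struct toy_object *
toy_vnode_object(struct toy_vnode *vp)
{
    return toy_getbackingvp(vp)->object;
}
```

With a bottom vnode that owns an object and two null layers stacked on
it, toy_vnode_object() returns the same object pointer for all three
vnodes, and none of the upper layers needs an object of its own.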

I had some time to read a little more of Heidemann's paper while
I was travelling a few weeks ago, and it appears that Heidemann
took a somewhat different approach in his SunOS implementation.
It looks like he also passes the backing vp into the VOP calls
that need to access the backing object.  See Appendix B of his paper
<ftp://ftp.cs.ucla.edu/tech-report/95-reports/950032.2.ps.Z>.
I haven't had time to look at how this would fit into the FreeBSD
implementation.

} I'm going to be intentionally incommunicado for a while, as I'm going
} on vacation, but I'll probably break down and read my email once
} or twice, so if you have something needing immediate clarification,
} you can send me email, but I may not respond before the first of the
} year.
} 
} Other people to contact who appear to be actively interested in
} solving these issues are Eivind Ecklund and Michael Hancock, so
} they may be good bets as well.

You can add my name to the list as well.  I need at least a somewhat
working nullfs for certain applications.  I'll be away from my email
until the 4th, and then it will take me a few days to dig through the
backlog.
