From owner-freebsd-arch  Mon Apr 10 19:25: 3 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from ns1.yes.no (ns1.yes.no [195.204.136.10])
	by hub.freebsd.org (Postfix) with ESMTP id 57E4D37B52C
	for <freebsd-arch@freebsd.org>; Mon, 10 Apr 2000 19:24:51 -0700 (PDT)
	(envelope-from eivind@bitbox.follo.net)
Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218])
	by ns1.yes.no (8.9.3/8.9.3) with ESMTP id EAA06481
	for <freebsd-arch@freebsd.org>; Tue, 11 Apr 2000 04:24:51 +0200 (CEST)
Received: (from eivind@localhost)
	by bitbox.follo.net (8.8.8/8.8.6) id EAA06674
	for freebsd-arch@freebsd.org; Tue, 11 Apr 2000 04:24:48 +0200 (CEST)
Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131])
	by hub.freebsd.org (Postfix) with ESMTP id 362CD37B953
	for <arch@FreeBSD.ORG>; Mon, 10 Apr 2000 19:24:33 -0700 (PDT)
	(envelope-from tlambert@usr09.primenet.com)
Received: (from daemon@localhost)
	by smtp01.primenet.com (8.9.3/8.9.3) id TAA07939;
	Mon, 10 Apr 2000 19:24:20 -0700 (MST)
Received: from usr09.primenet.com(206.165.6.209)
 via SMTP by smtp01.primenet.com, id smtpdAAA7paqip; Mon Apr 10 19:23:49 2000
Received: (from tlambert@localhost)
	by usr09.primenet.com (8.8.5/8.8.5) id TAA03058;
	Mon, 10 Apr 2000 19:23:53 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200004110223.TAA03058@usr09.primenet.com>
Subject: Re: BUF/BIO roadmap.
To: phk@critter.freebsd.dk (Poul-Henning Kamp)
Date: Tue, 11 Apr 2000 02:23:52 +0000 (GMT)
Cc: julian@elischer.org (Julian Elischer), arch@freebsd.org
In-Reply-To: <25105.955393010@critter.freebsd.dk> from "Poul-Henning Kamp" at Apr 10, 2000 08:56:50 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> >The Other problem I faced was the possibility that
> >when a low level device was open, the user might re-write the
> >structures that defined some upper layer devices. My solution for 
> >that was that on the close() of the lower level device, all 
> >the upper level devices were asked to verify that they were 
> >still valid.
> 
> It is certainly a sticky issue no matter how you tackle it, and I
> am more than a little bit tempted to apply root-intelligence rather
> than code complexity to this problem.  A re-probe on close is
> probably the only sane, simplest and most POLA preserving action
> available.

My suggestion to Julian at the time, which he didn't like, but
which I did, and which I thought was significantly more elegant,
was to move the slice management into the kernel.

The kernel has to have some idea of these structures, and it is
the kernel that understands the hierarchical relationships in a
nested stack.

Because the kernel has to be able to read this to identify devices,
it is a simple matter to make it write the structures as well.

That is, if I want a DOS partition table or DOS extended partition
table, or a BSD disklabel, or an SVR4 VTOC, then I issue an ioctl()
to do the work.

The interface for this could be normalized, so that it could
query available partition management schemes (even loading KLDs
on demand to support new ones, if necessary).  The ioctl() would
have the same interface.  This would let you finally have a
single "fdisk" program that could grok all of the partition
management schemes supported by your system, and present them
rationally and uniformly, in one interface, without needing to
link the disklabel code with the fdisk code with the ... code.
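
A minimal sketch of what such a normalized interface might look like,
written as a userland C mock-up rather than real kernel code.  Every
name here (part_request, part_cmd, part_req, the scheme table) is
hypothetical and stands in for an ioctl() that does not exist in this
form; the slot counts are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 16

/* One table entry; size == 0 marks a free slot. */
struct slice_entry {
	unsigned long start;	/* first sector */
	unsigned long size;	/* length in sectors */
};

/* Hypothetical description each scheme would register with the kernel. */
struct part_scheme {
	const char *name;	/* e.g. "mbr", "bsdlabel" */
	int nslots;		/* slots the on-disk format offers */
};

/* Hypothetical request codes, standing in for ioctl() commands. */
enum part_cmd { PART_QUERY, PART_ADD, PART_DELETE };

struct part_req {
	const char *scheme;		/* which scheme to address */
	int slot;			/* slot index for ADD/DELETE */
	struct slice_entry entry;	/* payload for ADD */
	int nslots_out;			/* filled in by QUERY */
};

static const struct part_scheme schemes[] = {
	{ "mbr", 4 },
	{ "bsdlabel", 8 },
};

static const struct part_scheme *
scheme_lookup(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(schemes) / sizeof(schemes[0]); i++)
		if (strcmp(schemes[i].name, name) == 0)
			return (&schemes[i]);
	return (NULL);	/* a real kernel might try to load a KLD here */
}

/* Single entry point, same for every scheme.  Returns 0 or -1. */
static int
part_request(struct slice_entry *tbl, enum part_cmd cmd, struct part_req *req)
{
	const struct part_scheme *ps = scheme_lookup(req->scheme);

	if (ps == NULL)
		return (-1);
	assert(ps->nslots <= MAX_SLOTS);
	switch (cmd) {
	case PART_QUERY:
		req->nslots_out = ps->nslots;
		return (0);
	case PART_ADD:
		if (req->slot < 0 || req->slot >= ps->nslots ||
		    req->entry.size == 0 || tbl[req->slot].size != 0)
			return (-1);
		tbl[req->slot] = req->entry;
		return (0);
	case PART_DELETE:
		if (req->slot < 0 || req->slot >= ps->nslots)
			return (-1);
		tbl[req->slot].size = 0;
		return (0);
	}
	return (-1);
}
```

A single "fdisk" would then only ever build a part_req and call the
one entry point; it never needs to know the on-disk layout of any
particular scheme.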


This allows the kernel to intermediate the corner cases, which
neither a user space write nor hierarchical locking can handle
successfully, because neither can enforce the constraints.

Consider that I may have unallocated disk space, and I may have
a partition management scheme with a free slot available; why
should I have to take the entire stack down, in order to write a
new DOS partition table entry that has no negative impact on
the functioning of the system?
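
To make the free-slot case concrete, here is a rough sketch of the
check a kernel-side mediator could apply, again as a userland C
mock-up.  The struct and function names (slice, slice_add_live,
slice_delete_live) and the open_count field are assumptions made for
illustration, not existing FreeBSD interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-kernel view of one table slot. */
struct slice {
	unsigned long start;	/* first sector */
	unsigned long size;	/* length in sectors, 0 = free slot */
	int open_count;		/* nonzero while the slice is in use */
};

/*
 * Add an entry while the stack stays up.  The request is accepted
 * only if it fits on the disk, overlaps no existing entry, and a
 * free slot exists.  Existing entries are never modified.
 * Returns the slot index used, or -1 on refusal.
 */
static int
slice_add_live(struct slice *tbl, int nslots, unsigned long disk_size,
    unsigned long start, unsigned long size)
{
	int i, free_slot = -1;

	assert(tbl != NULL && nslots > 0);
	if (size == 0 || start >= disk_size || size > disk_size - start)
		return (-1);
	for (i = 0; i < nslots; i++) {
		if (tbl[i].size == 0) {
			if (free_slot == -1)
				free_slot = i;
			continue;
		}
		/* Refuse anything that overlaps an allocated entry. */
		if (start < tbl[i].start + tbl[i].size &&
		    tbl[i].start < start + size)
			return (-1);
	}
	if (free_slot == -1)
		return (-1);
	tbl[free_slot].start = start;
	tbl[free_slot].size = size;
	tbl[free_slot].open_count = 0;
	return (free_slot);
}

/* Deleting is the dangerous direction: refuse while the slice is open. */
static int
slice_delete_live(struct slice *tbl, int nslots, int slot)
{
	if (slot < 0 || slot >= nslots || tbl[slot].size == 0 ||
	    tbl[slot].open_count > 0)
		return (-1);
	tbl[slot].size = 0;
	return (0);
}
```

The point of the sketch is that the enforcement lives in one place:
adding to unallocated space is allowed with the stack live, while
removing an open slice is refused.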


I view this as an issue for Vinum plexes, as well as automatic
allocation of PP's in something like an IBM JFS (indeed, if you
were to deal with the cylinder group fill issue, you could well
imagine that you could have an aggregation device that could
handle FFS auto-growth, 4M of disk at a time, from a common
pool shared between several filesystems).

Given a node-locked hierarchy, as Julian has put forward, I
really can't allocate more disk space, unless I can write the
table for the aggregation device, and I can't do that if the
aggregation device is in use (i.e., the single most useful time
for me to want to do the job).


> >Justin Gibbs suggested that this call should also allow the driver 
> >to know WHO is making the call, 
> 
> Yeah, but unfortunately we lose that information long before we
> get to that point, from memory I believe it was VOP_OPEN which discards
> all but the credentials.  The dup(2) problem in other words.

I think Julian was thinking of the credentials, not the proc struct.

The specfs glue code really needs to be shot.  It's not really
rational to keep "struct fileops", going forward.  There was
never sufficient integration of the VFS code, back at the time
that the VFS was brought in; the specfs and socket warts are cases
in point.  This would satisfy both of your criteria, I think, as
well as just finally cleaning out from under that particular rug.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

