From owner-freebsd-geom@FreeBSD.ORG  Tue Feb  6 19:16:56 2007
Return-Path: <owner-freebsd-geom@FreeBSD.ORG>
X-Original-To: geom@FreeBSD.org
Delivered-To: freebsd-geom@FreeBSD.ORG
In-Reply-To: <94029.1170784819@critter.freebsd.dk>
References: <94029.1170784819@critter.freebsd.dk>
Mime-Version: 1.0 (Apple Message framework v752.3)
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Message-Id: <F2D457F7-46C3-4A81-9147-7F6FA6E01374@mac.com>
Content-Transfer-Encoding: 7bit
From: Marcel Moolenaar <xcllnt@mac.com>
Date: Tue, 6 Feb 2007 11:15:06 -0800
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
X-Mailer: Apple Mail (2.752.3)
X-Brightmail-Tracker: AAAAAA==
X-Brightmail-scanned: yes
Cc: geom@FreeBSD.org
Subject: Re: New g_part class
X-BeenThere: freebsd-geom@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: GEOM-specific discussions and implementations
	<freebsd-geom.freebsd.org>
X-List-Received-Date: Tue, 06 Feb 2007 19:16:56 -0000


On Feb 6, 2007, at 10:00 AM, Poul-Henning Kamp wrote:

> In message <835A2C66-BBEB-4A19-B6A3-A60E17572604@mac.com>, Marcel
> Moolenaar writes:
>
>> Editing is of course still done in user space. The application
>> is responsible for providing the right values to the kernel and
>> the kernel will simply fail the operation if something is not
>> correct.
>>
>> The reason why the kernel supports the application with these
>> verbs is simple: the kernel needs to be involved because the
>> application cannot write to disk directly in all cases.
>
> Right, but the current scheme handles this by asking the kernel to
> write the finished metadata image, which the kernel's taste
> functionality can be used to validate and parse.

This is in effect a replacement-oriented approach that is
based on retasting. One cannot change the partition table
by adding a new partition when an existing partition is
already mounted without circumventing permissions and other
checks, right?

What if I want to replace an MBR with a GPT without actually
changing the meta-data? This doesn't work when each partition
scheme has its own image-oriented verbs, but it is supported
by g_part.

With g_part the least important and most discriminating aspect
of partitioning is abstracted: the on-disk format for storing
the meta-data. This, I believe, is the right approach.
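
A minimal sketch in C of what that abstraction could look
like. This is not g_part code: struct scheme, struct table and
table_set_scheme() are hypothetical names, and the limits are
the usual ones for 512-byte sectors (4 entries and 1 reserved
sector for an MBR; 128 entries, 34 sectors at the start and 33
at the end for a GPT). The partitions are kept in a
format-neutral table, and changing the scheme only checks that
they still fit:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAXPARTS 128

/* A partition as the tool sees it: an inclusive sector range. */
struct part {
	uint64_t start, end;
};

/* Hypothetical scheme description: only format-specific limits. */
struct scheme {
	const char *name;
	size_t max_entries;
	uint64_t head;	/* sectors reserved at the start of the disk */
	uint64_t tail;	/* sectors reserved at the end of the disk */
};

static const struct scheme scheme_mbr = { "MBR", 4, 1, 0 };
static const struct scheme scheme_gpt = { "GPT", 128, 34, 33 };

/* Format-neutral partition table (assumed layout). */
struct table {
	const struct scheme *scheme;
	uint64_t sectors;	/* size of the disk in sectors */
	size_t nparts;
	struct part parts[MAXPARTS];
};

/* Switch schemes, keeping every partition exactly where it is. */
static int
table_set_scheme(struct table *t, const struct scheme *s)
{
	uint64_t last;
	size_t i;

	if (t->nparts > s->max_entries || s->head + s->tail >= t->sectors)
		return (ENOSPC);
	last = t->sectors - s->tail - 1;	/* last usable sector */
	for (i = 0; i < t->nparts; i++)
		if (t->parts[i].start < s->head || t->parts[i].end > last)
			return (ENOSPC);
	t->scheme = s;
	return (0);
}
```

In this sketch a 1000-sector disk with partitions at 63-499
and 500-966 can go from scheme_mbr to scheme_gpt untouched,
while a partition starting at sector 1 makes the same call
fail with ENOSPC and leaves the table as it was.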

> That way, the kernel doesn't have very rarely used code for
> editing the metadata, only the necessary code to parse and
> configure based on the on-disk metadata.

The ctlreq functions will indeed be rarely used. However,
the ctlreq functions don't actually have to be present in
the kernel to make g_part functional for dealing with
partitions. If space is a concern, then it should be possible
to put the ctlreq functions in a separate module. I don't
see this as a problem so I don't give it any attention.

>> Libdisk is badly designed (if at all) and badly implemented [...]
>
> No argument there, and that's from the guy who slapped it together
> between changing diapers :-)

:-)

It's worth noting that the introduction of GEOM brought along
some additional, and unsexy, work that had to be finished in
time for the next release. I recall that sysinstall and libdisk
were among the last components that had to be made to work with
GEOM and that time was of the essence. It's a commendable
achievement that you delivered.

That said: libdisk is now at the root of various forms of evil,
including sysinstall(8) and its deadbeat offspring sade(8).
Something needs to be done, and done right, if we want to stop
this madness...

>> It's the application that should exhibit artificial intelligence
>> (if at all), not the kernel.
>
> So what is the advantage of editing the metadata in the kernel
> instead of userland ?

Abstraction. Userland does not know or care what the on-disk
meta-data looks like. It performs elementary operations that
every partitioning scheme supports (modulo "extensions") and
since the kernel needs to be involved anyway, it makes sense
to have it involved at the basic level to simplify checks
and to increase flexibility.

Giving the kernel an image of what it needs to write to disk
leaves a big gap between the current state and the new state
and checking whether the new state is at all possible becomes
very hard if not impossible in some cases. But when the kernel is
aware of every step on the way from the old to the new state,
it can check if each step is possible and as such will know
that the new state is valid when asked to write to disk.
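
The difference can be sketched in C as follows. This is an
illustration only, not g_part code: the struct layout, the
function names and the choice of errno values are all
assumptions. Each verb is validated against the in-memory
table when it is issued, so by the time of the commit there is
nothing left to reject:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define NENTRIES 4	/* hypothetical table size */

struct entry {
	uint64_t start, end;	/* inclusive sector range */
	int used;		/* slot holds a partition */
	int busy;		/* partition is open, e.g. mounted */
};

struct table {
	uint64_t first, last;	/* usable sector range of the disk */
	struct entry e[NENTRIES];
	int dirty;		/* in-memory state differs from disk */
};

static void
table_init(struct table *t, uint64_t first, uint64_t last)
{
	memset(t, 0, sizeof(*t));
	t->first = first;
	t->last = last;
}

/* "add" verb: rejected at once if the new state would be invalid. */
static int
table_add(struct table *t, uint64_t start, uint64_t end)
{
	int i, slot = -1;

	if (start > end || start < t->first || end > t->last)
		return (EINVAL);
	for (i = 0; i < NENTRIES; i++) {
		if (!t->e[i].used) {
			if (slot < 0)
				slot = i;
			continue;
		}
		if (start <= t->e[i].end && end >= t->e[i].start)
			return (EEXIST);	/* overlaps entry i */
	}
	if (slot < 0)
		return (ENOSPC);
	t->e[slot] = (struct entry){ start, end, 1, 0 };
	t->dirty = 1;
	return (0);
}

/* "delete" verb: only the entry being removed must be idle. */
static int
table_delete(struct table *t, int idx)
{
	if (idx < 0 || idx >= NENTRIES || !t->e[idx].used)
		return (ENOENT);
	if (t->e[idx].busy)
		return (EBUSY);
	t->e[idx].used = 0;
	t->dirty = 1;
	return (0);
}

/* "commit" verb: every step was checked, so just write it out. */
static int
table_commit(struct table *t)
{
	t->dirty = 0;	/* a real implementation would write here */
	return (0);
}
```

Note that in this sketch a busy entry blocks only its own
deletion. Adding a partition next to a mounted one still
succeeds, which is the case a replace-and-retaste approach
has trouble with.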

Error reporting to the user will also be improved. With an
image approach the kernel has a single errno with which to
report everything that could be wrong with the image. That
can only be improved if the kernel provides better error
reporting.
The kernel can return error strings, but that's fundamentally
the wrong thing to do, because that would mean that you need
to add i18n or l10n to the kernel.

When the kernel is involved in each step and checks each
step, the user gets direct feedback on each action, can
better understand what went wrong, and can therefore take
appropriate action.
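
A rough sketch in C of how a userland tool could turn a
per-step errno into a message. The verb names, the meaning
given to each errno and the message texts are assumptions made
for illustration; nothing here is an existing interface.
Because the tool knows which verb failed, a bare errno is
enough, and the strings (and any translation of them) stay
out of the kernel:

```c
#include <errno.h>
#include <string.h>

/* Hypothetical verbs a partitioning tool might issue. */
enum verb { VERB_ADD, VERB_DELETE, VERB_COMMIT };

/* Map (verb, errno) to a message; the text lives in userland. */
static const char *
explain(enum verb v, int error)
{
	switch (v) {
	case VERB_ADD:
		if (error == EEXIST)
			return ("the new partition overlaps an existing one");
		if (error == ENOSPC)
			return ("the partition table has no free entry");
		if (error == EINVAL)
			return ("the partition lies outside the usable area");
		break;
	case VERB_DELETE:
		if (error == EBUSY)
			return ("the partition is in use and cannot be deleted");
		if (error == ENOENT)
			return ("there is no such partition");
		break;
	case VERB_COMMIT:
		break;
	}
	/* Anything else falls back to the generic system message. */
	return (strerror(error));
}
```

The same EBUSY coming back from a single "write this image"
request would not tell the tool which of the changes in the
image was refused.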

> If you could have written a generic partitioning tool that didn't
> know about the different formats, then I could see the point,
> but having to have the code both in userland and in the kernel
> makes little to no sense to me, in particular given how seldom
> it is used.

That's the point. A single partitioning tool will be written
and it will not know about the on-disk format of the meta-data.

> The problem with BSD labels is that you need to intercept writes
> to the metadata part if one of the partitions allows this to
> happen.

I personally don't worry about that. If the meta-data is within
a partition, then the user (e.g. file system) of that partition
needs to be aware of that anyway. The responsibility of keeping
the BSD label intact is automatically delegated to the user of
that partition and cannot in general be enforced anywhere else.
The separation between meta-data and data is gone so the checks
can logically only happen by the owner of the data, not the owner
of the meta-data. This is especially true when the file system
puts its own meta-data in the sector that contains meta-data
for partitioning. Think fsize, bsize and bps/cpg...

-- 
Marcel Moolenaar
xcllnt@mac.com