In-Reply-To: <89489.1170747375@critter.freebsd.dk>
References: <89489.1170747375@critter.freebsd.dk>
Message-Id: <835A2C66-BBEB-4A19-B6A3-A60E17572604@mac.com>
From: Marcel Moolenaar <xcllnt@mac.com>
Date: Tue, 6 Feb 2007 09:38:13 -0800
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
Cc: geom@FreeBSD.org
Subject: Re: New g_part class
List-Id: GEOM-specific discussions and implementations
	<freebsd-geom.freebsd.org>


On Feb 5, 2007, at 11:36 PM, Poul-Henning Kamp wrote:

> Considering the fact that editing can be done equally well in
> userland, what is the rationale or benefit of putting the code into
> the kernel, to deal with very infrequent operations to change the
> disk-layout ?

Editing is of course still done in user space. The application
is responsible for providing the right values to the kernel and
the kernel will simply fail the operation if something is not
correct.

The reason why the kernel supports the application with these
verbs is simple: the kernel needs to be involved because the
application cannot write to disk directly in all cases. With
the kernel involved, we can have as many ad-hoc verbs as there
are partitioning schemes or we can have a single partitioning
GEOM capable of handling various on-disk schemes. Since every
partitioning scheme has the same fundamental purpose, a single
GEOM maximizes code-reuse and allows us to have a single tool
to handle all known schemes. The latter is already a need:
sysinstall.

> My second concern is if we might still have to replicate all the
> error detection in userland, if we want to retain the option for
> atomic changes, ie: allowing users to specify a set of changes (with
> disklabel -e for instance) before committing them all ?

The verbs change the in-memory data only. A commit is needed to
write the data to disk. An undo verb exists to revert the
in-memory data to match what's on disk. This not only allows complex
operations to be written to disk in an atomic fashion, but also
supports applications like sysinstall, where everything is
prepared up-front and disks are being written after the user
gives the final go-ahead.

The added complexity to support this is minimal. The benefits
are numerous. Atomicity is one of them.
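
A minimal sketch of this staged-edit model, in Python for brevity.
The class and method names are hypothetical and are not the g_part
interface; the lists merely stand in for partition table metadata:

```python
class StagedTable:
    """Toy model: verbs edit an in-memory copy; commit writes it out."""

    def __init__(self, on_disk):
        self.on_disk = list(on_disk)    # stands in for the on-disk metadata
        self.in_memory = list(on_disk)  # the copy the verbs operate on

    def add(self, entry):
        self.in_memory.append(entry)    # changes the in-memory data only

    def delete(self, entry):
        self.in_memory.remove(entry)    # raises ValueError if absent

    def commit(self):
        # All pending changes reach "disk" in one step.
        self.on_disk = list(self.in_memory)

    def undo(self):
        # Revert the in-memory data to match what is on disk.
        self.in_memory = list(self.on_disk)


table = StagedTable(["p1"])
table.add("p2")
table.delete("p1")
assert table.on_disk == ["p1"]      # nothing has been written yet
table.commit()
assert table.on_disk == ["p2"]      # both changes land together
```

An installer in the style of sysinstall would issue all of its verbs
first and call commit only after the user's final go-ahead.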

> Third, I doubt this will prove as useful as expected in writing
> partitioning tools.  For instance, how will the partitioning tool
> know about the geometry/alignment restrictions of MBR ?

A simple query is all that it takes. The application does not
have to know about geometry, only about partition alignment.
The GEOM can provide this to the application at runtime, based
on the geometry of the disk.

> If you study libdisk, you will find that there are a couple of DWIW
> functions that translate the user's wish for a NN MB size thing into
> a properly aligned and sized thing for the MBR.  Where does that
> functionality live in this situation ?

Libdisk is badly designed (if at all) and badly implemented, but
the DWIM/DWIW functionality is in the right place in libdisk.
It's the application that should exhibit artificial intelligence
(if at all), not the kernel.

>   Does the kernel return "no
> good, try these parameters instead ?" or does it silently truncate
> and align ?

I think the kernel will return an error. There's no use case for
this yet because APM and GPT don't have this restriction.
Obviously, the MBR partitioning scheme will have to enforce it.
It can either return an error or round the start up and the size
down to make it all aligned. I favor returning an error.
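
To make the two options concrete, here is a small sketch in Python.
The function names are hypothetical, values are in sectors, and the
alignment of 63 is only an example value, not something g_part
defines:

```python
def check_aligned(start, size, align):
    """Strict option: reject a request that is not aligned."""
    if start % align or size % align:
        raise ValueError("start and size must be multiples of %d" % align)
    return start, size


def round_to_alignment(start, size, align):
    """Lenient option: round the start up and the size down."""
    end = start + size
    new_start = -(-start // align) * align          # ceiling to a multiple
    # Largest aligned size that still fits inside the requested range.
    new_size = max(0, (end - new_start) // align * align)
    return new_start, new_size


# A request for sectors 100..1099 with an alignment of 63 sectors:
assert round_to_alignment(100, 1000, 63) == (126, 945)
```

The lenient variant silently hands back a smaller partition than was
asked for, which is the behaviour an application may not expect.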

> So I would advocate that you try to implement the MBR method next
> and then do a prototype disk-editor utility, so we can see if this
> actually makes life easier or not.

I will write an application first. There's no partitioning tool
for PowerPC and I have had a PR open for a while now to rewrite
gpt(8) to use ctl requests. That drove me to implement g_part in
the first place.

>> schemes like MBR, BSD, SUN and/or PC98.
>
> BSD labels represent a particularly nasty case, because of the
> possibility that the label sector is inside a partition.  I will
> advocate that if we go this direction, we should not migrate BSD,
> but leave it behind to die, eventually.

I wouldn't mind if BSD labels die. At this time the g_part class
already supports the notion of leaf partitioning schemes, of
which BSD will be one. Leaf schemes cannot have sub-partitions.
This would be enough to prevent infinite nesting. That the
metadata can be within the first partition is not a problem for
the g_part class proper. It doesn't know what the on-disk
metadata looks like. It only knows the beginning and end of usable
disk space that can be partitioned and it doesn't care if the
beginning is at offset 0.
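
As an illustration of the leaf rule, a toy model in Python. The
Table class and its leaf flag are hypothetical and say nothing
about how g_part represents schemes internally:

```python
class Table:
    """Toy partition table that may be nested inside a parent table."""

    def __init__(self, scheme, leaf=False, parent=None):
        # A leaf scheme cannot have sub-partitions, so nothing may be
        # created underneath it. This is what bounds the nesting depth.
        if parent is not None and parent.leaf:
            raise ValueError(
                "%s is a leaf scheme; cannot nest %s in it"
                % (parent.scheme, scheme))
        self.scheme = scheme
        self.leaf = leaf
        self.parent = parent


mbr = Table("MBR")
bsd = Table("BSD", leaf=True, parent=mbr)   # BSD inside MBR is allowed

try:
    Table("MBR", parent=bsd)                # nothing may nest under BSD
except ValueError as err:
    print(err)
```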

-- 
Marcel Moolenaar
xcllnt@mac.com