From: Marcel Moolenaar
Date: Tue, 6 Feb 2007 11:15:06 -0800
To: Poul-Henning Kamp
Cc: geom@FreeBSD.org
Subject: Re: New g_part class
In-Reply-To: <94029.1170784819@critter.freebsd.dk>

On Feb 6, 2007, at 10:00 AM, Poul-Henning Kamp wrote:

> In message <835A2C66-BBEB-4A19-B6A3-A60E17572604@mac.com>, Marcel
> Moolenaar writes:
>
>> Editing is of course still done in user space.
>> The application is responsible for providing the right values
>> to the kernel and the kernel will simply fail the operation if
>> something is not correct.
>>
>> The reason why the kernel supports the application with these
>> verbs is simple: the kernel needs to be involved because the
>> application cannot write to disk directly in all cases.
>
> Right, but the current scheme handles this by asking the kernel to
> write the finished metadata image, which the kernel's taste
> functionality can be used to validate and parse.

This is in effect a replacement-oriented approach that is based on
retasting. One cannot change the partition table by adding a new
partition when an existing partition is already mounted without
circumventing permissions and other checks, right?

What if I want to replace an MBR with a GPT without actually
changing the meta-data? This doesn't work when each partitioning
scheme has its own image-oriented verbs, but it is supported by
g_part. With g_part, the least important and most discriminating
aspect of partitioning is abstracted: the on-disk format for
storing the meta-data. This, I believe, is the right approach.

> That way, the kernel doesn't have very rarely used code for
> editing the metadata, only the necessary code to parse and
> configure based on the on-disk metadata.

The ctlreq functions will indeed be rarely used. However, they
don't actually have to be present in the kernel to make g_part
functional for dealing with partitions. If space is a concern, it
should be possible to put the ctlreq functions in a separate
module. I don't see this as a problem, so I haven't given it any
attention.

>> Libdisk is badly designed (if at all) and badly implemented [...]
>
> No argument there, and that's from the guy who slapped it together
> between changing diapers :-)

:-)

It's worth noting that the introduction of GEOM brought along some
additional, and unsexy, work that had to be finished in time for
the next release.
I recall that sysinstall and libdisk were among the last components
that had to be made to work with GEOM and that time was of the
essence. It's a commendable achievement that you delivered.

That said: libdisk is now at the root of various forms of evil,
including sysinstall(8) and its deadbeat offspring sade(8).
Something needs to be done, and done right, if we want to stop this
madness...

>> It's the application that should exhibit artificial intelligence
>> (if at all), not the kernel.
>
> So what is the advantage of editing the metadata in the kernel
> instead of userland?

Abstraction. Userland does not know or care what the on-disk
meta-data looks like. It performs elementary operations that every
partitioning scheme supports (modulo "extensions"), and since the
kernel needs to be involved anyway, it makes sense to have it
involved at the basic level to simplify checks and to increase
flexibility.

Giving the kernel an image of what it needs to write to disk leaves
a big gap between the current state and the new state, and checking
whether the new state is at all possible becomes very hard, if not
impossible, in some cases. But when the kernel is aware of every
step on the way from the old to the new state, it can check whether
each step is possible, and as such will know that the new state is
valid when asked to write it to disk.

Error reporting to the user will also be improved. With an image
approach, the kernel has only a single errno with which to report a
failure, and that distinguishes very few error conditions. That can
only be improved if the kernel provides better error reporting. The
kernel could return error strings, but that's fundamentally the
wrong thing to do, because it would mean adding i18n or l10n to the
kernel. When the kernel is involved in and checks each step, the
user gets direct feedback on each action, can better understand what
went wrong, and will therefore be able to take appropriate action.
> If you could have written a generic partitioning tool that didn't
> know about the different formats, then I could see the point,
> but having to have the code both in userland and in the kernel
> makes little or no sense to me, in particular given how seldom
> it is used.

That's the point. A single partitioning tool will be written, and
it will not know about the on-disk format of the meta-data.

> The problem with BSD labels is that you need to intercept writes
> to the metadata part if one of the partitions allows this to
> happen.

I personally don't worry about that. If the meta-data is within a
partition, then the user (e.g. file system) of that partition needs
to be aware of that anyway. The responsibility of keeping the BSD
label intact is automatically delegated to the user of that
partition and cannot in general be enforced anywhere else. The
separation between meta-data and data is gone, so the checks can
logically only be performed by the owner of the data, not the owner
of the meta-data. This is especially true when the file system puts
its own meta-data in the sector that contains the meta-data for
partitioning. Think fsize, bsize and bps/cpg...

-- 
Marcel Moolenaar
xcllnt@mac.com