From: Terry Lambert
Subject: Re: BUF/BIO roadmap.
To: phk@critter.freebsd.dk (Poul-Henning Kamp)
Date: Tue, 11 Apr 2000 02:23:52 +0000 (GMT)
Cc: julian@elischer.org (Julian Elischer), arch@freebsd.org
In-Reply-To: <25105.955393010@critter.freebsd.dk> from "Poul-Henning Kamp" at Apr 10, 2000 08:56:50 PM

> >The Other problem I faced was the possibility that
> >when a low level device was open, the user might re-write the
> >structures that defined some upper layer devices.
> >My solution for
> >that was that on the close() of the lower level device, all
> >the upper level devices were asked to verify that they were
> >still valid.
>
> It is certainly a sticky issue no matter how you tackle it, and I
> am more than a little bit tempted to apply root-intelligence rather
> than code complexity to this problem.  A re-probe on close is
> probably the only sane, simplest and most POLA preserving action
> available.

My suggestion to Julian at the time, which he didn't like, but which
I did, and which I thought was significantly more elegant, was to
move the slice management into the kernel.

The kernel has to have some idea of these structures, and it is the
kernel that understands the hierarchical relationships in a nested
stack.  Because the kernel has to be able to read this to identify
devices, it is a simple matter to make it write the structures as
well.

That is, if I want a DOS partition table or DOS extended partition
table, or a BSD disklabel, or an SVR4 VTOC, then I issue an ioctl()
to do the work.  The interface for this could be normalized, so that
it could query available partition management schemes (even loading
KLDs on demand to support new ones, if necessary).  The ioctl()
would have the same interface for every scheme.

This would let you finally have a single "fdisk" program that could
grok all of the partitioning management schemes supported by your
system, and present them rationally and uniformly, in one interface,
without needing to link the disklabel code with the fdisk code with
the ... code.

This allows the kernel to intermediate the corner cases, which a
user space write or hierarchical locking cannot successfully allow,
for enforcement reasons.

Consider that I may have unallocated disk space, and I may have a
partition management scheme with a free slot available; why should
I have to take the entire stack down, in order to write a new DOS
partition table entry that has no negative impact on the functioning
of the system?
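To make the shape of that request concrete, here is a rough user-space
sketch of the arbitration I have in mind.  Every name in it
(slice_table, slice_req, slice_request, SLICE_ADD, and so on) is
hypothetical; nothing here is an existing FreeBSD interface, and a
real version would sit behind an ioctl() and dispatch to a per-scheme
module.  It only shows the policy: adding an entry to a free slot
succeeds while other slices are busy, and removing a busy slice is
refused.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not an existing kernel API. */
#define SLICE_MAX_SLOTS 8

struct slice_entry {
	uint64_t start;		/* first block of the slice */
	uint64_t length;	/* size in blocks; 0 marks a free slot */
	int      in_use;	/* nonzero while an upper layer holds it open */
};

struct slice_table {
	const char        *scheme;	/* e.g. "mbr" or "bsdlabel" */
	size_t             nslots;	/* slots this scheme provides */
	uint64_t           media_blocks;	/* size of underlying device */
	struct slice_entry slot[SLICE_MAX_SLOTS];
};

enum slice_cmd { SLICE_ADD, SLICE_DELETE };

/* One request layout for every scheme, as an ioctl() argument would be. */
struct slice_req {
	enum slice_cmd cmd;
	size_t         index;	/* out for SLICE_ADD, in for SLICE_DELETE */
	uint64_t       start;
	uint64_t       length;
};

static int
slice_overlaps(const struct slice_entry *e, uint64_t start, uint64_t length)
{
	return (e->length != 0 &&
	    start < e->start + e->length && e->start < start + length);
}

/* Returns 0 or a positive errno value, kernel style. */
int
slice_request(struct slice_table *t, struct slice_req *r)
{
	size_t i, free_slot;

	assert(t != NULL && r != NULL && t->nslots <= SLICE_MAX_SLOTS);

	switch (r->cmd) {
	case SLICE_ADD:
		/* Reject empty or out-of-range extents without overflowing. */
		if (r->length == 0 || r->start > t->media_blocks ||
		    r->length > t->media_blocks - r->start)
			return (EINVAL);
		free_slot = t->nslots;
		for (i = 0; i < t->nslots; i++) {
			if (slice_overlaps(&t->slot[i], r->start, r->length))
				return (EINVAL);
			if (t->slot[i].length == 0 && free_slot == t->nslots)
				free_slot = i;
		}
		if (free_slot == t->nslots)
			return (ENOSPC);
		/* Busy siblings are untouched, so nothing has to be closed. */
		t->slot[free_slot].start = r->start;
		t->slot[free_slot].length = r->length;
		t->slot[free_slot].in_use = 0;
		r->index = free_slot;
		return (0);
	case SLICE_DELETE:
		if (r->index >= t->nslots || t->slot[r->index].length == 0)
			return (ENOENT);
		if (t->slot[r->index].in_use)
			return (EBUSY);	/* this is the enforcement point */
		memset(&t->slot[r->index], 0, sizeof(t->slot[r->index]));
		return (0);
	}
	return (EINVAL);
}
```

A single "fdisk" would fill in a slice_req and hand it down without
caring which scheme owns the table; only the code behind the request
needs to know the on-disk format.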
I view this as an issue for Vinum plexes, as well as automatic
allocation of PP's in something like an IBM JFS (indeed, if you
were to deal with the cylinder group fill issue, you could well
imagine that you could have an aggregation device that could handle
FFS auto-growth, 4M of disk at a time, from a common pool shared
between several filesystems).

Given a node-locked hierarchy, as Julian has put forward, I really
can't allocate more disk space, unless I can write the table for
the aggregation device, and I can't do that if the aggregation
device is in use (i.e., the single most useful time for me to want
to do the job).

> >Justin Gibbs suggested that this call should also allow the driver
> >to know WHO is making the call,
>
> Yeah, but unfortunately we lose that information long before we
> get to that point, from memory I believe it was VOP_OPEN which discards
> all but the credentials.  The dup(2) problem in other words.

I think Julian was thinking of the credentials, not the proc struct.

The specfs glue code really needs to be shot.  It's not really
rational to keep "struct fileops", going forward.  There was never
sufficient integration of the VFS code, back at the time that the
VFS was brought in; the specfs and socket warts are cases in point.

This would satisfy both of your criteria, I think, as well as just
finally cleaning out from under that particular rug.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.