Date:      Sun, 7 Dec 1997 21:46:34 -0800 (PST)
From:      Julian Elischer <julian@whistle.com>
To:        hackers@freebsd.org
Cc:        Julian Elischer <julian@whistle.com>, mckusick@mckusick.com
Subject:   [hackers:] Architectural advice needed
Message-ID:  <Pine.BSF.3.95.971207202900.20555A-100000@current1.whistle.com>

This starts out discussing a single problem and then goes on to discuss
more general problems and ideas.. stick with it..

BDE pointed out a problem in the system that showed up when using my
device filesystem.  In spec_getpages() the size of the device's blocks is
incorrectly deduced from the blocksize of the filesystem in which the
device resided (e.g. if you are accessing sd2 with a blocksize of 1K, you
will get 512 bytes because /dev/ is in / and THAT is on sd0 and has a
blocksize of 512 bytes.) This is so obviously wrong that I'm not worried
about whether it SHOULD be fixed, just HOW.

The obvious place to store the blocksize is in the specinfo struct pointed
to by the vnode for the device. It might be possible to make a request
to the device to get this info, but it would require doing an ioctl
to the device every time you wanted to do this and that would seem a
very slow operation for retrieving a single int. Does anyone have a better
place to stash this info? vn->v_blksize (as a macro)

#define v_blksize v_specinfo->si_blksize

would seem the correct scope and placement for this information.
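To make the macro idea concrete, here is a minimal user-space sketch. The structures are cut-down stand-ins for the kernel ones, and both the fallback constant and the dev_blksize() helper are hypothetical names introduced only for illustration:

```c
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (assumption: the real
 * ones carry many more fields; only what the sketch needs is shown). */
struct specinfo {
	int	si_blksize;		/* proposed: native block size of the device */
};

struct vnode {
	struct specinfo	*v_specinfo;	/* per-device info for device vnodes */
};

/* The macro from the text: makes vp->v_blksize read like a vnode field. */
#define v_blksize	v_specinfo->si_blksize

#define SKETCH_DEF_BLKSIZE	512	/* hypothetical fallback value */

/* Hypothetical helper: return the cached device block size, falling
 * back to a default when nothing has been recorded yet. */
int
dev_blksize(struct vnode *vp)
{
	if (vp == NULL || vp->v_specinfo == NULL || vp->v_blksize <= 0)
		return (SKETCH_DEF_BLKSIZE);
	return (vp->v_blksize);
}
```

With something like this, spec_getpages() could ask the device vnode itself rather than the filesystem that happens to hold the device node.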

Now, Part 2: how does this information GET to this location?
It needs to be put there at the time that either
1/ the vnode is allocated. (the same time it's put on the vnode alias
list)
2/ the device is opened.

In either case, the problem is that there is no easy call that can be made
to the device to find out its blocksize. The device drivers are only
accessible through the devsw interface, and while there is a 'size' call,
there is no 'blksize' call. This leaves the IOCTL interface. 

Should I just use the 'read disklabel' ioctl, or should there be a
separate call of some sort? Open would be the right place except that the
open call does not get a vnode as an argument, but, rather dev_t, so it
can't fill in the field in the vnode. The lookup() that allocated the
vnode cannot do the right thing because it is a vnop for the ufs (or
devfs) that HOLDS the device rather than a representative of the device
itself. So either the specfs open code should do an ioctl to get the
blocksize, or the checkalias() code that is called when a device vnode is
allocated, should do this ioctl. One worry is that within the kernel, it
is possible to access the device without doing an open() on it, so the
checkalias() (or nearby) position would be safer, but the open() would
seem more correct.
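The checkalias()-time variant might look roughly like the user-space sketch below. Everything prefixed sk_ or SK_ is hypothetical: the ioctl command is made up for illustration (it is not an existing ioctl), the devsw table is reduced to a single entry point, and the fake driver simply reports 1K blocks for unit 2.

```c
#include <stddef.h>

#define SK_NDEVS	2
#define SK_GETBLKSIZE	1	/* made-up ioctl command, for illustration only */
#define SK_DEFBLKSIZE	512	/* assumed fallback when the driver cannot say */
#define SK_MKDEV(ma, mi)	(((unsigned int)(ma) << 8) | (unsigned int)(mi))
#define SK_MAJOR(d)	((d) >> 8)
#define SK_MINOR(d)	((d) & 0xff)

typedef int sk_ioctl_t(unsigned int dev, int cmd, void *data);

struct sk_devsw { sk_ioctl_t *d_ioctl; };
struct sk_specinfo { unsigned int si_rdev; int si_blksize; };
struct sk_vnode { struct sk_specinfo *v_specinfo; };

/* Fake disk driver: unit 2 has 1K blocks, every other unit 512 bytes. */
static int
sk_disk_ioctl(unsigned int dev, int cmd, void *data)
{
	if (cmd != SK_GETBLKSIZE)
		return (-1);
	*(int *)data = (SK_MINOR(dev) == 2) ? 1024 : 512;
	return (0);
}

/* Major 0 has no ioctl entry; major 1 is the fake disk driver. */
static struct sk_devsw sk_devsw[SK_NDEVS] = { { NULL }, { sk_disk_ioctl } };

/* Would run once, when the device vnode is allocated and aliased:
 * ask the driver for its block size and cache it in the specinfo. */
void
sk_checkalias_blksize(struct sk_vnode *vp)
{
	struct sk_specinfo *si = vp->v_specinfo;
	unsigned int maj = SK_MAJOR(si->si_rdev);
	int bsize = 0;

	si->si_blksize = SK_DEFBLKSIZE;
	if (maj >= SK_NDEVS || sk_devsw[maj].d_ioctl == NULL)
		return;
	if (sk_devsw[maj].d_ioctl(si->si_rdev, SK_GETBLKSIZE, &bsize) == 0 &&
	    bsize > 0)
		si->si_blksize = bsize;
}
```

The ioctl cost is paid once per vnode allocation instead of on every lookup of the block size, and in-kernel users that never call open() would still see a sensible value.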

Question 3.
One raised some time ago by PHK:
When a device is 'upgraded' to read-write from read-only, the vnode is
consulted to see if it is permissible, but the device itself is not
notified of the change. If we (phk and myself) think about this and come
up with a change for this, would it be considered a useful thing?

FINALLY:

I have a long-term thought that eventually dev_t is going to be a rather
silly thing. The devsw calls should all get a vnode pointer as the first
argument. In this case they can always extract the minor number needed,
but they have a way to interact more correctly with the vnode. This would
eventually result in device driver implemented vnops.
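As a rough illustration of what a vnode-first devsw entry might look like, here is a user-space sketch. The vd_ names, the 8-bit minor encoding, and the fake disk driver are all assumptions made for the example, not existing interfaces:

```c
#include <stddef.h>

/* Cut-down, hypothetical stand-ins for the kernel structures. */
struct vd_specinfo {
	unsigned int	si_rdev;	/* old-style device number */
	int		si_blksize;	/* block size, filled in by the driver */
};
struct vd_vnode { struct vd_specinfo *v_specinfo; };

#define vd_minor(d)	((d) & 0xff)	/* assumed encoding: low 8 bits */

/* A devsw open entry that takes the vnode instead of a dev_t. */
typedef int vd_open_t(struct vd_vnode *vp, int flags);

/* Fake disk driver open: it can still extract the minor number it
 * needs, but it can now also write into the vnode directly. */
int
vd_disk_open(struct vd_vnode *vp, int flags)
{
	unsigned int unit = vd_minor(vp->v_specinfo->si_rdev);

	(void)flags;
	vp->v_specinfo->si_blksize = (unit == 2) ? 1024 : 512;
	return (0);
}
```

Note that with this shape the blocksize problem from the start of this message goes away: open() has the vnode in hand and can fill in the field itself.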

Is this a way to go?  Overall it's about 3 months work and I'm really part
of the way there already, but I've reached a point where the magnitude of
the changes scares me. Not because of the technical problems, but rather
because of the political repercussions.

<speculate> If dev_t is redefined as

struct devref{
	int	dr_refs;		/* this too is ref counted*/
	struct vnode	*dr_vn;
	u_long		dr_v_id; 	/* the capability # of the vnode */
					/* consumers should check this */
};
typedef	struct devref *dev_t;

with strict reference counting, and a few macros, this could be used to
transition from one system to the other. The VM system and others that
hash on dev_t could hash on some of the contents of the struct above, and
a whole lot of things would eventually become simpler.  </speculate>
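A minimal user-space sketch of how such a reference might be handled follows. The helper functions are hypothetical, the vnode is reduced to its id, and the typedef is called dr_dev_t only so that the sketch does not clash with the system's own dev_t:

```c
#include <stddef.h>
#include <stdlib.h>

struct dr_vnode { unsigned long v_id; };	/* stand-in: only the id matters here */

struct devref {
	int		dr_refs;	/* this too is ref counted */
	struct dr_vnode	*dr_vn;
	unsigned long	dr_v_id;	/* the capability # of the vnode */
};
typedef struct devref *dr_dev_t;

/* Create a reference holding one count and remember the vnode's id. */
dr_dev_t
devref_alloc(struct dr_vnode *vp)
{
	dr_dev_t d = malloc(sizeof(*d));

	if (d == NULL)
		return (NULL);
	d->dr_refs = 1;
	d->dr_vn = vp;
	d->dr_v_id = vp->v_id;
	return (d);
}

void
devref_hold(dr_dev_t d)
{
	d->dr_refs++;
}

/* Drop one count; frees the reference at zero. Returns counts left. */
int
devref_rele(dr_dev_t d)
{
	int left = --d->dr_refs;

	if (left == 0)
		free(d);
	return (left);
}

/* Consumers check the capability: a recycled vnode has a new v_id. */
struct dr_vnode *
devref_vnode(dr_dev_t d)
{
	return (d->dr_v_id == d->dr_vn->v_id ? d->dr_vn : NULL);
}

/* Things that used to hash on the dev_t number can hash on contents. */
unsigned long
devref_hash(dr_dev_t d, unsigned long nbuckets)
{
	return (d->dr_v_id % nbuckets);
}
```

The id check is what lets a consumer notice that the vnode behind a stale reference has been reused for something else.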

I think there would be a phase when things got a little 'hairy' but
overall it makes a lot more sense than what we have at the moment. 

The big problem I see is how do I do this and keep in touch with the rest
of FreeBSD? I have DEVFS/SLICE working as a set of patches, and I'd like
to have them committed if I can get someone to look them over. But the next
stage cannot really be done as a set of patches. DEVFS/SLICE can be in the
code and if you don't define DEVFS and SLICE, you don't get ANY changes to
what you are running, but the changes I'd like to see are really too big to
be feasible that way.

Is there a way in which such a large project can be approached?  (and is
there anyone else that thinks that this is the way to go?)

I really am looking for advice from what I consider to be a very talented
and experienced group of CS professionals here.

SHOULD devices respond directly to VOPS?  should dev_t continue to exist
as a 'number' that needs to be interpreted? 

I wonder if there is a forum that we could use for a fuller discussion of
this sort of thing? 

I've discussed this sort of thing with (at various times) phk, peter,
john, david, jkh, brian, terry, kirk, Bill (jolitz), Mike, cgd, theo,
charles and others I forget. No-one has ever said "That's stupid", and
most have agreed that it would simplify some aspects of how the kernel
gets its work done. The question is, how do we get everybody to discuss
this sort of thing at one time? How do we decide on an approach for such a
significant change?

julian




