From owner-freebsd-arch@FreeBSD.ORG  Wed Sep  1 19:46:38 2004
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 21D8116A4CE
	for <freebsd-arch@freebsd.org>; Wed,  1 Sep 2004 19:46:38 +0000 (GMT)
Received: from athena.softcardsystems.com (mail.softcardsystems.com
	[12.34.136.114])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 95F0143D1F
	for <freebsd-arch@freebsd.org>; Wed,  1 Sep 2004 19:46:37 +0000 (GMT)
	(envelope-from sah@softcardsystems.com)
Received: from athena (athena [12.34.136.114])i81KkSRE002447;
	Wed, 1 Sep 2004 15:46:31 -0500
Date: Wed, 1 Sep 2004 15:46:28 -0500 (EST)
From: Sam <sah@softcardsystems.com>
X-X-Sender: sah@athena
To: Scott Long <scottl@samsco.org>
In-Reply-To: <413617A4.1030202@samsco.org>
Message-ID: <Pine.LNX.4.60.0409011502420.1792@athena>
References: <Pine.LNX.4.60.0409011215400.13505@athena>
	<413617A4.1030202@samsco.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
cc: freebsd-arch@freebsd.org
Subject: Re: disk_create and cdevsw_add
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
	<freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Sep 2004 19:46:38 -0000


On Wed, 1 Sep 2004, Scott Long wrote:

> Sam wrote:
>> 'lo again,
>> 
>> kern/subr_disk.c:/^disk_create/ takes
>> two cdevsw types, and I only vaguely
>> understand why.  Can someone explain it to me?
>
> I'm not really clear on this myself, other than the first cdevsw
> contains your actual table, and the second one is a dummy that you
> allocate but don't actually touch.
>
>> 
>> I'm generally confused about resolving
>> entry points into the driver.  Does a
>> block device only get an open() after
>> registering it with disk_create?
>
> Yes.  disk_create() is just a modified form of cdevsw_add(), and
> your cdevsw entry points are not accessible until that is called.
>
>> Supposing I want to set some ioctls for
>> an aoecontrol utility (show all devices
>> known, eg), what would aoecontrol open
>> to ioctl?
>
> You can either implement the ioctl handler in the same device as the
> AoE device, or you can create a separate control device with its own
> major and minor that represents all of the AoE devices, or you can do
> both.  You can also create a control device per AoE device, but that
> isn't terribly common these days and has implications when porting to
> 5.x and beyond.
>
> What kind of things will aoecontrol do?  If it will be creating and
> destroying AoE device instances, then you definitely want a separate
> control device.  You might want to look at my old 4.x RAIDFrame patches
> that do this.  They can be found at http://people.freebsd.org/~scottl/rf

Right now it would be useful to pull out the list of devices
the driver knows about.  In Linux I have a char driver implementing
a set of files, one of which is stat:

% cat /dev/etherd/stat
/dev/etherd/e0.0	up
/dev/etherd/e0.1	up
...

So I'm thinking that here I'll set up an ioctl so aoecontrol
could spit out such information.  Another idea is to permit
a way to restrict the interfaces acceptable to do AoE on.
Currently I broadcast on every interface to find devices
I can talk to; I can imagine a sysadmin might find this undesirable.
The first need (listing devices) goes away with 5.x, because any
device the driver knows about shows up in /dev.  The second
(restricting interfaces) could be enforced by making the user
recompile the module, but that is a nightmare for non-coders.
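
To make the device-list idea concrete, here is a rough sketch of the
record such an ioctl might hand back and how aoecontrol could print it
in the same shape as the stat file above.  The names aoe_devinfo and
aoe_format_stat are hypothetical, and the "down" state is an
assumption (the listing above only shows "up").

```c
#include <stdio.h>

/* Hypothetical per-device record an aoecontrol ioctl could return. */
struct aoe_devinfo {
	unsigned aoemajor;	/* shelf id */
	unsigned aoeminor;	/* slot id */
	int up;			/* nonzero if the device is reachable */
};

/*
 * Format one record like a line of /dev/etherd/stat.
 * Returns what snprintf returns: the length the full line needs.
 */
static int
aoe_format_stat(char *buf, size_t len, const struct aoe_devinfo *d)
{
	/* "down" is assumed; only "up" appears in the original listing. */
	return (snprintf(buf, len, "/dev/etherd/e%u.%u\t%s\n",
	    d->aoemajor, d->aoeminor, d->up ? "up" : "down"));
}
```

aoecontrol would call aoe_format_stat() once per record returned by
the driver and write each line to stdout.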

A further thought:

Each AoE device has a major,minor address.  Let's call them
aoemajor/aoeminor to be clear.  I have a simple association
between unit and aoemajor/aoeminor using a MAJPERMIN constant:

 	unit = aoemajor * MAJPERMIN + aoeminor;

This permits me to create device nodes that abstract the
network.  As a Linux example, /dev/etherd/e0.0 is the
AoE device with aoemajor=0, aoeminor=0.  In Coraid's
implementation, aoemajor is a shelf id and aoeminor is
a slot id.  So this example uses the blade in shelf 0, slot 0.
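
The mapping and its inverse can be sketched as below.  The value 16
for MAJPERMIN is only a placeholder, and the helper names are
hypothetical.  The range check matters because an aoeminor at or above
MAJPERMIN would collide with a unit belonging to the next aoemajor.

```c
/* Placeholder value: number of aoeminor slots reserved per aoemajor. */
#define MAJPERMIN 16

/* Map an AoE address to a unit number; -1 if aoeminor is out of range. */
static int
aoe_unit(unsigned aoemajor, unsigned aoeminor)
{
	if (aoeminor >= MAJPERMIN)
		return (-1);
	return ((int)(aoemajor * MAJPERMIN + aoeminor));
}

/* Recover aoemajor from a unit number produced by aoe_unit(). */
static unsigned
aoe_unit_major(int unit)
{
	return ((unsigned)unit / MAJPERMIN);
}

/* Recover aoeminor from a unit number produced by aoe_unit(). */
static unsigned
aoe_unit_minor(int unit)
{
	return ((unsigned)unit % MAJPERMIN);
}
```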

The AoE protocol permits specifying the aoemajor,aoeminor
address in the frame.  It's possible to send out an
ethernet broadcast specifying a particular aoemajor,aoeminor
and have only the blade with that aoemajor,aoeminor process it.
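
The check a blade makes on an incoming frame can be sketched like
this.  The struct and function names are hypothetical.  The wildcard
values (0xffff for aoemajor, 0xff for aoeminor) are what I understand
the AoE specification to use for "unspecified", so treat them as an
assumption to verify against the spec.

```c
#include <stdint.h>

/* Assumed wildcard values meaning "any shelf" and "any slot". */
#define AOE_MAJOR_ANY 0xffffu
#define AOE_MINOR_ANY 0xffu

/* Hypothetical holder for the address fields carried in a frame. */
struct aoe_addr {
	uint16_t major;	/* aoemajor (shelf) */
	uint8_t minor;	/* aoeminor (slot) */
};

/*
 * Return nonzero if a blade at address `self` should process a frame
 * addressed to `req`.  Each field matches if it is equal or wildcard.
 */
static int
aoe_addr_match(struct aoe_addr self, struct aoe_addr req)
{
	int major_ok = (req.major == AOE_MAJOR_ANY || req.major == self.major);
	int minor_ok = (req.minor == AOE_MINOR_ANY || req.minor == self.minor);

	return (major_ok && minor_ok);
}
```

A frame sent to the Ethernet broadcast address with a specific
aoemajor,aoeminor therefore reaches every blade but is processed by
only one.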

So: if I could get an open on a device, I could send out a
frame at open time to see if the device is there.  With
the current scheme I have to periodically send out a
broadcast (with aoemajor and aoeminor unspecified) to probe
the network.  In a setup with a large number of blades,
avoiding that periodic storm of replies would be desirable.

Any thoughts on this?

Sam