From owner-freebsd-arch@FreeBSD.ORG  Wed Sep  1 21:09:01 2004
Message-ID: <413639CA.2060400@samsco.org>
Date: Wed, 01 Sep 2004 15:06:18 -0600
From: Scott Long <scottl@samsco.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.2) Gecko/20040831
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Sam <sah@softcardsystems.com>
References: <Pine.LNX.4.60.0409011215400.13505@athena>
	<413617A4.1030202@samsco.org> <Pine.LNX.4.60.0409011502420.1792@athena>
In-Reply-To: <Pine.LNX.4.60.0409011502420.1792@athena>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-arch@freebsd.org
Subject: Re: disk_create and cdevsw_add

Sam wrote:
> 
> 
> On Wed, 1 Sep 2004, Scott Long wrote:
> 
>> Sam wrote:
>>
>>> 'lo again,
>>>
>>> kern/subr_disk.c:/^disk_create/ takes
>>> two cdevsw types, and I only vaguely
>>> understand why.  Can someone explain it to me?
>>
>>
>> I'm not really clear on this myself, other than the first cdevsw
>> contains your actual table, and the second one is a dummy that you
>> allocate but don't actually touch.
>>
>>>
>>> I'm generally confused about resolving
>>> entry points into the driver.  Does a
>>> block device only get an open() after
>>> registering it with disk_create?
>>
>>
>> Yes.  disk_create() is just a modified form of cdevsw_add(), and
>> your cdevsw entry points are not accessible until that is called.
>>
>>> Supposing I want to set some ioctls for
>>> an aoecontrol utility (show all devices
>>> known, eg), what would aoecontrol open
>>> to ioctl?
>>
>>
>> You can either implement the ioctl handler in the same device as the
>> AoE device, or you can create a separate control device with its own
>> major and minor that represents all of the AoE devices, or you can do
>> both.  You can also create a control device per AoE device, but that
>> isn't terribly common these days and has implications when porting to
>> 5.x and beyond.
>>
>> What kind of things will aoecontrol do?  If it will be creating and
>> destroying AoE device instances, then you definitely want a separate
>> control device.  You might want to look at my old 4.x RAIDFrame patches
>> that do this.  They can be found at http://people.freebsd.org/~scottl/rf
> 
> 
> Right now it would be useful to pull out the list of devices
> the driver knows about.  In lunix I have a char driver implementing
> a set of files, one of which is stat:
> 
> % cat /dev/etherd/stat
> /dev/etherd/e0.0    up
> /dev/etherd/e0.1    up
> ...
> 
> So I'm thinking that here I'll set up an ioctl so aoecontrol
> could spit out such information.

You can express this via a sysctl.  Think of the sysctl tree as serving
a similar purpose to the pseudo-filesystem trees in Linux, with all of
their magic nodes.  Expressing it via an ioctl is fine too, but sysctls
are easy to implement, not much harder than a proc handler in Linux.
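
As a rough illustration of what such a sysctl (or ioctl) would hand
back, here is a minimal userland sketch that renders a device table
into a text buffer, loosely modelled on the stat output quoted above.
The struct, the function name, and the line format are all
hypothetical.  In the kernel, a SYSCTL_PROC handler could build the
same string and return it with sysctl_handle_string(); that wiring is
left out because it cannot run outside the kernel.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical per-device record; a real driver's softc will differ. */
struct aoe_dev {
	int aoemajor;	/* shelf id */
	int aoeminor;	/* slot id */
	int up;		/* nonzero if the device answered the last probe */
};

/*
 * Render one "eMAJOR.MINOR<TAB>state" line per device into buf.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * buf is too small to hold the whole list.
 */
static int
aoe_format_status(const struct aoe_dev *devs, size_t ndevs,
    char *buf, size_t buflen)
{
	size_t off = 0;
	size_t i;

	if (buflen == 0)
		return (-1);
	buf[0] = '\0';
	for (i = 0; i < ndevs; i++) {
		int n = snprintf(buf + off, buflen - off, "e%d.%d\t%s\n",
		    devs[i].aoemajor, devs[i].aoeminor,
		    devs[i].up ? "up" : "down");
		/* snprintf reports the length it wanted; treat truncation as failure. */
		if (n < 0 || (size_t)n >= buflen - off)
			return (-1);
		off += (size_t)n;
	}
	return ((int)off);
}
```

Failing on truncation rather than returning a partial list keeps the
caller from mistaking a short buffer for a short device list.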

> Another idea is to permit
> a way to restrict the interfaces acceptable to do AoE on.
> Currently I broadcast on every interface to find devices
> I can talk to; I can imagine a sysadmin might find this undesirable.
> The former goes away with 5.x because if it's known, it's
> in /dev.  The latter could be enforced by making the user
> recompile the module, a nightmare for non-coders.

I'm not sure what you mean here.  You're going to have an AoE 'target'
and an AoE 'initiator', where the target contains the actual storage,
and the initiator sends ATA-over-Ethernet commands to the target, right?
If so, then how can the initiator know what the available targets are
without doing a scan?  Or are you talking about auto-configuring the
target machine to have it present all locally-discovered devices as
AoE targets on the network?  If so, then you probably want to let the
admin have as much control over this process as possible.  Nothing
spells 'heart-attack' as quickly as a security leak via inadvertent
advertisement.
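
One way to give the admin that control without a recompile would be a
runtime allow-list of interface names, checked before any AoE frame is
sent or accepted on an interface.  A minimal userland sketch follows;
the function name and the comma-separated list format are assumptions
made for illustration, not anything the driver defines.

```c
#include <string.h>
#include <stddef.h>

/*
 * Return 1 if ifname appears in the comma-separated allow-list,
 * else 0.  A NULL or empty list allows nothing, so AoE stays off
 * every interface until the admin opts in.
 */
static int
aoe_if_allowed(const char *allowlist, const char *ifname)
{
	size_t want;
	const char *p;

	if (allowlist == NULL || ifname == NULL || *ifname == '\0')
		return (0);
	want = strlen(ifname);
	p = allowlist;
	while (*p != '\0') {
		/* Length of the current entry, up to the next comma or the end. */
		size_t len = strcspn(p, ",");

		if (len == want && strncmp(p, ifname, len) == 0)
			return (1);
		p += len;
		if (*p == ',')
			p++;
	}
	return (0);
}
```

Defaulting to "deny" is a deliberate choice in this sketch: it avoids
the inadvertent-advertisement case at the cost of one extra setup step.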

> 
> A further ponderance:
> 
> Each AoE device has a major,minor address.  Let's call them
> aoemajor/aoeminor to be clear.  I have a simple association
> between unit and aoemajor/aoeminor using a MAJPERMIN constant:
> 
>     unit = aoemajor * MAJPERMIN + aoeminor;
> 
> This permits me to create device nodes that abstract the
> network.  As a lunix example, /dev/etherd/e0.0 is the
> AoE device with aoemajor=0, aoeminor=0.  In coraid's
> implementation, aoemajor is a shelf id and aoeminor is
> a slot id.  So this example uses the blade in shelf 0, slot 0.
> 
> The AoE protocol permits specifying the aoemajor,aoeminor
> address in the frame.  It's possible to send out an
> ethernet broadcast specifying a particular aoemajor,aoeminor
> and have only the blade with that aoemajor,aoeminor process it.
> 
> So: if I could get an open on a device I could send out a
> frame to see if the device is there at open time.  Due to
> the current scheme I have to periodically send out a
> broadcast (with aoemajor and aoeminor unspecified) to probe
> the network.  In a setup with a large number of blades,
> avoiding a periodic storm would be desirable.
> 
> Any thoughts on this?
> 
> Sam
> 

I haven't read enough of the AoE paper to comment here, sorry.
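
Setting the probing question aside, the unit arithmetic quoted above
round-trips cleanly as long as aoeminor stays below MAJPERMIN.  A
minimal sketch, assuming MAJPERMIN is 16 (a made-up value) and using
hypothetical helper names:

```c
/* Assumed value for illustration; the real constant is the driver's call. */
#define MAJPERMIN 16

/*
 * Map an AoE address to a unit number.  Returns -1 if the address is
 * out of range, since an aoeminor >= MAJPERMIN would collide with the
 * next aoemajor.
 */
static int
aoe_unit(int aoemajor, int aoeminor)
{
	if (aoemajor < 0 || aoeminor < 0 || aoeminor >= MAJPERMIN)
		return (-1);
	return (aoemajor * MAJPERMIN + aoeminor);
}

/* Recover the AoE address from a unit number. */
static int
aoe_unit_major(int unit)
{
	return (unit / MAJPERMIN);
}

static int
aoe_unit_minor(int unit)
{
	return (unit % MAJPERMIN);
}
```

With these, an open on a given unit can recover the aoemajor/aoeminor
pair to put in a targeted frame.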

Scott