From owner-freebsd-current@FreeBSD.ORG  Mon Jul 23 14:24:22 2012
Sender: Warner Losh <wlosh@bsdimp.com>
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Warner Losh <imp@bsdimp.com>
In-Reply-To: <500D010A.5080808@freebsd.org>
Date: Mon, 23 Jul 2012 08:24:17 -0600
Content-Transfer-Encoding: quoted-printable
Message-Id: <EB3C2E43-CA98-4783-AA36-7232F7B93D54@bsdimp.com>
References: <500A0E24.80101@freebsd.org>
	<EABF0570-55F1-4758-B0FF-62561FFAC4EF@samsco.org>
	<20120722231234.6f748d05@kan.dyndns.org>
	<F1592617-FBD9-4D2A-80DA-BC8CF5D96F87@bsdimp.com>
	<500D010A.5080808@freebsd.org>
To: Julian Elischer <julian@freebsd.org>
Cc: FreeBSD Current <current@freebsd.org>
Subject: Re: PCIe hotplug
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>


On Jul 23, 2012, at 1:45 AM, Julian Elischer wrote:

> On 7/22/12 9:11 PM, Warner Losh wrote:
>> On Jul 22, 2012, at 9:12 PM, Alexander Kabaev wrote:
>>
>>> On Sun, 22 Jul 2012 20:22:33 -0600
>>> Scott Long <scottl@samsco.org> wrote:
>>>
>>>> On Jul 20, 2012, at 8:04 PM, Julian Elischer wrote:
>>>>
>>>>> Is anyone looking at PCIe hotplug support?
>>>>>
>>>>> I'm especially interested if anyone has a strategy for device
>>>>> re-insertion and reassociating the reinserted device with its old
>>>>> device_t so that it gets the same unit number.. (assumes access to
>>>>> a serial number or similar) Even if it is put back into a different
>>>>> slot.
>>>>>
>>>> Would the PCI system be responsible for figuring out this serial
>>>> number?  I don't think that it can, but it's a question to answer, I
>>>> guess.  If it can't then it's up to the driver to generate a unique
>>>> cookie that would be stored by the PCI subsystem.  This cookie would
>>>> have to be based off of data that can be retrieved from the PCI
>>>> config space and/or VPD space, since anything more would require
>>>> resource allocation, which is only allowed in the DEV_ATTACH phase,
>>>> and once you've hit that phase you've already pretty much sealed the
>>>> deal on unit number assignment.
>>>>
>>>> So what would probably happen is that the PCI layer provides a ring
>>>> buffer of cookie storage and a set of accessors for the drivers.  The
>>>> cookies would map to a key-value pair with the device unit name and
>>>> number.  During probe, a driver can look at PCI config space and
>>>> generate a cookie.  That cookie can then be communicated up to the
>>>> PCI layer for storage.  Maybe the driver calls a match routine that
>>>> returns a unit number on match and a store on failure, then the
>>>> driver calls a set_unit_number accessor.  Only the driver that wins
>>>> the bid would win the unit number reassignment or cookie storage.  Or
>>>> maybe the driver passes the cookie up as part of its return code, and
>>>> the match and unit assignment happens automatically.  Drivers that
>>>> don't want to participate in this simply wouldn't, and everything
>>>> would continue to operate the same way.  The two sticky parts are
>>>> rogue/buggy drivers that abuse the api and cause a flood of cookies
>>>> to be generated, and questions on when a unit number is eligible for
>>>> reuse.  For the first one, a ring buffer of cookies would solve the
>>>> immediate problem, but you might still have some risk of drivers
>>>> selectively wrapping the buffer for whatever accidental or evil
>>>> purpose.  For the second problem, maybe a unit number stays
>>>> persistent only if the PCIe hot remove mechanism requests it, and
>>>> then only until the ring-buffer wraps.
>>>>
>>>> Scott
>>>>
>>> I do not think the whole problem as depicted by Julian is even worth
>>> solving. Why keep any data for a device that might _never_ come
>>> back? What if the device hierarchy just starts from the PCI-e and
>>> extends upwards and the user still holds on to some vestiges of a
>>> previous device chain (say, by keeping a character control device
>>> sharing the same unit number open, common practice)? Reusing the unit
>>> number is much trickier then, and might not even be possible. So,
>>> before one jumps into 'how', can we agree on 'why' first? When a
>>> device goes away, it is not just this device's device_t that is
>>> disappearing, it is a whole tree rooted at that device. I see no
>>> point in trying to reconstruct that.
>> There's a reason that PC Card and CardBus never supported this at
>> all.  The assumption was that reconnecting devices is so cheap that it
>> isn't worth the bother.  This is true for all but some specialized
>> devices today: network information is easy to reconstruct, storage
>> drives are easy to reconfigure (since we already fail all in-flight
>> transactions when the device goes away), etc.  I can see some
>> advantage to having storage cope, but there are already geom classes
>> that can help people cope when drives can go away.
>>
>>> PCI-e hotplug proper is very much orthogonal to the question of unit
>>> numbering and IS worth supporting.
>> Yes.  totally agreed.
>
> I'm not saying that it's vitally important but was wondering if people
> had a strategy for it..
> i.e. is it a question worth worrying about?
>
> In a separate forum Warner and I (yeah I know I'm answering Warner,
> but I'm addressing the others) discussed the feasibility of surviving
> an "oops pulled the wrong card" event with regards to a particular
> flash memory card. I was just carrying that forwards as a thought
> experiment (There is actually a strategy that sounds feasible).
>
> The problem of getting a serial number out of the BAR space during
> probe is also possibly solvable in our case but the question of how
> long to remember a device is legitimate, and my answer would be that
> 1/ a particular driver would be able to specify whether it could
> handle this, and
> 2/ it might be limited to some pragmatic number such as 16 or 32, or a
> time limit.
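
For concreteness, here is a rough user-space C sketch of the cookie ring
Scott describes above, capped at the 16 entries Julian suggests.  Nothing
like this exists in newbus or the PCI code today: every name in it
(cookie_ring, cookie_ring_match, cookie_ring_store,
cookie_ring_lookup_or_store, COOKIE_RING_SLOTS) is made up for
illustration, and the cookie is assumed to be a 64-bit value the driver
derives from config space or VPD data during probe.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define COOKIE_RING_SLOTS 16	/* hypothetical cap ("16 or 32") */
#define COOKIE_NAME_MAX   16

/* One remembered device: driver-derived cookie -> (driver name, unit). */
struct unit_cookie {
	uint64_t cookie;		/* e.g. a hash of a serial number */
	char	 name[COOKIE_NAME_MAX];	/* driver name, e.g. "foo" */
	int	 unit;			/* unit number the device last had */
	int	 valid;
};

/* Fixed-size ring; zero-initialize before use. */
struct cookie_ring {
	struct unit_cookie slot[COOKIE_RING_SLOTS];
	unsigned next;			/* next slot to overwrite */
};

/* Return the remembered unit for (name, cookie), or -1 if unknown. */
static int
cookie_ring_match(const struct cookie_ring *r, const char *name,
    uint64_t cookie)
{
	for (unsigned i = 0; i < COOKIE_RING_SLOTS; i++) {
		const struct unit_cookie *c = &r->slot[i];

		if (c->valid && c->cookie == cookie &&
		    strcmp(c->name, name) == 0)
			return (c->unit);
	}
	return (-1);
}

/* Remember (name, cookie) -> unit, overwriting the oldest entry. */
static void
cookie_ring_store(struct cookie_ring *r, const char *name, uint64_t cookie,
    int unit)
{
	struct unit_cookie *c = &r->slot[r->next];

	assert(unit >= 0);
	c->cookie = cookie;
	snprintf(c->name, sizeof(c->name), "%s", name);
	c->unit = unit;
	c->valid = 1;
	r->next = (r->next + 1) % COOKIE_RING_SLOTS;
}

/* "Match routine": old unit on a match, otherwise store fresh_unit. */
static int
cookie_ring_lookup_or_store(struct cookie_ring *r, const char *name,
    uint64_t cookie, int fresh_unit)
{
	int unit = cookie_ring_match(r, name, cookie);

	if (unit >= 0)
		return (unit);
	cookie_ring_store(r, name, cookie, fresh_unit);
	return (fresh_unit);
}
```

A re-inserted card that produces the same cookie gets its old unit back
only until COOKIE_RING_SLOTS newer cookies have been stored, which is the
"only until the ring-buffer wraps" behaviour.  The sketch does nothing
about a driver deliberately flooding the ring, and driver names longer
than 15 characters are truncated on store and so would never match.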

Actually, for the case where there's expensive state to reconstruct,
you'd likely want to keep the state around for an estimated time it
would take to reconstruct it.  I think that this may necessarily have to
be done outside the framework of newbus, or you'd need a new
mechanism/state that the driver could request as part of its detach
routine that says effectively 'keep this around'.  Any later probe on
insert would have to be able to find this and tell newbus about it.
However, that does start to get ugly in a hurry.

It would be better, imho, if a driver that wanted to keep this info
around took responsibility for finding any such pending state and
coping appropriately.  In the case of storage devices, you'd be able to
have those storage volumes on a separate bus, and then you'd have all
the control that you need.  But then you start wandering into issues
with what to do with the pending transactions, what to do with new
transactions, etc. before giving up.
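
As a rough illustration of the driver-owned approach, the C sketch below
parks state at detach time with an expiry and lets a later attach claim
it.  It is plain user-space code, not a newbus interface: park_state,
claim_state, PARKED_MAX and the serial-number key are all hypothetical,
and the caller passes the current time in so the logic is easy to
exercise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define PARKED_MAX 4		/* hypothetical small, fixed limit */

/* State a driver chose to keep after its device went away. */
struct parked_state {
	uint64_t serial;	/* identity read from the device (assumed) */
	void	*state;		/* expensive-to-rebuild driver state */
	time_t	 expires;	/* state is stale at or after this time */
	int	 used;
};

static struct parked_state parked[PARKED_MAX];

/*
 * Detach path: keep `state` around for keep_secs seconds.
 * Returns 0 on success, -1 if every slot holds unexpired state.
 * A real driver would free whatever an expired slot still pointed to.
 */
static int
park_state(uint64_t serial, void *state, time_t now, time_t keep_secs)
{
	assert(keep_secs > 0);
	for (size_t i = 0; i < PARKED_MAX; i++) {
		if (!parked[i].used || parked[i].expires <= now) {
			parked[i] = (struct parked_state){
				serial, state, now + keep_secs, 1
			};
			return (0);
		}
	}
	return (-1);
}

/*
 * Attach path: hand back parked state for this serial if it is still
 * fresh, NULL otherwise.  The slot is released in both cases.
 */
static void *
claim_state(uint64_t serial, time_t now)
{
	for (size_t i = 0; i < PARKED_MAX; i++) {
		if (parked[i].used && parked[i].serial == serial) {
			void *s = parked[i].expires > now ?
			    parked[i].state : NULL;

			parked[i].used = 0;
			return (s);
		}
	}
	return (NULL);
}
```

This leaves unit numbering alone entirely, and it says nothing about what
to do with transactions that were pending when the card was pulled.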

Then again, all these are nice to haves, especially since we don't have
pcie hot plug at all right now.

Warner