Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 04 Sep 2009 17:23:10 -0600 (MDT)
From:      "M. Warner Losh" <imp@bsdimp.com>
To:        attilio@FreeBSD.org
Cc:        arch@FreeBSD.org
Subject:   Re: NEWBUS states
Message-ID:  <20090904.172310.-1939841993.imp@bsdimp.com>
In-Reply-To: <3bbf2fe10909041546y2b5633e1ue063955568df1a06@mail.gmail.com>
References:  <3bbf2fe10909041455u552b0dbdm1708ea0a26365149@mail.gmail.com> <20090904.161634.-217944108.imp@bsdimp.com> <3bbf2fe10909041546y2b5633e1ue063955568df1a06@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
In message: <3bbf2fe10909041546y2b5633e1ue063955568df1a06@mail.gmail.com>
            Attilio Rao <attilio@FreeBSD.org> writes:
: 2009/9/5 M. Warner Losh <imp@bsdimp.com>:
: > [[ redirected to arch@ since too much of this discussion has been private ]]
: >
: > In message: <3bbf2fe10909041455u552b0dbdm1708ea0a26365149@mail.gmail.com>
: >            Attilio Rao <attilio@freebsd.org> writes:
: > : 2009/9/4 M. Warner Losh <imp@bsdimp.com>:
: > : > In message: <200909031340.n83Defkv034013@svn.freebsd.org>
: > : >            Attilio Rao <attilio@FreeBSD.org> writes:
: > : > : Modified: head/sys/sys/bus.h
: > : > : ==============================================================================
: > : > : --- head/sys/sys/bus.h        Thu Sep  3 12:41:00 2009        (r196778)
: > : > : +++ head/sys/sys/bus.h        Thu Sep  3 13:40:41 2009        (r196779)
: > : > : @@ -52,8 +52,11 @@ struct u_businfo {
: > : > :  typedef enum device_state {
: > : > :       DS_NOTPRESENT,                  /**< @brief not probed or probe failed */
: > : > :       DS_ALIVE,                       /**< @brief probe succeeded */
: > : > : +     DS_ATTACHING,                   /**< @brief attaching is in progress */
: > : > :       DS_ATTACHED,                    /**< @brief attach method called */
: > : > : -     DS_BUSY                         /**< @brief device is open */
: > : > : +     DS_BUSY,                        /**< @brief device is open */
: > : > : +     DS_DETACHING                    /**< @brief detaching is in progress */
: > : > : +
: > : > :  } device_state_t;
: > : >
: > : > device_state_t is exported to userland via devctl.  Well, that's not
: > : > entirely true...  It isn't used EXACTLY, but there's this in
: > : > devinfo.h:
: > : >
: > : > /*
: > : >  * State of the device.
: > : >  */
: > : > /* XXX not sure if I want a copy here, or expose sys/bus.h */
: > : > typedef enum devinfo_state {
: > : >        DIS_NOTPRESENT,                 /* not probed or probe failed */
: > : >        DIS_ALIVE,                      /* probe succeeded */
: > : >        DIS_ATTACHED,                   /* attach method called */
: > : >        DIS_BUSY                        /* device is open */
: > : > } devinfo_state_t;
: > : >
: > : > which is why devinfo is broken.
: > :
: > : I think the right fix here is to maintain in sync devinfo.h and bus.h
: > : definition by just having one. I see that the devices states are
: > : redefined in devinfo.h in order to avoid namespace pollution and this
: > : is a good point. What I propose is to add a new header (_bus.h,  to be
: > : included in both bus.h and devinfo.h) which just containst the device
: > : states.
: >
: > There's a lot of possible fixes.  It is broken right now.  Your commit
: > broke it.
: >
: > The problem is that we have multiple names for the same things, and
: > that's a defined API/ABI today....  So having a _bus.h won't solve
: > this problem without introducing name space pollution (well, I suppose
: > you could have __DS_NOTPRESENT, __DS_ALIVE, etc and then have
: > devinfo.h and bus.h map those in, but that still wouldn't totally
: > solve the problem).
: 
: What I would like to do is simply to typedef the enum I get from the
: _bus.h. I'm not sure if nothing else is still required.

You can't do just that.  There's the "DIS_foo vs DS_foo" issue as
well, which cannot be solved with typedefs.  Some mapping is
required..  this is a sill API, one could argue, but it is the one we
have today...

: > : > Also, DS_BUSY is used in many drivers to PREVENT detaching.  So the
: > : > change is bad from that point of view, since DS_DETACHING is now >
: > : > DS_BUSY.  There's really a partial ordering relationship now where
: > : > before there was a total ordering (DS_BUSY is > DS_ATTACHED and
: > : > DS_DETACHING is > DS_ATTACH, but DS_DETACHING isn't > DS_BUSY and
: > : > DS_BUSY isn't > DS_DETACHING).  I think that you've destroyed
: > : > information here by unconditionally setting it:
: > : >
: > : > -       if ((error = DEVICE_DETACH(dev)) != 0)
: > : > +       dev->state = DS_DETACHING;
: > : > +       if ((error = DEVICE_DETACH(dev)) != 0) {
: > : > +               KASSERT(dev->state == DS_DETACHING,
: > : > +                   ("%s: %p device state must not been changing", __func__,
: > : > +                   dev));
: > : > +               dev->state = DS_ATTACHED;
: > : >                return (error);
: > : > +       }
: > : > +       KASSERT(dev->state == DS_DETACHING,
: > : > +           ("%s: %p device state must not been changing", __func__, dev));
: > : >
: > : > And this looks racy between the check earlier and this setting.
: > : > Properly locked, this wouldn't destroy information...
: > :
: > : Sorry, I really don't understand what point are you making here, and
: > : what the scaring words of "destroying", "racy", etc. means. Can you
: > : explain better that part?
: >
: > I think it is a minor concern.  There's no locking now to prevent
: > multiple threads doing stuff.  I think it would be better to
: > completely omit this stuff than to put it in 1/2 way.
: 
: The point for this change is to add now parts which could introduce,
: in the future, ABI compatibility breakage so that we can MFC the
: newbus locking to 8.1. Obviously that is not a finished patch, because
: it still misses of the full context.

Yes.  I understand that.  I'm specifically suggesting that we only MFC
the changes to the ABI today, and MFC the changes to the code when it
is complete.

: > : > At the very least cardbus/cardbus.c and pccard/pccard.c need to be
: > : > looked at since they both have code that looks like:
: > : >
: > : >        for (tmp = 0; tmp < numdevs; tmp++) {
: > : >                struct cardbus_devinfo *dinfo = device_get_ivars(devlist[tmp]);
: > : >                int status = device_get_state(devlist[tmp]);
: > : >
: > : >                if (dinfo->pci.cfg.dev != devlist[tmp])
: > : >                        device_printf(cbdev, "devinfo dev mismatch\n");
: > : >                if (status == DS_ATTACHED || status == DS_BUSY)
: > : >                        device_detach(devlist[tmp]);
: > : >                cardbus_release_all_resources(cbdev, dinfo);
: > : >                cardbus_device_destroy(dinfo);
: > : >                device_delete_child(cbdev, devlist[tmp]);
: > : >                pci_freecfg((struct pci_devinfo *)dinfo);
: > : >        }
: > : >
: > : > which does ignore errors returned by device_detach for the DS_BUSY
: > : > case because there's not currently a good way to tell device_detach
: > : > that it *MUST* detach the device *NOW* without any possibility of veto
: > : > by the driver.  The above code also isn't DS_DETACHING aware, and may
: > : > be wrong in the face of this new state.
: > :
: > : How DS_DETACHING can cause problems here? device_detach() simply won't
: > : run if the state is DS_DETACHING as expected (another thread is alredy
: > : detaching and there is no need for it to detach).
: > : Also, please note that in this case, for the state == DS_BUSY he
: > : device_detach() won't do anything. You can't simply skip the return
: > : value and anything else, but the reality is still the operation won't
: > : happen.
: >
: > I explained about partial ordering already.  Let me try again.
: >
: > DS_BUSY isn't compatible with this definition.  It would be better to
: > merge DS_ATTACHING and DS_DETACHING into one state, say DS_TRANSITION.
: > Then we'd do the attach sequence as DS_ALIVE -> DS_TRANSITION ->
: > DS_ATTACHED (and later DS_BUSY maybe).  Then detach would go from
: > DS_ATTACHED -> DS_TRANSITION to tear it down.  There's a strong
: > ordering expected in the code, and finding all the subtle places that
: > this extra state beyond DS_BUSY causes problems will be hard.
: 
: We all agreed the one-state was the better option but it can't be done
: in this way because of the device_is_attached() used in the detach
: virtual functions. Using just one transition state will break
: device_is_attached() in those parts.

True.  However, this device_is_attached stuff is fundamentally broken
as it is today.  I took a closer look at the typical usage, and it
stands in as a proxy for 'did all the resources get allocated' and
even that's just the bus_resouce* stuff...

: The right fix, as pointed out in other e-mails, is to not use

e-mails that I wasn't cc'd on since this discussion happened in
private.  I hate to keep harping on this point, but there's been too
much of that lately...

: device_is_attached() in detach virtual functions. The better fix, in
: my idea would involve:
: - replace the device_is_attached() usage in detach virtual functions,
: with a more functional support

This would be transitioning to using a common set of code for
release_resources, ala the other most common driver idiom...

: - use one-state transition
: 
: But that is just too much job to push in before then 8.0-REL and if
: that would mean to not commit a patch and make impossible a future
: MFC, I prefer to go with a lesser-perfect-but-still-working-approach.

Yes.  The problem is that we're trying to guess what the right locking
approach will be at the 11th hour, and I'm worried we will guess wrong.

: I'm sorry if these points weren't clear before.

OK.  Let me ponder based on that...  It might be better for this round
of changes to leverage off the device 'flags' field to indicate that
we're attaching/detaching.  This would not break the
device_is_attached() usage, and would solve the interlock problem
nicely.  While it isn't as aesthetically pleasing as the new states,
it would allow us to easily MFC it without API/ABI breakage.  This
field surely would be covered by the same set of locks as the state
field.

I know that there's a good aesthetic argument to be made against this,
but on the other hand 'compatibility' hacks can violate one's
aesthetics.  We can migrate to a more pleasing state-based model in 9
and reduce the risk to other code from changing its semantics at this
late date.

: > In short, I think that http://people.freebsd.org/~imp/newbus-20090904
: > should be committed and MFC'd.  I've not addressed the devinfo
: > breakage yet...
: 
: 404

http://people.freebsd.org/~imp/newbus-20090904.diff
 
sorry about that...

: Btw, I don't agree about removing the changes within subr_bus.c. They
: are harmless (KASSERT and further checks)
: and will help as a reminder.

Why have them at all?  We're just speculating about what the protocol
will be, rather than abstracting down from a known-to-be-good
implementation.  This sort of thing has bit us in the past before.
Since at best we're speculating about what the best approach is, I'd
prefer to keep the leaps of faith as small as possible.

: For the moment the only thing I still see to be fixed is devinfo. I
: will provide a patch before the end of the day.

Well, and getting agreement on the model that will be used for the
state machine...  We're not there yet...  There may also be some
breakage from this change that's hidden, and I need to carefully audit
all uses of the state field, as well as functions that expose its
state to the rest of the kernel...

Warner



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20090904.172310.-1939841993.imp>