Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 04 Sep 2009 16:16:34 -0600 (MDT)
From:      "M. Warner Losh" <imp@bsdimp.com>
To:        attilio@freebsd.org
Cc:        arch@freebsd.org
Subject:   NEWBUS states (was Re: svn commit: r196779 - in head/sys: kern sys)
Message-ID:  <20090904.161634.-217944108.imp@bsdimp.com>
In-Reply-To: <3bbf2fe10909041455u552b0dbdm1708ea0a26365149@mail.gmail.com>
References:  <200909031340.n83Defkv034013@svn.freebsd.org> <20090904.112919.1521002024.imp@bsdimp.com> <3bbf2fe10909041455u552b0dbdm1708ea0a26365149@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
[[ redirected to arch@ since too much of this discussion has been private ]]

In message: <3bbf2fe10909041455u552b0dbdm1708ea0a26365149@mail.gmail.com>
            Attilio Rao <attilio@freebsd.org> writes:
: 2009/9/4 M. Warner Losh <imp@bsdimp.com>:
: > In message: <200909031340.n83Defkv034013@svn.freebsd.org>
: >            Attilio Rao <attilio@FreeBSD.org> writes:
: > : Modified: head/sys/sys/bus.h
: > : ==============================================================================
: > : --- head/sys/sys/bus.h        Thu Sep  3 12:41:00 2009        (r196778)
: > : +++ head/sys/sys/bus.h        Thu Sep  3 13:40:41 2009        (r196779)
: > : @@ -52,8 +52,11 @@ struct u_businfo {
: > :  typedef enum device_state {
: > :       DS_NOTPRESENT,                  /**< @brief not probed or probe failed */
: > :       DS_ALIVE,                       /**< @brief probe succeeded */
: > : +     DS_ATTACHING,                   /**< @brief attaching is in progress */
: > :       DS_ATTACHED,                    /**< @brief attach method called */
: > : -     DS_BUSY                         /**< @brief device is open */
: > : +     DS_BUSY,                        /**< @brief device is open */
: > : +     DS_DETACHING                    /**< @brief detaching is in progress */
: > : +
: > :  } device_state_t;
: >
: > device_state_t is exported to userland via devctl.  Well, that's not
: > entirely true...  It isn't used EXACTLY, but there's this in
: > devinfo.h:
: >
: > /*
: >  * State of the device.
: >  */
: > /* XXX not sure if I want a copy here, or expose sys/bus.h */
: > typedef enum devinfo_state {
: >        DIS_NOTPRESENT,                 /* not probed or probe failed */
: >        DIS_ALIVE,                      /* probe succeeded */
: >        DIS_ATTACHED,                   /* attach method called */
: >        DIS_BUSY                        /* device is open */
: > } devinfo_state_t;
: >
: > which is why devinfo is broken.
: 
: I think the right fix here is to maintain in sync devinfo.h and bus.h
: definition by just having one. I see that the devices states are
: redefined in devinfo.h in order to avoid namespace pollution and this
: is a good point. What I propose is to add a new header (_bus.h,  to be
: included in both bus.h and devinfo.h) which just containst the device
: states.

There's a lot of possible fixes.  It is broken right now.  Your commit
broke it.

The problem is that we have multiple names for the same things, and
that's a defined API/ABI today....  So having a _bus.h won't solve
this problem without introducing name space pollution (well, I suppose
you could have __DS_NOTPRESENT, __DS_ALIVE, etc and then have
devinfo.h and bus.h map those in, but that still wouldn't totally
solve the problem).

: > Also, DS_BUSY is used in many drivers to PREVENT detaching.  So the
: > change is bad from that point of view, since DS_DETACHING is now >
: > DS_BUSY.  There's really a partial ordering relationship now where
: > before there was a total ordering (DS_BUSY is > DS_ATTACHED and
: > DS_DETACHING is > DS_ATTACH, but DS_DETACHING isn't > DS_BUSY and
: > DS_BUSY isn't > DS_DETACHING).  I think that you've destroyed
: > information here by unconditionally setting it:
: >
: > -       if ((error = DEVICE_DETACH(dev)) != 0)
: > +       dev->state = DS_DETACHING;
: > +       if ((error = DEVICE_DETACH(dev)) != 0) {
: > +               KASSERT(dev->state == DS_DETACHING,
: > +                   ("%s: %p device state must not been changing", __func__,
: > +                   dev));
: > +               dev->state = DS_ATTACHED;
: >                return (error);
: > +       }
: > +       KASSERT(dev->state == DS_DETACHING,
: > +           ("%s: %p device state must not been changing", __func__, dev));
: >
: > And this looks racy between the check earlier and this setting.
: > Properly locked, this wouldn't destroy information...
: 
: Sorry, I really don't understand what point are you making here, and
: what the scaring words of "destroying", "racy", etc. means. Can you
: explain better that part?

I think it is a minor concern.  There's no locking now to prevent
multiple threads doing stuff.  I think it would be better to
completely omit this stuff than to put it in 1/2 way.

Also, it breaks the linear nature of device_state_t.

: > At the very least cardbus/cardbus.c and pccard/pccard.c need to be
: > looked at since they both have code that looks like:
: >
: >        for (tmp = 0; tmp < numdevs; tmp++) {
: >                struct cardbus_devinfo *dinfo = device_get_ivars(devlist[tmp]);
: >                int status = device_get_state(devlist[tmp]);
: >
: >                if (dinfo->pci.cfg.dev != devlist[tmp])
: >                        device_printf(cbdev, "devinfo dev mismatch\n");
: >                if (status == DS_ATTACHED || status == DS_BUSY)
: >                        device_detach(devlist[tmp]);
: >                cardbus_release_all_resources(cbdev, dinfo);
: >                cardbus_device_destroy(dinfo);
: >                device_delete_child(cbdev, devlist[tmp]);
: >                pci_freecfg((struct pci_devinfo *)dinfo);
: >        }
: >
: > which does ignore errors returned by device_detach for the DS_BUSY
: > case because there's not currently a good way to tell device_detach
: > that it *MUST* detach the device *NOW* without any possibility of veto
: > by the driver.  The above code also isn't DS_DETACHING aware, and may
: > be wrong in the face of this new state.
: 
: How DS_DETACHING can cause problems here? device_detach() simply won't
: run if the state is DS_DETACHING as expected (another thread is alredy
: detaching and there is no need for it to detach).
: Also, please note that in this case, for the state == DS_BUSY he
: device_detach() won't do anything. You can't simply skip the return
: value and anything else, but the reality is still the operation won't
: happen.

I explained about partial ordering already.  Let me try again.

DS_BUSY isn't compatible with this definition.  It would be better to
merge DS_ATTACHING and DS_DETACHING into one state, say DS_TRANSITION.
Then we'd do the attach sequence as DS_ALIVE -> DS_TRANSITION ->
DS_ATTACHED (and later DS_BUSY maybe).  Then detach would go from
DS_ATTACHED -> DS_TRANSITION to tear it down.  There's a strong
ordering expected in the code, and finding all the subtle places that
this extra state beyond DS_BUSY causes problems will be hard.

I'm not entirely sure how DS_DETACHING can cause problems here.
However, I've not looked at all the possible code paths and such to
make sure that there isn't a problem.  Given the strong ordering of
device_state_t that's present today, I'm not so glib that it would
cause no problems.  I want strong assurance that it won't.  If we use
DS_TRANSITION, then I know it won't.

It would solve the problems with only one new state, and would
preserve the ordering that's implicit in the code and hard to ferret
out.  This is only one example.

: > Of course, grepping the tree does show one or two places where DS_BUSY
: > is used inappropriately:
: >
: > rp.c:
: >
: > static int
: > rp_pcidetach(device_t dev)
: > {
: >        CONTROLLER_t    *ctlp;
: >
: >        if (device_get_state(dev) == DS_BUSY)
: >                return (EBUSY);
: >
: > is one example.  The above check should just be removed (ditto for its
: > SHUTDOWN) routine.
: >
: > So I think we should fix rp.c, but we need to talk through this change
: > a little more.  I'm surprised I wasn't even pinged about it, since it
: > hits code that I maintain and a simple grep would have found...
: 
: Still, I don't see a problem with the codes you mentioned (if not in
: the consumers, which were alredy "broken" before this change and which
: situation is not worse now), unless the devinfo breakage.

Well, the rp.c that I talked about is a minor breakage that can easily
be fixed.  It is unrelated to your changes, but my grep found the
breakage...

devinfo breakage, however is a bigger deal, as is the breaking of the
linearness of device_state_t.  Let's just use one new state.  It won't
make locking harder, and will also prevent a device from attaching
while someone else is detaching it better.

In summary:

I support adding one new state (and only one new state) to
device_state_t that is numerically smaller than DS_ATTACHED.  I think
we should take this opportunity to make the states sparser as well
(since that would allow us to insert them in the future, should a
proven need be found).  I think that the extra checks that were added
to subr_bus.c should be backed out.  I think that only the new state
enums and their new values should be MFC'd.  I further thing that
*ALL* future discussions of newbus ABI/API changes go through arch@.

In short, I think that http://people.freebsd.org/~imp/newbus-20090904
should be committed and MFC'd.  I've not addressed the devinfo
breakage yet...

Warner



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20090904.161634.-217944108.imp>