Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 11 Mar 2002 14:21:02 -0800
From:      Alfred Perlstein <bright@mu.org>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        davidc@freebsd.org, smp@freebsd.org
Subject:   Re: select fix and giant pushdown patch
Message-ID:  <20020311222102.GA92565@elvis.mu.org>
In-Reply-To: <XFMail.020311120529.jhb@FreeBSD.org>
References:  <20020311001131.GN26621@elvis.mu.org> <XFMail.020311120529.jhb@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
* John Baldwin <jhb@FreeBSD.org> [020311 09:05] wrote:
> 
> > Index: dev/kbd/kbd.c
> > ===================================================================
> > RCS file: /home/ncvs/src/sys/dev/kbd/kbd.c,v
> > retrieving revision 1.27
> > diff -u -r1.27 kbd.c
> > --- dev/kbd/kbd.c     12 Sep 2001 08:37:06 -0000      1.27
> > +++ dev/kbd/kbd.c     10 Mar 2002 08:08:36 -0000
> > @@ -524,7 +524,7 @@
> >  #endif
> >       clist_alloc_cblocks(&sc->gkb_q, KB_QSIZE, KB_QSIZE/2); /* XXX */
> >       sc->gkb_rsel.si_flags = 0;
> > -     sc->gkb_rsel.si_pid = 0;
> > +     SEL_INIT(&sc->gkb_rsel);
> >       splx(s);
> >  
> >       return 0;
> 
> softc's are already zero'd so you don't need this at all.

ok.  fixed in the other places you mentioned as well.

> 
> > Index: kern/kern_ktrace.c
> > ===================================================================
> > RCS file: /home/ncvs/src/sys/kern/kern_ktrace.c,v
> > retrieving revision 1.60
> > diff -u -r1.60 kern_ktrace.c
> > --- kern/kern_ktrace.c        27 Feb 2002 19:10:50 -0000      1.60
> > +++ kern/kern_ktrace.c        10 Mar 2002 08:08:36 -0000
> > @@ -181,6 +181,8 @@
> >  
> >       if (error)
> >               return;
> > +
> > +     mtx_lock(&Giant);
> >       /*
> >        * don't let p_tracep get ripped out from under us
> >        */
> > @@ -200,6 +202,7 @@
> >               vrele(vp);
> >       FREE(kth, M_KTRACE);
> >       p->p_traceflag &= ~KTRFAC_ACTIVE;
> > +     mtx_unlock(&Giant);
> >  }
> 
> This will probably result in lots of nice lock order reversals.  That is why
> I'm working on getting things like this out from under Giant rather than
> bogusly pushing Giant down into them.

How exactly?  Every other place is covered by Giant, calling into Ktrace
while holding any lock other than Giant should kill you because Ktrace
may block.  I wouldn't worry about it.

> 
> Note that pushing Giant down into foo_drop routines or foo_free (we should pick
> a consistent name for those btw) is probably a good idea to handle things like
> fdrop() and crfree().
> 
> > +     si = TAILQ_FIRST(&td->td_selinfo);
> > +     while (si != NULL) {
> > +             nsi = TAILQ_NEXT(si, si_thrlist);
> > +             si->si_thread = NULL;
> > +             si = nsi;
> > +     }
> > +     TAILQ_INIT(&td->td_selinfo);
> > +}
> > +
> > +/*
> 
> Maybe:
> 
>         while(!TAILQ_EMPTY(&td->td_selinfo) {
>                 si = TAILQ_FIRST(...);
>                 nsi = TAILQ_NEXT(...);
>                 ...
>         }
> 
> w/o the TAILQ_INIT().

No, actually this code needs to be fixed up. :)  *slaps chad*
It should be a TAILQ_FOREACH followed by a TAILQ_INIT. :)

TAILQ_FOREACH(&td->td_selq, si, si_thrlist)
    si->si_thread = NULL;
TAILQ_INIT(&td->td_selq);

The idea is that you really don't need to dequeue the selinfos, just
null out the thread pointer and clear the thread's list of selinfos.

:)

> Also, calling the field td_selinfo makes it look like you have embedded a
> selinfo in the thread instead of a list of them.  Maybe something like
> td_silist or something that makes it clear it's a head of a list and not a
> selinfo itself?

Can do.

> 
> > +     mtx_lock(&sellock);
> >  retry:
> > +
> >       ncoll = nselcoll;
> >       mtx_lock_spin(&sched_lock);
> >       td->td_flags |= TDF_SELECT;
> >       mtx_unlock_spin(&sched_lock);
> > -     PROC_UNLOCK(td->td_proc);
> > +     mtx_unlock(&sellock);
> > +
> > +     /* XXX Is there a better place for this? */
> > +     TAILQ_INIT(&td->td_selinfo);
> 
> Yes, init it when the thread is created at fork() for now (Julian will move it
> when needed).  It only needs the INIT once.
> 
> The PROC_LOCK's were oringally due to TDF_SELECT being a p_flag and can
> definitely go away now that it is a td_flag.

I think we need an assertion that a cv can only be used with a single
mutex.

> 
> > @@ -838,23 +864,28 @@
> >               timo = ttv.tv_sec > 24 * 60 * 60 ?
> >                   24 * 60 * 60 * hz : tvtohz(&ttv);
> >       }
> > +
> > +     /* XXX What is the point of this? */
> >       mtx_lock_spin(&sched_lock);
> >       td->td_flags &= ~TDF_SELECT;
> >       mtx_unlock_spin(&sched_lock);
> 
> You have to have the lock to make sure no one else is writing to td_flags and
> to make sure you don't have a stale value while performing the update.

This is Chad's XXX, what he's talking about is why do we need to clear
the TDF_SELECT flag at all, not that it's done while holding sched lock
(which makes sense).

I think there's actually a race here, we seem to only recheck the
TDF_SELECT flag if there was a timeout specified in the select call.
So if there wasn't a timeout it looks like we could loose a race
where an fd being waited on but we don't recheck if a selwakeup occured
on it.  Am I missing something?  Basically, I think this code should be
something like:

       mtx_lock_spin(&sched_lock);
       if ((td->td_flags & TDF_SELECT) == 0) {
              mtx_unlock_spin(&sched_lock);
              goto restart;
       }
       td->td_flags &= ~TDF_SELECT;
       mtx_unlock_spin(&sched_lock);

??

Why was there extra hysterics put into the timeout code?

> 
> Thanks for fixing select() btw.  I was also thinking about how to fix this as I
> mentioned to you earlier.  The pfind() was definitely evil.  I don't like
> having a global lock per-se, but I can't think of a better way to do this
> myself.

You can look at the linux mechanism, but there it seems to pound heavily
on the schedlock by leaving multiple equivelants to "asleep" entries on
the wait channels.  I think Linux allows one to basically sleep on multiple
addresses asyncronously.  I'm not sure this is better because it requires
a LOT of overhead imo.

   a) the filedesc lock isn't held so each file must be "fhold()'d"
      and a pointer to it retained in a private store.
      because of this, when select exits instead of just having to walk
      a linked list NULL'ing out a pointer, they must walk the list of
      objects, fput()'ing them, and removing the async sleep structures
      (unless that's done via linking all the async sleep objects together.:)

   b) each file may require an async sleep event to be queued, this can
      pound on schedlock (that is unless they have some magical way of
      avoiding that).

> 
> > +     TAILQ_REMOVE(&td->td_selinfo, sip, si_thrlist);
> > +     mtx_lock_spin(&sched_lock);
> > +     if (td->td_wchan == (caddr_t)&selwait) {
> > +             if (td->td_proc->p_stat == SSLEEP)
> > +                     setrunnable(td);
> > +             else
> > +                     cv_waitq_remove(td);
> > +     } else
> > +             td->td_flags &= ~TDF_SELECT;
> > +     mtx_unlock_spin(&sched_lock);
> 
> This can probably be simplified now to something like this:
> 
>         mtx_lock_spin(&sched_lock);
>         if (td->td_proc->p_stat == SSLEEP)
>                 setrunnable(td);
>         mtx_unlock_spin(&sched_lock);

Huh, why?

> 
> I would rather you commit the select stuff by itself and do the Giant
> pushdown's afterwards though I might like to review just that part of the diff
> once the select changes are in.

Ok.  I'll post two seperate diffs tonight.

> 
> Also, the XXXKSE comment in sys/select.h about si_thread can go away now.

Ok.

-- 
-Alfred Perlstein [alfred@freebsd.org]
'Instead of asking why a piece of software is using "1970s technology,"
 start asking why software is ignoring 30 years of accumulated wisdom.'
Tax deductible donations for FreeBSD: http://www.freebsdfoundation.org/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020311222102.GA92565>