FreeBSD Mail Archives

Date:      Fri, 29 Mar 2002 15:56:22 -0500 (EST)
From:      John Baldwin <jhb@FreeBSD.org>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        freebsd-smp@FreeBSD.org
Subject:   Re: RE: Syscall contention tests return, userret() bugs/issues.
Message-ID:  <XFMail.20020329155622.jhb@FreeBSD.org>
In-Reply-To: <200203292017.g2TKHcD59955@apollo.backplane.com>


On 29-Mar-2002 Matthew Dillon wrote:
>:PCPU_SET(foo, PCPU_GET(foo) + 1) would work w/o a critical section actually.
>:To be honest, stats such as these don't have to be perfect, so using a simple
>:non-atomic increment is probably ok. :)
> 
>     I'm fairly sure that wouldn't work.  The counter on one cpu will be 
>     different from another, so if you read one cpu's counter and write it
>     to the other cpu's space the stats will not just be wrong, they will
>     be blown to bits.

Ah, doh, nm.

>     I agree in regards to simply using a non-atomic increment for a stats
>     counter.  We would still have to use PCPU_PTR to keep it within a single
>     cpu's per-cpu structure (even if we can't guarentee that it's OUR cpu's
>     per-cpu structure), but C does not guarentee a single instruction for
>     ++*ptr even though it generates it most of the time.  Hence, a
>     PCPU_ADD_INT() macro would be useful (non-locking, non-atomic,
>     single-instruction add to the current cpu's BLAH counter).  I could
>     whip this up in about 10 seconds for i386 but would need help on the
>     other architectures.

Actually, I was thinking of just using a single counter but only doing unlocked
increments on it.  It's just stats so it's not that important.  If people
really want to make it important then we should worry about getting it perfect,
but if it doesn't need to be perfect I wouldn't worry about it.

>:>       My current hack is somewhat messy.  Presumably we will eventually be
>:>       able to remove Giant from this path.  The question is:  When?  And
>:>       what is the current state of work in that area?  Also, it seems to
>:>       me that a signal can come in after the check (with or without my
>:>       hack) and not get recognized until the next system call or interrupt.
>:>       This is bad.
>:> 
>:>       Comments? John?
>:
>:The only reason Giant is in that path right now is for ktrace. :)  When
>:ktrace
>:is fixed (coming as soon as I get all of td_ucred cleared and out of the way)
>:then this won't need Giant anymore.  Alternatively, could tweak it right now
>:to
>:make the Giant lock/unlock #ifdef KTRACE.
> 
>     Most people like to compile KTRACE into the kernel so I do not think
>     we would gain much by #ifdef'ing around KTRACE just to optimize the path.

Yeah, tha thad occured to me.

>     I haven't looked into how much work is required re: ktrace but nothing
>     else here would take me more then a day.  I'm not in a huge rush so if
> you
>     get bogged down on the ktrace stuff (e.g. 3-4 week timeframe) and need
>     help just ring me up.  I'll remind you then.  In the mean time I would
>     like to work on the stats counter and NEEDRESCHED userret issue.

The ktrace stuff is done, I just want to get the td_ucred stuff done first
since it will include some slight changes to the ktrace stuff as far as
arguments passed around to make suser and friends happier.  Once that is done
I'll update the ktrace stuff polish it up (the ktrgenio case still needs some
work to make sure records still stay in order) and then it can be committed.

>:>       This case seems rather odd to me.  I understand why we might need to
>:>       obtain the sched_lock as a memory barrier but what happens if 
>:>       NEEDRESCHED is set after we release the sched_lock?  Interrupts
>:>       are enabled at the point where userret() is called, there is
>:>       nothing preventing either a signal or a NEEDRESCHED from occuring
>:>       after we've tested it but before we've returned to usermode,
>:>       causing us to not honor either until the next interrupt or syscall.
>:> 
>:>       Has anyone grappled with this issue yet?
>:
>:Yes, that is what the interrupt disable stuff in the old version of ast()
>:(that
>:is now MD) is about.  If another process or thread on another CPU triggers an
>:AST on this process, it will send an IPI when it does so.  Thus, if we don't
>:see the flag being set here, we will get the IPI via an interrupt after
>:resuming in user mode.  If you don't lock the access to ASTPENDING and
>:NEEDRESCHED there is a small race on archs w/ weaker memory models (i.e.
>:!i386)
>:and SMP where you could miss the flag, however, since we always send the IPI
>:in
>:that case and interrupts usually implicitly enforce the equivalence of
>:membars
>:it will probably never occur.  Also, if it did, all it would mean is possibly
>:missing a reschedule or signal (the only AST's we get async) until the next
>:kernel entry which really isn't that big of a deal.
>:
>:All of the MD code currently checks NEEDRESCHED and ASTPENDING w/o a lock
>:right
>:now anyways thanks to Jake's changes.
>:
>:-- 
>:
>:John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
> 
>     Hmm.  Ok, I see how it works.  int0x80_syscall calls syscall() and
>     then jumps to doreti, which checks for KEF_ASTPENDING or KEF_NEEDRESCHED
>     with interrupts disabled prior to the iret.

Right.

>     In that case, why is userret() checking KEF_NEEDRESCHED at all?  Is it
>     needed for non-i386 architectures or is it simply a lazy optimization?
>     (lazy == technical term).  It looks like the rescheduling code could be
>     moved out of userret() and into ast() directly.

Yes, it could.  In fact, most of userret() probably could just move into ast()
now.  Part of this is due to the fact that ast() used to be an actual trap of
type T_AST and relied on the fact that the trap return called userret().  The
new ast() should only be checking NEEDRESCHED now inside of userret().  Either
that or we merge ast() and userret() and call them userret() (I think userret()
is the more sensible name personally.)

>     Are signals dealt with the same way?  If not we have a race in the signal
>     handling code.  If so we can probably move that to a KEF flag / ast()
>     as well (which would either move the CURSIG/postsig loop from userret()
>     or ast(), or move it and retain it in userret() as a lazy optimization
>     whos initial CURSIG() check can be done without a lock.

Signals set ASTPENDING.  Bruce does want to use a separate flag for signals
so that we don't have to call CURSIG() except when a signal is actually pending.

-- 

John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?XFMail.20020329155622.jhb>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation