Date:      Sat, 30 Jan 2010 18:09:48 +0530
From:      "C. Jayachandran" <c.jayachandran@gmail.com>
To:        "M. Warner Losh" <imp@bsdimp.com>
Cc:        freebsd-mips@freebsd.org, neelnatu@gmail.com
Subject:   Re: Code review: groundwork for SMP
Message-ID:  <98a59be81001300439o12ec3bf4pc5c03d4f1511297b@mail.gmail.com>
In-Reply-To: <20100129.100052.1013538172663276257.imp@bsdimp.com>
References:  <85D9D383-29A3-4F09-A2FE-61E4EA85CE9B@lakerest.net> <eaa228be1001282242q1f78fff2w9804da6cdadb3d1f@mail.gmail.com> <dffe84831001290725g2ca2574ap22b82f2ad38af2d6@mail.gmail.com> <20100129.100052.1013538172663276257.imp@bsdimp.com>

On Fri, Jan 29, 2010 at 10:30 PM, M. Warner Losh <imp@bsdimp.com> wrote:
> Greetings one and all. Thanks for weighing in on this issue.
>
> In general, I agree with Neel here. But I also think we need to see
> if we can be flexible and push this down into a per-cpu-type
> decision (which differs slightly from a per-platform type because we
> can have a CPU appearing in multiple platforms, or multiple CPUs
> appearing within one platform). If we make it a per-cpu-type
> solution, we could have a sys/mips/mips/pcpu_machdep.c which does the
> normal SMP stuff, as well as having sys/mips/xlr/pcpu_machdep.c which
> does something optimized for the XLR. Chances are good that different
> CPUs will want to have different trade-offs here. We'd also need some
> way to encode this in an include file, so there's some work to make
> PCPU macro different for different CPUs...

Yes, I think if the PCPU macros can be left to the specific
implementation (even if that involves just an ifdef in pcpu.h), we are
fine. As I wrote earlier, the preferred XLR implementation would be to
keep this data in direct-mapped memory, with pointers to it in per-CPU
scratch registers, and so avoid TLB overhead.
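
To make the idea concrete, here is a rough user-space sketch of an
ifdef-selected PCPU macro. It is only an illustration: CPU_XLR_SCRATCH,
scratch_pcpu, sim_cpuid, pcpu_init() and the two-field struct pcpu are
hypothetical names made up for this example, and the arrays merely stand
in for direct-mapped memory and the per-CPU scratch registers (a real
kernel would read the register with a coprocessor instruction).

```c
#include <stddef.h>

#define MAXCPU 32

/* Hypothetical, trimmed-down per-CPU structure (not the real layout). */
struct pcpu {
	int   pc_cpuid;
	void *pc_curthread;
};

/* Stand-in for per-CPU data kept in direct-mapped memory. */
static struct pcpu pcpu_space[MAXCPU];

/*
 * Stand-in for the per-CPU scratch register: one slot per CPU.
 * On real hardware each CPU would simply read its own register.
 */
static struct pcpu *scratch_pcpu[MAXCPU];

/* Simulated "which CPU is executing" state, for this sketch only. */
static int sim_cpuid;

/* Set up one CPU's data and point its scratch slot at it. */
static void
pcpu_init(int cpuid)
{
	pcpu_space[cpuid].pc_cpuid = cpuid;
	pcpu_space[cpuid].pc_curthread = NULL;
	scratch_pcpu[cpuid] = &pcpu_space[cpuid];
}

#ifdef CPU_XLR_SCRATCH
/* CPU-specific variant: fetch the pointer from the scratch register. */
#define PCPU_PTR()	(scratch_pcpu[sim_cpuid])
#else
/* Generic variant: stand-in for a lookup at a fixed mapped address. */
#define PCPU_PTR()	(&pcpu_space[sim_cpuid])
#endif

/* Callers use the same macros whichever variant is selected. */
#define PCPU_GET(member)	(PCPU_PTR()->pc_##member)
#define PCPU_SET(member, val)	(PCPU_PTR()->pc_##member = (val))
```

Code that uses PCPU_GET(curthread) or PCPU_SET(curthread, td) does not
change when CPU_XLR_SCRATCH is defined; only the way the pointer is
found differs.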

[...]

> The XLR will have scheduler challenges as well. It will push the
> design assumptions of ULE beyond the breaking point, I fear.
> Hyperthreading already exists on intel, and ULE copes, a bit, with
> it. But with the high number of threads each CPU can have, we may
> need something with a little more smarts. Something that knows it
> might be better to schedule two different processes on two different
> cores, and leave some of the threads idle to reduce TLB pressure, for
> example.

Yes, this is one area we have not looked at much.

> Per CPU scratch registers do not exist on MIPS, in general. Some CPUs
> have them, and many do not. CP0 registers are plentiful in more
> modern designs, and some of them may even be useful for our needs.
> However, mfc0 and mtc0 often have pipeline hazards associated with
> them which will trip up the unwary. When reading the historical
> errata for MIPS CPUs, we often find that this is where we need to do
> the most workarounds.

On XLR, there are 8 scratch registers, which can be accessed in a few
cycles without any software-visible hazards. So in our internal port
we keep things like the PCPU pointer and 'curthread' in scratch registers.

> I guess this is a long way to say "I think we should commit Neel's
> patches. We should work along two fronts: (1) implementing Juli's
> idea of sharing kstack and pcpu data in one TLB and (2) making it so
> that CPUs where this is sub-optimal can swap in their own
> implementation."

Another suggestion I have on Neel's implementation would be to
allocate the pages (direct-mapped) at bootup, so that memory is not
taken for CPUs which are not available (XLR can have 4-32 CPUs
depending on the chip).
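
As a rough sketch of what I mean (user-space C, not Neel's code): walk
a mask of the CPUs found at boot and hand out one page per present CPU,
leaving the slots for absent CPUs empty. The names boot_arena,
boot_alloc_page(), pcpu_pages, pcpu_alloc_present() and the 4096-byte
PCPU_PAGE size are all assumptions for illustration; the static arena
only stands in for direct-mapped memory handed out by a boot-time
allocator.

```c
#include <stddef.h>
#include <stdint.h>

#define MAXCPU		32
#define PCPU_PAGE	4096	/* assumed page size for this sketch */

/* Stand-in for direct-mapped memory available at boot. */
static unsigned char boot_arena[MAXCPU * PCPU_PAGE];
static size_t boot_used;

/* One slot per possible CPU; stays NULL for CPUs that are absent. */
static void *pcpu_pages[MAXCPU];

/* Simple bump allocator, returns NULL when the arena is exhausted. */
static void *
boot_alloc_page(void)
{
	void *p;

	if (boot_used + PCPU_PAGE > sizeof(boot_arena))
		return (NULL);
	p = &boot_arena[boot_used];
	boot_used += PCPU_PAGE;
	return (p);
}

/*
 * Allocate a per-CPU page for each CPU whose bit is set in
 * present_mask. Returns the number of pages allocated.
 */
static int
pcpu_alloc_present(uint32_t present_mask)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < MAXCPU; cpu++) {
		if ((present_mask & (UINT32_C(1) << cpu)) == 0)
			continue;	/* CPU not on this chip: no memory used */
		pcpu_pages[cpu] = boot_alloc_page();
		if (pcpu_pages[cpu] == NULL)
			break;
		n++;
	}
	return (n);
}
```

With a mask of 0xF (a 4-CPU part), only four pages are consumed and
pcpu_pages[4] through pcpu_pages[31] remain NULL.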

Regards,
JC.


