Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 13 Mar 2008 20:14:27 -1000 (HST)
From:      Jeff Roberson <jroberson@chesapeake.net>
To:        Joseph Koshy <joseph.koshy@gmail.com>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: [PATCH] hwpmc(4) changes to use 'mp_maxid' instead of 'mp_ncpus'.
Message-ID:  <20080313200839.S1091@desktop>
In-Reply-To: <84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>
References:  <20080313180805.GA83406@dragon.NUXI.org> <200803131516.12284.jhb@freebsd.org> <84dead720803132232k15c3aad7pe59875f0c84e0c27@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 14 Mar 2008, Joseph Koshy wrote:

> On Fri, Mar 14, 2008 at 12:46 AM, John Baldwin <jhb@freebsd.org> wrote:
>> On Thursday 13 March 2008 02:08:05 pm David O'Brien wrote:
>> > Hi folks,
>> > Some folks at Juniper have submitted these changes to hwpmc(4).
>> > I am sending them here for public review.
>> >
>> > Their thoughts are:
>> >     The mp_ncpus refers to the count of the active CPU's.  Where as
>> >     mp_maxid refers to the count of all the cpus on the SMP.  Using
>> >     mp_ncpus in the cpu_id range-check of hwpmc module would lead to the
>> >     assumption that all the active CPU's in the SMP are not interleaved.
>> >     But for running on some platforms, the active and inactive cpus could
>> >     be interleaved making hwpmc not work for the cpus whose cpu_id is
>> >     greater than the active-cpu count.
>
> jhb>  This is correct, but you need to handle CPUs that are absent.  It might be
> jhb>  sufficient to update pmc_cpu_is_disabled() in kern_pmc.c to check
> jhb>  CPU_ABSENT(cpu) and claim the CPU is disabled if it is absent, but I'm not
> jhb>  sure that will catch everything as that seems aimed at handling having a
> jhb>  non-absent CPU halted (such as disabling HTT on i386).
>
> That is inline with the feedback (and sample patch to kern_pmc.c) that I
> had sent in to O'Brien.
>
> But there are other problems with the patch at various levels,
> probably not obvious to someone who is just looking at the kernel
> code.
>
> First, the relevance.  My understanding is that these changes are for
> a proprietary SMP platform that uses non-mainstream (Tier3 or
> Tier4) CPUs.  It so happens that Juniper decided to numbers CPUs
> 'sparsely' in their kernel variant and that is the motivation for this
> patch.
>
> IMO, as a policy, code changes for exotic hardware need to be
> maintained by vendors of said exotic hardware and not dumped on
> volunteers.

In general we accept vendor patches that are not disruptive even in the 
case that the general communit doesn't perceive the real value.  It is 
important for us to work with and encourage vendors.

>
> Second, when I designed the PMCTools API I didn't consider that CPU
> numbers could be 'sparse'.  [They aren't sparsely allocated
> on the i386/amd64---the code I looked at when I was designing
> PmcTools.]  So there are assumptions sprinkled throughout userland
> that that the integers 0..hw.ncpus  can select a valid CPU.  While
> all that can be tracked down and changed, and documentation updated,
> it is still work that I would prefer to defer until there is a chance
> that someone
> in the general public can use it.  I do need to prioritize how I spend my
> volunteer hours.

We're not asking you to support the feature.  It looks like juniper 
already has it tested and working.  We just need someone to review the 
patches and commit them.

>
> Third, IFF we as a project are going to support 'sparse CPU numbering,
> I would like to see the form that takes before making changes to
> HWPMC and tools.  For example:

The majority of the kernel already deals with sparse cpu mappings.  That's 
why we have CPU_ABSENT().  Please look at UMA and ULE for examples of code 
that I have written which use this macro correctly.  I'm sure there are 
other places that do as well that I'm not familiar with.

> - How will userland and in-kernel modules find out which CPUs are
>  physically present?   Would there be a bitmask on the lines of today's
>  machdep.hlt_cpus that we could query?  Could we make the
>  'all_cpus' bitmask visible to userland?  What happens when we
>  start supporting systems with more than 32 processors?

The kernel has the various cpumasks available in sys/smp.h.  Userland can 
now use cpusets to find out what processors are available to it.  In the 
future we are going to replace simple cpumasks with the cpuset_t structure 
from cpusets so on machines that support more than sizeof(register) * 8 
processors we will use arrays.

> - Will sysctl hw.ncpus represent the count of present CPUs or will it
>  represent the maximum CPU id?

That is the number of cpus not the maximum id.

> - How will userland  distinguish between absent CPUs those that
>  could be temporarily administratively disabled?

We don't presently make the distinction to the user.

> - Are we going to support 'transient' CPUs [that come and go]?  Why
>  would we want sparse CPU numbering otherwise?

That is a much more difficult problem and one which we have discussed for 
virtualization purposes.   This patch would further that eventual goal 
although obviously there is more work to do to get there.

>
> Nit: 'mp_maxid' appears to be an index, not a count as claimed above.

Yes, that is unfortunate for these purposes.

>
> If support for sparse CPU numbering is something useful, I feel the
> correct sequence should be to discuss it here, add sparse CPU
> numbering to the base i386/amd64 kernels (say) first and then
> propagate the feature to auxiliary code like HWPMC and userland.
>
> Changing HWPMC and its userland before the base kernel itself
> changes does not seem to be the right thing to do.

The rest of the generic code in the kernel already supports this.  Juniper 
claims to have tested and is using this feature.  Furthermore, it will get 
us a tiny step closer to being able to support pluggable cpus in a 
virtualized environment.

Thanks,
Jeff

>
> Regards,
> Koshy
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20080313200839.S1091>