Date:      Sat, 21 Sep 2002 11:46:21 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        beemern <beemern@ksu.edu>
Cc:        smp@freebsd.org, jhb@freebsd.org
Subject:   Re: For those with P4 SMP problems..
Message-ID:  <3D8CBE7D.877EA3A4@mindspring.com>
References:  <Pine.GSO.4.33L.0209211040140.6780-100000@unix2.cc.ksu.edu>

beemern wrote:
> i'm preparing to start in on Terry Lambert's suggestion, however,
> perhaps you (anyone) could clear up a few minor questions..
> 
> -he says other systems are "matching CPUs started at the time of the
> check"
> ..matching them with what?

The theory is that the BIOS has the correct information, but in the
wrong order.  FreeBSD cares about the order; Linux and Windows do not,
because they've performed an additional optimization that lets them
start the APs simultaneously.  A side effect of this is that they
don't care about the order of start, they just care *that* they start.


	/*
	 * start each AP in our list
	 */
	static int
	start_all_aps(u_int boot_addr)
	{
...
		/* start each AP */
		for (x = 1; x <= mp_naps; ++x) {
...
			bootSTK = &SMP_prvspace[x].idlestack[UPAGES*PAGE_SIZE];
			bootAP = x;

			/* attempt to start the Application Processor */
...
			if (!start_ap(x, boot_addr)) {
...
			}
...
			/* record its version info */
			cpu_apic_versions[x] = cpu_apic_versions[0];

			all_cpus |= (1 << x);	/* record AP in CPU map */
	}
...
	/*
	 * this function starts the AP (application processor) identified
	 * by the APIC ID 'physicalCpu'.  It does quite a "song and dance"
	 * to accomplish this.  This is necessary because of the nuances
	 * of the different hardware we might encounter.  It ain't pretty,
	 * but it seems to work.
	 */
	static int
	start_ap(int logical_cpu, u_int boot_addr)
	{
...
		/* get the PHYSICAL APIC ID# */
		physical_cpu = CPU_TO_ID(logical_cpu);
...
	}


...basically, what is happening here is that the code iterates over
the logical CPU numbers, and start_ap() translates each one, via
CPU_TO_ID(), into the physical APIC ID it then tries to start.

If they are all started at the same time, and the results are
collected before they are compared, as in Linux or Windows NT,
then they all start.

If they are started serially, and the results are also collected
serially, then they do not start.

The implication here is clear: serial start fails because the
APIC ID the BIOS claims is assigned to each CPU is not the APIC
ID which was actually assigned to the CPU.  But the concurrent
startup works, because the set of IDs known to the BIOS matches
the set of IDs assigned to the CPUs.

So you get the right answer to the "has this started?" question,
but you don't get the answer from the physical CPU you expected.

The serial start depends on getting the correct answer from the
CPU you expected, rather than just getting the correct answer
and not caring about the man behind the curtain.
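
A minimal sketch of that difference, written as a user-space model
rather than kernel code.  The struct and function names are made up
for illustration, and the model simply assumes the theory above: each
AP answers to one ID while the BIOS table claims another.

```c
#include <stddef.h>

/* One AP as seen from two sides (both fields are illustrative). */
struct ap_slot {
	int bios_id;	/* APIC ID the BIOS table claims for this AP */
	int actual_id;	/* APIC ID the hardware really answers to */
};

/* Return the index of the AP whose hardware ID is 'id', or -1. */
static int
find_by_actual_id(const struct ap_slot *aps, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (aps[i].actual_id == id)
			return (int)i;
	return -1;
}

/*
 * Serial model: signal the ID the BIOS claims for slot x, then expect
 * slot x itself to check in.  Returns how many slots pass.
 */
static size_t
serial_start(const struct ap_slot *aps, size_t n)
{
	size_t started = 0;

	for (size_t x = 0; x < n; x++)
		if (find_by_actual_id(aps, n, aps[x].bios_id) == (int)x)
			started++;
	return started;
}

/*
 * Concurrent model: signal every claimed ID, then count how many were
 * answered by some AP, whichever one it turned out to be.
 */
static size_t
concurrent_start(const struct ap_slot *aps, size_t n)
{
	size_t started = 0;

	for (size_t x = 0; x < n; x++)
		if (find_by_actual_id(aps, n, aps[x].bios_id) >= 0)
			started++;
	return started;
}
```

With the slots { {1, 2}, {2, 1}, {3, 3} }, where the first two IDs are
swapped, serial_start() counts only one AP while concurrent_start()
counts all three.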

Probably, the canonically correct thing to do would be to start
each CPU with code that reassigns its real APIC ID into the
logical APIC ID, so that there is no longer a mismatch.
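
One way to picture that reassignment, as a sketch only: each CPU
reports the ID it read from its own local APIC as it checks in, and
the table is built from those reports instead of from the BIOS.  The
cpu_map structure and its functions are hypothetical, the 32-entry
size assumes a 5-bit ID space, and real kernel code would need
locking or atomics around the check-in.

```c
#include <stddef.h>

#define NAPICID 32	/* assumption: 5-bit APIC ID space */

struct cpu_map {
	int cpu_to_id[NAPICID];	/* logical CPU -> physical APIC ID */
	int id_to_cpu[NAPICID];	/* physical APIC ID -> logical CPU */
	int ncpus;		/* logical numbers handed out so far */
};

/* Mark every entry unassigned. */
static void
cpu_map_init(struct cpu_map *m)
{
	for (int i = 0; i < NAPICID; i++) {
		m->cpu_to_id[i] = -1;
		m->id_to_cpu[i] = -1;
	}
	m->ncpus = 0;
}

/*
 * Called once per CPU as it checks in, with the ID that CPU read from
 * its own local APIC.  Hands out the next logical number and records
 * both directions of the mapping.  Returns the logical number, or -1
 * if the ID is out of range or was already claimed.
 */
static int
cpu_map_checkin(struct cpu_map *m, int real_apic_id)
{
	int cpu;

	if (real_apic_id < 0 || real_apic_id >= NAPICID)
		return -1;
	if (m->id_to_cpu[real_apic_id] != -1)
		return -1;
	cpu = m->ncpus++;
	m->cpu_to_id[cpu] = real_apic_id;
	m->id_to_cpu[real_apic_id] = cpu;
	return cpu;
}
```

Because the logical number is assigned at check-in, the table cannot
disagree with the hardware about which CPU holds which ID.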


> -also, shouldn't our whole exercise of exhaustively hardcoding the apic
> for cpu1 from 1 to 11 have found out which one was the REAL one?

Not really.  If you look at the code, there's a bunch of coupled
information.  By serially attempting to start it, you assume not
only that the APIC ID that the BIOS erroneously believes to be
correct is used, but that the associated stack and other information
is also known to the processor.

Basically, you aren't going to be able to safely do about 4 things
in the same order, and expect them to work.  The start_all_aps()
code needs to be refactored, and the start_ap() code needs to be
broken into between 3 and 5 parts (depending on how you handle
making the APs correspond to the logical APs), and unrolled
so that it can be done concurrently, instead of depending on serial
success.
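
As a rough illustration of decoupling that information, the sketch
below publishes the per-AP boot state for every AP up front, indexed
by APIC ID, so that whichever CPU wakes up can find its own stack
without relying on a single pair of globals set just before its turn.
This is not the FreeBSD code; boot_record, boot_table_prepare() and
boot_table_lookup() are invented names, and the table size assumes a
5-bit ID space.

```c
#include <stddef.h>

#define BOOT_SLOTS 32	/* assumption: 5-bit APIC ID space */

/* Everything one AP needs at wake-up, kept together (hypothetical). */
struct boot_record {
	char *stack_top;	/* top of this AP's idle stack */
	int logical_cpu;	/* logical number this AP will run as */
	int valid;		/* nonzero once the record is published */
};

/*
 * First part, run by the boot processor before any AP is signalled:
 * publish one record per AP, indexed by APIC ID.  APs are numbered
 * from 1 because logical CPU 0 is the boot processor.  Returns 0, or
 * -1 on an out-of-range or duplicate ID.
 */
static int
boot_table_prepare(struct boot_record *tbl, const int *apic_ids,
    char *const *stacks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int id = apic_ids[i];

		if (id < 0 || id >= BOOT_SLOTS || tbl[id].valid)
			return -1;
		tbl[id].stack_top = stacks[i];
		tbl[id].logical_cpu = (int)i + 1;
		tbl[id].valid = 1;
	}
	return 0;
}

/*
 * Later part, run by each AP as it wakes: find its own record using
 * the ID it read from its own local APIC, not the ID anyone expected.
 */
static const struct boot_record *
boot_table_lookup(const struct boot_record *tbl, int my_apic_id)
{
	if (my_apic_id < 0 || my_apic_id >= BOOT_SLOTS ||
	    !tbl[my_apic_id].valid)
		return NULL;
	return &tbl[my_apic_id];
}
```

The signalling and result-collection parts are left out here; the
point is only that the stack and logical number travel with the ID
the CPU actually has.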


> -finally, it appears Mr. Lambert is suggesting 2 mutually exclusive
> solutions (correct?) ..where the second one ("For extra points...") looks
> like the more complete and "right" solution, however, as noted in the
> previous question, shouldn't we have hit upon the correct id already by
> playing with the physical_cpu and CPU_TO_ID() as i and Mr. Feldkamp have
> been?
> 
> thanks for any further input/direction you can give.. i'm gonna poke
> around in the src and find where the cpu->apic assignments are made
> originally and just see what i can see

You should be able to start everything up, not caring about the
logical vs. physical APIC ID mapping, as long as you start all the
CPUs.  What will break, however, is the case where the BIOS doesn't
simply contain the right physical APIC IDs out of order, or where
you need to send a targeted IPI instead of a broadcast IPI.
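
A small model of why the targeted case is the fragile one.  The
function names are hypothetical and this is a simulation, not APIC
programming: array index i stands for the CPU the kernel calls
logical CPU i, and actual_id[i] is the ID that CPU really holds.

```c
#include <stddef.h>

/*
 * Model of a targeted IPI: the interrupt lands on whichever CPU really
 * holds 'dest_id'.  Returns that CPU's index, or -1 if no CPU holds it.
 */
static int
ipi_targeted(const int *actual_id, size_t n, int dest_id)
{
	for (size_t i = 0; i < n; i++)
		if (actual_id[i] == dest_id)
			return (int)i;
	return -1;
}

/*
 * Signal logical CPU 'cpu' through a logical-to-physical table, the
 * way a CPU_TO_ID()-style lookup would.  Returns the index of the CPU
 * that actually received the interrupt.
 */
static int
signal_logical(const int *cpu_to_id, const int *actual_id, size_t n,
    size_t cpu)
{
	return ipi_targeted(actual_id, n, cpu_to_id[cpu]);
}

/* Model of a broadcast IPI: every CPU is marked; returns the count. */
static size_t
ipi_broadcast(unsigned char *pending, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pending[i] = 1;
	return n;
}
```

With a table of {0, 1, 2} and actual IDs of {0, 2, 1}, signalling
logical CPU 1 reaches CPU 2 instead.  If the table names an ID that no
CPU holds, the interrupt reaches nobody, while the broadcast reaches
everyone in both cases.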

So the two "solutions" boil down to correcting the physical/logical
mapping, or reloading the physical APIC ID register.  Either one
works, but reloading the register lets you get rid of the logical
and physical indirection (assuming you shove the I/O APIC off to
ID 31, the last ID).  How correct, and when correct, these have to
be really depends on how often the logical to physical translation
happens, in order to explicitly signal a CPU.  I'd have to read all
the -current code in considerable detail to answer that question, or
just punt, and come up with a fix where the answer to the question
ends up not mattering.  That's the way I prefer... 8-).

Rewriting the APIC ID in each auxiliary CPU is a pain in the neck;
the BIOS does it by holding 5 bits worth of pins on each CPU to a
specific value. You can do it, in theory: the code should not need
the BIOS to do the assignment to function, if you don't care about
not starting some CPUs, or starting particular ones... that gets
around all the normal BIOS bugs related to CPU detection, but it's
a much harder problem to solve, since you have to have a free APIC
ID to let you shuffle things around (hence the extra points ;^)).
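
To show why the free ID matters, here is a sketch of the shuffle as a
plain array permutation, assuming that no two CPUs may hold the same
ID at any moment.  The function names are invented, each assignment
to cur[] stands in for one ID register rewrite, and 'spare' is
assumed to be an ID that no CPU holds or wants.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if some CPU currently holds 'id'. */
static bool
id_in_use(const int *cur, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (cur[i] == id)
			return true;
	return false;
}

/*
 * Turn cur[] into want[] one write at a time without ever letting two
 * CPUs share an ID.  When every remaining CPU is waiting on an ID that
 * another one still holds (a cycle), one CPU is parked on 'spare' to
 * break the cycle.  Returns the number of writes, or -1 if stuck,
 * which happens when want[] is not a valid assignment.
 */
static int
shuffle_ids(int *cur, const int *want, size_t n, int spare)
{
	int writes = 0;

	for (;;) {
		bool progressed = false;
		size_t blocked = n;	/* some blocked CPU, n if none */

		for (size_t i = 0; i < n; i++) {
			if (cur[i] == want[i])
				continue;
			if (!id_in_use(cur, n, want[i])) {
				cur[i] = want[i];
				writes++;
				progressed = true;
			} else {
				blocked = i;
			}
		}
		if (blocked == n)
			return writes;	/* everyone matches */
		if (progressed)
			continue;	/* a later pass may unblock more */
		if (id_in_use(cur, n, spare))
			return -1;	/* spare already taken: stuck */
		cur[blocked] = spare;	/* break the cycle */
		writes++;
	}
}
```

Swapping two IDs costs three writes in this model, and a three-way
rotation costs four; without the spare ID the first write of either
would create a duplicate.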

-- Terry




