Date:      Tue, 9 Mar 2010 18:42:02 -0600
From:      Kevin Day <toasty@dragondata.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Extremely slow boot on VMWare with Opteron 2352 (acpi?)
Message-ID:  <207B4180-B8AF-4C93-8BC7-7F1FFEEBB713@dragondata.com>
In-Reply-To: <201003091727.09188.jhb@freebsd.org>
References:  <2C7A849F-2571-48E7-AA75-B6F87C2352C1@dragondata.com> <201003091727.09188.jhb@freebsd.org>


On Mar 9, 2010, at 4:27 PM, John Baldwin wrote:

> On Tuesday 09 March 2010 3:40:26 pm Kevin Day wrote:
>>
>> If I boot up on an Opteron 2218 system, it boots normally. If I boot the
>> exact same VM moved to a 2352, I get:
>>
>> acpi0: <INTEL 440BX> on motherboard
>> PCIe: Memory Mapped configuration base @ 0xe0000000
>>   (very long pause)
>> ioapic0: routing intpin 9 (ISA IRQ 9) to lapic 0 vector 48
>> acpi0: [MPSAFE]
>> acpi0: [ITHREAD]
>>
>> then booting normally.
>
> It's probably worth adding some printfs to narrow down where the pause is
> happening.  This looks to be all during the acpi_attach() routine, so maybe
> you can start there.

Okay, good pointer. This is what I've narrowed down:

acpi_enable_pcie() calls pcie_cfgregopen(). It's called here as pcie_cfgregopen(0xe0000000, 0, 255). Inside pcie_cfgregopen(), the pause starts here:

        /* XXX: We should make sure this really fits into the direct map. */
        pcie_base = (vm_offset_t)pmap_mapdev(base, (maxbus + 1) << 20);

pmap_mapdev() calls pmap_mapdev_attr(), and in there this condition evaluates to true:

        /*
         * If the specified range of physical addresses fits within the direct
         * map window, use the direct map.
         */
        if (pa < dmaplimit && pa + size < dmaplimit) {

so we call pmap_change_attr(), which calls pmap_change_attr_locked(). It's changing 0x10000000 bytes starting at 0xffffff00e0000000.  The very last line before returning from pmap_change_attr_locked() is:

                pmap_invalidate_cache_range(base, tmpva);

And this is where the delay is. This is calling MFENCE/CLFLUSH in a loop 8 million times. We actually had a problem with CLFLUSH causing panics on these same CPUs under Xen, which is partially why we're looking at VMware now (see kern/138863). I'm wondering if VMware encountered the same problem and replaced CLFLUSH with a software-emulated version that is far slower... based on the speed, it's probably invalidating the entire cache. A quick change to pmap_invalidate_cache_range() to just flush the entire cache if the area being invalidated is over 8MB seems to have fixed it, i.e. changing:

        else if (cpu_feature & CPUID_CLFSH) {

to

        else if ((cpu_feature & CPUID_CLFSH) && ((eva - sva) < (2 << 22))) {


However, I'm a little blurry on whether everything leading up to this point is correct. It's ending up with 256MB of memory for the PCIe config area, which seems really excessive. Is the problem just that it wants room for 256 busses, or...? Does anyone know this code path well enough to say whether this is deviating from the norm?

-- Kevin



