Date:      Sun, 11 Jul 2021 18:29:51 -0700
From:      Mark Millard via freebsd-arm <freebsd-arm@freebsd.org>
To:        freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: HoneyComb first-boot notes [a L3/L2/L1/RAM performance oddity]
Message-ID:  <C21C4DE5-5B1E-41FD-9268-F2E818CBCA11@yahoo.com>
In-Reply-To: <C0426887-59D9-4524-8542-8DA6DBAFF744@yahoo.com>
References:  <8A6C415F-A57B-4F2F-861F-052B487166D6.ref@yahoo.com> <8A6C415F-A57B-4F2F-861F-052B487166D6@yahoo.com> <YNGT5hcHOBd6cU4T@x230.ds> <40AE6447-77AF-4D0E-864F-AD52D9F3346F@yahoo.com> <YNGf999RsaTfNhcp@x230.ds> <Rv9QGaKflpIjLPsxUFG3ht12loej__FxMBy7SQ1QzDTk1NLcFjGb4ScQuF32SakZi68wjgPQpIVp2dipMoYteJIAMhSrXbPM6-mRSeL_744=@a9development.com> <C4D3B585-63B6-4C2A-B8DA-264073C6E2C2@yahoo.com> <12A4EDD1-A2AB-4CE3-AB0E-A4B5D6FB4674@yahoo.com> <5B1B5E1A-8AE4-4889-ABE6-50C206F896FB@yahoo.com> <7DBDC8AB-C80B-4E26-B58F-251A3D29CE41@yahoo.com> <5BBF1B55-F02C-4817-B805-677EDDC5B809@yahoo.com> <0B577668-97AB-44B6-B1A7-C68F6CC299E5@yahoo.com> <C0426887-59D9-4524-8542-8DA6DBAFF744@yahoo.com>

On 2021-Jul-11, at 04:03, Mark Millard <marklmi at yahoo.com> wrote:

> On 2021-Jul-10, at 22:09, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2021-Jun-24, at 16:25, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> On 2021-Jun-24, at 16:00, Mark Millard <marklmi at yahoo.com> wrote:
>>>
>>>> On 2021-Jun-24, at 13:39, Mark Millard <marklmi at yahoo.com> wrote:
>>>>
>>>>> Repeating here what I've reported on the SolidRun Discord:
>>>>>
>>>>> I decided to experiment with monitoring the temperatures reported
>>>>> as things are. For the default heat-sink/fan and the 2 other fans
>>>>> in the case, buildworld with load average 16.? for some time has
>>>>> stayed with tz0 through tz6 reporting between 61.0degC and 66.0degC,
>>>>> say about 20degC for ambient. (tz7 and tz8 report 0.1C.) During
>>>>> stages with lower load averages, the tz0..tz6 temperatures back off
>>>>> some. So it looks like my default context keeps the system
>>>>> sufficiently cool for such use.
>>>>>
>>>>> I'll note that the default heat-sink's fan is not operating at rates
>>>>> that I hear it upstairs. I've heard the noisy mode from there during
>>>>> early parts of booting Fedora 34 Server, for example.
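[For anyone wanting to watch the same figures: a minimal sketch, assuming the usual FreeBSD ACPI thermal sysctls; the helper names are mine, not commands from the report above.]

```shell
# temp_value strips the trailing "C" from a sysctl thermal reading
# (e.g. "61.0C" -> "61.0") so it can be compared numerically.
temp_value() {
    printf '%s\n' "$1" | sed 's/C$//'
}

# report_zones prints tz0..tz6; tz7/tz8 only report 0.1C on this board.
# (FreeBSD-specific; defined but not run here.)
report_zones() {
    for tz in 0 1 2 3 4 5 6; do
        sysctl -n "hw.acpi.thermal.tz${tz}.temperature"
    done
}
```

Something like `temp_value "$(sysctl -n hw.acpi.thermal.tz0.temperature)"` then gives a bare number suitable for a threshold check in a watch loop.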
>>>>
>>>> So I updated my stable/13 source and built and installed
>>>> the update, then did a rm -fr of the build directory
>>>> tree context and started a from-scratch build. The
>>>> build had:
>>>>
>>>> SYSTEM_COMPILER: Determined that CC=cc matches the source tree.  Not bootstrapping a cross-compiler.
>>>> and:
>>>> SYSTEM_LINKER: Determined that LD=ld matches the source tree.  Not bootstrapping a cross-linker.
>>>>
>>>> as is my standard context for doing such "how long does
>>>> it take" buildworld buildkernel testing.
>>>>=20
>>>> On aarch64 I do not build for targeting non-arm architectures.
>>>> This does save some time on the builds.
>>>
>>> I should have mentioned that my builds are based on tuning
>>> for the cortex-a72 via -mcpu=cortex-a72 being used. This
>>> was also true of the live system that was running, kernel
>>> and world.
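[For reference, the sequence described above amounts to roughly the following. This is a hedged sketch, not the exact commands used: the source/object paths are assumed defaults, and CPUTYPE=cortex-a72 is one common way to get -mcpu=cortex-a72 applied (it is normally set in /etc/make.conf); -j16 and the KERNCONF name are from the report.]

```shell
# Sketch of a from-scratch timed stable/13 build, wrapped in a
# function so nothing runs by accident when this is sourced.
timed_build() {
    cd /usr/src || return 1      # stable/13 checkout (assumed path)
    rm -fr /usr/obj/usr/src      # discard the old build tree
    time make -j16 CPUTYPE=cortex-a72 buildworld buildkernel \
        KERNCONF=GENERIC-NODBG-CA72
}
```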
>>>
>>>> The results for the HoneyComb configuration I'm using:
>>>>
>>>> World build completed on Thu Jun 24 15:30:11 PDT 2021
>>>> World built in 3173 seconds, ncpu: 16, make -j16
>>>> Kernel build for GENERIC-NODBG-CA72 completed on Thu Jun 24 15:34:45 PDT 2021
>>>> Kernel(s)  GENERIC-NODBG-CA72 built in 274 seconds, ncpu: 16, make -j16
>>>>
>>>> So World+Kernel took a little under 1 hr to build (-j16).
>>>>
>>>>
>>>>
>>>> Comparison/contrast to prior aarch64 systems that I've used
>>>> for buildworld buildkernel . . .
>>>>
>>>>
>>>> By contrast, the (now failed) OverDrive 1000's last timing
>>>> was (building releng/13 instead of stable/13):
>>>>
>>>> World build completed on Tue Apr 27 02:50:52 PDT 2021
>>>> World built in 12402 seconds, ncpu: 4, make -j4
>>>> Kernel build for GENERIC-NODBG-CA72 completed on Tue Apr 27 03:08:04 PDT 2021
>>>> Kernel(s)  GENERIC-NODBG-CA72 built in 1033 seconds, ncpu: 4, make -j4
>>>>
>>>> So World+Kernel took a little under 3.75 hrs to build (-j4).
>>>>
>>>>
>>>> The MACCHIATObin Double Shot's last timing was
>>>> (building a 13-CURRENT):
>>>>
>>>> World build completed on Tue Jan 19 03:44:59 PST 2021
>>>> World built in 14902 seconds, ncpu: 4, make -j4
>>>> Kernel build for GENERIC-NODBG completed on Tue Jan 19 04:04:25 PST 2021
>>>> Kernel(s)  GENERIC-NODBG built in 1166 seconds, ncpu: 4, make -j4
>>>>
>>>> So World+Kernel took a little under 4.5 hrs to build (-j4).
>>>>
>>>>
>>>> The RPi4B 8GiByte's last timing was
>>>> ( arm_freq=2000, sdram_freq_min=3200, force_turbo=1, USB3 SSD
>>>> building releng/13 ):
>>>>
>>>> World build completed on Tue Apr 20 14:34:38 PDT 2021
>>>> World built in 22104 seconds, ncpu: 4, make -j4
>>>> Kernel build for GENERIC-NODBG completed on Tue Apr 20 15:03:24 PDT 2021
>>>> Kernel(s)  GENERIC-NODBG built in 1726 seconds, ncpu: 4, make -j4
>>>>
>>>> So World+Kernel took somewhat under 6 hrs 40 min to build.
>>>
>>> The -mcpu=cortex-a72 use note also applies to the OverDrive 1000,
>>> MACCHIATObin Double Shot, and RPi4B 8 GiByte contexts.
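[Summing the world and kernel seconds quoted above gives the per-machine totals behind the rounded hour figures; only the minutes column is derived here, the seconds are from the build logs:]

```shell
# World+kernel totals, seconds taken verbatim from the timings above.
for pair in HoneyComb:3173:274 OverDrive1000:12402:1033 \
            MACCHIATObin:14902:1166 RPi4B:22104:1726; do
    name=${pair%%:*}; rest=${pair#*:}
    world=${rest%%:*}; kernel=${rest#*:}
    total=$((world + kernel))
    printf '%s: %d s (~%d min)\n' "$name" "$total" $((total / 60))
done
```

That works out to roughly 57 min, 3.7 hrs, 4.5 hrs, and 6 hrs 37 min respectively, consistent with the rounded figures quoted.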
>>>
>>
>> I've run into an issue where what FreeBSD calls cpu 0 has
>> significantly different L3/L2/L1/RAM subsystem performance
>> than all the other cores (cpu 0 being worse). Similarly for
>> compared/contrasted to all 4 MACCHIATObin Double Shot cores.
>>
>> A plot with curves showing the issue is at:
>>
>> https://github.com/markmi/acpphint/blob/master/acpphint_example_data/HoneyCombFreeBSDcpu0RAMAccessPerformanceIsOdd.png
>>
>> The dark red curves in the plot show the expected general
>> shape for such and are for cpu 0. The lighter colored
>> curves are the MACCHIATObin curves. The darker ones are
>> the HoneyComb curves, where the L3/L2/L1 is relatively
>> effective (other than cpu 0).
>>
>> My notes on Discord (so far) are . . .
>>
>> The curves are from my C++ variant of the old Hierarchical
>> INTegration benchmark (historically abbreviated HINT). You
>> can read the approximate size of a level of cache from
>> the x-axis for where the curve drops faster. So, right
>> (most obvious) to left (least obvious): L3 8 MiByte, L2 1
>> MiByte (per core pair, as it turns out), L1 32 KiByte.
>>
>> The curves here are for single-thread benchmark
>> configurations with cpuset used to control which CPU is
>> used. I first noticed this via odd performance variations
>> in multithreading with more cores allowed than in use (so
>> migrations to a variety of cpus over time).
>>
>> I explored all the CPUs (cores), not just what I plotted.
>> Only the one gets the odd performing memory access
>> structure in its curve.
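[The per-CPU sweep described above can be sketched as follows; the benchmark name and output paths are my placeholders, not the exact invocation behind the plot:]

```shell
# One single-threaded run per core, pinned with FreeBSD's cpuset(1)
# so the scheduler cannot migrate the benchmark between cores.
# (Defined but not run here; cpuset is FreeBSD-specific.)
run_pinned() {
    bench=$1
    for cpu in $(seq 0 15); do    # the HoneyComb's LX2160A has 16 cores
        cpuset -l "$cpu" "$bench" > "hint.cpu${cpu}.out"
    done
}
```

Comparing hint.cpu0.out against the other fifteen files is then enough to spot cpu 0's odd memory-access curve.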
>>
>> FYI: The FreeBSD boot is UEFI/ACPI based for both systems,
>> not U-Boot based.
>>
>
> Jon Nettleton has replicated the memory access performance
> issue on the one cpu via a different HoneyComb, running
> some Linux kernel, using tinymembench as the benchmark.
>

Jon reports that older and newer HoneyCombs, with older and
newer EDK2 builds, all show the behavior on cpu 0: "[I]t may
have always existed."

Jon also reports that U-Boot based booting does not get the
behavior.

(I've never used U-Boot to boot the HoneyComb for any OS
media that I've got around. In my U-Boot ignorance, my
quick attempts failed for FreeBSD main and Fedora 34
Server media that I've been using with EDK2's UEFI/ACPI.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



