Date:      Mon, 2 Dec 2019 14:56:25 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        freebsd-arm@freebsd.org
Subject:   Re: Comparing the OverDrive 1000 (A57) vs. MACCHIATObin Double Shot (A72) for buildworld and via a CPU/cache/RAM tradeoff-exploring benchmark (links corrected)
Message-ID:  <63787F5A-A3B7-434A-B594-999D95559BEE@yahoo.com>
In-Reply-To: <5F7E7618-A503-4D16-B83C-0379F4B6327F@yahoo.com>
References:  <92E7B63A-E790-4815-9D91-2161A4F66B71.ref@yahoo.com> <92E7B63A-E790-4815-9D91-2161A4F66B71@yahoo.com> <5F7E7618-A503-4D16-B83C-0379F4B6327F@yahoo.com>

[Just correcting the links to be to .png files
and correcting some PowerMac11,2 related wording.]

On 2019-Dec-2, at 14:15, Mark Millard <marklmi at yahoo.com> wrote:

> It looks like the OverDrive 1000 vs. MACCHIATObin Double
> Shot comparison ends up being an example of memory
> access making the difference for the specific workload:
> -j4 buildworld for head -r355027 (building itself
> from scratch).
>
> buildworld times (not needing an llvm bootstrap build):
>
> OverDrive 1000:           13895 sec (about 3.86 hrs)
> MACCHIATObin Double Shot: 16561 sec (about 4.60 hrs)
>
> So a little under 45 min difference when the mean
> and geometric mean are both a little over 4.2 hrs.
>
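
The summary figures above can be rechecked with a short script. This is only a minimal sketch: the two times are the ones reported in the message, while the names `times_sec` and `summarize` are made up for illustration.

```python
import math

# Wall-clock buildworld times in seconds, as reported in the message.
times_sec = {
    "OverDrive 1000": 13895,
    "MACCHIATObin Double Shot": 16561,
}


def summarize(times):
    """Return (difference in min, arithmetic mean in hrs, geometric mean in hrs)."""
    a, b = times.values()
    diff_min = abs(a - b) / 60
    mean_hrs = (a + b) / 2 / 3600
    geo_mean_hrs = math.sqrt(a * b) / 3600
    return diff_min, mean_hrs, geo_mean_hrs


diff_min, mean_hrs, geo_mean_hrs = summarize(times_sec)
print(f"difference:      {diff_min:.1f} min")
print(f"arithmetic mean: {mean_hrs:.2f} hrs")
print(f"geometric mean:  {geo_mean_hrs:.2f} hrs")
```

With only two values that are this close together, the arithmetic and geometric means differ by about a minute, which is why both round to a little over 4.2 hrs.
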
> SSD ufs file systems: One with Samsung 860 Pro, the
> other with Samsung 850 Pro. I do not expect that I/O
> made much of a difference, but I did nothing to measure
> such for the buildworld activity.
>
> OverDrive RAM:     8GiByte, half in each of the 2 slots
> MACCHIATObin RAM: 16GiByte, all in its 1 slot.
>
> MACCHIATObin: jumpers set for the fastest CPU/RAM
> speed for the Double Shot.
>
> A comparison graph from exploring single threaded
> and multi-threaded CPU/cache and RAM limited
> performance (a variation on the old HINT serial
> and pthread benchmarks) is shown at:

Corrected link:

https://github.com/markmi/acpphint/blob/master/acpphint_example_data/acpphint-OverDrive_1000_MacchDblShot-threads_4-LP64-g%2B%2B_9_8.3_O3-libc%2B%2B_libstdc%2B%2B-DSIZE_large_fast_types-RAM.png

> There are curves for various involved types:
> double (d), unsigned long long (ull), unsigned
> long (ul), unsigned int (ui). The match for
> ull and ul for the context provides some
> evidence of the variability observed.
>
> (The OverDrive and MACCHIATObin were not benchmarked
> for the graph at the same version of head: -r352341
> based vs. -r355027 based.)
>
> (I did not set things such that the benchmark run
> would explore paging getting involved. Thus there
> is basically no I/O considered in the comparison
> graph.)
>
> The MACCHIATObin clearly wins single threaded and
> its memory subsystem was well matched to the single
> threaded use when the same involved types are
> compared. (Single threaded are the blueish curves,
> MACCHIATObin having the lighter colors.)
>
> For multi-threaded in the range where RAM access
> limits things, the two systems are a close match.
> (Greenish colors, right side of plot, upper
> curves.)
>
> The range where the OverDrive 1000 is clearly faster
> is part of the middle of the multi-threaded curves.
> (This might be tied to whatever is done with the
> dual RAM slot structure or to the amount of caching,
> or some such, I do not know the details.)
>
> I would expect "-j1 buildworld" would take less time
> on the MACCHIATObin than on the OverDrive, but I'm
> not planning on measuring that.
>
>
>
> A more historical comparison, old PowerMac11,2
> (2 sockets, 2 cores each) vs. the MACCHIATObin,
> both having 16 GiBytes of RAM:
>
> For analogous benchmark graphs (matching types),
> the MACCHIATObin single threaded is faster than
> the old PowerMac11,2 single threaded and also is
> usually faster than that 11,2's multi-threaded
> benchmark data as well.

I should have pointed out that the MACCHIATObin
single threaded and PowerMac11,2 multi-threaded
results are similar where memory access limits
things, with use of double (d) being a little
slower on the MACCHIATObin in this region.

> Multi-threaded, the
> MACCHIATObin is faster for the exploration by
> the benchmark.

Corrected link:

https://github.com/markmi/acpphint/blob/master/acpphint_example_data/acpphint-MacchDblShot_PowerMac11%2C2-threads_4-LP64-g%2B%2B_9_O3-libc%2B%2B-DSIZE_large_fast_types-RAM.png

> I expect that this is interesting for the likely
> difference in power usage during the benchmarking.
> (Not that I've measured the power usage.)
>
> (The FreeBSD head vintages are not the same in
> the graph: -r355027 based vs. -r352341 based.)
>



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



