From: Mark Millard
Date: Fri, 27 Sep 2019 13:52:58 -0700
To: Mark Johnston
Cc: freebsd-amd64@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: head -r352341 example context on ThreadRipper 1950X: cpuset -n prefer:1 with -l 0-15 vs. -l 16-31 odd performance?

On 2019-Sep-27, at 12:24, Mark Johnston wrote:

> On Thu, Sep 26, 2019 at 08:37:39PM -0700, Mark Millard wrote:
>>
>>
>> On 2019-Sep-26, at 17:05, Mark Millard wrote:
>>
>>> On 2019-Sep-26, at 13:29, Mark Johnston wrote:
>>>> One possibility is that these are kernel memory allocations occurring in
>>>> the context of the benchmark threads. Such allocations may not respect
>>>> the configured policy since they are not private to the allocating
>>>> thread. For instance, upon opening a file, the kernel may allocate a
>>>> vnode structure for that file. That vnode may be accessed by threads
>>>> from many processes over its lifetime, and may be recycled many times
>>>> before its memory is released back to the allocator.
>>>
>>> For -l0-15 -n prefer:1 :
>>>
>>> Looks like this reports sys_thr_new activity, sys_cpuset
>>> activity, and 0xffffffff80bc09bd activity (whatever that
>>> is). Mostly sys_thr_new activity, over 1300 of them . . .
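The per-syscall breakdown stated here comes from adding up the count printed under each aggregated stack in the dtrace listing. As a minimal sketch, the C++ helper below (tally_by_syscall is a hypothetical name, not part of dtrace or FreeBSD) does that summation, keying each stack by the first frame of the form kernel`sys_name+offset; stacks with no such frame, like the exit1 one that passes through the unresolved 0xffffffff80bc09bd frame, are grouped together. It assumes raw dtrace text, so the archive's ">>>" quote prefixes would need to be stripped first.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing whitespace from one line of dtrace output.
static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// True if s is a non-empty run of decimal digits (an aggregation count line).
static bool all_digits(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s)
        if (!std::isdigit(c)) return false;
    return true;
}

// Hypothetical helper: sums the count printed under each aggregated stack,
// keyed by the first frame of the form module`sys_name+offset.
std::map<std::string, long> tally_by_syscall(const std::string& dtrace_output) {
    std::map<std::string, long> totals;
    std::istringstream in(dtrace_output);
    std::string line;
    std::string label;  // syscall seen so far in the current stack, if any
    while (std::getline(in, line)) {
        const std::string t = trim(line);
        if (t.empty()) continue;
        if (all_digits(t)) {
            // A bare number closes the current stack.
            totals[label.empty() ? "(no sys_* frame)" : label] += std::stol(t);
            label.clear();
            continue;
        }
        const auto tick = t.find("`sys_");
        if (label.empty() && tick != std::string::npos) {
            const auto start = tick + 1;  // skip the backquote
            const auto plus = t.find('+', start);
            label = (plus == std::string::npos) ? t.substr(start)
                                                : t.substr(start, plus - start);
        }
    }
    return totals;
}
```

Summed by hand the same way, the listing gives 2 + 6 + 22 + 1324 = 1354 for sys_thr_new, 2 for sys_cpuset, and 6 for the exit1 stack.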
>>>
>>> dtrace: pid 13553 has exited
>>>
>>>
>>>               kernel`uma_small_alloc+0x61
>>>               kernel`keg_alloc_slab+0x10b
>>>               kernel`zone_import+0x1d2
>>>               kernel`uma_zalloc_arg+0x62b
>>>               kernel`thread_init+0x22
>>>               kernel`keg_alloc_slab+0x259
>>>               kernel`zone_import+0x1d2
>>>               kernel`uma_zalloc_arg+0x62b
>>>               kernel`thread_alloc+0x23
>>>               kernel`thread_create+0x13a
>>>               kernel`sys_thr_new+0xd2
>>>               kernel`amd64_syscall+0x3ae
>>>               kernel`0xffffffff811b7600
>>>                 2
>>>
>>>               kernel`uma_small_alloc+0x61
>>>               kernel`keg_alloc_slab+0x10b
>>>               kernel`zone_import+0x1d2
>>>               kernel`uma_zalloc_arg+0x62b
>>>               kernel`cpuset_setproc+0x65
>>>               kernel`sys_cpuset+0x123
>>>               kernel`amd64_syscall+0x3ae
>>>               kernel`0xffffffff811b7600
>>>                 2
>>>
>>>               kernel`uma_small_alloc+0x61
>>>               kernel`keg_alloc_slab+0x10b
>>>               kernel`zone_import+0x1d2
>>>               kernel`uma_zalloc_arg+0x62b
>>>               kernel`uma_zfree_arg+0x36a
>>>               kernel`thread_reap+0x106
>>>               kernel`thread_alloc+0xf
>>>               kernel`thread_create+0x13a
>>>               kernel`sys_thr_new+0xd2
>>>               kernel`amd64_syscall+0x3ae
>>>               kernel`0xffffffff811b7600
>>>                 6
>>>
>>>               kernel`uma_small_alloc+0x61
>>>               kernel`keg_alloc_slab+0x10b
>>>               kernel`zone_import+0x1d2
>>>               kernel`uma_zalloc_arg+0x62b
>>>               kernel`uma_zfree_arg+0x36a
>>>               kernel`vm_map_process_deferred+0x8c
>>>               kernel`vm_map_remove+0x11d
>>>               kernel`vmspace_exit+0xd3
>>>               kernel`exit1+0x5a9
>>>               kernel`0xffffffff80bc09bd
>>>               kernel`amd64_syscall+0x3ae
>>>               kernel`0xffffffff811b7600
>>>                 6
>>>
>>>               kernel`uma_small_alloc+0x61
>>>               kernel`keg_alloc_slab+0x10b
>>>               kernel`zone_import+0x1d2
>>>               kernel`uma_zalloc_arg+0x62b
>>>               kernel`thread_alloc+0x23
>>>               kernel`thread_create+0x13a
>>>               kernel`sys_thr_new+0xd2
>>>               kernel`amd64_syscall+0x3ae
>>>               kernel`0xffffffff811b7600
>>>                22
>>>
>>>               kernel`vm_page_grab_pages+0x1b4
>>>               kernel`vm_thread_stack_create+0xc0
>>>               kernel`kstack_import+0x52
>>>               kernel`uma_zalloc_arg+0x62b
>>>               kernel`vm_thread_new+0x4d
>>>               kernel`thread_alloc+0x31
>>>               kernel`thread_create+0x13a
>>>               kernel`sys_thr_new+0xd2
>>>               kernel`amd64_syscall+0x3ae
>>>               kernel`0xffffffff811b7600
>>>              1324
>>
>> With sys_thr_new not respecting -n prefer:1 for
>> -l0-15 (especially for the thread stacks), I
>> looked some at the generated integration kernel
>> code and it makes significant use of %rsp based
>> memory accesses (read and write).
>>
>> That would get both memory controllers going in
>> parallel (kernel vectors accesses to the preferred
>> memory domain), so not slowing down as expected.
>>
>> If round-robin is not respected for thread stacks,
>> and if threads migrate cpus across memory domains
>> at times, there could be considerable variability
>> for that context as well. (This may not be the
>> only way to have different/extra variability for
>> this context.)
>>
>> Overall: I'd be surprised if this was not
>> contributing to what I thought was odd about
>> the benchmark results.
>
> Your tracing refers to kernel thread stacks though, not the stacks used
> by threads when executing in user mode. My understanding is that a HINT
> implementation would spend virtually all of its time in user mode, so it
> shouldn't matter much or at all if kernel thread stacks are backed by
> memory from the "wrong" domain.

Looks like I was trying to think about it when I should have been
sleeping. You are correct.

> This also doesn't really explain some of the disparities in the plots
> you sent me. For instance, you get a much higher peak QUIS on FreeBSD
> than on Fedora with 16 threads and an interleave/round-robin domain
> selection policy.

True. I suppose it is possible that steady_clock's now() results are
odd for some reason in this type of context, making the durations
measured between such calls come out on the short side where things
look different.

But the left-hand side of the single-thread results (smaller memory
sizes for the vectors the integration kernel uses) does not show such
a rescaling.
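The kind of in-thread steady_clock timing discussed here can be sketched as follows. This is a minimal C++ illustration, assuming a hypothetical kernel_sum function as a stand-in for the benchmark's integration kernel (it is not the HINT code): every bit of setup happens before the first steady_clock::now() call, so only the kernel's own work falls inside the measured interval. The static_assert records that steady_clock is required to be monotonic, which rules out backwards jumps but not every possible oddity in the measured durations.

```cpp
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// The standard requires steady_clock to be monotonic; fail the build otherwise.
static_assert(std::chrono::steady_clock::is_steady, "steady_clock must be monotonic");

// Hypothetical stand-in for the benchmark's integration kernel.
double kernel_sum(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}

struct Timed {
    double result;
    std::chrono::nanoseconds elapsed;
};

// Setup (the allocation here; thread creation in a multi-threaded harness)
// happens before the first now() call, so only the kernel is timed.
Timed time_kernel_only(std::size_t n) {
    std::vector<double> v(n, 1.0);
    const auto start = std::chrono::steady_clock::now();
    const double r = kernel_sum(v);
    const auto stop = std::chrono::steady_clock::now();
    return {r, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```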
The single-thread time measurements are taken strictly inside the
thread of execution; no thread creation or the like is counted for
any size. The right-hand side of the single-thread results (larger
memory use, making the smaller cache levels fairly ineffective) does
generally show some rescaling, but not as drastic as in the
multi-threaded results. Both round-robin and prefer:1 showed this
for the single-threaded case.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
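Johnston's expectation above, that a HINT run spends virtually all of its time in user mode, can be checked directly. The sketch below is a minimal C++ example, assuming a POSIX system such as FreeBSD or Linux: it reads the process's accumulated user and system CPU time with getrusage(RUSAGE_SELF, ...), which sums over all threads of the process. ModeTimes and user_fraction are illustrative names, not part of any library. A user fraction close to 1.0 after a benchmark run would support the view that where kernel thread stacks live matters little.

```cpp
#include <sys/resource.h>
#include <sys/time.h>

// Accumulated CPU time of the calling process, split by execution mode.
struct ModeTimes {
    double user_s;    // time spent executing in user mode
    double system_s;  // time spent in the kernel on the process's behalf
};

static double to_seconds(const timeval& tv) {
    return static_cast<double>(tv.tv_sec) + static_cast<double>(tv.tv_usec) / 1e6;
}

// Reads user and system CPU time summed over all threads of this process.
// Returns zeros if getrusage fails.
ModeTimes process_mode_times() {
    rusage ru{};
    if (getrusage(RUSAGE_SELF, &ru) != 0) return {0.0, 0.0};
    return {to_seconds(ru.ru_utime), to_seconds(ru.ru_stime)};
}

// Fraction of the consumed CPU time that was spent in user mode (0 if none).
double user_fraction(const ModeTimes& t) {
    const double total = t.user_s + t.system_s;
    return total > 0.0 ? t.user_s / total : 0.0;
}
```

Calling process_mode_times() at the end of a run (or simply running the benchmark under time(1)) gives the same user versus system split.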