From: Mike Tancsa
Organization: Sentex Communications
To: Don Lewis
Cc: Peter Moody, Pete French, freebsd-stable@freebsd.org, Andriy Gapon
Subject: Re: Ryzen issues on FreeBSD ?
Date: Tue, 30 Jan 2018 14:51:36 -0500
Message-ID: <5e48bbc2-e872-46bd-eece-25acbb180f77@sentex.net>

On 1/28/2018 7:41 PM, Don Lewis wrote:
>
> My suspicion is a FreeBSD bug, probably a locking / race issue.  I know
> that we've had to make some tweaks to our code for AMD CPUs, like this:

OK, I got back the CPUs from AMD (fast turnaround!) And sadly, I am
still able to hang the compile in about the same place. However, if I set

hw.lower_amd64_sharedpage=0

it seems to hang in a different way.

CTRL+t shows

load: 0.43  cmd: python2.7 15736 [umtxn] 165.00r 14.46u 6.65s 0% 233600k
make[1]: Working in: /usr/ports/net/samba47
make: Working in: /usr/ports/net/samba47

# procstat -t 15736
  PID    TID COMM       TDNAME  CPU  PRI STATE  WCHAN
15736 100855 python2.7  -        -1  152 sleep  usem
15736 100956 python2.7  -        -1  124 sleep  umtxn
15736 100957 python2.7  -        -1  126 sleep  umtxn
15736 100958 python2.7  -        -1  124 sleep  umtxn
15736 100959 python2.7  -        -1  127 sleep  umtxn
15736 100960 python2.7  -        -1  126 sleep  umtxn
15736 100961 python2.7  -        -1  126 sleep  umtxn
15736 100962 python2.7  -        -1  126 sleep  umtxn
15736 100963 python2.7  -        -1  126 sleep  umtxn
15736 100964 python2.7  -        -1  127 sleep  umtxn
15736 100965 python2.7  -        -1  126 sleep  umtxn
15736 100966 python2.7  -        -1  126 sleep  umtxn
15736 100967 python2.7  -        -1  126 sleep  umtxn

# procstat -kk 15736
  PID    TID COMM       TDNAME  KSTACK
15736 100855 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100956 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100957 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100958 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100959 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100960 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100961 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100962 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100963 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100964 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100965 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100966 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100967 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc

If I kill the make, reboot and just type make, it completes after the
reboot. If after the reboot, I do an rm -R work, it will hang again.

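To compare hangs at a glance, the wait channels in the procstat -t output can be tallied per process. The following is only a rough sketch and was not part of the testing above: the function name tally_wchan and the SAMPLE text are made up for illustration, and it assumes WCHAN is the last whitespace-separated column of each row, as it is in the output shown here.

```python
from collections import Counter


def tally_wchan(procstat_t_output: str) -> Counter:
    """Count threads per wait channel in `procstat -t` text output."""
    counts = Counter()
    for line in procstat_t_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row, which starts with "PID".
        if not fields or fields[0] == "PID":
            continue
        # Assumption: WCHAN is the last column of every thread row.
        counts[fields[-1]] += 1
    return counts


# Three rows copied from the first hang (PID 15736) as sample input.
SAMPLE = """\
  PID    TID COMM       TDNAME  CPU  PRI STATE  WCHAN
15736 100855 python2.7  -        -1  152 sleep  usem
15736 100956 python2.7  -        -1  124 sleep  umtxn
15736 100957 python2.7  -        -1  126 sleep  umtxn
"""

if __name__ == "__main__":
    for wchan, count in tally_wchan(SAMPLE).most_common():
        print(wchan, count)
```

Fed the full listings, this would make the difference between the two settings easier to see: mostly umtxn with hw.lower_amd64_sharedpage=0, mostly usem with the default.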
With the default of hw.lower_amd64_sharedpage: 1

post reboot, CTRL+T shows

load: 2.73  cmd: python2.7 15703 [usem] 40.92r 12.34u 3.45s 0% 233640k
make[1]: Working in: /usr/ports/net/samba47
make: Working in: /usr/ports/net/samba47

root@amdtestr12:/home/mdtancsa # procstat -kk 15703
  PID    TID COMM       TDNAME  KSTACK
15703 100824 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100956 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100957 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100958 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100959 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100960 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100961 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100962 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100963 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100964 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100965 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48 amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100966 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100967 python2.7  -       mi_switch+0xf5 sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231 umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b amd64_syscall+0xa48 fast_syscall_common+0xfc

root@amdtestr12:/home/mdtancsa # procstat -t 15703
  PID    TID COMM       TDNAME  CPU  PRI STATE  WCHAN
15703 100824 python2.7  -        -1  152 sleep  usem
15703 100956 python2.7  -        -1  125 sleep  usem
15703 100957 python2.7  -        -1  127 sleep  usem
15703 100958 python2.7  -        -1  125 sleep  usem
15703 100959 python2.7  -        -1  125 sleep  usem
15703 100960 python2.7  -        -1  126 sleep  usem
15703 100961 python2.7  -        -1  126 sleep  usem
15703 100962 python2.7  -        -1  126 sleep  usem
15703 100963 python2.7  -        -1  126 sleep  usem
15703 100964 python2.7  -        -1  126 sleep  usem
15703 100965 python2.7  -        -1  126 sleep  umtxn
15703 100966 python2.7  -        -1  126 sleep  usem
15703 100967 python2.7  -        -1  125 sleep  usem
root@amdtestr12:/home/mdtancsa #

	---Mike

>
> ------------------------------------------------------------------------
> r321608 | kib | 2017-07-27 01:37:07 -0700 (Thu, 27 Jul 2017) | 9 lines
>
> Use MFENCE to serialize RDTSC on non-Intel CPUs.
>
> Kernel already used the stronger barrier instruction for AMDs, correct
> the userspace fast gettimeofday() implementation as well.
>
>
>
> I did go back and look at the build runaways that I've occasionally seen
> on my AMD FX-8320E package builder.  I haven't seen the python issue
> there, but have seen gmake get stuck in a sleeping state with a bunch of
> zombie offspring.
>
>

-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada