From owner-freebsd-stable@freebsd.org Fri Feb 23 20:22:34 2018
Subject: Ryzen lockup on bhyve was (Re: new Ryzen lockup issue ?)
To: Nimrod Levy
Cc: FreeBSD-STABLE Mailing List
References: <92a60e14-f532-2647-d45d-b500fc59ba88@sentex.net>
From: Mike Tancsa
Organization: Sentex Communications
Message-ID: <425be16f-9fdc-9ed6-72b1-02e28bfd130f@sentex.net>
Date: Fri, 23 Feb 2018 15:22:31 -0500
List-Id: Production branch of FreeBSD source code

Actually, I can confirm that the same sort of hard lockup happens on my
Epyc board with RELENG11. It also happens in current. I will file a PR
and post on freebsd-current in case someone has suggestions on how to
figure out what's going on.

I upgraded the box to 12.0-CURRENT #0 r329866 to see if it would avoid
the lockup, but same deal.
The vmm driver does seem different when loaded, but I get the same
lockup under load:

CPU: AMD Ryzen 5 1600X Six-Core Processor (3593.35-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
  Features=0x178bfbff
  Features2=0x7ed8320b
  AMD Features=0x2e500800
  AMD Features2=0x35c233ff
  Structured Extended Features=0x209c01a9
  XSAVE Features=0xf
  AMD Extended Feature Extensions ID EBX=0x7
  SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics
AMD-Vi: IVRS Info VAsize = 64 PAsize = 48 GVAsize = 2 flags:0
driver bug: Unable to set devclass (class: ppc devname: (unknown))
ivhd0: on acpi0
ivhd0: Flag:b0
ivhd0: Features(type:0x11) MsiNumPPR = 0 PNBanks= 2 PNCounters= 0
ivhd0: Extended features[31:0]:22294ada HATS = 0x2 GATS = 0x0 GLXSup = 0x1 SmiFSup = 0x1 SmiFRC = 0x2 GAMSup = 0x1 DualPortLogSup = 0x2 DualEventLogSup = 0x2
ivhd0: Extended features[62:32]:f77ef Max PASID: 0x2f DevTblSegSup = 0x3 MarcSup = 0x1
ivhd0: supported paging level:7, will use only: 4
ivhd0: device range: 0x0 - 0xffff
ivhd0: PCI cap 0x190b640f@0x40 feature:19

On 2/23/2018 12:35 PM, Nimrod Levy wrote:
> Now that is a fascinating data point. My machine that I've been having
> issues with has been running a bhyve vm from the beginning.  I never
> made the connection. I'll try throwing some network traffic at the VM
> and see if I can make it lock up.
>
> On Fri, Feb 23, 2018 at 10:14 AM, Mike Tancsa wrote:
>
> On 2/22/2018 3:41 PM, Mike Tancsa wrote:
> > On 2/21/2018 3:04 PM, Mike Tancsa wrote:
> >> Not sure if I have found another issue specific to Ryzen, or a bug that
> >> manifests itself on Ryzen systems easier.  I installed the latest
> >> virtualbox from the ports and was doing some network performance tests
> >> between a vm and the hypervisor using iperf3.  The guest is just a
> >> RELENG11 image and the network is an em nic bridged to epair1b
> >
> > This looks possibly related to VirtualBox.
> > Doing the same tests and more using bhyve, I don't get any lockup.
> > Not to mention, network IO is MUCH faster.
> >
>
> Actually, it just took a little bit longer to lock up the box with bhyve
> on RELENG_11 as the hypervisor.  It would be great if anyone can confirm
> that this locks up their Ryzen boxes. I tried 2 different boxes to
> eliminate a hardware issue.  I also tried a similar test on Ubuntu and I
> can spin up 4 instances and run without lockups.
>
> Just grab a copy of
>
> https://download.freebsd.org/ftp/releases/VM-IMAGES/11.1-RELEASE/amd64/Latest/FreeBSD-11.1-RELEASE-amd64.raw.xz
>
> and make 2 copies, tmp.raw and tmp2.raw.
>
> kldload vmm
> ifconfig tap0 create
> ifconfig tap1 create
> ifconfig tap1 up
> ifconfig tap0 up
> ifconfig bridge0 create addm tap0 addm tap1
> ifconfig bridge0 192.168.99.1/24
>
> screen -d -m sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 6144M -t tap0 -d tmp.raw BSD11a
> screen -d -m sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 6144M -t tap1 -d tmp2.raw BSD11b
>
> Install netperf on the 2 vms and give the vtnet interfaces
> 192.168.99.2/24 and 192.168.99.3/24
>
> In both VMs, pkg install iperf3 and start it up as
> iperf3 -s
>
> In the hypervisor,
> iperf3 -t 10000 -R -c 192.168.99.2
> iperf3 -t 10000 -c 192.168.99.3
>
> The box locks up solid after 5-20 min.  The same hardware with Ubuntu and
> VirtualBox and 4 instances works fine, with no lockups after a day, so I
> am not sure what's up, but it seems to be something with the Ryzen CPU
> running as a hypervisor or with some type of load :(
>
> Prior to the lockup I had a stream of netstat -m writing to a file every
> 5 seconds. The last entry was below. It doesn't seem to be a leak.
>
> Thu Feb 22 17:14:28 EST 2018
> 8694/10281/18975 mbufs in use (current/cache/total)
> 8225/5211/13436/2038424 mbuf clusters in use (current/cache/total/max)
> 8225/5184 mbuf+clusters out of packet secondary zone in use (current/cache)
> 461/3747/4208/1019211 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/301988 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/169868 16k jumbo clusters in use (current/cache/total/max)
> 20467K/27980K/48447K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0 sendfile syscalls
> 0 sendfile syscalls completed without I/O request
> 0 requests for I/O initiated by sendfile
> 0 pages read by sendfile as part of a request
> 0 pages were valid at time of a sendfile request
> 0 pages were requested for read ahead by applications
> 0 pages were read ahead by sendfile
> 0 times sendfile encountered an already busy page
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
>
>         ---Mike
>
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400 x203
> Sentex Communications, mike@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada

-- 
-------------------
Mike Tancsa, tel +1 519 651 3400 x203
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada
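
For anyone trying to reproduce this, the "netstat -m writing to a file
every 5 seconds" logging mentioned in the quoted message could be done
with something like the sketch below. This is not the script used in the
thread; the function name log_snapshots and the log path in the usage
comment are assumptions made for illustration. It appends a date stamp
followed by the command's output on every pass, which matches the shape
of the entry quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): append a timestamped
# snapshot of a command's output to a log file at a fixed interval.
#
# usage: log_snapshots LOGFILE INTERVAL_SECONDS COUNT COMMAND [ARGS...]
#        COUNT=0 means "run until interrupted".
log_snapshots() {
    logfile=$1
    interval=$2
    count=$3
    shift 3

    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        {
            date        # timestamp line, like "Thu Feb 22 17:14:28 EST 2018"
            "$@"        # the command being sampled, e.g. netstat -m
        } >> "$logfile" 2>&1
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example (assumed log path), sampling every 5 seconds until the box hangs:
#   log_snapshots /var/tmp/netstat-m.log 5 0 netstat -m
```

After a lockup and reboot, the tail of the log file shows the last
snapshot that made it to disk.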