Date:      Sat, 17 Mar 2018 18:01:10 +0000
From:      Nimrod Levy <nimrodl@gmail.com>
To:        Mike Tancsa <mike@sentex.net>
Cc:        FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: Ryzen lockup on bhyve was (Re: new Ryzen lockup issue ?)
Message-ID:  <CAMgUhpqa6OZ29aVpTNA=pVmuBLCCWfR5zzGEz6bqqcrvkAB-1Q@mail.gmail.com>
In-Reply-To: <CAMgUhpohQBJ1as3V7M3girPPFiw5vsT7asZGzHWCv47cm2YU+Q@mail.gmail.com>
References:  <a687883a-b2a8-5b18-f63e-754a2ed445c0@sentex.net> <bbcc09cf-0072-8510-156f-5c20c301d43f@sentex.net> <92a60e14-f532-2647-d45d-b500fc59ba88@sentex.net> <CAMgUhpo1C_0L86Xkzmuz5+e3C3zk5RNkVS9aEBEwF-2XZ4d1sQ@mail.gmail.com> <425be16f-9fdc-9ed6-72b1-02e28bfd130f@sentex.net> <CAMgUhpohQBJ1as3V7M3girPPFiw5vsT7asZGzHWCv47cm2YU+Q@mail.gmail.com>

Looks like I got almost 4 full weeks before it locked up this morning :(



On Fri, Feb 23, 2018 at 3:33 PM Nimrod Levy <nimrodl@gmail.com> wrote:

> After a couple of hours of running the iperf commands you were testing
> with, I'm unable to duplicate this so far.
>
> I'm running with FreeBSD stable from 17-Feb with the commits noted in
> https://reviews.freebsd.org/D14347 pulled in.
>
> I've also lowered the memory clock and disabled C-states in the BIOS.
>
> The bhyve VM is running CentOS.
>
> The system has been up for over 6 days and has been running the iperf3
> loop for over 2 hours.
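[Editor's note: the iperf3 load loop itself was not posted in the thread. A minimal sketch of what such a loop might look like is below; the server address and 60-second run length are assumptions.]

```shell
#!/bin/sh
# Sketch of a continuous iperf3 load loop against a bhyve guest.
# The exact loop Nimrod ran was not posted; server address and the
# 60-second run length here are assumptions.
load_loop() {
    server=$1
    secs=${2:-60}
    # Keep hammering the server until an iperf3 run fails
    # (or the box locks up).
    while iperf3 -c "$server" -t "$secs"; do
        date   # timestamp each completed run
    done
    echo "iperf3 exited; load loop stopped" >&2
}
```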
>
> The hardware is an ASUS Prime B350-Plus with a Ryzen 5 1600 and 32G of RAM.
>
> --
> Nimrod
>
>
> On Fri, Feb 23, 2018 at 3:22 PM, Mike Tancsa <mike@sentex.net> wrote:
>
>> Actually I can confirm the same sort of hard lockup happens on my EPYC
>> board with RELENG11.  It also happens in CURRENT.  I will file a PR and
>> post on freebsd-current in case someone has any suggestions on how to
>> figure out what's going on.
>>
>> I upgraded the box to
>> 12.0-CURRENT #0 r329866
>> in order to see if it would avoid the lockup, but same deal.  The vmm
>> driver does seem different when loaded, but the same lockup happens
>> under load:
>> CPU: AMD Ryzen 5 1600X Six-Core Processor (3593.35-MHz K8-class CPU)
>>   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
>>   Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>>   Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
>>   AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
>>   AMD Features2=0x35c233ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX>
>>   Structured Extended Features=0x209c01a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA>
>>   XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
>>   AMD Extended Feature Extensions ID EBX=0x7<CLZERO,IRPerf,XSaveErPtr>
>>   SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
>>   TSC: P-state invariant, performance statistics
>>
>>
>> AMD-Vi: IVRS Info VAsize = 64 PAsize = 48 GVAsize = 2 flags:0
>> driver bug: Unable to set devclass (class: ppc devname: (unknown))
>> ivhd0: <AMD-Vi/IOMMU ivhd with EFR> on acpi0
>> ivhd0: Flag:b0<IotlbSup,Coherent>
>> ivhd0: Features(type:0x11) MsiNumPPR = 0 PNBanks= 2 PNCounters= 0
>> ivhd0: Extended features[31:0]:22294ada<PPRSup,NXSup,GTSup,IASup> HATS = 0x2 GATS = 0x0 GLXSup = 0x1 SmiFSup = 0x1 SmiFRC = 0x2 GAMSup = 0x1 DualPortLogSup = 0x2 DualEventLogSup = 0x2
>> ivhd0: Extended features[62:32]:f77ef<USSup> Max PASID: 0x2f DevTblSegSup = 0x3 MarcSup = 0x1
>> ivhd0: supported paging level:7, will use only: 4
>> ivhd0: device range: 0x0 - 0xffff
>> ivhd0: PCI cap 0x190b640f@0x40 feature:19<IOTLB,EFR,CapExt>
>>
>>
>>
>> On 2/23/2018 12:35 PM, Nimrod Levy wrote:
>> > Now that is a fascinating data point. My machine that I've been having
>> > issues with has been running a bhyve vm from the beginning.  I never
>> > made the connection. I'll try throwing some network traffic at the VM
>> > and see if I can make it lock up.
>> >
>> > On Fri, Feb 23, 2018 at 10:14 AM, Mike Tancsa <mike@sentex.net> wrote:
>> >
>> >     On 2/22/2018 3:41 PM, Mike Tancsa wrote:
>> >     > On 2/21/2018 3:04 PM, Mike Tancsa wrote:
>> >     >> Not sure if I have found another issue specific to Ryzen, or a
>> >     >> bug that manifests itself more easily on Ryzen systems.  I
>> >     >> installed the latest virtualbox from the ports and was doing
>> >     >> some network performance tests between a vm and the hypervisor
>> >     >> using iperf3.  The guest is just a RELENG11 image and the
>> >     >> network is an em nic bridged to epair1b
>> >     >
>> >     > This looks possibly related to VirtualBox. Doing the same tests
>> >     > and more using bhyve, I don't get any lockup.  Not to mention,
>> >     > network IO is MUCH faster.
>> >
>> >
>> >     Actually, it just took a little bit longer to lock up the box
>> >     with bhyve on RELENG_11 as the hypervisor.  It would be great if
>> >     anyone can confirm this locks up their Ryzen boxes.  I tried 2
>> >     different boxes to eliminate a hardware issue.  I also tried a
>> >     similar test on Ubuntu, where I can spin up 4 instances and run
>> >     without lockups.
>> >
>> >     Just grab a copy of
>> >
>> >     https://download.freebsd.org/ftp/releases/VM-IMAGES/11.1-RELEASE/amd64/Latest/FreeBSD-11.1-RELEASE-amd64.raw.xz
>> >
>> >     and make two copies: tmp.raw and tmp2.raw.
>> >
>> >
>> >     kldload vmm
>> >     ifconfig tap0 create
>> >     ifconfig tap1 create
>> >     ifconfig tap1 up
>> >     ifconfig tap0 up
>> >     ifconfig bridge0 create addm tap0 addm tap1
>> >     ifconfig bridge0 192.168.99.1/24
>> >
>> >     screen -d -m sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 6144M \
>> >         -t tap0 -d tmp.raw BSD11a
>> >     screen -d -m sh /usr/share/examples/bhyve/vmrun.sh -c 4 -m 6144M \
>> >         -t tap1 -d tmp2.raw BSD11b
>> >
>> >     Give the vtnet interfaces in the two VMs 192.168.99.2/24 and
>> >     192.168.99.3/24 respectively.
>> >
>> >     In both VMs, pkg install iperf3 and start it up as a server:
>> >     iperf3 -s
>> >
>> >     In the hypervisor:
>> >     iperf3 -t 10000 -R -c 192.168.99.2
>> >     iperf3 -t 10000 -c 192.168.99.3
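[Editor's note: the reproduction steps above (fetch the image, make two copies, load vmm, create the tap/bridge devices, start the two VMs) can be collected into one script. This is a sketch, not part of the original mail: the image URL, device names, and vmrun.sh flags are taken verbatim from the thread; run it as root on the FreeBSD hypervisor.]

```shell
#!/bin/sh
# Consolidated reproduction script for the bhyve/Ryzen lockup test.
# All names and flags are copied from the thread; nothing here is run
# automatically -- call the functions in order.
set -e

fetch_image() {
    fetch https://download.freebsd.org/ftp/releases/VM-IMAGES/11.1-RELEASE/amd64/Latest/FreeBSD-11.1-RELEASE-amd64.raw.xz
    unxz -k FreeBSD-11.1-RELEASE-amd64.raw.xz
    cp FreeBSD-11.1-RELEASE-amd64.raw tmp.raw
    cp FreeBSD-11.1-RELEASE-amd64.raw tmp2.raw
}

setup_net() {
    kldload -n vmm        # -n: don't complain if vmm is already loaded
    ifconfig tap0 create
    ifconfig tap1 create
    ifconfig tap0 up
    ifconfig tap1 up
    ifconfig bridge0 create addm tap0 addm tap1
    ifconfig bridge0 inet 192.168.99.1/24
}

start_vms() {
    screen -d -m sh /usr/share/examples/bhyve/vmrun.sh \
        -c 4 -m 6144M -t tap0 -d tmp.raw BSD11a
    screen -d -m sh /usr/share/examples/bhyve/vmrun.sh \
        -c 4 -m 6144M -t tap1 -d tmp2.raw BSD11b
}
```

Run fetch_image, setup_net, then start_vms, and drive the load with the two iperf client commands from the hypervisor as described above.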
>> >
>> >
>> >     the box locks up solid after 5-20 min.  The same hardware with
>> >     Ubuntu and virtualbox running 4 instances works fine, with no
>> >     lockups after a day, so I'm not sure what's up, but it seems to
>> >     be something with the Ryzen CPU running as a hypervisor, or with
>> >     some type of load :(
>> >
>> >     Prior to the lockup I had a stream of netstat -m output writing
>> >     to a file every 5 seconds.  The last entry is below.  It doesn't
>> >     seem to be a leak.
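[Editor's note: a sketch of the 5-second netstat -m snapshot loop described above. The loop itself was not posted; the log file name is an assumption, the 5-second interval is from the thread.]

```shell
#!/bin/sh
# Periodically append a timestamped netstat -m snapshot to a log file,
# so the last entry before a hard lockup survives on disk.
log_mbufs() {
    logfile=${1:-/var/log/mbuf-stats.log}   # assumed name
    interval=${2:-5}                        # seconds between snapshots
    while :; do
        date >> "$logfile"
        netstat -m >> "$logfile"
        sleep "$interval" || break
    done
}
```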
>> >
>> >     Thu Feb 22 17:14:28 EST 2018
>> >     8694/10281/18975 mbufs in use (current/cache/total)
>> >     8225/5211/13436/2038424 mbuf clusters in use (current/cache/total/max)
>> >     8225/5184 mbuf+clusters out of packet secondary zone in use (current/cache)
>> >     461/3747/4208/1019211 4k (page size) jumbo clusters in use (current/cache/total/max)
>> >     0/0/0/301988 9k jumbo clusters in use (current/cache/total/max)
>> >     0/0/0/169868 16k jumbo clusters in use (current/cache/total/max)
>> >     20467K/27980K/48447K bytes allocated to network (current/cache/total)
>> >     0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> >     0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> >     0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> >     0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>> >     0 sendfile syscalls
>> >     0 sendfile syscalls completed without I/O request
>> >     0 requests for I/O initiated by sendfile
>> >     0 pages read by sendfile as part of a request
>> >     0 pages were valid at time of a sendfile request
>> >     0 pages were requested for read ahead by applications
>> >     0 pages were read ahead by sendfile
>> >     0 times sendfile encountered an already busy page
>> >     0 requests for sfbufs denied
>> >     0 requests for sfbufs delayed
>> >
>> >
>> >
>> >             ---Mike
>> >
>> >
>> >
>> >
>> >     --
>> >     -------------------
>> >     Mike Tancsa, tel +1 519 651 3400 x203
>> >     Sentex Communications, mike@sentex.net
>> >     Providing Internet services since 1994 www.sentex.net
>> >     Cambridge, Ontario Canada
>> >     _______________________________________________
>> >     freebsd-stable@freebsd.org mailing list
>> >     https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> >     To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>> >
>> >
>>
>>
>> --
>> -------------------
>> Mike Tancsa, tel +1 519 651 3400 x203
>> Sentex Communications, mike@sentex.net
>> Providing Internet services since 1994 www.sentex.net
>> Cambridge, Ontario Canada
>>
>

--
Nimrod


